Regular Expressions Using Python's re

I have a file full of lines similar to this:

line = 'Weclome - MIsiti International,0,0,-9,0,'

I want to replace 'Weclome - MIsiti International' with the string '1'.

Here is my code:

exp=re.compile(r"([\./A-Za-z\s\-]+)")
print exp.sub("1",line)

Unfortunately I get the following output:

1,0,0,19,0,

That is incorrect (the -9 became 19). I thought this would work:

exp=re.compile(r"([\./A-Za-z\s\-[^0-9]]+)")
print exp.sub("1",line)

But it does not:

[]

Can someone tell me what I am doing wrong here?


Why do you need a regular expression?

>>> line = 'Weclome - MIsiti International,0,0,-9,0,'
>>> s=line.split(",")
>>> s[0]="1"
>>> ','.join(s)
'1,0,0,-9,0,'


exp=re.compile(r"([\./A-Za-z\s\-]+)")

There is no need to put '\' before '-' inside brackets. Just put the '-' at a position inside the brackets where it cannot have its special meaning, such as first or last.

Likewise, there is no need to put '\' before the dot '.' inside brackets, because a dot inside brackets loses its special meaning.

So, instead of exp=re.compile(r"([\./A-Za-z\s\-]+)"), write exp=re.compile(r"([./A-Za-z\s-]+)")
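
Both points can be checked quickly on toy strings. This is only a sketch; the short sample strings are made up for illustration and are not from the question:

```python
import re

# Inside brackets, an unescaped dot matches only a literal dot.
dots = re.findall(r"[.]", "a.b")
print(dots)

# A hyphen placed last in the class cannot form a range, so it is literal.
hyphens = re.findall(r"[a\s-]", "a -b")
print(hyphens)

# The escaped and unescaped spellings behave the same on the question's line.
line = 'Weclome - MIsiti International,0,0,-9,0,'
escaped = re.compile(r"([\./A-Za-z\s\-]+)")
unescaped = re.compile(r"([./A-Za-z\s-]+)")
same = escaped.sub("1", line) == unescaped.sub("1", line)
print(same)
```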

As for exp=re.compile(r"([\./A-Za-z\s\-[^0-9]]+)"), it doesn't match at all because '[' behaves like '-': placed where it cannot have its special meaning, it loses that meaning and is treated as a plain character.

So the '[' before '^0-9]' is a literal bracket, not the beginning of a nested class. Consequently, the ']' right after '^0-9' closes the class opened by the first left bracket in '[\./A-Z...', and the last right bracket followed by '+' means "the character ] one or more times".
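
To see this in action, here is a small check. The second test string is invented purely to show what the pattern does match:

```python
import re

line = 'Weclome - MIsiti International,0,0,-9,0,'
exp = re.compile(r"([\./A-Za-z\s\-[^0-9]]+)")

# The pattern requires a literal ']' after one class character. The line
# contains no ']', so nothing matches and sub() returns it unchanged.
unchanged = exp.sub("1", line)
print(unchanged == line)

# On a made-up string containing ']', each match is one class character
# (letter, digit, '[', '^', ...) followed by a run of ']'.
found = exp.findall("a]] 5] [")
print(found)
```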

import re

line = 'Weclome - MIsiti International,0,0,-9,0,'

exp=re.compile(r"(^[./A-Za-z\s-]+)")
print(exp.sub("1",line))

# or

exp=re.compile(r"([./A-Za-z\s-]+(?=,))")
print(exp.sub("1",line))

result

1,0,0,-9,0,
1,0,0,-9,0,


Character classes cannot be nested. The latter example will treat '[', '^', etc. as literal characters to match. Would it not work if you simply used r"(^[^,0-9]+)", i.e. anything at the start that is not a comma or a digit 0-9?
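
A minimal sketch of that suggestion, run against the line from the question. Note that it assumes the first field never starts with a digit; a line whose first field did would be left untouched:

```python
import re

line = 'Weclome - MIsiti International,0,0,-9,0,'

# Anchored at the start: consume everything up to the first comma or digit.
exp = re.compile(r"^[^,0-9]+")
result = exp.sub("1", line)
print(result)  # 1,0,0,-9,0,
```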


Your first regex is good, but you need to anchor it to the beginning of the line and add the 'm' (multiline) modifier, like so:

import re
line = 'Weclome - MIsiti International,0,0,-9,0,'
exp = re.compile(r"^([./A-Za-z\s\-]+)", re.M)
print (exp.sub("1",line))

Note that this solution fixes an entire file full of lines in one operation, provided the whole file's contents are passed in as one string.
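
As a rough illustration of the whole-file case, the sketch below uses a two-line string in place of a real file. The second line is invented for the example. It also assumes every line has a comma after its first field: since \s matches newlines too, a blank or text-only line could let a match run on into the next line.

```python
import re

# Stand-in for the file's contents; the second line is hypothetical.
text = (
    "Weclome - MIsiti International,0,0,-9,0,\n"
    "Another Name/Co.,3,-4,5,\n"
)

# With re.M, '^' matches at the start of every line, not just the string.
exp = re.compile(r"^([./A-Za-z\s\-]+)", re.M)
fixed = exp.sub("1", text)
print(fixed)
```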


Most people are giving you answers <snark>often qualified with "Don't use regex! Regex is evil and comes from Perl! We Python users have transcended mere text manipulation!"</snark> but no one is explaining why you're experiencing this problem.

Your regex is working. It takes every run of letters, whitespace, dots, slashes, or hyphens and replaces it with 1. The problem is that it treats the negative sign in -9 as "evil text" to replace as well, which is how -9 becomes 19.
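
You can see exactly what gets replaced by listing the matches instead of substituting them. A quick sketch using the pattern and line from the question:

```python
import re

line = 'Weclome - MIsiti International,0,0,-9,0,'
exp = re.compile(r"([\./A-Za-z\s\-]+)")

# Two matches: the name, and the lone minus sign in front of the 9.
matches = exp.findall(line)
print(matches)  # ['Weclome - MIsiti International', '-']
```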

One way to approach this is to provide an anchor for your regex: make it match the commas (or the beginning/end of the string) surrounding the text. It would then see ,text, and turn it into ,1, but would see ,-9, and know that it's not text.

Another approach is to filter on "does it not contain digits" instead of "does it contain these things I need", because what if you later need to filter out other punctuation marks? Using ,[^0-9,]+, would match "things that aren't digits or commas" between two commas, which would turn ,text, into ,1, but leave ,-9, unchanged.

A third approach is to split the string on commas, test and change each individual segment (probably by checking whether it contains digits), and then join the segments back together.
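
The split approach might look something like this. It is only a sketch: the function name replace_text_fields is made up, and treating "non-empty and contains no digit" as the test for text is an assumption about the data.

```python
def replace_text_fields(line, replacement="1"):
    """Replace every comma-separated field that contains no digits."""
    fields = line.split(",")
    fixed = [
        # Leave empty fields (e.g. after a trailing comma) and numbers alone.
        replacement if field and not any(ch.isdigit() for ch in field) else field
        for field in fields
    ]
    return ",".join(fixed)


line = 'Weclome - MIsiti International,0,0,-9,0,'
result = replace_text_fields(line)
print(result)  # 1,0,0,-9,0,
```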

If you choose the first or second approach, I leave it up to you to write a regex that matches either a leading comma or the beginning of the string (and a trailing comma or the end of the string; both are similar). It's not terribly difficult.
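
For reference, one possible way to write such a regex (certainly not the only one) is to capture the leading boundary and use a lookahead for the trailing one, so that a comma shared by two neighbouring fields is not consumed. The second test string is invented for illustration.

```python
import re

# (^|,)      : start of string or a leading comma, captured so it can be kept
# [^0-9,]+   : a field made only of non-digit, non-comma characters
# (?=,|$)    : must be followed by a comma or the end, without consuming it
exp = re.compile(r"(^|,)[^0-9,]+(?=,|$)")

line = 'Weclome - MIsiti International,0,0,-9,0,'
# \g<1> puts the captured boundary back; plain \11 would mean group 11.
result = exp.sub(r"\g<1>1", line)
print(result)  # 1,0,0,-9,0,

other = exp.sub(r"\g<1>1", "0,abc,-9,x")
print(other)   # 0,1,-9,1
```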
