Regular Expressions Using Python's Re
I have a file full of lines similar to this one:
line = 'Weclome - MIsiti International,0,0,-9,0,'
I want to replace 'Weclome - MIsiti International' with the string '1'.
Here is my code:
import re
exp=re.compile(r"([\./A-Za-z\s\-]+)")
print(exp.sub("1",line))
Unfortunately I get the following output:
1,0,0,19,0,
That is incorrect. I thought this would work:
exp=re.compile(r"([\./A-Za-z\s\-[^0-9]]+)")
print(exp.sub("1",line))
But it does not:
[]
Can someone tell me what I am doing wrong here?
Why do you need a regular expression?
>>> line = 'Weclome - MIsiti International,0,0,-9,0,'
>>> s=line.split(",")
>>> s[0]="1"
>>> ','.join(s)
'1,0,0,-9,0,'
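The same split idea extends to the whole file. A minimal sketch, assuming every line has the layout shown above: it uses str.partition to split only at the first comma, and io.StringIO stands in for a real file object (the function name and the second data line are made up for illustration).

```python
import io


def replace_first_field(lines, replacement="1"):
    """Yield each line with everything before its first comma replaced."""
    for raw in lines:
        line = raw.rstrip("\n")
        # partition() splits only at the first comma; if there is no comma,
        # sep and rest are empty and the whole line is replaced.
        _head, sep, rest = line.partition(",")
        yield replacement + sep + rest


# Stand-in for open("your_file.txt"); the second line is hypothetical data.
sample = io.StringIO(
    "Weclome - MIsiti International,0,0,-9,0,\n"
    "Some Other Company,1,2,-3,4,\n"
)

fixed = list(replace_first_field(sample))
print(fixed)  # ['1,0,0,-9,0,', '1,1,2,-3,4,']
```

Splitting once instead of on every comma avoids building a list of all fields when only the first one changes.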
exp=re.compile(r"([\./A-Za-z\s\-]+)")
There is no need to put '\' before '-' inside brackets: just put the '-' at a position between the brackets where it can't have its special (range) meaning, such as the very beginning or the very end.
Also, there is no need to put '\' before the dot '.' inside brackets, because a dot between brackets loses its special meaning.
So, instead of exp=re.compile(r"([\./A-Za-z\s\-]+)"), write exp=re.compile(r"([./A-Za-z\s-]+)").
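A quick check, as a sketch, that this simplification is purely cosmetic: both classes contain the same characters, so they behave identically (including still matching the '-' in -9).

```python
import re

line = 'Weclome - MIsiti International,0,0,-9,0,'

escaped = re.compile(r"([\./A-Za-z\s\-]+)")   # dot and hyphen escaped
simplified = re.compile(r"([./A-Za-z\s-]+)")  # plain dot, hyphen placed last

print(escaped.findall(line))     # ['Weclome - MIsiti International', '-']
print(simplified.findall(line))  # ['Weclome - MIsiti International', '-']

# Every ASCII character is matched by both classes or by neither.
same = all(
    bool(escaped.fullmatch(chr(i))) == bool(simplified.fullmatch(chr(i)))
    for i in range(128)
)
print(same)  # True
```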
Concerning exp=re.compile(r"([\./A-Za-z\s\-[^0-9]]+)"), it doesn't match at all, because the same rule applies to '[' as to '-': placed at a position where it can't have a special meaning, it loses that meaning and is treated simply as the character.
So the '[' before '^0-9]' is a literal bracket, not the beginning of a nested class (and the '^' is literal too, since it is not the first character of the class). Consequently, the ']' at the end of '^0-9]' is the closing bracket of the first left bracket in '[\./A-Z...',
AND the last right bracket followed by '+' means "the character ] one or more times". Since the line contains no ']', the pattern never matches.
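A small sketch of how Python actually reads that pattern, using findall() to list what it matches (the test string "abc]] x[" is invented for illustration):

```python
import re

line = 'Weclome - MIsiti International,0,0,-9,0,'

# Python reads this as ONE class [\./A-Za-z\s\-[^0-9] (in which '[' and '^'
# are literal characters and 0-9 is a range) followed by ']+', i.e.
# "one or more literal ']' characters".
exp = re.compile(r"([\./A-Za-z\s\-[^0-9]]+)")

print(exp.findall(line))        # [] : there is no ']' in the line
print(exp.sub("1", line) == line)  # True : sub() leaves the line unchanged
print(exp.findall("abc]] x["))  # ['c]]'] : a class character followed by ']'
```

That empty list is the same "[]" shown in the question.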
import re
line = 'Weclome - MIsiti International,0,0,-9,0,'
exp=re.compile(r"(^[./A-Za-z\s-]+)")
print(exp.sub("1",line))
# or
exp=re.compile(r"([./A-Za-z\s-]+(?=,))")
print(exp.sub("1",line))
Result:
1,0,0,-9,0,
1,0,0,-9,0,
Character classes cannot be nested. The latter example will therefore match a literal '[', '^', etc. Wouldn't it work if you simply did r"(^[^,0-9]+)", i.e. anything at the start that is not a comma or a digit?
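A minimal sketch of that suggestion. One caveat, shown with a made-up name: a first field that itself contains digits stops the match early.

```python
import re

line = 'Weclome - MIsiti International,0,0,-9,0,'

# Anchored at the start: one or more characters that are neither a comma
# nor a digit. It stops at the first ',' so the later "-9" is never reached.
exp = re.compile(r"^[^,0-9]+")
print(exp.sub("1", line))             # 1,0,0,-9,0,

# Hypothetical name containing digits: only "Area " is replaced.
print(exp.sub("1", "Area 51 Ltd,0"))  # 151 Ltd,0
```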
Your first regex is good, but you need to anchor it to the beginning of the line and add the 'm' (multiline) modifier, like so:
import re
line = 'Weclome - MIsiti International,0,0,-9,0,'
exp = re.compile(r"^([./A-Za-z\s\-]+)", re.M)
print (exp.sub("1",line))
Note that, when the whole file is read into one string, this solution fixes every line in one operation.
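A sketch of that, assuming the file has been read into a single string (the second data line is invented). It also shows one caveat: \s matches newlines, so a blank line lets a match spill into the next line. The variant with a literal space and tab is a suggested tweak, not part of the answer above.

```python
import re

# Hypothetical file contents read into one string (e.g. with f.read()).
text = (
    "Weclome - MIsiti International,0,0,-9,0,\n"
    "Some Other Company,1,2,-3,4,\n"
)

# With re.M, ^ matches at the start of every line, not just the string.
exp = re.compile(r"^([./A-Za-z\s\-]+)", re.M)
print(exp.sub("1", text))

# Caveat: \s also matches "\n", so an empty line lets the match run into
# the next line and swallow the line break.
blank = "a,1\n\nb,2"
print(repr(exp.sub("1", blank)))    # '1,1\n1,2'  (blank line lost)

# Listing a literal space and tab instead of \s keeps line breaks intact.
safer = re.compile(r"^([./A-Za-z \t-]+)", re.M)
print(repr(safer.sub("1", blank)))  # '1,1\n\n1,2'
```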
Most people are giving you answers <snark>often qualified with "Don't use regex! Regex is evil and comes from Perl! We Python users have transcended mere text manipulation!"</snark> but no one is explaining why you're experiencing this problem.
Your regex is working. It takes any run of letters, whitespace, hyphens (and dots or slashes) and turns it into the number 1. The problem is that it also treats the negative sign in -9 as "evil text" to turn into a number.
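A short sketch that makes this visible: findall() lists every piece that sub() replaces, and the lone '-' from -9 is one of them.

```python
import re

line = 'Weclome - MIsiti International,0,0,-9,0,'
exp = re.compile(r"([\./A-Za-z\s\-]+)")

print(exp.findall(line))   # ['Weclome - MIsiti International', '-']
print(exp.sub("1", line))  # 1,0,0,19,0,
```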
One way to approach this is to provide an anchor for your regex: make it match the commas (or the beginning/end of the string) surrounding the text. Then it would see ,text, and turn it into ,1, but would see ,-9, and know that it's not text.
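One possible sketch of the first approach, assuming fields never contain quoted commas. Instead of consuming the commas, it checks them with a lookbehind/lookahead (a swapped-in technique, so neighbouring fields don't compete for a shared comma); the name field_text and the second test line are hypothetical.

```python
import re

line = 'Weclome - MIsiti International,0,0,-9,0,'

# A whole field of letters, spaces, dots, slashes or hyphens, preceded by
# the start of the string or a comma and followed by a comma or the end.
field_text = re.compile(r"(?:^|(?<=,))[./A-Za-z\s-]+(?=,|$)")

print(field_text.sub("1", line))            # 1,0,0,-9,0,
print(field_text.sub("1", "0,Foo Bar,-9"))  # 0,1,-9
```

The "-" in -9 starts a field but is followed by "9" rather than a comma or the end, so the lookahead fails and it is left alone.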
Another approach is to filter based on "does it contain no digits" instead of "does it contain only these characters I listed", because what if, later, you need to handle other punctuation marks? Using ,[^0-9,]+, would match "things that aren't digits or commas", which would turn ,text, into ,1, but keep ,-9, the same.
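A sketch of the second approach. It first shows a side effect of consuming the commas (two adjacent text fields share a comma, so the second one is missed), then the same "no digits, no commas" class with lookarounds instead; the test strings are made up.

```python
import re

line = 'Weclome - MIsiti International,0,0,-9,0,'

# The pattern as written in the answer consumes both commas:
consuming = re.compile(r",[^0-9,]+,")
print(consuming.sub(",1,", ",a,b,"))  # ,1,b,  (second field missed)

# Same class, but the commas (or ^ / $) are only checked, not consumed.
non_numeric_field = re.compile(r"(?:^|(?<=,))[^0-9,]+(?=,|$)")
print(non_numeric_field.sub("1", ",a,b,"))            # ,1,1,
print(non_numeric_field.sub("1", line))               # 1,0,0,-9,0,
print(non_numeric_field.sub("1", "A&B (Pty.),0,-9"))  # 1,0,-9
```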
A third approach is to split the string on commas, then test and change each individual segment - probably to see if it contains digits - and then join them back together.
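A minimal sketch of the third approach, with a hypothetical helper name. Empty fields are skipped so the empty field after the trailing comma does not turn into "1".

```python
def replace_text_fields(line, replacement="1"):
    """Replace every non-empty comma-separated field that contains no digit."""
    fields = line.split(",")
    return ",".join(
        replacement if field and not any(ch.isdigit() for ch in field) else field
        for field in fields
    )


line = 'Weclome - MIsiti International,0,0,-9,0,'
print(replace_text_fields(line))  # 1,0,0,-9,0,
```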
If you choose the first or second approaches, I leave it up to you to write a regex that either matches a leading comma or the beginning of a string (and a trailing comma or the end of the string - both are similar). It's not terribly difficult.