Does anyone see why the first part of my regex isn't working in Python?
I tested this regex out in RegexBuddy
,[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?
and it seems to be able to do what I need it to do - capture a piece of data that looks like one of the following:
,POWDER,RO,ML,8/19/2002
,POWDER,RO,,, ,POWDER,RO,,8/19/2002 ,POWDER,RO,ML,,When I use it in a python string:
r",[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
It misses the first part of the match, and my resulting matches look like: RO,ML,8/19/2002, or RO,ML, or jusr RO,
The first token is a word that is stored as all caps and may have spaces (and/or possibly punctuation that i need to address as well shortly) in it. if I remove the space it still doesn't capture the one word names that it should. Did I开发者_运维问答 miss something obvious?
Yes. You did not capture the first group.
r",([A-Z\s]+),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
# ^ ^
BTW, it seems that you are parsing a CSV file with regex. In Python, there is already a csv
module.
The first part of your regex doesn't have capturing parentheses around it. Try the regex:
,([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?
#^^ This was [A-Z\s]+?; needs to be ([A-Z\s]+?)
which would be this in python:
r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
Example from the interpreter:
>>> import re
>>> r = re.compile(r",[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")
>>> r.match(",POWDER,RO,ML,8/19/2002").groups()
('RO', 'ML', '8/19/2002')
>>> r = re.compile(r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")
>>> r.match(",POWDER,RO,ML,8/19/2002").groups()
('POWDER', 'RO', 'ML', '8/19/2002')
I'm not into python, but you just forgot to use brackets to indicate that you want to capture that part:
,([A-Z\s]+)?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?
should do what you want
Yes, you missed the grouping parentheses:
>>> s = ",POWDER,RO,ML,8/19/2002"
>>> pat = r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
>>> re.match(pat, s).groups()
('POWDER', 'RO', 'ML', '8/19/2002')
精彩评论