Does anyone see why the first part of my regex isn't working in Python?

Posted 2023-01-23 04:24

I tested this regex out in RegexBuddy

,[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?

and it seems to be able to do what I need it to do - capture a piece of data that looks like one of the following:

,POWDER,RO,ML,8/19/2002

,POWDER,RO,,,

,POWDER,RO,,8/19/2002

,POWDER,RO,ML,,

When I use it in a python string:

r",[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"

It misses the first part of the match, and my resulting matches look like: RO,ML,8/19/2002, or RO,ML, or just RO,

The first token is a word that is stored in all caps and may contain spaces (and possibly punctuation, which I need to address shortly as well). If I remove the space, it still doesn't capture the one-word names that it should. Did I miss something obvious?

Yes. The first field is not inside a capturing group, so it is matched but never captured.

r",([A-Z\s]+),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
#  ^        ^
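The question doesn't say how the "resulting matches" were collected. Assuming they came from re.findall, the symptom is expected: when a pattern contains groups, re.findall returns only the captured groups, not the whole match. A minimal sketch using one of the sample strings from the question:

```python
import re

text = ",POWDER,RO,ML,8/19/2002"

# Original pattern: the name field is matched, but it is not inside a group.
uncaptured = r",[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
# Corrected pattern: the name field is wrapped in a capturing group.
captured = r",([A-Z\s]+),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"

# With groups in the pattern, re.findall returns tuples of the groups only,
# so the name never shows up in the results of the original pattern.
print(re.findall(uncaptured, text))  # [('RO', 'ML', '8/19/2002')]
print(re.findall(captured, text))    # [('POWDER', 'RO', 'ML', '8/19/2002')]
```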

BTW, it seems that you are parsing a CSV file with regex. In Python, there is already a csv module.
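As a rough sketch of the csv-module approach, the example below splits the four sample fragments from the question with csv.reader and then checks each field against the same alternatives the regex uses. The leading empty field comes from the fragments starting with a comma; a real file would likely have other columns there, so treat the column positions as an assumption.

```python
import csv
import io
import re

# The four sample fragments from the question, one per line.
# Each starts with a comma, so csv.reader yields an empty first field.
sample = io.StringIO(
    ",POWDER,RO,ML,8/19/2002\n"
    ",POWDER,RO,,,\n"
    ",POWDER,RO,,8/19/2002\n"
    ",POWDER,RO,ML,,\n"
)

CODES = {"LA", "RO", "MU", "FE", "AV", "CA"}
UNITS = {"ML", "FE", "MN", "FS", "UN"}
DATE = re.compile(r"\d+/\d+/\d{4}")

records = []
for row in csv.reader(sample):
    if len(row) < 5:
        continue  # skip rows too short to hold all the fields
    _, name, code, unit, date = row[:5]
    if code not in CODES:
        continue  # skip rows whose code field is not one of the known codes
    records.append((
        name,                                   # taken as-is, punctuation included
        code,
        unit if unit in UNITS else None,        # empty or unknown unit -> None
        date if DATE.fullmatch(date) else None, # empty or malformed date -> None
    ))

for rec in records:
    print(rec)
```

Because csv.reader does the splitting, the name field no longer has to be described by a character class, so spaces or punctuation inside it need no special handling (a name containing a comma would have to be quoted in the file).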

The first part of your regex doesn't have capturing parentheses around it. Try the regex:

,([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?
 #^^ This was [A-Z\s]+?; needs to be ([A-Z\s]+?)

which in Python would be written as:

r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"

Example from the interpreter:

>>> import re
>>> r = re.compile(r",[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")
>>> r.match(",POWDER,RO,ML,8/19/2002").groups()
('RO', 'ML', '8/19/2002')
>>> r = re.compile(r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")
>>> r.match(",POWDER,RO,ML,8/19/2002").groups()
('POWDER', 'RO', 'ML', '8/19/2002')
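The first part of the pattern was matched all along; it just wasn't captured. Comparing group(0), which is the entire match, with groups(), which lists only the parenthesised groups, shows this with the original pattern:

```python
import re

text = ",POWDER,RO,ML,8/19/2002"
pattern = re.compile(r",[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")

m = pattern.match(text)
# group(0) is the whole match: the name field WAS consumed by the regex...
print(m.group(0))  # ,POWDER,RO,ML,8/19/2002
# ...but groups() only reports capturing groups, so the name is absent.
print(m.groups())  # ('RO', 'ML', '8/19/2002')
```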

I don't use Python much, but you just forgot the parentheses that tell the engine to capture that part:

,([A-Z\s]+)?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?

should do what you want.
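One detail worth knowing about the pattern above: the ? after ([A-Z\s]+) makes the whole name group optional, so a line with an empty name field also matches (with group 1 set to None), whereas the ([A-Z\s]+?) form requires at least one character. A small sketch comparing the two; the input ",,RO,ML,8/19/2002" is a hypothetical line with the name missing:

```python
import re

optional_name = re.compile(r",([A-Z\s]+)?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")
required_name = re.compile(r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")

# With a name present, both capture it in group 1.
print(optional_name.match(",POWDER,RO,ML,8/19/2002").groups())
# ('POWDER', 'RO', 'ML', '8/19/2002')

# Hypothetical line with an empty name field.
missing = ",,RO,ML,8/19/2002"
print(optional_name.match(missing).groups())  # (None, 'RO', 'ML', '8/19/2002')
print(required_name.match(missing))           # None: at least one name character is required
```

Which form is right depends on whether an empty name should be accepted or rejected.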

Yes, you missed the grouping parentheses:

>>> import re
>>> s = ",POWDER,RO,ML,8/19/2002"
>>> pat = r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
>>> re.match(pat, s).groups()
('POWDER', 'RO', 'ML', '8/19/2002')
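Running the corrected pattern over all four sample shapes from the question shows how the optional unit and date groups come back as None when those fields are empty. The fifth string, with the two-word name "BABY POWDER", is a hypothetical addition to show that names containing spaces are captured too:

```python
import re

pat = re.compile(r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")

samples = [
    ",POWDER,RO,ML,8/19/2002",
    ",POWDER,RO,,,",
    ",POWDER,RO,,8/19/2002",
    ",POWDER,RO,ML,,",
    ",BABY POWDER,RO,ML,8/19/2002",  # hypothetical multi-word name
]

# Optional groups that did not participate in the match are reported as None.
results = [pat.match(s).groups() for s in samples]
for s, groups in zip(samples, results):
    print(s, "->", groups)
```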
