How to force Python to ignore re.DOTALL in re.findall() statement?
I have been banging my head against the keyboard in search of enlightenment through Google and all Python docs I could get my hands on, but could not find an answer to an issue I'm encountering.
I have the following regex that I run against a website, but Python insists on setting re.DOTALL on it, even though my code does not tell it to:
\d+. +(?P<season>\d+) *\- *(?P<episode>\d+).*?(?P<day>\d+)(?:\/|\s)+(?P<month>[A-Za-z]+)(?:\/|\s)+(?P<year>\d+) +(?:<a .+><img .+></a>)? ?<a .*?>(?P<name>.*?)</a>
This creates an array of seasons/episodes for TV show listings, and it works fine except on epguides.com/BurnNotice (when using the TVRage listings), due to some spacing before newlines (I guess).
Using http://re-try.appspot.com to test, I've narrowed down the issue to the use of re.DOTALL. If I enable it on re-try, it replicates the results I get when I run it standalone on my script. If I untick DOTALL, then it gives me the results I expect.
How can I force Python NOT to use re.DOTALL?
The script runs both on Ubuntu and OS X.
.+>
should change to [^>]+>
and
.*?>
to [^>]*>
You can also try replacing the other dots with [^\r\n], but the two changes above should be enough.
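As a rough illustration of why these substitutions help, here is a minimal sketch. The sample strings are made up for the demonstration and are not taken from the epguides page. Note that Python's re functions only apply re.DOTALL when the flag is passed (or set inline with (?s)), so the second half passes it explicitly to show that [^\r\n] stays on one line regardless.

```python
import re

# Hypothetical sample markup, not taken from the real page.
html = '<a href="/img"><img src="x.png"></a> <a href="/ep1">Pilot</a>'

# Greedy ".+>" runs to the LAST ">" it can reach.
greedy = re.search(r'<a .+>', html).group(0)

# "[^>]+>" cannot cross a ">", so it stops at the end of the first tag.
bounded = re.search(r'<a [^>]+>', html).group(0)

print(greedy)
print(bounded)

# Hypothetical two-line listing.
text = "1 - 01\n2 - 02"

# With DOTALL, "." also matches the newline, so both lines merge into one match.
dot_all = re.findall(r'\d.*\d', text, re.DOTALL)

# "[^\r\n]" never matches a line break, whether or not DOTALL is set.
no_newline = re.findall(r'\d[^\r\n]*\d', text, re.DOTALL)

print(dot_all)
print(no_newline)
```

In this sketch the bounded pattern returns only the first opening tag, and the [^\r\n] version yields one match per line even with DOTALL enabled.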