Using regex to extract information from an HTML file
I have an HTML file that contains a table of information, and I'm trying to extract specific columns. The rows follow this pattern, alternating between "TableDarkRow" and "TableLightRow":
'>817338284254611</A></td><td Class='TableDarkRow' NOWRAP> 01/14/2011</td>
I'm trying to extract an array of number and date pairs:
817338284254611
01/14/2011
I tried and came up with this:
>([0-9])+</A>(.*)NOWRAP> ?([0-9]{2}\/[0-9]{2}\/[0-9]{4})
But the (.*) is greedy, so the match spans the entire document between the first and last occurrences.
Replace the .* with .*? for non-greedy matching.
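To illustrate the difference, here is a minimal sketch in Python (the question doesn't say which language is in use, so that choice is an assumption). The two-row sample string is made up to resemble the snippet in the question, and the capture group is written as ([0-9]+) rather than ([0-9])+ so that it holds the whole number instead of only its last digit.

```python
import re

# Hypothetical two-row sample on a single line, modeled on the question's snippet.
html = (
    "<td Class='TableDarkRow'><A href='#'>817338284254611</A></td>"
    "<td Class='TableDarkRow' NOWRAP> 01/14/2011</td>"
    "<td Class='TableLightRow'><A href='#'>817338284254612</A></td>"
    "<td Class='TableLightRow' NOWRAP> 02/15/2011</td>"
)

date = r"([0-9]{2}/[0-9]{2}/[0-9]{4})"

# Greedy: .* runs to the LAST "NOWRAP>" it can reach.
greedy = re.findall(r">([0-9]+)</A>.*NOWRAP> ?" + date, html)

# Non-greedy: .*? stops at the FIRST "NOWRAP>" after each number.
lazy = re.findall(r">([0-9]+)</A>.*?NOWRAP> ?" + date, html)

print(greedy)  # [('817338284254611', '02/15/2011')]
print(lazy)    # [('817338284254611', '01/14/2011'), ('817338284254612', '02/15/2011')]
```

The greedy version pairs the first number with the last date, which is the symptom described in the question; the non-greedy version yields one pair per row.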
Reference: Watch Out for The Greediness!
Try this one (I haven't tested it):
/[0-9\/ ]+/
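Since that pattern is untested, here is a quick check against the sample line from the question, sketched in Python (an assumed language; the slash delimiters and the backslash before / are not needed there). It suggests the character class also picks up lone slashes and spaces from the markup, so the matches would need filtering; the digit filter below is my own addition, not part of the suggested pattern.

```python
import re

line = "'>817338284254611</A></td><td Class='TableDarkRow' NOWRAP> 01/14/2011</td>"

# Every run of digits, slashes, and spaces, including stray "/" and " " from the tags.
raw = re.findall(r"[0-9/ ]+", line)

# Keep only the runs that contain at least one digit, and trim the spaces.
tokens = [m.strip() for m in raw if any(ch.isdigit() for ch in m)]

print(raw)     # ['817338284254611', '/', '/', ' ', ' ', ' 01/14/2011', '/']
print(tokens)  # ['817338284254611', '01/14/2011']
```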
You can replace .* with [A-Za-z'<>/ \t]+ (the class has to include / so that it can get past the </td> between the number and the date).