Regex to identify all sorts of candidate legal numbers
[This is a heavily re-edited version. Please ignore past versions of this question.]
A small Python script using a sophisticated regex was provided by eyquem to identify numbers in a string and sanitize them. The test results cover over 50 samples, which I won't repeat here.
The question is, can someone adjust that regexp or provide a new one so that commas are treated more sanely?
In particular, I would like to see the following 4 test inputs produce the associated outputs.
- ' 4,8.3,5 ' -> '4' '8.3' '5'
- ' 44,22,333,888 ' -> '44' '22,333,888' #### Note that 44,22 is never a single number.
- ' 11,333e22,444 ' -> '11,333e22' '444' #### 11,333 is accepted in front of e22, but 22,444 is not accepted after it.
- ' 1,999 people found the code "i+=1999;" to be crystal clear in meaning and to likely lead to less than 1999 kilobytes extra memory consumption; however, the gains in 1, 999, and 1999 KB disk space are anything but ideal, especially this being 1999 and us having over $1,999 to work with! ' -> '1,999' '1999' '1999' '1' '999' '1999' '1999' '1,999'
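A rough Python sketch of the four cases above follows. It is a fresh pattern, not an adjustment of eyquem's regex (which is not reproduced here). It assumes one rule inferred from the examples: a comma counts as a thousands separator only when the first group has 1 to 3 digits and every group after a comma has exactly 3 digits with no digit following it, and commas are never allowed in the fractional part or the exponent. `NUMBER` and `extract_numbers` are hypothetical names, and the sketch only extracts candidates; it does not do any sanitizing.

```python
import re

# Hypothetical pattern inferred from the four example cases; not eyquem's regex.
NUMBER = re.compile(r"""
    (?<!\d)                         # never start in the middle of a digit run
    (?:
        \d{1,3}(?:,\d{3})+(?!\d)    # comma-grouped integer: 1,999 or 22,333,888
      | \d+                         # plain integer: 44 or 1999
    )
    (?:\.\d+)?                      # optional fraction, no commas: 8.3
    (?:[eE][+-]?\d+)?               # optional exponent, no commas: e22
""", re.VERBOSE)


def extract_numbers(text):
    """Return every candidate number in text, left to right."""
    return [m.group() for m in NUMBER.finditer(text)]


if __name__ == "__main__":
    for sample in (" 4,8.3,5 ", " 44,22,333,888 ", " 11,333e22,444 "):
        print(repr(sample), extract_numbers(sample))
```

Trying the grouped alternative first is what splits ' 44,22,333,888 ' into '44' and '22,333,888': '44,22' fails the exactly-three-digits rule, so `\d+` takes '44' alone and the scan resumes at '22', which does start a valid grouped number. Under these assumptions the long sentence in the fourth case yields the eight values listed above, in that order.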
Despite all the information, your post is actually vague. For starters, you didn't ask any questions. What is it you want?
Are you asking how to find all possible matches? In Perl, you can use
local our @matches;                      # collects every candidate the engine reaches
/(...)(?{ push @matches, $1 })(?!)/;     # (...) stands for your pattern; matched against $_
The (?!) never matches, so it causes the regex engine to backtrack to find another match, but the code block saves what it did find before doing that.
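Python's `re` module has no counterpart to Perl's `(?{ ... })` code blocks, so the trick above cannot be copied directly. A swapped-in, brute-force alternative is to ask the compiled pattern whether it can match each substring in full, using `Pattern.fullmatch` with its `pos` and `endpos` arguments. The function name `all_matches` and the sample pattern are hypothetical. This is quadratic in the length of the string, so it only suits short inputs, and it reports each distinct span once, whereas the Perl version can record the same text several times if the engine reaches it along different backtracking paths. Because `endpos` makes the string look shorter, patterns with lookarounds may also behave differently near the cut.

```python
import re


def all_matches(pattern, text):
    """List every substring of text that pattern can match in full.

    Brute force: tries each (start, end) pair, so it makes O(n^2)
    fullmatch calls. Results are ordered by start position, then length.
    """
    regex = re.compile(pattern)
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):  # skip empty matches
            if regex.fullmatch(text, start, end):
                found.append(text[start:end])
    return found


if __name__ == "__main__":
    # Hypothetical sample pattern: digits with optional thousands groups.
    print(all_matches(r"\d+(?:,\d{3})*", "1,999"))
```

For the sample call this prints `['1', '1,999', '9', '99', '999', '9', '99', '9']`, the same set of candidates the Perl idiom would collect for that pattern, though not necessarily in the same order.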
If you're asking to find any match, then it's quite easy to solve: Don't bother looking for option 2, because option 1 will always match what option 2 matches.