Regex for finding "real" 3-digit sequences (ignoring those embedded in 4-digit sequences)
I'd like a regex (using Java) that captures three digits such as "876", but not if they are buried inside a 4-digit sequence.
It should capture "876" within "876" and "foo876" and " 876 " and "876" and "food876" and "4foo876".
But NOT within "88foo9876" or "9876" or "a8876" or "a8876foo".
How do I do this?
I want to say something like X(\d\d\d)X, but in place of the first X to say "\D or ^ (start-string)" and in place of the second X to say "\D or $ (end-string)".
Edit:
For answers, see the ones by Xanatos, Code Jockey, and Tim Pietzcker.
Well, then! For X(\d\d\d)X, as you asked, use
(?<=\D|^)(\d\d\d)(?=\D|$)
which is
(?<=\D|^) # lookbehind for «\D or ^ (start-string)»
(\d\d\d) # then match «three digits such as "876"»
(?=\D|$) # lookahead for «\D or $ (end-string)»
and will
...capture "876" within "876" and "foo876" and " 876 " and "876" and "food876".
But NOT within "88foo9876" or "9876" or "a8876" or "a8876foo".
as you specified :D
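Since the question is about Java, here is a minimal sketch of running this pattern with java.util.regex. The class name ThreeDigitLookaround and the find helper are hypothetical, just for illustration; note that every backslash has to be doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeDigitLookaround {
    // Lookbehind for a non-digit or start of input, then three digits,
    // then lookahead for a non-digit or end of input.
    static final Pattern THREE_DIGITS =
            Pattern.compile("(?<=\\D|^)(\\d\\d\\d)(?=\\D|$)");

    /** Returns every standalone three-digit run found in the input. */
    static List<String> find(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = THREE_DIGITS.matcher(input);
        while (m.find()) {
            hits.add(m.group(1)); // group 1 holds the three digits
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] samples = {"876", "foo876", " 876 ", "food876", "4foo876",
                            "88foo9876", "9876", "a8876", "a8876foo"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + find(s));
        }
    }
}
```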
If you're using a language without lookbehind (like ECMA/JavaScript before ES2018), you'll have to use either
(\D|^)(\d\d\d)(?=\D|$) # and use the second capturing group
or
(?:\D|^)(\d\d\d)(?=\D|$) # and use the first capturing group
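A rough Java sketch of the two lookbehind-free variants (class and method names are hypothetical). Java does support lookbehind, so this is only to show which group holds the digits: because the leading non-digit is consumed, m.group(0) would include it (e.g. " 876" for the input " 876 "), so you read group 2 or group 1 respectively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeDigitNoLookbehind {
    // Leading non-digit (or start) sits in capturing group 1, digits in group 2.
    static final Pattern CAPTURING =
            Pattern.compile("(\\D|^)(\\d\\d\\d)(?=\\D|$)");
    // Leading part is a non-capturing group, so the digits are group 1.
    static final Pattern NON_CAPTURING =
            Pattern.compile("(?:\\D|^)(\\d\\d\\d)(?=\\D|$)");

    static List<String> collect(Pattern p, String input, int group) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            hits.add(m.group(group));
        }
        return hits;
    }

    static List<String> viaSecondGroup(String input) {
        return collect(CAPTURING, input, 2);
    }

    static List<String> viaFirstGroup(String input) {
        return collect(NON_CAPTURING, input, 1);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"foo876", " 876 ", "9876"}) {
            System.out.println("\"" + s + "\" -> " + viaSecondGroup(s) + " / " + viaFirstGroup(s));
        }
    }
}
```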
EDIT: Updated according to clarified specs:
(?<!\d)\d{3}(?!\d)
Explanation:
(?<!\d) # Assert that there is no digit before the current position
\d{3} # Match exactly three digits
(?!\d) # Assert that there is no digit after the current position
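In Java this pattern can be used as-is with Matcher.find(). A minimal sketch (names hypothetical): because the negative lookarounds only assert "no digit here", they are automatically satisfied at the start and end of the string, so no ^ or $ alternatives are needed, and m.group() returns just the three digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeDigitNegativeLookaround {
    // No digit immediately before, exactly three digits, no digit immediately after.
    static final Pattern P = Pattern.compile("(?<!\\d)\\d{3}(?!\\d)");

    static List<String> find(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            hits.add(m.group()); // the whole match is the three digits
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("12 345 6789 012")); // [345, 012]
    }
}
```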
(initial version preserved for archival purposes :))
^\D*\d{3}$
if I understand you correctly.
Explanation:
^ # start of string
\D* # zero or more non-digits
\d{3} # exactly 3 digits
$ # end of string
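To make the difference from the updated pattern concrete, here is a small Java sketch (names hypothetical) using Matcher.matches(), which requires the whole input to match. This version validates an entire string rather than finding a run inside it, so inputs like " 876 " or "4foo876" are rejected even though the question wants them.

```java
import java.util.regex.Pattern;

class WholeStringCheck {
    // Optional non-digits, then exactly three digits, then the end of the string.
    static final Pattern P = Pattern.compile("^\\D*\\d{3}$");

    static boolean accepts(String input) {
        return P.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"876", "food876", " 876 ", "4foo876", "9876"}) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```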
^\D*\d{3}$
The above works, but your requirements are a little vague. "Non-digit" literally means any character that isn't a digit, so everything else is allowed, even spaces.
(?<!\d)(\d{3})(?!\d)
Test here: http://gskinner.com/RegExr/?2utct
This uses zero-width lookaround assertions: it matches 3 digits not preceded by a digit and not followed by a digit. The only thing captured is the 3 digits.
Note that if you are using .NET, you should use [0-9] instead of \d so that you don't match things like U+09E6 ০ BENGALI DIGIT ZERO (yes, the ০ counts as a digit :-) )
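Java differs from .NET here: by default Java's \d is ASCII-only, equivalent to [0-9], and it only matches other Unicode decimal digits when the Pattern.UNICODE_CHARACTER_CLASS flag (or the inline (?U) flag) is set. A sketch to check this, with hypothetical names, using the Bengali digits U+09E6 to U+09E8:

```java
import java.util.regex.Pattern;

class UnicodeDigits {
    // Bengali digits zero, one, two: U+09E6, U+09E7, U+09E8.
    static final String BENGALI = "\u09E6\u09E7\u09E8";

    // Default Java behaviour: \d is ASCII-only.
    static final Pattern ASCII_D = Pattern.compile("(?<!\\d)\\d{3}(?!\\d)");
    // With UNICODE_CHARACTER_CLASS, \d matches any Unicode decimal digit.
    static final Pattern UNICODE_D =
            Pattern.compile("(?<!\\d)\\d{3}(?!\\d)", Pattern.UNICODE_CHARACTER_CLASS);
    // An explicit [0-9] class stays ASCII-only even with the flag.
    static final Pattern EXPLICIT =
            Pattern.compile("(?<![0-9])[0-9]{3}(?![0-9])", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(found(ASCII_D, BENGALI));   // false
        System.out.println(found(UNICODE_D, BENGALI)); // true
        System.out.println(found(EXPLICIT, BENGALI));  // false
    }
}
```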
I’m assuming that what you actually want is a regexp that matches legal variable names as defined by many programming languages. Let’s say you’re after strings with at least one non-digit at the beginning, then anything: that would be /^\D+.*/ (your mileage may vary, depending on the programming language). Of course, if I’m right in my assumption, \D is actually not at all what you want at the beginning; you’d rather want a list of characters that can legally start a variable (roughly, alphabetical characters, plus the underscore, and possibly a few other characters). Hence that would be more like /^[A-Za-z_]+.*/
But you really need to be more specific, as has already been said.
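A quick Java sketch contrasting the two suggestions (names hypothetical). Matcher.matches() has to consume the whole input, so both patterns are effectively anchored at the start; the point is that \D accepts any non-digit first character, including '-' or a space, while the letter-or-underscore class does not.

```java
import java.util.regex.Pattern;

class IdentifierStart {
    // At least one non-digit at the beginning, then anything.
    static final Pattern NON_DIGIT_START = Pattern.compile("^\\D+.*");
    // Rough identifier rule: letters or underscore first, then anything.
    static final Pattern IDENT_START = Pattern.compile("^[A-Za-z_]+.*");

    static boolean nonDigitStart(String s) {
        return NON_DIGIT_START.matcher(s).matches();
    }

    static boolean identStart(String s) {
        return IDENT_START.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"foo876", "_tmp", "-x", " 876", "9lives"}) {
            System.out.println("\"" + s + "\" nonDigitStart=" + nonDigitStart(s)
                    + " identStart=" + identStart(s));
        }
    }
}
```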
This is a regex that will match a sequence of 3 digits not immediately preceded or followed by another digit.
[^\d](\d{3})[^\d]
The caret (^) negates the character class, which means it matches anything except digits. The {3} specifies how many digits need to be in the digit sequence in the middle.
Edit: sorry, I didn't test it on strings where the sequence we want sits at the very start and/or end. This version fixes that. It also produced a few extra captures that you could ignore, but I fixed that too, since I'm too much of a perfectionist:
(?:^|[^\d])(\d{3})(?:[^\d]|$)
Some more explanation, in parts:
(?:^|[^\d]) ; the ?: makes the group (everything between the parentheses) non-capturing. ^|[^\d] means either the start of the string or anything that isn't a digit.
(\d{3}) ; capture group of exactly 3 digits
(?:[^\d]|$) ; basically the same as the beginning, but with the end of the string or anything that isn't a digit.
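One thing worth knowing about this version: the trailing non-digit is consumed, so it can't also serve as the leading non-digit of the next match. A Java sketch (names hypothetical) comparing it with a zero-width lookaround version on "876a543":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumingVsLookaround {
    // Surrounding non-digits are matched (consumed); the digits are group 1.
    static final Pattern CONSUMING =
            Pattern.compile("(?:^|[^\\d])(\\d{3})(?:[^\\d]|$)");
    // Zero-width alternative: neighbours are only checked, never consumed.
    static final Pattern LOOKAROUND =
            Pattern.compile("(?<!\\d)(\\d{3})(?!\\d)");

    static List<String> group1(Pattern p, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        String s = "876a543";
        System.out.println(group1(CONSUMING, s));  // [876]
        System.out.println(group1(LOOKAROUND, s)); // [876, 543]
    }
}
```

The 'a' is used up as the tail of the first match, so the consuming pattern never sees a non-digit in front of "543". If runs can sit one character apart, the lookaround form avoids that.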