Regular expression help for a date (Python)
I'm trying to create an expression that matches 11.11.11 but not 111.11.111. I'm using Python:
keyword = re.compile(r"[0-9]*[0-9]\.[0-9]*[0-9]\.[0-9]*[0-9]")
The date could be at the start or end of a sentence, with a newline rather than whitespace before or after it. How would I account for both? As it is, this picks up 11.11.11 but also 111.11.11111 etc. :(
The * quantifier means "zero or more of the preceding token". Therefore your regex will match anything from 1.1.1 to 999999.999999.99999 etc.
You can be more specific like this:
keyword = re.compile(r"\b[0-9]{2}\.[0-9]{2}\.[0-9]{2}\b")
The \b word boundary anchors make sure that the numbers start/end at that position. Otherwise you could pick up substring matches (matching 34.56.78 in the string 1234.56.7890, for example).
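To see the word-boundary behaviour concretely, here is a small sketch. The sample text is made up for illustration. It also suggests that a date at the very start or end of a line is still found, because \b matches next to a newline or the edge of the string just as it does next to a space.

```python
import re

# Pattern from the answer above: exactly two digits per group, with word boundaries.
keyword = re.compile(r"\b[0-9]{2}\.[0-9]{2}\.[0-9]{2}\b")

# Made-up sample: dates at a line start and a line end, plus longer
# digit runs that should be ignored.
text = (
    "11.11.11 starts this line\n"
    "and this one ends with 12.05.09\n"
    "111.11.111 and 1234.56.7890 are skipped"
)

print(keyword.findall(text))  # ['11.11.11', '12.05.09']
```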
Of course, you'll need to validate separately whether it's actually a plausible date. Don't use regexes for this (it's possible but cumbersome); rather, use the datetime module's datetime.strptime() classmethod.
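A minimal sketch of that two-step approach follows. The question never says which field is the day, month or year, so the format string "%d.%m.%y" is an assumption, and find_dates is a hypothetical helper name rather than anything from the original post.

```python
import re
from datetime import datetime

keyword = re.compile(r"\b[0-9]{2}\.[0-9]{2}\.[0-9]{2}\b")

def find_dates(text, fmt="%d.%m.%y"):
    """Return a date for every regex candidate that parses under fmt.

    fmt is an assumed day.month.year layout; adjust it to the real data.
    """
    dates = []
    for candidate in keyword.findall(text):
        try:
            dates.append(datetime.strptime(candidate, fmt).date())
        except ValueError:
            # Shaped like a date but not a real one (e.g. month 13).
            continue
    return dates

print(find_dates("Due 11.11.11, not 45.13.99"))  # [datetime.date(2011, 11, 11)]
```

The regex only finds candidates of the right shape; strptime() then rejects impossible values by raising ValueError.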
You can use \b to match a word boundary. For example, you could make your regular expression:
re.compile(r'\b\d{2}\.\d{2}\.\d{2}\b')
I've also used \d to match any digit and the {2} suffix to match two instances of whatever came previously. If you want to match either 1 or 2 digits in any of those cases, you could change the {2} to {1,2}.
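As a quick illustration of the {1,2} variant, the sketch below runs it against a few made-up samples. The variable name flexible is only illustrative.

```python
import re

# One or two digits per group, still anchored by word boundaries.
flexible = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{1,2}\b")

# Made-up samples: two that should match and one that should not.
samples = ["1.1.11", "11.11.11", "111.11.111"]
results = [bool(flexible.search(s)) for s in samples]

print(results)  # [True, True, False]
```

Note that in Python 3, \d on a str pattern also matches non-ASCII decimal digits unless the re.ASCII flag is given. Use [0-9] if only ASCII digits should count.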
Try using ? instead of * as the quantifier.
The ? matches 0 or 1 instances of the previous element. In other words, it makes the element optional; it can be present, but it doesn't have to be.
This will match both 1.1.1 and 11.11.11, but not 1111.1111.1111:
keyword = re.compile(r"\b[0-9]?[0-9]\.[0-9]?[0-9]\.[0-9]?[0-9]\b")