Regular expression matching anything greater than eight letters in length, in Python
Despite attempts to master grep and related GNU software, I haven't come close to mastering regular expressions. I do like them, but I find them a bit of an eyesore all the same.
I suppose this question isn't difficult for some, but I've spent hours trying to figure out how to search through my favorite book for words greater than a certain length, and in the end, came up with some really ugly code:
twentyfours = [w for w in vocab if re.search('^........................$', w)]
twentyfives = [w for w in vocab if re.search('^.........................$', w)]
twentysixes = [w for w in vocab if re.search('^..........................$', w)]
twentysevens = [w for w in vocab if re.search('^...........................$', w)]
twentyeights = [w for w in vocab if re.search('^............................$', w)]
... a line for each length, all the way from a certain length to another one.
What I want instead is to be able to say 'give me every word in vocab that's greater than eight letters in length.' How would I do that?
You don't need a regex for this:
result = [w for w in vocab if len(w) > 8]
But if a regex must be used:
# {9,} means 9 or more repetitions, i.e. more than eight characters.
rx = re.compile(r'^.{9,}$')
result = [w for w in vocab if rx.match(w)]
See http://www.regular-expressions.info/repeat.html for detail on the {a,b}
syntax.
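As a quick check, here is a minimal sketch comparing the two approaches. The sample vocab list is made up for illustration; in the question it would come from the book. Note that "more than eight" means a threshold of nine characters, while "eight or more" would use len(w) >= 8 or {8,} instead.

```python
import re

# Hypothetical sample vocabulary, standing in for the book's word list.
vocab = ["cat", "elephant", "hippopotamus", "regular", "expression", "aardvarks"]

# Plain length test: strictly more than eight characters.
by_len = [w for w in vocab if len(w) > 8]

# Equivalent regex: nine or more characters, matched against the whole word.
rx = re.compile(r".{9,}")
by_regex = [w for w in vocab if rx.fullmatch(w)]

print(by_len)
print(by_regex)
```

Here "elephant" has exactly eight letters, so both filters leave it out.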
\w matches letters, digits, and the underscore; {min,max} sets how many times the preceding item may repeat, and max is optional. An expression like
\w{9,}
will match runs of 9 or more such characters.
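The difference between \w and . only shows up for words containing something other than letters, digits, or underscores. A small sketch, using two hyphenated words chosen purely for illustration:

```python
import re

word = "well-known"  # 10 characters, one of them a hyphen

# \w does not match the hyphen, so the whole word is not a run of 9+ word characters.
whole_word_w = re.fullmatch(r"\w{9,}", word)

# . matches any character except a newline, so the hyphen is counted.
whole_word_dot = re.fullmatch(r".{9,}", word)

# Without anchoring, \w{9,} can still find a long run inside a hyphenated word.
inner = re.search(r"\w{9,}", "semi-transparent")

print(whole_word_w, whole_word_dot, inner.group())
```

So if the vocabulary contains hyphenated or apostrophized words, the choice between \w and . (and whether the pattern is anchored) changes which words come back.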
.{9,} for "more than eight"
.{8,} for "eight or more"
Or just len(w) > 8
^.{8,}$
This will match anything that has at least 8 characters. You can also place a number after the comma to set an upper bound, or leave out the first number so that there is no lower bound.
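A short sketch of the three forms of the bound in Python's re module, with a made-up word list:

```python
import re

# Hypothetical words of length 3, 8, 9, 10 and 12.
words = ["cat", "elephant", "aardvarks", "expression", "hippopotamus"]

at_least_8 = [w for w in words if re.search(r"^.{8,}$", w)]    # lower bound only
from_8_to_10 = [w for w in words if re.search(r"^.{8,10}$", w)]  # both bounds
at_most_8 = [w for w in words if re.search(r"^.{,8}$", w)]     # upper bound only

print(at_least_8)
print(from_8_to_10)
print(at_most_8)
```

Both bounds are inclusive, so "elephant" (eight letters) appears in all three lists.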
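If you do want to use a regular expression:
result = [w for w in vocab if re.search('^.{24}', w)]
The {x} says match exactly x characters. Since there is no $ at the end, this pattern matches any word of 24 or more characters; add $ to match words of exactly 24. It is probably better to use len(w), though.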
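One thing worth checking with a pattern like '^.{24}' is the effect of the missing end anchor. A minimal sketch, using artificial strings of known length rather than real words:

```python
import re

# Artificial "words" of length 23, 24 and 25, for illustration only.
words = ["a" * 23, "b" * 24, "c" * 25]

# No $ anchor: only the first 24 characters must match, so longer words pass too.
prefix_only = [len(w) for w in words if re.search(r"^.{24}", w)]

# With $: the word must be exactly 24 characters long.
exact = [len(w) for w in words if re.search(r"^.{24}$", w)]

print(prefix_only)
print(exact)
```

The unanchored form therefore behaves like len(w) >= 24, and the anchored form like len(w) == 24.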