
Regular expression matching anything greater than eight letters in length, in Python

Despite attempts to master grep and related GNU software, I haven't come close to mastering regular expressions. I do like them, but I find them a bit of an eyesore all the same.

I suppose this question isn't difficult for some, but I've spent hours trying to figure out how to search through my favorite book for words greater than a certain length, and in the end, came up with some really ugly code:

twentyfours = [w for w in vocab if re.search('^........................$', w)]
twentyfives = [w for w in vocab if re.search('^.........................$', w)]
twentysixes = [w for w in vocab if re.search('^..........................$', w)]
twentysevens = [w for w in vocab if re.search('^...........................$', w)]
twentyeights = [w for w in vocab if re.search('^............................$', w)]

... a line for each length, all the way from a certain length to another one.

What I want instead is to be able to say 'give me every word in vocab that's greater than eight letters in length.' How would I do that?


You don't need regex for this.

result = [w for w in vocab if len(w) > 8]

but if regex must be used:

rx = re.compile('^.{9,}$')
#                  ^^^^ {9,} means 9 or more, i.e. more than eight.
result = [w for w in vocab if rx.match(w)]

See http://www.regular-expressions.info/repeat.html for detail on the {a,b} syntax.
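As a rough check that the two approaches agree, here is a minimal sketch. The sample vocab list and the name MIN_EXCLUSIVE are made up for illustration; in the question, vocab would come from the book's text.

```python
import re

# Hypothetical sample data; replace with the real vocabulary.
vocab = ["cat", "elephant", "butterfly", "encyclopedia", "sky"]

MIN_EXCLUSIVE = 8  # "greater than eight letters"

# Plain length check.
by_len = [w for w in vocab if len(w) > MIN_EXCLUSIVE]

# Regex version: {9,} means nine or more repetitions of the preceding token.
# fullmatch() requires the whole word to match, so no ^ or $ is needed.
rx = re.compile(r".{%d,}" % (MIN_EXCLUSIVE + 1))
by_regex = [w for w in vocab if rx.fullmatch(w)]

print(by_len)
print(by_regex)
```

Both lists should come out identical here; "elephant" has exactly eight letters, so it is excluded by both.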


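\w matches letters, digits, and the underscore; {min,max} sets how many times the preceding token may repeat, and max is optional. An expression like

\w{9,}

matches any run of nine or more such characters. Used with re.search and no anchors, it also accepts longer strings that merely contain such a run.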
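To see how anchoring changes what \w{9,} selects, here is a small sketch. The words list is invented for the example.

```python
import re

# Hypothetical sample words, including hyphenated ones.
words = ["butterfly", "well-known", "butterfly-shaped", "abc_12345"]

# search(): succeeds if a run of 9+ word characters appears anywhere.
contains_run = [w for w in words if re.search(r"\w{9,}", w)]

# fullmatch(): the entire string must consist of 9+ word characters.
whole_word = [w for w in words if re.fullmatch(r"\w{9,}", w)]

print(contains_run)
print(whole_word)
```

"butterfly-shaped" passes the search() test because it contains the nine-character run "butterfly", but it fails fullmatch() because the hyphen is not a \w character.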


.{9,} for "more than eight", .{8,} for "eight or more"
Or just len(w) > 8


^.{8,}$

This matches any string of at least 8 characters. You can also place a number after the comma to set an upper bound, or omit the first number to leave the lower bound at zero.
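The three forms of the quantifier can be compared side by side. This is only a sketch: words_matching is a hypothetical helper and the vocab list is sample data.

```python
import re

# Hypothetical sample data.
vocab = ["a", "sky", "elephant", "butterfly", "encyclopedia"]

def words_matching(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

at_least_8 = words_matching(r".{8,}", vocab)      # lower bound only
between_8_10 = words_matching(r".{8,10}", vocab)  # lower and upper bound
at_most_3 = words_matching(r".{,3}", vocab)       # upper bound only (lower bound is 0)

print(at_least_8)
print(between_8_10)
print(at_most_3)
```

Note that the omitted-lower-bound form {,3} is accepted by Python's re module; other regex flavors may treat it differently.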


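If you do want to use a regular expression:

result = [w for w in vocab if re.search('^.{24}', w)]

{x} matches exactly x characters. Because this pattern has no $ at the end, it selects words of 24 or more characters; add $ to select words of exactly 24. It is probably better to use len(w), though.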
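If the goal is still one list per length, as in the question, len(w) can do that in a single pass. A minimal sketch, assuming a small made-up vocab; by_length and long_words are names chosen for the example:

```python
from collections import defaultdict

# Hypothetical sample data.
vocab = ["cat", "dog", "elephant", "butterfly", "dragonfly"]

# Map each length to the list of words with that length.
by_length = defaultdict(list)
for w in vocab:
    by_length[len(w)].append(w)

# Every word longer than eight letters, shortest lengths first.
long_words = [w for n, ws in sorted(by_length.items()) if n > 8 for w in ws]

print(dict(by_length))
print(long_words)
```

This replaces the separate twentyfours, twentyfives, ... lists with lookups such as by_length[24].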
