How to extract longest of overlapping groups?

How can I extract the longest of several groups that start the same way?

For example, from a given string, I want to extract the longest match to either CS or CSI.

I tried "(CS|CSI).*", and it returns CS rather than CSI even when CSI is available.

If I use "(CSI|CS).*" then I do get CSI when it matches, so I guess the solution is to always place the shorter of the overlapping groups after the longer one.

Is there a clearer way to express this with regular expressions? Somehow it feels confusing that the result depends on the order in which you list the groups.


No, that's just how it works, at least in Perl-derived regex flavors like Python, JavaScript, .NET, etc.

http://www.regular-expressions.info/alternation.html
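
A small sketch of that behaviour with Python's re module may make it concrete. The sample string is made up for illustration; the point is that the alternation takes the first alternative that succeeds at the current position, not the longest one.

```python
import re

text = "xxCSIyy"  # hypothetical sample input

# Alternatives are tried left to right; the first one that matches wins.
short_first = re.search(r"(CS|CSI).*", text).group(1)
long_first = re.search(r"(CSI|CS).*", text).group(1)

print(short_first)  # CS
print(long_first)   # CSI
```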


As Alan says, the patterns will be matched in the order you specified them.

If you want to match on the longest of overlapping literal strings, you need the longest one to appear first. But you can organize your strings longest-to-shortest automatically, if you like:

>>> '|'.join(sorted('cs csi miami vice'.split(), key=len, reverse=True))
'miami|vice|csi|cs'
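
As a rough sketch, the same sorting idea can be wrapped in a helper that also escapes the literals, so that strings containing regex metacharacters are still matched literally. The function name longest_first_pattern and the sample sentence are assumptions made for this example, not part of the answer above.

```python
import re

def longest_first_pattern(words):
    """Hypothetical helper: compile an alternation of literals, longest first."""
    ordered = sorted(words, key=len, reverse=True)
    # re.escape guards against regex metacharacters inside the literals.
    return re.compile("|".join(re.escape(w) for w in ordered))

pattern = longest_first_pattern(["cs", "csi", "miami", "vice"])
matches = pattern.findall("csi and cs on miami vice")
print(matches)  # ['csi', 'cs', 'miami', 'vice']
```

Because csi is listed before cs, the first match is csi rather than cs followed by a stray i.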


I'm intrigued to know the right way of doing this too. If it helps, you can always build the regex from every prefix of the string, longest first:

import re

string_to_look_in = "AUHDASOHDCSIAAOSLINDASOI"
string_to_match = "CSIABC"

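# Alternation of every prefix of string_to_match, longest first:
# (CSIABC|CSIAB|CSIA|CSI|CS|C)
re_to_use = "(" + "|".join([string_to_match[0:i] for i in range(len(string_to_match),0,-1)]) + ")"

re_result = re.search(re_to_use,string_to_look_in)

# Prints CSIA, the longest prefix found at the first match position
print(string_to_look_in[re_result.start():re_result.end()])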


Similar functionality is present in the Vim editor ("sequence of optionally matched atoms"), where e.g. col\%[umn] matches col in color, colum in columbus, and the full column.

I am not aware of similar functionality in Python's re module, but you can use nested non-capturing groups, each followed by a ? quantifier, to get the same effect:

>>> import re
>>> words = ['color', 'columbus', 'column']
>>> rex = re.compile(r'col(?:u(?:m(?:n)?)?)?')
>>> for w in words: print(rex.findall(w))
['col']
['colum']
['column']
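
Writing the nested groups by hand gets tedious for longer words, so here is a minimal sketch of generating them. The helper name optional_tail and its two-argument split into a required prefix and an optional tail are assumptions made for this example; it simply wraps the tail one character at a time, from the last character outwards.

```python
import re

def optional_tail(required, optional):
    """Hypothetical helper: required prefix plus a progressively optional tail."""
    tail = ""
    # Wrap from the last character outwards: (?:n)? then (?:m(?:n)?)? and so on.
    for ch in reversed(optional):
        tail = "(?:" + re.escape(ch) + tail + ")?"
    return re.escape(required) + tail

pattern = optional_tail("col", "umn")
print(pattern)  # col(?:u(?:m(?:n)?)?)?

rex = re.compile(pattern)
results = [rex.findall(w) for w in ["color", "columbus", "column"]]
print(results)  # [['col'], ['colum'], ['column']]
```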