
Split UTF-8 strings into parts with a regexp

I need to find the substrings in a text that start with =? and end with ?= and translate them. I ended up with this expression:

re.sub('=\?[\w\?\-\/=\+\:\;_\,\[\]\(\)\<\>]+\?=', decode_match, string)

It works in about 95% of cases, but it fails on strings like this one:

=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?=

Can anyone help?


Your pattern needs to handle the case where a ? appears without being part of a closing ?=:

'=\?(?:[^?]|\?[^=])+\?='
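
To see the difference on the sample string from the question, here is a minimal sketch. The variable names (text, original, tempered) are only for illustration. The question's character class contains both ? and =, so its greedy + runs past the first ?= and swallows the whole string as one match, while the pattern above stops at the first ?= it reaches.

```python
import re

# Sample string from the question
text = '=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?='

# Pattern from the question, written as a raw string.
# Its class allows ? and =, so the greedy + overshoots the first ?=.
original = re.compile(r'=\?[\w\?\-\/=\+\:\;_\,\[\]\(\)\<\>]+\?=')

# Pattern from this answer: consume either a non-? character,
# or a ? that is not followed by =.
tempered = re.compile(r'=\?(?:[^?]|\?[^=])+\?=')

original_matches = original.findall(text)
tempered_matches = tempered.findall(text)

print(original_matches)  # one match covering the whole string
print(tempered_matches)  # two separate =?...?= chunks
```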


Does str.split('=?') do the trick?


Why not use a non-greedy match instead?

re.sub('=\?.+?\?=', decode_match, string)

This regex matches twice in '=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?=':

'=?utf-8asdfaDDS23=eFF?='

'=?utf-8?eadf-,=?='

Is that what you want? When you report a failure, you should describe it more precisely.
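
A small sketch of the full substitution, in case it helps. The decode_match below is a hypothetical stand-in, since the question does not show the real one: it only wraps each match in brackets so the two replacements are visible.

```python
import re

def decode_match(match):
    # Hypothetical placeholder for the question's decode_match:
    # wrap the matched chunk in brackets instead of decoding it.
    return '[' + match.group(0) + ']'

# Sample string from the question
text = '=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?='

# .+? is non-greedy, so each match ends at the first ?= that closes it
result = re.sub(r'=\?.+?\?=', decode_match, text)

print(result)
```

Note that . does not match a newline unless re.DOTALL is passed, so with this pattern a chunk cannot span several lines.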
