Split UTF-8 strings into parts with a regexp
I need to find the substrings in a text that start with =?
and end with ?=
and translate them. I ended up with this expression:
re.sub(r'=\?[\w\?\-\/=\+\:\;_\,\[\]\(\)\<\>]+\?=', decode_match, string)
It works in 95% of cases, but it fails on strings like this one:
=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?=
Can someone help?
Your pattern needs to handle a ? that is not followed by = (that is, a ? that is not part of a closing ?=):
r'=\?(?:[^?]|\?[^=])+\?='
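To see why the original expression fails and what the alternation changes, here is a small sketch that runs both patterns over the sample string from the question. The variable names (sample, greedy, bounded) are only illustrative.

```python
import re

sample = '=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?='

# Original pattern: '?' and '=' are both inside the character class, so the
# greedy '+' runs past the first '?=' and only stops at the last one.
greedy = r'=\?[\w\?\-\/=\+\:\;_\,\[\]\(\)\<\>]+\?='

# Suggested pattern: a '?' is consumed only when it is not followed by '=',
# so the body can never swallow a closing '?='.
bounded = r'=\?(?:[^?]|\?[^=])+\?='

greedy_matches = re.findall(greedy, sample)
bounded_matches = re.findall(bounded, sample)

print(greedy_matches)   # one match covering the whole string
print(bounded_matches)  # two separate tokens
```

The original pattern returns the whole string as a single match, which is presumably the failure being described; the alternation splits it into the two tokens.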
Does str.split('=?') do the trick?
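A quick check on the sample string from the question suggests that a plain split is not quite enough here. This is only a sketch on that one input:

```python
sample = '=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?='

# Split on the opening delimiter only.
parts = sample.split('=?')
print(parts)
```

The pieces still carry the closing ?= of the first token and the separator, and the second token ends in =?=, so its closing ?= overlaps an accidental =? and gets cut apart. Some post-processing would still be needed.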
Why not simply write:
re.sub(r'=\?.+?\?=', decode_match, string)
This regex matches twice in '=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?=':
'=?utf-8asdfaDDS23=eFF?='
'=?utf-8?eadf-,=?='
Is that what you want? When reporting a failure, you should describe it more precisely.
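For completeness, a minimal sketch of the non-greedy pattern used with re.sub. The decode_match below is a hypothetical stand-in (the question does not show the real one); it only wraps each token in angle brackets so the substitution is visible.

```python
import re

def decode_match(match):
    # Hypothetical placeholder for the real decoder: just mark the token.
    return '<' + match.group(0) + '>'

sample = '=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?='

# '.+?' is non-greedy, so each match ends at the first '?=' it can reach.
result = re.sub(r'=\?.+?\?=', decode_match, sample)
print(result)  # <=?utf-8asdfaDDS23=eFF?=>-<=?utf-8?eadf-,=?=>
```

Note that '.' does not match a newline by default, so a token spanning several lines would need the re.DOTALL flag.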