split utf8 strings into parts with regexp

2023-02-08 22:12 Q&A author:

I need to find the substrings in a text that start with =? and end with ?= and translate them. I ended up with this expression:

re.sub(r'=\?[\w\?\-\/=\+\:\;_\,\[\]\(\)\<\>]+\?=', decode_match, string)

It works in 95% of cases, but it fails on strings like this one:

=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?=

Can someone help?

Your pattern needs to handle the case where a ? appears in the body without being the start of the closing ?=:

r'=\?(?:[^?]|\?[^=])+\?='
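
A quick way to check this pattern against the failing input from the question is a small sketch like the one below. It uses re.findall rather than re.sub so the matched spans are visible; the names PATTERN and sample are only illustrative and do not come from the question.

```python
import re

# Pattern from the answer above, written as a raw string.
# Body: any character other than '?', or a '?' that is not followed by '='.
PATTERN = re.compile(r'=\?(?:[^?]|\?[^=])+\?=')

# The failing input quoted in the question.
sample = '=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?='

# The group is non-capturing, so findall returns the full matches.
matches = PATTERN.findall(sample)
for m in matches:
    print(m)
```

On this input the pattern should yield two separate matches, split at the - between them.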

Does str.split('=?') do the trick?

Why don't you write this instead:

re.sub(r'=\?.+?\?=', decode_match, string)

This regex matches twice in '=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?=':

'=?utf-8asdfaDDS23=eFF?='

'=?utf-8?eadf-,=?='

Is that what you want? When you report a failure, you should describe it more precisely.
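
For reference, here is a minimal sketch of the non-greedy version plugged into re.sub. The question does not show the real decode_match, so the callback below is a hypothetical stand-in that only wraps each match in brackets to make the substitution visible.

```python
import re

def decode_match(match):
    # Hypothetical placeholder: the question does not show the real
    # decode_match, so this just marks the matched text with brackets.
    return '[' + match.group(0) + ']'

# The failing input quoted in the question.
sample = '=?utf-8asdfaDDS23=eFF?=-=?utf-8?eadf-,=?='

# Non-greedy body: stop at the first '?=' that follows at least one character.
result = re.sub(r'=\?.+?\?=', decode_match, sample)
print(result)  # [=?utf-8asdfaDDS23=eFF?=]-[=?utf-8?eadf-,=?=]
```

Note that . does not match a newline by default, so a =? ... ?= span broken across lines would not be matched unless re.DOTALL is passed.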
