开发者

Python regex sub space

CODE:

word = 'aiuhsdjfööäö ; sdfdfd'
word1=re.sub('[^^äÄöÖåÅA-Za-z0-开发者_开发问答9\t\r\n\f()!{$}.+?|]',"""\[^^0-9\t\r\n\f(!){$}.+?|\]*""", word) ; print 'word=  ', word
word2=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',"""\[^^0-9\\t\\r\\n\\f(!){$}.+?|\]*""", word) ; print 'word=  ', word
word3=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',"""\[^^0-9\\\t\\\r\\\n\\\f(!){$}.+?|\]*""", word) ; print 'word=  ', word
word4=re.sub('[^^äÄöÖåÅA-Za-z0-9\s()!{$}.+?|]',"""\[^^0-9\s(!){$}.+?|\]*""", word) ; print 'word=  ', word
word5=re.sub('[^^äÄöÖåÅA-Za-z0-9\s()!{$}.+?|]',"""\[^^0-9\\s(!){$}.+?|\]*""", word) ; print 'word=  ', word
word6=re.sub('[^^äÄöÖåÅA-Za-z0-9\s()!{$}.+?|]',"""\[^^0-9\\\s(!){$}.+?|\]*""", word) ; print 'word=  ', word

F=open('suoriP.txt','w')
F.writelines(word1+'\n\n'+word2+'\n\n'+word3+'\n\n'+word4+'\n\n'+word5+'\n\n'+word6)
F.close

RESULT:

aiuhsdjfööäö\[^^0-9 

(!){$}.+?|\]*\[^^0-9    

(!){$}.+?|\]*\[^^0-9    

(!){$}.+?|\]*sdfdfd

aiuhsdjfööäö\[^^0-9 

(!){$}.+?|\]*\[^^0-9    

(!){$}.+?|\]*\[^^0-9    

(!){$}.+?|\]*sdfdfd

aiuhsdjfööäö\[^^0-9\    \
\
\(!){$}.+?|\]*\[^^0-9\  \
\
\(!){$}.+?|\]*\[^^0-9\  \
\
\(!){$}.+?|\]*sdfdfd

aiuhsdjfööäö \[^^0-9\s(!){$}.+?|\]* sdfdfd

aiuhsdjfööäö \[^^0-9\s(!){$}.+?|\]* sdfdfd

aiuhsdjfööäö \[^^0-9\s(!){$}.+?|\]* sdfdfd

QUESTION:

I do not understand why:

  1. re does not substitute backslashes, \s, \s, \\s are all substituted as \s

  2. re does not substitute \\t\\r\\n\\f for ';'

I am trying to generate complicated re patterns with variable names by analyzing a file.

I am not able to generate space characters representation [^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]. I mean if I find in the text file ';' with word1=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',....

I am not able to substitute this character ';' by string '[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]'

This string is a pattern string, which I use in re.search to extract certain words as variables.

SOLUTION< WHICH EMERGED LATER AND IS ADDED LATER.

In the end I replaced xxxx instead of space special characters. Later merged, split and merged string by adding '\t\n\f\v\r'.

strsub=smart_str('[^^äÄöÖåÅA-Za-z0-9xxxx()!{$}.+?|`\"£$\%&_+~#\'@><]+', encoding='utf-8', strings_only=False, errors='replace' )
word=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\n\r\f()!{$}.+?|£$\%&_+~#\'@><]+',strsub,word)

for line in word.split('xxxx'):
     str2=str2+'\\t\\n\\f\\v\\r'+line 
     F.writelines(str2)


When you use re.sub the second part won't be regex -- you simply should group it and call it in \1 or \2 for example:

 word="aiuhsdjfööäö"
 word1=re.sub("(.+?)[äa](.+?)","\1a\2 [corrected]",word)

What I did above is completely unnecessary but I did it to show my point that using [ doesn't have to come after \ when you use it as the second part of re.sub

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜