Python regex sub space
CODE:
# -*- coding: utf-8 -*-
# Python 2 code
import re

word = 'aiuhsdjfööäö ; sdfdfd'
word1=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',"""\[^^0-9\t\r\n\f(!){$}.+?|\]*""", word) ; print 'word1= ', word1
word2=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',"""\[^^0-9\\t\\r\\n\\f(!){$}.+?|\]*""", word) ; print 'word2= ', word2
word3=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',"""\[^^0-9\\\t\\\r\\\n\\\f(!){$}.+?|\]*""", word) ; print 'word3= ', word3
word4=re.sub('[^^äÄöÖåÅA-Za-z0-9\s()!{$}.+?|]',"""\[^^0-9\s(!){$}.+?|\]*""", word) ; print 'word4= ', word4
word5=re.sub('[^^äÄöÖåÅA-Za-z0-9\s()!{$}.+?|]',"""\[^^0-9\\s(!){$}.+?|\]*""", word) ; print 'word5= ', word5
word6=re.sub('[^^äÄöÖåÅA-Za-z0-9\s()!{$}.+?|]',"""\[^^0-9\\\s(!){$}.+?|\]*""", word) ; print 'word6= ', word6
F=open('suoriP.txt','w')
F.write(word1+'\n\n'+word2+'\n\n'+word3+'\n\n'+word4+'\n\n'+word5+'\n\n'+word6)
F.close()
RESULT:
aiuhsdjfööäö\[^^0-9
(!){$}.+?|\]*\[^^0-9
(!){$}.+?|\]*\[^^0-9
(!){$}.+?|\]*sdfdfd
aiuhsdjfööäö\[^^0-9
(!){$}.+?|\]*\[^^0-9
(!){$}.+?|\]*\[^^0-9
(!){$}.+?|\]*sdfdfd
aiuhsdjfööäö\[^^0-9\ \
\
\(!){$}.+?|\]*\[^^0-9\ \
\
\(!){$}.+?|\]*\[^^0-9\ \
\
\(!){$}.+?|\]*sdfdfd
aiuhsdjfööäö \[^^0-9\s(!){$}.+?|\]* sdfdfd
aiuhsdjfööäö \[^^0-9\s(!){$}.+?|\]* sdfdfd
aiuhsdjfööäö \[^^0-9\s(!){$}.+?|\]* sdfdfd
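The word1 to word3 results come from two layers of escape processing applied to the replacement string: first the Python string literal, then re.sub's own template parser, which also turns \t into a TAB and \\ into a single backslash. A minimal Python 3 sketch of those two layers follows; the <...> markers and the names repl1 to repl3 and result1 to result3 are illustrative only.

```python
import re

word = 'aiuhsdjfööäö ; sdfdfd'
# Same character class as word1; each of ' ', ';', ' ' is matched separately.
pattern = r'[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]'

# Layer 1 only: the Python literal '\t' is already a real TAB character.
repl1 = '<\t>'
# Layer 2: the template r'\t' (backslash + t) is turned into a TAB by re.sub.
repl2 = r'<\t>'
# A doubled backslash in the template survives as one literal backslash.
repl3 = r'<\\t>'

result1 = re.sub(pattern, repl1, word)
result2 = re.sub(pattern, repl2, word)
result3 = re.sub(pattern, repl3, word)
print(repr(result1))
print(repr(result2))
print(repr(result3))
```

This is why word1 and word2 produce real whitespace characters, while in word3 the Python literal \\\t leaves a backslash followed by a real TAB, which re.sub copies through unchanged, so a backslash appears before each whitespace character.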
QUESTION:
I do not understand why:
re does not keep my backslashes: \s, \\s and \\\s all come out as \s;
re does not substitute the literal text \t\r\n\f for ';' (I get real tab, carriage-return, newline and form-feed characters instead).
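The \s puzzle has the same cause. After Python's own literal processing, word4 and word5 hand re.sub the template text \s and word6 hands it \\s; re turns \\s into \s, and Python 2 left the unknown template escape \s as-is, so all three print \s. A minimal Python 3 sketch, assuming Python 3.7 or later, where an unknown ASCII-letter escape in the template raises re.error instead of passing through:

```python
import re

word = 'aiuhsdjfööäö ; sdfdfd'
# Here \s is the regex whitespace class, so only ';' is matched.
pattern = r'[^^äÄöÖåÅA-Za-z0-9\s()!{$}.+?|]'

# In the template, \s is an unknown escape: Python 3.7+ rejects it.
error_message = None
try:
    re.sub(pattern, r'\s', word)
except re.error as exc:
    error_message = str(exc)
print(error_message)

# A doubled backslash in the template yields one literal backslash followed by s.
result = re.sub(pattern, r'[^0-9\\s]', word)
print(result)  # aiuhsdjfööäö [^0-9\s] sdfdfd
```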
I am trying to generate complicated regex patterns with variable names by analyzing a file, but I cannot produce the whitespace escapes in [^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|] as text.
That is, when I find ';' in the text file and run word1=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]',...., I cannot replace the character ';' with the string '[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]'.
This string is a pattern that I later pass to re.search to extract certain words as variables.
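For the actual goal, putting a pattern string in place of ';' verbatim, one option (sketched here assuming Python 3) is to pass a function as the replacement: re.sub uses the string the function returns as-is, without processing any backslash escapes in it. The template line and the sample text searched afterwards are hypothetical.

```python
import re

# Hypothetical template line read from the analysed file.
template = 'name ; value'
# Character-class pattern to put in place of every ';' (from the question).
class_pattern = r'[^^äÄöÖåÅA-Za-z0-9\t\r\n\f()!{$}.+?|]+'

# A function replacement is inserted verbatim: no backslash processing happens.
generated = re.sub(';', lambda m: class_pattern, template)
print(generated)

# The generated text is itself a regex and can be used with re.search.
match = re.search(generated, 'name = value')
print(match.group(0) if match else None)
```

When the text being replaced is a fixed string such as ';', template.replace(';', class_pattern) gives the same result without any regex escaping rules.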
SOLUTION (FOUND AND ADDED LATER):
In the end I used the placeholder xxxx in place of the whitespace escape characters, then split the string on xxxx and joined the pieces back together with the literal text '\t\n\f\v\r'.
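# Python 2; smart_str is Django's django.utils.encoding.smart_str
import re
from django.utils.encoding import smart_str

strsub=smart_str('[^^äÄöÖåÅA-Za-z0-9xxxx()!{$}.+?|`\"£$\%&_+~#\'@><]+', encoding='utf-8', strings_only=False, errors='replace' )
word=re.sub('[^^äÄöÖåÅA-Za-z0-9\t\n\r\f()!{$}.+?|£$\%&_+~#\'@><]+',strsub,word)
# join puts the literal escapes only between the pieces, never in front of the first one
str2='\\t\\n\\f\\v\\r'.join(word.split('xxxx'))
with open('suoriP.txt', 'w') as F:
    F.write(str2)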
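The same placeholder idea, sketched in Python 3, where str is already Unicode, so (assuming the text is a plain str) no smart_str conversion is needed. The character class is shortened from the one above, and PLACEHOLDER is an illustrative name; the approach only works if 'xxxx' never occurs in the real text.

```python
import re

word = 'aiuhsdjfööäö ; sdfdfd'
PLACEHOLDER = 'xxxx'  # assumption: this never occurs elsewhere in the text
# Replacement with a placeholder instead of the whitespace escapes.
strsub = '[^^äÄöÖåÅA-Za-z0-9' + PLACEHOLDER + '()!{$}.+?|]+'

# strsub contains no backslashes, so re.sub inserts it unchanged.
word = re.sub(r'[^^äÄöÖåÅA-Za-z0-9\t\n\r\f()!{$}.+?|]+', strsub, word)

# Put the escapes back as literal backslash sequences, only between the pieces.
str2 = r'\t\n\f\v\r'.join(word.split(PLACEHOLDER))
print(str2)
```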
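When you use re.sub, the second argument (the replacement) is not a regex, although backslash escapes such as \t and \1 in it are still processed. If you need parts of the match in the replacement, put them in groups and refer to them as \1, \2 and so on.
For example: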
word="aiuhsdjfööäö"
word1=re.sub(r"(.+?)[äa](.+?)", r"\1a\2 [corrected]", word)
What I did above is completely unnecessary, but it shows my point: [ does not need to be escaped with \ when you use it in the second argument of re.sub.
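A short Python 3 sketch of the backreference point. One detail matters: in a normal (non-raw) string literal "\1" is the octal escape for chr(1), not a backreference, so the replacement should be written as a raw string. \g<1> is the explicit group syntax that re.sub also accepts; the name word2 is illustrative.

```python
import re

word = 'aiuhsdjfööäö'

# Without the r prefix, "\1" is the control character chr(1).
print(repr("\1"))  # prints '\x01'

# With a raw string, re.sub sees \1 and \2 and inserts the captured groups;
# the [ in the replacement needs no escaping.
word1 = re.sub(r'(.+?)[äa](.+?)', r'\1a\2 [corrected]', word)
print(word1)

# \g<1> and \g<2> spell the same backreferences unambiguously.
word2 = re.sub(r'(.+?)[äa](.+?)', r'\g<1>a\g<2> [corrected]', word)
print(word2)
```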