How to use Python re.sub()?
import re
re.sub('[a-zA-Z0-9/*\n\u]', '', string='\n\u3000\u3000xyz')
error:
File "<input>", line 2
re.sub('[a-zA-Z0-9/*\n\u]', '', string='\n\u3000\u3000xyz')
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 14-15: truncated \uXXXX escape
I want to delete '\u' from the string '\n\u3000\u3000xyz', but it doesn't work.
As @Akax stated, "\u]" is not valid Python, since \u begins a Unicode escape and must be followed by exactly four hex digits. What you can do is tell Python that the pattern is a raw string by adding the prefix r in the re.sub call, as follows.
import re
re.sub(r'[a-zA-Z0-9/*\n\\u]', '', string='\n\u3000\u3000xyz')
Note: when using a raw string, \u must be changed to \\u, because the re module also treats \u as the start of a Unicode escape and would reject an incomplete one.
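To make the effect of that pattern concrete, here is a minimal sketch of what the raw-string version matches. The variable names are only for illustration, and the behaviour shown assumes Python 3 str patterns. Note that the input string here is a normal literal, so it contains two U+3000 characters and no literal backslash or "u" at all.

```python
import re

# Normal (non-raw) literal: newline, two U+3000 ideographic spaces, "xyz".
text = '\n\u3000\u3000xyz'

# Raw pattern: the class matches ASCII letters, digits, '/', '*', a newline,
# a literal backslash (written \\), and the letter 'u'.
pattern = r'[a-zA-Z0-9/*\n\\u]'

# The newline and the letters x, y, z are removed; the U+3000 characters
# are not in the class, so they stay.
result = re.sub(pattern, '', text)
print(repr(result))

# Without the doubled backslash, the re module itself rejects the pattern,
# because \u must be followed by four hex digits there as well.
try:
    re.compile(r'[a-zA-Z0-9/*\n\u]')
    pattern_rejected = False
except re.error:
    pattern_rejected = True
print(pattern_rejected)
```

So the call runs without a SyntaxError, but what it removes is the newline and the ASCII letters, not the U+3000 characters.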
Since \u starts an escape sequence in Python string literals, you will have to make both the matching pattern and the input string raw strings by putting r before each literal.
import re
re.sub(r'\\u','',r'\n\u3000\u3000xyz')
Output -
\\n30003000xyz
But as you can see, the \n here is still a literal backslash followed by n (shown as \\n), whereas the expected output is \n30003000xyz with a real newline. Hence you'll have to convert it back to a normal string.
import re
import codecs
codecs.decode(re.sub(r'\\u','',r'\n\u3000\u3000xyz'),'unicode_escape')
Result -
\n30003000xyz
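If the input is a normal string rather than a raw one, each \u3000 is already a single character, so there is no "\u" text left to delete. The sketch below assumes the actual goal is to remove the ideographic spaces (or all whitespace); that intent is a guess and is not stated in the question. The variable names are illustrative.

```python
import re

# Normal literal: newline, two U+3000 ideographic spaces, "xyz".
text = '\n\u3000\u3000xyz'

# There is no backslash in the string; each \u3000 is one code point.
has_backslash = '\\' in text

# Assumption: remove only the ideographic spaces, keep the newline.
without_ideographic = re.sub('\u3000', '', text)

# Assumption: remove every whitespace character. In Python 3 str patterns,
# \s also matches Unicode whitespace such as U+3000.
without_whitespace = re.sub(r'\s', '', text)

print(repr(without_ideographic))
print(repr(without_whitespace))
```

This avoids the round trip through raw strings and codecs.decode when the string already holds the real characters.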