How do I regex search for weird non-ASCII characters in Python?

2022-12-16 12:51 问答作者：

I'm using the following regular expression basically to search for and delete these characters.

invalid_unicode = re.compile(ur'(Û|²|°|±|É|¹|Í)')

My source code in ASCII encoded, and whenever I try to run the script it spits out:

SyntaxError: Non-ASCII character '\xdb' in file ./release.py on line 273, but no encoding declar开发者_如何学运维ed; see http://www.python.org/peps/pep-0263.html for details

If I follow the instructions at the given website, and place utf-8 on the second line encoding, my script doesn't run. Instead it gives me this error:

SyntaxError: (unicode error) 'utf8' codec can't decode byte 0xdb in position 0: unexpected end of data

How do I get this one regular expression running in an ASCII written script that'd be great.

You need to find out what encoding your editor is using, and set that per PEP263; or, make things more stable and portable (though alas perhaps a bit less readable) and use escape sequences in your string literal, i.e., use u'(\xdb|\xb2|\xb0|\xb1|\xc9|\xb9|\xcd)' as the parameter to the re.compile call.

After telling Python that your source file uses UTF-8 encoding, did you actually make sure that your editor is saving the file using UTF-8 encoding? The error you get indicates that your editor is probably not using UTF-8.

What text editor are you using?

\x{c0de}

In a regex will match the Unicode character at code point c0de.

Python uses PCRE, right? (If it doesn't, it's probably \uC0DE instead...)

继续阅读：ascii python regex unicode

How do I regex search for weird non-ASCII characters in Python?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？