How to encode/decode escape sequence characters in Python

How can I encode/decode the escape sequence character '\x13' in Python into a character that is valid in RSS or XML?

My use case: I am getting data from arbitrary sources and building an RSS feed from it. The data sometimes contains escape sequence characters, which break my RSS feed.

So how can I sanitize input data that contains escape sequence characters?


\x13 (ASCII 19, ‘DC3’) can't be escaped; it is invalid in XML 1.0, period. You can include one, encoded as &#19; or &#x13; in XML 1.1, but then you have to include the <?xml version="1.1"?> declaration and many tools won't like it.
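To see this concretely, here is a minimal sketch using the standard library's xml.etree.ElementTree. The helper name is_well_formed and the sample documents are made up for illustration; with the expat-based parser that ships with CPython, both the raw DC3 character and the &#19; reference should be rejected in an XML 1.0 document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise.

    Hypothetical helper, for illustration only.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


plain = "<title>hello</title>"
raw_dc3 = "<title>hello\x13</title>"    # literal DC3 character in the text
ref_dc3 = "<title>hello&#19;</title>"   # DC3 written as a character reference

for label, doc in [("plain", plain), ("raw DC3", raw_dc3), ("&#19;", ref_dc3)]:
    print(label, is_well_formed(doc))
```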

I've no idea why that character would be included in your data, but the way forward is probably to completely remove control codes. For example:

import re
s = re.sub('[\x00-\x08\x0B-\x1F]', '', s)  # strips C0 controls except tab (\x09) and line feed (\x0A)
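For a feed generator, the substitution above could be wrapped in a small helper along these lines. This is only a sketch: the name sanitize_for_xml is hypothetical, and it makes two choices the answer does not spell out. It keeps carriage return (\x0D), which XML 1.0 allows, and it also escapes &, < and > with xml.sax.saxutils.escape so the result can be placed in element text.

```python
import re
from xml.sax.saxutils import escape

# C0 control characters that XML 1.0 forbids: everything below 0x20
# except tab (0x09), line feed (0x0A) and carriage return (0x0D).
_XML10_FORBIDDEN = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def sanitize_for_xml(text):
    """Drop forbidden control characters, then escape XML markup characters."""
    cleaned = _XML10_FORBIDDEN.sub('', text)
    return escape(cleaned)  # replaces &, < and > with entity references


print(sanitize_for_xml('Breaking\x13 news: x < y & z'))
```

If your feed library already escapes text for you, skip the escape() call and only strip the control characters, otherwise the markup characters would be escaped twice.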

For some kinds of escape sequence (e.g. ANSI colour codes), removing the control characters still leaves stray printable characters behind, in which case you'd probably want a custom parser for that particular format.
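As an illustration of the ANSI case, here is a rough sketch that removes CSI sequences (the family that colour codes belong to) in one go, before any control-character stripping. The name strip_ansi and the sample string are made up, and the pattern only covers sequences of the form ESC [ ... final byte; other escape formats would need their own handling.

```python
import re

# CSI sequence: ESC, '[', optional parameter bytes (0x30-0x3F),
# optional intermediate bytes (0x20-0x2F), one final byte (0x40-0x7E).
_ANSI_CSI = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')


def strip_ansi(text):
    """Remove ANSI CSI sequences such as colour codes from text."""
    return _ANSI_CSI.sub('', text)


coloured = '\x1b[31mERROR\x1b[0m: disk full'

# Removing only the ESC control character leaves '[31m' and '[0m' behind.
print(coloured.replace('\x1b', ''))
# Removing the whole sequence leaves just the message.
print(strip_ansi(coloured))
```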
