Python regex help

a = Account(unit = 2, path='/real/os/win/today/axl.xls', realname = 'st')

What I want is to escape the ' characters around the path value as the HTML entity &#39;.

Remember, the string after path= can be anything, so I need a generic way to do this.

The output for this string should be

Account(unit = 2, path=&#39;/real/os/win/today/axl.xls&#39;, realname = 'st')


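import re
# s is the input string; the replacement must be a raw string so \1 is a backreference
re.sub(r"path='([^']*)'", r"path=&#39;\1&#39;", s)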
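To make that generic for any path value, the substitution can be wrapped in a small function. This is only a sketch: the helper name escape_path_quotes is made up for illustration, and it assumes the path value itself never contains a single quote.

```python
import re

# Hypothetical helper; the pattern is the one from the answer above.
# Assumption: the path value contains no single quote of its own.
PATH_RE = re.compile(r"path='([^']*)'")


def escape_path_quotes(text):
    """Replace the quotes around the path value with &#39;."""
    # Raw string, so \1 is a backreference to the captured path.
    return PATH_RE.sub(r"path=&#39;\1&#39;", text)


source = "Account(unit = 2, path='/real/os/win/today/axl.xls', realname = 'st')"
result = escape_path_quotes(source)
print(result)
```

Only the quotes around the path value change; the quotes around 'st' are left alone because the pattern is anchored on path=.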


If you want to convert '/real/os/win/today/axl.xls' to &#39;/real/os/win/today/axl.xls&#39; you can use "'/real/os/win/today/axl.xls'".replace("'", '&#39;') instead of using a regex.
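As a rough comparison, here is str.replace next to html.escape from the Python 3 standard library. html.escape is an addition that nobody in this thread mentioned: it writes the apostrophe in hexadecimal form (&#x27;) and also escapes &, <, > and ", so it only fits if that is acceptable.

```python
import html

path = "'/real/os/win/today/axl.xls'"

# Plain string replacement: touches only the apostrophe.
replaced = path.replace("'", "&#39;")

# Standard-library escaping: apostrophe becomes &#x27; (hexadecimal form),
# and &, <, > and " are escaped as well.
escaped = html.escape(path, quote=True)

print(replaced)
print(escaped)
```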


Strictly speaking, what you have are not HTML entities. If I remember right, there are three kinds of &...; references; for example, &#160;, &#xa0; and &nbsp; all mean U+00A0 NO-BREAK SPACE.

  - &#160; (the type you have) is a "numeric character reference" (decimal).

  - &#xa0; is a "numeric character reference" (hexadecimal).

  - &nbsp; is an entity.
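A quick way to check that the three spellings are equivalent is to decode them with html.unescape from the Python 3 standard library (this is not the Python 2 script mentioned in this answer, just a minimal sketch):

```python
import html

# Decimal reference, hexadecimal reference, and named entity for U+00A0.
forms = ["&#160;", "&#xa0;", "&nbsp;"]
decoded = [html.unescape(form) for form in forms]

for form, char in zip(forms, decoded):
    # Print the code point each spelling decodes to.
    print(form, "->", "U+%04X" % ord(char))

# The reference from the question decodes to a plain apostrophe.
apostrophe = html.unescape("&#39;")
```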

You could check out Fredrik Lundh's Unescape HTML script (for Python 2.x) and read more about HTML entities here.


If I understood the question correctly:

>>> import re
>>> a = "Account(unit = 2, path='/real/os/win/today/axl.xls', realname = 'st')"
>>> # match the quoted value right after path= and rewrap it in &#39;
>>> re.sub(r"(?<=path=)'([^']*)'", lambda m: '&#39;' + m.group(1) + '&#39;', a)
"Account(unit = 2, path=&#39;/real/os/win/today/axl.xls&#39;, realname = 'st')"


I prefer BeautifulSoup for all this stuff. Check out http://www.crummy.com/software/BeautifulSoup/documentation.html#Entity%20Conversion for more.
