Python regex help
a = Account(unit = 2, path='/real/os/win/today/axl.xls', realname = 'st')
What I want is to escape the ' around the path value to HTML entities, i.e. &#39;
Remember, the string after path can be anything, so I need a generic way to do this.
The desired output for this string is
Account(unit = 2, path=&#39;/real/os/win/today/axl.xls&#39;, realname = 'st')
re.sub(r"path='([^']*)'", r"path=&#39;\1&#39;", s)
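A minimal sketch of the regex approach, assuming the target entity is &#39; and that only the quotes around the path value should change. The function name escape_path_quotes is made up for illustration. The replacement is a raw string so that \1 stays a backreference rather than turning into the control character \x01.

```python
import re

def escape_path_quotes(text):
    """Swap the single quotes around the path='...' value for &#39;."""
    # [^']* matches any path value that does not itself contain a single quote.
    return re.sub(r"path='([^']*)'", r"path=&#39;\1&#39;", text)

s = "Account(unit = 2, path='/real/os/win/today/axl.xls', realname = 'st')"
print(escape_path_quotes(s))
```

The quotes around realname are left alone because the pattern is anchored on the literal text path=.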
If you want to convert '/real/os/win/today/axl.xls' to &#39;/real/os/win/today/axl.xls&#39;
you can use "'/real/os/win/today/axl.xls'".replace("'", '&#39;')
instead of using regex.
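As a rough sketch of the non-regex route, assuming you already have the quoted path value on its own as a string (the variable names here are illustrative):

```python
import html

value = "'/real/os/win/today/axl.xls'"

# Plain string replacement: every single quote becomes the decimal reference.
escaped = value.replace("'", "&#39;")
print(escaped)

# The standard library's html.escape also escapes single quotes,
# but it emits the hexadecimal form &#x27; rather than &#39;.
print(html.escape(value))
```

Both forms decode to the same character in a browser, so which one to use depends on whether the exact text &#39; is required.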
What you have are non-HTML entities. If I remember it right, there are 3 such types of &... references, e.g. &#160;, &#xA0; and &nbsp; all mean U+00A0 NO-BREAK SPACE.
- &#160; (the type you have) is a "numeric character reference" (decimal).
- &#xA0; is a "numeric character reference" (hexadecimal).
- &nbsp; is an entity.
You could check out Fredrik Lundh's Unescape HTML script (for Python 2.x) and read more about HTML entities here
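If you are on Python 3, the standard library's html.unescape covers the decoding direction without a separate script. A small sketch showing that the decimal reference, the hexadecimal reference, and the named entity for NO-BREAK SPACE all decode to the same character:

```python
import html

# Decimal reference, hexadecimal reference, and named entity for U+00A0.
forms = ["&#160;", "&#xA0;", "&nbsp;"]
decoded = [html.unescape(f) for f in forms]
print([hex(ord(c)) for c in decoded])

# The same call turns &#39; back into a plain single quote.
quote = html.unescape("&#39;")
print(quote)
```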
If I understood the question correctly:
>>> a = "Account(unit = 2, path='/real/os/win/today/axl.xls', realname = 'st')"
>>> re.sub("(?<=path=').*", lambda x: '&#39;'+x.group(0), a)
"Account(unit = 2, path='&#39;/real/os/win/today/axl.xls', realname = 'st')"
I prefer BeautifulSoup for all this stuff. Check out http://www.crummy.com/software/BeautifulSoup/documentation.html#Entity%20Conversion for more.