Python regex: Fix one HTML closing tag
<div>random contents without < or > , but has ( ) <div>
I just need to fix the closing div tag
so that it looks like <div>random contents</div>
I need to do it in Python with a regex.
The input is exactly like the first line; there will not be any < or > in the random contents.
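Replace
(<div>[^<]*<)(div>)
with
\1/\2
(Python's re.sub writes group references as \1 and \2; the $1/$2 form used by some other regex flavors is inserted literally.)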
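As a rough sketch, the replacement above could be written with Python's re.sub as follows. The function name fix_closing_div and the sample input are made up for illustration, and the code assumes, as the question states, that the contents never contain < or >.

```python
import re

# Pattern from the answer: group 1 captures "<div>...<", group 2 captures "div>".
DIV_PATTERN = re.compile(r'(<div>[^<]*<)(div>)')

def fix_closing_div(html):
    """Turn the second <div> into </div> (hypothetical helper name)."""
    # Re-emit both groups with a "/" inserted between them.
    return DIV_PATTERN.sub(r'\1/\2', html)

# Sample input is an assumption modeled on the question's description.
fixed = fix_closing_div('<div>random contents with ( ) <div>')
print(fixed)
```

Because [^<]* cannot cross a <, the pattern only matches when the two div tags are separated by plain text, so input that is already well-formed is left untouched.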
Note: This is bad practice, don't do it unless it's absolutely necessary!
I wouldn't recommend a regex - use something like tidy (which is a Python wrapper around HTML Tidy).
Avoid using regular expressions for dealing with HTML.
This is how the markup, as it currently stands, would be parsed into a DOM tree:
>>> from BeautifulSoup import BeautifulSoup
>>> BeautifulSoup('<div>random contents<div>')
<div>random contents<div></div></div>
Or do you want to turn the second <div> into </div> (which a browser certainly would not do)?