Removing spaces and newlines between tags in HTML (aka unformatting) in Python
An example:
<p> Hello</p>
<div>hgello</div>
<pre>
code
code
<pre>
turns into something like:
<p> Hello</p><div>hgello</div><pre>
code
code
<pre>
How can I do this in Python? I also make heavy use of <pre> tags, so replacing every '\n' with '' is not an option.
What's the best way to do that?
You could use re.sub(r">\s*<", "><", "[your HTML string here]").
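As a quick check, the call above can be run on the sample from the question. This is only a minimal sketch; the names `sample` and `unformatted` are made up for illustration.

```python
import re

# The sample from the question, written as a single string.
sample = "<p> Hello</p>\n<div>hgello</div>\n<pre>\ncode\ncode\n<pre>"

# Remove any run of whitespace (including none) that sits between '>' and '<'.
unformatted = re.sub(r">\s*<", "><", sample)

print(unformatted)
```

The newlines around the two `code` lines survive because they are not enclosed directly by a `>` on one side and a `<` on the other.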
Maybe string.replace(">\n", ">"), i.e. look for a closing bracket followed by a newline and remove the newline.
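A small sketch of what this literal replacement does in practice, using the question's sample plus a hypothetical indented snippet (`indented` is not from the question). Since `str.replace` matches plain text rather than a pattern, it also drops the newline right after the opening `<pre>` and leaves any indentation between tags in place.

```python
# The sample from the question, written as a single string.
sample = "<p> Hello</p>\n<div>hgello</div>\n<pre>\ncode\ncode\n<pre>"

# Literal replacement: every '>' followed by a newline loses that newline.
result = sample.replace(">\n", ">")

# Hypothetical indented input: the spaces after the newline are kept.
indented = "<ul>\n  <li>a</li>\n</ul>"
indented_result = indented.replace(">\n", ">")

print(repr(result))
print(repr(indented_result))
```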
I would use a Python regular expression:
re.sub(r">\s+<", "><", string)
Here '\s' matches any whitespace character and the '+' after it means one or more of them, so the pattern only matches whitespace that sits directly between a '>' and a '<'. This removes the possibility of replacing
<pre>
code
code
<pre>
with
<pre><pre>
More information about regular expressions can be found here, here and here.
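The pattern leaves a <pre> block alone as long as the block contains only text, as in the question. If a <pre> block itself contains tags separated by newlines, those newlines would be collapsed as well. Below is a rough sketch of one way to guard against that, assuming every block is closed with </pre> and blocks are not nested. The function name `unformat_keep_pre` is hypothetical, and a regex is not a full HTML parser, so treat this as an illustration rather than a robust solution.

```python
import re

# Either a whole <pre>...</pre> block (captured in group 1), or a run of
# whitespace that has '>' right before it and '<' right after it.
PATTERN = re.compile(r"(<pre\b.*?</pre>)|(?<=>)\s+(?=<)",
                     re.DOTALL | re.IGNORECASE)

def unformat_keep_pre(html):
    """Drop whitespace between tags, but copy <pre> blocks through unchanged."""
    # group(1) is the <pre> block when the first alternative matched,
    # and None when only inter-tag whitespace matched.
    return PATTERN.sub(lambda m: m.group(1) or "", html)

html = ("<div>\n  <p>Hi</p>\n</div>\n"
        "<pre>\n<b>code</b>\n<i>code</i>\n</pre>\n<p>end</p>")

protected = unformat_keep_pre(html)
plain = re.sub(r">\s+<", "><", html)  # for comparison: no <pre> protection

print(protected)
print(plain)
```

The lookbehind and lookahead consume only the whitespace, so the `<` that opens a `<pre>` tag is still available for the first alternative to match the whole block.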