Regex to remove div tags but not their content
Let's say this is my HTML:
<ul>
<li><div style="width: 10em;">Hello</div><div class="ble"></div></li>
</ul>
I want to get this:
<ul>
<li>Hello</li>
</ul>
As you can see, all div opening and closing tags were removed but not their content!
This is what I have so far:
$patterns = array();
$patterns[0] = '/<div.*>/';
$patterns[1] = '/</div>/';
$replacements = array();
$replacements[0] = '';
$replacements[1] = '';
echo preg_replace($patterns, $replacements, $html);
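Replace '/<div.*>/' with '/<div.*?>/'.
The ? makes the * lazy instead of greedy, so the match ends at the first > encountered. With the greedy version, .* runs on to the last > on the line, so Hello and everything after it is removed as well.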
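A quick way to see the difference is to run both patterns over the list item from the question. This is a minimal PHP sketch; the variable names are only for illustration.

```php
<?php
// Sample line from the question.
$line = '<li><div style="width: 10em;">Hello</div><div class="ble"></div></li>';

// Greedy: .* backtracks only as far as the LAST '>' on the line,
// so the match swallows "Hello" and every tag after it.
$greedy = preg_replace('/<div.*>/', '', $line);

// Lazy: .*? stops at the FIRST '>' after "<div", so only each opening tag goes.
// "</div>" is not matched because '<' is followed by '/', not "div".
$lazy = preg_replace('/<div.*?>/', '', $line);

echo $greedy, "\n";
echo $lazy, "\n";
```

The lazy pattern leaves the closing </div> tags in place, which is why a second pattern for them is still needed.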
Also, you need to escape the forward slash in the pattern that matches the closing tag, because / is also the pattern delimiter. Use:
'/<\/div>/';
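Putting both fixes together, the code from the question might look like the sketch below. It fills the pattern array with a literal instead of indexed assignments and passes a single '' as the replacement, since preg_replace applies a string replacement to every pattern in the array.

```php
<?php
$html = <<<HTML
<ul>
<li><div style="width: 10em;">Hello</div><div class="ble"></div></li>
</ul>
HTML;

$patterns = array(
    '/<div.*?>/',  // opening tag, lazy so it stops at the first '>'
    '/<\/div>/',   // closing tag; '/' escaped because it is the delimiter
);

// One replacement string, used for every pattern.
$result = preg_replace($patterns, '', $html);
echo $result;
```

Choosing a different delimiter, for example '#</div>#', avoids escaping the slash altogether.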
I would start with replacing both <div[^>]*> and </div[^>]*> with nothing. Though I know little about the specific PHP regex engine, the following sed worked fine:
pax> cat qq.in
<ul>
<li><div style="width: 10em;">Hello</div><div class="ble"></div></li>
</ul>
pax> cat qq.in | sed -e 's/<div[^>]*>//g' -e 's/<\/div>//g'
<ul>
<li>Hello</li>
</ul>
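The same two substitutions translate directly to PHP. One practical difference between [^>]* and .*? is that a negated character class also matches newlines, so an opening tag split across lines is still removed, whereas . does not match a newline unless the s modifier is set. strip_divs_two_pass is a hypothetical helper name used only for this sketch.

```php
<?php
// Hypothetical helper: the two sed expressions, opening tags then closing tags.
function strip_divs_two_pass(string $html): string
{
    return preg_replace(array('/<div[^>]*>/', '/<\/div>/'), '', $html);
}

// [^>]* also matches "\n", so an opening tag spanning two lines is removed.
$split = "<li><div\n  class=\"ble\">Hello</div></li>";
echo strip_divs_two_pass($split), "\n";

// For comparison: '.' does not match "\n" without /s, so nothing is replaced here.
echo preg_replace('/<div.*?>/', '', $split), "\n";
```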
In fact, you should be able to combine that into one regex, </?div[^>]*>:
pax> cat qq.in | sed -r -e 's_</?div[^>]*>__g'
<ul>
<li>Hello</li>
</ul>
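In PHP the combined regex could be used as in the sketch below. strip_div_tags is a hypothetical name, and the \b and i additions are tweaks that are not in the sed command above: \b keeps tags such as <divider> from matching, and i also catches <DIV>. Using # as the delimiter means the / needs no escaping.

```php
<?php
// Hypothetical helper: one pattern for both tags. "/?" makes the slash optional,
// \b requires a word boundary right after "div", and the i modifier ignores case.
function strip_div_tags(string $html): string
{
    return preg_replace('#</?div\b[^>]*>#i', '', $html);
}

$html = "<ul>\n<li><div style=\"width: 10em;\">Hello</div><div class=\"ble\"></div></li>\n</ul>";
echo strip_div_tags($html), "\n";
```

Like the sed version, this is a plain text substitution, not an HTML parser: a > inside an attribute value, for example, would end the match early.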