Regex to remove div tags but not their content

Let's say this is my HTML:

<ul>
    <li><div style="width: 10em;">Hello</div><div class="ble"></div></li>
</ul>

I want to get this:

<ul>
    <li>Hello</li>
</ul>

As you can see, all div opening and closing tags were removed but not their content!

This is what I have so far:

$patterns = array();
$patterns[0] = '/<div.*>/';
$patterns[1] = '/</div>/';
$replacements = array();
$replacements[2] = '';
$replacements[1] = '';
echo preg_replace($patterns, $replacements, $html);


Replace '/<div.*>/' with '/<div.*?>/'. The ? makes the * non-greedy, so the match stops at the first > encountered instead of running on to the last > on the line.

Also, you need to escape the forward slash in your pattern for matching the closing tag, because / is also the pattern delimiter. Use:

'/<\/div>/';
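
Putting both fixes together, a minimal sketch might look like this. It assumes $html holds the markup from the question. The single '' replacement is a simplification of the original $replacements array, not something the question itself used.

```php
<?php
// Sample input copied from the question.
$html = '<ul>
    <li><div style="width: 10em;">Hello</div><div class="ble"></div></li>
</ul>';

$patterns = array(
    '/<div.*?>/',  // lazy quantifier: stop at the first ">" after "<div"
    '/<\/div>/',   // "/" is escaped because it is also the pattern delimiter
);

// When the replacement is a plain string, preg_replace applies it to every pattern.
$result = preg_replace($patterns, '', $html);
echo $result;
```

Passing one empty string avoids having to keep the keys of $patterns and $replacements in step.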


I would start by replacing both <div[^>]*> and </div[^>]*> with nothing. Though I know little about the specific PHP regex engine, the following sed command worked fine:

pax> cat qq.in
<ul>
    <li><div style="width: 10em;">Hello</div><div class="ble"></div></li>
</ul>

pax> cat qq.in | sed -e 's/<div[^>]*>//g' -e 's/<\/div>//g'
<ul>
    <li>Hello</li>
</ul>

In fact, you should be able to combine that into one regex, </?div[^>]*>:

pax> cat qq.in | sed -r -e 's_</?div[^>]*>__g'
<ul>
    <li>Hello</li>
</ul>
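
The same combined pattern should carry over to PHP's preg_replace. The sketch below is an assumption-laden translation rather than something from the answer: strip_div_tags is a hypothetical helper name, and # is chosen as the delimiter so the / in the pattern needs no escaping.

```php
<?php
// Hypothetical helper: removes opening and closing div tags, keeps their content.
function strip_div_tags($html)
{
    // "</?div" matches both "<div" and "</div"; "[^>]*" consumes any
    // attributes up to the closing ">". "#" is used as the delimiter.
    return preg_replace('#</?div[^>]*>#', '', $html);
}

// Sample input copied from the question.
$html = '<ul>
    <li><div style="width: 10em;">Hello</div><div class="ble"></div></li>
</ul>';

echo strip_div_tags($html);
```

This is only meant for simple markup like the sample. An attribute value containing a literal > would end the match early.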