Regex question: if I knew how to ask it properly, I'd probably know the answer as well

So basically my regex is not working as I expect & I don't know why.

I am working in a fairly regulated environment, so this should not be too much of a problem. All the HTML tags are generated by a script and follow this pattern: only li, p and h3 to h6 tags are present, all text is between tags, and there are no spaces between tags.

I 'need' to write something to surround the lis with ul tags. Here is what I've got:

preg_replace('#(<li>[^<p|<h]+</li>)(?!<li>)#', '<ul>$1</ul>', $html)

However, it only matches the last li pair in a set for some reason. Can anyone tell me why, please?


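[^<p|<h] doesn't do what you expect. It is a character class: it matches a single character that is none of the characters <, p, | or h. The (?!<li>) lookahead then rejects every li that is directly followed by another li, which is why only the last one in each set is wrapped. If your HTML really is as constrained as you say, and you cannot have an <li> nested inside another <li>, then the following should work: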
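To see both effects in isolation, here is a small sketch. The sample strings and variable names are made up for illustration and are not from the original post; the item text deliberately avoids the letters p and h so that the character class does not get in the way.

```php
<?php
// Illustrative input (an assumption, not the poster's real HTML).
$html = '<h3>List</h3><li>one</li><li>two</li><li>six</li><p>end</p>';

// [^<p|<h] is one character class: any single character except '<', 'p', '|' or 'h'.
$class = '#^[^<p|<h]+$#';
$plainOk   = preg_match($class, 'milk');   // 1: contains none of the excluded characters
$containsP = preg_match($class, 'apple');  // 0: 'p' is excluded by the class

// The original pattern: (?!<li>) rejects any item directly followed by another <li>,
// so only the last item of the run is wrapped.
$original = preg_replace('#(<li>[^<p|<h]+</li>)(?!<li>)#', '<ul>$1</ul>', $html);

echo $original, PHP_EOL;
// <h3>List</h3><li>one</li><li>two</li><ul><li>six</li></ul><p>end</p>
```

Note that an item such as <li>apple</li> would not match the original pattern at all, because its text contains a p.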

preg_replace('#(<li>.*?</li>)+#', '<ul>$0</ul>', $html)

The sequence .*? is just like .* except the trailing ? is the non-greedy modifier. By default .* is greedy: it will consume as many characters as it can, then backtrack if the rest of the pattern doesn't match. The non-greedy modifier inverts this. It consumes as few characters as it can and advances only if the rest of the pattern cannot match. As the rest of the pattern is simply </li>, this effectively captures all text up to, but not including, the first sequence </li>. This pattern is then nested inside a capture group which is repeated with +, meaning it will match one or more consecutive <li>...</li> elements. The replacement uses $0, the whole match, because after a repeated group $1 holds only the last repetition.
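A quick way to check the difference is to run the greedy and non-greedy versions side by side. This is only a sketch; the sample HTML and variable names are assumptions chosen for illustration.

```php
<?php
// Illustrative input: two runs of list items separated by a paragraph.
$html = '<h3>Fruit</h3><li>apple</li><li>pear</li><p>text</p><li>plum</li>';

// Non-greedy: each .*? stops at the first </li>, and + collects adjacent items.
$lazy = preg_replace('#(<li>.*?</li>)+#', '<ul>$0</ul>', $html);

// Greedy: .* runs to the last </li> in the string, swallowing the <p> in between.
$greedy = preg_replace('#(<li>.*</li>)+#', '<ul>$0</ul>', $html);

echo $lazy, PHP_EOL;
// <h3>Fruit</h3><ul><li>apple</li><li>pear</li></ul><p>text</p><ul><li>plum</li></ul>

echo $greedy, PHP_EOL;
// <h3>Fruit</h3><ul><li>apple</li><li>pear</li><p>text</p><li>plum</li></ul>
```

By default the dot does not match a newline in PCRE, so if the text inside an li can span several lines, the s modifier (as in '#(<li>.*?</li>)+#s') would be needed.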
