Regexp matching attributes for an HTML element

I'm working on a regular expression to extract the tag name and attributes from an HTML element, but I'm having trouble matching the attributes: only the last attribute is stored in the matches array.

Here is the code:

<?php
    $subject = '<font face="arial" size="1" color="red">hello world!</font>';
    $find = '/<(?P<tag>\w+)\s+((?P<attr>\w+)=(?P<value>[^\s"\'>]+|"[^"]*"|\'[^\']*\')\s*)*\/?>/si';

    preg_match_all( $find, $subject, $matches );
?>

Can someone help me out?

Many thanks


Some important points:

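  • You shouldn't use regex to parse HTML. PHP has several HTML parsers, including the built-in DOM extension (DOMDocument).
  • A capturing group that is repeated within a single match keeps only its last capture. In your pattern, the outer (...)* group runs once per attribute, and each pass overwrites what attr and value captured on the previous pass, which is why only color="red" ends up in the matches array.
    • One notable exception is .NET regex, which keeps every intermediate capture.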
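
To make both points concrete, here is a rough sketch of two ways around the problem. The function names parseTagWithRegex() and parseTagWithDom() are made up for illustration, and the regex version assumes well-formed name=value attributes like the ones in the question. The first function works in two passes: it grabs the whole attribute string as one capture, then runs preg_match_all() over that string so each attribute is its own match instead of a repeated capture. The second does the same job with DOMDocument and assumes the DOM extension is enabled.

```php
<?php
// Sketch only: both function names are hypothetical helpers, not library APIs.

// Two-pass regex approach: capture the attribute string once, then match
// each attribute separately so nothing gets overwritten.
function parseTagWithRegex(string $html): ?array
{
    // One attribute value: double-quoted, single-quoted, or bare.
    $value = '"[^"]*"|\'[^\']*\'|[^\s"\'>]+';

    // Pass 1: tag name plus the raw, unparsed attribute string.
    $tagPattern = '/<(?P<tag>\w+)(?P<attrs>(?:\s+\w+=(?:' . $value . '))*)\s*\/?>/s';
    if (!preg_match($tagPattern, $html, $m)) {
        return null;
    }

    // Pass 2: every attribute is a separate match, so all of them are kept.
    $attrPattern = '/(?P<attr>\w+)=(?P<value>' . $value . ')/';
    preg_match_all($attrPattern, $m['attrs'], $pairs, PREG_SET_ORDER);

    $attrs = [];
    foreach ($pairs as $pair) {
        $v = $pair['value'];
        // Strip the surrounding quotes, if any.
        if ($v[0] === '"' || $v[0] === "'") {
            $v = substr($v, 1, -1);
        }
        $attrs[$pair['attr']] = $v;
    }

    return ['tag' => $m['tag'], 'attrs' => $attrs];
}

// Parser approach: let DOMDocument do the work. Requires the DOM extension.
function parseTagWithDom(string $html, string $tagName): ?array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementsByTagName($tagName)->item(0);
    if ($node === null) {
        return null;
    }

    $attrs = [];
    foreach ($node->attributes as $attr) {
        $attrs[$attr->name] = $attr->value;
    }

    return ['tag' => $node->nodeName, 'attrs' => $attrs];
}

$subject = '<font face="arial" size="1" color="red">hello world!</font>';
print_r(parseTagWithRegex($subject));
```

For the $subject from the question, parseTagWithRegex() returns the tag font with all three attributes (face, size, color). Calling parseTagWithDom($subject, 'font') should give the same attributes and will cope with input the regex would choke on, such as attribute values containing entities or unusual spacing.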

References

  • regular-expressions.info/Brackets for Capturing

Related questions

  • Robust, Mature HTML Parser for PHP
  • Is there a regex flavor that allows me to count the number of repetitions matched by the * and + operators? (Yes! .NET keeps all intermediate captures!)