Regex for matching the attributes of an HTML element
I'm working on a regular expression to extract the tag name and attributes from an HTML element, but I'm having trouble matching the attributes: only the last attribute is stored in the matches array.
Here is the code:
<?php
$subject = '<font face="arial" size="1" color="red">hello world!</font>';
$find= '/<(?P<tag>\w+)\s+((?P<attr>\w+)=(?P<value>[^\s""\'>]+|"[^"]*"|\'[^\']*\')\s*)*\/?>/si';
preg_match_all( $find, $subject, $matches );
?>
Can someone help me out?
Many thanks
Some important points:
- You shouldn't use regex to parse HTML. PHP has many excellent HTML parsing libraries.
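- A capturing group that matches repeatedly within a single match keeps only its last capture. That is why only the last attribute shows up in $matches.
- One notable exception is .NET regex, which keeps all intermediate captures.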
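If you still want to stay with regex for a simple, well-formed tag like the one in the question, one way around the last-capture behaviour is to split the work into two passes: capture the whole attribute string once, then run preg_match_all over that string so each attribute becomes its own match. The following is only a sketch under that assumption; the function name extractTagAndAttributes is hypothetical, and the pattern is not a general HTML parser.

```php
<?php
// Hypothetical helper (not from the original question): two-pass extraction.
function extractTagAndAttributes(string $html): ?array
{
    // Pass 1: capture the tag name and the raw attribute string as one group.
    // The repeated part is non-capturing, so nothing is lost to "last capture wins".
    $tagPattern = '/<(?P<tag>\w+)(?P<attrs>(?:\s+\w+=(?:"[^"]*"|\'[^\']*\'|[^\s"\'>]+))*)\s*\/?>/s';
    if (!preg_match($tagPattern, $html, $m)) {
        return null; // no opening tag found
    }

    // Pass 2: one match per attribute. The branch-reset group (?|...) makes
    // every alternative store the unquoted value in group 2.
    $attrPattern = '/(\w+)=(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/';
    preg_match_all($attrPattern, $m['attrs'], $pairs, PREG_SET_ORDER);

    $attributes = [];
    foreach ($pairs as $pair) {
        $attributes[$pair[1]] = $pair[2];
    }

    return ['tag' => $m['tag'], 'attributes' => $attributes];
}

$subject = '<font face="arial" size="1" color="red">hello world!</font>';
print_r(extractTagAndAttributes($subject));
```

The first pattern only decides where the opening tag starts and ends; the second one does the per-attribute work, which is what preg_match_all is designed for.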
References
- regular-expressions.info/Brackets for Capturing
Related questions
- Robust, Mature HTML Parser for PHP
- Is there a regex flavor that allows me to count the number of repetitions matched by the * and + operators? (Yes! .NET keeps all intermediate captures!)
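For the parser route recommended above, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument). It assumes the extension is enabled, which it is in most default builds; the function name attributesViaDom is hypothetical and only illustrates the idea.

```php
<?php
// Hypothetical helper (not from the original answer): read the attributes of
// the first element with the given tag name using DOMDocument instead of regex.
function attributesViaDom(string $html, string $tagName): array
{
    // Collect libxml warnings internally instead of emitting PHP warnings
    // for fragments or non-standard markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $element = $doc->getElementsByTagName($tagName)->item(0);
    if ($element === null) {
        return []; // no such element in the input
    }

    $attributes = [];
    foreach ($element->attributes as $attribute) {
        $attributes[$attribute->name] = $attribute->value;
    }

    return $attributes;
}

$subject = '<font face="arial" size="1" color="red">hello world!</font>';
print_r(attributesViaDom($subject, 'font'));
```

Unlike the regex, this leaves quoting, whitespace, and entity decoding to the parser.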