How to strip tags in PHP using regex?
$string = 'text <span style="color:#f09;">text</span>
<span class="data" data-url="http://www.google.com">google.com</span>
text <span class="data" data-url="http://www.yahoo.com">yahoo.com</span> text.';
What I want to do is replace every span that has the class data with the value of its data-url attribute. So, it should output:
$string = 'text <span style="color:#f09;">text</span>
http://www.google.com text http://www.yahoo.com text.';
And then I want to remove all the remaining html tags.
$string = strip_tags($string);
Output:
$string = 'text text http://www.google.com text http://www.yahoo.com text.';
Can someone please tell me how this can be done?
If your string contains more than just the HTML snippet you show, you should use DOM with this XPath
//span/@data-url
Example:
$dom = new DOMDocument;
$dom->loadHTML($string);
$xp = new DOMXPath($dom);
foreach( $xp->query('//span/@data-url') as $node ) {
echo $node->nodeValue, PHP_EOL;
}
The above would output
http://www.google.com
http://www.yahoo.com
When you already have the HTML loaded, you can also do
echo $dom->documentElement->textContent;
which returns the same result as strip_tags($string) in this case:
text text
google.com
text yahoo.com text.
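The two steps above can be combined to get the exact string the question asks for. The following is only a sketch under a few assumptions: the class attribute is exactly "data" (no extra class names), the wrapping div is added purely for illustration, and whitespace is collapsed at the end so the result fits on one line.

```php
<?php
$string = 'text <span style="color:#f09;">text</span>
<span class="data" data-url="http://www.google.com">google.com</span>
text <span class="data" data-url="http://www.yahoo.com">yahoo.com</span> text.';

$dom = new DOMDocument;
// Wrap the fragment in a div (an assumption of this sketch) so it has one container.
$dom->loadHTML('<div>' . $string . '</div>');
$xp = new DOMXPath($dom);

// Select only spans whose class attribute is exactly "data".
// iterator_to_array() takes a snapshot because the tree is modified in the loop.
$spans = iterator_to_array($xp->query('//span[@class="data"]'));
foreach ($spans as $span) {
    // Swap the whole span for a text node holding its data-url value.
    $url = $span->getAttribute('data-url');
    $span->parentNode->replaceChild($dom->createTextNode($url), $span);
}

// textContent drops the remaining tags, much like strip_tags() would.
$result = $dom->documentElement->textContent;
// Collapse the line breaks of the original string into single spaces.
$result = trim(preg_replace('/\s+/', ' ', $result));

echo $result, PHP_EOL;
```

If a span can carry several class names, the XPath predicate would need to be loosened, for example with contains(), at the cost of also matching names such as "database".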
Try SimpleXML: loop over the elements with foreach, check that each element's class attribute is the one you want, and grab its data-url value.
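A rough sketch of that SimpleXML idea might look like the following. It assumes the snippet is well-formed XML (which happens to be true for the string in the question) and wraps it in a made-up root element, since SimpleXML needs a single root; the name root and the $urls array are illustrative only.

```php
<?php
$string = 'text <span style="color:#f09;">text</span>
<span class="data" data-url="http://www.google.com">google.com</span>
text <span class="data" data-url="http://www.yahoo.com">yahoo.com</span> text.';

// SimpleXML needs exactly one root element, so wrap the fragment (hypothetical <root>).
$xml = simplexml_load_string('<root>' . $string . '</root>');
if ($xml === false) {
    exit('The markup is not well-formed XML' . PHP_EOL);
}

$urls = [];
// $xml->span iterates over the span elements that are direct children of the root.
foreach ($xml->span as $span) {
    // Attributes are read with array syntax; cast to string to get the plain value.
    if ((string) $span['class'] === 'data') {
        $urls[] = (string) $span['data-url'];
    }
}

print_r($urls);
```

This only collects the URLs; it breaks as soon as the HTML is not valid XML, for instance with an unclosed br tag or a bare ampersand.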
preg_match_all('/data" data-url="([^"]*)/i', $string, $urls);
You can fetch all the URLs this way; the captured values end up in $urls[1].
You can also use SimpleXML, as hsz mentioned.
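For completeness, the regex route can also do the replacement the question describes, not just collect the URLs. This is a fragile sketch, not a general solution: the pattern assumes class="data" comes directly before data-url, that there are no other attributes, and that the span contains no nested tags. The variable names are illustrative.

```php
<?php
$string = 'text <span style="color:#f09;">text</span>
<span class="data" data-url="http://www.google.com">google.com</span>
text <span class="data" data-url="http://www.yahoo.com">yahoo.com</span> text.';

// Matches only spans written exactly as class="data" data-url="..." with plain text inside.
$pattern = '/<span class="data" data-url="([^"]*)">[^<]*<\/span>/i';

// Replace each matching span with the captured data-url value.
$replaced = preg_replace($pattern, '$1', $string);

// Remove the remaining tags, then collapse line breaks into single spaces.
$plain = preg_replace('/\s+/', ' ', strip_tags($replaced));

echo $plain, PHP_EOL;
```

Any change in attribute order or quoting makes the pattern miss the span silently, which is the usual argument for a DOM parser.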
The short answer is: don't. There's a lovely rant somewhere on SO explaining why parsing HTML with regexes is a bad idea. Essentially it boils down to 'HTML is not a regular language, so regular expressions are not adequate to parse it'. What you need is something DOM-aware.
As @hsz said, SimpleXML is a good option if you know that your HTML validates as XML. Better might be DOMDocument::loadHTML, which doesn't require well-formed HTML. Once your HTML is in a DOMDocument object, you can extract what you need very easily. Check out the docs here.