Regular Expression to get innertext of span tag
I would like to parse the following string to get the value "46.4400 INR":
<div id=currency_converter_result>1 USD = <span class=bld>46.4400 INR</span>
<input type=submit value="Convert">
</div>
What regular expression do I need to use for this?
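// Requires the PHP Simple HTML DOM Parser library (see the manual linked below)
require_once 'simple_html_dom.php';
// Create a DOM object from a URL
$html = file_get_html('http://www.example.com/');
// Print the inner text of the first <span class=bld>
echo $html->find('span.bld', 0)->innertext;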
http://simplehtmldom.sourceforge.net/manual.htm
I think people are going too far in this "can't use regex to parse html" holy war. There is a difference between parsing (X|HT)ML and parsing a simple string which happens to contain a few HTML tags.
According to the specifications in the question this should do:
preg_match('#<span class=bld>(.*?)</span>#', $string, $match);
$value = $match[1];
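The one-liner above assumes the match always succeeds. As a minimal sketch (the helper name `extractSpanText` is hypothetical and not part of the original answer), the same idea can check the return value of `preg_match` and tolerate optional quotes around the class value:

```php
<?php
// Hypothetical helper for illustration; not from the original answer.
function extractSpanText(string $html): ?string
{
    // (["']?) captures an optional quote and \1 requires the same one after "bld".
    // The "s" flag lets "." match newlines; "i" makes the tag name case-insensitive.
    $pattern = '#<span\s+class=(["\']?)bld\1\s*>(.*?)</span>#is';
    if (preg_match($pattern, $html, $match) === 1) {
        return trim($match[2]);
    }
    return null; // no match: the caller decides how to handle it
}

$string = '<div id=currency_converter_result>1 USD = <span class=bld>46.4400 INR</span>';
$value = extractSpanText($string);
echo $value ?? 'not found', "\n";
```

This is still a string match, not a parser: it returns null as soon as the span carries extra attributes or the markup changes shape.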
Why would you use regular expressions? I think you should read your X/HTML document into SimpleXML and use XPath to retrieve the desired value. Of course you can use regular expressions, but an XPath solution would be nicer, IMO.
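// Note: simplexml_load_file() expects well-formed XML
$xml = simplexml_load_file("/path/to/document.html");
// Attributes are addressed with @ and the value must be quoted
$node = $xml->xpath('/path/in/doc/to/span[@class="bld"]');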
...
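One caveat with SimpleXML: `simplexml_load_file()` expects well-formed XML, and the snippet in the question (unquoted attributes, an unclosed `<input>`) is not. A minimal sketch of one workaround, assuming the markup is available as a string, is to let `DOMDocument::loadHTML()` do the lenient parsing and then hand the result to SimpleXML with `simplexml_import_dom()`:

```php
<?php
$html = '<div id=currency_converter_result>1 USD = <span class=bld>46.4400 INR</span>
<input type=submit value="Convert">
</div>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Convert the repaired DOM tree into a SimpleXMLElement and query it with XPath.
$xml = simplexml_import_dom($dom);
$nodes = $xml->xpath('//span[@class="bld"]');

// xpath() returns an array of elements; take the first one if there is any.
$rate = $nodes ? (string) $nodes[0] : null;
echo $rate, "\n";
```

The `//span[...]` query is used here instead of an absolute path because `loadHTML()` wraps the fragment in `html` and `body` elements.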
$subject = "<div id=currency_converter_result>1 USD = <span class=bld>46.4400 INR</span>";
$pattern = '/<div id=currency_converter_result>.*?<span.*?>(.*?)<\/span>/';
preg_match($pattern, $subject, $matches);
print_r($matches);
DOM+Xpath > Regex:
<?php
$str = '
<div id=currency_converter_result>1 USD = <span class=bld>46.4400 INR</span>
<input type=submit value="Convert">
</div>';
$d = new DOMDocument();
$d->loadHTML( $str );
$x = new DOMXpath($d);
$xpr = $x->evaluate('//span[contains(@class, "bld")]');
if ( $xpr->length ) { // DOMNodeList::length works on all PHP versions; count() needs PHP 7.2+
foreach ( $xpr as $el ) {
echo $el->nodeValue;
}
}
?>
Of course, feel free to use SimpleXML or other similar libraries that involve less code.
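A note on the `contains(@class, "bld")` test used above: it is a substring match, so it would also select a span whose class is, say, `bldx`. A common XPath 1.0 idiom for matching a whole class token is sketched below; the sample markup is made up for illustration.

```php
<?php
// Made-up markup: one span with a look-alike class, one with "bld" as a real token.
$html = '<p><span class="bldx">wrong</span> <span class="one bld">46.4400 INR</span></p>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Pad the normalized class list with spaces, then look for " bld " as a whole token.
$query = '//span[contains(concat(" ", normalize-space(@class), " "), " bld ")]';

$texts = [];
foreach ($xpath->query($query) as $span) {
    $texts[] = $span->nodeValue;
}
print_r($texts); // only the second span is selected
```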
Example of the chosen answer breaking, if the HTML was altered as Milan suggested:
<?php
$subject = '
<div>
<div id=currency_converter_result/><b/>1 USD = <span class=bld one>46.4400 INR</span>
<input type=submit value="Convert">
</div></div><span/>';
$pattern = '/<div id=currency_converter_result>.*?<span.*?>(.*?)<\/span>/';
preg_match($pattern, $subject, $matches);
print_r($matches); // output is Array ( )
Other regex answer breaking:
<?php
$subject = '
<div>
<div id=currency_converter_result/><b/>1 USD = <span class=bld one>46.4400 INR</span>
<input type=submit value="Convert">
</div></div><span/>';
preg_match('#<span class=bld>(.*?)</span>#', $subject, $match);
$value = $match[1];
var_dump($value); // outputs NULL
My DOM/Xpath solution works perfectly with the altered markup:
<?php
$subject = '
<div>
<div id=currency_converter_result/><b/>1 USD = <span class=bld one>46.4400 INR</span>
<input type=submit value="Convert">
</div></div><span/>';
$d = new DOMDocument();
$d->loadHTML( $subject );
$x = new DOMXpath($d);
$xpr = $x->evaluate('//span[contains(@class, "bld")]');
if ( $xpr->length ) { // DOMNodeList::length works on all PHP versions; count() needs PHP 7.2+
foreach ( $xpr as $el ) {
echo $el->nodeValue; // output 46.4400 INR
}
}
?>