Scrape a number from separate spans
I need to scrape the number 622104 from this HTML.
How can I get the number?
<div class="numbersBackground">
<div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl00_numberPanel" class="number">
<div class="numberWrapper"><span>6</span></div>
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl01_numberPanel" class="number">
<div class="numberWrapper"><span>2</span></div>
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl02_numberPanel" class="number">
<div class="numberWrapper"><span>2</span></div>
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl03_commaPanel" class="comma">
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl04_numberPanel" class="number">
<div class="numberWrapper"><span>1</span></div>
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl05_numberPanel" class="number">
<div class="numberWrapper"><span>0</span></div>
</div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl06_numberPanel" class="number">
<div class="numberWrapper"><span>4</span></div>
</div>
</div>
Use the DOMDocument class to parse the HTML string with its loadHTML method, then run an XPath query (using the DOMXPath class) to find all <div> tags with a class="numberWrapper" attribute.
Then, iterate over those, concatenating their content to a variable -- which, at the end of the loop, will contain your number.
For example, you could use code like this:
$str = <<<HTML
... HERE YOUR HTML ...
HTML;
$number = '';
$dom = new DOMDocument();
if ($dom->loadHTML($str)) {
    $xpath = new DOMXPath($dom);
    // Select every <div class="numberWrapper"> in the document
    $results = $xpath->query('//div[@class="numberWrapper"]');
    foreach ($results as $div) {
        // Append the text content of each match, in document order
        $number .= $div->nodeValue;
    }
}
var_dump($number);
As output, you'd get:
string '622104' (length=6)
You could also use the following XPath query to make sure you're only working with the <span> tags:
$results = $xpath->query('//div[@class="numberWrapper"]/span');
Here, since the <div>s contain only the <span>, the result is the same, but it could differ in other situations.
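To illustrate when the two queries diverge, here is a small sketch. The markup is hypothetical (the <em> label is not in the question's HTML), and joinNodeValues() is a helper name made up for this example:

```php
<?php
// Hypothetical markup: each wrapper holds a label next to the digit.
$html = '<div class="numberWrapper"><em>digit:</em><span>6</span></div>'
      . '<div class="numberWrapper"><em>digit:</em><span>2</span></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Illustrative helper: concatenate the text content of every matched node.
function joinNodeValues(DOMNodeList $nodes): string
{
    $out = '';
    foreach ($nodes as $node) {
        $out .= $node->nodeValue;
    }
    return $out;
}

// The <div> query picks up the label text as well as the digit.
$fromDivs = joinNodeValues($xpath->query('//div[@class="numberWrapper"]'));
// The <span> query only picks up the digits.
$fromSpans = joinNodeValues($xpath->query('//div[@class="numberWrapper"]/span'));

var_dump($fromDivs);  // string "digit:6digit:2"
var_dump($fromSpans); // string "62"
```

With markup like this, only the /span variant yields a clean number.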
Of course (just to make sure it's said): regular expressions are not the right way to extract information from an HTML string.
Edit after the comment:
If there are other <div>s you don't want to take into account, you'll need a different XPath query, one that matches only what you want to extract.
For example, maybe something like this would do the trick:
$results = $xpath->query('//div[@class="numbersBackground"]//div[@class="numberWrapper"]/span');
Of course, it's up to you to find out exactly what matches the structure of your HTML document.
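As a rough sketch of why scoping the query matters, assume a page where a stray numberWrapper sits outside the numbersBackground block. The "other" container and the digit 9 are invented for this example:

```php
<?php
// Hypothetical page: a second "numberWrapper" sits outside the counter block.
$html = <<<HTML
<div class="other"><div class="numberWrapper"><span>9</span></div></div>
<div class="numbersBackground">
  <div class="number"><div class="numberWrapper"><span>6</span></div></div>
  <div class="number"><div class="numberWrapper"><span>2</span></div></div>
  <div class="comma"></div>
  <div class="number"><div class="numberWrapper"><span>2</span></div></div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Unscoped: matches every numberWrapper in the document.
$unscoped = '';
foreach ($xpath->query('//div[@class="numberWrapper"]/span') as $span) {
    $unscoped .= $span->nodeValue;
}

// Scoped: only numberWrapper elements below the numbersBackground container.
$scoped = '';
$query = '//div[@class="numbersBackground"]//div[@class="numberWrapper"]/span';
foreach ($xpath->query($query) as $span) {
    $scoped .= $span->nodeValue;
}

var_dump($unscoped); // string "9622"
var_dump($scoped);   // string "622"
```

Note that @class="..." compares the whole attribute value, so an element with several classes would not match these queries as written.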
If you want to download the HTML, you have two options:
- If allow_url_fopen is enabled on your server, you can use DOMDocument::loadHTMLFile(), passing it the URL as a parameter.
- Otherwise, you'll have to download the HTML content yourself, for instance with curl.
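The two download options could be combined along these lines. This is only a sketch: loadRemoteDocument() is a made-up helper name, the curl branch assumes the curl extension is installed, and error handling is kept minimal:

```php
<?php
/**
 * Illustrative helper (not a built-in): load a page into a DOMDocument,
 * returning null when the page cannot be fetched or parsed.
 */
function loadRemoteDocument(string $url): ?DOMDocument
{
    $dom = new DOMDocument();

    if (ini_get('allow_url_fopen')) {
        // loadHTMLFile() accepts a URL when allow_url_fopen is enabled.
        return $dom->loadHTMLFile($url) ? $dom : null;
    }

    // Otherwise fetch the markup with the curl extension first.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    $html = curl_exec($ch);

    if ($html === false || $html === '') {
        return null; // download failed or the response was empty
    }

    return $dom->loadHTML($html) ? $dom : null;
}

// Usage (the URL is a placeholder):
// $dom = loadRemoteDocument('http://example.com/');
```

The returned document can then be passed to new DOMXPath($dom) exactly as in the code above.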
As a side note, if you get warnings because your HTML is not valid, take a look at the libxml_use_internal_errors() function ;-)
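A minimal sketch of how libxml_use_internal_errors() might be used here. The <foo> tag is an invented example of markup the parser may complain about; whether a given libxml version actually reports an error for it is not guaranteed:

```php
<?php
// Deliberately odd markup: <foo> is not a known HTML tag.
$html = '<div class="numberWrapper"><foo><span>6</span></foo></div>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$errors = libxml_get_errors();          // array of LibXMLError objects (possibly empty)
libxml_clear_errors();                  // empty the internal error buffer
libxml_use_internal_errors($previous);  // restore the previous setting

// Inspect the collected messages if you care about them.
foreach ($errors as $error) {
    echo trim($error->message), PHP_EOL;
}

// The document is still usable for XPath queries.
$xpath = new DOMXPath($dom);
$number = '';
foreach ($xpath->query('//div[@class="numberWrapper"]//span') as $span) {
    $number .= $span->nodeValue;
}
var_dump($number);
```

Restoring the previous setting afterwards keeps the change from leaking into other code that relies on the default warning behaviour.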