Scrape a number from separate spans

I need to scrape the number 622104 from this HTML.

How can I get the number?

<div class="numbersBackground">
        <div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl00_numberPanel" class="number">
        <div class="numberWrapper"><span>6</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl01_numberPanel" class="number">
        <div class="numberWrapper"><span>2</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl02_numberPanel" class="number">
        <div class="numberWrapper"><span>2</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl03_commaPanel" class="comma">

    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl04_numberPanel" class="number">
        <div class="numberWrapper"><span>1</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl05_numberPanel" class="number">
        <div class="numberWrapper"><span>0</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl06_numberPanel" class="number">
        <div class="numberWrapper"><span>4</span></div>
    </div>
</div>


After parsing the HTML string with the DOMDocument class and its loadHTML method, you could use an XPath query (via the DOMXPath class) to find all <div> tags with a class="numberWrapper" attribute.

Then, iterate over those, concatenating their content to a variable -- which, at the end of the loop, will contain your number.


For example, you could use code like this:

$str = <<<HTML
... HERE YOUR HTML ...
HTML;

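$number = '';

$dom = new DOMDocument();
if ($dom->loadHTML($str)) {
    $xpath = new DOMXPath($dom);
    // Select every <div class="numberWrapper">, in document order
    $results = $xpath->query('//div[@class="numberWrapper"]');
    foreach ($results as $div) {
        // nodeValue is the text content of the node: one digit per <div>
        $number .= $div->nodeValue;
    }
}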

var_dump($number);

As output, you'd get:

string '622104' (length=6)


You could also use the following XPath query to make sure you're only working with the <span> tags:

$results = $xpath->query('//div[@class="numberWrapper"]/span');

Here, since each <div> contains only a <span>, the result will be the same, but it could differ in other situations (for instance, if a <div> also held other text).


Of course (just to make sure it's said): regular expressions are not the right way to extract information from an HTML string.



Edit after the comment:

If there are other <div>s you don't want to take into account, you'll have to find another XPath query -- that matches what you want to extract.

For example, maybe something like this would do the trick:

$results = $xpath->query('//div[@class="numbersBackground"]//div[@class="numberWrapper"]/span');

Of course, it's up to you to find out exactly what matches the structure of your HTML document.
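
To illustrate the scoped query, here is a minimal sketch. The markup is a simplified copy of the HTML from the question (ids dropped), and the extra <div class="other"> block at the top is a made-up assumption, added only to show that a numberWrapper outside the numbersBackground container is ignored.

```php
<?php
// Simplified markup from the question, plus one hypothetical stray
// numberWrapper <div> placed outside the numbersBackground container.
$str = <<<HTML
<div class="other"><div class="numberWrapper"><span>9</span></div></div>
<div class="numbersBackground">
    <div class="number"><div class="numberWrapper"><span>6</span></div></div>
    <div class="number"><div class="numberWrapper"><span>2</span></div></div>
    <div class="number"><div class="numberWrapper"><span>2</span></div></div>
    <div class="comma"></div>
    <div class="number"><div class="numberWrapper"><span>1</span></div></div>
    <div class="number"><div class="numberWrapper"><span>0</span></div></div>
    <div class="number"><div class="numberWrapper"><span>4</span></div></div>
</div>
HTML;

$number = '';

$dom = new DOMDocument();
if ($dom->loadHTML($str)) {
    $xpath = new DOMXPath($dom);
    // Only <span>s inside a numberWrapper that is itself inside numbersBackground
    $results = $xpath->query('//div[@class="numbersBackground"]//div[@class="numberWrapper"]/span');
    foreach ($results as $span) {
        $number .= $span->nodeValue;
    }
}

echo $number, "\n"; // prints 622104 (the stray 9 is skipped)
```

Note that @class="numberWrapper" is an exact string comparison, so it only matches elements whose class attribute is exactly that value.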


If you need to download the HTML first, you have two options:

  • If allow_url_fopen is enabled on your server, you can use DOMDocument::loadHTMLFile(), passing it the URL as a parameter.
  • Otherwise, you'll have to download the HTML content yourself, using, for instance, curl, and then pass it to loadHTML().
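
The two options could be combined along these lines. This is only a sketch: loadRemoteHtml is a hypothetical helper name, not part of PHP or of the original answer, and the curl options shown are one reasonable choice rather than a requirement.

```php
<?php
// Hypothetical helper: returns a DOMDocument for the given URL, or null on failure.
function loadRemoteHtml(string $url): ?DOMDocument
{
    $dom = new DOMDocument();

    // Option 1: allow_url_fopen is on, so loadHTMLFile() can read the URL directly.
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        // @ silences parser warnings about invalid markup
        return @$dom->loadHTMLFile($url) ? $dom : null;
    }

    // Option 2: fetch the body with the curl extension, then parse the string.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    $html = curl_exec($ch);

    if ($html === false || $html === '') {
        return null;
    }

    return @$dom->loadHTML($html) ? $dom : null;
}
```

The returned document can then be handed to DOMXPath exactly as in the examples above.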


As a sidenote, if you get warnings because your HTML is not valid, you'll want to take a look at the libxml_use_internal_errors() function ;-)
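
A minimal sketch of how that could look. The sloppy markup below is made up for illustration, and which errors libxml actually reports for it (if any) depends on the libxml version.

```php
<?php
// Deliberately sloppy markup (illustrative assumption): the <span> is never closed.
$str = '<div class="numberWrapper"><span>6</div>';

// Collect parse errors internally instead of emitting PHP warnings;
// the call returns the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($str);

// Each entry is a LibXMLError object with message, line, level, etc.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}

libxml_clear_errors();                 // empty the internal error buffer
libxml_use_internal_errors($previous); // restore the previous behaviour

// The document is still usable despite the invalid markup.
$digit = $dom->getElementsByTagName('span')->item(0)->nodeValue;
```

Collecting the messages rather than discarding them lets you log them if the scrape ever starts returning unexpected results.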
