Scrape a number from separate spans

2023-02-17 19:02

I need to scrape the number 622104 from the HTML below.

How can I get the number?

<div class="numbersBackground">
        <div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl00_numberPanel" class="number">
        <div class="numberWrapper"><span>6</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl01_numberPanel" class="number">
        <div class="numberWrapper"><span>2</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl02_numberPanel" class="number">
        <div class="numberWrapper"><span>2</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl03_commaPanel" class="comma">

    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl04_numberPanel" class="number">
        <div class="numberWrapper"><span>1</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl05_numberPanel" class="number">
        <div class="numberWrapper"><span>0</span></div>
    </div><div id="ctl00_mainContent_playersOnlineNumberRepeater_ctl06_numberPanel" class="number">
        <div class="numberWrapper"><span>4</span></div>
    </div>
</div>

You can parse the HTML string with the DOMDocument class and its loadHTML method, then use an XPath query (via the DOMXPath class) to find all <div> tags with a class="numberWrapper" attribute.

Then iterate over those, concatenating their content into a variable; at the end of the loop, that variable will contain your number.

For example, you could use code like this:

$str = <<<HTML
... HERE YOUR HTML ...
HTML;

$number = '';

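$dom = new DOMDocument();
if ($dom->loadHTML($str)) {
    $xpath = new DOMXPath($dom);
    // Select every <div class="numberWrapper"> in the document
    $results = $xpath->query('//div[@class="numberWrapper"]');
    foreach ($results as $div) {
        // Append the digit held by this wrapper
        $number .= $div->nodeValue;
    }
}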

var_dump($number);

As output, you'd get:

string '622104' (length=6)

You could also use the following XPath query to make sure you're only working with the <span> tags:

$results = $xpath->query('//div[@class="numberWrapper"]/span');

Here, as the <div>s only contain the <span>, the result will be the same, but it might differ in other situations.
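
To illustrate when the two queries diverge, here is a minimal sketch using a hypothetical variation of the markup in which one wrapper also holds a label next to its <span>. The "Players: " text and the joinText() helper are assumptions made up for this example; they are not part of the original page.

```php
<?php
// Hypothetical markup: the first wrapper holds extra text next to the <span>.
$html = '<div class="numberWrapper">Players: <span>6</span></div>'
      . '<div class="numberWrapper"><span>2</span></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Illustrative helper: concatenate the text content of every node
// matched by an XPath query.
function joinText(DOMXPath $xpath, string $query): string
{
    $text = '';
    foreach ($xpath->query($query) as $node) {
        $text .= $node->nodeValue;
    }
    return $text;
}

$fromDivs  = joinText($xpath, '//div[@class="numberWrapper"]');
$fromSpans = joinText($xpath, '//div[@class="numberWrapper"]/span');

var_dump($fromDivs);   // also contains the "Players: " label
var_dump($fromSpans);  // digits only
```

Querying the <span> elements directly keeps stray text in the wrapper from leaking into the number.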

Of course (just to make sure it's said): regular expressions are not the right way to extract information from an HTML string.

Edit after the comment:

If there are other <div>s you don't want to take into account, you'll have to find another XPath query that matches only what you want to extract.

For example, maybe something like this would do the trick:

$results = $xpath->query('//div[@class="numbersBackground"]//div[@class="numberWrapper"]/span');

Of course, it's up to you to find out exactly what matches the structure of your HTML document.
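
As a rough sketch of how scoping the query helps, the snippet below runs it against a hypothetical page that has a second counter outside the numbersBackground block. The "otherCounter" class and its digit are invented for the example.

```php
<?php
// Hypothetical page: a second counter sits outside the "numbersBackground" block.
$html = '<div class="otherCounter">'
      . '<div class="numberWrapper"><span>9</span></div>'
      . '</div>'
      . '<div class="numbersBackground">'
      . '<div class="number"><div class="numberWrapper"><span>6</span></div></div>'
      . '<div class="number"><div class="numberWrapper"><span>2</span></div></div>'
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Only wrappers that are descendants of the numbersBackground <div> are matched.
$query = '//div[@class="numbersBackground"]//div[@class="numberWrapper"]/span';

$number = '';
foreach ($xpath->query($query) as $span) {
    $number .= $span->nodeValue;
}

var_dump($number); // the 9 from the other counter is not included
```

Note that @class="..." is an exact string comparison, so this form only matches elements whose class attribute is exactly that value.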

If you want to download the HTML, you have two options:

If allow_url_fopen is enabled on your server, you can use DOMDocument::loadHTMLFile(), passing it the URL as a parameter.
Otherwise, you'll have to download the HTML content yourself using, for instance, curl, and then pass it to loadHTML().
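
The two options could be sketched roughly as follows. The function names loadDomFromLocation() and loadDomWithCurl() and the example.com URL are placeholders chosen for illustration, and the curl variant assumes the curl extension is installed.

```php
<?php
// Option 1: let DOMDocument fetch and parse the document itself.
// Works for local paths; remote URLs require allow_url_fopen to be enabled.
function loadDomFromLocation(string $location): ?DOMDocument
{
    $dom = new DOMDocument();
    return $dom->loadHTMLFile($location) ? $dom : null;
}

// Option 2: download the page with the curl extension, then parse the string.
function loadDomWithCurl(string $url): ?DOMDocument
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $html = curl_exec($ch);

    if ($html === false || $html === '') {
        return null; // download failed or returned nothing
    }

    $dom = new DOMDocument();
    return $dom->loadHTML($html) ? $dom : null;
}

// Usage with a placeholder URL:
// $dom = ini_get('allow_url_fopen')
//     ? loadDomFromLocation('http://www.example.com/players')
//     : loadDomWithCurl('http://www.example.com/players');
```

Either way you end up with a DOMDocument that can be handed to DOMXPath exactly as in the earlier code.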

As a side note, if you get warnings because your HTML is not valid, you'll want to take a look at the libxml_use_internal_errors() function ;-)
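
A minimal sketch of that approach, assuming a deliberately sloppy input string made up for the example: parse errors are collected internally instead of being raised as PHP warnings, and you can inspect or discard them afterwards. Which errors libxml reports for a given input depends on the libxml2 version, so the list may even be empty.

```php
<?php
// Deliberately sloppy markup (invented for this example):
// a stray closing tag and an unknown element.
$html = '<div class="numberWrapper"><span>6</span></div></div><bogus>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$errors = libxml_get_errors(); // array of LibXMLError objects, possibly empty
libxml_clear_errors();         // empty the internal error buffer
libxml_use_internal_errors($previous); // restore the previous setting

// Optional: show what the parser complained about.
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

// The document is still usable despite the invalid markup.
$xpath = new DOMXPath($dom);
$number = $xpath->query('//div[@class="numberWrapper"]/span')->item(0)->nodeValue;
var_dump($number);
```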
