Trying to scrape the entire content of a div

2023-01-16 09:56 问答作者：

I have this project i'm working on and id like to add a really small list of nearby places using facebooks places in an iframe featured from touch.facebook.com I can easily just use touch.facebook.com/#/places_friends.php but then that loads the headers the and the other navigation bars for like messges, events ect bars and i just want the content.

I'm pretty sure from looking at the touch.facebook.com/#/places_friends.php source, all i need to load is the div "content" Anyway, i'm extremely new to php and im pretty sure what i think i'm trying to do is called web scraping.

For the sake of figuring things out on stackoverflow and not needing to worry about authentication or anything yet i want to load the login page to see if i can at least get the scraper to work. Once I have a working scraping code i'm pretty sure i can handle the rest. It has load everything inside the div. I've seen this done before so i know it is possible. and it will look exactly like what you see when you try to login at touch.facebook.com but without the blue facebook logo up top and thats what im trying to accomplish right here.

So here's the login page, im trying to load the div which contains the text boxes to login the actual login button. If it's done correctly we should just see those with no blur Facebook header bar above it.

I've tried

<?php
$page = file_get_contents('http://touch.facebook.com/login.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach($divs as $div) {
      if ($div->getAttribute('id') === 'login_form') {
         echo $div->nodeValue;
    }
}
?>

all that does is load a blank page.

I'开发者_高级运维ve also tried using http://simplehtmldom.sourceforge.net/

and i modified the example basic selector to

<?php
include('../simple_html_dom.php');

$html = file_get_html('http://touch.facebook.com/login.php');

foreach($html->find('div#login_form') as $e)
    echo $e->nodeValue;

?>

I've also tried

<?php
$stream = "http://touch.facebook.com/login.php";
$cnt = simplexml_load_file($stream);

$result = $cnt->xpath("/html/body/div[@id=login_form]");

for($i = 0; $i < $i < count($result); $i++){
    echo $result[$i];
}
?>

that did not work either

$stream = "http://touch.facebook.com";
$cnt = simplexml_load_file($stream);

$result = $nct->xpath("/html/body/div[@id=content]");

for ($i = 0; $i < count($result); $i++){
    echo $result[$i];
}

there was a syntax error in this line i removed it now just copy and paste and run this code

Im assuming that you can't use the facebook API, if you can, then I strongly suggest you use it, because you will save yourself from the whole scraping deal.

To scrape text the best tech is using xpath, if the html returned by touch.facebook.com is xhtml transitional, which it sould, the you should use xpath, a sample should look like this:

$stream = "http://touch.facebook.com";
$cnt = simplexml_load_file($stream);

$result = $nct->xpath("/html/body/div[@id=content]");

for ($i = 0; $i < $i < count($result); $i++){
    echo $result[$i];
}

You need to learn about your comparison operators

=== is for comparing strictly, you should be using ==

if ($div->getAttribute('id') == 'login_form')
{

}

Scraping isn't always the best idea for capturing data else where. I would suggest using Facebook's API to retrieve the values your needing. Scraping will break any time Facebook decides to change their markup.

http://developers.facebook.com/docs/api

http://github.com/facebook/php-sdk/

继续阅读：php scrape web-scraping

Trying to scrape the entire content of a div

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？