How do I fetch another website's info from a URL, like Digg's submit button?

I'm creating a website using the CakePHP framework, and I am a newbie to PHP and web programming. I want to do something similar to Digg's submit button, where you type a URL and it fetches an image, a title and sometimes a short description of the article on that webpage. I'm assuming this would be done using PHP, but I'm open to any method.


You'll need to do a few simple things:

  1. You'll need to use PHP's cURL functions to get the source of the webpage. The php.net site provides a great example of this.

  2. From that source, you'll need to find the title of the page, and any images. The easiest way would probably be through a simple regular expression.

Here's a simple script example which does both:

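<?php
// Fetch the page source with cURL.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://stackoverflow.com/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
$output = curl_exec($ch);
curl_close($ch);

if ($output === false) {
    exit("Could not fetch the page\n");
}

// Grab the contents of <title>. Non-greedy, and the "s" flag lets it span lines.
// Note: no "&" before $titles; call-time pass-by-reference is an error in modern PHP.
$titles = array();
preg_match_all("/<title[^>]*>(.*?)<\/title>/is", $output, $titles, PREG_PATTERN_ORDER);

// Grab the src attribute of every <img>, wherever it appears inside the tag.
$images = array();
preg_match_all("/<img\b[^>]*?\bsrc\s*=\s*['\"]([^'\"]*)['\"]/i", $output, $images, PREG_PATTERN_ORDER);

$page_title = isset($titles[1][0]) ? trim($titles[1][0]) : "";
$images_found = $images[1];

echo "Page title was: {$page_title}\n";
foreach ($images_found as $image_src) {
    echo "Image: {$image_src}\n";
}
?>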

The regular expressions I included are imperfect, and won't catch all titles or all images in every case, but they're both good starts.
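
If the regular expressions turn out to be too fragile, one alternative is to let PHP's built-in DOM extension parse the markup. This is only a rough sketch of that different technique, not the script above: extract_title_and_images() is a made-up helper name, the sample HTML is invented for illustration, and it assumes the dom extension is enabled (it usually is).

```php
<?php
// Sketch: extract the title and image URLs with DOMDocument instead of regex.
// extract_title_and_images() is a hypothetical helper, not part of PHP or CakePHP.
function extract_title_and_images(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // First <title> element, if there is one.
    $title = "";
    $titleNodes = $doc->getElementsByTagName("title");
    if ($titleNodes->length > 0) {
        $title = trim($titleNodes->item(0)->textContent);
    }

    // Every <img> that has a non-empty src attribute, in document order.
    $images = array();
    foreach ($doc->getElementsByTagName("img") as $img) {
        $src = $img->getAttribute("src");
        if ($src !== "") {
            $images[] = $src;
        }
    }

    return array("title" => $title, "images" => $images);
}

// Invented sample markup; in practice $sample would be the cURL output.
$sample = '<html><head><title> Example page </title></head>'
        . '<body><img alt="logo" src="/logo.png"><img src="photo.jpg" width="300"></body></html>';
$result = extract_title_and_images($sample);

echo "Page title was: {$result['title']}\n";
foreach ($result['images'] as $src) {
    echo "Image: {$src}\n";
}
```

Note that the src values come back exactly as written in the page, so relative paths such as /logo.png still need to be resolved against the page URL before you can display them.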

You'll also need to pick which image you want to use from the $images_found array. You can do this randomly, or take the largest image on the page, or the first one you find, etc.
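
As a rough illustration of the "largest image" option, here is a minimal sketch. It assumes you have already worked out each image's dimensions somehow (for example from the width/height attributes, or by calling getimagesize() on each URL, which costs one extra request per image). pick_largest_image() and the candidate array layout are made up for this example.

```php
<?php
// Sketch: choose the image with the largest pixel area.
// pick_largest_image() is a hypothetical helper; each candidate is assumed to look like
// array("src" => string, "width" => int, "height" => int).
function pick_largest_image(array $candidates): ?string
{
    $best = null;
    $best_area = 0;
    foreach ($candidates as $candidate) {
        $area = $candidate["width"] * $candidate["height"];
        if ($area > $best_area) {
            $best_area = $area;
            $best = $candidate["src"];
        }
    }
    return $best; // null when there are no usable candidates
}

// Invented sample data for illustration.
$candidates = array(
    array("src" => "icon.png",   "width" => 16,  "height" => 16),
    array("src" => "banner.jpg", "width" => 600, "height" => 200),
    array("src" => "thumb.jpg",  "width" => 100, "height" => 100),
);
$chosen = pick_largest_image($candidates);
echo "Chosen image: {$chosen}\n";
```

You would probably also want to skip very small images (tracking pixels, icons) before comparing, but that threshold is a judgment call.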


You grab the source of the page in question (with the cURL library, or with file_get_contents() if the allow_url_fopen setting is enabled) and parse it for those details.

Title can be the title element.

Description can be the meta description.

Image can be the largest image (a lot of different ways to look for it).

You can also look for Open Graph protocol tags, which use the property attribute:

<meta property="og:site_name" content="Stack Overflow" />
<meta property="og:url" content="http://www.stackoverflow.com/" />
<meta property="og:title" content="Hello" />
<meta property="og:image" content="http://www.gravatar.com/avatar/5a9f58455ea36c880bc46820255fb084?s=32&d=identicon&r=PG" />
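
To give an idea of how the meta description and the og: tags could be read in PHP, here is a minimal sketch using the built-in DOM extension. extract_meta_tags() is a made-up helper name and the sample markup is invented. The sketch checks the property attribute first and falls back to name, since pages are not consistent about which one they use.

```php
<?php
// Sketch: collect the meta description and any og:* tags from a page.
// extract_meta_tags() is a hypothetical helper, not part of PHP or CakePHP.
function extract_meta_tags(string $html): array
{
    $doc = new DOMDocument();
    // Suppress warnings about invalid markup while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = array();
    foreach ($doc->getElementsByTagName("meta") as $meta) {
        // Prefer property="og:..."; fall back to name="..." if property is absent.
        $key = $meta->getAttribute("property");
        if ($key === "") {
            $key = $meta->getAttribute("name");
        }
        $key = strtolower($key);
        if ($key === "description" || strpos($key, "og:") === 0) {
            $found[$key] = $meta->getAttribute("content");
        }
    }
    return $found;
}

// Invented sample markup; in practice this would be the fetched page source.
$sample = '<html><head>'
        . '<meta name="description" content="A short summary.">'
        . '<meta property="og:title" content="Hello">'
        . '<meta property="og:image" content="http://www.example.com/pic.png">'
        . '</head><body></body></html>';
$meta = extract_meta_tags($sample);

foreach ($meta as $key => $value) {
    echo "{$key}: {$value}\n";
}
```

A sensible order of preference is probably og:title / og:description / og:image when they exist, then the title element and meta description as a fallback.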


I'm not too familiar with CakePHP, but I can give you a general idea of what you'll need to do.

First step would be to use AJAX to submit the URL to your server.

Then, the server will need to grab the HTML source. In PHP you can do:

$source = file_get_contents('http://www.example.com/');

There are probably other functions, but that one should work.

Once you have the source, you'll have to parse out the data you want. You can use regex or something else to do this part.

Then, you'll probably want to put the data you need into a PHP array, use

json_encode($my_array)

and return the JSON to the browser. Then, do what you wish with it on the client side.
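
For what it's worth, the last step might look roughly like this. It is only a sketch: build_preview_response() and the array keys are names I made up, and the values passed in stand for whatever your parsing step produced.

```php
<?php
// Sketch: package the scraped details as JSON for the AJAX caller.
// build_preview_response() and the key names are hypothetical.
function build_preview_response(string $title, string $description, array $images): string
{
    $my_array = array(
        "title"       => $title,
        "description" => $description,
        // Use the first image found, or null if the page had none.
        "image"       => count($images) > 0 ? $images[0] : null,
    );
    return json_encode($my_array);
}

// Invented sample values for illustration.
$json = build_preview_response(
    "Hello",
    "A short summary.",
    array("http://www.example.com/pic.png")
);

// In a real endpoint you would send header("Content-Type: application/json") first.
echo $json . "\n";
```

On the JavaScript side, the AJAX callback can then parse this JSON and fill in the preview.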

Hope this helps
