Add a web scraper to a WordPress site, similar to Facebook's link preview functionality
As I'm sure everyone is aware, when you enter a URL on Facebook, either in a status or when leaving a comment, it automatically retrieves an image from the article along with the title and, I think, the meta description.
I would really love to implement a feature like this into a site I am building. Only problem is, I have no idea where to start!
Ideally, I would like to have a dedicated page on the website that is used to link to other articles of interest. I would just like to display an image, the title and a few lines of descriptive text. The title would link directly to the source.
Does anyone have any advice or pointers that could help me out? Totally appreciate any tips you guys have.
Many thanks
-J
This may help: http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/
The tutorial uses the PHP Simple HTML DOM Parser to parse HTML content from a file or from a URL.
I had to do something similar a while ago. I used jQuery (along with PHP as a proxy) to accomplish this.
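<script type="text/javascript">
$(document).ready(function () {
  // Assumption: the preview is rendered in a separate element, here called
  // #linkbox, instead of overwriting the #statusbox input itself.
  var $preview = $("#linkbox");
  $("#statusbox").keyup(function () {
    var content = $(this).val();
    var urlRegex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
    var urls = content.match(urlRegex); // null when the text contains no URL
    if (!urls) {
      return;
    }
    var url = urls[0]; // preview the first URL only
    $preview.slideDown();
    $preview.html("<img src='ajax_loader.gif'>");
    // PHP proxy fetches the page details (works around the cross-domain restriction)
    $.get("proxy.php", { url: url }, function (response) {
      var titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(response);
      var logoMatch = /<img[^>]+src=["']([^"']+\.jpg)["']/i.exec(response);
      $preview.empty();
      if (logoMatch) {
        $preview.append($("<img class='img'/>").attr("src", logoMatch[1]));
      }
      // .text() and createTextNode keep the scraped title and URL from injecting markup
      var $title = $("<b/>").text(titleMatch ? titleMatch[1] : url);
      $preview.append($("<div/>").append($title, "<br/>", document.createTextNode(url)));
    }, "text");
  });
});
</script>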
Sure, this can be improved. The PHP file can be as simple as:
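<?php
// Minimal proxy: fetches the requested page server-side and echoes it back.
if (!empty($_GET['url'])) {
    $url = $_GET['url'];
    // Only allow http(s) URLs so the proxy cannot be used to read local files.
    if (filter_var($url, FILTER_VALIDATE_URL) && preg_match('#^https?://#i', $url)) {
        echo file_get_contents($url);
    }
}
?>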
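The extraction step in the callback could also be made a little more forgiving. Facebook's own previews read Open Graph meta tags (og:title, og:description, og:image), so here is a rough sketch that prefers those and falls back to <title> and the plain meta description. The function name extractPreview and the shape of the object it returns are my own assumptions, and it is still regex-based, so treat it as a starting point, not a real HTML parser.

```javascript
// Hypothetical helper (not from the original answer): pulls preview fields
// out of an HTML string, preferring Open Graph tags and falling back to
// <title> and the plain meta description.
function extractPreview(html) {
  // Collect <meta> tags into a map keyed by their property/name attribute.
  var meta = {};
  var tagRegex = /<meta\b[^>]*>/gi;
  var tag;
  while ((tag = tagRegex.exec(html)) !== null) {
    var key = /\b(?:property|name)\s*=\s*["']([^"']+)["']/i.exec(tag[0]);
    var value = /\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(tag[0]);
    if (key && value) {
      var k = key[1].toLowerCase();
      // Keep the first occurrence of each key.
      if (!Object.prototype.hasOwnProperty.call(meta, k)) {
        meta[k] = value[1] !== undefined ? value[1] : value[2];
      }
    }
  }
  var title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return {
    title: meta["og:title"] || (title ? title[1].trim() : ""),
    description: meta["og:description"] || meta["description"] || "",
    image: meta["og:image"] || ""
  };
}
```

Inside the $.get callback you would then call var preview = extractPreview(response); and build the box from preview.title, preview.description and preview.image. Note that a relative og:image path would still need to be resolved against the page URL.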
A better way is to use cURL and retrieve the contents of the web page in PHP itself, using a proper HTML parser.
Another solution (free + paid) is to use Embedly.
Edit: By the way, Embedly has a WordPress plugin.