Importing/scraping page content from other sites?
I've been playing with PHP, and also http://www.alchemyapi.com/ and embed.ly, but I was wondering if there are other options out there to import and parse a webpage, any page, whether it is a news site or a blog...
thanks
To fetch the data: curl, file_get_contents (there may be others, but those are the two common ones)
To parse the data: PHP: DOM, SimpleXML, preg_match**
Since the question was tagged with PHP, I only gave working information for PHP. There are tons of ways to do this; if you can narrow your question down to what you are trying to do, it would help. The better way to parse any site is through its RSS feed, if it has one, or through its API, assuming the site offers the content you want via RSS/API.
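As a rough illustration of the RSS route, here is a minimal sketch using SimpleXML. It assumes a plain RSS 2.0 feed with channel/item elements; the function name parseRssItems is made up for this example, and real feeds (Atom, namespaced elements) would need more handling.

```php
<?php
// Hypothetical helper (not from any library): pull title/link pairs out of
// an RSS 2.0 document. Returns an empty array if the XML cannot be parsed
// or has no <channel> element.
function parseRssItems(string $xml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($feed === false || !isset($feed->channel)) {
        return [];
    }

    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        ];
    }
    return $items;
}
```

The feed body itself could be fetched with curl or file_get_contents before being passed in as a string.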
** preg_match is not a great alternative. It does "work", but it is better to use the DOM / SimpleXML functions if possible.
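To make the fetch-then-parse approach concrete, here is a minimal sketch that fetches a page with cURL and reads the title and links through the DOM extension instead of regular expressions. The function names fetchPage and extractTitleAndLinks, the timeout value, and the example URL are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: fetch a page body with cURL; returns null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => 10,    // seconds; an arbitrary choice
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: parse HTML with DOMDocument and collect the page
// title plus every href found on an <a> element.
function extractTitleAndLinks(string $html): array
{
    if ($html === '') {
        return ['title' => null, 'links' => []];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titleNode = $xpath->query('//title')->item(0);

    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }

    return [
        'title' => $titleNode ? trim($titleNode->textContent) : null,
        'links' => $links,
    ];
}

// Usage (placeholder URL):
// $html = fetchPage('https://example.com/');
// if ($html !== null) { print_r(extractTitleAndLinks($html)); }
```

Because the parser builds a tree, the same XPath queries keep working when attributes are reordered or whitespace changes, which is where preg_match patterns tend to break.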
I wrote a crawler at work using cURL and preg_match. Before I chose to do it that way, I had looked at DOM parsers: http://php.net/manual/en/book.dom.php