Getting the real link from an RSS feed link
I am experimenting with scraping certain pages from an RSS feed using curl and PHP. The page scraping was working fine when I was just using actual links, not links from the RSS feeds. However, I realize now that links in RSS feeds are usually just redirects to the actual page (at least this is what it seems like), because when I scrape a page with the RSS link, it doesn't actually get the information I am looking for.
Has anyone encountered this and know of a workaround? Is there any way to see where the RSS link is redirecting to and capture that value?
I think you might need to use the -L switch to tell it to follow redirects. I'm not sure if you can do this directly from PHP or whether you need to follow this approach: http://php.net/manual/en/function.curl-setopt.php#95027. It is always possible that the site you are scraping blocks by user agent or something as well. Maybe try one of the links in a browser while running Fiddler or similar to see if any redirection is actually taking place.
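For what it's worth, PHP's curl extension exposes the same behaviour as the -L switch through the CURLOPT_FOLLOWLOCATION option, and curl_getinfo() with CURLINFO_EFFECTIVE_URL reports the last URL that was requested. Here is a minimal sketch of how that could look. The function name resolve_final_url, the user agent string, and the timeout values are my own assumptions for illustration, not anything from your code:

```php
<?php
// Hypothetical helper: follow redirects from a feed link and return the
// URL curl ended up at, or null if the request failed.
function resolve_final_url(string $url, int $maxRedirects = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => true,          // PHP counterpart of curl -L
        CURLOPT_MAXREDIRS      => $maxRedirects, // guard against redirect loops
        CURLOPT_RETURNTRANSFER => true,          // return output instead of echoing it
        CURLOPT_NOBODY         => true,          // HEAD request: headers only
        CURLOPT_CONNECTTIMEOUT => 5,             // assumed values, tune as needed
        CURLOPT_TIMEOUT        => 15,
        // Assumed user agent, in case the site rejects curl's default.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFeedReader/1.0)',
    ]);

    $result = curl_exec($ch);
    if ($result === false) {
        return null; // connection error, timeout, too many redirects, ...
    }

    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    return $finalUrl !== '' ? $finalUrl : null;
}
```

You would call it as resolve_final_url($rssItemLink) and then scrape the returned URL. Some servers do not answer HEAD requests properly, so if this returns null for links that work in a browser, try removing CURLOPT_NOBODY. Also note this only sees HTTP redirects; a page that redirects with JavaScript or a meta refresh tag will not be followed.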