Best way to generate feeds from pages that don't have RSS support

2023-02-14 08:59

The best example I have seen so far is http://www.instapaper.com/. They can get the text from any page.

In my case, I need to get the text and also generate a list of items, given that each site will have one page listing its news.

Take nytimes.com (just an example). I have to collect all the links and get the text behind each one, if it exists. I may also need to specify some URL criteria, such as generating feeds only from links where the URL contains something like "/[year]/[month]/[day]/[category]/post-name".

I don't want the complete code, just the concept and the best approach. Any ideas?

This is painful, but the only good solution is to use an HTML parser and extract every href from the page. I recommend a library that lets you select all the links easily. I have heard of phpQuery (http://code.google.com/p/phpquery/) but have never used it. You would load each listing page and then select all the hrefs.
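To make the idea concrete, here is a minimal sketch in Python using only the standard library rather than phpQuery. The names LinkCollector, article_links and ARTICLE_PATH are made up for illustration, and the regular expression is just one guess at the "/[year]/[month]/[day]/[category]/post-name" shape from the question; a real site will probably need its own pattern.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumed URL shape: /[year]/[month]/[day]/[category]/post-name
# (hypothetical pattern; adjust it per site).
ARTICLE_PATH = re.compile(r"/\d{4}/\d{2}/\d{2}/[^/]+/[^/]+/?")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative links into absolute ones.
            self.links.append(urljoin(self.base_url, href))


def article_links(html, base_url):
    """Return unique links whose path matches ARTICLE_PATH, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    matches = []
    for link in collector.links:
        path = urlsplit(link).path
        if ARTICLE_PATH.fullmatch(path) and link not in matches:
            matches.append(link)
    return matches
```

You would call article_links(page_html, "http://www.nytimes.com/") on the HTML of each listing page and then fetch each returned URL to get its text.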

There is really no easier way. If you switched to something like Java or Python, you could use multiple threads to fetch pages in parallel and speed up the process. And once you have analyzed a page, save the data in a database so you can reference it later.
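As a rough sketch of that last part, assuming Python with a thread pool and SQLite (neither is prescribed above), the functions below fetch pages in parallel, cache them in a table, and build an RSS 2.0 document from the cache. The names fetch, crawl and build_rss and the pages table are hypothetical. The sketch stores the raw page body; extracting the readable text is a separate step not shown here.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(urls, db, fetcher=fetch, workers=8):
    """Fetch urls in parallel and cache each body in the pages table."""
    urls = list(urls)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)"
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in input order; an exception raised by any
        # fetch propagates from here and aborts the crawl.
        bodies = list(pool.map(fetcher, urls))
    # Only the calling thread touches the database connection.
    db.executemany(
        "INSERT OR REPLACE INTO pages VALUES (?, ?)", zip(urls, bodies)
    )
    db.commit()


def build_rss(db, title, link):
    """Build an RSS 2.0 document with one item per cached page."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for url, body in db.execute("SELECT url, body FROM pages ORDER BY url"):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "description").text = body
    return ET.tostring(rss, encoding="unicode")
```

Passing the fetcher as a parameter makes it easy to swap in a version with retries or rate limiting, which matters if you are crawling someone else's site.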

Hope this helps.
