
template removal/detection/difference utility for HTML and other text

A while back, on some random website, I read about a program that would look at multiple pages on an HTML site and detect the differences and similarities between them. From this it would work out which parts were template "boilerplate" and which were new content, and then automatically output just the content.
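The cross-page idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not boilerpipe's algorithm or that of any other specific tool. It naively splits each page into text blocks at tag boundaries and treats any block that appears on more than half of the pages as template. The function names `split_blocks` and `extract_content` and the `threshold` parameter are hypothetical.

```python
import re
from collections import Counter


def split_blocks(html):
    """Naively split a page into text blocks at tag boundaries (illustrative only)."""
    # Replace every tag with a newline so tag boundaries separate blocks
    text = re.sub(r"<[^>]+>", "\n", html)
    # Drop surrounding whitespace and empty fragments
    return [b.strip() for b in text.split("\n") if b.strip()]


def extract_content(pages, threshold=0.5):
    """Return, for each page, the blocks that are not shared template text.

    Hypothetical rule: a block is template if it appears on more than
    `threshold` (as a fraction) of the pages.
    """
    page_blocks = [split_blocks(p) for p in pages]
    # Document frequency: count each distinct block once per page it occurs on
    doc_freq = Counter()
    for blocks in page_blocks:
        doc_freq.update(set(blocks))
    n = len(pages)
    # Keep only blocks that are rare across the page set
    return [[b for b in blocks if doc_freq[b] / n <= threshold]
            for blocks in page_blocks]


if __name__ == "__main__":
    pages = [
        "<html><div>Site Header</div><p>First article body</p><div>Copyright 2010</div></html>",
        "<html><div>Site Header</div><p>Second article body</p><div>Copyright 2010</div></html>",
        "<html><div>Site Header</div><p>Third article body</p><div>Copyright 2010</div></html>",
    ]
    print(extract_content(pages))
    # → [['First article body'], ['Second article body'], ['Third article body']]
```

Splitting HTML with a regex is crude. A more robust version would parse the markup and compare DOM subtrees rather than raw text fragments, and would need enough sample pages for the frequency threshold to be meaningful.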

Unfortunately, I don't remember enough details about this utility to find it on Google, so I wonder if any of you guys have run across anything like this and CAN remember its name.

Thanks.


Murphy's Law (or is it some other law?) has struck: I found it just moments after giving up and posting this question. The project I was thinking of is this:

http://code.google.com/p/boilerpipe/

Thanks.

