
Extracting useful data from arbitrary HTML pages?

Posted: 2022-12-18 20:20

Is there a library for Ruby or PHP that can parse HTML pages and extract the unique data by comparing each page with other similar pages? It should use some sort of text mining to identify which texts are more likely to be noise and repetitive, and which are more unique and useful.

I'm a PHP guy and have no idea about Ruby, but I think what you want is trivial to achieve:

Use something like Simple HTML DOM to parse the pages.
For each page, compare all the DOM elements with the elements at the same position on the other pages.
Get the path of every element whose content differs between pages; those will be your signal elements.
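
The steps above can be sketched roughly as follows. This is only an illustration, not the answerer's code: it is written in Python with the standard-library `html.parser` instead of Simple HTML DOM, and the names `PathTextCollector`, `extract_texts` and `signal_elements` are made up for this example. It assumes the pages share one template, so that an element at the same positional path (for example `html[1]/body[1]/div[2]/p[1]`) plays the same role on every page, and it only compares the text directly inside each element.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PathTextCollector(HTMLParser):
    """Hypothetical helper: maps each element path to the text directly inside it."""

    def __init__(self):
        super().__init__()
        self.stack = []                     # path segments of the open elements
        self.counters = [defaultdict(int)]  # sibling counters, one dict per depth
        self.texts = defaultdict(str)

    def handle_starttag(self, tag, attrs):
        self.counters[-1][tag] += 1
        if tag in VOID_TAGS:
            return
        self.stack.append(f"{tag}[{self.counters[-1][tag]}]")
        self.counters.append(defaultdict(int))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; this tolerates
        # markup with unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split("[")[0] == tag:
                del self.stack[i:]
                del self.counters[i + 1:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if not text or not self.stack:
            return
        if self.stack[-1].startswith(("script[", "style[")):
            return  # ignore inline scripts and stylesheets
        path = "/".join(self.stack)
        self.texts[path] = (self.texts[path] + " " + text).strip()


def extract_texts(html):
    """Return {element path: text} for one page."""
    collector = PathTextCollector()
    collector.feed(html)
    collector.close()
    return dict(collector.texts)


def signal_elements(pages):
    """Return {path: [text on each page]} for paths whose text differs between pages."""
    per_page = [extract_texts(html) for html in pages]
    all_paths = set().union(*per_page)
    result = {}
    for path in sorted(all_paths):
        values = [texts.get(path, "") for texts in per_page]
        if len(set(values)) > 1:  # identical everywhere means template noise
            result[path] = values
    return result
```

Calling `signal_elements` with a list of HTML strings from the same site returns only the paths whose text changes from page to page; menus, footers and other repeated blocks drop out because their text is identical on every page. Pages whose structure differs a lot would need a fuzzier alignment than exact path matching.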

Tags: data-mining, php, ruby, text, text-mining
