
Google Summer of Code: web classification dataset

I heard that Google hosted (or will host) a web classification competition and provided a large dataset (170k+ documents) of web sites classified into multiple categories (sports, computers, science, etc.). I looked through their Summer of Code web site for 2009 through 2011 but didn't find anything. Does anybody know where I can get that dataset?


I think I found it, although I'm not sure whether the data was provided by Google: the ECML/PKDD 2010 Discovery Challenge Data Set. It contains 22 training labels (i.e. labels about the content), URLs and hyperlinks, content-based and link-based web spam features, term frequencies, and Natural Language Processing features.
