Huge amount of plaintext data for a parsing experiment

I am developing a parser in Ruby which parses some nonuniform text data. Can anybody tell me where I can get a large amount of plaintext data for that?


Here you'll find a list of many sources:

http://www.quora.com/Data/Where-can-I-get-large-datasets-open-to-the-public

And my favorite is:

http://ftp.sunet.se/mirror/archive/ftp.sunet.se/pub/tv+movies/imdb/


You could scrape Wikipedia (or just run a bunch of it through lynx -dump). That would also give you a vast source of non-English text. Project Gutenberg would be another good source of large amounts of plain text.
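
As a rough illustration of the lynx -dump idea, here is a minimal Ruby sketch. It assumes lynx is installed and on your PATH. The helper names (wikipedia_url, lynx_dump_command, dump_page, build_corpus) and the output layout are made up for this example, and -nolist is only added to drop the trailing list of links from the dump.

```ruby
require "fileutils"
require "open3"
require "uri"

# Hypothetical helper: URL of a Wikipedia article in the given language edition.
def wikipedia_url(title, lang: "en")
  slug = URI.encode_www_form_component(title.tr(" ", "_"))
  "https://#{lang}.wikipedia.org/wiki/#{slug}"
end

# Command line that asks lynx to render a page as plain text.
# -dump writes the rendered page to stdout; -nolist omits the link list.
def lynx_dump_command(url)
  ["lynx", "-dump", "-nolist", url]
end

# Runs lynx and returns the plain-text rendering of the page.
# Raises if lynx exits with a non-zero status.
def dump_page(url)
  text, status = Open3.capture2(*lynx_dump_command(url))
  raise "lynx failed for #{url}" unless status.success?
  text
end

# Writes one .txt file per article title into dir (assumed layout).
def build_corpus(titles, dir, lang: "en")
  FileUtils.mkdir_p(dir)
  titles.each do |title|
    path = File.join(dir, "#{title.tr(' /', '__')}.txt")
    File.write(path, dump_page(wikipedia_url(title, lang: lang)))
  end
end
```

Calling something like build_corpus(["Parsing", "Plain text"], "corpus") should leave one text file per article in the corpus directory, and passing lang: "de" would pull from the German edition instead. If you need a really large corpus, the downloadable Wikipedia dumps or Project Gutenberg are probably kinder to the servers than fetching pages one by one.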
