Large text file dictionary of random words for benchmarking purposes?
I was wondering if anyone could point me to a very large dictionary of random words that could be used to test some high-performance string data structures. I'm finding some in the ~2MB range, but I'd like something larger if possible. I'm guessing there has to be some large standard string dataset somewhere that could be used. Thanks!
http://norvig.com/big.txt
The above link was mentioned in Norvig's spell checker article - http://norvig.com/spell-correct.html
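If what you need is a list of distinct words rather than running text, something like the following might work as a starting point. It is only a sketch: the function names are made up for illustration, the lowercase `[a-z]+` tokenization is my own assumption about what counts as a "word", and it assumes you have already saved the file locally (for example as `big.txt`).

```python
import re


def extract_words(text):
    """Return the distinct lowercase alphabetic words in text, in first-seen order.

    Assumption: a "word" is any run of the letters a-z after lowercasing.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(tokens))


def load_word_list(path):
    """Read a plain-text corpus from path (hypothetical local file) and return its distinct words."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return extract_words(handle.read())
```

Calling `load_word_list("big.txt")` would then give you a list of keys to insert into whatever structure you are benchmarking. If the distinct-word count turns out to be too small for your tests, the raw token stream (before de-duplication) is another option.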
I'd recommend taking a look through the material available from TREC (the Text REtrieval Conference). It has some good datasets which might meet your requirements.