
What is the best data structure to store the words found in a document along with a count of their occurrences?

Let's say I have a corpus of documents which I want to read one by one and store in a data structure. The structure will probably be a list of something, where that something is a class representing a single document. Inside that class I'll need a data structure to store the contents of each document; what should that be? Also, if I want to count word occurrences and retrieve the most frequent words in each document, do I need a data structure that lets me do this in less than the O(n) time it would take to examine all the contents sequentially?


Use an associative array, also called a map or dictionary (different programming languages use different terms for the same data structure).

Each entry's key is a word and its value is that word's counter. For example:

{
  'on' -> 15,
  'and' -> 43,
  'I' -> 157,
  'confluence' -> 1,
  'dear' -> 2
}
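A minimal Python sketch of this approach, assuming a naive regex tokenizer that lowercases words. The `Document` class, its `count`/`top` methods, and the tokenizer are hypothetical illustrations, not a standard API; `collections.Counter` is the standard-library dict subclass for counting. On the second part of the question: building the counts needs one O(n) pass over the text, which is unavoidable, but if the ranking is sorted once when the document is loaded, later top-n queries are just a slice of that list and never rescan the document.

```python
import re
from collections import Counter


class Document:
    """One document: its raw text plus a word -> count map (hypothetical class)."""

    def __init__(self, text):
        self.text = text
        # Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Counter is a dict subclass mapping word -> number of occurrences.
        self.counts = Counter(words)
        # Sort once at load time (O(k log k) for k distinct words) so later
        # top-n queries are a cheap slice instead of a full scan.
        self.ranked = self.counts.most_common()

    def count(self, word):
        # Average O(1) hash lookup; unseen words count as 0.
        return self.counts[word.lower()]

    def top(self, n):
        # Slice of the precomputed ranking; ties keep first-seen order.
        return self.ranked[:n]


# The corpus is simply a list of Document objects.
corpus = [
    Document("I and you and I"),
    Document("Dear reader, on and on and on"),
]

print(corpus[0].count("I"))  # 2
print(corpus[0].top(2))      # [('i', 2), ('and', 2)]
print(corpus[1].top(1))      # [('on', 3)]
```

If top-n queries are rare, you could drop the precomputed `ranked` list and call `self.counts.most_common(n)` on demand instead; CPython implements that with `heapq.nlargest`, costing roughly O(k log n) for k distinct words.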