C : Store and index a HUGH amount of info! School Project

2022-12-13 12:49 问答作者：

i need to do a school project in C (I'm really don't know c++开发者_开发知识库 as well).

I need a data struct to index each word of about 34k documents, its a lot of words, and need to do some ranking about the words, i already did this project about 2 years ago (i'm pause in the school and back this year) and a i use a hash table of binary tree, but i got a small grade cause my project took about 2hours to index all words. I need something a little fast... any sugestions?

Tkz Roberto

If you have the option, I'd strongly recommend using a database engine (MSSQL, MySQL, etc.) as that's exactly the sort of datasets and operations these are written for. Best not to reinvent the wheel.

Otherwise, why use a btree at all? From what you've described (and I realise we're probably not getting the full story...) a straight up hash table with the word as a key and its rank/count of occurences should be useful?

bogofilter (the spam filter) has to keep word counts. It uses dbm as a backend, since it needs persistent storage of the word -> count map. You might want to look at the code for inspiration. Or not, since you need to implement the db part of it for the school project, not so much the spam filter part.

Minimize the amount of pointer chasing you have to do. Data-dependent memory-load operations are slow, esp. on a large working set where you will have cache misses. So make sure your hash table is big enough that you don't need a big tree in each bucket. And maybe check that your binary trees are dense, not degenerate linked lists, when you do get more than one value in a hash bucket.

If it's slow, profile it, and see if your problem is one slow function, or if it's cache misses, or if it's branch mispredictions.

继续阅读：c data-structures hash tree

C : Store and index a HUGH amount of info! School Project

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？