开发者

Synonym dictionary implementation?

How should I approach this problem? I basically need to implement 开发者_如何学Ca dictionary of synonyms. It takes as input some "word/synonim" pairs and I have to be able to "query" it for the list of all synonims of a word.

For example:


Dictionary myDic;

myDic.Add("car", "automobile");
myDic.Add("car", "autovehicle");
myDic.Add("car", "vehicle");
myDic.Add("bike", "vehicle");

myDic.ListOSyns("car") // should return {"automobile","autovehicle","vehicle" ± "car"}
                       // but "bike" should NOT be among the words returned

I'll code this in C++, but I'm interested in an overall idea of the implementation, so the question is not exactly language-specific.

PS: The main idea is to have some groups of words (synonyms). In the example above there would be two such groups:

{"automobile","autovehicle","vehicle", "car"} {"bike", "vehicle"}

"vehicle" belongs to both, "bike" just to the second one, the others just to the first


I would implement it as a Graph + hash table / search tree
each keyword would be a Vertex, and each connection between 2 keywords would be an edge.
a hash table or a search tree will connect from each word to its node (and vice versa).
when a query is submitted - you find the node with your hash/tree and do BFS/DFS of the required depth. (meaning you cannot continue after a certain depth)

complexity: O(E(d)+V(d)) for searching graph (d = depth) (E(d) = number of edges in the relevant depth, same for V(d))
O(1) for creating an edge (not including searching for the node, detailed below its search)
O(logn) / O(1) for finding node (for tree/hash table)
O(logn) /O(1) for adding a keyword to the tree/hash table and O(1) to add a Vertex
p.s. as mentioned: the designer should keep in mind if he needs a directed or indirected Graph, as mentioned in the comments to the question.
hope that helps...


With the clarification in the comments to the question, it's relatively simple since you're not storing groups of mutual synonyms, but rather separately defining the acceptable synonyms for each word. The obvious container is either:

std::map<std::string, std::set<std::string> >

or:

std::multi_map<std::string, std::string>

if you're not worried about duplicates being inserted, like this:

myDic.Add("car", "automobile");
myDic.Add("car", "auto");
myDic.Add("car", "automobile");

In the case of multi_map, use the equal_range member function to extract the synonyms for each word, maybe like this:

struct Dictionary {
    vector<string> ListOSyns(const string &key) const {
        typedef multi_map<string, string>::const_iterator constit;
        pair<constit, constit> x = innermap.equal_range(key);
        vector<string> retval(x.first, x.second);
        retval.push_back(key);
        return retval;
    }
};

Finally, if you prefer a hashtable-like structure to a tree-like structure, then unordered_multimap might be available in your C++ implementation, and basically the same code works.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜