Container to hold URI host names
I've run into a problem that might require the use of a more unusual data structure, but I'm not sure.
Essentially, I want to store URI hostnames in a container, and be able to query the container to tell whether or not a hostname exists in the container. However, if the container contains a higher-level domain of a certain hostname, I want the query to return true
if I lookup a lower-level domain. In other words, if the container contains example.com
, I'd like to be able to lookup www.example.com
, and it would return true
. Or, if the container contains foo.example.com
, I'd like to be able to lookup bar.foo.example.com
, and it would return true.
I've thought about this problem, and there doesn't seem to be any straightforward way to go about doing this. The obvious solution is to just use a regular associati开发者_开发知识库ve container, like a hash-table or tree (an std::unordered_set
or std::set
in C++). Upon lookup, I would have to iterate over each segment of the domain name, and keep querying the container to see if it contains each segment. So, if I need to lookup www.example.com
, I'd have to do three queries: one for com
, one for example.com
, and one for www.example.com
. As soon as I get a positive, I'd return true
, or else return false if none of these are in the container.
This solution is fine, and probably the one I'll end up going with. Except it doesn't seem right, since I have to make N queries depending on the length of the host name. Since host names usually don't have that many segments, I'm not really worried about performance. But I am worried that I should be doing something smarter here, especially because this seems like a problem that someone else has already thought about.
I considered using a more exotic data structure, like a Patrica Trie or some other type of prefix-aware container. I do have a nice library which implements this structure, so it's not a problem to use it. However, after thinking about the problem, I don't think a Patricia Trie would help. Tries are designed for circumstances where the key is a prefix, and the values are full-length strings. In my case, the key would often be longer than any value in the container. In other words, my key might be www.example.com
, and if the container has example.com
, I'd want it to be able to find the example.com
. But, Patricia Tries work the opposite way.
So, is a regular associative container the best way to go here? Or what are some other suggestions?
A simple solution, reverse the node order (ie. make www.example.com
into com.example.www
) and stuff that into your Patrica Trie. then you can traverse the trie until you find your match in one go
精彩评论