开发者

In-memory data structure that supports boolean querying

I need to store data in memory where I map one or more key strings to an object, as follows:

"green", "blue" -> object1
"red", "yellow" -> object2

So, in Java the datastructure might implement:

Map<Set<String>, V>

I need to be able to efficiently receive a list of objects, where the strings match some boolean criteria, such as:

("red" OR "green") AND NOT "blue"

I'm working in Java, so the ideal solution would be an off-the-shelf Java library. I am, however, willing to implement something from scratch if necessary.

Anyone have any ideas? I'd rather avoid the overhe开发者_开发百科ad of an in-memory database if possible, I'm hoping for something comparable in speed to a HashMap (or at least the same order of magnitude).


Actually, I liked the problem so I implemented a full solution in the spirit of my earlier answer:

http://pastebin.com/6iazSKG9

A simple solution, not thread safe or anything, but fun and a good starting point, I guess.

Edit: Some elaboration, as requested


See the unit test for usage.

There are two interfaces, DataStructure<K,V> and Query<V>. DataStructure behaves somewhat like a map (and in my implementation it actually works with an internal map), but it also provides reuseable and immutable query objects which can be combined like this:

    Query<String> combinedQuery = 
    structure.and(
                    structure.or(
                            structure.search("blue"), 
                            structure.search("red")
                    ),
                    structure.not(
                            structure.search("green")
                    )
    );

(A Query that searches for objects that are tagged as (blue OR red) AND NOT green). This query is reuseable, which means that it's results will change whenever the backing map is changed (kind of like an ITunes smart playlist).

The query objects are already thread safe, but the backing map is not, so there is some room for improvement here. Also, the queries could cache their results, but that would probably mean that the interface would have to be extended to provide for a purge method (kind of like the detach method in Wicket's models), which wouldn't be pretty.

As for licensing: if anybody wants this code I'll be happy to put it on SourceForge etc. ...

Sean


Would the criteria be amenable to bitmap indexing: http://en.wikipedia.org/wiki/Bitmap_index ?


I would say that the easiest way is simply to do a recursive filtering and being cleaver, when for instance evaluating X AND Y where X has been evaluated to the empty set.

The mapping however, needs to be from tags (such as "red" or "blue") to sets of objects.

The base case (resolving the atomic tags) of the recursion, would then be a simple lookup in this map. AND would be implemented using intersection, OR using union, and so on.


Check out the Apache Commons - Collections project. They have tons of great stuff that you will be able to use, particularly the CollectionUtils class for performing strong collection-based logic.

For instance, if your values were stored in a HashMap (as suggested by another answer) as follows:

myMap["green"] -> obj1
myMap["blue"] -> obj1
myMap["red"] -> obj2
myMap["yellow"] -> obj2

Then to retrieve results that match: ("red" or "green") and not "blue you might do this:

CollectionUtils.disjunction(CollectionUtils.union(myMap.get("red"), myMap.get("green")), myMap.get("blue"))


You could map string keys to a binary constant, then use bit shifting to produce an appropriate mask.


i truly think some type of database solution is your best bet. SQL easily supports querying data by

(X and Y) and not Z


this would have worked too reusable condition/expression classes


The Google Collections SetMultimap looks like an easy way to get the underlying structure, then combining that with the Maps static filters to get the querying behavior you want.

Construction would go something like

smmInstance.put(from1,to1);
smmInstance.put(from1,to2);
smmInstance.put(from2,to3);
smmInstance.put(from3,to1);
smmInstance.put(from1,to3);
//...

queries would then look like

valueFilter = //...build predicate
Set<FromType> result = Maps.filterValues(smmInstance.asMap(),valueFilter).keySet()

You can do any amount of fancy building the predicate, but Predicates has a few methods that would probably be enough to do contains/doesn't contain style queries.


I wasn't able to find a satisfactory solution, so I decided to cook up my own and release it as an open source (LGPL) project, find it here.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜