Efficiently retrieving entities that match any element of a set of ids
I'm writing software to provide feedback to people across many categories. For example, I might have 30 employees and 40 standards of evaluation (e.g. "arrives on time," "is polite," "appears to brush his teeth," etc). At arbitrary times, the supervisor can submit a piece of feedback like "employee 3 gets a 5/5 for standard 8 (he smells great)" or "employee 10 gets a 1/5 for standard 12 (he just called a customer an idiot)."
My idea is to store these small pieces of feedback individually, linked to the employee and standard by keeping userId and standardId fields.
The problem comes when I want to look at the feedback for all 30 employees and 40 standards. My current approach requires 1200 queries to retrieve all of that data. I'm looking for a better way. I'm using the google appengine datastore, which is a non-relational db.
Things I've been thinking about, and on which I welcome feedback:
I could store the feedback in a grid, with a row per user and column per standard. Then, a single query gets all of the data (better than 1200), but entering new data becomes more difficult (fetch the grid, update the correct bit, store the grid) and changes in the user set or standard set become much more complex (if I add a standard in the middle, this grid needs to be updated). Also, some queries become much harder - I can no longer easily search for the assessments entered on a certain date or by a certain supervisor开发者_JAVA技巧.
I could store all of the feedback for a certain (set of users x set of standards) in an unorganized list, fetch it with a single query, and then sort it out in my own code. This requires me to loop through 1200 entries, but that would be faster than 1200 queries over all of the data in the whole system (there may be many, many irrelevant data for other sets of users and unrelated standards).
So, the short version of my question is: how should I store this data for the best balance of quick retrieval of a large subset and quick insertion of individual pieces of feedback?
You might be able to do this using a RelationIndex. Depending on how exactly you will want to allow user to view and query the data, it should work.
The idea is pretty straight forward, basically you will store a list of "standards" for each employee. And possibly a list of employee's for each standard. Then you'll be able to ask questions such as all employee's who 'smell good'.
Because you have scores for each standard, you might want to do something like store the "score" and "standard number" as a pair in the list ("3:12") so that you can find everyone who has a score of 3 on standard 12.
edit: Updated based on comment.
It sounds like you need to deal with a few different issues. First, you need to deal with editing and maintaining the data. Second, you need to deal with querying the data. Third, you are going to need to handle displaying the data.
For querying the data efficiently you will probably need some approach similar to what I initially suggested. What is more common, editing or viewing the data? That will impact how you setup your models.
If you are only dealing with 30 or 40 employees and 30 or 40 standards, maybe you could use something like the following:
class Evaluations(db.Model):
period = db.StringProperty()
standards = db.TextProperty()
scores = db.TextProperty()
class EvaluationsIndex(db.Model):
index = db.StringListProperty()
Use the standards property on Evaluations to store a list of standards evaluated. Then store your employee-standard-score grid in the scores property. Obviously you'll need to serialize both the standards list and the evaluation grid, perhaps using something like JSON. Use the EvaluationsIndex model as I mentioned above.
With this (or something really similar) you will have pretty easy edits, very easy display, and support for queries.
You could add an additional model to track which supervisor entered the evaluation and her notes.
精彩评论