Elegant schema to log users' actions
I have a database schema to log operations users perform in my webapp:
Log
---
Id
Log_Type_Id
Performed_by_Person_Id
Performed_to_Person_Id
Comment_Id
Story_Id
Photo_Id
etc_Id
Person_Log
----------
Person_Id
Log_Id
This way I can notify users of entries in their log with details about exactly what happened. The problem is that the Log table has to contain a column for every possible type of user operation (modifying a story or comment; creating a story, comment or photo; updating a profile; etc.), and almost all of those columns will necessarily be null in each entry.
Ideally I would have individual log tables that refer back to the overall Log, maybe something like:
Log
---
Id
Performed_by_Person_Id
Log_Comment
-----------
Id
Log_Id
Comment_Id
Log_Photo
---------
Id
Log_Id
Photo_Id
Person_Log
----------
Person_Id
Log_Id
The problem there is that I then don't have an easy way to notify users of things going on that pertain to them. I can easily get their log entries, but then I have to query each "child" table to see the specifics... I could store the name of the child Log table in Log, but that seems so inelegant. Is there a better, more relational way of doing this that also works well with ORM systems?
Your case looks like an instance of the Gen-Spec design pattern. Gen-spec is familiar to object oriented programmers through the superclass-subclass hierarchy. Unfortunately, introductions to relational database design tend to skip over how to design tables for the Gen-Spec situation. Fortunately, it’s well understood. A Google search on “Relational database generalization specialization” will yield several articles on the subject. Or you can look at the following previous discussion.
The trick is in the way the PK for the subclass (specialized) tables gets assigned. It’s not generated by some sort of autonumber feature. Instead, it’s a copy of the PK in the superclass (generalized) table, and is therefore an FK reference to it.
Thus, if the case were vehicles, trucks and sedans, every truck or sedan would have an entry in the vehicles table. Each truck would also have an entry in the trucks table, with a PK that’s a copy of the corresponding PK in the vehicles table, and similarly for sedans. It’s easy to figure out whether a vehicle is a truck or a sedan just by doing joins, and you usually want to join the data in that kind of query anyway.
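A minimal sketch of the vehicles example using Python's built-in sqlite3 module. The column names (vehicle_id, make, payload_tons, doors) and the sample rows are hypothetical. It shows the subclass PK being copied from the superclass row rather than autogenerated, and plain joins revealing each vehicle's kind.

```python
import sqlite3

# Gen-Spec sketch: subclass tables reuse the superclass PK as their own PK,
# which is also an FK back to vehicles. Names here are hypothetical.
SCHEMA = """
CREATE TABLE vehicles (
    vehicle_id INTEGER PRIMARY KEY,
    make       TEXT NOT NULL
);
CREATE TABLE trucks (
    vehicle_id   INTEGER PRIMARY KEY REFERENCES vehicles(vehicle_id),
    payload_tons REAL NOT NULL
);
CREATE TABLE sedans (
    vehicle_id INTEGER PRIMARY KEY REFERENCES vehicles(vehicle_id),
    doors      INTEGER NOT NULL
);
"""

def build(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    # Each specialized row copies the PK generated for its vehicles row.
    cur = conn.execute("INSERT INTO vehicles (make) VALUES ('Volvo')")
    conn.execute("INSERT INTO trucks VALUES (?, 12.5)", (cur.lastrowid,))
    cur = conn.execute("INSERT INTO vehicles (make) VALUES ('Saab')")
    conn.execute("INSERT INTO sedans VALUES (?, 4)", (cur.lastrowid,))

def vehicle_kinds(conn: sqlite3.Connection) -> list:
    # LEFT JOINs on the shared key show which specialization each vehicle has.
    return conn.execute("""
        SELECT v.make,
               CASE WHEN t.vehicle_id IS NOT NULL THEN 'truck'
                    WHEN s.vehicle_id IS NOT NULL THEN 'sedan'
                    ELSE 'unknown' END
        FROM vehicles AS v
        LEFT JOIN trucks AS t ON t.vehicle_id = v.vehicle_id
        LEFT JOIN sedans AS s ON s.vehicle_id = v.vehicle_id
        ORDER BY v.vehicle_id
    """).fetchall()

conn = sqlite3.connect(":memory:")
build(conn)
print(vehicle_kinds(conn))  # [('Volvo', 'truck'), ('Saab', 'sedan')]
```

Nothing in this sketch stops one vehicle from appearing in both subclass tables; enforcing exclusivity needs an extra discriminator constraint.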
This would be a good point to introduce a generic data model, or at least a generic type system across your data model. The idea is that everything has an entry, even actions, people, pages, processes and so forth. Once that's in place, you need a generic way of creating arbitrary relations between these entities, so that linking between them is fairly easy. Your question is one example of why I promote a more generic data model rather than the super-normalized ones we usually use.
The model I use the most is Topic Maps (even though the material on it may not be the easiest route to understanding what I'm talking about), where instead of having a table for each entity, there's one table that holds all of them, plus a few extra tables to deal with typing and relationships between them. You don't have to go all the way with this; perhaps use it just for this use case. Here's an article I wrote about it almost 10 years ago, and another one by Marc de Graauw that deals with a specific RDBMS view on it, as well.
Back to your question. An example using Topic Maps first needs these tables:
Topic
-----
id
name
type
meta_date_created
meta_date_created_topic_ref
meta_date_updated
meta_date_updated_topic_ref
meta_date_deleted
meta_date_deleted_topic_ref
Assoc (relationship)
--------------------
id
type
Assoc member
------------
id
topic_ref
role_topic_ref
This gives you the basics (but there's tons of stuff to extend and implement if you want to go the full monty, like support for multiple types, persistent identification, ontology grouping, and on and on, which is also part of Topic Maps), plus the meta_* fields as handy shortcuts if that's really all you want (they're good for fast searching :).
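A sketch of these three tables as SQLite DDL via Python's sqlite3. Two assumptions not in the listing above: 'Assoc member' is renamed assoc_member to avoid a quoted identifier, and it gets a hypothetical assoc_ref column tying each member to its association (the listing doesn't show how members find their Assoc row). Every type/role/meta_*_topic_ref column is treated as an FK to topic.

```python
import sqlite3

# Minimal Topic Maps-style core, sketched for SQLite.
# Assumption: assoc_member.assoc_ref (hypothetical) links a member to its assoc.
TOPIC_MAP_SCHEMA = """
CREATE TABLE topic (
    id                          INTEGER PRIMARY KEY,
    name                        TEXT NOT NULL,
    type                        INTEGER REFERENCES topic(id),
    meta_date_created           TEXT,
    meta_date_created_topic_ref INTEGER REFERENCES topic(id),
    meta_date_updated           TEXT,
    meta_date_updated_topic_ref INTEGER REFERENCES topic(id),
    meta_date_deleted           TEXT,
    meta_date_deleted_topic_ref INTEGER REFERENCES topic(id)
);
CREATE TABLE assoc (
    id   INTEGER PRIMARY KEY,
    type INTEGER NOT NULL REFERENCES topic(id)
);
CREATE TABLE assoc_member (
    id             INTEGER PRIMARY KEY,
    assoc_ref      INTEGER NOT NULL REFERENCES assoc(id),
    topic_ref      INTEGER NOT NULL REFERENCES topic(id),
    role_topic_ref INTEGER NOT NULL REFERENCES topic(id)
);
"""

def open_topic_map() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the topic references
    conn.executescript(TOPIC_MAP_SCHEMA)
    return conn

conn = open_topic_map()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['assoc', 'assoc_member', 'topic']
```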
Each person will have an entry in 'Topic', for example:
id: 4572349857
name: Alexander Johannesen
type: 12341234
meta_date_created: {date}
meta_date_created_topic_ref: 5656
To find out who created this user, look in 'Topic' for id '5656':
id: 5656
name: Billy Bob
type: 12341234
What's that type, though? Look in 'Topic' for id '12341234':
id: 12341234
name: Person
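The two lookups above are just self-joins on 'Topic'. A sketch with Python's sqlite3, keeping only the topic columns this lookup needs; the ids come from the example, and datetime('now') is a stand-in for the {date} placeholder. The describe() helper is hypothetical.

```python
import sqlite3

# Reduced topic table: only the columns used by this lookup.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE topic (
        id                          INTEGER PRIMARY KEY,
        name                        TEXT NOT NULL,
        type                        INTEGER REFERENCES topic(id),
        meta_date_created           TEXT,
        meta_date_created_topic_ref INTEGER REFERENCES topic(id)
    )""")
# Insert in dependency order so every reference already exists.
conn.execute("INSERT INTO topic (id, name) VALUES (12341234, 'Person')")
conn.execute("INSERT INTO topic (id, name, type) VALUES (5656, 'Billy Bob', 12341234)")
conn.execute("""
    INSERT INTO topic (id, name, type, meta_date_created, meta_date_created_topic_ref)
    VALUES (4572349857, 'Alexander Johannesen', 12341234, datetime('now'), 5656)""")

def describe(conn: sqlite3.Connection, topic_id: int) -> tuple:
    """Return (name, type name, creator name, creator's type name) via self-joins."""
    return conn.execute("""
        SELECT t.name, tt.name, c.name, ct.name
        FROM topic AS t
        JOIN topic AS tt ON tt.id = t.type
        JOIN topic AS c  ON c.id  = t.meta_date_created_topic_ref
        JOIN topic AS ct ON ct.id = c.type
        WHERE t.id = ?""", (topic_id,)).fetchone()

print(describe(conn, 4572349857))
# ('Alexander Johannesen', 'Person', 'Billy Bob', 'Person')
```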
The conceptual underpinning here is that each 'thing' (deliberately vague; it could be anything you want to talk about) in your system gets an entry, including actions:
id: 34598067
name: Add new user
type: 56987 // another topic, called 'Action' for example
With all this in place, your log basically creates relationships between these entities through the 'Assoc' table:
id: 45673
type: 45685678
That's the association itself. The 'id' is arbitrary and not important, but the type is (you guessed it) another entity in the 'Topic' table:
id: 45685678
name: Did action
Now you fill the 'Assoc member' table with the details of the logged action:
id: {whatever}
topic_ref: 5656
role_topic_ref: 12341234
The first member is Billy Bob, who plays the role of 'Person'. Next:
id: {whatever}
topic_ref: 34598067
role_topic_ref: 56987
Here, the topic 'Add new user' plays the role of 'Action'. You can extend this association with as many items as you need, such as the pre-state, the result of the action, the number of tries so far, or where the action took place (for example, if it's a function people can perform on a number of pages). Create entities for those things in the Topic table, create entities for their relationships, and you can make this as complex as you want.
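A sketch of recording "Billy Bob did 'Add new user'" with Python's sqlite3, using the topic ids from the example. Assumptions: a hypothetical assoc_ref column links each member to its association, SQLite assigns the assoc id (rather than the fixed 45673 above), and the log_action()/members() helpers are illustrative names, not part of Topic Maps.

```python
import sqlite3

# Assumes a hypothetical assoc_ref column tying each member to its Assoc row.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE topic (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        type INTEGER REFERENCES topic(id));
    CREATE TABLE assoc (id INTEGER PRIMARY KEY,
                        type INTEGER NOT NULL REFERENCES topic(id));
    CREATE TABLE assoc_member (
        id             INTEGER PRIMARY KEY,
        assoc_ref      INTEGER NOT NULL REFERENCES assoc(id),
        topic_ref      INTEGER NOT NULL REFERENCES topic(id),
        role_topic_ref INTEGER NOT NULL REFERENCES topic(id));
""")
# Topics from the example; 'Action' and 'Did action' are left untyped here.
conn.executemany("INSERT INTO topic (id, name, type) VALUES (?, ?, ?)", [
    (12341234, "Person", None),
    (56987, "Action", None),
    (45685678, "Did action", None),
    (5656, "Billy Bob", 12341234),
    (34598067, "Add new user", 56987),
])

def log_action(conn, person_id, action_id, assoc_type=45685678,
               person_role=12341234, action_role=56987) -> int:
    """Record one log entry: an assoc plus one member per role played."""
    assoc_id = conn.execute("INSERT INTO assoc (type) VALUES (?)",
                            (assoc_type,)).lastrowid
    conn.executemany(
        "INSERT INTO assoc_member (assoc_ref, topic_ref, role_topic_ref) "
        "VALUES (?, ?, ?)",
        [(assoc_id, person_id, person_role), (assoc_id, action_id, action_role)])
    return assoc_id

def members(conn, assoc_id):
    """List (member name, role name) for one association."""
    return conn.execute("""
        SELECT t.name, r.name
        FROM assoc_member AS m
        JOIN topic AS t ON t.id = m.topic_ref
        JOIN topic AS r ON r.id = m.role_topic_ref
        WHERE m.assoc_ref = ?
        ORDER BY m.id""", (assoc_id,)).fetchall()

entry = log_action(conn, 5656, 34598067)
print(members(conn, entry))  # [('Billy Bob', 'Person'), ('Add new user', 'Action')]
```

Adding pre-state, results or location would mean more role topics and more member rows per association, with no schema change.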
All of this may seem a bit jarring at first, but it is incredibly flexible, and you don't have to change your data model at all for future extensions. I've built systems using this model for many years, and I have nothing but praise for it. A separate table for topic properties will follow the model for association members if you want to go down that path.
One could perhaps raise performance concerns about having so few tables, but in my experience most RDBMSs are brilliant with inner joins, which are the basic tool you need to make this work (all identifier fields are obvious index candidates). The good thing is that this is also mostly compatible with NoSQL ways of thinking, creating a sufficient abstraction between you and your data, and between you and the SQL and other technical mechanics the back-end uses.
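To illustrate the "inner joins plus indexed identifiers" point, here is a sketch (Python sqlite3) that indexes the identifier columns and answers "which actions did this person perform?" with inner joins only. Assumptions: the hypothetical assoc_ref column again, a made-up second action topic 'Edit story' (id 1) and a made-up second association 45674, plus the illustrative helper actions_by().

```python
import sqlite3

# Identifier columns are indexed so the joins stay cheap.
# Hypothetical: assoc_ref column, topic 1 'Edit story', assoc 45674.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE topic (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        type INTEGER REFERENCES topic(id));
    CREATE TABLE assoc (id INTEGER PRIMARY KEY,
                        type INTEGER NOT NULL REFERENCES topic(id));
    CREATE TABLE assoc_member (
        id             INTEGER PRIMARY KEY,
        assoc_ref      INTEGER NOT NULL REFERENCES assoc(id),
        topic_ref      INTEGER NOT NULL REFERENCES topic(id),
        role_topic_ref INTEGER NOT NULL REFERENCES topic(id));
    CREATE INDEX ix_topic_type   ON topic(type);
    CREATE INDEX ix_assoc_type   ON assoc(type);
    CREATE INDEX ix_member_assoc ON assoc_member(assoc_ref);
    CREATE INDEX ix_member_topic ON assoc_member(topic_ref, role_topic_ref);

    INSERT INTO topic VALUES (12341234, 'Person', NULL), (56987, 'Action', NULL),
                             (45685678, 'Did action', NULL),
                             (5656, 'Billy Bob', 12341234),
                             (34598067, 'Add new user', 56987),
                             (1, 'Edit story', 56987);
    INSERT INTO assoc VALUES (45673, 45685678), (45674, 45685678);
    INSERT INTO assoc_member (assoc_ref, topic_ref, role_topic_ref) VALUES
        (45673, 5656, 12341234), (45673, 34598067, 56987),
        (45674, 5656, 12341234), (45674, 1, 56987);
""")

def actions_by(conn, person_id, did_action=45685678,
               person_role=12341234, action_role=56987):
    """Names of actions a person performed, using only inner joins."""
    return [name for (name,) in conn.execute("""
        SELECT act.name
        FROM assoc_member AS who
        JOIN assoc        AS a    ON a.id = who.assoc_ref AND a.type = ?
        JOIN assoc_member AS what ON what.assoc_ref = a.id
                                 AND what.role_topic_ref = ?
        JOIN topic        AS act  ON act.id = what.topic_ref
        WHERE who.topic_ref = ? AND who.role_topic_ref = ?
        ORDER BY a.id""", (did_action, action_role, person_id, person_role))]

print(actions_by(conn, 5656))  # ['Add new user', 'Edit story']
```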
I do recommend the second design you describe. If you want to get all the columns of each log subtype table, you can use a LEFT OUTER JOIN:
SELECT *
FROM Person_Log AS p
INNER JOIN Log AS l ON p.Log_ID = l.Log_ID
LEFT OUTER JOIN Log_Comment AS lc ON l.Log_Type = 'C' AND l.Log_ID = lc.Log_ID
WHERE p.Person_ID = 1234;
It's not a good idea to do this for all log types in one query, because if you have more than one log entry of a given type, it causes a cartesian product. So do a separate query per log subtype.
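A sketch of the one-query-per-subtype approach with Python's sqlite3, using the answer's table and column names; the sample rows and the entries_for() helper are hypothetical. As a variation on the query above, each per-subtype query uses an inner join plus a Log_Type filter, so it returns only rows of that subtype (the LEFT OUTER JOIN form instead returns every log row, with NULLs where the subtype doesn't match).

```python
import sqlite3

# Table/column names follow the answer; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Log (Log_Id INTEGER PRIMARY KEY, Log_Type TEXT NOT NULL,
                      Performed_by_Person_Id INTEGER NOT NULL);
    CREATE TABLE Log_Comment (Log_Id INTEGER PRIMARY KEY REFERENCES Log(Log_Id),
                              Comment_Id INTEGER NOT NULL);
    CREATE TABLE Log_Photo (Log_Id INTEGER PRIMARY KEY REFERENCES Log(Log_Id),
                            Photo_Id INTEGER NOT NULL);
    CREATE TABLE Person_Log (Person_Id INTEGER NOT NULL,
                             Log_Id INTEGER NOT NULL REFERENCES Log(Log_Id),
                             PRIMARY KEY (Person_Id, Log_Id));
    INSERT INTO Log VALUES (1, 'C', 7), (2, 'P', 7), (3, 'C', 8);
    INSERT INTO Log_Comment VALUES (1, 100), (3, 101);
    INSERT INTO Log_Photo VALUES (2, 200);
    INSERT INTO Person_Log VALUES (1234, 1), (1234, 2), (1234, 3);
""")

# Table names can't be bound as SQL parameters, so they only ever come
# from this fixed whitelist, never from user input.
SUBTYPES = {"C": ("Log_Comment", "Comment_Id"), "P": ("Log_Photo", "Photo_Id")}

def entries_for(conn, person_id, log_type):
    table, detail = SUBTYPES[log_type]
    sql = f"""
        SELECT l.Log_Id, l.Performed_by_Person_Id, s.{detail}
        FROM Person_Log AS p
        INNER JOIN Log AS l ON p.Log_Id = l.Log_Id
        INNER JOIN {table} AS s ON s.Log_Id = l.Log_Id
        WHERE p.Person_Id = ? AND l.Log_Type = ?
        ORDER BY l.Log_Id"""
    return conn.execute(sql, (person_id, log_type)).fetchall()

# One query per subtype; the application merges the results.
feed = {code: entries_for(conn, 1234, code) for code in SUBTYPES}
print(feed)  # {'C': [(1, 7, 100), (3, 8, 101)], 'P': [(2, 7, 200)]}
```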
You can also use constraints so you're sure only one subtype row across all tables references a given row in Log:
Log
---
Log_Id
Log_Type constrained to ('C', 'P', etc.)
Performed_by_Person_Id
UNIQUE KEY (Log_Id,Log_Type)
Log_Comment
-----------
Log_Id PRIMARY KEY
Log_Type constrained to only 'C'
Comment_Id
FOREIGN KEY (Log_Id,Log_Type) REFERENCES Log(Log_Id,Log_Type)
Log_Photo
---------
Log_Id PRIMARY KEY
Log_Type constrained to only 'P'
Photo_Id
FOREIGN KEY (Log_Id,Log_Type) REFERENCES Log(Log_Id,Log_Type)
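One possible translation of the constraint sketch above into real DDL, run through Python's sqlite3. Column types and demo rows are assumptions, and try_insert() is a hypothetical helper. The CHECK constraints pin each child table's Log_Type, and the composite FK against UNIQUE (Log_Id, Log_Type) means a Log row typed 'C' can only be referenced from Log_Comment.

```python
import sqlite3

# SQLite version of the constraint sketch; types and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FKs are off by default in SQLite
conn.executescript("""
    CREATE TABLE Log (
        Log_Id                 INTEGER PRIMARY KEY,
        Log_Type               TEXT NOT NULL CHECK (Log_Type IN ('C', 'P')),
        Performed_by_Person_Id INTEGER NOT NULL,
        UNIQUE (Log_Id, Log_Type)
    );
    CREATE TABLE Log_Comment (
        Log_Id     INTEGER PRIMARY KEY,
        Log_Type   TEXT NOT NULL DEFAULT 'C' CHECK (Log_Type = 'C'),
        Comment_Id INTEGER NOT NULL,
        FOREIGN KEY (Log_Id, Log_Type) REFERENCES Log (Log_Id, Log_Type)
    );
    CREATE TABLE Log_Photo (
        Log_Id   INTEGER PRIMARY KEY,
        Log_Type TEXT NOT NULL DEFAULT 'P' CHECK (Log_Type = 'P'),
        Photo_Id INTEGER NOT NULL,
        FOREIGN KEY (Log_Id, Log_Type) REFERENCES Log (Log_Id, Log_Type)
    );
""")

def try_insert(sql, params=()):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

try_insert("INSERT INTO Log VALUES (1, 'C', 7)")
ok_comment = try_insert("INSERT INTO Log_Comment (Log_Id, Comment_Id) VALUES (1, 100)")
# (1, 'P') does not exist in Log, so the composite FK rejects this row.
bad_photo = try_insert("INSERT INTO Log_Photo (Log_Id, Photo_Id) VALUES (1, 200)")
# Forcing 'C' into Log_Photo trips the CHECK constraint instead.
bad_check = try_insert("INSERT INTO Log_Photo VALUES (1, 'C', 200)")
print(ok_comment, bad_photo, bad_check)  # True False False
```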
Re your comment:
This is basically the same as the gen-spec design that @Walter Mitty mentions.
It's also related to a Martin Fowler pattern, Class Table Inheritance.
The extra Log_Type column in each child table is necessary if you want to use referential integrity to ensure only one child table's row references a given row in Log.
I would just have one log table with the affected people, an actionID column and the item_id.
Then, in your front end, you can display the notification based on the actionID. For example, actionID 1 could mean a photo, so the item_id would be the photoID.
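A sketch of that single-table approach in Python with sqlite3: one log table holding the affected person, who performed the action, an action_id and an item_id, plus a lookup that turns action_id into display text. Mapping 1 to "photo" follows the answer; 2 = "comment", the column names, the sample rows and notifications() are hypothetical.

```python
import sqlite3

# action_id -> label; 1 = photo per the answer, 2 = comment is hypothetical.
ACTION_LABELS = {1: "photo", 2: "comment"}

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE log (
        id                     INTEGER PRIMARY KEY,
        affected_person_id     INTEGER NOT NULL,
        performed_by_person_id INTEGER NOT NULL,
        action_id              INTEGER NOT NULL,
        item_id                INTEGER NOT NULL
    )""")
conn.executemany(
    "INSERT INTO log (affected_person_id, performed_by_person_id, action_id, item_id) "
    "VALUES (?, ?, ?, ?)",
    [(1234, 7, 1, 200), (1234, 8, 2, 100), (999, 7, 1, 201)])

def notifications(conn, person_id):
    """Build notification strings for one person from the single log table."""
    rows = conn.execute(
        "SELECT performed_by_person_id, action_id, item_id FROM log "
        "WHERE affected_person_id = ? ORDER BY id", (person_id,))
    out = []
    for who, action_id, item_id in rows:
        # The front end decides what item_id means based on action_id.
        label = ACTION_LABELS.get(action_id, f"action {action_id}")
        out.append(f"person {who}: {label} #{item_id}")
    return out

print(notifications(conn, 1234))  # ['person 7: photo #200', 'person 8: comment #100']
```

The trade-off is that item_id can't be declared as a foreign key, because it points at a different table depending on action_id; the database can't check that a referenced photo or comment actually exists.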