Databases: Making a Log of actions, how to handle various references?
hope you all had a happy new year.
So, my question is, what's the best way to make a log of actions. Let me explain it with a example, suppose we have these entities:
User
Friend
(User is a friend of another User, many to many relationship)
Message
(An user can message another user)
Group
(An user can be in various groups)
Game
(A game can be played with various players, has some info like date of the game. this results in two tales, games and games_users, the latter stores a relationship between user and a game)
Now, I wanted to make a log, for example:
User A (link to user) made a new friend, User B (link to user)
User A (link to user), B (link to user) and C (link to user) played a game (link to game)
User C (link to user) joined a group D (link to group)
So, I wanted to make a flexible log, that could store as many references as I wanted and references to different entities (user and game for example).
I know two ways of doing this, but they all have one or more problems:
When logging an action I directly store the pure text I want (i.e: only 1 char field, which would store 'User C joined a group'). But, there is a problem this way, this text needs to be translated to other languages and I can not have a field for each language.
Having a main table
log
, which each rows represent a log action and a code so I know which action is that, i.e: an user joined a group, x users played a game. I then have another table for each of the foreign key types needed, so I'd havelog_user
,log_group
andlog_game
For example,log_user
with a field referencinglog
and another referencinguser
. This way I 开发者_如何学Ccan have multiple users for a same log action. Problems: rather complex and could result in substantial overhead as depending of the log action I'd have to query to multiple tables. Is this correct, would it be too cpu-intensive?
So, I'm open to new ideas and brainstorming. What's the best approach for this kind of problem? Thanks in advance, I hope I have explained it in a clear way. If there is any question, please ask.
Edit: I decided to start a bounty as I'm not really happy with the answers I have received. Will make any clarifications if needed. Thanks
I want something very similar to facebook/orkut/social networks "friend updates". This will be displayed to users.
The following is how I would do it. I have some more comments at the bottom after you have seen the schema.
Log
LogID - unique log ID
Time - date/time of event
LogType - String or ID
(side comment, I would go with an id here so you can use a message table shown below, but if you want quick n dirty you can just just a unique string for each log time (eg "Game Started", "Message Sent", etc)
LogActor
LogID - external key
LogActorType - String or ID (as above, if ID you will need a lookup table)
LogActorID - This is a unique id to the table for the type eg User, Group, Game
Sequence - this is an ordering of the actors.
LogMessage
LogType - exernal key
Message - long string (varchar(max)?)
Language - string(5) so you can key off different language eg "US-en"
Example Data (using your 3 examples)
Log
ID Time LogType
1 1/1/10 1
2 1/1/10 2
3 1/1/10 3
LogActor
LogID LogActorType LogActorID Sequence
1 User 1 1
1 User 2 2
2 User 1 1
2 User 2 2
2 User 2 3
2 Game 1 4
3 User 3 1
3 Group 1 2
LogMessage
LogType Message
1 {0} Made a new friend {1}
2 {0}, {1}, {2} played a game ({3})
3 {0} joined a group ({1})
User
ID Name
1 User A
2 User B
3 User C
Game
ID Name
1 Name of game
Group
ID Name
1 Name of group
So here are the nice things about this design.
It is very easy to extend
It handles multi-language issues independent of the actors
It is self documenting, the LogMessage table explains exactly what the data you are storing should say.
Some bad things about it.
You have to do some complicated processing to read the messages.
You can't just look at the DB and see what has happened.
In my experience the good parts of this kind of a design outweigh the bad bits. What I have done to allow me to do a quick n dirty look at the log is make a view (which I don't use for the application code) that I can look at when I need to see what is going on via the back end.
Let me know if you have questions.
Update - Some example queries
All of my examples are in sqlserver 2005+, let me know if there is a different version you want me to target.
View the LogActor table (There are a number of ways to do this, the best depends on many things including data distribution, use cases, etc) Here are two:
a)
SELECT
LogId,
COLLESCE(U.Name,Ga.Name,Go.Name) AS Name,
Sequence
FROM LogActor A
LEFT JOIN User U ON A.LogActorID = U.[ID] AND LogActorType = "User"
LEFT JOIN Game Ga ON A.LogActorID = Ga.[ID] AND LogActorType = "Game"
LEFT JOIN Group Go ON A.LogActorID = Go.[ID] AND LogActorType = "Group"
ORDER BY LogID, Sequence
b)
SELECT
LogId,
U.Name AS Name,
Sequence
FROM LogActor A
INNER JOIN User U ON A.LogActorID = U.[ID] AND LogActorType = "User"
UNION ALL
SELECT
LogId,
Ga.Name AS Name,
Sequence
FROM LogActor A
INNER JOIN Game Ga ON A.LogActorID = Ga.[ID] AND LogActorType = "Game"
UNION ALL
SELECT
LogId,
Go.Name AS Name,
Sequence
FROM LogActor A
INNER JOIN Group Go ON A.LogActorID = Go.[ID] AND LogActorType = "Group"
ORDER BY LogID, Sequence
In general I think a) is better than b) For example if you are missing an actor type a) will include it (with a null name). However b) is easier to maintain (because the UNION ALL statements make it more modular.) There are other ways to do this (eg CTE, views, etc). I'm inclined to doing it like b) and from what I've seen that seems to be at least standard practice if not best practice.
So, the last 10 items in the log would looks something like this:
SELECT
LogId,
M.Message,
COLLESCE(U.Name,Ga.Name,Go.Name) AS Name,
Time,
A.Sequence
FROM Log
LEFT JOIN LogActor A ON Log.LogID = A.LogID
LEFT JOIN User U ON A.LogActorID = U.[ID] AND LogActorType = "User"
LEFT JOIN Game Ga ON A.LogActorID = Ga.[ID] AND LogActorType = "Game"
LEFT JOIN Group Go ON A.LogActorID = Go.[ID] AND LogActorType = "Group"
LEFT JOIN LogMessage M ON Log.LogType = M.LogMessage
WHERE LogID IN (SELECT Top 10 LogID FROM Log ORDER BY Date DESC)
ORDER BY Date, LogID, A.Sequence
NB - As you can see, it is easier to select all log items from a date than the last X, because we need a (probably very fast) sub-query for this.
My answer to Whats a better strategy for storing log data in a database?:
It depends on the purpose of logging. For debugging and general monitoring purpose, a single log table with dynamic log level would be helpful so you can chronologically look at what the system is going through.
On the other hand, for audit trail purpose, there's nothing like having duplicate table for all tables with every CRUD action. This way, every information captured in the payment table or whatever would be captured in your audit table.
So, the answer is both.
Edit: To implement it cleanly with referential integrity and have all the flexibility, I suggest having duplicate audit trail table for all CRUDs for each table even if it's "heavy." The business rules are more volatile compared to data structures anyway, so by keeping log logic in code/query you retain the flexibility. For example, suppose you decided not to track when the users left a group. Later day, the clients asked that it's very important to track the information. All you have to do now is change the query so deletion of user_group
record is part of the result.
Do you need this for logging/tracking purposes, or for display to users and admin? If your use for logging/tracking (i.e. computer readable), you should probably separate your logging into multiple tables like you specified.
However, if you want this for your users or display on screen, why not just store it in basic html? This way you can easily display it on screen and view.
For example, "User A (link to user), B (link to user) and C (link to user) played a game (link to game)" would be
<a href="/users/showuser.php?id=2341">User A</a>
, <a href="/users/showuser.php?id=311">User B</a>
, and <a href="/users/showuser.php?id=89">User C</a>
played a game of <a href="/games/gameoverview.php?id=3">Chess</a>.
- Keep it simple and extendible
- Don't let translation overhead affect performance, translation only needs to be done for output purposes.
Suggestion:
LogId DateTime Action Role Entity
e.g.
30303 1/1/10 43 Sender John
30303 1/1/10 43 Receiver Sam
30304 1/1/10 44 Game game43
30304 1/1/10 44 Player Sue
30304 1/1/10 44 Player Mike
(In the above table, "Message", ,"Sender", "John", "game43" etc. would not be text but would be foreign keys in either the action, role, or entity table. I've written the keys for "Action" but not for "Role" or "Entity" but they would be keys as well.
Now, instead of text action, Role, Entity you might have keys in there, and store them in a separate table. This can be used for output, e.g.
Action Table
Id ActionKey Text Language
1 43 JoinGame English
2 43 Jeu French
3 44 Message English
...
...
Role Table
Id RoleKey Text Language
1 1 Sender English
2 1 Sendeur French (I don't know french :)
....
Entity Table
EntityKey Text
1 Sam
2 game43
3 Sam
Note that in the entity table, 2 or more entries might have the same text representation, as there could be more than 1 user named Sam. If you want to represent different, orthogonal information about each entity, then you can include the EntityKey in the correspondig table, e.g.
Person Table
Id EntityKey FirstName LastName ....
1 1 Sam Johnson
Game Instance Table
Id EntityKey GameType
1 2 444
Game Table
Id Name MaxPlayers ...
444 Quake 10
Basically we are mapping a predicate (in the first-order predicate logic sense, a set of tuples) to binary relation form (by creating an artificial entity, the action key). Thus, the Entity table basically contains various columns/arguments of the relation, and so it can be practically anything which could be an argument of the relation. The advantage of this representation is it is infinitely extendible to new relations that you may wish to log without changing the schema. Everything in the ActionTable should be a key or foreign key, I just haven't put that in there because it may be harder to read.
精彩评论