What is the best log file format? [closed]
We are developing a database tool and we'd like to write a log file in a format which is extendable and easy to be imported into a database table. We all feel that filtering this info using SQL is a good idea, since the log will be a long file and plain text search may not be good enough. Could you give me some suggestions? Any experiences will be useful too! Thanks in advance.
The first thing I would say is that your file format ought to be human readable. My reasons are given here: Why should I use a human readable file format.
Beyond that, a question this vague is impossible to answer definitively. However, here are some of the issues you should consider:
- How big will this log file grow? How does that compare to the space you have? If space is going to be an issue, then a more parsimonious format is better, e.g. Protocol Buffers.
- How is the log file going to be viewed? If it will be read with specific tools, the format matters less than if you are going to open it in a text editor or Excel.
- What sort of data are you storing? If it is just ASCII text then CSV works well.
- Is type information important in your data? Do you need to compare numbers and dates as numbers and dates rather than just strings? If so, then some sort of typed format (e.g. XML or JSON) might be better (see the short comparison after this list).
- Is the data going to be transferred to other people? In that case, something with good language tools for reading and writing might be important.
- How quickly does the data need to be written? If speed is an issue (which it might be for realtime log files), then a format optimised for writing might be important.
- How quickly does the data need to be read?
- Will all the data need to be in memory, or can it be streamed and processed one record at a time?
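To make the typed-versus-untyped point concrete, here is a small sketch contrasting a tab-separated row with a typed JSON Lines entry for the same record; the field names are illustrative, not a prescribed schema.

```python
# Contrast an untyped tab-separated row with a typed JSON Lines entry.
import csv
import io
import json
from datetime import datetime, timezone

record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "level": "ERROR",
    "module": "billing",
    "duration_ms": 42,  # stays a number in JSON, degrades to "42" in CSV
    "message": "payment failed, retrying",
}

# Tab-separated: compact and spreadsheet-friendly, but every value is a string.
buf = io.StringIO()
csv.writer(buf, delimiter="\t").writerow(record.values())
print(buf.getvalue(), end="")

# JSON Lines: one JSON object per line keeps the types and is still appendable.
print(json.dumps(record))
```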
When you can answer all these questions, you'll probably know the answer yourself. If not, make your question more specific with these questions answered and it will be easier for someone to help you.
Personally I've always been grateful when log data has been written as CSV. It is flexible enough to expand (add extra columns, change the length of a field), is quick to read and write into a database, a spreadsheet, and hundreds of other tools, and is codeable in seconds. However, it does have a number of disadvantages: it is verbose, it is easy to get escaping wrong (see the example below), it is untyped, and it is easy to break if you rearrange columns.
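A minimal sketch of writing CSV log lines with Python's standard library so the escaping is handled for you; hand-rolled string joins are where most CSV logs go wrong. The fields are illustrative.

```python
import csv

with open("app.log.csv", "a", newline="") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
    # Embedded delimiters, quotes, and newlines get quoted correctly,
    # which is exactly the part naive string formatting breaks.
    writer.writerow(
        ["2024-01-01T12:00:00Z", "WARN", 'user said "hi, there"\nand left']
    )
```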
We have found that logs tend to be a serious performance headache. Creating a log that does not slow down your public website is challenging.
If you have a large log and want to be able to run SQL queries against it without them being slow, you will need indexes on some of the columns. But every index you add slows down the insertion of new log entries, which causes load problems under high traffic; the small benchmark sketch below illustrates the trade-off.
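Here is a small self-contained illustration of that trade-off, using sqlite3 as a stand-in for MySQL (the effect differs in degree across engines, but is the same in kind); the schema and row counts are assumptions for demonstration only.

```python
# Time bulk inserts into a log table with and without indexes.
# Absolute numbers will vary by machine; the gap is the point.
import sqlite3
import time

def time_inserts(with_indexes):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE log (ts TEXT, level TEXT, msg TEXT)")
    if with_indexes:
        con.execute("CREATE INDEX idx_ts ON log (ts)")
        con.execute("CREATE INDEX idx_level ON log (level)")
    start = time.perf_counter()
    con.executemany(
        "INSERT INTO log VALUES (?, ?, ?)",
        (("2024-01-01", "INFO", f"entry {i}") for i in range(100_000)),
    )
    con.commit()
    return time.perf_counter() - start

print(f"no indexes:  {time_inserts(False):.2f}s")
print(f"two indexes: {time_inserts(True):.2f}s")
```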
Our technique is:
- use a basic plain text file with simple formatting as the log file (e.g. tab-separated)
- do not use XML; it makes things more complex (i.e. slow) without any benefit
- the website uses UNIX file locking to simply append a single line for each log entry (a sketch of both this append and the cron-side processing follows this list)
- a cron job inserts the contents of the log into an SQL database (we use MySQL, but it's up to you) every 10 minutes.
- this cron job processes the file one line at a time, using UNIX file locking to prevent writes to the log while it is being processed, but releasing the lock after each line is processed and deleted from the file, so the public site gets a chance to hit the log between lines (how to do this in your preferred language would be a nice second question for Stack Overflow)
- the cron job has a timeout of 5 minutes (so in every 10-minute window it spends at most 5 minutes processing the log; this ensures the server does not get stuck processing the log file indefinitely if there are performance issues)
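Here is a minimal sketch of both halves of this technique, assuming Linux and CPython: fcntl.flock provides the UNIX file locking, and sqlite3 stands in for MySQL so the example is self-contained (swap in your own driver). The file names, column layout, and rewrite-the-file strategy are my assumptions, not the original implementation.

```python
import fcntl
import sqlite3
import time

LOG_PATH = "site.log"

def append_log_line(*fields):
    """What the website does: lock, append one tab-separated line, unlock."""
    line = "\t".join(str(f) for f in fields) + "\n"
    with open(LOG_PATH, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # exclusive lock for the append
        f.write(line)
        f.flush()
        fcntl.flock(f, fcntl.LOCK_UN)

def process_log(db, budget_seconds=300):
    """What the cron job does: consume one line per lock acquisition so the
    website can interleave appends, and stop after the 5-minute budget."""
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        with open(LOG_PATH, "r+") as f:
            fcntl.flock(f, fcntl.LOCK_EX)
            lines = f.readlines()
            if not lines:
                return  # log fully drained
            db.execute(
                "INSERT INTO log (ts, level, msg) VALUES (?, ?, ?)",
                lines[0].rstrip("\n").split("\t", 2),
            )
            db.commit()
            # Rewrite the file without the consumed line, then release the
            # lock so the site can append again. (Rewriting the remainder
            # keeps the sketch simple; a production version might track a
            # byte offset instead.)
            f.seek(0)
            f.writelines(lines[1:])
            f.truncate()
            fcntl.flock(f, fcntl.LOCK_UN)

if __name__ == "__main__":
    db = sqlite3.connect("log.db")
    db.execute("CREATE TABLE IF NOT EXISTS log (ts TEXT, level TEXT, msg TEXT)")
    append_log_line("2024-01-01T12:00:00Z", "INFO", "hello from the website")
    process_log(db)
```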
This technique gives us fast recording of log entries without sacrificing the indexes on the log table, so SQL queries against it stay fast as well.
We have been using this for about 6 or 7 years on various CentOS servers, and it has been rock solid. Depending on the operating system and how it is configured, this might not be a good way to create log files, but it has worked great in our testing.
PS: I don't see any point in making the file human readable. You will only ever read it during debugging, and then you'll never touch it again.
We are developing a database tool and we'd like to write a log file in a format which is extendable and easy to be imported into a database table. We all feel that filtering this info using SQL is a good idea, since the log will be a long file and plain text search may not be good enough. Could you give me some suggestions?
Assuming you have some reason for not inserting directly into a database table...
"extendable"
- you may want to have metadata (field names and/or types) in the files themselves
- this could allow you to make a generic and largely future-proof DB import tool that creates and populates a database structure based on the log file itself (rather than something tightly coupled that needs to be edited as the log file format evolves); see the importer sketch after this list
- a record logging format that supports hierarchical structure can be extended more easily and cleanly
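A sketch of that idea: the first line of the log carries the field names and types, and a generic importer builds the table from it, so a new column in the log needs no change to the importer. The "name:type" header syntax and the sqlite3 target are assumptions for illustration, not a standard.

```python
# Generic importer: reads a header line like
#   ts:TEXT<TAB>level:TEXT<TAB>duration_ms:INTEGER<TAB>msg:TEXT
# then creates and populates a matching table. Don't interpolate untrusted
# header text into SQL like this outside of a sketch.
import sqlite3

def import_log(path, db, table="log"):
    with open(path) as f:
        header = f.readline().rstrip("\n")
        cols = [c.split(":", 1) for c in header.split("\t")]
        col_defs = ", ".join(f"{name} {ctype}" for name, ctype in cols)
        db.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")
        placeholders = ", ".join("?" for _ in cols)
        for line in f:
            db.execute(
                f"INSERT INTO {table} VALUES ({placeholders})",
                line.rstrip("\n").split("\t"),
            )
    db.commit()

db = sqlite3.connect("imported.db")
import_log("typed.log", db)  # assumes typed.log exists with a header as above
```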
"easy to be imported"
- you either want some very common format supported by 3rd party tools/libraries (XML, CSV, SQL insert statements or whatever table dump format your SQL tools support) or something very simple you can easily write and maintain
XML is the obvious choice, the potential negatives being:
- verbosity
- performance
- readability
None of which you've expressed concern about at the time I started writing this.
Any experiences will be useful too!
We use a combination of XML and other formats in our logs (some objects have XML serialisation routines but the overall file is not XML)... it's a pain because you can't use XML tools on the file as a whole, and the format's complex enough to frustrate easy and reliable parsing without proper tools. So, go the whole hog or not at all.
As I don't know exactly how it will be stored, in a database or somewhere else, I would set up a machine-readable format and make it interpretable by tools that can inject it into a database or generate a document from it.
For example, I would define a simple XML format, or something more human-readable if humans need to read the initial format directly. Otherwise, I would use XML.
The document would provide, at a minimum, a date-time, module name, log level and message. Other information can be added, and may simply be ignored by the conversion tools.
Then I would write a conversion tool for the database, maybe some Python scripts, that parses the XML file and injects the data into the database. That tool depends entirely on the context.
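A minimal sketch of such a conversion tool, assuming an element-per-entry XML layout (the tag and attribute names below are illustrative) and sqlite3 as the target database:

```python
# Parse entries like
#   <log><entry datetime="..." module="..." level="...">message</entry></log>
# and inject them into a database table.
import sqlite3
import xml.etree.ElementTree as ET

def inject(xml_path, db_path):
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS log "
        "(datetime TEXT, module TEXT, level TEXT, message TEXT)"
    )
    for entry in ET.parse(xml_path).getroot().iter("entry"):
        db.execute(
            "INSERT INTO log VALUES (?, ?, ?, ?)",
            (
                entry.get("datetime"),
                entry.get("module"),
                entry.get("level"),
                (entry.text or "").strip(),
            ),
        )
    db.commit()

inject("app_log.xml", "log.db")  # file names are placeholders
```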
I would also maybe write a script to generate an HTML view of the log.
The main idea is to have an interpretable format that can easily be used by different tools. That format would only provide raw information, as much as necessary. That way, the conversion tools decide what is worth keeping, and where and how to put which data from the log.