Compressing large text data before storing into db?
I have an application which retrieves many large log files from systems on a LAN.
Currently I put all the log files into PostgreSQL. The table has a column of type TEXT, and I don't plan to search on this text column because an external process retrieves all the files nightly and scans them for sensitive patterns.
So the column could just as well be a BLOB or a CLOB. My question is the following: the database already has its own compression, but could I improve on it manually, for example with a common compression utility? And above all, what if I pre-compress the large file myself and then store it as binary in the table? Is that pointless, given that the database system provides its own internal compression?
I don't know which would compress the data more efficiently, you or the database; it depends on the algorithm used, among other things. What is certain is that if you compress it yourself, asking the database to compress it again is a waste of CPU. Data that is already compressed gains little or nothing from another pass, and the overhead of each extra pass can eventually make it larger.
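To make the point about compressing twice concrete, here is a minimal sketch using Python's standard `zlib` module. The log format produced by `make_sample_log` is made up for the demo and is only an assumption about what such data might look like; real logs will give different numbers.

```python
import random
import zlib


def make_sample_log(lines=5000, seed=0):
    """Build synthetic log text; the line format is invented for this demo."""
    rng = random.Random(seed)
    levels = ["INFO", "WARN", "ERROR", "DEBUG"]
    rows = [
        "2024-01-01 12:%02d:%02d %s host-%d request id=%d took %d ms"
        % (
            rng.randrange(60),
            rng.randrange(60),
            rng.choice(levels),
            rng.randrange(20),
            rng.randrange(10**6),
            rng.randrange(5000),
        )
        for _ in range(lines)
    ]
    return "\n".join(rows).encode("utf-8")


raw = make_sample_log()
once = zlib.compress(raw, 9)    # first pass removes the redundancy in the text
twice = zlib.compress(once, 9)  # second pass has almost nothing left to remove

print("raw:  ", len(raw))
print("once: ", len(once))
print("twice:", len(twice))
```

On data like this the second pass should come out at roughly the same size as the first, which is why stacking two compressors mostly just burns CPU.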
The internal compression used in PostgreSQL is designed to err on the side of speed, particularly for decompression. Thus, if you don't actually need that, you will be able to reach higher compression ratios if you compress it in your application.
Note also that if the database does the compression, the data will travel between the database and the application server in uncompressed format - which may or may not be a problem depending on your network.
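As others have mentioned, if you do this, be sure to turn off the built-in compression for that column, or you're wasting cycles. In PostgreSQL this can be done by setting the column's storage mode to EXTERNAL (ALTER TABLE ... ALTER COLUMN ... SET STORAGE EXTERNAL), which keeps large values out of line but does not compress them.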
The questions you need to ask yourself are whether you really need more compression than the database provides, and whether you can spare the CPU cycles for it on your application server. The only way to find out how much more compression you can get on your data is to try it out. Unless there's a substantial gain, don't bother with it.
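One way to "try it out" is a small script that runs a few compressors over a representative file and reports size and time. This is only a sketch: it uses codecs from Python's standard library, the `compare_compressors` helper and the placeholder sample are hypothetical, and none of these codecs is the database's own algorithm, so the numbers only show what application-side compression could achieve.

```python
import bz2
import lzma
import time
import zlib


def compare_compressors(data):
    """Return {name: (compressed_size, seconds)} for a few stdlib codecs."""
    codecs = {
        "zlib-1": lambda d: zlib.compress(d, 1),  # fast, lower ratio
        "zlib-9": lambda d: zlib.compress(d, 9),  # slower, higher ratio
        "bz2": bz2.compress,
        "lzma": lzma.compress,
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        size = len(compress(data))
        results[name] = (size, time.perf_counter() - start)
    return results


# Placeholder input: replace with the bytes of one of your real log files.
sample = b"".join(
    b"2024-01-01 12:00:%02d INFO worker-%d finished job %d\n"
    % (i % 60, i % 8, (i * 7919) % 100003)
    for i in range(20000)
)

for name, (size, seconds) in compare_compressors(sample).items():
    ratio = size / len(sample)
    print("%-7s %8d bytes  ratio %.3f  %.3f s" % (name, size, ratio, seconds))
```

Comparing these figures with the on-disk size of the same data stored as plain TEXT gives a rough idea of whether the extra CPU on the application server is worth it.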
My guess is that, since you do not need any searching or querying ability on this column, you could reduce disk usage by zipping the file and then storing the binary data directly in the database.
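A minimal sketch of that approach in Python, assuming a hypothetical table `logs(name text, body bytea)` and a DB-API driver that uses the `%s` parameter style (psycopg2 is one such driver). The function names are illustrative only.

```python
import gzip


def pack_log(text):
    """Compress log text into bytes suitable for a binary column."""
    return gzip.compress(text.encode("utf-8"))


def unpack_log(blob):
    """Reverse pack_log; bytes() also accepts the memoryview some drivers return."""
    return gzip.decompress(bytes(blob)).decode("utf-8")


def store_log(cursor, name, text):
    """Insert one compressed log through a DB-API cursor.

    Assumes a hypothetical table logs(name text, body bytea) and a driver
    that uses the %s parameter style.
    """
    cursor.execute(
        "INSERT INTO logs (name, body) VALUES (%s, %s)",
        (name, pack_log(text)),
    )
```

The nightly scanning process would then read the `body` column and call `unpack_log` on each value before looking for patterns, so the data also crosses the network in compressed form.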