Performance Impact of Storing Fields as Single text-type JSON string vs Splitting to Separate Table
I have a table called Billing
, which is basically a receipt (for different types of transactions). The app has a feature that you can create new charges (well, all charges except for tax and other constants). Since there would be a dynamic number of charges, we decided to store the charges for a billing on a single text field with a JSON structure. So the Charges
column contains stuff like this:
{"CrateFee":50,"DeliveryFee":90,开发者_如何学Python"PackagingFee":20}
{"DeliveyFee":90,"ServiceCharge":200}
Our alternative would be to create a separate table for these charges, with this structure:
Charges
BillingId | ChargeName | ChargeValue
1 CrateFee 50
1 DeliveryFee 90
1 PackagingFee 20
2 DeliveryFee 90
2 ServiceCharge 200
We decided against the second method because it would be populated by tens of thousands of rows in just a single day (estimate is about a thousand transactions in a day). I know we'll be limited with what we can do with the data if we use the first one, so I really want to push the separate table method. But I have no idea on scaling, optimizing, etc. when it comes to databases, so I need some help with this.
Is it OK to use the second method? How will this impact performance? Are there other alternatives?
The first implementation might make it easier to sstore the data in a single line, but you are opening yourself to a whole world of hurt.
By correctly using indexes on the fields, you should not have major issues, so i would recomend the second approach.
Also, at some later stage you can try implementing archiving, which should also help with the size of the second table.
is it ok to use the second method?
I've yet to meet a DBA that would allow for the JSON format to be stored in their database, myself included.
The app has a feature that you can create new charges
In order to be properly normalized, I'd suggest a separate table containing the charge types. Users can still add charge types to it, and you'd use a foreign key to refer to the charge type - just like what you're doing with the billingid
.
The biggest reason to break out the charges into rows would be for ease of accessing the data for reporting/etc. You could still do it with the JSON format, but you'd be looking at string manipulation and because it's freeform text there's a risk that you couldn't group by charge names. Nevermind the performance hit of doing that string manipulation. It's not worth the hassle - do it right, use option 2.
You might want to consider storing the tax (at least the percentage) in the table as well - taxes do change over time, so you'd want to know what tax was at the time of purchase in order to accurately reproduce a bill at a later date.
If you use the first suggestion, your application will be horrible to work with and the performance will be something that will amaze you in it's awfulness and users will come to your cubical to strangle you. (Ok so there's a bit of hyberbole there).
The very first rule of database design is store only one piece of information in each field.
With the first design when you need to know how much each customer has spent by charge type, you won't be able to easily or quickly get this data.
What you need is a table called charge types that hold the type definitions. If the user adds a new one, it goes into this table. Then you have your charges table simliar to what you have in the second example. By storing standard types and making it painful but possible to add one, it becomes less likely that you will 4317 charge types that all mean the same thing. This is important later when you want to do reporting on the data in your database.
You will have significantly better performance on querying the second possiblity than the first as long as you index correctly.
If your organization honestly even considered using the first structure, then you need to hire an experienced dba immediately as there is a clear misunderstanding as to how to design a database. If you expect thousands of daily charges, you need to be considering performance inthe design not tacking it on later. Permature optimization does not mean don't consider optimization at all. In database performance is one of the three most critical things you need to design for (the other two are data integrity and security).
精彩评论