Best way to store this data?
I have just been assigned to renovate an old website, and I get to move it from some old archaic system to drupal. The only problem is that it's a real-estate system and a lot of data is stored. Currently all the information is stored in a single table, an id represents the house and then everything else is key/value pairs. There are a possible 243 keys per estate, there are 23840 estates in the system. As you can imagine the system is slow and difficult to query.
I don't think a table with 243 rows would be a very good idea, and probably worse than the current situation. I've done some investigating and here's what I've found out:
- Missing data does not indicate a 0 value, data is merged from two, unique sources/formats. Some guessing is involved. I have no control over the source of the data.
- There are 4 keys that are common to all estates, all values look like something that is commonly searched for and could be indexed
- There are 10 keys that are in the [90-100)% range
- 8 of these are information like who's selling it, and it's address.
- The other two seem to belong with the below range
- There are 80 keys that are in the [80-90)% range
- This range seems to mostly just list room types and how many the house has (e.g. bedrooms_possible, bathrooms, family_room_3rd, etc)
- This range also includes some minor information like school districts, one or two more开发者_高级运维 pieces of data on the address.
- The 179 keys that are in the [0-80)% range include all sorts of miscellaneous information about the estate
- The keys are not static and some may be added or removed at will by the inputs (although it is rare).
My best idea was a hybrid approach, create a table that stores important, common information and keep a smaller key/value table.
How would you store this information?
Create a relational model
You've already got
Estate
Seller
Address
Room
Estate n--1 Seller
Estate 1--1 Address
Seller 1--1 Address
Estate 1--n Room
For other misc data (dynamic?) create a id / key / value table
Well in one sense you are lucky, the database has done alot of the work of determining what data is required to be stored for you. Its a shame that lazy programmers who didn't want to be bothered to do their jobs set up the original.
I would put together relational tables that store all of the differnt key values (well you can probably combine some together, I'll be there are duplicated values that just have slightly different names) that are in at least 20% of the records. Most of your fields will allow nulls.
I think you are right, define the the most common stuff and then use key values for the few that are added from here on in. OR if you want you can add an XML field or text field for the non-standard info. If it is non-standard, how likely is it to be queried?
When setting up your tables, don;t make them too wide. YOu might want to do create separate tables that have 1-1 relationships rather than have a wide table. This can help with performance due to some techinical specs of how the databse stores pages of data. JUst make sure to put all the required stuff in the main tables not the extended ones.
You need to look into the entity-relationship model. However, if you have already got a suck table, you should check out the five stages of normalization on Wikipedia. They can convert a suck table into a set of good tables.
精彩评论