Database design : two different column in one table or add another table?
I'm developing a web application to store data about car accident on a sql server database.
I need a way to store the roads where the accident happened. initially i thought to have two different column in the accident table (say FirstRoad, SecondRoad) (the second one is used for crossroads). A friend of mine says that is better to have a different Road table .
I don't know which solution is better .
The accident table is estimated to have about 1000 record per year :could the join with another table be expensive in terms of performance?
One goal of the application is to create statistical reports; one of these is the list of the most dangerous roads and probably having two columns in database coulbe be a problem to create that report. What do you think about?
thanks.
Update: Road name is not the only information i need to store. An important data is the street number (if the accident happens within a city) or the mile (outside the city). At this 开发者_如何学JAVAstage i don't have the geographical coordinates.
What about having the Accident table with these fields ?:
AccidentId (PK), Road1Id (FK), <-- point to the Road table storing all road names Road2Id (FK), Road1_StreetNumber_or_Mile, Road2_StreetNumber_or_Mile etc...
I personally would have seperate tables for everything, especially for statistical analysis. A road can have a name, postcode, town, etc, and each of these could help with the analysis. However if all you are storing is the road name, I see no need for you to have a seperate table. In saying that, 1000 records per year is absolutely nothing. And a simple join to another table should have a negligible impact on performance.
It is best to use 2 tables. One with 2 ID columns, one for each road. The second table would have all of the road names. This decreases the actual amount of information you have to entered over the long term and decreases the chance of entering a road name 2 times with a different name.
It will also make reporting fast as you can sum by the ID of the road instead of the name.
There is a whole database theory/math system behind this called Data Normalization. This, http://en.wikipedia.org/wiki/Database_normalization is a good starting point to learning about it.
From todays standards of Database design you want to eliminate redundancies.
So a seperate Road table is definitely a good suggestion.
The next question is whether to modify with or without null values. Null values are generally faster because you don't need cross joins and are preferrable if they are often populated.
As one accident can occour only in one or two roads i would recommend one accident table with roadA, roadB and a road table with roads.
If you only every expect to see two roads maximum then I probably wouldn't normalise road into a separate table but would include Road1 and Road2 fields in the primary table. If however, as may be likely, that you'll see more than 2 roads then you may want to think about a 1 to many relationship between an accident table (1) and a roads table (many).
As you are planning to do statistical analysis then you may have issues as certainly here in the UK road name alone is insufficient to identify a location. Depending on how much data you have on the accident location, you might want to expand how you record the physical location to enable analysis beyond just the road names.
I don't know which solution is better .
Step 1. Buy a good book on relational database design before you do another thing. Seriously. This is a well-understood problem, and you need a lot more background.
Step 2. After learning about ordinary database design, buy Kimball's The Data Warehouse Toolkit.
The accident table is estimated to have about 1000 record per year :could the join with another table be expensive in terms of performance?
That's microscopically small. Seriously. Until you're talking about 100,000's of records, you don't really have a very big database.
One goal of the application is to create statistical reports; one of these is the list of the most dangerous roads and probably having two columns in database could be be a problem to create that report. What do you think about?
Buy Kimball's The Data Warehouse Toolkit immediately.
You have a "fact" table which is an accident. There may be no measurable quantity to the accident. It becomes a strange "fact-less fact table" because you have no measurements associated with the accident. Ideally, you have a fact (like cost or something).
You have dimensions of the accident:
- time
- road (actually two references from the accident fact to this dimension)
- vehicle?
- type of accident (property damage only, injury, death)
- cause (driver error, mechanical problem, etc.)
Each dimension is a separate table, as is the central fact table.
Maybe you can start with something like this -- just to give you an idea.
精彩评论