Is a junction table the correct solution when storing historical data for many attributes?
I'm designing a web application which stores SEO metrics relating to websites. There are around 50 metrics relating to each website which are calculated and stored each day. I need to be able to track changes for each one of these metrics over time. I've designed the following schema based on my understanding of normalization. It seems to be that the junction table (tbl_website_metric) will grow very large very quickly. I'd like to know if this is the optimal schema for this or if I have made any design errors.
CREATE TABLE `tbl_website` (
`id` INT NOT NULL AUTO_INCREMENT ,
`name` VARCHAR(100) NOT NULL ,
`domain` VARCHAR(100) NULL ,
`url` VARCHAR(100) NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
CREATE `tbl_metric` (
`id` INT NOT NULL AUTO_INCREMENT ,
开发者_运维知识库`name` VARCHAR(45) NOT NULL ,
`description` VARCHAR(100) NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
CREATE `tbl_website_metric` (
`id` INT NOT NULL AUTO_INCREMENT ,
`metric_id` INT NOT NULL ,
`website_id` INT NOT NULL ,
`created` TIMESTAMP NULL ,
`value` VARCHAR(45) NULL ,
PRIMARY KEY (`id`) ,
CONSTRAINT `fk_tbl_website_metric_tbl_metric1`
FOREIGN KEY (`metric_id` )
REFERENCES `tbl_metric` (`id` )
CONSTRAINT `fk_tbl_website_metric_tbl_website1`
FOREIGN KEY (`website_id` )
REFERENCES `tbl_website` (`id` ) )
ENGINE = InnoDB;
Your DB design seems good for the scenario; a few suggestions though:
I'm not sure how many websites your application will store stats for, but if not more than a few thousands, consider changing
tbl_website
.id
toSMALLINT UNSIGNED
. That will allow you storing 65535 websitesSimilarly since you have around 50 metrics, it makes sense to change
tbl_metric
.id
toTINYINT UNSIGNED
. That will allow you to store 255 metricsI think the FKs automatically created indexes on respective columns but if not, please consider creating indexes for
tbl_website_metric
.metric_id
andtbl_website_metric
.website_id
Please note that for 1 and 2, you'll also need to change data-types accordingly for tbl_website_metric
.metric_id
and tbl_website_metric
.website_id
.
I'm not sure how large the junction table might grow but MySQL is very capable of handling large tables. Anyhow it will be a good approach to consider purging entries that are obsolete from tbl_website_metric
, or perhaps archive them to another table.
I'd like to suggest an alternative approach as well. If 1) your metrics are pretty static, in the sense that it is not often that metrics are added or deleted, and 2) it's the same metrics for all websites, you might as well consider storing your metrics in columns:
CREATE TABLE `tbl_website_metric` (
`id` INT NOT NULL AUTO_INCREMENT,
`website_id` INT NOT NULL,
`created` TIMESTAMP NULL,
`metric_1` VARCHAR(45) NULL,
`metric_2` VARCHAR(45) NULL,
`metric_3` VARCHAR(45) NULL,
`metric_4` VARCHAR(45) NULL,
...
...
`metric_50` VARCHAR(45) NULL
);
This will mean a single insert and single select per website. Plus will reduce the number of records in the table by N times, where N = number of metrics.
Hope it helps.
that looks pretty good.
indexes on the id's should keep your performance healthy.
Looks okay in general... some small issues, mainly regarding column types:
- there is no reason for an explicit primary key in
tbl_website_metric
IMO. A unique index on (metric_id, website_id, created
) should be enough tbl_website_metric.created
should be aDATE
(saves 1 byte and is necessary for the uniqueness of the index)value
has to be a literal?- you declared foreign key constraints for
metric_id
andwebsite_id
; while this of course has some advantages this also means locking issues, cf http://www.mysqlperformanceblog.com/2006/12/12/innodb-locking-and-foreign-keys/
hth
精彩评论