开发者

Is a junction table the correct solution when storing historical data for many attributes?

I'm designing a web application which stores SEO metrics relating to websites. There are around 50 metrics relating to each website which are calculated and stored each day. I need to be able to track changes for each one of these metrics over time. I've designed the following schema based on my understanding of normalization. It seems to be that the junction table (tbl_website_metric) will grow very large very quickly. I'd like to know if this is the optimal schema for this or if I have made any design errors.

CREATE  TABLE `tbl_website` (
  `id` INT NOT NULL AUTO_INCREMENT ,
  `name` VARCHAR(100) NOT NULL ,
  `domain` VARCHAR(100) NULL ,
  `url` VARCHAR(100) NULL ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

CREATE `tbl_metric` (
  `id` INT NOT NULL AUTO_INCREMENT ,
  开发者_运维知识库`name` VARCHAR(45) NOT NULL ,
  `description` VARCHAR(100) NULL ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

CREATE `tbl_website_metric` (
  `id` INT NOT NULL AUTO_INCREMENT ,
  `metric_id` INT NOT NULL ,
  `website_id` INT NOT NULL ,
  `created` TIMESTAMP NULL ,
  `value` VARCHAR(45) NULL ,
  PRIMARY KEY (`id`) ,
  CONSTRAINT `fk_tbl_website_metric_tbl_metric1`
    FOREIGN KEY (`metric_id` )
    REFERENCES `tbl_metric` (`id` )
  CONSTRAINT `fk_tbl_website_metric_tbl_website1`
    FOREIGN KEY (`website_id` )
    REFERENCES `tbl_website` (`id` ) )
 ENGINE = InnoDB;


Your DB design seems good for the scenario; a few suggestions though:

  1. I'm not sure how many websites your application will store stats for, but if not more than a few thousands, consider changing tbl_website.id to SMALLINT UNSIGNED. That will allow you storing 65535 websites

  2. Similarly since you have around 50 metrics, it makes sense to change tbl_metric.id to TINYINT UNSIGNED. That will allow you to store 255 metrics

  3. I think the FKs automatically created indexes on respective columns but if not, please consider creating indexes for tbl_website_metric.metric_id and tbl_website_metric.website_id

Please note that for 1 and 2, you'll also need to change data-types accordingly for tbl_website_metric.metric_id and tbl_website_metric.website_id.

I'm not sure how large the junction table might grow but MySQL is very capable of handling large tables. Anyhow it will be a good approach to consider purging entries that are obsolete from tbl_website_metric, or perhaps archive them to another table.

I'd like to suggest an alternative approach as well. If 1) your metrics are pretty static, in the sense that it is not often that metrics are added or deleted, and 2) it's the same metrics for all websites, you might as well consider storing your metrics in columns:

CREATE TABLE `tbl_website_metric` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `website_id` INT NOT NULL,
  `created` TIMESTAMP NULL,
  `metric_1` VARCHAR(45) NULL,
  `metric_2` VARCHAR(45) NULL,
  `metric_3` VARCHAR(45) NULL,
  `metric_4` VARCHAR(45) NULL,
   ...
   ...
  `metric_50` VARCHAR(45) NULL
);

This will mean a single insert and single select per website. Plus will reduce the number of records in the table by N times, where N = number of metrics.

Hope it helps.


that looks pretty good.

indexes on the id's should keep your performance healthy.


Looks okay in general... some small issues, mainly regarding column types:

  • there is no reason for an explicit primary key in tbl_website_metric IMO. A unique index on ( metric_id, website_id, created ) should be enough
  • tbl_website_metric.created should be a DATE (saves 1 byte and is necessary for the uniqueness of the index)
  • value has to be a literal?
  • you declared foreign key constraints for metric_id and website_id; while this of course has some advantages this also means locking issues, cf http://www.mysqlperformanceblog.com/2006/12/12/innodb-locking-and-foreign-keys/

hth

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜