table design for storing large number of rows

2022-12-29 14:39 问答作者：

I am trying to store in a postgresql database some unique identifiers along with the site they have been seen on. I can't really decide which of the following 3 option to choose in order to be faster and easy maintainable. The table would have to provide the following information:

the unique identifier which unfortunately it's text
the sites on which that unique identifier has been seen

The amount of data that would have to hold is rather large: there are around 22 millions unique identifiers that I know of.

So I thought about the following designs of the table:

id - integer

identifier - text

seen_on_site - an integer, foreign key to a sites table

This approach would require around 22 mil multip开发者_StackOverflow中文版lied by the number of sites.

id - integer

identifier - text

seen_on_site_1 - boolean

seen_on_site_2 - boolean

............

seen_on_site_n - boolean

Hopefully the number of sites won't go past 10. This would require only the number of unique identifiers that I know of, that is around 20 millions, but it would make it hard to work with it from an ORM perspective.

one table that would store only unique identifiers, like in:

id - integer

unique_identifier - text,

one table that would store only sites, like in:

id - integer

site - text

and one many to many relation, like:

id - integer,

unique_id - integer (fk to the table storing identifiers)

site_id - integer (fk to sites table)

another approach would be to have a table that stores unique identifiers for each site

So, which one seems like a better approach to take on the long run?

Have two tables.
Table 1 Site ID,Site Name, Site Description
Site ID -> Primary Key
Site Name -> Index

Table 2 would be the one you are talking about.
Row ID, Site ID, Whatever Info.
Row ID -> Primary Key
Site ID -> Foreign Key into table 1
Index (Row Id, Site ID)

If you already have a natural text-based unique identifier for site (url, perhaps?), then the only thing you need is ONE table with two fields:

CREATE TABLE (
    unique_identifier TEXT NOT NULL,
    site_identifier TEXT NOT NULL,
    PRIMARY KEY (unique_identifier, site_identifier)
);

You could then also add a UNIQUE INDEX on (site_identifier, unique_identifier) to facilitate lookups by site.

You may end up using a little extra space for the main table this way, but it is dead simple to query, update, and maintain.

I'd definitely avoid the ten column boolean horror if I were you, as there will always be more sites later. I'd agree with Romain Hippeau, with the added suggestion that you might want an index on sites to answer questions like 'Who has visited site x?'.

继续阅读：database-design postgresql

table design for storing large number of rows

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？