SQL Design - How to store large amount of URLs

2023-03-17 01:32 问答作者：

I'm writing an application that will have a SQL Server backend that will store (among other things) urls. URLS will be mapped to users, and some URLs may be common between different users. In absence of a true DBA, I'm trying to design a solution that can handle hundreds of thousands of URLs as efficiently as possible.

Ideas:

Create table that simply has ID, URL

Pro: simple, complete.
CON: duplicate entries for a URL will exist wh开发者_如何学Pythonich will cause the table to be larger than it needs to be.
Break up the user and URLs into separate tables. One table containing USER ID, and URL ID . Another table with URL ID and URL itself.

Pro: single URL in the system, seems more "enterprisey"
Con: must join two tables when trying to pull back results, and not really sure what the benefit of this approach is?
Expand on the 2 idea, except REALLY break it up. So have a table for domain, another for path/query string. Then, user table would have userid, domain ID, path ID.

Pro: urls could share data even if it was unrelated (meaning, cnn.com/helloworld and nbc.com/helloworld would have different domain ids, but same path ids.. seems this could be useful when running metrics later?

Con: Seems like a nightmare from a performance perspective (again, because joins would be necessary to pull a URL.

Any thoughts?

I would do the following in my design:

UserId  UrlId
1       1
2       2
1       1

UrlId  Url
1      http://www.google.com
2      http://www.yahoo.com

Storing your URLs in a seperate table and only creating a new entry in the URL table, if an exact match does not already exist. If you have a lot of common URLs, this will save some space. You could take it a step farther and add a third table as you mentioned, e.g.

UrlPathId  UrlId  UrlPath
1          1      /shopping

...and then tieing the UrlPathId to the User table. And perhaps even further:

UrlPathId  UrlId  UrlQueryString
1          1      ?product=speakers

...and again, referencing this from your User table.

It sounds like you are describing a many to many relationship between users and URL's.

I would highly suggest ruling out option 1. Not only will this increase size, but because if you need to update a URL or a User, you'll have to do it every time that it's duplicated, instead of once.

Choosing between 2 and 3 is more difficult, because it depends much more on how this is going to be used. #2 is a lot more simplistic, and is still normalized. The features in #3 don't seem to outweigh the complexity to me, so personally I'd pick #2.

Edit: Upon seeing George's answer, I completely agree with the first section.

Are you really that short on space? Unless you need to treat URLs as an object in their own right I would just go for option 1 and cover it with indexes if you have specific performance requirements on URLs alone.

See my other comment here on dealing with orphan URLs.

继续阅读：sql-server

SQL Design - How to store large amount of URLs

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？