Movie DB - storage of Actors / actress / Tags?

2023-02-01 00:08 问答作者：

Creating a movie db and I dont like the idea of giving each actor/actress and also each tag its own row as if there are 10 million moives total, each has a cast of atleast 20-30 people we will have 200-300 million rows in the table.

And it gets more complex with tags which can be unlimited per movie. So how to best store these 3 items? Ideally these can be modeled as a Many to Many but 开发者_如何学Cstill it will have hundreds of millions of rows. Any better suggestions on storing these? I am using MySQL.

I would dump it all in a textfile but I need to link actors between movies and do some analytics also and allow users to rate actors find movies by tag, etc so need to use a DB.

10 million movies seems pretty ambitious. IMDb's current statistics show that they have less than 1.8M titles and around 3.9M people.

Having said that, I see no problem with creating a table of titles, a table of actors, and a junction table to resolve the many-to-many relationship between the two. The same holds true for tags.

Movie DB - storage of Actors / actress / Tags?

It sounds perhaps a bit of premature optimization here. You could denormalize all actors into a TEXT column of some kind onto the Movie table, but your performance + search would suffer, as well as losing all benefits of relational data.

Suggest to keep the normalized schema, as you were originally thinking:

Movie (ID)
Actor (ID)
Tag (ID) --horror, comedy, etc.

MovieActor (MovieID, ActorID)
MovieTag (MovieID, TagID)

Create indexes as per normal on the associative entities: MovieActor and MovieTag.
Load some dummy data in a Test environment. 10 million movies with 100 million actors with 1 million tags. Create associative entries for each as required.
Baseline and performance test.
Horizontal partitioning (sharding) if your performance metrics require more performance.

Regardless of the number of movies, or whether the data is DNA sequences: implement a design, test it, judge its performance based on your requirements (user acceptance, SLA, etc)

What's the reason for your aversion to the millions of rows? A perceived performance issue?

It's going to have hundreds of millions of relations somewhere. You do have to capture the mapping between actor and film and as you say, there are 200-300 million of those (although I don't believe there are 10 million movies in existence?)

If you really wanted, you could (for example) pack the ids for actors for a film into multiple columns (or into one column) but that would make searching unpleasant.

10 million movies w/ 20 to 30 cast members each (although the number sound higher than real life) will invariably lead to 200-300 million associations. If you're storing your data in a relational database, each association will naturally be one row in a table linking movies to actors. Each row will be very small (two columns - movie PK and actor PK; possibly an extra surrogate key column); the bulk of the data will be stored in the movies and actors table.

Any other solution (in an SQL database) will store the same amount of data in a less optimal format.

继续阅读：database

Movie DB - storage of Actors / actress / Tags?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？