best way to store similar music

2023-03-15 14:39 Author:

I have millions of songs, and each song has a unique Song ID. For each Song ID I have attributes such as song name, artist name, album name, year, etc.

I have implemented a mechanism that computes a similarity ratio between two songs; it returns a value between 0 and 100.

I need to show similar music to users, and this cannot be computed at run time, so I need to precompute the similarity between every pair of songs.

So if I create a database table with three columns,

song1, song2, similarity

I will have n*n records, where n is the number of songs.
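For scale, assuming n = 10^6 (the question only says "millions") and, hypothetically, 9 bytes per row (two 4-byte song IDs plus a 1-byte score, ignoring index and storage overhead):

```latex
\[
n = 10^{6} \;\Rightarrow\; n \cdot n = 10^{12} \text{ rows}
\]
\[
10^{12} \text{ rows} \times 9 \text{ bytes/row} = 9 \times 10^{12} \text{ bytes} \approx 9\ \text{TB}
\]
```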

And whenever I want to fetch similar songs, I need to run this query:

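-- song_similarity is a hypothetical table name; the question does not name the table
SELECT song2 FROM song_similarity WHERE song1 = x AND similarity > 80 ORDER BY similarity DESC;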
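A minimal sketch of the schema and lookup described in the question, using Python's built-in sqlite3 module so it runs anywhere. The table name song_similarity and the index idx_song1_similarity are assumptions (the question only names the columns), and the composite index on (song1, similarity) is one reasonable choice for this query, not something the question specifies.

```python
import sqlite3

# Hypothetical table name; the question only names the three columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE song_similarity ("
    " song1 INTEGER NOT NULL,"
    " song2 INTEGER NOT NULL,"
    " similarity INTEGER NOT NULL CHECK (similarity BETWEEN 0 AND 100),"
    " PRIMARY KEY (song1, song2))"
)
# Composite index so the lookup can seek on song1 and read the matching
# rows in similarity order instead of sorting them afterwards.
conn.execute(
    "CREATE INDEX idx_song1_similarity ON song_similarity (song1, similarity)"
)


def similar_songs(conn, song_id, threshold=80):
    """Return IDs of songs similar to song_id, most similar first."""
    rows = conn.execute(
        "SELECT song2 FROM song_similarity"
        " WHERE song1 = ? AND similarity > ?"
        " ORDER BY similarity DESC",
        (song_id, threshold),
    )
    return [row[0] for row in rows]


# Toy data: both directions of every pair are stored, as in the question.
pairs = [(1, 2, 95), (1, 3, 85), (1, 4, 40), (2, 3, 70)]
conn.executemany(
    "INSERT INTO song_similarity VALUES (?, ?, ?)",
    pairs + [(b, a, s) for a, b, s in pairs],
)
print(similar_songs(conn, 1))  # [2, 3]
```

The index keeps each lookup cheap, but it does nothing about the n*n row count itself.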

Please suggest a better way to store and maintain this information.

Thanks.

I think you'd be better off comparing each song to a "prototypical" song or classification. Devise a fingerprint mechanism that captures metadata about the song plus whatever audio measures you use to judge similarity. Place each song into one (or more) categories and score it within each category: how closely does its fingerprint match the prototype for that category? Note that you could have hundreds or thousands of categories; they're not the typical categories you think of when you think of music.

Once you have done this, you can maintain indexes by category. When finding similar songs, devise a weight for each category based on the similarity measures within it, say by giving greater weight to the category in which the current song is closest to the prototype. For each category, multiply the weight by the square of the difference between the candidate song's score against the prototype and the current song's score against the prototype. Sum these weighted values over, say, the top 3 categories; lower totals mean more similar songs.
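One way to read the scoring scheme above, as a Python sketch. Assumptions not in the answer: each song's fingerprint is reduced to a hypothetical dict mapping category names to a distance from that category's prototype in [0, 1], the weight is 1 / (1 + distance), a category the candidate lacks counts as distance 1.0, and the "top 3 categories" are the current song's three closest categories.

```python
from typing import Dict, List

# Hypothetical fingerprint: category name -> distance from that category's
# prototype, normalized to [0, 1] (0 = identical to the prototype).
Fingerprint = Dict[str, float]
MISSING_DISTANCE = 1.0  # assumption: an absent category counts as maximally far


def top_categories(fp: Fingerprint, k: int = 3) -> List[str]:
    """The k categories whose prototype this song is closest to."""
    return sorted(fp, key=fp.get)[:k]


def dissimilarity(current: Fingerprint, candidate: Fingerprint, k: int = 3) -> float:
    """Weighted sum of squared score differences over the current song's
    top-k categories; lower values mean more similar."""
    total = 0.0
    for cat in top_categories(current, k):
        # Assumed weighting: categories where the current song sits closer
        # to the prototype count for more.
        weight = 1.0 / (1.0 + current[cat])
        diff = candidate.get(cat, MISSING_DISTANCE) - current[cat]
        total += weight * diff * diff
    return total


current = {"slow_piano": 0.1, "ambient": 0.3, "jazz_trio": 0.6, "metal": 0.9}
a = {"slow_piano": 0.15, "ambient": 0.35, "jazz_trio": 0.5}
b = {"metal": 0.1, "ambient": 0.8}
ranked = sorted({"a": a, "b": b}.items(), key=lambda kv: dissimilarity(current, kv[1]))
print([name for name, _ in ranked])  # ['a', 'b']
```

Each song stores only its own per-category scores, so candidates can be pulled from the per-category indexes and ranked with this function instead of a stored pairwise table.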

This way you only need to store a few items of metadata for each song rather than keep relationships between pairs of songs. If the main algorithm runs too slowly, you could keep cached pairwise data for the most common songs and fall back to the algorithmic comparison when a song isn't in your cached data set.
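The cache-with-fallback idea can be sketched as a small wrapper. SimilarSongs, warm and the compute callback are hypothetical names; compute stands in for whatever slow algorithmic comparison you already have, assumed here to return (song ID, score) pairs.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Result = List[Tuple[int, float]]  # assumed shape: (similar song ID, score)


class SimilarSongs:
    """Serve precomputed results for popular songs and compute everything
    else on demand (a hypothetical helper)."""

    def __init__(self, compute: Callable[[int], Result]) -> None:
        self._compute = compute  # the slow, algorithmic comparison
        self._cache: Dict[int, Result] = {}

    def warm(self, popular_song_ids: Iterable[int]) -> None:
        """Precompute results for the most commonly requested songs."""
        for song_id in popular_song_ids:
            self._cache[song_id] = self._compute(song_id)

    def get(self, song_id: int) -> Result:
        cached = self._cache.get(song_id)
        if cached is not None:
            return cached
        return self._compute(song_id)  # not cached: fall back to the algorithm


calls = []


def slow_compute(song_id: int) -> Result:
    calls.append(song_id)  # record each expensive call
    return [(song_id + 1, 90.0)]


service = SimilarSongs(slow_compute)
service.warm([1, 2])
service.get(1)  # served from the cache
service.get(7)  # computed on demand
print(calls)  # [1, 2, 7]
```

Songs that miss the cache are not added to it here, matching the answer's suggestion to cache only the most common songs; storing them too would turn this into an ordinary memoizing cache.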

What you are proposing will work; however, you can reduce the number of rows (roughly by half) by storing each pair only once, then modify your query to match the song ID in either song1 or song2.

Something like:

-- MySQL syntax; song_similarity is a hypothetical table name
SELECT IF(song1 = ?, song2, song1) AS similar FROM song_similarity WHERE (song1 = ? OR song2 = ?) AND similarity > 80 ORDER BY similarity DESC;
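A runnable sketch of this one-row-per-pair layout, using Python's built-in sqlite3 module as a stand-in for the asker's database. The answer's IF() is MySQL syntax; the sketch uses the standard SQL CASE expression instead. The table name, the canonical ordering rule (song1 < song2), the store helper and the extra index on song2 are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each unordered pair is stored once, with song1 < song2.
conn.execute(
    "CREATE TABLE song_similarity ("
    " song1 INTEGER NOT NULL,"
    " song2 INTEGER NOT NULL,"
    " similarity INTEGER NOT NULL,"
    " PRIMARY KEY (song1, song2),"
    " CHECK (song1 < song2))"
)
# The primary key covers lookups on song1; this index covers the song2 side
# of the OR so neither side needs a full table scan.
conn.execute("CREATE INDEX idx_song2 ON song_similarity (song2)")


def store(conn, a, b, similarity):
    """Insert a pair in canonical order so it is only ever stored once."""
    lo, hi = min(a, b), max(a, b)
    conn.execute("INSERT INTO song_similarity VALUES (?, ?, ?)", (lo, hi, similarity))


def similar_songs(conn, song_id, threshold=80):
    """Return the other song of each matching pair, most similar first."""
    rows = conn.execute(
        "SELECT CASE WHEN song1 = :id THEN song2 ELSE song1 END AS similar"
        " FROM song_similarity"
        " WHERE (song1 = :id OR song2 = :id) AND similarity > :t"
        " ORDER BY similarity DESC",
        {"id": song_id, "t": threshold},
    )
    return [row[0] for row in rows]


store(conn, 1, 2, 95)
store(conn, 3, 1, 85)  # stored as (1, 3)
store(conn, 4, 1, 40)
store(conn, 2, 3, 70)
print(similar_songs(conn, 1))  # [2, 3]
print(similar_songs(conn, 3))  # [1]
```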

This approach seems to require a lot of computation to maintain and access the similarity information. For example, if you have already processed 2000 songs, you still need to run the similarity analysis 2000 times for the next new song. It may have scalability problems, and this data schema can make the database slow within a short time.
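The incremental cost adds up: if the k-th song added must be compared with the k-1 songs already processed, building the full table for n songs takes

```latex
\[
\sum_{k=1}^{n} (k-1) = \frac{n(n-1)}{2}
\]
```

comparisons. In the example above, the 2001st song needs 2000 comparisons; assuming n = 10^6 songs, the total is 499,999,500,000, or about 5 × 10^11.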

I recommend finding patterns and tagging each song. For example, you can analyze the songs for "blues", "rock" or "90's" patterns and tag them accordingly. To find songs similar to a given song, you can then query for songs that share the tags the given song has, e.g. "New age", "Slow" and "techno".
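A sketch of the tag approach as an inverted index from tag to song IDs, in Python. TagIndex is a hypothetical helper, and ranking candidates by the number of shared tags is one simple choice; the answer does not say how to rank songs that share tags.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


class TagIndex:
    """Inverted index from tag to song IDs (a hypothetical helper)."""

    def __init__(self) -> None:
        self.tags_by_song: Dict[int, Set[str]] = {}
        self.songs_by_tag: Dict[str, Set[int]] = defaultdict(set)

    def add(self, song_id: int, tags: Iterable[str]) -> None:
        # Drop any previous tags so re-adding a song leaves no stale entries.
        for tag in self.tags_by_song.get(song_id, set()):
            self.songs_by_tag[tag].discard(song_id)
        self.tags_by_song[song_id] = set(tags)
        for tag in self.tags_by_song[song_id]:
            self.songs_by_tag[tag].add(song_id)

    def similar(self, song_id: int, limit: int = 10) -> List[int]:
        """Songs sharing at least one tag with song_id, ranked by the number
        of shared tags (ties broken by song ID)."""
        shared: Counter = Counter()
        for tag in self.tags_by_song.get(song_id, set()):
            for other in self.songs_by_tag[tag]:
                if other != song_id:
                    shared[other] += 1
        ranked = sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))
        return [other for other, _ in ranked[:limit]]


index = TagIndex()
index.add(1, {"New age", "Slow", "techno"})
index.add(2, {"New age", "Slow"})
index.add(3, {"techno", "90's"})
index.add(4, {"blues"})
print(index.similar(1))  # [2, 3]
```

Because each song stores only its own tags, adding a new song touches only the index entries for its tags instead of requiring a comparison against every existing song.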

Tags: data-structures database database-design php
