Implementing a find-or-insert for one-to-many tables
I have 2 tables, tracklist and track, where tracklist has many tracks. At some points, I will receive user input which refers to a list of tracks, and I need to either create that tracklist, or return an existing tracklist (this is because tracklists are meant to be entirely transparent to users).
My naive solution to this was to find all tracklists with n tracks, and join track against tracklist n times, checking each join against the user input data. For example, with 2 tracks:
SELECT tracklist.id FROM tracklist
JOIN track t1 ON tracklist.id = t1.tracklist
JOIN track_name tn1 ON t1.name = tn1.id
JOIN track t2 ON tracklist.id = t2.tracklist
JOIN track_name tn2 ON t2.name = tn2.id
WHERE tracklist.track_count = 2
AND (t1.position = 1 AND tn1.name = 'Pancakes' AND t1.artist_credit = 42 AND t1.recording = 1)
AND (t2.position = 2 AND tn2.name = 'Waffles' AND t2.artist_credit = 9001 AND t2.recording = 2)
However, this really doesn't scale well to large tracklists. My very rudimentary timing shows this can take >500ms for 10-track tracklists, and ~7s for tracklists with 100 tracks. While the latter is an edge case, whatever algorithm I use needs to scale at least that far.
I'm stuck on other solutions, however. The only other thing I can think of is to select all tracklists with n tracks, and all their tracks, and then do the comparison in application code. However, I'd really like to keep this on the database server if I can.
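For reference, the application-side comparison I have in mind would look roughly like the sketch below. It is only an illustration: the function name find_matching_tracklist and the row shape (tracklist id followed by position, name, artist_credit, recording) are assumptions made for the example, not part of my real code, and it assumes positions are unique within a tracklist.

```python
from collections import defaultdict


def find_matching_tracklist(rows, wanted):
    """Return the lowest tracklist id whose tracks equal `wanted`, else None.

    rows:   iterable of (tracklist_id, position, name, artist_credit, recording)
            covering every candidate tracklist (hypothetical result-set shape).
    wanted: list of (position, name, artist_credit, recording) from user input.
    """
    # Collect the tracks of each candidate tracklist into a set.
    by_tracklist = defaultdict(set)
    for tracklist_id, *track in rows:
        by_tracklist[tracklist_id].add(tuple(track))

    # A tracklist matches only if its tracks are exactly the wanted tracks.
    target = set(wanted)
    for tracklist_id in sorted(by_tracklist):
        if by_tracklist[tracklist_id] == target:
            return tracklist_id
    return None


rows = [
    (1, 1, "Pancakes", 42, 1), (1, 2, "Waffles", 9001, 2),
    (2, 1, "Pancakes", 42, 1), (2, 2, "Toast", 7, 3),
]
wanted = [(1, "Pancakes", 42, 1), (2, "Waffles", 9001, 2)]
print(find_matching_tracklist(rows, wanted))
```

The drawback is that every track of every candidate tracklist has to be transferred to the application first, which is why I would prefer a server-side solution.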
Here is the schema I am working with:
CREATE TABLE track
(
id SERIAL,
recording INTEGER NOT NULL, -- references recording.id
tracklist INTEGER NOT NULL, -- references tracklist.id
position INTEGER NOT NULL,
name INTEGER NOT NULL, -- references track_name.id
artist_credit INTEGER NOT NULL, -- references artist_credit.id
length INTEGER CHECK (length IS NULL OR length > 0),
edits_pending INTEGER NOT NULL DEFAULT 0,
last_updated TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE TABLE track_name (
id SERIAL,
name VARCHAR NOT NULL
);
CREATE TABLE tracklist
(
id SERIAL,
track_count INTEGER NOT NULL DEFAULT 0,
last_updated TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
Any suggestions?
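-- 'test1.id' ... 'testn.id' are placeholders for the ids of the input tracks.
SELECT MIN(t1.tracklist) AS tracklist
FROM track t1
WHERE
(
(t1.id='test1.id')
OR
(t1.id='test2.id')
......
OR
(t1.id='testn.id')
)
-- Returns a row only when all matched tracks share exactly one tracklist.
HAVING COUNT(DISTINCT t1.tracklist) = 1;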
-- This is OK if you have the track ids for this query.
-- If you do not then you need to replace each of the t1.id='testm.id' statements
-- with:
-- t1.recording='testm.recording' AND
-- t1.tracklist='testm.tracklist' AND
-- t1.position='testm.position' AND
-- t1.name='testm.name' AND
-- t1.artist_credit='testm.artist_credit' AND
-- t1.length='testm.length' AND
-- t1.edits_pending='testm.edits_pending' AND
-- t1.last_updated='testm.last_updated'
As I may not have the syntax exactly correct, and have had no opportunity to test it, here is a written description of what I am trying to achieve:
I build up a query returning the list of tracks that you have. Once I have built this query, I check whether the tracklists for these tracks are all the same. If they are, i.e. there is only one tracklist in the query, then this is the tracklist you require. If there are no tracklists in the query, or there is more than one, then the set of tracks you have does not correspond to any single existing tracklist, so you need to create a new tracklist. This query does not deal with the actual creation, if it proves necessary. I am not sure how it will deal with degenerate cases: there are no tracks at all in the query, or there are no tracklists listed for any of the tracks.
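To make the idea easier to try out, here is a small sketch of a related variation rather than the exact query above: it groups the matched tracks by tracklist and keeps only the tracklists where every input track matched and track_count equals the number of input tracks. Several things are assumptions made purely for the demo: it runs on an in-memory SQLite database through Python's sqlite3 module instead of PostgreSQL, track names are stored directly in track rather than in track_name, (tracklist, position) is taken to be unique, and find_tracklist is a hypothetical helper name.

```python
import sqlite3


def find_tracklist(conn, wanted):
    """Return ids of tracklists consisting of exactly the tracks in `wanted`.

    wanted: list of (position, name, artist_credit, recording) tuples.
    """
    if not wanted:
        return []  # degenerate case: no input tracks, so nothing can match

    match_one = ("(t.position = ? AND t.name = ? "
                 "AND t.artist_credit = ? AND t.recording = ?)")
    sql = (
        "SELECT t.tracklist FROM track t "
        "JOIN tracklist tl ON tl.id = t.tracklist "
        "WHERE tl.track_count = ? "
        "AND (" + " OR ".join([match_one] * len(wanted)) + ") "
        "GROUP BY t.tracklist "
        "HAVING COUNT(*) = ? "  # every input track found in this tracklist
        "ORDER BY t.tracklist"
    )
    params = [len(wanted)]
    for track in wanted:
        params.extend(track)
    params.append(len(wanted))
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracklist (id INTEGER PRIMARY KEY, track_count INTEGER NOT NULL);
    CREATE TABLE track (
        id INTEGER PRIMARY KEY, tracklist INTEGER NOT NULL,
        position INTEGER NOT NULL, name TEXT NOT NULL,
        artist_credit INTEGER NOT NULL, recording INTEGER NOT NULL
    );
    INSERT INTO tracklist (id, track_count) VALUES (1, 2), (2, 2), (3, 3);
    INSERT INTO track (tracklist, position, name, artist_credit, recording) VALUES
        (1, 1, 'Pancakes', 42, 1), (1, 2, 'Waffles', 9001, 2),
        (2, 1, 'Pancakes', 42, 1), (2, 2, 'Toast', 7, 3),
        (3, 1, 'Pancakes', 42, 1), (3, 2, 'Waffles', 9001, 2), (3, 3, 'Toast', 7, 3);
""")

wanted = [(1, "Pancakes", 42, 1), (2, "Waffles", 9001, 2)]
print(find_tracklist(conn, wanted))
```

An empty result means no existing tracklist matches and a new one has to be created. The query does a single pass over track instead of one join per input track, but I have not measured it against large tracklists.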