SQLite select statement optimisation advice
I have a SQLite table 'Details' with structure:
ID Name Category
---------------------
1 Matt 0
2 Shervin 0
3 Bob 0
4 Lee 0
5 Rick 0
6 Suraya 0
7 Susan 0
8 Adam 0
9 Jon 1
10 Lorna 1
... and so on .......
I want to select a row at random, and then three names from three different rows (again preferably at random). I would like this to all be returned from one SQLite statement. E.g.
ID Name Category Name1 Name2 Name 3
----------------------------------------
3 Bob 0 Matt Lee Susan
My attempt at this can be seen below, but it has two problems:
- The three extra names are not necessarily always different - I can't seem to exclude a name that has been previously selected because variables b/c/d are not in scope apart from their own COALESCE function.
- As each nested select uses the Random() function it’s not very efficient.
Can anyone suggest another way to select the data I need (using SQLite database)? Any help/advice is welcome - hope it is clear what I am trying to achieve, feel free to ask for any clarifications.
My current attempt:
SELECT a.Id,
a.Name,
a.Category,
COALESCE((SELECT b.Name
FROM Details b
WHERE b.Id NOT IN (a.Id)
AND b.Category IN (0)
ORDER BY Random()
LIMIT 1),'') as "Name1",
COALESCE((SELECT c.Name
FROM Details c
WHERE c.Id NOT IN (a.Id)
AND c.Category IN (0)
ORDER BY Random()
LIMIT 1),'') as "Name2",
COALESCE((SELECT d.Name
FROM Details d
WHERE d.Id NOT IN (a.Id)
AND d.Category IN (0)
ORDER BY Random()
LIMIT 1),'') as "Name3"
FROM Details a
AND a.Category IN (0)
ORDER开发者_如何学运维 BY Random()
LIMIT 1
I'm with neurino here. You have not said why you need to put the four names chosen at random into a single row, and why this has to be done in the back-end.
If you're concerned about performance, generate random integers in your client (range >= min(pkcol) and <= max(pkcol) ) until you've found four distinct rows (i.e. entities/names). There's a chance that no row exists with one of the generated ids, but that takes mere milliseconds to find out. Taking that random-key approach you could avoid an order by. The approach would work quickly even for tables with billions of rows.
P.S. (After finding out it was an iPhone app) You need one call to get the min and max ID values (it's the PK so that uses an index). Then you need at least another call to the DB (again, index-assisted) to get the four distinct rows using your randomly generated PK values [where ID in (a, b, c , d) ] The maximum number of calls is unknown; how many will depend on the density of your primary key sequence. I do not believe this would be an inordinate amount of I/O and it would be considerably less resource-intensive than an order by Random()--especially if the table has many rows. You could always generate an ID list of 8, 12, 16 ids at random and have your client cull only the 4 rows required if more than 4 are returned.
P.P.S. Typically it is the instantiation of the database connection that is expensive, and you don't want to do that in a loop or any more often than you need to. But you can open a connection, run two or three efficient selects that return a few rows each, and then close if you're done with the task at hand.
A multi-statement solution, which uses a temporary table:
CREATE TEMP TABLE names
AS
SELECT
Id,
Name,
Category
FROM Details
WHERE Category IN (0)
ORDER BY Random()
LIMIT 4;
SELECT
MAX(CASE rowid WHEN 1 THEN Id END) AS Id,
MAX(CASE rowid WHEN 1 THEN Name END) AS Name,
MAX(CASE rowid WHEN 1 THEN Category END) AS Id,
MAX(CASE rowid WHEN 2 THEN Name END) AS Name1,
MAX(CASE rowid WHEN 3 THEN Name END) AS Name2,
MAX(CASE rowid WHEN 4 THEN Name END) AS Name3
FROM names;
DROP TABLE names;
How about doing a full outer join x3, then simply choosing a row at random where the names are not equal?
You could also achieve what you want by nesting the queries to have the names as a returned value. You essentially get the fourth value first and then the third and so on. All the while ensuring that they don't match. I should have passed through the Id field and checked that the Id's don't conflict rather than the names, but this way means unique names.
SELECT Id
,Name
,Category
,bName
,cName
,dName
FROM Details,
(
SELECT Name AS bName, cName, dName
FROM Details,
(
SELECT Name AS cName, dName
FROM Details,
(
SELECT Name AS dName
FROM Details
WHERE Category IN (0)
ORDER BY Random()
LIMIT 1
) td
WHERE Name <> dName
AND Category IN (0)
ORDER BY Random()
LIMIT 1
) tc
WHERE Name <> dName
AND Name <> cName
AND Category IN (0)
ORDER BY Random()
LIMIT 1
) tb
WHERE Name <> dName
AND Name <> cName
AND Name <> bName
AND Category IN (0)
ORDER BY Random()
LIMIT 1;
I don't see a way around the Random() function and the slowness it generates other than generating the random Id's in code, but that has other problems.
There are two main methods to optimize an order by random() statement.
The first is to remove the sorting of the whole table step altogether, but it doesn't work on all platforms: limit 1 offset random()
, rather than order by random() limit 1
.
The other works on all platforms but requires that your primary keys be reasonably dense (an auto-incrementing integer with no little if any deletes ensures they are). Pre-fetch a smaller set of IDs starting at a random starting point, and to use them in a subquery:
select *
from (select *
from tbl
where id between :x and :x + 20
)
order by random()
limit 1
精彩评论