SQL select groups of distinct items in prepared statement?
I have a batch job that I run on a table which I'm sure 开发者_JS百科I could write as a prepared statement. Currently it's all in Java and no doubt less efficient than it could be. For a table like so:
CREATE TABLE thing (
`tag` varchar,
`document` varchar,
`weight` float,
)
I want to create a new table that contains the top N entries for every tag. Currently I do this:
create new table with same schema
select distinct tag
for each tag:
select * limit N insert into the new table
This requires executing a query to get the distinct tag
s, then selecting the top N
items for that tag and inserting them... all very inefficient.
Is there a stored procedure (or even a simple query) that I could use to do this? If dialect is important, I'm using MySQL.
(And yes, I do have my indexes sorted!)
Cheers
Joe
I haven't done this in a while (spoiled by CTE's in SQL Server), and I'm assuming that your data is ordered by weight; try
SELECT tag, document, weight
FROM thing
WHERE (SELECT COUNT(*)
FROM thing as t
WHERE t.tag = thing.tag AND t.weight < thing.weight
) < N;
I think that will do it.
EDIT: corrected error in code; need < N, not <= N.
If you were using SQL Server, I would suggest using the ROW_NUMBER function, grouped by tag, and select where row_number < N. (So in other words, order and number the rows for each tag according to their position in the tag group, then pick the top N rows from each group.) I found an article about simulating the ROW_NUMBER function in MySQL here:
http://www.xaprb.com/blog/2006/12/02/how-to-number-rows-in-mysql/
See if this helps you out!
精彩评论