开发者

SQL select groups of distinct items in prepared statement?

I have a batch job that I run on a table which I'm sure 开发者_JS百科I could write as a prepared statement. Currently it's all in Java and no doubt less efficient than it could be. For a table like so:

CREATE TABLE thing (
  `tag` varchar,
  `document` varchar,
  `weight` float,
)

I want to create a new table that contains the top N entries for every tag. Currently I do this:

create new table with same schema
select distinct tag
for each tag:
  select * limit N insert into the new table

This requires executing a query to get the distinct tags, then selecting the top N items for that tag and inserting them... all very inefficient.

Is there a stored procedure (or even a simple query) that I could use to do this? If dialect is important, I'm using MySQL.

(And yes, I do have my indexes sorted!)

Cheers

Joe


I haven't done this in a while (spoiled by CTE's in SQL Server), and I'm assuming that your data is ordered by weight; try

SELECT tag, document, weight
FROM thing
WHERE (SELECT COUNT(*)
       FROM thing as t
       WHERE t.tag = thing.tag AND t.weight < thing.weight
) < N;

I think that will do it.

EDIT: corrected error in code; need < N, not <= N.


If you were using SQL Server, I would suggest using the ROW_NUMBER function, grouped by tag, and select where row_number < N. (So in other words, order and number the rows for each tag according to their position in the tag group, then pick the top N rows from each group.) I found an article about simulating the ROW_NUMBER function in MySQL here:

http://www.xaprb.com/blog/2006/12/02/how-to-number-rows-in-mysql/

See if this helps you out!

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜