Is there a better way to sort this query?

We generate a lot of SQL procedurally and SQL Server is killing us. Because of some issues documented elsewhere, we basically do SELECT TOP 2**32 instead of TOP 100 PERCENT.

Note: we must use the subqueries.

Here's our query:

SELECT * FROM ( 
    SELECT [me].*, ROW_NUMBER() OVER( ORDER BY (SELECT(1)) ) 
    AS rno__row__index FROM (
       SELECT [me].[id], [me].[status] FROM (
          SELECT TOP 4294967296 [me].[id], [me].[status] FROM 
          [PurchaseOrders] [me] 
          LEFT JOIN [POLineItems] [line_items] 
          ON [line_items].[id] = [me].[id] 
          WHERE ( [line_items].[part_id] = ? ) 
          ORDER BY [me].[id] ASC
       ) [me]
    ) [me]
) rno_subq 
WHERE rno__row__index BETWEEN 1 AND 25 

Are there better ways to do this that anyone can see?

UPDATE: here is some clarification on the whole subquery issue:

The key word of my question is "procedurally". I need the ability to reliably encapsulate result sets so that they can be stacked together like building blocks. For example, I want to get the first 10 CDs ordered by the name of the artist who produced them, and also get the related artist for each CD. What I do is assemble a monolithic subselect representing the CDs ordered by the joined artist names, then apply a limit to it, then join the nested subselects to the artist table, and only then execute the resulting query. The isolation is necessary because the code that requests the ordered CDs is unrelated and oblivious to the code selecting the top 10 CDs, which in turn is unrelated and oblivious to the code that requests the related artists.

Now you may say that I could move the inner ORDER BY into the OVER() clause, but then I would break the encapsulation, as I would have to SELECT the columns of the joined table so that I can order by them later. An additional problem is the merging of two tables under one alias: if I have identically named columns in both tables, the SELECT me.* would stop right there with an ambiguous column name error.

I am willing to sacrifice a bit of optimizer performance, but TOP 2**32 seems like too much of a hack to me, so I am looking for middle ground.
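
To make the building-block idea concrete, here is a rough sketch of the kind of SQL my code ends up assembling, using hypothetical cds(id, title, artist_id) and artists(id, name) tables (the names are purely illustrative). It relies on the same TOP 2**32 / ROW_NUMBER hack as the query above, so it carries the same ordering caveat:

-- Block A (built by one piece of code): CDs ordered by the joined artist's name.
-- Block B (built by unrelated code): a row-limit wrapper around block A.
-- Block C (built by yet another piece of code): the limited block joined back to artists.
SELECT lim.id, lim.title, a.name
FROM (
    SELECT ord.*, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rno__row__index
    FROM (
        SELECT TOP 4294967296 cd.id, cd.title, cd.artist_id
        FROM cds cd
        JOIN artists ar ON ar.id = cd.artist_id
        ORDER BY ar.name ASC
    ) ord
) lim
JOIN artists a ON a.id = lim.artist_id
WHERE lim.rno__row__index BETWEEN 1 AND 10

Each block only knows the alias of the block directly beneath it, which is what keeps the generating code decoupled.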


  • If you want top rows by me.id, just ask for that in the ROW_NUMBER's ORDER BY. Don't chase your tail around subqueries and TOP.
  • If you have a WHERE clause on a field from the joined table, there is no point in an outer JOIN: the unmatched rows' fields would all be NULL and get filtered out by the WHERE, so it is effectively an inner join.


WITH cteRowNumbered AS (
  SELECT [me].[id], [me].[status],
         ROW_NUMBER() OVER (ORDER BY [me].[id] ASC) AS rno__row__index
  FROM [PurchaseOrders] [me]
  JOIN [POLineItems] [line_items] ON [line_items].[id] = [me].[id]
  WHERE [line_items].[part_id] = ?
)
SELECT [id], [status]
FROM cteRowNumbered
WHERE rno__row__index BETWEEN 1 AND 25

I use CTEs instead of subqueries just because I find them more readable.


Use:

SELECT x.*
  FROM (SELECT po.id, 
               po.status,
               ROW_NUMBER() OVER( ORDER BY po.id) AS rno__row__index
          FROM [PurchaseOrders] po
          JOIN [POLineItems] li ON li.id = po.id
         WHERE li.part_id = ?) x
 WHERE x.rno__row__index BETWEEN 1 AND 25
ORDER BY x.id ASC

Unless you've omitted details in order to simplify the example, there's no need for all your subqueries in what you provided.


Kudos to the only person who saw through the naysaying and actually tried the query on a large table (one we do not have access to). To all the rest saying this simply will not work (will return random rows): we know what the manual says, and we know it is a hack; that is why we asked the question in the first place. However, outright dismissing a query without even trying it is rather shallow. Can someone provide us with a real example (with preceding CREATE/INSERT statements) demonstrating the above query malfunctioning?
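
For reference, a minimal test harness of the kind we mean might look something like the sketch below (the table sizes and part_id values are made up; whether the observed order actually diverges from [id] order will depend on the plan SQL Server picks, which is exactly the point under dispute):

-- Hypothetical repro skeleton: build tables large enough that the optimizer
-- might choose a parallel or hash-based plan, then run the paged query from
-- the question and check whether the 25 rows returned are the 25 lowest
-- matching [id] values.
CREATE TABLE [PurchaseOrders] ([id] INT PRIMARY KEY, [status] VARCHAR(20));
CREATE TABLE [POLineItems]    ([id] INT PRIMARY KEY, [part_id] INT);

INSERT INTO [PurchaseOrders] ([id], [status])
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'open'
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

INSERT INTO [POLineItems] ([id], [part_id])
SELECT [id], [id] % 50 FROM [PurchaseOrders];

-- Now run the original paged query with [part_id] = 1 and compare its output
-- against SELECT TOP 25 ... ORDER BY [id] ASC with the same filter.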


Your update makes things much clearer. I think that the approach you're using is seriously flawed. While it's nice to be able to have encapsulated, reusable code in your applications, front-end applications are a much different animal than a database. They typically deal with small structures and small, discrete processes that run against those structures. Databases, on the other hand, often deal with tables measured in the millions of rows, and sometimes more than that. Using the same methodologies will often result in code that simply performs so badly as to be unusable. Even if it works now, it's very likely that it won't scale and will cause major problems down the road.

Best of luck to you, but I don't think that this approach will end well in all but the smallest of databases.
