SQL Primary key sort order
I am using MS SQL server 2005
I have a table with 3 columns where I store user-message mapping like:
msg_for msg_from msg_id
bob bob 1
bob john 1
bob steve 1
bob bob 2
bob john 2
bob bob 3
bob john 3
bob steve 3
The PK is on 3 columns and msg_id is FK to messages table that stores the messages
The above is the physical storage I see according to the PK on 3 columns
Now my query MUST return the messages for a given user having latest msg at top (order by msg_id DESC)
bob john 3
bob steve 3
bob john 2
bob steve 2
bob john 1
bob steve 1
This mapping table has millions of rows. I see 95% of the cost is to SORT the result.
Is it 开发者_JS百科possible to have the PK or some other way store data physically like this (avoid SORT)?
msg_for msg_from msg_id
bob bob 3
bob john 3
bob steve 3
bob bob 2
bob john 2
bob bob 1
bob john 1
bob steve 1
Thanks
Yes.
When you set up the Primary Key (or any index) you can define this
ALTER TABLE dbo.[Messages] ADD CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED
(
msg_for ASC, msg_from ASC, msg_id DESC
)
SQL Server can scan in either direction so it only makes sense if you want to control the sort order combination for multiple columns.
Edit: You say in the comments that the problem query is
select top 10 msg_id
from message_user
where msg_for = @user_name
and msg_from <> @user_name
order by msg_id DESC
The issue here isn't one of Ascending, Descending.
To give an analogy. Phone books are listed in surname, forename order but if you needed to know the lexicographically last 10 forenames in the directory you would need to scan the whole book. This would be unavoidable regardless of whether or not within each section forenames were listed in ascending or descending order.
Similarly the composite index keys would need to be msg_for, msg_id, msg_from
to satisfy this query optimally not msg_for, msg_from, msg_id
With this latter order it will still need to scan the whole section of the index satisfying the msg_for = @user_name
criteria as it cannot know if there will be a later msg_id
still to come belonging to a later msg_from
Additionally regardless in which direction msg_id
is sorted in their individual sub sections a sequential scan of the msg_for = @user_name
part of the index will still require a sort as it they are fragmented by being in subsections according to msg_from
.
The only way to guarantee order in the result set is by using an ORDER BY.
In SQL Server, a clustered index can help... assuming the optimizer sees the index as being useful.
Well no wonder sorting takes forever. Varchar/string types are usually types that are very heavy when it comes to sorting, whether it's SQL or any programming language for that matter. Whenever possible use integral types for such things.
I suggest you use integral values to identify members. Have a Members
table (MemberId INT, MemberName VARCHAR, etc)
, then a Messages
table (MessageId INT, MessageBody VARCHAR, etc)
and then have a join table, say Correspondence
with (SenderMemberId INT, RecipientMemberId INT, MessageId INT)
. Sorting on integral values will be way faster this way.
I think you can easily refactor your data to fit on such a new structure.
Depending on your DBMS, you can use a clustered index to achieve that.
精彩评论