100k Rows Returned in a random order, without a SQL time out please

2022-12-20 21:37 问答作者：

Ok,

I've been doing a lot of reading on returning a random row set last year, and the solution we came up with was

ORDER BY newid()开发者_开发知识库

This is fine for <5k rows. But when we are getting >10-20k rows we are getting SQL time outs, the Execution planned tells me that 76% of my query cost comes from this line. and removing this line increase the speed by an order of magnitude when we have a large amount of rows.

Our users have a requirement of doing up to 100k rows at a time like this.

To give you all a bit more details.

We have a table with 2.6 million 4 digit alpha-numeric codes. We use a random set of these to gain entry into a venue. For example, if we have an event with a 5000 capacity, a random set of 5000 of these will be drawn from the table then issued to the each customer as a bar-code, then the bar-code scanning app at the door with have the same list of 5000. The reason for using a 4 digit alpha numeric code (and not a stupidly long number like a GUID) is that it easy for people to write the number down (or SMS it to a friend) and just bring the number and have it entered manually, so we don't want large amount of characters. Customers love the last bit btw.

Is there a better way than ORDER BY newid(), or is there a faster way to get 100k random rows from a table with 2.6 mil?

Oh, and we are using MS SQL 2005.

Thanks,

There is an MSDN article entitled "Selecting Rows Randomly from a Large Table" that talks about this exact problem and shows a solution (using no sorting but instead using a WHERE clause on a generated column to filter the rows).

The reason your query is slow is that the ORDER BY clause causes the whole table to be copied into tempdb for sorting.

If you want to generate random 4-digit codes, why not just generate them instead of trying to pull them out of a database?

Generate 100k unique numbers from 0 to 1,679,616 (which is the number of unique four-digit alphanumeric codes, ignoring case - 2.6 million rows must have some duplicates) and convert them to your four-digit codes.

You don't have to sort.

 DECLARE @RandomNumber int
 DECLARE @Threshold float
 SELECT @RandomNumber = COUNT(*) FROM customers
 SELECT @Threshold = 50000 / @RandomNumber

 SELECT TOP 50000 * FROM customers WHERE rand() > @Threshold ORDER BY newid()

Just as a matter of interest, what is the performance like if you replace

ORDER BY newid()

ORDER BY CHECKSUM(newid())

One thought is to break down the process into steps. Add a column in the table for a GUID then do an update statement into the table adding the GUIDs. This can be done ahead of time if necessary. You should then be able to run the query with an orderby on the GUID column to recieve the results the same way.

Have you tried using % (modulo) on a given int column? Not sure what your table structure is, but you could do something like this:

select top 50000 * from your_table where CAST((CAST(ASCII(SUBSTRING(venuecode,1,1)) as varchar(3))+ CAST(ASCII(SUBSTRING(venuecode,2,1))as varchar(3))+ CAST(ASCII(SUBSTRING(venuecode,3,1))as varchar(3))+ CAST(ASCII(SUBSTRING(venuecode,4,1))as varchar(3))) as bigint) % 500000 between 0 and 50000

The above code will take all of your alpha numeric venues and convert them to an integer and then split the entire table into 500,000 buckets of which you are taking the top 50000 that fall between 0 and 50000. You can play with the number after the % since (500,000) and you can play with the between. This should randomize it for you. Not sure if the where clause will bite you on performance, but it's worth a shot. Also, without an order by, there is no guarantee of the order (if you have multiple cpus and threading).

继续阅读：random sql-server sql-server-2005

100k Rows Returned in a random order, without a SQL time out please

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？