Which solution is better for getting random record set from db?
I want to get some random records from db. There is two solution for th开发者_运维技巧is :
1- Using TABLESAMPLE
for getting data from db directly.
2- Write a method In my application for doing this. In this method we generate multiple random number and get data like this :
select * from db where ID = @RandomNumber
if this ID does not exist, I pass a new number.
Now which one has better performance?
According to the documentation for TABESAMPLE you shouldn't use it if you "really want a sample of individual rows":
If you really want a random sample of individual rows, modify your query to filter out rows randomly, instead of using TABLESAMPLE. For example, the following query uses the NEWID function to return approximately one percent of the rows of the Sales.SalesOrderDetail table:
SELECT * FROM Sales.SalesOrderDetail WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)
The SalesOrderID column is included in the CHECKSUM expression so that NEWID() evaluates once per row to achieve sampling on a per-row basis. The expression
CAST(CHECKSUM(NEWID(),> SalesOrderID) & 0x7fffffff AS float / CAST(0x7fffffff AS int)
evaluates to a random float value between 0 and 1.
Either way, given the potentially endless number of requests you could make by passing in @RandomNumber (in theory the first 1000 requests you make might return nothing), the better approach is to limit the resultset on the server.
try this:
SELECT TOP 1 * FROM db
ORDER BY NEWID()
the NewID function will generate UniqueIdentifier value and it will be random. Source: SQL to Select a random row from a database table
I would use TABLESAMPLE, as its makes it very easy to generate sample data. I expect it would be more efficient as you only call one piece of SQL.
e.g.
USE AdventureWorks ;
GO
SELECT FirstName, LastName
FROM Person.Contact
TABLESAMPLE (10 PERCENT)
In your other example, you will have to keep on calling select * from db where ID = @RandomNumber
many times.
If you after individual rows then i would use another method, some form of random TOP 1 etc...
I recommend to read a post about various methods to get random row from table. It's based on PostgreSQL, but I'm sure that 90% applies to SQL Server too.
Of course most flexible and best performing solution can be achieved by writing a stored procedure.
Cost (hence: best performance) of getting truly random sample depends on data (type of data, statistics and distribution, including sparseness).
精彩评论