Assign values to records randomly based on given distributions
I have a t-sql
procedure that generates test data and as part of this, several columns need to be set randomly to a set of values following a predefined distribution. I currently have a method for doing this using an identity column and the remainder operator % but I wondered if th开发者_运维技巧ere was a more elegant way of designing a solution to this problem that would allow me to parameterise the values and distribution that the columns are assigned.
For example,
I want to assign Column A
one of three values {Horse, Donkey, Pony}
and column B
one of two values {Big, Small}
. In this case I want the distributions to be equal, {Horse:1/3, Donkey 1/3, Pony, 1/3}, {Big:1/2, Small:1/2}
but Column B's
distribution should depend on Column A
(i.e. 1/2 of the Ponies should be Big)
Record | ColumnA | ColumnB | OtherData
1 | Horse | Big |...
2 | Horse | Small |...
3 | Donkey | Big |...
4 | Donkey | Small |...
5 | Pony | Big |...
6 | Pony | Small |...
I would prefer to have a semi-random allocation of Column A and B with respect to record numbers but this is not essential.
In the case where a distribution is not exactly possible, all remaining records should be assigned one of the possible values. It is assumed that in large data sets this will be work itself out.
Not sure if this will help in your case, but this will work on SQL Server 2005+. This is a set of 300 records, yielding 100 of each animal, and 50 of each size per animal.
There may be a similar way to approach this in other SQL languages.
DECLARE @MaxCount INT
SET @MaxCount = 300
SET NOCOUNT ON;
DECLARE @Numbers TABLE (Number INT NOT NULL IDENTITY);
INSERT @Numbers DEFAULT VALUES;
WHILE SCOPE_IDENTITY() < @MaxCount INSERT @Numbers DEFAULT VALUES;
SELECT *
FROM (
SELECT RandomSize.Number
, RandomSize.Animal
, CASE RandomSize.RowNumber % 2
WHEN 0 THEN 'Big'
WHEN 1 THEN 'Small'
END Size
FROM (
SELECT DerivedAnimal.Number
, DerivedAnimal.Animal
, ROW_NUMBER() OVER (PARTITION BY DerivedAnimal.Animal ORDER BY NEWID()) RowNumber
FROM (
SELECT RandomAnimals.Number
, CASE RandomAnimals.RowNumber % 3
WHEN 0 THEN 'Horse'
WHEN 1 THEN 'Donkey'
WHEN 2 THEN 'Pony'
END Animal
FROM (
SELECT Number
, ROW_NUMBER() OVER (ORDER BY NEWID()) RowNumber
FROM @Numbers
) RandomAnimals
) DerivedAnimal
) RandomSize
) FinalList
ORDER BY FinalList.Number
精彩评论