Tuesday, October 23, 2012

TSQL - Returning Random Numbers

Someone was updating a table with test data, and asked me how to fill a column with random numbers.

I was sure rand() would do it.  I was wrong:


declare @tmpTable table (
                ID int
                ,random float
)
declare @intID int=100
while @intID>0
begin
                insert into @tmpTable (ID) values (@intID)
                set @intID=@intID-1
end

update @tmpTable set random=rand()

select * from @tmpTable

 

It turns out that rand() generates only one number per statement, not per record.  Boo.

After mucho Googling, I found two ways of returning random values per record.  They are newid() and crypt_gen_random().  Actually, I found more.  I saw some people that created functions, but I didn't want to to create any new objects just for this.

Between the two options, I went for the crypt_gen_random() route.  I guess using either would be equivalent at some level, but using crypt_gen_random() looked more self-explanatory than newid() for returning random numbers, and crypt_gen_random() was completely new to me.


declare @Lower int=1 -- lower boundary of random numbers, inclusive
declare @Upper int=1000 -- upper boundary of random numbers, inclusive

declare @tmpTest table (
                ID int
                ,random int
)
declare @intID int=10
while @intID>0
begin
                insert into @tmpTest (ID) values (@intID)
                set @intID=@intID-1
end

update @tmpTest set
                random=cast(@Lower+(@Upper-@Lower+1)*(cast(crypt_gen_random(6) as bigint) / power(256.0, 6)) as int)

select * from @tmpTest order by random desc

crypt_gen_random(nnn) returns a random number "nnn" bytes long.  I had to divide it by its maximum value possible [power(256.0, nnn)] to return a decimal value <= 1.0.  For the power function, I had to use a floating point base (256.0) due to arithmetic overflows against the integer datatype.  Then, I worked in a formula to range the values between the upper and lower boundaries.  Then, cast the whole thing as an integer, to remain faithful to the resulting datatype we need.  I suppose you could cast it as a bigint or float, but my friend only needed an integer, so there you go.

I stress tested this to return random numbers up to 10M.

I hope the hours I spent on this saves you time!