开发者

Scramble a column in SQL Server?

We have a web app we'd like to demo to prospects, but our best way of doing so is with existing data, for a full experience. Certainly, we don't want to do this with actual customer names or addresses, etc visible in the app. Is there an easy way in SQL Server to randomize or scramb开发者_运维问答le a varchar or text field?

None of these columns are keys in anyway, either primary or foreign.


This is a late answer but I was not satisfied with any internet search on this matter. Here's an example that will shuffle the firstname and lastname in a customers table to create new names:

--Replace Customers with your table name
select * from Customers

--Be sure int match your id column datatype
Declare @id int

--Add a WHERE here to select just a subset of your table
DECLARE mycursor CURSOR FOR SELECT id FROM Customers
OPEN mycursor
FETCH NEXT FROM mycursor INTO @id;

WHILE (@@FETCH_STATUS = 0) 
BEGIN
    --We loop
    --Warning: NEWID() is generated once per query, so we update the fullname in two queries.
    UPDATE Customers
        SET FirstName = (SELECT TOP 1 FirstName FROM Customers ORDER BY NEWID())
    WHERE id = @id
    UPDATE Customers
        SET LastName = (SELECT TOP 1 LastName FROM Customers ORDER BY NEWID())
    WHERE id = @id
    FETCH NEXT FROM mycursor INTO @id;
END

CLOSE mycursor;
DEALLOCATE mycursor;

select * from Customers


Redgate has tool for it: http://www.red-gate.com/products/SQL_Data_Generator/index.htm

Didn't use it, but redgate tools are very good.

EDIT

It generates data, not scrambles, but still can be useful.


I scrambled data once by changing letters within the field. So, if you have a name "Mike Smith" and you change all the i's to o's, the m's to l's, the e's to a's, the s's to t's, the t's to rr's, you end up with

Moke Smoth
Loke Sloth
Loka Sloth
Loka Tloth
Loka Rrlorrh

which was enough to make the name unreadable, and also you can't go back and determine what it was (I changed some letters which had already had letters changed to them.) But, it's still kind of readable.


It's not possible to just leave your data in the tables and somehow only display it in a scrambled form.

Your options are to either replace the data by scrambling it in some way, generate new data that's in the same general form, write a function (CLR or T-SQL) that scrambles it as part of the queries you use, or encrypt the data, in which case it can only be displayed if the user also has the appropriate decryption key.

If you decide to replace the data, in addition to the Red Gate tool previously mentioned, you might also consider using the data generator that comes with Visual Studio Team Database, or perhaps Integration Services. The latter can be particularly useful if you would benefit from a more complex transformation.


dbForge has a free tool for data generation: http://www.devart.com/dbforge/sql/data-generator/


Here are couple simple methods that have quite nice performance and can be applied to a table:

use master;

declare @length as  int = 50;   --acts as maximum length for random length expressions
declare @rows   as  int = 10;

SELECT 
    CONVERT( VARCHAR(max), crypt_gen_random( @length     ))                                 as  FixedLengthText
,   CONVERT(NVARCHAR(max), crypt_gen_random( @length * 2 ))                                 as  FixedLengthUnicode
,   (   select crypt_gen_random((@length/8*6)) 
        where value."type" is not null  --refer to outer query, to get different value for each row
        FOR XML PATH(''))                                                                   as  FixedLengthBase64
,   CONVERT( VARCHAR(max), crypt_gen_random( (ABS(CHECKSUM(NewId())) %  @length     )+1 ))  as  RandomLengthText
,   CONVERT(NVARCHAR(max), crypt_gen_random( (ABS(CHECKSUM(NewId())) % (@length * 2))+1 ))  as  RandomLengthUnicode
,   (   select crypt_gen_random( ( (ABS(CHECKSUM(NewId())) % @length )+1 )/8*6 )
        where value."type" is not null  --refer to outer query, to get different value for each row
        FOR XML PATH(''))                                                                   as  RandomLengthBase64
FROM  dbo.spt_values  AS  value
WHERE   value."type" = 'P'  --Limit "number" to integers between 0-2047
    and value.number <= @rows
;


You can create a list of the columns that need updating and then simply iterate over said list and execute some dynamic sql that will update the row in some fashion. I made a fairly basic scramble function that will just sha1 the data (with a random salt) so that it should be secure enough for most purposes.

if exists (select 1 where object_id('tempdb..#columnsToUpdate') is not null)
begin
    drop table #columnsToUpdate
end
create table #columnsToUpdate(tableName varchar(max), columnName varchar(max), max_length int)

if exists (select 1 where object_id('fnGetSanitizedName') is not null)
begin 
    drop function fnGetSanitizedName
end

if exists (select 1 where object_id('random') is not null)
begin 
    drop view random
end

if exists (select 1 where object_id('randUniform') is not null)
begin 
    drop function randUniform
end

GO

create view random(value) as select rand();
go

create function dbo.randUniform() returns real
begin
    declare @v real
    set @v = (select value from random)
    return @v
end

go

CREATE FUNCTION dbo.fnGetSanitizedName 
(
    @functionName nvarchar(max),
    @length int
)
RETURNS varchar(max)
AS
BEGIN
    return left(SUBSTRING(master.dbo.fn_varbintohexstr(HashBytes('SHA1', cast(cast(cast(dbo.randUniform() * 10000 as int) as varchar(8)) as varchar(40)) + @functionName)), 3, 32), @length)

END
GO

begin transaction
set nocount on

insert into #columnsToUpdate
select tables.name, columns.name, 
    case
        when types.name = 'nvarchar' then columns.max_length / 2
        else columns.max_length
    end as max_length
from sys.tables tables
inner join sys.columns columns on tables.object_id=columns.object_id
inner join sys.types types on columns.system_type_id = types.system_type_id
where types.name in ('nvarchar', 'varchar')


declare @tableName varchar(max)
declare @columnName varchar(max)
declare @length int
declare @executingSql varchar(max)

declare tableUpdateCursor cursor 
    for select tableName, columnName, max_length from #columnsToUpdate
open tableUpdateCursor

    fetch next from tableUpdateCursor into @tableName, @columnName, @length
    while @@fetch_status = 0
    begin
        set @executingSql = 'update ' + @tableName + ' set ' + @columnName + ' = dbo.fnGetSanitizedName(' + @columnName + ',' + cast(@length as varchar(max)) + ')'
        print @executingSql
        exec(@executingSql)

        fetch next from tableUpdateCursor into @tableName, @columnName, @length
    end


close tableUpdateCursor
deallocate tableUpdateCursor

set nocount off

rollback -- Can remove the rollback when you are sure about what your are doing.

drop table #columnsToUpdate
drop function dbo.fnGetSanitizedName
drop view random
drop function randUniform
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜