Scramble a column in SQL Server?
We have a web app we'd like to demo to prospects, but our best way of doing so is with existing data, for a full experience. Certainly, we don't want to do this with actual customer names or addresses, etc visible in the app. Is there an easy way in SQL Server to randomize or scramb开发者_运维问答le a varchar or text field?
None of these columns are keys in anyway, either primary or foreign.
This is a late answer but I was not satisfied with any internet search on this matter. Here's an example that will shuffle the firstname and lastname in a customers table to create new names:
--Replace Customers with your table name
select * from Customers
--Be sure int match your id column datatype
Declare @id int
--Add a WHERE here to select just a subset of your table
DECLARE mycursor CURSOR FOR SELECT id FROM Customers
OPEN mycursor
FETCH NEXT FROM mycursor INTO @id;
WHILE (@@FETCH_STATUS = 0)
BEGIN
--We loop
--Warning: NEWID() is generated once per query, so we update the fullname in two queries.
UPDATE Customers
SET FirstName = (SELECT TOP 1 FirstName FROM Customers ORDER BY NEWID())
WHERE id = @id
UPDATE Customers
SET LastName = (SELECT TOP 1 LastName FROM Customers ORDER BY NEWID())
WHERE id = @id
FETCH NEXT FROM mycursor INTO @id;
END
CLOSE mycursor;
DEALLOCATE mycursor;
select * from Customers
Redgate has tool for it: http://www.red-gate.com/products/SQL_Data_Generator/index.htm
Didn't use it, but redgate tools are very good.
EDIT
It generates data, not scrambles, but still can be useful.
I scrambled data once by changing letters within the field. So, if you have a name "Mike Smith" and you change all the i's to o's, the m's to l's, the e's to a's, the s's to t's, the t's to rr's, you end up with
Moke Smoth
Loke Sloth
Loka Sloth
Loka Tloth
Loka Rrlorrh
which was enough to make the name unreadable, and also you can't go back and determine what it was (I changed some letters which had already had letters changed to them.) But, it's still kind of readable.
It's not possible to just leave your data in the tables and somehow only display it in a scrambled form.
Your options are to either replace the data by scrambling it in some way, generate new data that's in the same general form, write a function (CLR or T-SQL) that scrambles it as part of the queries you use, or encrypt the data, in which case it can only be displayed if the user also has the appropriate decryption key.
If you decide to replace the data, in addition to the Red Gate tool previously mentioned, you might also consider using the data generator that comes with Visual Studio Team Database, or perhaps Integration Services. The latter can be particularly useful if you would benefit from a more complex transformation.
dbForge has a free tool for data generation: http://www.devart.com/dbforge/sql/data-generator/
Here are couple simple methods that have quite nice performance and can be applied to a table:
use master;
declare @length as int = 50; --acts as maximum length for random length expressions
declare @rows as int = 10;
SELECT
CONVERT( VARCHAR(max), crypt_gen_random( @length )) as FixedLengthText
, CONVERT(NVARCHAR(max), crypt_gen_random( @length * 2 )) as FixedLengthUnicode
, ( select crypt_gen_random((@length/8*6))
where value."type" is not null --refer to outer query, to get different value for each row
FOR XML PATH('')) as FixedLengthBase64
, CONVERT( VARCHAR(max), crypt_gen_random( (ABS(CHECKSUM(NewId())) % @length )+1 )) as RandomLengthText
, CONVERT(NVARCHAR(max), crypt_gen_random( (ABS(CHECKSUM(NewId())) % (@length * 2))+1 )) as RandomLengthUnicode
, ( select crypt_gen_random( ( (ABS(CHECKSUM(NewId())) % @length )+1 )/8*6 )
where value."type" is not null --refer to outer query, to get different value for each row
FOR XML PATH('')) as RandomLengthBase64
FROM dbo.spt_values AS value
WHERE value."type" = 'P' --Limit "number" to integers between 0-2047
and value.number <= @rows
;
You can create a list of the columns that need updating and then simply iterate over said list and execute some dynamic sql that will update the row in some fashion. I made a fairly basic scramble function that will just sha1 the data (with a random salt) so that it should be secure enough for most purposes.
if exists (select 1 where object_id('tempdb..#columnsToUpdate') is not null)
begin
drop table #columnsToUpdate
end
create table #columnsToUpdate(tableName varchar(max), columnName varchar(max), max_length int)
if exists (select 1 where object_id('fnGetSanitizedName') is not null)
begin
drop function fnGetSanitizedName
end
if exists (select 1 where object_id('random') is not null)
begin
drop view random
end
if exists (select 1 where object_id('randUniform') is not null)
begin
drop function randUniform
end
GO
create view random(value) as select rand();
go
create function dbo.randUniform() returns real
begin
declare @v real
set @v = (select value from random)
return @v
end
go
CREATE FUNCTION dbo.fnGetSanitizedName
(
@functionName nvarchar(max),
@length int
)
RETURNS varchar(max)
AS
BEGIN
return left(SUBSTRING(master.dbo.fn_varbintohexstr(HashBytes('SHA1', cast(cast(cast(dbo.randUniform() * 10000 as int) as varchar(8)) as varchar(40)) + @functionName)), 3, 32), @length)
END
GO
begin transaction
set nocount on
insert into #columnsToUpdate
select tables.name, columns.name,
case
when types.name = 'nvarchar' then columns.max_length / 2
else columns.max_length
end as max_length
from sys.tables tables
inner join sys.columns columns on tables.object_id=columns.object_id
inner join sys.types types on columns.system_type_id = types.system_type_id
where types.name in ('nvarchar', 'varchar')
declare @tableName varchar(max)
declare @columnName varchar(max)
declare @length int
declare @executingSql varchar(max)
declare tableUpdateCursor cursor
for select tableName, columnName, max_length from #columnsToUpdate
open tableUpdateCursor
fetch next from tableUpdateCursor into @tableName, @columnName, @length
while @@fetch_status = 0
begin
set @executingSql = 'update ' + @tableName + ' set ' + @columnName + ' = dbo.fnGetSanitizedName(' + @columnName + ',' + cast(@length as varchar(max)) + ')'
print @executingSql
exec(@executingSql)
fetch next from tableUpdateCursor into @tableName, @columnName, @length
end
close tableUpdateCursor
deallocate tableUpdateCursor
set nocount off
rollback -- Can remove the rollback when you are sure about what your are doing.
drop table #columnsToUpdate
drop function dbo.fnGetSanitizedName
drop view random
drop function randUniform
精彩评论