Masking or hiding inaccurately entered data in SQL Server 2008

2023-04-09 19:46 问答作者：

OK, so my subject line isn't very descriptive, but here's the scenario:

An end-user has a legal obligation to submit transaction data to a government agency. The transactions contain the name and address of various individuals and organizations. HOWEVER, end users frequently misspell the names of the reported individuals and organizations, or they badly mangle the address, etc.

The information submitted by the end user is a legal 'document', so it cannot be altered by the agency that received it. Also, the transactions can be viewed and searched by the public. When the government agency notices an obvious misspelling or bad address, they开发者_开发问答 would like to 'hide' or 'mask' that bad value with a known good value. For example, if an end user entered 'Arnie Schwarzeger', the agency could replace that name with 'Arnold Schwarzenegger'. The public that viewed the data would see (and search for) the correct spelling, but could view the original data as entered by the end user after they found the data record in question.

Hopefully that explains the business case well enough...on to the SQL part! So to address this problem, we have tables that look like this:

CREATE TABLE [dbo].[SomeUserEnteredData](
    [Id] [uniqueidentifier] NOT NULL,
    [LastOrOrganizationName] [nvarchar](350) NOT NULL,  // data as entered by end-user
    [FirstName] [nvarchar](50) NULL, // data as entered by end-user
    [FullName]  AS ([dbo].[FullNameValue]([FirstName],[LastName])) PERSISTED,  // data as entered by end-user
    [MappedName]  AS ([dbo].[MappedNameValue]([FirstName],[LastName])))  // this is the 'override' data from the agency

CREATE TABLE [dbo].[CorrectionsByAgency](
    [Id] [uniqueidentifier] NOT NULL,
    [ReplaceName] [nvarchar](400) NOT NULL,
    [KeepName] [nvarchar](400) NOT NULL)

CREATE FUNCTION [dbo].[FullNameValue]
(
    @FirstName as NVARCHAR(40),
    @LastOrOrganizationName as NVARCHAR(350)
)
RETURNS NVARCHAR(400)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @result NVARCHAR(400)
    IF @FirstName = '' OR @FirstName is NULL
        SET @result = @LastOrOrganizationName
    ELSE 
        SET @result = @LastOrOrganizationName + ', ' + @FirstName
    RETURN @result
END

CREATE FUNCTION [dbo].[MappedNameValue]
(
    @FirstName as NVARCHAR(50),
    @LastOrOrganizationName as NVARCHAR(350)
)
RETURNS NVARCHAR(400)
AS
BEGIN
    DECLARE @result NVARCHAR(400)
    DECLARE @FullName NVARCHAR(400)
    SET @FullName = dbo.FullNameValue(@FirstName, @LastOrOrganizationName)
    SELECT top 1 @result = KeepName from CorrectionsByAgency where ReplaceName = @FullName
    if @result is null
        SET @result = @FullName
    RETURN @result
END

Hopefully, if my sample isn't TOO convoluted, you can see that if the agency enters a name correction, it will replace all occurrences of the misspelled name. From a business logic perspective, this works exactly right: the agency staff only enters a few corrections and the corrections can override everywhere there are misspelled names.

From a server performance standpoint, this solution STINKS. The calculated SomeUserEnteredData.MappedName column can't be indexed, and no view that reads from that column can be indexed either! There's no way this can work for our needs if we can't index the MappedName values.

The only alternative I've been able to see as a possibility is to create an additional linking table between the end-user created data and the agency created data -- when the agency enters a correction record, a record is created in the linking table for every occurrence of the bad column value. The down side to this seems to be the very real likelihood of creating/destroying many (hundreds of thousands) of those linking records for every correction entered by an agency user...

Do any of you SQL geniuses out there have great ideas about how to address this problem?

I'm not sure if this is answering your question directly, but I would try to simplify the whole thing: stop using functions, persist "calculated" values and use application logic (possibly in a stored procedure) to manage the data.

Assuming that one agency correction can be applied to many user-entered names, then you could have something like this:

create table dbo.UserEnteredData (
  DocumentId uniqueidentifier not null primary key,
  UserEnteredName nvarchar(1000) not null,
  CorrectedNameId uniqueidentifier null,
  constraint FK_CorrectedNames foreign key (CorrectedNameId)
    references dbo.CorrectedNames (CorrectedNameId)
)

create table dbo.CorrectedNames (
  CorrectedNameId uniqueidentifier not null primary key,
  CorrectedName nvarchar(1000) not null
)

Now, you need to make sure your application logic can do something like this:

External user enters dirty data
Agency user reviews the dirty data and identifies both the incorrect name and the corrected name
Application checks if the corrected name already exists
If no, create a new row in dbo.CorrectedNames
Create a new row in dbo.UserEnteredData, with the CorrectedNameId

I'm assuming that things are rather more complicated in reality and corrections are made based on addresses and other data as well as just names, but the basic relationship you describe seems simple enough. As you said, the functions add a lot of overhead and it's not clear (to me) what benefit they provide over just storing the data you need directly.

Finally, I don't understand your comment about creating/destroying linking records; it's up to your application logic to handle data changes correctly.

继续阅读：sql sql-server sql-server-2008

Masking or hiding inaccurately entered data in SQL Server 2008

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？