开发者

Detecting duplicate values in a column of a Datatable while traversing through It

I have a Datatable with Id(guid) and Name(string) columns. I traverse through the data table and run a validation criteria on the Name (say, It should contain only letters and numbers) and then adding the corresponding Id to a List If name passes the validation.

Something like below:-

List<Guid> validIds=new List<Guid>();
fo开发者_如何学JAVAreach(DataRow row in DataTable1.Rows)
{
      if(IsValid(row["Name"])
        {
            validIds.Add((Guid)row["Id"]);
        }

}

In addition to this validation I should also check If the name is not repeating in the whole datatable (even for the case-sensitiveness), If It is repeating, I should not add the corresponding Id in the List.

Things I am thinking/have thought about:-

1) I can have another List, check for the "Name" in the same, If It exists, will add the corresponding Guild 2) I cannot use HashSet as that would treat "Test" and "test" as different strings and not duplicates. 3) Take the DataTable to another one where I have the disctict names (this I havent tried and the code might be incorrect, please correct me whereever possible)

DataTable dataTableWithDistinctName = new DataTable();
dataTableWithDistinctName.CaseSensitive=true
CopiedDataTable=DataTable1.DefaultView.ToTable(true,"Name");

I would loop through the original datatable and check the existence of the "Name" in the CopiedDataTable, If It exists, I wont add the Id to the List.

Are there any better and optimum way to achieve the same? I need to always think of performance. Although there are many related questions in SO, I didnt find a problem similar to this. If you could point me to a question similar to this, It would be helpful.

EDIT :- The number of records might vary from 2000-3000.

Thanks


if you are looking to prevent duplicates, it may be grueling work, and I don't know how many records your dealing with at at atime... If a small set, I'd consider doing a query before each attempted insert from your LIVE source based on

select COUNT(*) as CountOnFile from ProductionTable where UPPER(name) = UPPER(name from live data).  

If the result set CountOnFile > 0, don't add.

If you are dealing with a large dataset, like a bulk import, I would pull all the data into a temp table, then do a query where NOT IN... something like

create table OkToBeAdded as 
select distinct upper( TempTable.Name ) as Name, GUID
  from TempTable
  where upper( TempTable.Name ) 
      NOT IN ( select upper( LiveTable.Name )
                 from LiveTable
                 where upper( TempTable.Name ) = upper( LiveTable.Name )
             );

insert into LiveTable ( Name, GUID )
  select Name, GUID from OkToBeAdded;

Obviously, the SQL is sample and would need to be adjusted based on your specific back-end source


    /* I did this entirely in SQL and avoided ADO.NET*/

    /*I Pass the CSV of valid object Ids and split that in a table*/

DECLARE @TableTemp TABLE        
(        
    TempId uniqueidentifier      
)        
INSERT INTO @TableTemp 
SELECT cast(Data AS uniqueidentifier )AS ID FROM dbo.Split1(@ValidObjectIdsAsCSV,',')    


/*Self join with table1 for any duplicate rows and update the column value*/    
UPDATE Table1 
SET IsValidated=1
FROM Table1 AS A INNER JOIN @TableTemp AS Temp
ON A.ID=Temp.TempId 
WHERE NOT EXISTS (SELECT Name,Count(Name) FROM Table1 
WHERE A.Name=B.Name
GROUP BY Name HAVING Count(Name)>1)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜