Detecting duplicate values in a column of a Datatable while traversing through It
I have a Datatable with Id(guid) and Name(string) columns. I traverse through the data table and run a validation criteria on the Name (say, It should contain only letters and numbers) and then adding the corresponding Id to a List If name passes the validation.
Something like below:-
List<Guid> validIds=new List<Guid>();
fo开发者_如何学JAVAreach(DataRow row in DataTable1.Rows)
{
if(IsValid(row["Name"])
{
validIds.Add((Guid)row["Id"]);
}
}
In addition to this validation I should also check If the name is not repeating in the whole datatable (even for the case-sensitiveness), If It is repeating, I should not add the corresponding Id in the List.
Things I am thinking/have thought about:-
1) I can have another List, check for the "Name" in the same, If It exists, will add the corresponding Guild 2) I cannot use HashSet as that would treat "Test" and "test" as different strings and not duplicates. 3) Take the DataTable to another one where I have the disctict names (this I havent tried and the code might be incorrect, please correct me whereever possible)
DataTable dataTableWithDistinctName = new DataTable();
dataTableWithDistinctName.CaseSensitive=true
CopiedDataTable=DataTable1.DefaultView.ToTable(true,"Name");
I would loop through the original datatable and check the existence of the "Name" in the CopiedDataTable, If It exists, I wont add the Id to the List.
Are there any better and optimum way to achieve the same? I need to always think of performance. Although there are many related questions in SO, I didnt find a problem similar to this. If you could point me to a question similar to this, It would be helpful.
EDIT :- The number of records might vary from 2000-3000.
Thanks
if you are looking to prevent duplicates, it may be grueling work, and I don't know how many records your dealing with at at atime... If a small set, I'd consider doing a query before each attempted insert from your LIVE source based on
select COUNT(*) as CountOnFile from ProductionTable where UPPER(name) = UPPER(name from live data).
If the result set CountOnFile > 0, don't add.
If you are dealing with a large dataset, like a bulk import, I would pull all the data into a temp table, then do a query where NOT IN... something like
create table OkToBeAdded as
select distinct upper( TempTable.Name ) as Name, GUID
from TempTable
where upper( TempTable.Name )
NOT IN ( select upper( LiveTable.Name )
from LiveTable
where upper( TempTable.Name ) = upper( LiveTable.Name )
);
insert into LiveTable ( Name, GUID )
select Name, GUID from OkToBeAdded;
Obviously, the SQL is sample and would need to be adjusted based on your specific back-end source
/* I did this entirely in SQL and avoided ADO.NET*/
/*I Pass the CSV of valid object Ids and split that in a table*/
DECLARE @TableTemp TABLE
(
TempId uniqueidentifier
)
INSERT INTO @TableTemp
SELECT cast(Data AS uniqueidentifier )AS ID FROM dbo.Split1(@ValidObjectIdsAsCSV,',')
/*Self join with table1 for any duplicate rows and update the column value*/
UPDATE Table1
SET IsValidated=1
FROM Table1 AS A INNER JOIN @TableTemp AS Temp
ON A.ID=Temp.TempId
WHERE NOT EXISTS (SELECT Name,Count(Name) FROM Table1
WHERE A.Name=B.Name
GROUP BY Name HAVING Count(Name)>1)
精彩评论