Mark non-unique rows in a DataTable
I have a DataTable which I want to check if values in three of the columns are unique. If not, the last column should be filled with the line number of the first appearance of the value-combination.
For example, this table:
ID Name LastName Age Flag
-------------------------------------
1 Bart Simpson 10 -
2 Lisa Simpson 8 -
3 Bart Simpson 10 -
4 Ned Flanders 40 -
5 Bart Simpson 10 -
Should lead to this result:
Line Name LastName Age Flag
-------------------------------------
1 Bart Simpson 10 -
2 Lisa Simpson 8 -
3 Bart Simpson 10 1
4 Ned Flanders 40 -
5 Bart Simpso开发者_StackOverflown 10 1
I solved this by iterating the DataTable with two nested for
loops and comparing the values. While this works fine for a small amount of data, it gets pretty slow when the DataTable contains a lot of rows.
My question is: What is the best/fastest solution for this problem, regarding that the amount of data can vary between let's say 100 and 20000 rows?
Is there a way to do this using LINQ? (I'm not too familiar with it, but I want to learn!)I can't comment on how you might do this in C#/VB with a data table, but if you could move it all to SQL, your query would look like:
declare @t table (ID int, Name varchar(10), LastName varchar(10), Age int)
insert into @t values (1, 'Bart' , 'Simpson', 10 )
insert into @t values (2, 'Lisa', 'Simpson' , 8 )
insert into @t values (3, 'Bart', 'Simpson' , 10 )
insert into @t values (4, 'Ned', 'Flanders' , 40 )
insert into @t values (5 , 'Bart', 'Simpson' , 10 )
select t.*,
(select min(ID) as ID
from @t t2
where t2.Name = t.Name
and t2.LastName = t.LastName
and t2.id < t.id)
from @t t
Here I've defined a table for demo purposes. I suppose you might be able to translate this into LINQ.
Okay, I think I got an answer myself. Based on the suggestion in James Wiseman's answer, I tried something with LINQ.
Dim myErrnrFnct = Function( current, first) If(first <> current, first, 0)
Dim myQuery = From row As DataRow In myDt.AsEnumerable _
Select New With { _
.LINE = row.Item("LINE"), _
.NAME = row.Item("NAME"), _
.LASTNAME = row.Item("LASTNAME"), _
.AGE = row.Item("AGE"), _
.FLAG = myErrnrFnct(row.Item("LINE"), myDt.AsEnumerable.First(Function(rowToCheck) _
rowToCheck.Item("NAME") = row.Item("NAME") AndAlso _
rowToCheck.Item("LASTNAME") = row.Item("LASTNAME") AndAlso _
rowToCheck.Item("AGE") = row.Item("AGE")).Item("LINE")) _
}
With this query I get exactly the result that's described in the Question. The myErrnrFnct
Function is necessary because I want the Flag
column to have the value 0
if there is no other row with the same values.
To get a DataTable out of myQuery
again, I had to add some extensions described here:
How to: Implement CopyToDataTable Where the Generic Type T Is Not a DataRow
And then, this line will do:
Dim myNewDt As DataTable = myQuery.CopyToDataTable()
This seems to work just fine. Any suggestions to do this better?
精彩评论