how to get distinct records in datatable?
I am using C# + VS2008 + .Net + ASP.Net + IIS 7.0 + ADO.Net + SQL Server 2开发者_StackOverflow中文版008. I have a ADO.Net datatable object, and I want to filter out duplicate/similar records (in my specific rule to judge whether records are duplicate/similar -- if record/row has the same value for a string column, I will treat them as duplicate/similar records), and only keep one of such duplicate/similar records.
The output needs to be a datatable, may output the same datatable object if filter operation could be operated on the same datatable object.
What is the most efficient solution?
Are you using .NET 3.5? If you cast your data rows, you can use LINQ to Objects:
var distinctRows = table.Rows.Cast<DataRow>().Distinct(new E());
...
public class E : IEqualityComparer<DataRow>
{
bool IEqualityComparer<DataRow>.Equals(DataRow x, DataRow y)
{
return x["colA"] == y["colA"];
}
int IEqualityComparer<DataRow>.GetHashCode(DataRow obj)
{
return obj["colA"].GetHashCode();
}
}
Or an even simpler way, since you're basing it on a single column's values:
var distinct = from r in table.Rows.Cast<DataRow>()
group r by (string)r["colA"] into g
select g.First();
If you need to make a new DataTable out of these distinct rows, you can do this:
var t2 = new DataTable();
t2.Columns.AddRange(table.Columns.Cast<DataColumn>().ToArray());
foreach(var r in distinct)
{
t2.Rows.Add(r);
}
Or if it would be more handy to work with business objects, you can do an easy conversion:
var persons = (from r in distinct
select new PersonInfo
{
EmpId = (string)r["colA"],
FirstName = (string)r["colB"],
LastName = (string)r["colC"],
}).ToList();
...
public class PersonInfo
{
public string EmpId {get;set;}
public string FirstName {get;set;}
public string LastName {get;set;}
}
Update
Everything you can do in LINQ to Objects can also be done without it: it just takes more code. For example:
var table = new DataTable();
var rowSet = new HashSet<DataRow>(new E());
var newTable = new DataTable();
foreach(DataColumn column in table.Columns)
{
newTable.Columns.Add(column);
}
foreach(DataRow row in table.Rows)
{
if(!rowSet.Contains(row))
{
rowSet.Add(row);
newTable.Rows.Add(row);
}
}
You could also use a similar strategy to simply remove duplicate rows from the original table instead of creating a new table.
You can do a select into
with a group by
clause, so not duplicates are created. Then drop the old table and rename the table into which you selected to the original table name.
I would do this in the database layer:
SELECT Distinct...
FROM MyTable
Or if you need aggregates:
SELECT SUM(Field1), ID FROM MyTable
GROUP BY ID
Put the SELECT statement in a stored procedure. Then in .net make a connection to the database, call the stored procedure, execute .ExecuteNonQuery(). Return the rows in a datatable and return the datatable back to your UI.
精彩评论