Using LINQ to Objects to find items in one collection that do not match another
I want to find all items in one collection that do not match another collection. The collections are not of the same type, though; I want to write a lambda expression to specify equality.
A LINQPad example of what I'm trying to do:
void Main()
{
var employees = new[]
{
new Employee { Id = 20, Name = "Bob" },
new Employee { Id = 10, Name = "Bill" },
new Employee { Id = 30, Name = "Frank" }
};
var managers = new[]
{
new Manager { EmployeeId = 20 },
new Manager { EmployeeId = 30 }
};
var nonManagers =
from employee in employees
where !(managers.Any(x => x.EmployeeId == employee.Id))
select employee;
nonManagers.Dump();
// Based on cdonner's answer:
var nonManagers2 =
from employee in employees
join manager in managers
on employee.Id equals manager.EmployeeId
into tempManagers
from manager in tempManagers.DefaultIfEmpty()
where manager == null
select employee;
nonManagers2.Dump();
// Based on Richard Hein's answer:
var nonManagers3 =
employees.Except(
from employee in employees
join manager in managers
on employee.Id equals manager.EmployeeId
select employee);
nonManagers3.Dump();
}
public class Employee
{
public int Id { get; set; }
public string Name { get; set; }
}
public class Manager
{
public int EmployeeId { get; set; }
}
The above works, and will return Employee Bill (#10). It does not seem elegant, though, and it may be inefficient with larger collections. In SQL I'd probably do a LEFT JOIN and find items where the second ID was NULL. What's the best practice for doing this in LINQ?
EDIT: Updated to prevent solutions that depend on the Id equaling the index.
EDIT: Added cdonner's solution - anybody have anything simpler?
EDIT: Added a variant on Richard Hein's answer, my current favorite. Thanks to everyone for some excellent answers!
This is almost the same as some other examples but less code:
employees.Except(employees.Join(managers, e => e.Id, m => m.EmployeeId, (e, m) => e));
It's not any simpler than employees.Where(e => !managers.Any(m => m.EmployeeId == e.Id)) or your original syntax, however.
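If you can target .NET 6 or later, the built-in ExceptBy operator might be the shortest option. Here is a minimal sketch, assuming the Employee and Manager classes from the question; FindNonManagers is just a name I made up for illustration. Note that ExceptBy has set semantics, so employees that share a key are returned only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Manager
{
    public int EmployeeId { get; set; }
}

public static class Program
{
    // Hypothetical helper: returns the employees whose Id does not
    // appear among the managers' EmployeeId values.
    public static List<Employee> FindNonManagers(
        IEnumerable<Employee> employees, IEnumerable<Manager> managers)
    {
        // ExceptBy (System.Linq, .NET 6+) takes the keys to exclude
        // and a key selector for the source items.
        return employees
            .ExceptBy(managers.Select(m => m.EmployeeId), e => e.Id)
            .ToList();
    }

    public static void Main()
    {
        var employees = new[]
        {
            new Employee { Id = 20, Name = "Bob" },
            new Employee { Id = 10, Name = "Bill" },
            new Employee { Id = 30, Name = "Frank" }
        };
        var managers = new[]
        {
            new Manager { EmployeeId = 20 },
            new Manager { EmployeeId = 30 }
        };

        foreach (var e in FindNonManagers(employees, managers))
        {
            Console.WriteLine($"{e.Id}: {e.Name}"); // prints "10: Bill"
        }
    }
}
```

Internally this should behave much like the HashSet approach: the excluded keys go into a set once, and each employee is checked against it.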
/// <summary>
/// This method returns items in a set that are not in
/// another set of a different type
/// </summary>
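/// <typeparam name="T">Element type of the source sequence</typeparam>
/// <typeparam name="TOther">Element type of the sequence to compare against</typeparam>
/// <typeparam name="TKey">Type of the key used to match items</typeparam>
/// <param name="items">The source items to filter</param>
/// <param name="other">The items whose keys exclude matching source items</param>
/// <param name="getItemKey">Extracts the key from a source item</param>
/// <param name="getOtherKey">Extracts the key from an item in <paramref name="other"/></param>
/// <returns>The source items for which no item in <paramref name="other"/> has an equal key</returns>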
public static IEnumerable<T> Except<T, TOther, TKey>(
this IEnumerable<T> items,
IEnumerable<TOther> other,
Func<T, TKey> getItemKey,
Func<TOther, TKey> getOtherKey)
{
return from item in items
join otherItem in other on getItemKey(item)
equals getOtherKey(otherItem) into tempItems
from temp in tempItems.DefaultIfEmpty()
where ReferenceEquals(null, temp) || temp.Equals(default(TOther))
select item;
}
I don't remember where I found this method.
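One caveat, as far as I can tell: when TOther is a value type, the temp.Equals(default(TOther)) check cannot tell "no match" apart from a real match whose value happens to equal the default (for example an int 0). Below is a sketch of a variant that uses GroupJoin and tests whether the match group is empty. The name ExceptByKey is my own and is not part of System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyedExceptExtensions
{
    // Hypothetical helper name; not a framework method.
    public static IEnumerable<T> ExceptByKey<T, TOther, TKey>(
        this IEnumerable<T> items,
        IEnumerable<TOther> other,
        Func<T, TKey> getItemKey,
        Func<TOther, TKey> getOtherKey)
    {
        // GroupJoin pairs each item with the (possibly empty) sequence
        // of elements from other that share its key.
        return items
            .GroupJoin(other, getItemKey, getOtherKey,
                (item, matches) => new { Item = item, HasMatch = matches.Any() })
            .Where(pair => !pair.HasMatch)   // keep items with no match
            .Select(pair => pair.Item);
    }
}

public static class Demo
{
    public static void Main()
    {
        var ids = new[] { 0, 1, 2, 3 };
        var excluded = new[] { 0, 2 };

        // 0 is correctly excluded even though it equals default(int).
        var remaining = ids.ExceptByKey(excluded, i => i, x => x).ToList();
        Console.WriteLine(string.Join(", ", remaining)); // prints "1, 3"
    }
}
```

With the question's types the call would look like employees.ExceptByKey(managers, e => e.Id, m => m.EmployeeId).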
var nonManagers = ( from e1 in employees select e1 ).Except( from m in managers from e2 in employees where m.EmployeeId == e2.Id select e2 );
It's a bit late (I know).
I was looking at the same problem and was considering a HashSet because of various performance hints in that direction, including @Skeet's answer to "Intersection of multiple lists with IEnumerable.Intersect()". I asked around my office, and the consensus was that a HashSet would be faster and more readable:
HashSet<int> managerIds = new HashSet<int>(managers.Select(x => x.EmployeeId));
var nonManagers4 = employees.Where(x => !managerIds.Contains(x.Id)).ToList();
Then I was offered an even faster solution that uses a native array as a kind of bit mask. The syntax of the native array queries would put me off using them except where extreme performance matters, though.
To give this answer a little credence after an awfully long time, I've extended your LINQPad program and data with timings so you can compare what are now six options:
void Main()
{
var employees = new[]
{
new Employee { Id = 20, Name = "Bob" },
new Employee { Id = 10, Name = "Kirk NM" },
new Employee { Id = 48, Name = "Rick NM" },
new Employee { Id = 42, Name = "Dick" },
new Employee { Id = 43, Name = "Harry" },
new Employee { Id = 44, Name = "Joe" },
new Employee { Id = 45, Name = "Steve NM" },
new Employee { Id = 46, Name = "Jim NM" },
new Employee { Id = 30, Name = "Frank"},
new Employee { Id = 47, Name = "Dave NM" },
new Employee { Id = 49, Name = "Alex NM" },
new Employee { Id = 50, Name = "Phil NM" },
new Employee { Id = 51, Name = "Ed NM" },
new Employee { Id = 52, Name = "Ollie NM" },
new Employee { Id = 41, Name = "Bill" },
new Employee { Id = 53, Name = "John NM" },
new Employee { Id = 54, Name = "Simon NM" }
};
var managers = new[]
{
new Manager { EmployeeId = 20 },
new Manager { EmployeeId = 30 },
new Manager { EmployeeId = 41 },
new Manager { EmployeeId = 42 },
new Manager { EmployeeId = 43 },
new Manager { EmployeeId = 44 }
};
System.Diagnostics.Stopwatch watch1 = new System.Diagnostics.Stopwatch();
int max = 1000000;
watch1.Start();
List<Employee> nonManagers1 = new List<Employee>();
foreach (var item in Enumerable.Range(1,max))
{
nonManagers1 = (from employee in employees where !(managers.Any(x => x.EmployeeId == employee.Id)) select employee).ToList();
}
watch1.Stop();
nonManagers1.Dump();
Console.WriteLine("Any: " + watch1.ElapsedMilliseconds);
watch1.Restart();
List<Employee> nonManagers2 = new List<Employee>();
foreach (var item in Enumerable.Range(1,max))
{
nonManagers2 =
(from employee in employees
join manager in managers
on employee.Id equals manager.EmployeeId
into tempManagers
from manager in tempManagers.DefaultIfEmpty()
where manager == null
select employee).ToList();
}
watch1.Stop();
nonManagers2.Dump();
Console.WriteLine("temp table: " + watch1.ElapsedMilliseconds);
watch1.Restart();
List<Employee> nonManagers3 = new List<Employee>();
foreach (var item in Enumerable.Range(1,max))
{
nonManagers3 = employees.Except(employees.Join(managers, e => e.Id, m => m.EmployeeId, (e, m) => e)).ToList();
}
watch1.Stop();
nonManagers3.Dump();
Console.WriteLine("Except: " + watch1.ElapsedMilliseconds);
watch1.Restart();
List<Employee> nonManagers4 = new List<Employee>();
foreach (var item in Enumerable.Range(1,max))
{
HashSet<int> managerIds = new HashSet<int>(managers.Select(x => x.EmployeeId));
nonManagers4 = employees.Where(x => !managerIds.Contains(x.Id)).ToList();
}
watch1.Stop();
nonManagers4.Dump();
Console.WriteLine("HashSet: " + watch1.ElapsedMilliseconds);
watch1.Restart();
List<Employee> nonManagers5 = new List<Employee>();
foreach (var item in Enumerable.Range(1, max))
{
bool[] test = new bool[managers.Max(x => x.EmployeeId) + 1];
foreach (var manager in managers)
{
test[manager.EmployeeId] = true;
}
nonManagers5 = employees.Where(x => x.Id > test.Length - 1 || !test[x.Id]).ToList();
}
watch1.Stop();
nonManagers5.Dump();
Console.WriteLine("Native array call: " + watch1.ElapsedMilliseconds);
watch1.Restart();
List<Employee> nonManagers6 = new List<Employee>();
foreach (var item in Enumerable.Range(1, max))
{
bool[] test = new bool[managers.Max(x => x.EmployeeId) + 1];
foreach (var manager in managers)
{
test[manager.EmployeeId] = true;
}
nonManagers6 = employees.Where(x => x.Id > test.Length - 1 || !test[x.Id]).ToList();
}
watch1.Stop();
nonManagers6.Dump();
Console.WriteLine("Native array call 2: " + watch1.ElapsedMilliseconds);
}
public class Employee
{
public int Id { get; set; }
public string Name { get; set; }
}
public class Manager
{
public int EmployeeId { get; set; }
}
var nonmanagers = employees.Select(e => e.Id)
.Except(managers.Select(m => m.EmployeeId))
.Select(id => employees.Single(e => e.Id == id));
Have a look at the Except() LINQ function. It does exactly what you need.
It's better to left join the items and filter on a null condition:
var finalcertificates = (from globCert in resultCertificate
join toExcludeCert in certificatesToExclude
on globCert.CertificateId equals toExcludeCert.CertificateId into certs
from toExcludeCert in certs.DefaultIfEmpty()
where toExcludeCert == null
select globCert).Union(currentCertificate).Distinct().OrderBy(cert => cert.CertificateName);
Managers are employees, too! So the Manager class should subclass from the Employee class (or, if you don't like that, then they should both subclass from a parent class, or make a NonManager class).
Then your problem is as simple as implementing the IEquatable interface on your Employee superclass (for GetHashCode simply return the EmployeeID) and then using this code:
var nonManagerEmployees = employeeList.Except(managerList);
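A rough sketch of what that could look like, assuming equality is based only on the Id property from the question's Employee class and that Manager adds nothing of its own. The explicit Except<Employee> type argument is only there to make the element type obvious.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee : IEquatable<Employee>
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Two employees are considered equal when their Ids match
    // (an assumption made for this sketch).
    public bool Equals(Employee other) => other != null && other.Id == Id;

    public override bool Equals(object obj) => Equals(obj as Employee);

    public override int GetHashCode() => Id;
}

// In this design a manager is simply a kind of employee.
public class Manager : Employee
{
}

public static class Program
{
    public static void Main()
    {
        var employeeList = new List<Employee>
        {
            new Employee { Id = 20, Name = "Bob" },
            new Employee { Id = 10, Name = "Bill" },
            new Employee { Id = 30, Name = "Frank" }
        };
        var managerList = new List<Manager>
        {
            new Manager { Id = 20 },
            new Manager { Id = 30 }
        };

        // List<Manager> converts to IEnumerable<Employee> through covariance,
        // and Except compares elements using the Id-based equality above.
        var nonManagerEmployees = employeeList.Except<Employee>(managerList).ToList();

        foreach (var e in nonManagerEmployees)
        {
            Console.WriteLine($"{e.Id}: {e.Name}"); // prints "10: Bill"
        }
    }
}
```

The trade-off is that Id-based equality now applies everywhere Employee objects are compared, not only in this query.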