I have a list of objects and I want to filter the list by removing some objects that have ids
present in a separate list i.e.
List<Vehicle> vehicles; //All Vehicles List (260 vehicles data)
List<int> vehiclesIds; //Ids List to remove from Original List (Length < 260)
Now for removing if i’m using RemoveAll()
it takes areound 25 to 30 seconds
.
vehicles.RemoveAll(e => vehiclesIds.Contains(e.Id));
But if I’m using for loop
as the following it takes only 8 to 20 milliseconds
.
foreach (var vehiclesId in vehiclesIds)
{
vehicles.RemoveAll(e=> vehiclesId == e.Id);
}
I multiple times run this code and analyse this after using StopWatch
that List.Contains()
is the reason for slow processing.
2
Answers
The logic between these two versions is different.
The first version loops the vehicles first, looking for matches in the IDs.
Contains
uses the default comparer, which might be a slow way of comparing because it uses interfaces.You can improve this version by using a
HashSet<int>
to find matching IDs. This should be faster (even though it also usesContains
) as it uses a hashtable internally.The second version instead loops the smaller list of IDs first. You can improve this version by using a
Dictionary
for thevehicles
i would imagine this last version to be the fastest.
TLDR: The answer to your question "Why List.Contains() is too slow?" is: It isn’t. There must be something wrong elsewhere that you haven’t shown us.
Longer answer:
The first version should be faster because it’s only removing elements from the list once rather than multiple times. (The fact that it isn’t faster points to some problem elsewhere.)
However, the first version does call
vehiclesIds.Contains(e.Id)
once for each vehicle, which is an O(N) operation.You could improve that by creating a
HashSet<int>
from the vehicle IDs and then use that for the lookup.I tried the three approaches (your two plus the
HashSet
one) in a benchmark:The results I obtained on my PC were as follows:
As you can see, using your first version is significantly faster than your second version (as I expected), but using a
HashSet
is even faster.The conclusion that we must draw is that the slowdown you’re seeing is due to code that you haven’t shown us.