I have the LINQ query below, and it returns data like the example below. I want to remove the duplicates.
List<EmployeeSalary> lstEmployeeSalary =
new EmployeeSalaryFactory().GetRelatedObjects(inValue, ddlPayDate, payRollType, payrollSearch)
.Select(m => (EmployeeSalary)m)
.ToList();
For example:
Id Name EmpCode Salary DateOfSalary
-------------------------------------------------------------
1 Item1 IT00001 $100 5/26/2021
2 Item2 IT00002 $200 4/26/2021
3 Item3 IT00003 $150 5/26/2021
1 Item1 IT00001 $100 4/26/2021
3 Item3 IT00003 $150 4/26/2021
Expected output:
Id Name EmpCode Salary DateOfSalary
-------------------------------------------------------------
1 Item1 IT00001 $100 5/26/2021
2 Item2 IT00002 $200 4/26/2021
3 Item3 IT00003 $150 5/26/2021
Answers
Suppose that new EmployeeSalaryFactory().GetRelatedObjects(...) returns a list of EmployeeSalary objects:
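The code for this answer did not survive. The following is a minimal in-memory sketch, not necessarily what the answerer wrote, that reproduces the expected output from the question. It assumes that rows with the same Id are duplicates and that the row with the most recent DateOfSalary should be kept. The EmployeeSalary class is a hypothetical reconstruction from the table columns, and LatestSalaryDemo is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical EmployeeSalary, reconstructed from the columns in the question.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string EmpCode { get; set; } = "";
    public decimal Salary { get; set; }
    public DateTime DateOfSalary { get; set; }
}

public static class LatestSalaryDemo
{
    // The five input rows from the question.
    public static List<EmployeeSalary> SampleRows() => new List<EmployeeSalary>
    {
        new EmployeeSalary { Id = 1, Name = "Item1", EmpCode = "IT00001", Salary = 100m, DateOfSalary = new DateTime(2021, 5, 26) },
        new EmployeeSalary { Id = 2, Name = "Item2", EmpCode = "IT00002", Salary = 200m, DateOfSalary = new DateTime(2021, 4, 26) },
        new EmployeeSalary { Id = 3, Name = "Item3", EmpCode = "IT00003", Salary = 150m, DateOfSalary = new DateTime(2021, 5, 26) },
        new EmployeeSalary { Id = 1, Name = "Item1", EmpCode = "IT00001", Salary = 100m, DateOfSalary = new DateTime(2021, 4, 26) },
        new EmployeeSalary { Id = 3, Name = "Item3", EmpCode = "IT00003", Salary = 150m, DateOfSalary = new DateTime(2021, 4, 26) },
    };

    // One row per Id: group by Id, then keep the row with the latest DateOfSalary.
    public static List<EmployeeSalary> LatestPerEmployee(IEnumerable<EmployeeSalary> rows) =>
        rows.GroupBy(s => s.Id)
            .Select(g => g.OrderByDescending(s => s.DateOfSalary).First())
            .OrderBy(s => s.Id)
            .ToList();
}
```

If the real rule is different (for example, keep the oldest row instead), only the OrderByDescending line needs to change.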
First of all, don’t call ToList() inside your procedures unless you will actually use the fact that the result is a List<EmployeeSalary>. If you only intend to return the fetched data to your caller, consider returning IEnumerable<EmployeeSalary> and letting the caller call ToList. The reason is that if your caller doesn’t want to use all fetched data, materializing all of it is a waste of processing power:
Suppose you have the following method to get the EmployeeSalaries:
It might be that inValue, ddlPayDate, etc. are parameters of this method, but that’s outside the scope of the question.
Now let’s use this method:
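The method and its usage did not survive in this answer. As a rough sketch of the point being made, assume a hypothetical GetEmployeeSalaries that uses yield return over a made-up range of 1000 items instead of a real data source. The code counts how many objects are actually created when the caller only takes the first two, compared with a caller that goes through ToList first. LazySalaryDemo and the Produced counter are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down EmployeeSalary for this sketch.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class LazySalaryDemo
{
    // Counts how many EmployeeSalary objects the producer actually created.
    public static int Produced;

    // Hypothetical data source: returns IEnumerable and uses yield return,
    // so each object is only created when the caller asks for the next one.
    public static IEnumerable<EmployeeSalary> GetEmployeeSalaries()
    {
        for (int id = 1; id <= 1000; id++)
        {
            Produced++;
            yield return new EmployeeSalary { Id = id, Name = "Item" + id };
        }
    }

    // Caller that only needs two salaries: Take stops pulling after two items.
    public static List<EmployeeSalary> FirstTwo()
    {
        Produced = 0;
        return GetEmployeeSalaries().Take(2).ToList();
    }

    // Same caller, but the data was materialized with ToList first.
    public static List<EmployeeSalary> FirstTwoAfterToList()
    {
        Produced = 0;
        List<EmployeeSalary> all = GetEmployeeSalaries().ToList(); // creates all 1000
        return all.Take(2).ToList();
    }
}
```

Both callers end up with the same two salaries, but only the lazy version avoids creating the other 998 objects.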
If GetEmployeeSalaries had returned a List<EmployeeSalary>, then all salaries would have been materialized, while the caller might only need a few.
Back to your question
The answer depends on what you would call a duplicate: when are two EmployeeSalaries equal? Are they equal if all properties have equal values, or are two salaries equal if they have the same Id (but possibly a different Salary)?
I assume the first: all values should be checked for equality.
The quick solution
If you need this for this one usage only, don’t need to unit test it extensively, don’t need to prepare for future changes, and don’t want to reuse the code for similar problems, consider using Queryable.Distinct before your Select.
The result of
Of course, if the data is in your local process (not in a database), you can use the IEnumerable equivalent, Enumerable.Distinct.
Before the Distinct, the selected objects are of an anonymous type. Anonymous types override Equals and GetHashCode so that they compare by value, not by reference: two objects of the same anonymous type that have equal values for every property are considered equal. So Distinct will remove the duplicates.
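The query for this quick solution is missing above. A minimal in-memory sketch of the same idea, using made-up rows and a hypothetical EmployeeSalary without DateOfSalary: project each object to an anonymous type, call Distinct, and then use a second Select to convert the anonymous objects back into EmployeeSalary objects. With an IQueryable source, the same calls resolve to Queryable.Select and Queryable.Distinct instead. AnonymousDistinctDemo is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical EmployeeSalary. It does not override Equals, so two instances
// with identical values are still two different objects for Distinct().
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string EmpCode { get; set; } = "";
    public decimal Salary { get; set; }
}

public static class AnonymousDistinctDemo
{
    // Made-up rows: rows 2 and 4 are exact copies of row 1.
    public static List<EmployeeSalary> Rows() => new List<EmployeeSalary>
    {
        new EmployeeSalary { Id = 1, Name = "Item1", EmpCode = "IT00001", Salary = 100m },
        new EmployeeSalary { Id = 1, Name = "Item1", EmpCode = "IT00001", Salary = 100m },
        new EmployeeSalary { Id = 2, Name = "Item2", EmpCode = "IT00002", Salary = 200m },
        new EmployeeSalary { Id = 1, Name = "Item1", EmpCode = "IT00001", Salary = 100m },
    };

    // Distinct on the class itself compares references, so nothing is removed.
    public static int CountWithoutProjection(IEnumerable<EmployeeSalary> rows) =>
        rows.Distinct().Count();

    public static List<EmployeeSalary> UniqueSalaries(IEnumerable<EmployeeSalary> rows) =>
        rows
            // Anonymous types compare by value, so Distinct removes the copies.
            .Select(s => new { s.Id, s.Name, s.EmpCode, s.Salary })
            .Distinct()
            // Second Select: convert back to EmployeeSalary if the caller needs that type.
            .Select(a => new EmployeeSalary { Id = a.Id, Name = a.Name, EmpCode = a.EmpCode, Salary = a.Salary })
            .ToList();
}
```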
If you really need the result to be an IEnumerable<EmployeeSalary>, you’ll need a second Select that converts each anonymous object back into an EmployeeSalary.
Proper solution
If the input data is in your local process (= it is an IEnumerable), you have more LINQ methods at your disposal, like the overload of Enumerable.Distinct that takes an IEqualityComparer<T> parameter.
In that case, my advice would be to create an equality comparer for EmployeeSalaries. This has several advantages. You can reuse the equality comparer for other EmployeeSalary problems. The code will be easier to read. You are prepared for future changes: if you add or remove a property from your definition of equality, for instance if you only need to check the Id, there is only one place that you have to change. And you can unit test the comparer: didn’t you forget some properties?
To get the unique salaries:
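A minimal sketch of that usage, assuming a hypothetical EmployeeSalaryComparer that treats two salaries as equal when Id, Name, EmpCode and Salary all match (how such a comparer is built is discussed further down). UniqueSalariesDemo is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical EmployeeSalary for this sketch.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string EmpCode { get; set; } = "";
    public decimal Salary { get; set; }
}

// Hypothetical comparer: equal when Id, Name, EmpCode and Salary all match.
public class EmployeeSalaryComparer : IEqualityComparer<EmployeeSalary>
{
    public bool Equals(EmployeeSalary? x, EmployeeSalary? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id && x.Name == y.Name && x.EmpCode == y.EmpCode && x.Salary == y.Salary;
    }

    public int GetHashCode(EmployeeSalary obj) => obj.Id.GetHashCode();
}

public static class UniqueSalariesDemo
{
    // Once the comparer exists, the specific problem is a one-liner.
    public static List<EmployeeSalary> UniqueSalaries(IEnumerable<EmployeeSalary> salaries) =>
        salaries.Distinct(new EmployeeSalaryComparer()).ToList();
}
```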
Notice that because I reuse a lot of code, the specific problem of unique salaries is quite easy to understand.
I cheated a little: I moved the problem to the equality comparer.
IEquality
Creating a reusable equality comparer is fairly straightforward. The advantage is that you can reuse it in all cases where you need to compare EmployeeSalaries. If your definition of equality changes in the future, there is only one place that you need to change. Finally, there is only one place where you need to unit test whether you implemented the proper definition of equality.
Usage would be:
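The usage code is missing here. As a sketch of the reuse argument, assuming the same hypothetical EmployeeSalaryComparer as before, one comparer instance can drive a direct equality check, Enumerable.Contains, and a HashSet<EmployeeSalary>. ComparerReuseDemo and its methods are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical EmployeeSalary for this sketch.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string EmpCode { get; set; } = "";
    public decimal Salary { get; set; }
}

// Hypothetical comparer: equal when Id, Name, EmpCode and Salary all match.
public class EmployeeSalaryComparer : IEqualityComparer<EmployeeSalary>
{
    public bool Equals(EmployeeSalary? x, EmployeeSalary? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id && x.Name == y.Name && x.EmpCode == y.EmpCode && x.Salary == y.Salary;
    }

    public int GetHashCode(EmployeeSalary obj) => obj.Id.GetHashCode();
}

public static class ComparerReuseDemo
{
    // One comparer instance, reused for several unrelated jobs.
    private static readonly EmployeeSalaryComparer SalaryComparer = new EmployeeSalaryComparer();

    public static bool AreEqual(EmployeeSalary a, EmployeeSalary b) => SalaryComparer.Equals(a, b);

    public static bool ContainsSalary(IEnumerable<EmployeeSalary> salaries, EmployeeSalary salary) =>
        salaries.Contains(salary, SalaryComparer);

    public static HashSet<EmployeeSalary> ToUniqueSet(IEnumerable<EmployeeSalary> salaries) =>
        new HashSet<EmployeeSalary>(salaries, SalaryComparer);
}
```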
Implement equality
Almost all equality comparers start with the following lines:
After this, the real comparison for equality starts. The implementation depends on what you call equality. You might say that two EmployeeSalaries with the same Id are equal. Our approach is to check all fields, for instance to see whether we need to update the database because some values have changed:
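The original lines are missing here. A minimal sketch of a typical shape, assuming the hypothetical EmployeeSalary from the question's columns: a preamble that handles nulls, reference equality and different runtime types, followed by a comparison of every field.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical EmployeeSalary, reconstructed from the columns in the question.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string EmpCode { get; set; } = "";
    public decimal Salary { get; set; }
    public DateTime DateOfSalary { get; set; }
}

public class EmployeeSalaryComparer : IEqualityComparer<EmployeeSalary>
{
    public bool Equals(EmployeeSalary? x, EmployeeSalary? y)
    {
        // Common preamble: nulls, same reference, different runtime types.
        if (x is null) return y is null;
        if (y is null) return false;
        if (ReferenceEquals(x, y)) return true;
        if (x.GetType() != y.GetType()) return false;

        // Real comparison: every field must match.
        return x.Id == y.Id
            && x.Name == y.Name
            && x.EmpCode == y.EmpCode
            && x.Salary == y.Salary
            && x.DateOfSalary == y.DateOfSalary;
    }

    // Consistent with Equals: equal salaries always have equal Ids.
    public int GetHashCode(EmployeeSalary obj) => obj.Id.GetHashCode();
}
```

With this definition, the rows for Id 1 dated 5/26/2021 and 4/26/2021 in the question are not equal, because DateOfSalary differs.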
Are in your definition the names: "John Doe" and "john doe" equal? And when are EmpCodes equal?
If you think they should not be compared the default way, or that the definition might change in the future, consider adding properties to the EmployeeSalaryComparer:
The equality check will then end like this:
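A minimal sketch of that idea. NameComparer and EmpCodeComparer are hypothetical property names, defaulting here to StringComparer.Ordinal (case-sensitive); the final part of Equals delegates the string fields to them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical EmployeeSalary for this sketch.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string EmpCode { get; set; } = "";
    public decimal Salary { get; set; }
}

public class EmployeeSalaryComparer : IEqualityComparer<EmployeeSalary>
{
    // Hypothetical properties: how names and employee codes are compared.
    // Ordinal (case-sensitive) by default; swap in another comparer if policy changes.
    public IEqualityComparer<string> NameComparer { get; set; } = StringComparer.Ordinal;
    public IEqualityComparer<string> EmpCodeComparer { get; set; } = StringComparer.Ordinal;

    public bool Equals(EmployeeSalary? x, EmployeeSalary? y)
    {
        if (x is null) return y is null;
        if (y is null) return false;
        if (ReferenceEquals(x, y)) return true;
        if (x.GetType() != y.GetType()) return false;

        // The check ends by delegating the string fields to the configured comparers.
        return x.Id == y.Id
            && x.Salary == y.Salary
            && NameComparer.Equals(x.Name, y.Name)
            && EmpCodeComparer.Equals(x.EmpCode, y.EmpCode);
    }

    // Id only, so the hash stays consistent whichever string comparers are configured.
    public int GetHashCode(EmployeeSalary obj) => obj.Id.GetHashCode();
}
```

Basing the hash on the Id alone is a deliberate choice here: a hash that used the case-sensitive hash of Name would break as soon as NameComparer became case-insensitive.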
If company policy about names changes in the future, all you have to do is select a different name comparer. And if EmpCode "Boss" should be the same as EmpCode "boss", there is only one place to change the code.
Of course, after spec changes you need to change your unit tests, so they will tell you automatically where you forgot to change the proper equality comparers.
GetHashCode
GetHashCode is used to quickly check for inequality. Keywords: quickly, and inequality. If two hash codes are different, we know that the objects are not equal. It is not the other way round: if two hash codes are equal, we don’t know whether the objects are equal.
The hash code is meant to quickly throw away most unequal objects. For instance, in a Distinct method, it would be nice if you could quickly throw away 99% of the objects, so you only have to thoroughly check 1% of the objects for equality.
With EmployeeSalaries, we know that if the Ids are different, then the salaries are not equal. It will seldom happen that two EmployeeSalaries have the same Id but a different EmpCode. So by checking only the Id, we throw away most unequal EmployeeSalaries.
How about this:
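The suggested code is missing here. A minimal sketch of an Id-only GetHashCode, assuming the same hypothetical comparer as above. It also shows the "not the other way round" rule: a raised salary with the same Id gets the same hash code, yet Equals still tells the two apart, so Distinct keeps both. HashCodeDemo is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical EmployeeSalary for this sketch.
public class EmployeeSalary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string EmpCode { get; set; } = "";
    public decimal Salary { get; set; }
}

public class EmployeeSalaryComparer : IEqualityComparer<EmployeeSalary>
{
    public bool Equals(EmployeeSalary? x, EmployeeSalary? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id && x.Name == y.Name && x.EmpCode == y.EmpCode && x.Salary == y.Salary;
    }

    public int GetHashCode(EmployeeSalary obj)
    {
        if (obj is null) throw new ArgumentNullException(nameof(obj));
        // Only the Id: cheap to compute, and equal salaries always share an Id,
        // so equal objects are guaranteed to get equal hash codes.
        return obj.Id.GetHashCode();
    }
}

public static class HashCodeDemo
{
    public static int CountUnique(IEnumerable<EmployeeSalary> salaries) =>
        salaries.Distinct(new EmployeeSalaryComparer()).Count();
}
```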
Conclusion