I have a table named "Applications" where I keep track of the applications made by my candidates to vacancies. Now, I need to write a statistical query for HRs, and when an HR logs in, I should bring the candidates who applied to the vacancies created by that HR. Since there are filters concurrently in my query, I had to include other tables related to candidates as well. However, my query is slow and consumes an incredible amount of RAM. My 16 GB RAM Linux server gets filled up after 1-2 hours of running the project and crashes it. When I checked the server, I saw that this query alone was constantly eating up RAM. How do you think I can optimize this, or what might be wrong with the query? Additionally, there are +400,000 applications and +100,000 candidates
var entities = _context.Applications
.Include(p => p.Candidate)
.ThenInclude(p => p.CandidateIndustryFields)
.ThenInclude(p => p.DesiredField)
.Include(p => p.Vacancy)
.Include(p => p.Candidate)
.ThenInclude(p => p.CandidateIndustrySubFields)
.ThenInclude(p => p.SubField)
.Include(p => p.Candidate)
.ThenInclude(x => x.CandidateSkills)
.ThenInclude(x => x.Skill)
.Include(p => p.Candidate)
.ThenInclude(x => x.CandidateBackgrounds)
.ThenInclude(x => x.WorkExperienceType)
.Where(m => (m.Vacancy.RecruiterId == userId) && (m.StatusId == searchParameter.StatusId || searchParameter.StatusId == null)
&& (string.IsNullOrEmpty(searchParameter.Name) || m.Candidate.Name.Contains(searchParameter.Name))
&& (string.IsNullOrEmpty(searchParameter.Surname) || m.Candidate.Surname.Contains(searchParameter.Surname))
&& (string.IsNullOrEmpty(searchParameter.PassportFin) || m.Candidate.CandidatePassports.Any(p => p.PassportFin == searchParameter.PassportFin.ToUpper()))
&& (string.IsNullOrEmpty(searchParameter.Patronymic) || m.Candidate.FatherName.Contains(searchParameter.Patronymic))
&& (searchParameter.Gender == null || m.Candidate.GenderId == searchParameter.Gender)
&& (searchParameter.MaritalStatus == null || m.Candidate.MaritalStatusId == searchParameter.MaritalStatus)
&& (searchParameter.MilitaryStatus == null || m.Candidate.MillitaryStatusId == searchParameter.MilitaryStatus)
&& (searchParameter.HasPhoto == null || (searchParameter.HasPhoto == 0 ? m.Candidate.CandidateImage == null : m.Candidate.CandidateImage != null))
&& (searchParameter.MinBirthDate == null || m.Candidate.BirthDate >= searchParameter.MinBirthDate)
&& (searchParameter.MaxBirthDate == null || m.Candidate.BirthDate <= searchParameter.MaxBirthDate)
&& (string.IsNullOrEmpty(searchParameter.Address) || m.Candidate.CandidateContactInfo.ActualAddress.Contains(searchParameter.Address) || m.Candidate.CandidateContactInfo.RegisteredAddress.Contains(searchParameter.Address))
&& (string.IsNullOrEmpty(searchParameter.Email) || m.Candidate.User.Email.Contains(searchParameter.Email))
&& (regions.Count == 0 || regions.Contains(m.Candidate.CandidateContactInfo.RegionId.Value))
&& (districts.Count == 0 || districts.Contains(m.Candidate.CandidateContactInfo.DistrictId.Value))
&& (educationDegrees.Count == 0 || m.Candidate.CandidateEducations.Any(k => educationDegrees.Contains(k.DegreeId)) || m.Candidate.CandidateCustomEducations.Any(l => educationDegrees.Contains(l.DegreeId)))
&& (searchParameter.EntranceScoreMin == null || m.Candidate.CandidateEducations.Any(k => k.EntrancePoint >= searchParameter.EntranceScoreMin) || m.Candidate.CandidateCustomEducations.Any(l => l.EntrancePoint >= searchParameter.EntranceScoreMin))
&& (searchParameter.EntranceScoreMax == null || m.Candidate.CandidateEducations.Any(k => k.EntrancePoint <= searchParameter.EntranceScoreMax) || m.Candidate.CandidateCustomEducations.Any(l => l.EntrancePoint <= searchParameter.EntranceScoreMax))
&& (string.IsNullOrEmpty(searchParameter.Profession) || m.Candidate.CandidateCustomEducations.Any(p => p.Profession.Contains(searchParameter.Profession)) || m.Candidate.CandidateEducations.Any(p => p.Profession.Contains(searchParameter.Profession)))
&& (string.IsNullOrEmpty(searchParameter.Position) || m.Candidate.CandidateBackgrounds.Any(k => k.Position.Contains(searchParameter.Position)))
&& (string.IsNullOrEmpty(searchParameter.CompanyName) || m.Candidate.CandidateBackgrounds.Any(k => k.CompanyName.Contains(searchParameter.CompanyName)))
&& (computerSkills.Count == 0 || m.Candidate.ComputerSkills.Any(l => computerSkills.Contains(l.SkillId)))
//&& (languageSkills.Count == 0 || m.Candidate.CandidateLanguages.Any(l => languageSkills.Any(c => c.LanguageId == l.LanguageId && c.Writing >= l.Writing && c.Reading >= l.Reading && c.Speaking >= l.Speaking)))
&& (string.IsNullOrEmpty(searchParameter.Phone) || m.Candidate.CandidateMobilPhones.Any(l => l.PhoneNumber.Contains(searchParameter.Phone)))
&& (string.IsNullOrEmpty(searchParameter.CertificateName) || m.Candidate.CandidateCertificates.Any(n => n.Name.Contains(searchParameter.CertificateName)))
&& (string.IsNullOrEmpty(searchParameter.Organization) || m.Candidate.CandidateCertificates.Any(n => n.IssuedBy.Contains(searchParameter.Organization)))
&& (fields.Count == 0 || m.Candidate.CandidateIndustryFields.Any(l => fields.Contains(l.DesiredFieldId)))
&& (skills.Count == 0 || m.Candidate.CandidateSkills.Any(l => skills.Contains(l.SkillId)))
&& (workExperiences.Count == 0 || m.Candidate.CandidateBackgrounds.Any(l => workExperiences.Contains(l.WorkExperienceId ?? -1)))
&& (searchParameter.HasExperience == null || (!searchParameter.HasExperience.Value ? m.Candidate.CandidateBackgrounds.Count == 0 : m.Candidate.CandidateBackgrounds.Count != 0))
&& (searchParameter.GeneralSearch == null || m.Candidate.Name.Contains(searchParameter.GeneralSearch)
|| m.Candidate.Surname.Contains(searchParameter.GeneralSearch)
|| m.Candidate.FatherName.Contains(searchParameter.GeneralSearch)
|| m.Candidate.CandidateContactInfo.ActualAddress.Contains(searchParameter.GeneralSearch)
|| m.Candidate.CandidateContactInfo.RegisteredAddress.Contains(searchParameter.GeneralSearch)
|| m.Candidate.User.Email.Contains(searchParameter.GeneralSearch)
|| m.Candidate.CandidateCustomEducations.Any(p => p.InstitutionName.Contains(searchParameter.GeneralSearch))
|| m.Candidate.CandidateCustomEducations.Any(p => p.Profession.Contains(searchParameter.GeneralSearch))
|| m.Candidate.CandidateEducations.Any(p => p.Profession.Contains(searchParameter.GeneralSearch))
|| m.Candidate.CandidateBackgrounds.Any(p => p.Position.Contains(searchParameter.GeneralSearch))
|| m.Candidate.CandidateBackgrounds.Any(p => p.CompanyName.Contains(searchParameter.GeneralSearch))
|| m.Candidate.CandidateMobilPhones.Any(p => p.PhoneNumber.Contains(searchParameter.GeneralSearch))
|| m.Candidate.CandidateCertificates.Any(p => p.Name.Contains(searchParameter.GeneralSearch))
|| m.Candidate.CandidateCertificates.Any(p => p.IssuedBy.Contains(searchParameter.GeneralSearch))));
2
Answers
There is a lot you can do. You will need to work incrementally, and expect to do things that make things worse at times. DB query execution engines are complex and their behaviour can be unexpected (eg. I made a small change to a query once and it was ~100 times faster!). You have lots to learn.
All of these are likely going to be part of the solution.
First: use EF’s logging capabilities to capture the SQL and parameters. Then execute manually to (1) see how long the DB side of things takes, & (2) use
EXPLAIN
to see how the DB is executing the query: take the time to understand the results.You’ll need this to identify indexes to add (and evaluate changes to the SQL).
Second: if you do not need to use change tracking to allow setting properties and then saving changes: avoid
Include
,ThenInclude
: instead usejoin
and project the results (select
clause). This should also allow you to avoid getting all the data from all the tables: just get the data you need (potentially saving a huge amount of memory). Remember you can put query expressions into the projection when you want a collection.This seems likely if this is for an enquiry page.
Also, use
AsNoTracking()
! (See here.)Third: Replace
With
To build only the conditions that will change the result.
This will save work for the DB that does not need to be done on the DB, and will allow these clauses to be sargable.
(For the set of or conditions you can also look at manually using
Expression
types to build the expression tree as you cannot just layer one where on another.)Finally: LINQ to SQL is great…. until queries start getting seriously complex, at which point you will be better off writing your own SQL to give full control (and also access DB query features not exposed in LINQ).
Do not suffer from the sunk cost fallacy: be prepared to throw away workthat only lead to a dead end.
Richard covered most of the important bases. From a design perspective rather than giving users a carte blanche search capability, reigning search expectations down to reasonable criteria can go a long way. For instance if 95% of your searches will be against a few common, indexable columns then limit the default search to those fields, then have an "advanced" search where users can enter more specific search criteria. (like address searches, etc.) When you default to a powerful, flexible search you make all searches inefficient. (I.e. if that "General Search" option is used frequently)
Similarly, Be careful when allowing text searches and avoid defaulting to
Contains
. Instead default toStartsWith
and consider giving users a wildcard support or an option if they explicitly want to do aContains
search. For instance if they type "Fred" for a name searchx.Candidate.Name == searchParameter.Name
is the most efficient since it can use an existing index, but it is the most restrictive.x.Candidate.Name.StartsWith(searchParameter.Name)
will be the next most practical, typical search option for relevant matches.x.Candidate.Name.Contains(searchParameter.Name)
is the least efficient, so we should avoid defaulting to it for all searches. Also ensure that text search arguments have validation rules to ignore values that are overly broad and could cripple searching by returning far too much data. For instance users should not be able to search for names that contain "e".Next, for search results, use projection (I.e.
Select
) to retrieve just a summary of the information you want to display. This means not usingInclude
for related data. It isn’t needed to query against related tables, or when usingSelect
to select values/counts/etc. from related tables. You useSelect
to return a view model with just the details you want to display in your results list plus the PKs of each row, rather than loading entire entity graphs. Then, when you want to open a detailed view, or perform an action against a selected result you can fetch that single row withInclude
s as needed.It isn’t clear from your example but you should also be implementing server-side pagination (
OrderBy()
+Skip()
+Take()
) on searches that have the potential to return a large result set. Users don’t need to scroll through tens of thousands of results at a time. The point of a search is to limit results to a reasonable number. If there are too many result pages then they should refine their search.For instance if I want to display a list of found applications and I want to display the application #, date, candidate name, and their Email address:
Using projection, pagination, and simplifying/optimizing your search criteria to cater to the most common search scenarios, plus ensuring expensive useless searches are avoided can improve performance to sub-second or seconds instead of minutes/hours.