skip to Main Content

I have a table named "Applications" where I keep track of the applications made by my candidates to vacancies. Now, I need to write a statistical query for HRs, and when an HR logs in, I should bring the candidates who applied to the vacancies created by that HR. Since there are filters concurrently in my query, I had to include other tables related to candidates as well. However, my query is slow and consumes an incredible amount of RAM. My 16 GB RAM Linux server gets filled up after 1-2 hours of running the project and crashes it. When I checked the server, I saw that this query alone was constantly eating up RAM. How do you think I can optimize this, or what might be wrong with the query? Additionally, there are +400,000 applications and +100,000 candidates

var entities = _context.Applications
    .Include(p => p.Candidate)
    .ThenInclude(p => p.CandidateIndustryFields)
    .ThenInclude(p => p.DesiredField)
    .Include(p => p.Vacancy)
    .Include(p => p.Candidate)
    .ThenInclude(p => p.CandidateIndustrySubFields)
    .ThenInclude(p => p.SubField)
    .Include(p => p.Candidate)
    .ThenInclude(x => x.CandidateSkills)
    .ThenInclude(x => x.Skill)
    .Include(p => p.Candidate)
    .ThenInclude(x => x.CandidateBackgrounds)
    .ThenInclude(x => x.WorkExperienceType)
    .Where(m => (m.Vacancy.RecruiterId == userId) && (m.StatusId == searchParameter.StatusId || searchParameter.StatusId == null)
        && (string.IsNullOrEmpty(searchParameter.Name) || m.Candidate.Name.Contains(searchParameter.Name))
        && (string.IsNullOrEmpty(searchParameter.Surname) || m.Candidate.Surname.Contains(searchParameter.Surname))
        && (string.IsNullOrEmpty(searchParameter.PassportFin) || m.Candidate.CandidatePassports.Any(p => p.PassportFin == searchParameter.PassportFin.ToUpper()))
        && (string.IsNullOrEmpty(searchParameter.Patronymic) || m.Candidate.FatherName.Contains(searchParameter.Patronymic))
        && (searchParameter.Gender == null || m.Candidate.GenderId == searchParameter.Gender)
        && (searchParameter.MaritalStatus == null || m.Candidate.MaritalStatusId == searchParameter.MaritalStatus)
        && (searchParameter.MilitaryStatus == null || m.Candidate.MillitaryStatusId == searchParameter.MilitaryStatus)
        && (searchParameter.HasPhoto == null || (searchParameter.HasPhoto == 0 ? m.Candidate.CandidateImage == null : m.Candidate.CandidateImage != null))
        && (searchParameter.MinBirthDate == null || m.Candidate.BirthDate >= searchParameter.MinBirthDate)
        && (searchParameter.MaxBirthDate == null || m.Candidate.BirthDate <= searchParameter.MaxBirthDate)
        && (string.IsNullOrEmpty(searchParameter.Address) || m.Candidate.CandidateContactInfo.ActualAddress.Contains(searchParameter.Address) || m.Candidate.CandidateContactInfo.RegisteredAddress.Contains(searchParameter.Address))
        && (string.IsNullOrEmpty(searchParameter.Email) || m.Candidate.User.Email.Contains(searchParameter.Email))
        && (regions.Count == 0 || regions.Contains(m.Candidate.CandidateContactInfo.RegionId.Value))
        && (districts.Count == 0 || districts.Contains(m.Candidate.CandidateContactInfo.DistrictId.Value))
        && (educationDegrees.Count == 0 || m.Candidate.CandidateEducations.Any(k => educationDegrees.Contains(k.DegreeId)) || m.Candidate.CandidateCustomEducations.Any(l => educationDegrees.Contains(l.DegreeId)))
        && (searchParameter.EntranceScoreMin == null || m.Candidate.CandidateEducations.Any(k => k.EntrancePoint >= searchParameter.EntranceScoreMin) || m.Candidate.CandidateCustomEducations.Any(l => l.EntrancePoint >= searchParameter.EntranceScoreMin))
        && (searchParameter.EntranceScoreMax == null || m.Candidate.CandidateEducations.Any(k => k.EntrancePoint <= searchParameter.EntranceScoreMax) || m.Candidate.CandidateCustomEducations.Any(l => l.EntrancePoint <= searchParameter.EntranceScoreMax))
        && (string.IsNullOrEmpty(searchParameter.Profession) || m.Candidate.CandidateCustomEducations.Any(p => p.Profession.Contains(searchParameter.Profession)) || m.Candidate.CandidateEducations.Any(p => p.Profession.Contains(searchParameter.Profession)))
        && (string.IsNullOrEmpty(searchParameter.Position) || m.Candidate.CandidateBackgrounds.Any(k => k.Position.Contains(searchParameter.Position)))
        && (string.IsNullOrEmpty(searchParameter.CompanyName) || m.Candidate.CandidateBackgrounds.Any(k => k.CompanyName.Contains(searchParameter.CompanyName)))
        && (computerSkills.Count == 0 || m.Candidate.ComputerSkills.Any(l => computerSkills.Contains(l.SkillId)))
        //&& (languageSkills.Count == 0 || m.Candidate.CandidateLanguages.Any(l => languageSkills.Any(c => c.LanguageId == l.LanguageId && c.Writing >= l.Writing && c.Reading >= l.Reading && c.Speaking >= l.Speaking)))
        && (string.IsNullOrEmpty(searchParameter.Phone) || m.Candidate.CandidateMobilPhones.Any(l => l.PhoneNumber.Contains(searchParameter.Phone)))
        && (string.IsNullOrEmpty(searchParameter.CertificateName) || m.Candidate.CandidateCertificates.Any(n => n.Name.Contains(searchParameter.CertificateName)))
        && (string.IsNullOrEmpty(searchParameter.Organization) || m.Candidate.CandidateCertificates.Any(n => n.IssuedBy.Contains(searchParameter.Organization)))
        && (fields.Count == 0 || m.Candidate.CandidateIndustryFields.Any(l => fields.Contains(l.DesiredFieldId)))
        && (skills.Count == 0 || m.Candidate.CandidateSkills.Any(l => skills.Contains(l.SkillId)))
        && (workExperiences.Count == 0 || m.Candidate.CandidateBackgrounds.Any(l => workExperiences.Contains(l.WorkExperienceId ?? -1)))
        && (searchParameter.HasExperience == null || (!searchParameter.HasExperience.Value ? m.Candidate.CandidateBackgrounds.Count == 0 : m.Candidate.CandidateBackgrounds.Count != 0))
        && (searchParameter.GeneralSearch == null || m.Candidate.Name.Contains(searchParameter.GeneralSearch)
                || m.Candidate.Surname.Contains(searchParameter.GeneralSearch)
                || m.Candidate.FatherName.Contains(searchParameter.GeneralSearch)
                || m.Candidate.CandidateContactInfo.ActualAddress.Contains(searchParameter.GeneralSearch)
                || m.Candidate.CandidateContactInfo.RegisteredAddress.Contains(searchParameter.GeneralSearch)
                || m.Candidate.User.Email.Contains(searchParameter.GeneralSearch)
                || m.Candidate.CandidateCustomEducations.Any(p => p.InstitutionName.Contains(searchParameter.GeneralSearch))
                || m.Candidate.CandidateCustomEducations.Any(p => p.Profession.Contains(searchParameter.GeneralSearch))
                || m.Candidate.CandidateEducations.Any(p => p.Profession.Contains(searchParameter.GeneralSearch))
                || m.Candidate.CandidateBackgrounds.Any(p => p.Position.Contains(searchParameter.GeneralSearch))
                || m.Candidate.CandidateBackgrounds.Any(p => p.CompanyName.Contains(searchParameter.GeneralSearch))
                || m.Candidate.CandidateMobilPhones.Any(p => p.PhoneNumber.Contains(searchParameter.GeneralSearch))
                || m.Candidate.CandidateCertificates.Any(p => p.Name.Contains(searchParameter.GeneralSearch))
                || m.Candidate.CandidateCertificates.Any(p => p.IssuedBy.Contains(searchParameter.GeneralSearch))));

2

Answers


  1. There is a lot you can do. You will need to work incrementally, and expect to do things that make things worse at times. DB query execution engines are complex and their behaviour can be unexpected (eg. I made a small change to a query once and it was ~100 times faster!). You have lots to learn.

    All of these are likely going to be part of the solution.

    First: use EF’s logging capabilities to capture the SQL and parameters. Then execute manually to (1) see how long the DB side of things takes, & (2) use EXPLAIN to see how the DB is executing the query: take the time to understand the results.

    You’ll need this to identify indexes to add (and evaluate changes to the SQL).

    Second: if you do not need to use change tracking to allow setting properties and then saving changes: avoid Include, ThenInclude: instead use join and project the results (select clause). This should also allow you to avoid getting all the data from all the tables: just get the data you need (potentially saving a huge amount of memory). Remember you can put query expressions into the projection when you want a collection.

    This seems likely if this is for an enquiry page.

    Also, use AsNoTracking()! (See here.)

    Third: Replace

    && (string.IsNullOrEmpty(searchParameter.Name) || m.Candidate.Name.Contains(searchParameter.Name))
    

    With

    var query = /* everything before this point */
    
    if (!string.IsNullOrEmpty(searchParameter.Name) {
      query = query.Where(x => x.Candidate.Name.Contains(searchParameter.Name));
    }
    

    To build only the conditions that will change the result.

    This will save work for the DB that does not need to be done on the DB, and will allow these clauses to be sargable.

    (For the set of or conditions you can also look at manually using Expression types to build the expression tree as you cannot just layer one where on another.)

    Finally: LINQ to SQL is great…. until queries start getting seriously complex, at which point you will be better off writing your own SQL to give full control (and also access DB query features not exposed in LINQ).

    Do not suffer from the sunk cost fallacy: be prepared to throw away workthat only lead to a dead end.

    Login or Signup to reply.
  2. Richard covered most of the important bases. From a design perspective rather than giving users a carte blanche search capability, reigning search expectations down to reasonable criteria can go a long way. For instance if 95% of your searches will be against a few common, indexable columns then limit the default search to those fields, then have an "advanced" search where users can enter more specific search criteria. (like address searches, etc.) When you default to a powerful, flexible search you make all searches inefficient. (I.e. if that "General Search" option is used frequently)

    Similarly, Be careful when allowing text searches and avoid defaulting to Contains. Instead default to StartsWith and consider giving users a wildcard support or an option if they explicitly want to do a Contains search. For instance if they type "Fred" for a name search x.Candidate.Name == searchParameter.Name is the most efficient since it can use an existing index, but it is the most restrictive. x.Candidate.Name.StartsWith(searchParameter.Name) will be the next most practical, typical search option for relevant matches. x.Candidate.Name.Contains(searchParameter.Name) is the least efficient, so we should avoid defaulting to it for all searches. Also ensure that text search arguments have validation rules to ignore values that are overly broad and could cripple searching by returning far too much data. For instance users should not be able to search for names that contain "e".

    Next, for search results, use projection (I.e. Select) to retrieve just a summary of the information you want to display. This means not using Include for related data. It isn’t needed to query against related tables, or when using Select to select values/counts/etc. from related tables. You use Select to return a view model with just the details you want to display in your results list plus the PKs of each row, rather than loading entire entity graphs. Then, when you want to open a detailed view, or perform an action against a selected result you can fetch that single row with Includes as needed.

    It isn’t clear from your example but you should also be implementing server-side pagination (OrderBy() + Skip() + Take()) on searches that have the potential to return a large result set. Users don’t need to scroll through tens of thousands of results at a time. The point of a search is to limit results to a reasonable number. If there are too many result pages then they should refine their search.

    For instance if I want to display a list of found applications and I want to display the application #, date, candidate name, and their Email address:

    var query = _context.Applications.AsQueryable();
    
    if (!string.IsNullOrEmpty(searchParameter.Name))
        query = query.Where(x => x.Candidate.Name.StartsWith(searchParameter.Name));
    
    // Repeat for each search criteria.
    
    var resultsQuery = query.Select(x => new ApplicationResultViewModel
    {
        ApplicationId = x.Id,
        CandidateId = x.Candidate.Id,
        ApplicationNumber = x.ApplicationNumber,
        ApplicationDate = x.ApplicationDate,
        CandidateName = x.Candidate.Name,
        CandidateEmail = x.Candidate.User.Email,
        // Any other useful fields for search results.
    });
    var count = await resultsQuery.CountAsync(); // For pagination controls.
    var results = await resultsQuery.OrderByDescending(x => x.ApplicationDate)
        .Skip(pageSize * (pageNumber-1))
        .Take(pageSize)
        .ToListAsync();
    

    Using projection, pagination, and simplifying/optimizing your search criteria to cater to the most common search scenarios, plus ensuring expensive useless searches are avoided can improve performance to sub-second or seconds instead of minutes/hours.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search