I need to compare the two arrays declared below and return the records that exist only in the filtered_apps array. I am using the contents of the previous_apps array to see whether an ID in each record exists in the filtered_apps array. I will be writing the results to a CSV and printing the records that exist in both arrays to the console.

My question is this: How do I get the records that exist only in filtered_apps? The easiest approach for me would be to put those unique records into a new array that I can then use for the CSV.

start_date = Date.parse("2022-02-05")
end_date   = Date.parse("2022-05-17")
valid_year = start_date.year
dupe_apps = []
uniq_apps = []

# Finding applications that meet my criteria:
filtered_apps = FinancialAssistance::Application.where(
    :is_requesting_info_in_mail => true, 
    :aasm_state => "determined",
    :submitted_at => {
        "$exists" => true, 
        "$gte" => start_date, 
        "$lte" => end_date })

# Finding applications that I want to compare against filtered_apps
previous_apps = FinancialAssistance::Application.where(
    is_requesting_info_in_mail: true,
    :submitted_at => {
        "$exists" => true, 
        "$gte" => valid_year })

# I'm using this to pull the ID that I'm using for comparison just to make the comparison lighter by only storing the family_id
previous_apps.each do |y|    
    previous_apps_array << y.family_id
end

# This is where I'm doing my comparison and it is not working.
filtered_apps.each do |app|
    if app.family_id.in?(previous_apps_array) == false
                then @non_dupe_apps << app
            else "No duplicate found for application #{app.hbx_id}"
            end
        end
    end

So what am I doing wrong in the last code section?

2 Answers


  1. Chosen as BEST ANSWER

    EDIT: My last answer did, in fact, not work.

    Here is the code all nice and working.

    It turns out the issue was that, when comparing family_id against the set of records, I forgot that the looped record was itself part of the set, so it would always match itself. I added a check that excludes the looped record's own ID from the comparison, and Bob's your uncle.

    I added the pass and reject arrays so I could check my work instead of downloading a csv every time. Leaving them in mostly because I'm scared to change anything else.

        start_date      = Date.parse(date_from)
        end_date        = Date.parse(date_to)
        valid_year      = start_date.year
        date_range      = start_date..end_date
        comparison_apps = FinancialAssistance::Application.by_year(start_date.year).where(
            aasm_state: 'determined',
            is_requesting_voter_registration_application_in_mail: true)

        apps = FinancialAssistance::Application.where(
            :is_requesting_voter_registration_application_in_mail => true,
            :submitted_at => date_range).uniq { |n| n.family_id }

        @pass_array   = []
        @reject_array = []
        apps.each do |app|
            family = app.family
            app_id = app.id
            # Exclude the looped record itself; otherwise every app matches its own family_id
            previous_apps = comparison_apps.where(family_id: family.id, :id.ne => app.id)

            if previous_apps.count > 0
                @reject_array << app
                puts "\e[32mApplicant hbx id \e[31m#{app.primary_applicant.person_hbx_id}\e[32m in family ID \e[31m#{family.id}\e[32m has registered to vote in a previous application.\e[0m"
            else
                <csv fields here>

                csv << [csv fields here]
            end
        end
    

    Basically, I pulled the applications into the apps array, then de-duplicated them by the family_id field of each record (the uniq call).

    I had to do this because the issue at the bottom of everything was that there were records present in apps that were themselves duplicates, only submitted a few days apart. Since I had assumed that the initial apps array would contain only unique records, I thought the duplicates in the output were due to the rest of the code not filtering correctly.
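
    As a minimal sketch of what that uniq call does, here is the same idea in plain Ruby. The App struct and its sample data are hypothetical stand-ins, not the real FinancialAssistance::Application model.

    ```ruby
    # Hypothetical stand-in for an application record (not the real model).
    App = Struct.new(:id, :family_id)

    apps = [
      App.new(1, "fam-a"),
      App.new(2, "fam-a"), # same family, submitted a few days later
      App.new(3, "fam-b")
    ]

    # With a block, uniq compares the block's return value instead of the
    # whole object, and keeps the first element seen for each distinct value.
    first_per_family = apps.uniq { |app| app.family_id }

    first_per_family.map(&:id) # => [1, 3]
    ```

    Note that uniq keeps whichever record comes first in the array, so the order of the query result decides which of the near-duplicate submissions survives.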

    I then loop over that de-duplicated array (apps.each do) and, for each record, look up any matching applications and store them in previous_apps inside the loop. Since previous_apps is rebuilt on every iteration, if it ever has more than 0 records in it, the app gets called out as having been submitted already. Otherwise, it goes to my CSV report.
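
    The self-match problem and its fix can be reproduced without a database. This is only a sketch in plain Ruby: the App struct, the sample records, and the local pass_array/reject_array are assumptions for illustration, and select stands in for the where(family_id: ..., :id.ne => ...) query.

    ```ruby
    # Hypothetical stand-in for an application record (not the real model).
    App = Struct.new(:id, :family_id)

    comparison_apps = [
      App.new(1, "fam-a"),
      App.new(2, "fam-a"),
      App.new(3, "fam-b")
    ]
    apps = [App.new(2, "fam-a"), App.new(3, "fam-b")]

    pass_array   = []
    reject_array = []

    apps.each do |app|
      # Same family, but a different record. Without the id check, every app
      # would find itself in comparison_apps and be rejected.
      previous = comparison_apps.select do |other|
        other.family_id == app.family_id && other.id != app.id
      end

      if previous.empty?
        pass_array << app
      else
        reject_array << app
      end
    end

    pass_array.map(&:id)   # => [3]
    reject_array.map(&:id) # => [2]
    ```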

    Thanks for the help on this. It really got my brain thinking in a different direction, which is what I needed. It also helped improve the code, even though the issue was at the very beginning.


  2. Let’s check your original method first (I fixed the indentation to make it clearer). There are quite a few issues with it:

    filtered_apps.each do |app|
        if app.family_id.in?(previous_apps_array) == false
            # Where is "@non_dupe_apps" declared? It isn't anywhere in your example...
            # Also, "then" is not necessary unless you want a one-line if-statement
            then @non_dupe_apps << app
    
        # This doesn't do anything, it's just a string
        # You need to use "p" or "puts" to output something to the console
        # Note that the "else" is also only triggered when duplicates WERE found...
        else "No duplicate found for application #{app.hbx_id}"
            end # Extra "end" here, this will mess things up
        end
    end
    

    Also, you haven’t declared previous_apps_array anywhere in your example, you just start adding to it out of nowhere.
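
    Putting those points together, a corrected version of the loop might look like the sketch below. It is plain Ruby, so it uses include? rather than ActiveSupport's in?, and the AppStub struct and its sample values are made up for the example.

    ```ruby
    # Hypothetical stand-in for an application record (not the real model).
    AppStub = Struct.new(:hbx_id, :family_id)

    filtered_apps = [
      AppStub.new("hbx-1", 10),
      AppStub.new("hbx-2", 20),
      AppStub.new("hbx-3", 30)
    ]
    previous_apps = [AppStub.new("hbx-9", 10)]

    # Declare both arrays before appending to or reading from them.
    previous_family_ids = previous_apps.map(&:family_id)
    non_dupe_apps = []

    filtered_apps.each do |app|
      if previous_family_ids.include?(app.family_id)
        # puts actually prints; a bare string would be silently discarded.
        puts "Duplicate found for application #{app.hbx_id}"
      else
        non_dupe_apps << app
      end
    end

    non_dupe_apps.map(&:hbx_id) # => ["hbx-2", "hbx-3"]
    ```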

    Getting the difference between 2 arrays is dead easy in Ruby: just use the - operator!

    uniq_apps = filtered_apps - previous_apps
    

    You can also do this with ActiveRecord results, since they are just arrays of ActiveRecord objects. However, this doesn’t help if you specifically need to compare results using the family_id column.
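
    To illustrate that limitation, here is a small plain-Ruby sketch. The App struct and data are hypothetical; the point is that - compares whole elements, so comparing on a single field needs something like reject with a set of the IDs to exclude.

    ```ruby
    require "set"

    # Hypothetical stand-in for an application record (not the real model).
    App = Struct.new(:id, :family_id)

    # On plain values, - does exactly what you would expect.
    plain_diff = [1, 2, 3, 4, 5] - [1, 2, 3] # => [4, 5]

    filtered_apps = [App.new(1, "fam-a"), App.new(2, "fam-b"), App.new(3, "fam-c")]
    previous_apps = [App.new(7, "fam-a"), App.new(8, "fam-c")]

    # Nothing is removed here: no record in previous_apps is equal to a
    # record in filtered_apps, even though some family_ids overlap.
    whole_record_diff = filtered_apps - previous_apps
    whole_record_diff.size # => 3

    # Comparing on family_id only: collect the IDs once, then reject matches.
    previous_family_ids = previous_apps.map(&:family_id).to_set
    uniq_apps = filtered_apps.reject { |app| previous_family_ids.include?(app.family_id) }

    uniq_apps.map(&:id) # => [2]
    ```

    Using a Set for the lookup is a design choice rather than a requirement; an array with include? gives the same result and only gets slow with large lists.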

    TIP: Getting the values of only a specific column/columns from your database is probably best done with the pluck or select method if you don’t need to store any other data about those objects. With pluck, you only get an array of values in the result, not the full objects. select works a bit differently and returns ActiveRecord objects, but filters out everything but the selected columns. select is usually better in nested queries, since it doesn’t trigger a separate query when used as a part of another query, while pluck always triggers one.

    # Querying straight from the database
    # This is what I would recommend, but it doesn't print the values of duplicates
    uniq_apps = filtered_apps.where.not(family_id: previous_apps.select(:family_id))
    

    Of the basic array methods, I highly recommend getting really familiar with at least filter/select and map. They make things like this way easier. The Ruby docs are a great place to learn about them and others. A very simple example of doing something similar to what you described in your question, using filter/select on 2 arrays, would be something like this:

    arr = [1, 2, 3]
    full_arr = [1, 2, 3, 4, 5]
    
    unique_numbers = full_arr.filter do |num|
        if arr.include?(num)
            puts "Duplicates were found for #{num}"
            false
        else
            true
        end
    end
    
    # Duplicates were found for 1
    # Duplicates were found for 2
    # Duplicates were found for 3
    # => [4, 5]
    

    NOTE: The OP is working with Ruby 2.5.9, where filter is not yet available as an array method (it was introduced in Ruby 2.6). However, filter is just an alias for select, which is available in earlier versions of Ruby, so the two can be used interchangeably wherever both exist. Personally, I prefer using filter because, as seen above, select is already used in other methods, and filter is also the more common term in other programming languages I usually work with. Of course, when both are available, it doesn’t really matter which one you use, as long as you keep it consistent.
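
    For Ruby 2.5, the earlier example can be written with select or reject instead. A minimal sketch, reusing the same two sample arrays:

    ```ruby
    arr = [1, 2, 3]
    full_arr = [1, 2, 3, 4, 5]

    # select keeps the elements for which the block returns a truthy value.
    unique_numbers = full_arr.select { |num| !arr.include?(num) }

    # reject is the inverse of select and avoids the negation.
    same_result = full_arr.reject { |num| arr.include?(num) }

    unique_numbers # => [4, 5]
    same_result    # => [4, 5]
    ```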
