skip to Main Content

I am getting duplicate and sometimes unordered results with doing a LEFT OUTER JOIN.

I think the issue is that there are many artwork_colors that can match the WHERE condition. But I just want the artwork’s artwork_color with the highest pixel_percent (that also matches the WHERE clause), and to use that value for ordering the artworks. And of course for each artwork to appear only once in the results.

How do I rewrite the ActiveRecord query to order the results correctly while avoiding duplicates? (This query must be made with ActiveRecord so that I can plug it into an existing pagination system, etc.)

The query, simplified for this question:

# color_ids is an group of ids of colors that are similar to the colors being searched for
# for example if you search for FF0000 it will include similar shades of red

SELECT DISTINCT
    "artwork_colors"."pixel_percent",
    artworks.*
FROM
    "artworks"
    LEFT OUTER JOIN "artwork_colors" ON "artwork_colors"."artwork_id" = "artworks"."id"
WHERE
    "artwork_colors"."color_id" IN(106, 108, 119, 120, 128, 133, 156, 160)
ORDER BY
    "artwork_colors"."pixel_percent" DESC
LIMIT 120 OFFSET 0;

The original query in ActiveRecord:

artworks
  .includes(:artwork_colors)
  .where('artwork_colors.color_id': color_ids)
  .order(pixel_percent: :desc)
  .select('artworks.*', 'artwork_colors.pixel_percent')
  .distinct

Relevant models and tables:

class Artwork < ApplicationRecord
  has_many :artwork_colors, dependent: :destroy
  has_many :colors, through: :artwork_colors
end

class ArtworkColor < ApplicationRecord
  belongs_to :artwork
  belongs_to :color
end

CREATE TABLE public.artwork_colors (
  id bigint NOT NULL,
  pixel_percent double precision, # this is the desired sort column
  artwork_id bigint,
  color_id bigint
);

class Color < ApplicationRecord
  # These colors were extracted from the Artwork images by Amazon Rekognition, an image analysis tool
  has_many :artwork_colors, dependent: :destroy
  has_many :artworks, through: :artwork_colors
end

# h s l are hue, saturation, lightness (the color value)
CREATE TABLE public.colors (
  id bigint NOT NULL,
  h double precision,
  s double precision,
  l double precision
);

2

Answers


  1. Chosen as BEST ANSWER

    I used a little Ruby to kind of cheat and get the order I needed, and then returned an ActiveRecord collection that just loads an ordered list of the desired artwork ids. This might not work in all use cases, just be warned.

        color_ids = Color.where(
          h: hue_range(:h, 30),
          s: color_range(:s, similarity * 0.5),
          l: color_range(:l, similarity)
        ).pluck(:id)
    
        artwork_ids = artworks
                      .includes(:artwork_colors)
                      .where('artwork_colors.color_id': color_ids)
                      .order(pixel_percent: :desc)
                      .limit(1500)
                      .pluck(:id)
                      .uniq
    
        # preserve the array order https://stackoverflow.com/a/9475755/8749164
        ordered_ids = artwork_ids
                      .map { |id|  "artworks.id=#{id} DESC" }
                      .join(', ')
    
        Artwork
          .where(id: artwork_ids)
          .order(Arel.sql(ordered_ids))
    
    

  2. If I understand the goal correctly you are looking for each Artwork ordered by the highest artwork_colors.pixel_percent.

    In order to accomplish this you will need a sub query to find the maximum pixel_percent for each artwork and then join that subquery to the artworks table.

    This can be accomplished as follows:

    subquery = ArtworkColors.select(
                 :artwork_id,
                 ArtworkColors.arel_table[:pixel_percent].maximum.as('highest_pixel_percent'))
              .where(color_id: color_ids)
              .group(:artwork_id)
              .arel.as('artwork_colors')
    
    Artwork.joins(
      Arel::Nodes::OuterJoin.new(
        subquery,
        subquery.create_on(subquery[:artwork_id].eq(Artwork.arel_table[:id]))
    ))
    .select(Artwork.arel_table[Arel.star], subquery[:highest_pixel_percent])
    .order(subquery[:highest_pixel_percent].desc)
    

    This should produce the following query:

    SELECT 
      artworks.*, 
      artwork_colors.highest_pixel_percent
    FROM 
      artworks
      LEFT OUTER JOIN 
       ( SELECT 
           artwork_colors.artwork_id, 
           MAX(artwork_colors.pixel_percent) AS highest_pixel_percent 
         FROM 
           artwork_colors 
         WHERE 
           artwork_colors.color_id IN (106, 108, 119, 120, 128, 133, 156, 160)
         GROUP BY 
           artwork_colors.artwork_id
       ) artwork_colors ON artwork_colors.[artwork_id] = [artworks].[id]
    ORDER BY 
      artwork_colors.highest_pixel_percent DESC
    

    Since this is an Outer Join it is possible that "highest_pixel_percent" may be NULL. If you would prefer this be represented as 0 then you can change this subquery[:highest_pixel_percent] to:

    Arel::Nodes::NamedFunction.new(
      'ISNULL',
      [subquery[:highest_pixel_percent],0]
    ).as('highest_pixel_percent')
    

    which will result in ISNULL(artwork_colors.highest_pixel_percent,0) AS highest_pixel_percent

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search