skip to Main Content

I am working in Rails and Postgres.
I have a table Problems, which has a few columns. I have another table ExtraInfos, which references Problems and has three columns: problem_id, info_type, info_value.

For example:

Problems:

id problem_type problem_group
0 type_x grp_a
1 type_y grp_b
2 type_z grp_c

ExtraInfos:

id problem_id info_type:String info_value
0 0 info_1 v1
1 0 info_2 v2
2 0 info_3 v3
3 1 info_1 v4
4 1 info_3 v5

As you can see, each problem has a variable number of extra information.

What is the best way to join both tables to create something that looks like:

id problem_type problem_group info_1 info_2 info_3
0 type_x grp_a v1 v2 v3
1 type_y grp_b v4 v5
2 type_z grp_c

I used the ruby pivot_table gem, and I did manage to create the view that I wanted, by

@table = PivotTable::Grid.new do |g|
  g.source_data  = ExtraInfos.all.includes(:problem))
  g.column_name  = :info_type
  g.row_name     = :problem
  g.field_name   = :info_value
end
@table.build

and then iterating over it by

...
<% @table.columns.each do |col| %>
  <th><%= col.header %></th>
<% end %>
...
<% if @table.row_headers.include? problem %>
  <% table.rows[table.row_headers.index(problem)].data.each do |cell| %>
    <td><%= cell %></td>
  <% end %>
<% end %>
...

but this is very clunky and doesn’t leave me with good ways to, for instance, sort by these extra columns. As far as I know, the tables are simply a grid, an object, and can’t LEFT JOIN with my Problems.all table, which would be the ideal solution.

I have tried looking up various pure SQL methods, but all seem to start with the assumption that these extra columns will be hard coded in, which is what I am trying to avoid. I came across crosstab, but I haven’t managed to get it working as it should.

sql = "CREATE EXTENSION IF NOT EXISTS tablefunc;
    SELECT * FROM crosstab(
      'SELECT problem_id, info_type, info_value
      FROM pre_maslas
      ORDER BY 1,2'
    ) AS ct(problem_id bigint, info_type varchar(255), info_value varchar(255))"

@try = ActiveRecord::Base.connection.execute(sql)

This gives me the result {"problem_id"=>44, "info_type"=>"6", "info_value"=>"15"} {"problem_id"=>45, "info_type"=>"6", "info_value"=>"15"} which is clearly not correct.

Another method seems to be creating a separate reference table containing a list of all possible infoTypes, which will then be referenced by the ExtraInfos table, making it easier to join the tables. However, I don’t want the infoTypes coded in at all. I want the user to be able to give me any type and value strings, and my tables should be able to deal with this.

What is the best solution for accomplishing this?

2

Answers


  1. In postgres, pivot table or crosstab are not relevant when the list of columns may vary in time, ie the list of values in column info_type may increase or decrease.

    There is an other solution which consists in creating a composite type dynamically and then using the standard functions jsonb_build_agg and jsonb_populate_record :

    Creating the composite type column_list dynamically :

    CREATE OR REPLACE PROCEDURE column_list() LANGUAGE plpgsql AS $$
    DECLARE
      clist text ;
    BEGIN
      SELECT string_agg(DISTINCT info_type || ' text', ',')
        INTO clist
        FROM ExtraInfos ;
     
      EXECUTE 'DROP TYPE IF EXISTS column_list' ;
      EXECUTE 'CREATE TYPE column_list AS (' || clist || ')' ;
    END ; $$ ;
    

    Then setting up the composite type column_list for the first time :

    CALL column_list() ;
    

    But this composite type must be updated after every change of column ExtraInfos. This can be achieved with a trigger function :

    CREATE OR REPLACE FUNCTION After_Insert_Update_Delete_ExtraInfos () RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
      CALL column_list() ;
      RETURN NULL ;
    END ; $$ ;
    
    CREATE OR REPLACE TRIGGER After_Insert_Update_Delete_ExtraInfos AFTER INSERT OR UPDATE OF info_type OR DELETE ON ExtraInfos
    FOR EACH STATEMENT EXECUTE FUNCTION After_Insert_Update_Delete_ExtraInfos () ;
    

    The final query is :

    SELECT p.id, p. problem_type, p.problem_group, (jsonb_populate_record(NULL :: column_list, jsonb_object_agg(info_type, info_value))).*
      FROM Problems AS p
     INNER JOIN ExtraInfos AS ei
        ON ei.problem_id = p.id
     GROUP BY p.id, p. problem_type, p.problem_group
    

    which gives the result :

    id problem_type problem_group info_1 info_2 info_3
    0 type_x grp_a v1 v2 v3
    1 type_y grp_b v4 null v5

    see test result in dbfiddle

    Login or Signup to reply.
  2. ActiveRecord is built on top of the AST query assembler Arel.

    You can use this assembler to build dynamic queries as needed basically if you can hand type it as a SQL query Arel can build it.

    In this case the following will build your desired crosstab query based on the table structure provided in the post.

    # Get all distinct info_types to build columns
    cols = ExtraInfo.distinct.pluck(:info_type)
    # extra_info Arel::Table
    extra_infos_tbl = ExtraInfo.arel_table
    # Arel::Table to use for querying 
    tbl = Arel::Table.new('ct')
    
    # SQL data type for the extra_infos.info_type column 
    info_type_sql_type = ExtraInfo.columns.find {|c| c.name == 'info_type' }&.sql_type
    
    # Part 1 of crosstab 
    qry_txt = extra_infos_tbl.project( 
      extra_infos_tbl[:problem_id],
      extra_infos_tbl[:info_type],
      extra_infos_tbl[:info_value]
    ) 
    # Part 2 of the crosstab  
    cats =  extra_infos_tbl.project(extra_infos_tbl[:info_type]).distinct
    
    # construct the ct portion of the crosstab query
    ct = Arel::Nodes::NamedFunction.new('ct',[
      Arel::Nodes::TableAlias.new(Arel.sql('"problem_id"'), Arel.sql('bigint')),
      *cols.map {|name|  Arel::Nodes::TableAlias.new(Arel::Table.new(name), Arel.sql(info_type_sql_type))}
    ])
    
    # build the crosstab(...) AS ct(...) statement
    crosstab = Arel::Nodes::As.new(
      Arel::Nodes::NamedFunction.new('crosstab', [Arel.sql("'#{qry_txt.to_sql}'"),
        Arel.sql("'#{cats.to_sql}'")]),
      ct
    )
    
    # final query construction
    q = tbl.project(tbl[Arel.star]).from(crosstab)
    

    Using this q.to_sql will produce:

    SELECT 
      ct.* 
    FROM 
      crosstab('SELECT 
                  extra_infos.problem_id, 
                  extra_infos.info_type, 
                  extra_infos.info_value 
                FROM 
                  extra_infos', 
               'SELECT DISTINCT 
                  extra_infos.info_type 
                FROM 
                  extra_infos') AS ct(problem_id bigint, 
                                      info_1 varchar(255), 
                                      info_2 varchar(255), 
                                      info_3 varchar(255))
    

    And results in

    problem_id info_1 info_2 info_3
    0 v1 v2 v3
    1 v4 v5

    We can join this to the problems table as

    sub = Arel::Table.new('subq')
    sub_q = Arel::Nodes::As.new(q,Arel.sql(sub.name)) 
    
    out = Problem
      .joins(Arel::Nodes::OuterJoin.new(sub_q,            
                Arel::Nodes::On.new(Problem.arel_table[:id].eq(sub[:problem_id]))
      )).select(
         Problem.arel_table[Arel.star],
         *cols.map {|c| sub[c.intern]}
      )
    

    This will return Problem objects where the info_type columns are virtual attributes. e.g. out.first.info_1 #=> 'v1'

    Note: Personally I would break the parts down in a class to make the assembly clearer but the above will produce the desired outcome

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search