skip to Main Content

Here’s the relational model for the database that I’m going to talk about (a show can have multiple actors and multiple genres): Relational model
(sorry, don’t have enough reputation to insert an image)

What I would like to achieve is to get every detail about a specific show + the genres that it has + the actors that played in the show and their IDs.

I tried:

SELECT shows.*,
       STRING_AGG(g.name, ', ')                AS genres,
       STRING_AGG(CAST(a.id AS VARCHAR), ', ') AS actor_ids,
       STRING_AGG(a.name, ', ')                AS actors
FROM shows
         LEFT JOIN show_genres sg ON shows.id = sg.show_id
         LEFT JOIN genres g ON g.id = sg.genre_id
         LEFT JOIN show_characters sc ON shows.id = sc.show_id
         LEFT JOIN actors a ON sc.actor_id = a.id
WHERE shows.id = 1390
GROUP BY shows.id

The result:

genres actor_ids actors
Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy 436511, 436512, 412933, 9739, 414839, 436539, 436541, 11439, 419217, 415442, 413734, 432873, 420752, 421890, 436592, 448107, 458178, 150561, 436590, 412041, 436511, 436512, 412933, 9739, 414839, 436539, 436541, 11439, 419217, 415442, 413734, 432873, 420752, 421890, 436592, 448107, 458178, 150561, 436590, 412041, 436511, 436512, 412933, 9739, 414839, 436539, 436541, 11439, 419217, 415442, 413734, 432873, 420752, 421890, 436592, 448107, 458178, 150561, 436590, 412041, 436511, 436512, 412933, 9739, 414839, 436539, 436541, 11439, 419217, 415442, 413734, 432873, 420752, 421890, 436592, 448107, 458178, 150561, 436590, 412041 Emilia Clarke, Kit Harington, Peter Dinklage, Lena Headey, Nikolaj Coster-Waldau, Sophie Turner, Maisie Williams, Iain Glen, Alfie Allen, Liam Cunningham, John Bradley, Conleth Hill, Aidan Gillen, Gwendoline Christie, Isaac Hempstead-Wright, Kristofer Hivju, Nathalie Emmanuel, Jacob Anderson, Jerome Flynn, Rory McCann, Emilia Clarke, Kit Harington, Peter Dinklage, Lena Headey, Nikolaj Coster-Waldau, Sophie Turner, Maisie Williams, Iain Glen, Alfie Allen, Liam Cunningham, John Bradley, Conleth Hill, Aidan Gillen, Gwendoline Christie, Isaac Hempstead-Wright, Kristofer Hivju, Nathalie Emmanuel, Jacob Anderson, Jerome Flynn, Rory McCann, Emilia Clarke, Kit Harington, Peter Dinklage, Lena Headey, Nikolaj Coster-Waldau, Sophie Turner, Maisie Williams, Iain Glen, Alfie Allen, Liam Cunningham, John Bradley, Conleth Hill, Aidan Gillen, Gwendoline Christie, Isaac Hempstead-Wright, Kristofer Hivju, Nathalie Emmanuel, Jacob Anderson, Jerome Flynn, Rory McCann, Emilia Clarke, Kit Harington, Peter Dinklage, Lena Headey, Nikolaj Coster-Waldau, Sophie Turner, Maisie Williams, Iain Glen, Alfie Allen, Liam Cunningham, John Bradley, Conleth Hill, Aidan Gillen, Gwendoline Christie, Isaac Hempstead-Wright, Kristofer Hivju, Nathalie Emmanuel, Jacob Anderson, Jerome Flynn, Rory McCann

Note: only showing the problematic columns

The problem: After joining the "show_characters" and the "actors" table, there are a lot of duplicate data in the string aggregations.

Question: How would I solve this? Without having duplicate data.

My workaround:

@connection_handler
def get_show_by_id(cursor: 'RealDictCursor', id: int) -> 'RealDictRow':
    """

    Args:
        cursor: a cursor which returns dictionaries (use @connection.connection_handler decorator)
        id: number of items shows on a single page

    Returns:
        All show details in a RealDictRow + genres(as a concatenated string) + actors in a list
        containing the actors ids and names in a RealDictRow

    """
    query = """
    SELECT shows.*,STRING_AGG(g.name,',') AS genres
    FROM shows
    LEFT JOIN show_genres sg ON shows.id = sg.show_id
    LEFT JOIN genres g ON sg.genre_id = g.id
    WHERE shows.id = %s
    GROUP BY shows.id
    """
    val = (id,)
    cursor.execute(query, val)
    shows = cursor.fetchone()

    query = """
    SELECT a.id,a.name
    FROM actors a
    RIGHT JOIN show_characters sc ON a.id = sc.actor_id
    RIGHT JOIN shows s ON sc.show_id = s.id
    WHERE s.id = %s
    GROUP BY a.id
    """
    val = (id,)
    cursor.execute(query, val)
    actors = cursor.fetchall()
    shows["actors"] = actors
    return shows

Note for workaround: Used two separate queries to achieve the result I wanted, but I’d like to know how would I be able to do this with one SQL query.

Side note: I’d be really happy to receive any feedback about my code in general, or suggestion about how people usually handle when they get back multiple data from an SQL query, do they just split it? Or is it better to write a separate query, like I did, so I get the data in a separate list right away? What about my docstring? I just started trying to use them, not sure if the way I’m doing it is good or not.

2

Answers


  1. Chosen as BEST ANSWER

    Meanwhile I've been able to come up with a solution myself, just to have an easier time managing the data returned later, I convert the returned array of arrays into a list of dictionaries.

        @connection_handler
        def get_show_by_id(cursor: 'RealDictCursor', id: int) -> 'RealDictRow':
            """
        
            Args:
                cursor: a cursor which returns dictionaries (use @connection.connection_handler decorator)
                id: number of items shows on a single page
        
            Returns:
                All show details in a RealDictRow + genres(as a concatenated string) + actors in a list
                containing the actors ids and names in a dictionary
        
            """
        
            # ONE QUERY SOLUTION:
            # concatenating actors names and ids together to prevent issue when actors share the same name
            # and worked on the same show
            query = """
            SELECT
               shows.*,
               STRING_AGG(DISTINCT g.name, ',')                AS genres,
               ARRAY_AGG(DISTINCT ARRAY[a.id::VARCHAR, a.name]) AS actors_ids_with_names
            FROM shows
                     LEFT JOIN show_genres sg ON shows.id = sg.show_id
                     LEFT JOIN genres g ON g.id = sg.genre_id
                     LEFT JOIN show_characters sc ON shows.id = sc.show_id
                     LEFT JOIN actors a ON sc.actor_id = a.id
            WHERE shows.id=%s
            GROUP BY shows.id
            """
            val = (id,)
            cursor.execute(query, val)
            show = cursor.fetchone()
            actor = [{"id":actor_id_with_name[0], "name": actor_id_with_name[1] } for actor_id_with_name in show["actors_ids_with_names"]]
            show.pop("actors_ids_with_names")
            show["actors"] = actor
            return show
    

  2. One approach in this sort of situation is to STRING_AGG each of the constituent tables (or sets of tables) first; then, LEFT JOIN those contrived tables onto the main table. This sidesteps the problem of multiplication that can occur during consecutive LEFT JOINs.

    In your case, try something like this:

    SELECT
           shows.*,
           show_genre_names.genre_names,
           show_actors.actor_ids,
           show_actors.actor_names
    FROM
           shows
           LEFT JOIN 
                  ( -- one row per show_id
                  SELECT
                         sg.show_id,
                         STRING_AGG(g.name, ', ') AS genre_names
                  FROM
                         show_genres sg
                         JOIN genres g ON g.id = sg.genre_id
                  GROUP BY
                         sg.show_id
                  ) show_genre_names
                  ON shows.id = show_genre_names.show_id 
           LEFT JOIN
                  ( -- one row per show_id
                  SELECT
                         sc.show_id,
                         STRING_AGG(a.id, ', ') AS actor_ids,
                         STRING_AGG(a.name, ', ') AS actor_names
                  FROM
                         show_characters sc
                         JOIN actors a ON a.id = sc.actor_id
                  GROUP BY
                         sc.show_id
                  ) show_actors
                  ON shows.id = show_actors.show_id
    WHERE
           shows.id = 1390
    ;
    

    You can solve this in other ways, too, but understanding this technique will be helpful in your SQL journey.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search