skip to Main Content

Here is the situation.

Imagine a table with a few columns:

ID Name Language
1 Bob ES
2 Bob EN
3 Bob DE
4 Bill EN
5 Jack EN
6 Jack DE
7 Jack ES

Given a specific language, I want to select all the IDs. If the language is not available, I want to default to EN.
So as an example, of I select for ES, I want to get:

ID Name Language
1 Bob ES
4 Bill EN
7 Jack ES

And if I select for EN I want to get:

ID Name Language
2 Bob EN
4 Bill EN
5 Jack EN

There can be up to 10 different languages, and in total there is close to 500k lines. There is about 90k different names.


For now I found one option using COALESCE, but this requires 2 queries for every line. Somelike:

SELECT COALESCE (
  (SELECT ..... WHERE language = 'ES'),
  (SELECT ..... WHERE language = 'EN')
)

But is there a more efficient way to do that? Something like:
SELECT ….. WHERE language = (‘ES’ AND IF NOT EXIST THEN ‘EN).

Thanks.

Edit:
The COALESCE approach doesn’t work. First, it can return only one column and second, it can return only one line.

3

Answers


  1. You can use DISTINCT ON for this:

     SELECT DISTINCT ON (name)
        id
        , name
        , language
     FROM t1
     ORDER BY name
            , language = 'ES' DESC -- this one is your preference
            , language = 'EN' DESC -- default
            , id;
    
    Login or Signup to reply.
  2. Edit: I missed the postgreSQL tag — the original answer was in MS SQL. I downloaded PostgreSQL and adapted it.

    Typically I join to a users table and have simulated that in this response.

    Get all users, connect them to the language table and then case statement the fall back if the desired language is missing. You could coalesce if you prefer.

    -- Demo Data
    CREATE LOCAL TEMP TABLE tmp_Users (Name varchar(50));
    INSERT INTO tmp_Users
    VALUES 
        ('Bob'),
        ('Bill'),
        ('Jack');
    
    CREATE LOCAL TEMP TABLE tmp_langs (Id int, Name varchar(50), Language varchar(2));
    INSERT INTO tmp_langs
    VALUES
        (1, 'Bob', 'ES'),
        (2, 'Bob', 'EN'),
        (3, 'Bob', 'DE'),
        (4, 'Bill', 'EN'),
        (5, 'Jack', 'EN'),
        (6, 'Jack', 'DE'),  
        (7, 'Jack', 'ES');
    
    
        
    SELECT 
        u.Name,
        CASE
            WHEN sl.Language IS NOT null THEN sl.Language
            ELSE dl.Language -- this assumes every entry has an EN default
        END use_language
    FROM tmp_Users u
    LEFT JOIN tmp_langs AS sl --selectedLanguage
        on sl.Language = 'ES' AND u.Name = sl.Name -- join on your user ids
    LEFT JOIN tmp_langs AS dl --defaultLanguage
        on dl.Language = 'EN' AND u.Name = dl.Name;
    

    With ES

    Name use_language
    bob ES
    Bill EN
    Jack ES

    WITH EN

    Name use_language
    bob EN
    Bill EN
    Jack EN

    == With just the table as I see it in your question ==

    Here you grab everyone’s default, then go back and union everyone with EN default that doesn’t have the selected language.

    SELECT
        l.Id,
        l.Name,
        l.Language
    FROM tmp_langs l
    WHERE l.Language = 'EN'
    UNION
    SELECT
        dl.Id,
        dl.Name,
        dl.Language
    FROM tmp_langs dl -- default language
    WHERE dl.Language = 'EN'
      AND NOT EXISTS(SELECT 1 FROM tmp_langs l WHERE l.Language = 'ES' AND l.Name = dl.Name )
    ORDER BY Id;
    
    Login or Signup to reply.
  3. Main Answer

    For the sake of simulating / replicating the problem, I created the following temporary tables:

    CREATE TEMP TABLE langs (id INT, name VARCHAR(25), lang CHAR(2));
    INSERT INTO langs
    VALUES
        (1, 'Bob', 'ES'),
        (2, 'Bob', 'EN'),
        (3, 'Bob', 'DE'),
        (4, 'Bill', 'EN'),
        (5, 'Jack', 'EN'),
        (6, 'Jack', 'DE'),  
        (7, 'Jack', 'ES');
    
    CREATE TEMP TABLE users (name VARCHAR(25));
    INSERT INTO users
    VALUES 
        ('Bob'),
        ('Bill'),
        ('Jack');
    

    We perform the filter operation at least twice: once for the selected language and once of the default language. Therefore, I think it makes sense to create a function instead of repeating the query over and over:

    CREATE OR REPLACE FUNCTION filter_by_lang(selected_lang CHAR(2))
    RETURNS SETOF langs AS $$
        SELECT * FROM langs WHERE lang = selected_lang
    $$ LANGUAGE SQL;
    
    SELECT
        COALESCE(s.id, d.id) AS id,
        u.name,
        COALESCE(s.lang, d.lang) AS lang
    FROM users AS u
        LEFT JOIN filter_by_lang('ES') AS s ON u.name = s.name
        LEFT JOIN filter_by_lang('EN') AS d ON u.name = d.name;
    

    COALESCE returns the first non NULL value. In the context of our problem, it return the first available id and language for a particular user.

    Bonus

    Let’s say that there are two default languages. Thus if the person does not speak the selected language or the first default language, show the second default language. Using the same approach:

    CREATE OR REPLACE FUNCTION filter_by_lang(selected_lang CHAR(2))
    RETURNS SETOF langs AS $$
        SELECT * FROM langs WHERE lang = selected_lang
    $$ LANGUAGE SQL;
    
    SELECT
        COALESCE(s.id, d1.id, d2.id) AS id,
        u.name,
        COALESCE(s.lang, d1.lang, d2.lang) AS lang
    FROM users AS u
        LEFT JOIN filter_by_lang('ES') AS s ON u.name = s.name
        LEFT JOIN filter_by_lang('EN') AS d1 ON u.name = d1.name
        LEFT JOIN filter_by_lang('DE') AS d2 ON u.name = d2.name
    

    By creating a function, we save ourself from repetitive writing and we make the query much more readable.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search