skip to Main Content

I have the following tables that model a book database:

CREATE TABLE Country (
    ISO_3166 CHAR(2) PRIMARY KEY,
    CountryName VARCHAR(256),
    CID varchar(16)
);
CREATE TABLE Users (
    UID INT PRIMARY KEY,
    Username VARCHAR(256),
    DoB DATE,
    Age INT,
    ISO_3166 CHAR(2) REFERENCES Country (ISO_3166)
);
CREATE TABLE Book (
    ISBN VARCHAR(17) PRIMARY KEY,
    Title VARCHAR(256),
    Published DATE,
    Pages INT,
    Language VARCHAR(256)
);
CREATE TABLE Rating (
    UID INT REFERENCES Users (UID),
    ISBN VARCHAR(17) REFERENCES Book (ISBN),
    PRIMARY KEY (UID,ISBN),
    Rating int
);

I now want to find those users that have the most ratings per country. I could use this query:

SELECT Country.CountryName as CountryName, Users.Username as Username, COUNT(Rating.Rating) as NumRatings
FROM Country
JOIN Users ON Users.ISO_3166 = Country.ISO_3166 
JOIN Rating ON Users.UID = Rating.UID
GROUP BY Country.CID, CountryName, Username
ORDER BY CountryName ASC

To return the number of rating per user in the format:

 Countryname | Username | Number of Ratings of this user

I also managed the following query, which gives one user per country, but it is not the one with the most ratings:

SELECT DISTINCT ON (CountryName)
        CountryName, Username, MAX(NumRatings)
FROM (
    SELECT Country.CountryName as CountryName, Users.Username as Username, COUNT(Rating.Rating) as NumRatings
        FROM Country
        JOIN Users ON Users.ISO_3166 = Country.ISO_3166 
        JOIN Rating ON Users.UID = Rating.UID
        GROUP BY Country.CID, CountryName, Username
        ORDER BY CountryName ASC) AS MyTable
GROUP BY CountryName, Username, NumRatings 
ORDER BY CountryName ASC;

But how to write a query that picks users with the maximum per country?

2

Answers


  1. You were so close:

    SELECT DISTINCT ON (CountryName)
            CountryName, Username, NumRatings
    FROM(
        SELECT Country.CountryName as CountryName, Users.Username as Username, COUNT(Rating.Rating) as NumRatings
            FROM Country
            JOIN Users ON Users.ISO_3166 = Country.ISO_3166 
            JOIN Rating ON Users.UID = Rating.UID
            GROUP BY Country.CID, CountryName, Username
            ORDER BY CountryName ASC) AS MyTable
    WHERE TRUE --no filtering needed 
    ORDER BY CountryName ASC, NumRatings DESC
    

    Postgres allows you to sort to determine which record is included when the column you are making distinct is represented by multiple rows. In this case, sorting by NumRatings descending should give you the values from the row with the highest NumRatings value for each country.

    Login or Signup to reply.
  2. DISTINCT ON is nice and simple to get one (as implied by the word "distinct") user with the most ratings per country. See:

    But you want to …

    find those users that have the most ratings per country.

    More than one can have the most ratings per country.
    I suppose to aggregate ratings first, then join to the users table – in a CTE. Then pick one or more winners per country in a LATERAL subquery using WITH TIES:

    WITH agg AS (
       SELECT u.iso_3166, u.uid, u.username, r.numratings
       FROM  (
          SELECT uid, count(*) AS numratings
          FROM   rating r
          GROUP  BY 1
          ) r
       JOIN   users u USING (uid)
       )
    SELECT c.countryname, a.username, a.numratings
    FROM   country c
    LEFT   JOIN LATERAL (
       SELECT *
       FROM   agg a
       WHERE  a.iso_3166 = c.iso_3166
       ORDER  BY a.numratings DESC
       FETCH  FIRST 1 ROWS WITH TIES  -- !
       ) a ON true;
    

    About "aggregate first, join later":

    About WITH TIES:

    About LATERAL:

    Notably, you do not want to GROUP BY Country.CID. country.ISO_3166 is the PK, use that instead. (I optimized the query so I don’t need the country in GROUP BY at all.)

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search