I have the following tables that model a book database:
CREATE TABLE Country (
ISO_3166 CHAR(2) PRIMARY KEY,
CountryName VARCHAR(256),
CID varchar(16)
);
CREATE TABLE Users (
UID INT PRIMARY KEY,
Username VARCHAR(256),
DoB DATE,
Age INT,
ISO_3166 CHAR(2) REFERENCES Country (ISO_3166)
);
CREATE TABLE Book (
ISBN VARCHAR(17) PRIMARY KEY,
Title VARCHAR(256),
Published DATE,
Pages INT,
Language VARCHAR(256)
);
CREATE TABLE Rating (
UID INT REFERENCES Users (UID),
ISBN VARCHAR(17) REFERENCES Book (ISBN),
PRIMARY KEY (UID,ISBN),
Rating int
);
I now want to find those users that have the most ratings per country. I could use this query:
SELECT Country.CountryName as CountryName, Users.Username as Username, COUNT(Rating.Rating) as NumRatings
FROM Country
JOIN Users ON Users.ISO_3166 = Country.ISO_3166
JOIN Rating ON Users.UID = Rating.UID
GROUP BY Country.CID, CountryName, Username
ORDER BY CountryName ASC
To return the number of rating per user in the format:
Countryname | Username | Number of Ratings of this user
I also managed the following query, which gives one user per country, but it is not the one with the most ratings:
SELECT DISTINCT ON (CountryName)
CountryName, Username, MAX(NumRatings)
FROM (
SELECT Country.CountryName as CountryName, Users.Username as Username, COUNT(Rating.Rating) as NumRatings
FROM Country
JOIN Users ON Users.ISO_3166 = Country.ISO_3166
JOIN Rating ON Users.UID = Rating.UID
GROUP BY Country.CID, CountryName, Username
ORDER BY CountryName ASC) AS MyTable
GROUP BY CountryName, Username, NumRatings
ORDER BY CountryName ASC;
But how to write a query that picks users with the maximum per country?
2
Answers
You were so close:
Postgres allows you to sort to determine which record is included when the column you are making distinct is represented by multiple rows. In this case, sorting by NumRatings descending should give you the values from the row with the highest NumRatings value for each country.
DISTINCT ON
is nice and simple to get one (as implied by the word "distinct") user with the most ratings per country. See:But you want to …
More than one can have the most ratings per country.
I suppose to aggregate ratings first, then join to the users table – in a CTE. Then pick one or more winners per country in a
LATERAL
subquery usingWITH TIES
:About "aggregate first, join later":
About
WITH TIES
:About
LATERAL
:Notably, you do not want to
GROUP BY Country.CID
.country.ISO_3166
is the PK, use that instead. (I optimized the query so I don’t need the country inGROUP BY
at all.)