I have two tables, tx and earliest. I’m trying to join them based on "UserId" and "AcceptorCode", and only return the records where "CreatedOn" in tx is greater than or equal to "EarliestPayDate" in earliest.
WITH tx AS(
SELECT
"AcceptorCode",
"UserId",
"CreatedOn",
"Amount",
"WithdrawType",
"DepositType",
"Balance"
FROM
"public"."Transactions"
WHERE
-- "AcceptorCode" IS NOT NULL AND
-- "AcceptorCode" != ''
"CreatedOn" BETWEEN {{START_DATE}} AND {{END_DATE}}
),
earliest AS(
SELECT
"AcceptorCode",
"UserId",
MIN(CAST("CreatedOn" AS date)) AS "EarliestPayDate"
FROM
tx
WHERE
"AcceptorCode" IN (
'000000014578478','000000900030378','000000005227597' )
GROUP BY
"AcceptorCode",
"UserId"
),
wages AS (
SELECT
e."UserId", t.*
FROM
tx t
JOIN
earliest e ON t."UserId" = e."UserId" AND (t."AcceptorCode" IS NULL OR t."AcceptorCode" = '')
WHERE
t."CreatedOn" >= e."EarliestPayDate"
ORDER BY
e."UserId" DESC
)
SELECT *
FROM wages
The problem is that each result is returned three times.
How can I modify my query to return each result only once?
EDIT:
The use of SELECT DISTINCT "UserId"
can address the issue with the query above. However, when I use aggregations and GROUP BY
, it sums the amounts three times.
WITH tx AS(
...
),
earliest AS(
...
),
wages AS (
SELECT
e."UserId",
SUM(CASE WHEN t."WithdrawType" IN (2, 3) THEN t."Amount" ELSE 0 END) AS "TotalWithdrawalWage",
SUM(CASE WHEN t."DepositType" = 4 THEN t."Amount" ELSE 0 END) AS "TotalDepositWage"
FROM
tx t
JOIN
earliest e ON t."UserId" = e."UserId"
WHERE
t."CreatedOn" >= e."EarliestPayDate"
GROUP BY
e."UserId"
ORDER BY
e."UserId" DESC
)
SELECT *, "TotalWithdrawalWage" - "TotalDepositWage" AS "Differnce"
FROM wages
WHERE "TotalWithdrawalWage" > 0 OR "TotalDepositWage" > 0
ORDER BY "UserId" DESC
2
Answers
In subquery
earliest
you have 3 records for everyuserId
– one for each ofAcceptorCode
s mentioned in the filter, since you are grouping by both of these fields.I suppose you need to group by
userId
only:The reason why you are getting 3 copies of each user is because in your
earliest AS(...),
definition, you are grouping byAcceptorCode + UserId
. So you will be the smallestCreatedOn
for eachAcceptorCode + UserId
combination. When this is passed to the surroundingSELECT
and joined withtx
again, usingt."UserId" = e."UserId"
, eacht."UserId"
will match each of the 3e."UserId"
values, since there is no join condition one."AcceptorCode"
. I don’t know what you’re trying to achieve, since you are looking fort."AcceptorCode" IS NULL OR t."AcceptorCode" = ''
— there does not seem to be a clear intention to joint
ande
datasetsIf you are simply trying to get the
EarliestPayDate
per-user, then you can change theearliest AS(...)
definition to read:and also put
AND (t."AcceptorCode" IS NULL OR t."AcceptorCode" = '')
afterWHERE