skip to Main Content

I have two tables, tx and earliest. I’m trying to join them based on "UserId" and "AcceptorCode", and only return the records where "CreatedOn" in tx is greater than or equal to "EarliestPayDate" in earliest.



WITH tx AS(
    SELECT 
        "AcceptorCode", 
        "UserId", 
        "CreatedOn",
        "Amount",
        "WithdrawType",
        "DepositType",
        "Balance"
    FROM 
    "public"."Transactions"
    WHERE 
       -- "AcceptorCode" IS NOT NULL AND 
       -- "AcceptorCode" != ''
    "CreatedOn" BETWEEN {{START_DATE}} AND {{END_DATE}}
), 
earliest AS(
    SELECT 
        "AcceptorCode", 
        "UserId", 
        MIN(CAST("CreatedOn" AS date)) AS "EarliestPayDate"
    FROM 
        tx
    WHERE 
    "AcceptorCode" IN (
    '000000014578478','000000900030378','000000005227597' )
    GROUP BY 
        "AcceptorCode", 
        "UserId"
), 
wages AS (
    SELECT 
        e."UserId", t.*
    FROM 
        tx t
    JOIN 
        earliest e ON t."UserId" = e."UserId" AND (t."AcceptorCode" IS NULL OR t."AcceptorCode" = '')
    WHERE 
        t."CreatedOn" >= e."EarliestPayDate"
 
    ORDER BY 
        e."UserId" DESC   
)
SELECT *
FROM wages


The problem is that each result is returned three times.

enter image description here

How can I modify my query to return each result only once?


EDIT:

The use of SELECT DISTINCT "UserId" can address the issue with the query above. However, when I use aggregations and GROUP BY, it sums the amounts three times.




WITH tx AS(
    ...
), 
earliest AS(
    ...
), 
wages AS (
    SELECT 
        e."UserId", 
        SUM(CASE WHEN t."WithdrawType" IN (2, 3) THEN t."Amount" ELSE 0 END) AS "TotalWithdrawalWage",
        SUM(CASE WHEN t."DepositType" = 4 THEN t."Amount" ELSE 0 END) AS "TotalDepositWage"
    FROM 
        tx t
    JOIN 
        earliest e ON t."UserId" = e."UserId"
    WHERE 
        t."CreatedOn" >= e."EarliestPayDate"
    GROUP BY 
        e."UserId"
    ORDER BY 
        e."UserId" DESC   
)
SELECT *, "TotalWithdrawalWage" - "TotalDepositWage" AS "Differnce"
FROM wages
WHERE "TotalWithdrawalWage" > 0 OR "TotalDepositWage" > 0
ORDER BY "UserId" DESC    
    
   

enter image description here

2

Answers


  1. In subquery earliest you have 3 records for every userId – one for each of AcceptorCodes mentioned in the filter, since you are grouping by both of these fields.

    I suppose you need to group by userId only:

    earliest AS(
        SELECT 
            "UserId", 
            MIN(CAST("CreatedOn" AS date)) AS "EarliestPayDate"
        FROM 
            tx
        WHERE 
        "AcceptorCode" IN (
        '000000014578478','000000900030378','000000005227597' )
        GROUP BY 
            "UserId"
    )
    
    Login or Signup to reply.
  2. The reason why you are getting 3 copies of each user is because in your earliest AS(...), definition, you are grouping by AcceptorCode + UserId. So you will be the smallest CreatedOn for each AcceptorCode + UserId combination. When this is passed to the surrounding SELECT and joined with tx again, using t."UserId" = e."UserId", each t."UserId" will match each of the 3 e."UserId" values, since there is no join condition on e."AcceptorCode". I don’t know what you’re trying to achieve, since you are looking for t."AcceptorCode" IS NULL OR t."AcceptorCode" = '' — there does not seem to be a clear intention to join t and e datasets

    If you are simply trying to get the EarliestPayDate per-user, then you can change the earliest AS(...) definition to read:

    earliest AS(
        SELECT 
            "UserId", 
            MIN(CAST("CreatedOn" AS date)) AS "EarliestPayDate"
        FROM 
            tx
        WHERE 
        "AcceptorCode" IN (
        '000000014578478','000000900030378','000000005227597' )
        GROUP BY 
            "UserId"
    ), 
    

    and also put AND (t."AcceptorCode" IS NULL OR t."AcceptorCode" = '') after WHERE

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search