skip to Main Content

I am facing with a little challenge. I know that this kind of challenge should be solved by using python but I have decided to do it with SQL. And I am looking for little help as I am kind of beginner 🙂

I have set of data with 4 columns.

  • first column represent account ID (receivable)
  • second column represent date of transaction
  • 3rd column represent amounts
  • 4th column represent account ID which has sent the transaction to account ID receivable

First I have done is that I have set kind of subtable which will group account ID (receivable). The reson why is that I want to apply set of filters against group of records from particular account ID receivable.. I have set this query like this:

SELECT t1.account_receivable,t1.datum,t1.amount,t1.account_payable
FROM python_table as t1
WHERE t1.account_receivable IN (
    SELECT t2.account_receivable
    FROM python_table as t2
    GROUP BY 1                  )

and now I want to do this crazy condition for records..
I want records only which has at least 3 consecutive months sent from same account ID of 4th column. There can not be sent more than 1 transaction in same month othervise I do not want to display this. interval for days must be maximum 5 from the lowest day of group (consecutive months)

here is the one group of data for 1 account ID (receivable). I have tried to explain as much as possible what I am trying to achieve… I tried almost everything but not able to solve this one,.. Asking myself is it even possible in SQL with so many conditions?

61441   2014-04-28  102 45437871
61441   2014-04-28  15346   45437871
61441   2014-05-16  98  306658150
**61441 2014-04-28  711 323671229
61441   2014-05-23  694 323671229
61441   2014-06-25  701 323671229
61441   2014-07-25  702 323671229
61441   2014-08-25  694 323671229
61441   2014-09-25  644 323671229**
**61441 2014-06-09  3697    342058995   this set will not match condition as interval for day
61441   2014-07-04  3692    342058995   from lowest to highest is more than 5 days
61441   2014-08-06  3665    342058995
61441   2014-09-10  3672    342058995**
61441   2014-06-10  8409    357368301
61441   2014-04-24  4136    412899724
**61441 2014-04-28  1261    440261807
61441   2014-05-23  1271    440261807
61441   2014-06-25  1267    440261807
61441   2014-07-25  1259    440261807
61441   2014-08-25  1274    440261807
61441   2014-09-25  1120    440261807**
61441   2014-06-19  141 441460477
61441   2014-08-06  314 518735975
**61441 2014-04-01  17032   547166056
61441   2014-05-02  45773   547166056
61441   2014-06-02  17821   547166056
61441   2014-07-01  17445   547166056
61441   2014-08-01  25562   547166056
61441   2014-09-02  17459   547166056**
61441   2014-09-05  157 686201636
61441   2014-09-19  126 686201636
**61441 2014-04-14  7233    762490320  This will not match condition as it has 3 transactions in
61441   2014-05-19  9703    762490320  same month
61441   2014-06-16  8875    762490320
61441   2014-07-14  8274    762490320
61441   2014-07-18  1436    762490320
61441   2014-07-28  841 762490320
61441   2014-08-15  11008   762490320
61441   2014-09-16  8334    762490320**
61441   2014-05-16  340 838201881
61441   2014-05-21  2480    838201881
61441   2014-07-14  295 838201881
61441   2014-07-14  933 838201881
61441   2014-08-25  1696    838201881
61441   2014-08-25  849 838201881
61441   2014-04-28  2011    842644517
61441   2014-09-22  8295    842644517
61441   2014-07-09  35  982718888

2

Answers


  1. This can be achieved with window functions.

    The result from the below query differs from yours in that payments from 762490320 to 61441 had a three-month block that met the other criteria from April 2014 to June 2014 before it was interrupted by the triple payment in July.

    with lines as (
      select *, date_trunc('month', datum)::date as year_month,
             extract('day' from datum) as day_of_month
        from python_table
    ), double_taps as (
      select *, 
             count(*) 
               over (partition by account_receivable, account_payable, year_month)
               as monthly_count
        from lines
    ), consecutive_island_breaks as (
      select *, 
             case 
               when year_month - interval '1 month' = lag(year_month) over w then 0
               else 1
             end as new_group
        from double_taps
       where monthly_count = 1
      window w as (partition by account_receivable, account_payable order by datum) 
    ), islands as (
      select *,
             sum(new_group) over w as island_number
        from consecutive_island_breaks
      window w as (partition by account_receivable, account_payable 
                       order by datum)
    ), island_sizes as (
      select *,
             count(*) over w as island_size,
             max(day_of_month) over w - min(day_of_month) over w as day_range
        from islands
      window w as (partition by account_receivable, account_payable, island_number)
    )
    select * 
      from island_sizes
     where island_size >= 3
       and day_range <= 5;
    

    Working fiddle

    Login or Signup to reply.
  2. first_value(datum) OVER (PARTITION BY account_receivable,
            account_payable ORDER BY datum) AS first_datum
    

    to solve the " interval for days must be maximum 5 from the lowest day of group (consecutive months)".

    count(*) OVER (PARTITION BY  
     account_receivable,account_payable,date_trunc('month', datum)) AS cnt_txn_mon,
    

    to solve "There can not be sent more than 1 transaction in same month othervise I do not want to display this"

    "I want records only which has at least 3 consecutive months":
    means that count of (lead(date_trunc(‘month’, datum),1) = date_trunc(‘month’, datum) + interval ‘1 month’ is true) >= 2.


    demo: https://dbfiddle.uk/uwXO0tDe
    query:

    
    WITH cte AS 
    (
        SELECT
            datum,
            account_receivable,
            account_payable,
            /* for the consective month computation */
            coalesce(lead(date_trunc('month', datum)::date, 1) 
                OVER (PARTITION BY account_receivable, account_payable ORDER BY datum)
                ,date_trunc('month', datum)::date) 
            = (date_trunc('month', datum) + interval '1 month')::date AS is_next_month,
            /*for max 1 txns*/
            count(*) OVER (
                PARTITION BY account_receivable,account_payable,date_trunc('month', datum)) AS cnt_txn_mon,
            /*for maximum 5 from the lowest day of group*/
            first_value(datum) OVER (
                    PARTITION BY account_receivable,account_payable ORDER BY datum) AS first_datum
        FROM  python_table
    ),cte1 AS 
    (
        SELECT
            datum,
            account_receivable,
            account_payable,
            cnt_txn_mon,
            /*ignore date order, means in a group, there is a 3 consective month is fine */
            count(is_next_month) FILTER (WHERE is_next_month) OVER (PARTITION BY account_receivable,
                account_payable) AS cnt_mon_gaps,
            /*for maximum 5 from the lowest day of group*/
            abs(date_trunc('month', first_datum)::date - first_datum) <= 5
            OR abs((date_trunc('month', first_datum) + interval '1 month')::date - first_datum) <= 5 AS date_near_month
        FROM    cte        
        WHERE   cnt_txn_mon = 1
            
    )
    SELECT *
    FROM      cte1
    WHERE     cnt_mon_gaps >= 2 AND date_near_month
    ORDER BY  account_payable,datum;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search