Mysql - Cummulative substraction of 2 values using SQL or Python

biskd
February 23, 2024
252 views
0 votes
2 Answers

rake_name	origin	destination	created_At	bpc_issue_date	departure_time	journey_distance	bpc_balance_distance_km	abc	rn	logic
ARIL-05	a	b	09-02-2024 06:37	08-02-2024 18:30	09-02-2024 13:40	173.4600067	5271	5097.539993	1	5271-173.4600067
ARIL-05	a	b	11-02-2024 02:58	08-02-2024 18:30	11-02-2024 04:15	1099.369995	5271	3998	2	5097-1099
ARIL-05	a	b	14-02-2024 12:46	08-02-2024 18:30	14-02-2024 15:50	1099.369995	5271	2898	3	3998-1099
ARIL-05	a	b	16-02-2024 17:29	08-02-2024 18:30	17-02-2024 02:30	175.6600037	5271	2713	4	2898-175
ARIL-05	a	b	17-02-2024 10:54	08-02-2024 18:30	18-02-2024 11:30	1178.880005	5271	1544	5	2713-1178
ARIL-05	a	b	20-02-2024 11:44	08-02-2024 18:30	20-02-2024 15:20	1179.369995	5271	364	6	1544-1179

Please look at the logic column for substraction i want.

so i want to update abc column with difference between bpc_balance_distance_km and journey_distance once i get that value i want to substract below abc values with the result of abc - respective journey_distance.

i have tried sql, however i am not sure how can i populate row >1, am able to populate 1st row using
case when rn= 1 then bpc_balance_km - journey_distance

i have tried following query however, i awnt to covert this query int databricks sql supported query. this query works

SET @newabc := 0;

WITH cte AS (
 SELECT *,
       CASE WHEN rn=1 THEN @newabc := bpc_balance_distance_km-journey_distance
         ELSE @newabc := @newabc-journey_distance END AS nabc
 FROM derived_abc ORDER BY rn)
 SELECT *, ROUND(nabc) AS final_result
 FROM cte;

any suggastion in python or sql will be helpful.
here is DB fiddle link : https://dbfiddle.uk/JBzaylSb i am using databricks sql

Answers

Chosen as BEST ANSWER

def calculate_nabc(rn, bpc_balance_distance_km, journey_distance, newabc):
    if rn == 1:
        newabc[0] = bpc_balance_distance_km - journey_distance
    else:
        newabc[0] = newabc[0] - journey_distance
    return newabc[0]

def calculate_final_result(df):
    # Add a row number column
    df['rn'] = df.groupby('rake_name').cumcount() + 1
    
    # Initialize the value of newabc
    newabc = [0.0]

    # Calculate the nabc column using the UDF
    df['nabc'] = df.apply(lambda row: calculate_nabc(row['rn'], row['bpc_balance_distance_km'], row['journey_distance'], newabc), axis=1)

    # Round the nabc values
    df['final_result'] = df['nabc'].round()

    return df
df = calculate_final_result(df)
df.display()

(Edit)

- FanoFN
- February 23, 2024 at 2:51 am
- 0 votes
0
Here, try this:
```
SET @newabc := 0;

WITH cte AS (
 SELECT *,
       CASE WHEN rn=1 THEN @newabc := bpc_balance_distance_km-journey_distance
         ELSE @newabc := @newabc-journey_distance END AS nabc
 FROM derived_abc ORDER BY rn)
 SELECT *, ROUND(nabc) AS final_result
 FROM cte;
```
In the query above, the value of bpc_balance_distance_km-journey_distance for rn=1 is assigned to the @newabc variable. Then for rows where rn > 1, it’ll take the current assigned @newabc value to subtract with the row’s journey_distance value.

There’ll be a slight different result in which causes by what I already point out in the comment – the inconsistencies of the decimal treatment. You can see the example in this demo fiddle.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Mysql – Cummulative substraction of 2 values using SQL or Python

Answers