skip to Main Content

I came up with one problem where I am looking for a solution as I have two tables:

Table1: Users

UserID TotalTasks
A3 12
A4 14
A5 11

Table2: Tasks

TaskID NewTask
1 4
2 5
3 3
4 2

Output:

TaskID NewTask UserID UpdatedTasks
1 4 A5 15
2 5 A3 17
3 3 A4 17
4 2 A5 17

Here for table Tasks, first TaskID in order, I need to check which user has minimum TotalTasks from the users table and add that userID and NewTask + TotalTasks under the column "UpdatedTasks" and for the next TaskID, again need to check userID with updated minimum totalTasks and repeat the same process for each row without using loop as this I need to run in synapse sql.

I tried with recursive cte but not getting right output

2

Answers


  1. As per this ask,

    Recursive CTE is not supported in Azure Synapse SQL.

    I have tried your scenario with Recursive CTE, and you see that it is not supported.

    enter image description here

    In the Synapse SQL, you can use loops as a workaround like below to achieve your requirement. Here, it needs a temporary table which you can drop after getting the desired result.

    -- Temporary table with required columns
    CREATE TABLE #temp1 (
        TaskID INT,
        NewTask INT,
        UserID NVARCHAR(50),
        UpdatedTasks INT
    );
    
    -- For the first task, get the user with min TotalTasks and insert it into the temporary table
    INSERT INTO #temp1 (TaskID, NewTask, UserID, UpdatedTasks)
    SELECT 
        t.TaskID,
        t.NewTask,
        u.UserID,
        u.TotalTasks + t.NewTask AS UpdatedTasks
    FROM Tasks t
    CROSS APPLY (
        SELECT TOP 1 UserID, TotalTasks
        FROM Users
        ORDER BY TotalTasks, UserID
    ) u
    WHERE t.TaskID = (SELECT MIN(TaskID) FROM Tasks);
    
    -- Declare the required values to use in the loop
    DECLARE @next_TaskID INT, @next_Task INT, @curr_UserID NVARCHAR(50), @new_Tasks INT;
    
    -- Use loops to iterate the remaining tasks. In each iteration, find the next tasks and get the user with minimum new tasks and insert into the temporary table 
    WHILE EXISTS (SELECT 1 FROM Tasks WHERE TaskID NOT IN (SELECT TaskID FROM #temp1))
    BEGIN
        SELECT TOP 1 
            @next_TaskID = TaskID,
            @next_Task = NewTask
        FROM Tasks
        WHERE TaskID NOT IN (SELECT TaskID FROM #temp1)
        ORDER BY TaskID;
    
        SELECT TOP 1 
            @curr_UserID = UserID,
            @new_Tasks = UpdatedTasks
        FROM (
            SELECT 
                u.UserID,
                COALESCE(a.UpdatedTasks, u.TotalTasks) + @next_Task AS UpdatedTasks
            FROM Users u
            LEFT JOIN #temp1 a ON u.UserID = a.UserID
        ) UpdatedUsers
        ORDER BY UpdatedTasks, UserID;
    
       
        INSERT INTO #temp1 (TaskID, NewTask, UserID, UpdatedTasks)
        VALUES (@next_TaskID, @next_Task, @curr_UserID, @new_Tasks);
    END;
    
    -- Use order by on TaskID and get the values from temporary table and drop it
    SELECT 
        TaskID,
        NewTask,
        UserID,
        UpdatedTasks
    FROM #temp1
    ORDER BY TaskID;
    
    DROP TABLE #temp1;
    

    Output:

    enter image description here

    But as the data size increases, this approach might take more time compared to the recursive approach. So, you can try reading the tables as spark dataframe views in synapse notebook and in the spark SQL, you can use recursive CTE to get the desired data and write it back to the Synapse SQL tables.

    Login or Signup to reply.
  2. Don’t know synapse, but if it supports basic analytic functions [ Row_Number Over() and Max() Over() ] then you could try something like here:

    WITH
      Users AS
        ( Select 'A3' as UserID, 12 as TotalTasks Union All
          Select 'A4', 14 Union All
          Select 'A5', 11
        ),
      Tasks AS
        ( Select 1 as TaskID, 4 as NewTasks Union All
          Select 2, 5 Union All
          Select 3, 3 Union All
          Select 4, 2
        ), 
      grid AS 
        ( SELECT *
          FROM ( Select   TaskID, NewTasks, Row_Number() Over(Order By TaskID) as TaskOrdNum
                 From     Tasks
                 Order By TaskID
               ) t
          LEFT JOIN 
               ( Select UserID, TotalTasks, Row_Number() Over(Order By TotalTasks) as TotalsOrdNum
                 From   Users
               ) u ON(u.TotalsOrdNum = t.TaskOrdNum)
         )
    SELECT g.TaskID,g. NewTasks, 
           Coalesce(g.UserID, (Select UserID From grid Where TaskOrdNum = 1) ) as UserID,
           Coalesce(g.NewTasks + g.TotalTasks, Max(g.NewTasks + g.TotalTasks) Over()) as UpdatedTasks
    FROM   grid g
    
    taskid newtasks userid updatedtasks
    1 4 A5 15
    2 5 A3 17
    3 3 A4 17
    4 2 A5 17

    fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search