skip to Main Content

My sql statement is written using cross join lateral in postgres sql. But i want to execute this sql statement in google bigquery and bigquery doesn’t support cross join lateral. How can i rewrite this sql for bigquery engine?

select 
left_table.*,x.*
from 
    data left_table
cross join lateral 
(

select string_agg(right_table.session_id,',')
from 
data right_table 
where left_table.source_ip = right_table.source_ip 
and 
    ((right_table.session_start_time >= left_table.session_start_time and right_table.session_start_time <= left_table.session_end_time)
    or 
    (right_table.session_end_time >= left_table.session_start_time and right_table.session_end_time <= left_table.session_end_time))
group by right_table.source_ip
) x

2

Answers


  1. Without lateral joins the "equivalent" is a "correlated subquery" in the select clause, and these are often a performance issue as the correlated subquery is executed row by row though the resultset:

    SELECT
          left_table.*
        , (SELECT string_agg(right_table.session_id, ',') 
           FROM data right_table
           WHERE left_table.source_ip = right_table.source_ip
           AND (
                (
                    right_table.session_start_time >= left_table.session_start_time
                    AND right_table.session_start_time <= left_table.session_end_time
                    )
                OR (
                    right_table.session_end_time >= left_table.session_start_time
                    AND right_table.session_end_time <= left_table.session_end_time
                    )
                )
           GROUP BY right_table.source_ip)  x
    FROM data left_table
    

    nb: The lateral join is executed in a more efficient manner (as part of the from clause).

    Login or Signup to reply.
  2. You can try this code. In this code CROSS JOIN LATERAL is not used, instead correlated sub-queries are used to achieve the same result.

    WITH AggregatedSessions AS (
          SELECT
            left_table.*,
            (
              SELECT STRING_AGG(right_table.session_id, ',')
              FROM data AS right_table
              WHERE
                left_table.source_ip = right_table.source_ip
                AND (
                  (right_table.session_start_time >= left_table.session_start_time AND right_table.session_start_time <= left_table.session_end_time)
                  OR
                  (right_table.session_end_time >= left_table.session_start_time AND right_table.session_end_time <= left_table.session_end_time)
                )
              GROUP BY right_table.source_ip
            ) AS session_ids
          FROM data AS left_table
        )
        SELECT * FROM AggregatedSessions;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search