skip to Main Content

My SQL knowledge is really basic, and need some help in translating this rather long query (running in a python script querying some AWS athena database, with some f-strings embedded) to plain English or flow chart. I understand concepts of SELECT * WHERE, JOINS etc., but this seems really advanced, renaming multiple tables and handling multiple cases.

SELECT usc_customer_id, escape_customer_id, use_sms, 
start_date, least(date('{segment_date_str}'), max(booking_date)) as 
"max_booking_date", 
date('{segment_date_str}') as segment_date,
count(t.job_no) as bookings, count(a.trip_no) as trips, 
sum(total_trip_fare) as total_fare, sum(promo_amt) as "promo_amount", 
sum(case when promo_amt > 0 then 1 else 0 end) as promo_times
FROM   
(
    SELECT m.usc_customer_id, m.escape_customer_id, use_sms, 
    case when date(user_create_time) > date('{start_date}') then date(
    user_create_time)
        else date('{start_date}') end as start_date,  
    date(b."req_pickup_dt") as booking_date,
        j.job_no, j.status
      FROM  (SELECT m.*, 
              CASE WHEN news_sms = 1 OR third_party_sms = 1 THEN True
              ELSE False END AS use_sms,
                greatest(COALESCE(c.create_time, 
                m.usc_customer_create_time,m.escape_customer_create_time),
                    COALESCE(m.usc_customer_create_time, 
                    m.escape_customer_create_time, c.create_time), 
                    COALESCE(m.escape_customer_create_time, c.create_time, 
                    m.usc_customer_create_time)) as user_create_time
                FROM "publish_cab"."usc_customer" c join 
                customer_mapping m 
                on c.customer_id = m.usc_customer_id where country_code = 
                '65') m
           join "publish_cab"."escape_booking" as b
                on m.escape_customer_id = b.cust_id
           join "publish_cab"."escape_job" as j
                on b.booking_id = j.booking_id
      where b.bookingdate between '{start_date_short}' and '
{segment_date_str_short}'
        and j.createddate between '{start_date_short}' and '
{segment_date_str_short}'
) t   
left join (
    select * from "publish_cab"."escape_trip_details" 
    where createddate between '{start_date_short}' and '
{segment_date_str_short}'
) a
on  t.job_no = a.job_no
group by usc_customer_id, escape_customer_id, start_date, use_sms

Would be grateful if it can be translated into some sort of block diagrammatic flow chart, or corresponding pandas dataframe operations.

2

Answers


  1. You ask "how may I understand the meaning of SELECT … ?"

    Just review the rows.
    (You have the data, we don’t.)
    It’s an onion.
    Start on the inside, and work your way out.

    That SELECT m.*, ... creates a new relation,
    a new table based upon usc_customer.
    As such, it can be JOINed against escape_booking
    and escape_job, which in turn produces another relation.

    As stated these relations have a bunch of complex syntax
    and can look kind of big and scary.
    So tame that complexity.
    Use CREATE VIEW usc_customer_report_v AS SELECT ...
    to encapsulate all those expressions into a named relationship
    that you can conveniently query and JOIN against.

    That 3-way JOIN could go into an escape_v view
    for the same reasons.
    And finally the SELECT m.usc_customer_id, ...
    might go into yet another view.

    At that point you’re looking at a much simpler
    overall query.
    And you can reason about each of the ingredients
    independently, reviewing and debugging them
    if you feel they don’t capture quite the proper rows
    for your business problem.


    Life is full of complex relationships.
    Often the act of descriptively naming
    those relationships is the first step
    to abstracting what they mean,
    and viewing them from a higher perspective.

    Think about a library.
    It contains alphabetic letters,
    a great many letters, when viewed up close.
    Take a step back and we see words, sentences,
    paragraphs, chapters, books,
    each with their own themes.
    Armed with the Dewey Decimal system
    we might step further back
    and survey a bookshelf devoted
    to chemistry, another to physics.
    Use the power of abstraction to
    zoom out to the big picture,
    then zoom in to what you currently find
    are the details of interest.
    Putting names on things
    helps us to make abstractions.

    Login or Signup to reply.
  2. Without knowing your data model and meaning of these fields it’s pretty tough to describe it in a way that would be more clear than sql. SQL is actually remarkably well designed for dictating transformations.

    … but, rewriting to use CTEs (Common table expressions, "with" clause section you see below) it might be easier to see how the names correspond to each piece.

    with 
    /*
    
    Define chunks of the query.
    
    <name> as (<query body>) 
    
    You can think of that as assigning the result of the query body to a named variable.
    
    Note the indentation, each named block starts at the beginning of the line. Don't get confused by the column aliases ex case when yada then yada end AS something. That just assigns the result of that case statement to column name.
    
    See main body lower
    
    
    */
    customer_mapping_for_country_65_and_dates as (
        SELECT m.*, 
               CASE WHEN news_sms = 1 
                      OR third_party_sms = 1 THEN True
                    ELSE False 
               END AS use_sms,
               greatest(
                COALESCE(
                 c.create_time, 
                 m.usc_customer_create_time,
                 m.escape_customer_create_time
                 ),
                COALESCE(
                 m.usc_customer_create_time, 
                 m.escape_customer_create_time, c.create_time), 
                 COALESCE(m.escape_customer_create_time, c.create_time, 
                 m.usc_customer_create_time)
                ) as user_create_time
          FROM "publish_cab"."usc_customer" c 
          join customer_mapping m 
            on c.customer_id = m.usc_customer_id 
         where country_code = '65'),
    customer_jobs_constrained_by_parameter_dates as (
        SELECT cmcd.usc_customer_id, 
               cmcd.escape_customer_id, use_sms, 
               case when date(user_create_time) > date('{start_date}') then
                         date(user_create_time)
                    else date('{start_date}') 
                end as start_date,  
               date(b."req_pickup_dt") as booking_date,
               j.job_no, 
               j.status
          FROM customer_mapping_for_country_65_and_dates cmcd
          join "publish_cab"."escape_booking" as b
            on cmcd.escape_customer_id = b.cust_id
          join "publish_cab"."escape_job" as j
            on b.booking_id = j.booking_id
         where b.bookingdate 
               between '{start_date_short}' and '{segment_date_str_short}'
           and j.createddate 
               between '{start_date_short}' and '{segment_date_str_short}')
    ,
    escape_trips_between_constraint_dates as (
        select *   
          from "publish_cab"."escape_trip_details" 
         where createddate 
               between '{start_date_short}' and '{segment_date_str_short}')
    /*
    
    Main body.
    
    Use all of your defined chunks together.
    
    */
    SELECT usc_customer_id, 
           escape_customer_id, 
           use_sms, 
           start_date, 
           least(
               date('{segment_date_str}'), max(booking_date)) as "max_booking_date", 
           date('{segment_date_str}') as segment_date,
           count(t.job_no) as bookings, count(a.trip_no) as trips, 
           sum(total_trip_fare) as total_fare, sum(promo_amt) as "promo_amount", 
           sum(case when promo_amt > 0 then 1 
                    else 0 
               end) as promo_times
      from customer_jobs_constrained_by_parameter_dates t 
      left
      join escape_trips_between_constraint_dates a 
        on t.job_no = a.job_no
     group 
        by usc_customer_id, 
           escape_customer_id, 
           start_date, 
           use_sms
    
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search