My SQL knowledge is really basic, and need some help in translating this rather long query (running in a python script querying some AWS athena database, with some f-strings embedded) to plain English or flow chart. I understand concepts of SELECT * WHERE, JOINS etc., but this seems really advanced, renaming multiple tables and handling multiple cases.
SELECT usc_customer_id, escape_customer_id, use_sms,
start_date, least(date('{segment_date_str}'), max(booking_date)) as
"max_booking_date",
date('{segment_date_str}') as segment_date,
count(t.job_no) as bookings, count(a.trip_no) as trips,
sum(total_trip_fare) as total_fare, sum(promo_amt) as "promo_amount",
sum(case when promo_amt > 0 then 1 else 0 end) as promo_times
FROM
(
SELECT m.usc_customer_id, m.escape_customer_id, use_sms,
case when date(user_create_time) > date('{start_date}') then date(
user_create_time)
else date('{start_date}') end as start_date,
date(b."req_pickup_dt") as booking_date,
j.job_no, j.status
FROM (SELECT m.*,
CASE WHEN news_sms = 1 OR third_party_sms = 1 THEN True
ELSE False END AS use_sms,
greatest(COALESCE(c.create_time,
m.usc_customer_create_time,m.escape_customer_create_time),
COALESCE(m.usc_customer_create_time,
m.escape_customer_create_time, c.create_time),
COALESCE(m.escape_customer_create_time, c.create_time,
m.usc_customer_create_time)) as user_create_time
FROM "publish_cab"."usc_customer" c join
customer_mapping m
on c.customer_id = m.usc_customer_id where country_code =
'65') m
join "publish_cab"."escape_booking" as b
on m.escape_customer_id = b.cust_id
join "publish_cab"."escape_job" as j
on b.booking_id = j.booking_id
where b.bookingdate between '{start_date_short}' and '
{segment_date_str_short}'
and j.createddate between '{start_date_short}' and '
{segment_date_str_short}'
) t
left join (
select * from "publish_cab"."escape_trip_details"
where createddate between '{start_date_short}' and '
{segment_date_str_short}'
) a
on t.job_no = a.job_no
group by usc_customer_id, escape_customer_id, start_date, use_sms
Would be grateful if it can be translated into some sort of block diagrammatic flow chart, or corresponding pandas dataframe operations.
2
Answers
You ask "how may I understand the meaning of SELECT … ?"
Just review the rows.
(You have the data, we don’t.)
It’s an onion.
Start on the inside, and work your way out.
That
SELECT m.*, ...
creates a new relation,a new table based upon
usc_customer
.As such, it can be JOINed against
escape_booking
and
escape_job
, which in turn produces another relation.As stated these relations have a bunch of complex syntax
and can look kind of big and scary.
So tame that complexity.
Use
CREATE VIEW usc_customer_report_v AS SELECT ...
to encapsulate all those expressions into a named relationship
that you can conveniently query and JOIN against.
That 3-way JOIN could go into an
escape_v
viewfor the same reasons.
And finally the
SELECT m.usc_customer_id, ...
might go into yet another view.
At that point you’re looking at a much simpler
overall query.
And you can reason about each of the ingredients
independently, reviewing and debugging them
if you feel they don’t capture quite the proper rows
for your business problem.
Life is full of complex relationships.
Often the act of descriptively naming
those relationships is the first step
to abstracting what they mean,
and viewing them from a higher perspective.
Think about a library.
It contains alphabetic letters,
a great many letters, when viewed up close.
Take a step back and we see words, sentences,
paragraphs, chapters, books,
each with their own themes.
Armed with the Dewey Decimal system
we might step further back
and survey a bookshelf devoted
to chemistry, another to physics.
Use the power of abstraction to
zoom out to the big picture,
then zoom in to what you currently find
are the details of interest.
Putting names on things
helps us to make abstractions.
Without knowing your data model and meaning of these fields it’s pretty tough to describe it in a way that would be more clear than sql. SQL is actually remarkably well designed for dictating transformations.
… but, rewriting to use CTEs (Common table expressions, "with" clause section you see below) it might be easier to see how the names correspond to each piece.