skip to Main Content

I’ve got 3 tables I need to work with:

CREATE TABLE invoices (
  id INTEGER,
  number VARCHAR(256)
)

CREATE TABLE items (
  invoice_id INTEGER,
  total DECIMAL
)

CREATE TABLE payments (
  invoice_id INTEGER,
  total DECIMAL
)

I need a result set along the lines of:

invoices.id invoices.number item_total payment_total oustanding_balance
00001 i82 42.50 42.50 00.00
00002 i83 89.99 9.99 80.00

I tried

SELECT 
  invoices.*, 
  SUM(items.total) AS item_total, 
  SUM(payments.total) AS payment_total, 
  SUM(items.total) - SUM(payments.total) AS oustanding_balance 
FROM 
  invoices 
  LEFT OUTER JOIN items ON items.invoice_id = invoices.id 
  LEFT OUTER JOIN payments ON payments.invoice_id = invoices.id 
GROUP BY 
  invoices.id

But that fails. The sum for payments ends up wrong since I’m doing 2 joins here and I end up counting payments multiple times.

I ended up with

SELECT
  invoices.*,
  invoices.item_total - invoices.payment_total AS oustanding_balance
FROM
  (
    SELECT invoices.*,
    (SELECT SUM(items.total FROM items WHERE items.invoice_id = invoices.id) AS item_total,
    (SELECT SUM(payments.total FROM payments WHERE payments.invoice_id = invoices.id) AS payment_total
  ) AS invoices

But … that feels ugly. Now I’ve got subqueries going on everywhere. It DOES work, but I’m concerned about performance?

There has to be some good way to do this with joins – I’m sure I’m missing something super obvious?

2

Answers


  1. As you say the sum behavior with multiple joins is normal and working with sub queries (Or CTE for SQl Server) is not a bad practice.

    Doing such GOUP BY on an ID and a total in sub queries won’t significantly downgrade your performance (depending on your tables sizes).

    Another solution could be doing one SUM sub query for each column you need. It would be easier to understand this way I think :

    SELECT
      invoices.id
      , i_total.total as item_total
      , p_total.total aspayment_total
      , ( i_total.total - p_total.total) as outstanding_balance
    FROM 
      invoices 
    LEFT JOIN (
      SELECT invoice_id, SUM(total) as total FROM items GROUP BY invoice_id
    ) i_total 
      ON i_total.invoice_id = invoices.id
    LEFT JOIN (
      SELECT invoice_id, SUM(total) as total FROM payments GROUP BY invoice_id
    ) p_total 
      ON p_total.invoice_id = invoices.id
    
    Login or Signup to reply.
  2. I think a common table expression (or in this case two CTEs) will give you what you want. You are using something called a scalar, which is precisely speaking not wrong, but as you correctly identified is ugly, hard to read, hard to maintain and can be non-performant in many situations.

    CTE essentially take a query and makes it "behave" like a table. We define it once and then we can refer to it later.

    with item_data as (
      SELECT invoice_id, SUM(total) as item_total
      FROM items
      group by invoice_id
    ),
    payment_data as (
      SELECT invoice_id, SUM(total) as payment_total
      FROM payments
      group by invoice_id
    )
    select
      i.*,
      id.item_total - pd.payment_total as outstanding_balance
    from
      invoices i
      join item_data id on i.invoice_id = id.invoice_id
      join payment_data pd on i.invoice_id = pd.invoice_id
    

    Untested, but hopefully you get the idea.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search