I’ve got 3 tables I need to work with:
CREATE TABLE invoices (
id INTEGER,
number VARCHAR(256)
)
CREATE TABLE items (
invoice_id INTEGER,
total DECIMAL
)
CREATE TABLE payments (
invoice_id INTEGER,
total DECIMAL
)
I need a result set along the lines of:
invoices.id | invoices.number | item_total | payment_total | oustanding_balance |
---|---|---|---|---|
00001 | i82 | 42.50 | 42.50 | 00.00 |
00002 | i83 | 89.99 | 9.99 | 80.00 |
I tried
SELECT
invoices.*,
SUM(items.total) AS item_total,
SUM(payments.total) AS payment_total,
SUM(items.total) - SUM(payments.total) AS oustanding_balance
FROM
invoices
LEFT OUTER JOIN items ON items.invoice_id = invoices.id
LEFT OUTER JOIN payments ON payments.invoice_id = invoices.id
GROUP BY
invoices.id
But that fails. The sum for payments ends up wrong since I’m doing 2 joins here and I end up counting payments multiple times.
I ended up with
SELECT
invoices.*,
invoices.item_total - invoices.payment_total AS oustanding_balance
FROM
(
SELECT invoices.*,
(SELECT SUM(items.total FROM items WHERE items.invoice_id = invoices.id) AS item_total,
(SELECT SUM(payments.total FROM payments WHERE payments.invoice_id = invoices.id) AS payment_total
) AS invoices
But … that feels ugly. Now I’ve got subqueries going on everywhere. It DOES work, but I’m concerned about performance?
There has to be some good way to do this with joins – I’m sure I’m missing something super obvious?
2
Answers
As you say the sum behavior with multiple joins is normal and working with sub queries (Or CTE for SQl Server) is not a bad practice.
Doing such GOUP BY on an ID and a total in sub queries won’t significantly downgrade your performance (depending on your tables sizes).
Another solution could be doing one SUM sub query for each column you need. It would be easier to understand this way I think :
I think a common table expression (or in this case two CTEs) will give you what you want. You are using something called a scalar, which is precisely speaking not wrong, but as you correctly identified is ugly, hard to read, hard to maintain and can be non-performant in many situations.
CTE essentially take a query and makes it "behave" like a table. We define it once and then we can refer to it later.
Untested, but hopefully you get the idea.