skip to Main Content

I have a trade history table that looks like this:

CREATE TABLE trade (
    trade_id INT PRIMARY KEY,
    stock_id INT NOT NULL,
    date DATE NOT NULL,
    sell BOOLEAN NOT NULL,
    shares INT NOT NULL,
    usd_total DECIMAL(10, 2) NOT NULL
);

I want to query my trade record for a tax report. Something that result in

stock_id buy_date sell_date shares buy_usd_total sell_usd_total

meaning each sell trade would be matched with one or more rows, each time "spending" shares from the buy rows then moving on to the next.

For example trades of

trade_id stock_id date sell shares usd_total
1 1 2021-01-01 0 2 100.00
2 1 2021-01-02 0 2 120.00
3 1 2021-01-03 1 3 300.00
4 1 2021-01-04 1 1 150.00

should result in

stock_id buy_date sell_date shares buy_usd_total sell_usd_total
1 2021-01-01 2021-01-03 2 100.00 200.00
1 2021-01-02 2021-01-03 1 60.00 100.00
1 2021-01-02 2021-01-04 1 60.00 150.00

Is this something that’s possible with SQL? I’ve tried to wrap my head around it, but keeping score of the remaining shares of buys and sells is difficult. Should I push forward or go with an iterative approach in python?

2

Answers


  1. You can enumerate individual share purchase/sale events and self-join to match the 1st purchase to the 1st sale, 2nd purchase to the 2nd sale and so on: demo at db<>fiddle

    with atomic_operation as (
      select*,usd_total/shares as usd
             ,count(*)over w1 as nth_operation
      from trade cross join lateral generate_series(1,shares)n
      window w1 as (partition by stock_id,sell 
                    order by date,n) )
    select purchase.stock_id
          ,purchase.date as buy_date
          ,sale.date as sell_date
          ,count(*) as shares
          ,sum(purchase.usd) as buy_usd_total
          ,sum(sale.usd) as sell_usd_total
    from atomic_operation as purchase
    
    left join atomic_operation as sale
    on purchase.nth_operation=sale.nth_operation
    and purchase.stock_id=sale.stock_id
    where not purchase.sell
      and sale.sell
    group by 1,2,3
    order by 1,2,3;
    
    stock_id buy_date sell_date shares buy_usd_total sell_usd_total
    1 2021-01-01 2021-01-03 2 100.0 200.0
    1 2021-01-02 2021-01-03 1 60.0 100.0
    1 2021-01-02 2021-01-04 1 60.0 150.0
    1. count(*)over w1 uses the window (partition by stock_id,sell order by date,n). It’ll keep a separate running/stepping/tumbling count for each stock_id, also separating sales from purchases, following the buy/sell event date.
    2. cross join lateral generate_series(1,shares) clones each trade row as many times as there were shares in the event, to produce a separate sale/purchase row for each individual share. The usd_total/shares establishes their per-share price.
    3. Aggregate functions compact the list of individually matched events into groups, as per your example.
    Login or Signup to reply.
  2. My answer is going to be much more verbose than the answer by Zegarek but I believe it works correctly with data beyond the sample dataset. I use recursion in the initial CTE to break out the individual shares for each transaction then aggregate them in terms of buys and sells so I can match them back up per-share to get not only the requested output, but also things like days held and gain/loss.

    Fiddle: https://dbfiddle.uk/I7-8vTr3

    with recursive shares as (
        -- get each transaction, break out cost per share, and create a first row for the first of X shares in the transaction
        select 
              trade_id
            , stock_id
            , date
            , sell
            , shares        
            , usd_total
            , usd_total / shares as price_per_share
            , 1 as share_number 
        from trade
    
        union all
    
        -- recursively break out an additional row for each share of the transaction
        select 
              t.trade_id
            , t.stock_id
            , t.date
            , t.sell
            , t.shares
            , t.usd_total
            , t.usd_total / t.shares as price_per_share
            , 1 + s.share_number as share_number 
        from trade t
        inner join shares s
          on s.trade_id = t.trade_id
         and s.share_number < t.shares
    ), bought_shares as (
        -- get all of the individual shares bought and sequence them in order they were bought
        select 
              trade_id
            , stock_id
            , date
            , price_per_share
            , share_number
            , row_number() over (partition by stock_id, sell order by date, trade_id) as buy_seq
        from shares
        where sell = False
    ), sold_shares as (
        -- get all of the individual shares sold and sequence them in order they were sold
        select 
              trade_id
            , stock_id
            , date
            , price_per_share
            , share_number
            , row_number() over (partition by stock_id, sell order by date, trade_id) as sell_seq
        from shares
        where sell = True
    ), sale_details as (
        -- now we match sold shares to the purchased shares to report on sales using FIFO logic on a per-share basis
        select 
              s.stock_id
            , b.date as buy_date
            , s.date as sell_date
            , coalesce(cast(s.date - b.date as varchar), 'Unknown') as days_held -- handles bad data with an unmatched sell
            , s.price_per_share - b.price_per_share as gain_or_loss
            , b.price_per_share as buy_usd_per_share
            , s.price_per_share as sell_usd_per_share
        from sold_shares s
        left join bought_shares b -- inner join if you don't want to deal with data including a sale without a buy
          on b.stock_id = s.stock_id
         and b.buy_seq = s.sell_seq
    )
    select 
          stock_id
        , buy_date
        , sell_date
        --, max(days_held) as days_held
        , count(*) as shares
        , cast(sum(buy_usd_per_share) as decimal(18,2)) as buy_usd_total
        , cast(sum(sell_usd_per_share)     as decimal(18,2)) as sell_usd_total
        --, sum(gain_or_loss) as gain_or_loss
    from sale_details
    group by
        stock_id
        , buy_date
        , sell_date;
    

    The result set is the same as Zegarek’s:
    answer results

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search