skip to Main Content

I’m working on a recursive SQL query in PostgreSQL to find the minimum cost required to reach various cities from Montreal. The data follows a directed acyclic graph (DAG) structure, where each flight connection is represented in a Flights table with columns src (source city), dst (destination city), and cost (flight cost).

For example:

Route 1: Montreal → Toronto → LA, total cost: $2000
Route 2: Montreal → Chicago → LA, total cost: $1800

In this case, the query should return only the cheapest route to LA with a total cost of $1800.

The query I’ve written finds the cheapest paths, but I’m looking for guidance on optimizing it further.

Specifically, I want to:

  • Ensure the recursive query filters paths to maintain only the minimum total cost for each destination.
  • Avoid revisiting connections (or cities) that have already been included in a path.

Any suggestions on improving the performance or structure of this query?

Thanks for your help!

WITH RECURSIVE CheapestFlight AS 
(
    SELECT src, dst, cost AS total_price
    FROM Flights
    WHERE src = 'Montreal'
    UNION
    SELECT cf.src, f.dst, cf.total_price + f.cost AS total_price
    FROM CheapestFlight cf
    INNER JOIN Flights f ON f.src = cf.dst
)
SELECT dst as destination, total_price
FROM CheapestFlight cf
WHERE total_price = (SELECT MIN(total_price)
                     FROM CheapestFlight
                     WHERE dst = cf.dst)

2

Answers


  1. I would eliminate cycles in the recursive query and use DISTINCT ON in the main query:

    WITH RECURSIVE CheapestFlight AS 
    (
        SELECT src, dst,
               ARRAY[dst] AS path,
               cost AS total_price
        FROM Flights
        WHERE src = 'Montreal'
        UNION
        SELECT cf.src, f.dst,
               cf.path || f.dst AS path,
               cf.total_price + f.cost AS total_price
        FROM CheapestFlight cf
        INNER JOIN Flights f ON f.src = cf.dst
        WHERE NOT ARRAY[f.dst] <@ cf.path
    )
    SELECT DISTINCT ON (dst) 
           dst as destination,
           total_price
    FROM CheapestFlight cf
    ORDER BY dst, total_price;
    

    But that’s certainly not the best possible algorithm out there.

    Login or Signup to reply.
  2. With table flights like here …

    src dst cost
    Montreal Toronto 500
    Montreal Chicago 800
    Montreal LA 2100
    Toronto LA 1500
    Toronto Chicago 400
    Chicago LA 1000

    … you could get your paths and costs combining a few cte-s …

    WITH
      --  sources with the list of the destinations
      dirs AS
        ( Select src, STRING_AGG(dst, '-') as destinations
          From   flights
          Group By src
          Order By src
        ),
      --  path combinations per source
      paths AS
        ( Select     f.src, f.dst, f.cost
          From       flights f
          Left Join  dirs d ON( d.src = f.src And 
                                Position('-' || dst || '-' IN '-' || destinations || '-') > 0 )
        ), 
      -- costs
      costs AS 
        ( Select      p.*, 
                      p2.src as src_2, p2.dst as dst_2, Coalesce(p2.cost, 0) as cost_2, 
                      p3.src as src_3, p3.dst as dst_3, Coalesce(p3.cost, 0) as cost_3, 
                      p.src || 
                      Case When p2.src Is Not Null Then '-' || p2.src 
                      Else '-' 
                      End ||
                        Case When p2.dst Is Not Null And p3.dst Is Not Null 
                                  Then '-' || p2.dst || '-' || p3.dst 
                             When p2.dst Is Not Null And p3.dst Is Null
                             Then '-' || p2.dst
                        Else p.dst 
                        End as path, 
                      p.cost + Coalesce(p2.cost, 0) + Coalesce(p3.cost, 0) as total_cost
          From        paths p
          Left Join   paths p2 ON(p2.src = p.dst)
          Left Join   paths p3 ON(p2.src = p.dst And p3.src = p2.dst) 
        ), 
      paths_costs AS 
      --  paths and costs for all relations - resultset is filtered in Main SQL below - for Montreal-LA ( the cheapest one ) 
        ( Select   src, Substr(path, (Length(path) - Position('-' IN Reverse(path))) + 2) as dst, 
                   path, total_cost
          From     costs
        )  
    
    --      M a i n    S Q L :
    --      filtered for Montreal-LA and the lowest total_cost
    SELECT   pc.src, pc.dst, pc.path, pc.total_cost
    FROM     paths_costs pc
    WHERE    src = 'Montreal' And dst = 'LA' And
             pc.total_cost  = ( Select Min(total_cost) From paths_costs Where src = pc.src And dst = pc.dst )
    

    R e s u l t :

    src dst path total_cost
    Montreal LA Montreal-Chicago-LA 1800

    NOTE: This will work for paths with maximum of 2 flight changes (4 airports from src to dst – both included)

    fiddle

    The costs cte resultset containes all you need for different reportings:

    ... ... ...
    Select   *   From  costs;
    

    cte costs resultset:

    src dst cost src_2 dst_2 cost_2 src_3 dst_3 cost_3 path total_cost
    Montreal Toronto 500 Toronto Chicago 400 Chicago LA 1000 Montreal-Toronto-Chicago-LA 1900
    Montreal Toronto 500 Toronto LA 1500 null null 0 Montreal-Toronto-LA 2000
    Montreal Chicago 800 Chicago LA 1000 null null 0 Montreal-Chicago-LA 1800
    Montreal LA 2100 null null 0 null null 0 Montreal-LA 2100
    Toronto LA 1500 null null 0 null null 0 Toronto-LA 1500
    Toronto Chicago 400 Chicago LA 1000 null null 0 Toronto-Chicago-LA 1400
    Chicago LA 1000 null null 0 null null 0 Chicago-LA 1000
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search