
I want to know if there is an equivalent to the pg_stat_progress_vacuum view for general queries. For example, in my use case there is a query that I know is supposed to perform a seq_scan, and I would like to know its progress.

2 Answers


  1. No, there is no equivalent for that. If you have debugging symbols installed on the server, you could attach to the backend with a debugger and poke around. But that requires some inside knowledge of PostgreSQL and can hardly be recommended for casual users.
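
    If you do go down that road, you can at least identify the backend process to attach to via the pg_stat_activity view:

    ```sql
    -- Find the PID and runtime of active backends, e.g. to attach
    -- a debugger to one of them (gdb -p <pid>).
    SELECT pid, state, now() - query_start AS runtime, query
    FROM pg_stat_activity
    WHERE state = 'active'
    ORDER BY runtime DESC;
    ```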

  2. If the query involves multiple stages (WITH CTEs and/or subqueries) and runs long enough for you to consider this sort of granular monitoring, you might consider splitting it into a series of separate, standalone statements: the inner subqueries/earlier CTEs write to temp tables, and the outer/later ones read from those, as sketched below.
    That way you not only see the progress, you also get a form of checkpointing: you can halt the operation mid-way and resume it later, because you keep the intermediate results from the steps that finished earlier.
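
    A minimal sketch of that split, using hypothetical orders/customers tables and a hypothetical two-stage query:

    ```sql
    -- Stage 1: materialize what used to be the first CTE.
    CREATE TEMP TABLE stage_filtered AS
    SELECT id, customer_id, amount
    FROM orders                                -- hypothetical source table
    WHERE created_at >= DATE '2024-01-01';

    -- Stage 2: materialize what used to be the second CTE/subquery.
    CREATE TEMP TABLE stage_joined AS
    SELECT f.amount, c.region
    FROM stage_filtered f
    JOIN customers c ON c.id = f.customer_id;  -- hypothetical source table

    -- The final statement reads only from the materialized stages.
    SELECT region, sum(amount) AS total
    FROM stage_joined
    GROUP BY region;
    ```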

    If there’s a risk the split would let other traffic on the db interfere with the results (the state of your source tables could change between any two steps, affecting consistency), you can wrap the steps in a repeatable read transaction.
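
    With the same hypothetical stages, the wrapper could look like this; the snapshot shared by all statements is taken at the first query in the transaction:

    ```sql
    BEGIN ISOLATION LEVEL REPEATABLE READ;

    CREATE TEMP TABLE stage_filtered AS
    SELECT id, customer_id, amount
    FROM orders
    WHERE created_at >= DATE '2024-01-01';

    -- ...remaining stages; every statement in this transaction sees the
    -- same snapshot of orders/customers, regardless of concurrent writes.

    COMMIT;
    ```

    Note the trade-off: wrapping everything in one transaction gives up the halt-and-resume benefit, since rolling back discards the temp tables created inside it.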

    You can also use pagination, querying subsets/slices of your sources separately. Rethinking your statements so that this works without losing consistency can take considerably more work, and the total execution time of the many statements against the subsets will add up to more than a single statement would need, but that may be a price you’re willing to pay: each statement completes in a fraction of the total time, and you can track how many slices have finished. Having more control over the distribution of the work, you can even try to force Postgres to process the slices in parallel, though the success of that depends heavily on how your tables and indices are set up, as well as on what the query does exactly.
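
    As a sketch over the same hypothetical tables, each slice is one standalone statement writing partial aggregates that a final statement re-aggregates:

    ```sql
    -- Holds one partial aggregate per finished slice. Use a regular or
    -- UNLOGGED table instead of TEMP if slices run in parallel sessions,
    -- since temp tables are not visible across sessions.
    CREATE TEMP TABLE region_totals (region text, amount numeric);

    -- One slice = one statement; advance the id bounds per slice and
    -- count completed slices for your progress figure.
    INSERT INTO region_totals
    SELECT c.region, sum(o.amount)
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.id > 0 AND o.id <= 100000          -- slice 1 of N
    GROUP BY c.region;

    -- Final re-aggregation over all finished slices.
    SELECT region, sum(amount) AS total
    FROM region_totals
    GROUP BY region;
    ```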

    Quite often both the split and the pagination will just slow things down, pointlessly interfering with planning and optimisation, which perform worse the less context they have. Still, in some cases you might get away with zero performance loss while gaining the insight, the checkpointing, and a reduced execution time through forced parallel execution. It all depends on your exact query; in extreme cases it can even help to split up the statements just so that you can build an index on the intermediate tables between the stages.
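
    That last step is cheap once a stage is materialized; for instance, on the hypothetical stage table from the sketch above:

    ```sql
    -- An index on an intermediate table can turn the next stage's join from
    -- a sequential scan into an index scan, which is not possible mid-statement
    -- when everything lives in CTEs.
    CREATE INDEX ON stage_filtered (customer_id);
    ANALYZE stage_filtered;  -- refresh planner statistics for the new table
    ```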


    If you just want a "progress bar", you could hold on to a sample result of an EXPLAIN ANALYZE: if not much changes between runs of the same query, you can expect it to do the same things at the same pace. Knowing how long the new query has been running, you can look at the timings in the sample and guesstimate where it’s at.
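
    A minimal sketch of that guesstimate, using a hypothetical query; pg_stat_activity then tells you how long the live run has been going:

    ```sql
    -- Run once to capture a timed sample; every plan node reports an
    -- "actual time" you can later use as a reference point.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM orders;               -- hypothetical query

    -- While the real run is in flight, check its elapsed time and compare
    -- it against the node timings from the saved sample.
    SELECT pid, now() - query_start AS elapsed, query
    FROM pg_stat_activity
    WHERE state = 'active';
    ```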
