skip to Main Content

I have 20+ materialized views that I use as some sort of pre-calculated cache for a number of operations. Each night at 3 AM I do automated sync with an external source, followed by a series of materialized views refreshing.

This is where it gets weird. Each morning, at the beginning of the workday, I execute

SELECT * FROM dbo.materialized_view

and then execute the actual materialized view’s query (specified during the CREATE MATERIALIZED VIEW).

I get different results.

The database source does not change after the sync process completes at around 3:10 AM. Originally, REFRESH SQL queries were getting executed from within the same Java sync process after the transaction commits, but since that didn’t work, I created a cron job at 4 AM to do the same. That still did not help.

As soon as I manually refresh views in the morning (around 6-7 AM), the results update

Is there something incredibly funky I am missing about how materialized views refresh?

Update:

Here is one of the materialized views I refresh:

create materialized view dbo.ind_newborn_daily as
SELECT dt.date_time::DATE   AS date,
       s.source_id          AS staff_id,
       b.status,
       COUNT(DISTINCT b.id) AS count
FROM dbo.newborn nb
         JOIN dbo.date_time dt ON nb.created_datetime_id = dt.id
         JOIN dbo.staff s ON nb.staff_id = s.id
         JOIN dbo.baby b ON nb.baby_id = b.id
WHERE nb.current = TRUE
GROUP BY CUBE ((dt.date_time::DATE), s.source_id, b.status);

And I update it via:

REFRESH MATERIALIZED VIEW dbo.ind_newborn_daily;

Update 2

Upon reviewing the Postgres statement logs, I’ve found that refresh statements were executed. though, I failed to mention those are processed by a parallel worker queue (4 threads).

As a result, each refresh statement is wrapped in a separate transaction AND those transitions are shown somewhat intertwined in logs:

2022-09-24 03:06:36.679 CEST,"XZY","XZYdb",17629,"127.0.0.1:44326",632e581c.44dd,3,"BEGIN",2022-09-24 03:06:36 CEST,5/78,0,LOG,00000,"execute <unnamed>: BEGIN",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.679 CEST,"XZY","XZYdb",17629,"127.0.0.1:44326",632e581c.44dd,4,"REFRESH MATERIALIZED VIEW",2022-09-24 03:06:36 CEST,5/78,0,LOG,00000,"execute <unnamed>: REFRESH MATERIALIZED VIEW dbo.view1",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.695 CEST,,,17579,,632e581b.44ab,1,,2022-09-24 03:06:35 CEST,13/3,6157,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp17579.0"", size 3219456",,,,,,"REFRESH MATERIALIZED VIEW dbo.view2",,,"PostgreSQL JDBC Driver","parallel worker",17570,0
2022-09-24 03:06:36.696 CEST,,,17578,,632e581b.44aa,1,,2022-09-24 03:06:35 CEST,12/3,6157,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp17578.0"", size 4087808",,,,,,"REFRESH MATERIALIZED VIEW dbo.view2",,,"PostgreSQL JDBC Driver","parallel worker",17570,0
2022-09-24 03:06:36.713 CEST,"XZY","XZYdb",17631,"127.0.0.1:44330",632e581c.44df,1,"SET",2022-09-24 03:06:36 CEST,6/29,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 03:06:36.713 CEST,"XZY","XZYdb",17631,"127.0.0.1:44330",632e581c.44df,2,"SET",2022-09-24 03:06:36 CEST,6/30,0,LOG,00000,"execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'",,,,,,,,,"","client backend",,0
2022-09-24 03:06:36.713 CEST,"XZY","XZYdb",17631,"127.0.0.1:44330",632e581c.44df,3,"BEGIN",2022-09-24 03:06:36 CEST,6/31,0,LOG,00000,"execute <unnamed>: BEGIN",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.713 CEST,"XZY","XZYdb",17631,"127.0.0.1:44330",632e581c.44df,4,"REFRESH MATERIALIZED VIEW",2022-09-24 03:06:36 CEST,6/31,0,LOG,00000,"execute <unnamed>: REFRESH MATERIALIZED VIEW dbo.view3",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.737 CEST,"XZY","XZYdb",17570,"127.0.0.1:44232",632e581b.44a2,5,"REFRESH MATERIALIZED VIEW",2022-09-24 03:06:35 CEST,7/7,6157,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp17570.2"", size 4038656",,,,,,"REFRESH MATERIALIZED VIEW dbo.view2",,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.838 CEST,"XZY","XZYdb",17615,"127.0.0.1:44304",632e581c.44cf,5,"COMMIT",2022-09-24 03:06:36 CEST,4/189,6177,LOG,00000,"execute S_1: COMMIT",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.968 CEST,"XZY","XZYdb",17570,"127.0.0.1:44232",632e581b.44a2,6,"REFRESH MATERIALIZED VIEW",2022-09-24 03:06:35 CEST,7/7,6157,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp17570.4"", size 2703360",,,,,,"REFRESH MATERIALIZED VIEW dbo.view2",,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:36.992 CEST,"XZY","XZYdb",17631,"127.0.0.1:44330",632e581c.44df,5,"COMMIT",2022-09-24 03:06:36 CEST,6/31,6183,LOG,00000,"execute S_1: COMMIT",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0

Later on, at 4 AM, there goes the cron job, which is synchronous (one view at a time), though I did not wrap these in a transaction:

2022-09-24 04:00:08.866 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,422,"idle",2022-09-24 04:00:01 CEST,3/2638,0,LOG,00000,"statement: REFRESH MATERIALIZED VIEW dbo.view1;",,,,,,,,,"psql","client backend",,0
2022-09-24 04:00:09.842 CEST,,,20662,,632e64a8.50b6,1,,2022-09-24 04:00:08 CEST,5/93,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20662.0"", size 3620864",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","parallel worker",20646,0
2022-09-24 04:00:09.842 CEST,,,20661,,632e64a8.50b5,1,,2022-09-24 04:00:08 CEST,4/208,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20661.0"", size 3612672",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","parallel worker",20646,0
2022-09-24 04:00:09.875 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,423,"REFRESH MATERIALIZED VIEW",2022-09-24 04:00:01 CEST,3/2638,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20646.42"", size 4554752",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","client backend",,0
2022-09-24 04:00:10.354 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,424,"REFRESH MATERIALIZED VIEW",2022-09-24 04:00:01 CEST,3/2638,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20646.41"", size 10829824",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","client backend",,0
2022-09-24 04:00:10.687 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,425,"REFRESH MATERIALIZED VIEW",2022-09-24 04:00:01 CEST,3/2638,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20646.8.sharedfileset/i2of16.p0.0"", size 655360",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","client backend",,0
...
...
2022-09-24 04:00:10.701 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,518,"REFRESH MATERIALIZED VIEW",2022-09-24 04:00:01 CEST,3/2638,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20646.43"", size 10829824",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","client backend",,0
2022-09-24 04:00:10.702 CEST,"postgres","Xdb",20646,"[local]",632e64a1.50a6,519,"REFRESH MATERIALIZED VIEW",2022-09-24 04:00:01 CEST,3/2638,6269,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp20646.40"", size 3588096",,,,,,"REFRESH MATERIALIZED VIEW dbo.view1;",,,"psql","client backend",,0

As you can see, logs are pretty much littered with the usage of temporary files, but I expected those as views are big.

Now comes the manual refresh at 8:11 AM from DataGrip:

2022-09-24 08:11:04.997 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,1,"SET",2022-09-24 08:11:04 CEST,3/3703,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 08:11:05.034 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,2,"SET",2022-09-24 08:11:04 CEST,3/3704,0,LOG,00000,"execute <unnamed>: SET application_name = ''",,,,,,,,,"","client backend",,0
2022-09-24 08:11:05.063 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,3,"SELECT",2022-09-24 08:11:04 CEST,3/3705,0,LOG,00000,"execute <unnamed>: select version()",,,,,,,,,"","client backend",,0
2022-09-24 08:11:05.096 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,4,"SET",2022-09-24 08:11:04 CEST,3/3706,0,LOG,00000,"execute <unnamed>: SET application_name = 'DataGrip 2022.2.4'",,,,,,,,,"","client backend",,0
2022-09-24 08:11:05.172 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,5,"SHOW",2022-09-24 08:11:04 CEST,3/3708,0,LOG,00000,"execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL",,,,,,,,,"DataGrip 2022.2.4","client backend",,0
2022-09-24 08:11:05.232 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,6,"REFRESH MATERIALIZED VIEW",2022-09-24 08:11:04 CEST,3/3709,0,LOG,00000,"execute <unnamed>: refresh materialized view dbo.view1",,,,,,,,,"DataGrip2022.2.4","client backend",,0
2022-09-24 08:11:06.236 CEST,,,1089,,632e9f79.441,1,,2022-09-24 08:11:05 CEST,4/274,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1089.0"", size 3604480",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","parallel worker",1088,0
2022-09-24 08:11:06.237 CEST,,,1090,,632e9f79.442,1,,2022-09-24 08:11:05 CEST,5/147,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1090.0"", size 3645440",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","parallel worker",1088,0
2022-09-24 08:11:06.263 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,7,"REFRESH MATERIALIZED VIEW",2022-09-24 08:11:04 CEST,3/3709,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1088.2"", size 4554752",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","client backend",,0
2022-09-24 08:11:06.742 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,8,"REFRESH MATERIALIZED VIEW",2022-09-24 08:11:04 CEST,3/3709,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1088.1"", size 10829824",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","client backend",,0
...
...
2022-09-24 08:11:07.072 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,102,"REFRESH MATERIALIZED VIEW",2022-09-24 08:11:04 CEST,3/3709,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1088.3"", size 10829824",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","client backend",,0
2022-09-24 08:11:07.072 CEST,"X","Xdb",1088,"127.0.0.1:44432",632e9f78.440,103,"REFRESH MATERIALIZED VIEW",2022-09-24 08:11:04 CEST,3/3709,6372,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp1088.0"", size 3579904",,,,,,"refresh materialized view dbo.view1",,,"DataGrip 2022.2.4","client backend",,0

The only difference I spotted so far was that both Java-based and manual refresh had:

execute <unnamed>: REFRESH MATERIALIZED VIEW

whereas the one from cron-job only had:

statement: REFRESH MATERIALIZED VIEW

Update #3

2022-09-24 03:00:00.814 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,1,"SET",2022-09-24 03:00:00 CEST,3/2370,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 03:00:00.815 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,2,"SET",2022-09-24 03:00:00 CEST,3/2371,0,LOG,00000,"execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'",,,,,,,,,"","client backend",,0
2022-09-24 03:00:00.815 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,3,"SHOW",2022-09-24 03:00:00 CEST,3/2372,0,LOG,00000,"execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:00:00.815 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,4,"SET",2022-09-24 03:00:00 CEST,3/2373,0,LOG,00000,"execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ UNCOMMITTED",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
...
After a while sync job completes
...
2022-09-24 03:06:35.311 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,77697,"COMMIT",2022-09-24 03:00:00 CEST,3/2374,6153,LOG,00000,"execute S_35: COMMIT",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:35.322 CEST,"X","Xdb",17112,"127.0.0.1:44060",632e5690.42d8,77698,"SET",2022-09-24 03:00:00 CEST,3/2375,0,LOG,00000,"execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
2022-09-24 03:06:35.355 CEST,"X","Xdb",17568,"127.0.0.1:44228",632e581b.44a0,1,"SET",2022-09-24 03:06:35 CEST,4/171,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.355 CEST,"X","Xdb",17568,"127.0.0.1:44228",632e581b.44a0,2,"SET",2022-09-24 03:06:35 CEST,4/172,0,LOG,00000,"execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.355 CEST,"X","Xdb",17571,"127.0.0.1:44234",632e581b.44a3,1,"SET",2022-09-24 03:06:35 CEST,6/17,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.355 CEST,"X","Xdb",17571,"127.0.0.1:44234",632e581b.44a3,2,"SET",2022-09-24 03:06:35 CEST,6/18,0,LOG,00000,"execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.355 CEST,"X","Xdb",17569,"127.0.0.1:44230",632e581b.44a1,1,"SET",2022-09-24 03:06:35 CEST,5/48,0,LOG,00000,"execute <unnamed>: SET extra_float_digits = 3",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.356 CEST,"X","Xdb",17569,"127.0.0.1:44230",632e581b.44a1,2,"SET",2022-09-24 03:06:35 CEST,5/49,0,LOG,00000,"execute <unnamed>: SET application_name = 'PostgreSQL JDBC Driver'",,,,,,,,,"","client backend",,0
2022-09-24 03:06:35.356 CEST,"X","Xdb",17569,"127.0.0.1:44230",632e581b.44a1,3,"BEGIN",2022-09-24 03:06:35 CEST,5/50,0,LOG,00000,"execute <unnamed>: BEGIN",,,,,,,,,"PostgreSQL JDBC Driver","client backend",,0
... The refresh statements go on and on

It turns out that the sync job runs in a transaction isolation level READ UNCOMMITTED, but view refresh tasks run at READ COMMITTED level

On the other hand, DataGrip does not even set the isolation level, but only queries it…

2

Answers


  1. Chosen as BEST ANSWER

    Edit:

    I realize that my "solution" is kinda hacky and not really correct. Frustration with this problem caused me to jump to conclusion, but it is in fact invalid.

    As stated here: https://wiki.postgresql.org/wiki/Don't_Do_This#Don't_use_timestamp_(without_time_zone)

    I later realized that my Linux box is set (and Postgres) in CET TZ, however, all my timestamps are in UTC and my clients use UTC.

    Original answer:

    After several months of the original question, I can finally put an end to this. Up until now, every single morning, I had to refresh views manually in order to work. Finally, I dedicated some time to investigating and solving this.

    Root cause:

    • The database server is running in the CET timezone
    • The application server is running in the CET timezone
    • All clients (JetBrains Data Grip / Tableau) are running in UTC timezone
    • Application server executed code against materialized views
    • Tableau executed arbitrary SQL
    • I stored all datetimes WITH TIMEZONE
    • All events happing after 11 PM, were pushed to the next day...

    I heard this one before: Always store the datetimes in UTC (or without timezone). But mistakes happen.

    I guess this is a meme-worthy mistake:

    enter image description here

    Thank you all for the elaborate discussion and for bearing with at the time non-sense "facts" :)


  2. Answer you are looking for is that materialized views are not updated immediately. They are updated in the background. You can try to run the following query to see if the materialized view is still updating:

    SELECT pg_is_in_recovery(); -- returns true if the server is a standby server
    
    SELECT pg_last_xact_replay_timestamp(); -- returns the timestamp of the last transaction that was replayed on the standby server
    
    SELECT pg_last_xact_replay_timestamp() > pg_last_xact_replay_timestamp(); -- returns true if the last transaction that was replayed on the standby server is newer than the last transaction that was replayed on the primary server
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search