Neo4j 4.2.1 Community edition on Ubuntu Server 20.04
A database I administer is failing to start with this error:
"Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Error reading transaction logs, recovery not possible. To force the database to start anyway, you can specify 'unsupported.dbms.tx_log.fail_on_corrupted_log_files=false'. This will try to recover as much as possible and then truncate the corrupt part of the transaction log. Doing this means your database integrity might be compromised, please consider restoring from a consistent backup instead."
If I roll back to the server instance from yesterday the database runs fine, but it goes through a recovery step as follows:
2022-07-10 12:21:23.825+0000 INFO [o.n.k.d.Database] [neo4j/2443e357] Recovery required from position LogPosition{logVersion=0, byteOffset=191545629}
2022-07-10 12:21:27.676+0000 INFO [o.n.k.r.Recovery] [neo4j/2443e357] 10% completed
2022-07-10 12:21:28.578+0000 INFO [o.n.k.r.Recovery] [neo4j/2443e357] 20% completed
2022-07-10 12:21:29.715+0000 INFO [o.n.k.r.Recovery] [neo4j/2443e357] 30% completed
2022-07-10 12:21:31.078+0000 INFO [o.n.k.r.Recovery] [neo4j/2443e357] 40% completed
2022-07-10 12:21:32.140+0000 INFO [o.n.k.r.Recovery] [neo4j/2443e357] 50% completed
2022-07-10 12:21:32.709+0000 INFO [o.n.k.i.a.i.IndexingService] [neo4j/2443e357] IndexingService.init: indexes not specifically mentioned above are ONLINE
2022-07-10 12:21:37.360+0000 INFO [o.n.k.r.Recovery] [neo4j/2443e357] 60% completed
2022-07-10 12:21:39.550+0000 INFO [o.n.k.r.Recovery] [neo4j/2443e357] 70% completed
2022-07-10 12:21:40.971+0000 INFO [o.n.k.r.Recovery] [neo4j/2443e357] 80% completed
2022-07-10 12:21:42.104+0000 INFO [o.n.k.r.Recovery] [neo4j/2443e357] 90% completed
2022-07-10 12:21:43.128+0000 INFO [o.n.k.r.Recovery] [neo4j/2443e357] 100% completed
2022-07-10 12:21:43.151+0000 INFO [o.n.k.d.Database] [neo4j/2443e357] Recovery completed. 195143 transactions, first:98943, last:294085 recovered, time spent: 18s 577ms
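As a sanity check on that last line: the reported count matches the ID range, since 294085 - 98943 + 1 = 195143, which is consistent with recovery replaying one unbroken run of transactions. A minimal Python sketch of that check follows; the function name `check_recovery_summary` is made up for illustration, and the regular expression only assumes the message format shown in the log above.

```python
import re

# Matches the "Recovery completed" summary line as it appears in the log above.
SUMMARY = re.compile(
    r"Recovery completed\. (?P<count>\d+) transactions, "
    r"first:(?P<first>\d+), last:(?P<last>\d+) recovered"
)


def check_recovery_summary(line: str) -> bool:
    """Return True if the reported count equals last - first + 1."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("not a recovery summary line")
    count, first, last = (int(m[k]) for k in ("count", "first", "last"))
    return last - first + 1 == count


line = (
    "2022-07-10 12:21:43.151+0000 INFO [o.n.k.d.Database] [neo4j/2443e357] "
    "Recovery completed. 195143 transactions, first:98943, last:294085 "
    "recovered, time spent: 18s 577ms"
)
print(check_recovery_summary(line))
```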
It clearly isn’t 100% OK though, because if I try to run a backup with sudo neo4j-admin dump --database=neo4j --to=~/
I get the following error:
Active logical log detected, this might be a source of inconsistencies.
Please recover database before running the dump.
To perform recovery please start database and perform clean shutdown.
Starting and shutting it down makes no difference.
All the backups within our retention period have this problem.
We execute a script daily which performs a lot of deletes and inserts on the database. When I run this on the working instance and restart the database, it fails to start and I get the error I first listed again.
So it seems that the corruption in the transaction logs has been lingering for some time and that running this batch of deletes and inserts "pushes it over the edge", making it fail. Incidentally, this script has been running daily for 2 years now without any issues, so I’m fairly sure it’s not the script itself causing the problem.
I tried setting dbms.tx_log.rotation.retention_policy=keep_none
before running the script and that made no difference, although the failed start error becomes:
Caused by: java.lang.RuntimeException: org.neo4j.exceptions.UnderlyingStorageException: No check point found in any log file from version 1 to 2
I also tried deleting the transaction log files as a desperate measure. That just broke things as expected.
I’m running community edition and my backups are EC2 server instances, so I don’t believe that I need the transaction logging feature.
How can I fix or remove the transaction logs please? Thank you.
2 Answers
Old transaction logs cannot be safely archived or removed. So you might use dbms.directories.transaction.logs.root to change the root location where Neo4j will store transaction logs. Or, if the problem might be about memory, you can control the file size at which the logical log auto-rotates with dbms.tx_log.rotation.size.
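Both settings go in neo4j.conf. Here is a rough Python sketch of updating them in the file's text: it replaces an active `key=value` line if one exists and appends one otherwise. The helper name `set_conf`, the path `/mnt/neo4j-txlogs` and the size `100M` are assumptions for illustration only, not recommended values.

```python
def set_conf(text: str, key: str, value: str) -> str:
    """Set key=value in neo4j.conf-style text, leaving commented lines alone."""
    new_line = f"{key}={value}"
    lines = text.splitlines()
    replaced = False
    for i, raw in enumerate(lines):
        if raw.strip().startswith("#"):
            continue  # commented-out settings are left untouched
        name = raw.split("=", 1)[0].strip()
        if name == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Example input; a real neo4j.conf would be read from disk instead.
conf = (
    "#dbms.tx_log.rotation.size=250M\n"
    "dbms.tx_log.rotation.retention_policy=keep_none\n"
)
# Hypothetical values, chosen only to show the mechanics.
conf = set_conf(conf, "dbms.tx_log.rotation.size", "100M")
conf = set_conf(conf, "dbms.directories.transaction.logs.root", "/mnt/neo4j-txlogs")
print(conf)
```

Neo4j needs a restart to pick up changes to these settings.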
Had the same issue on my production database; only neo4j-admin copy helped me.
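For reference, a sketch of what that invocation might look like, written as a small Python helper that only builds and prints the command rather than running it. The helper name `build_copy_command` and the target name `neo4jcopy` are made up; `--from-database` and `--to-database` are the options I recall from the 4.x command, so check `neo4j-admin copy --help` on your version, including whether the command is available in your edition. The source database should be stopped before copying.

```python
import shlex


def build_copy_command(source_db: str, target_db: str) -> list[str]:
    """Build (but do not run) a neo4j-admin copy command line."""
    return [
        "neo4j-admin",
        "copy",
        f"--from-database={source_db}",
        f"--to-database={target_db}",
    ]


# "neo4jcopy" is a hypothetical name for the new database.
cmd = build_copy_command("neo4j", "neo4jcopy")
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)` on the server.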