
I have a small home server running Debian Buster with a ZFS filesystem (ZFS: Loaded module v0.7.12-2+deb10u2, ZFS pool version 5000, ZFS filesystem version 5) configured as a RAID.

As the server is sometimes not used for days, I have configured an autoshutdown script which shuts the server down if my two big WD Red hard disks (not the system hard disk) have been in standby for more than 45 minutes. I have now noticed that the server no longer shuts down, because both drives stay in standby for only a few minutes before becoming active again. I tested with iotop and found that ZFS, via its txg_sync thread, is waking them up, even though no other process is writing to or reading from the drives.
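
For reference, the standby check itself can be done with hdparm -C. Here is a minimal sketch of such a check (the device names /dev/sdb and /dev/sdc are placeholders for my two data drives, and a real script also has to track for how long both drives have stayed in standby):

#!/bin/sh
# hdparm -C reports "active/idle" or "standby" for each drive
for disk in /dev/sdb /dev/sdc; do
    # stop as soon as one drive is not in standby
    hdparm -C "$disk" | grep -q standby || exit 0
done
echo "both data drives are in standby"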

I also ran fatrace -c after changing into the directory where the data pool is mounted. There is no output at the moment txg_sync pops up and wakes the drives. Update: it seems that fatrace does not work properly with ZFS.

I then used iosnoop and now know that dm-crypt is writing to my disks regularly. My underlying drives are encrypted with LUKS.

./iosnoop -d 8,16
Tracing block I/O. Ctrl-C to end.
COMM         PID    TYPE DEV      BLOCK        BYTES     LATms
dmcrypt_writ 1895   W    8,16     2080476248   4096    6516.10
dmcrypt_writ 1895   W    8,16     3334728264   4096    6516.14
dmcrypt_writ 1895   W    8,16     2080429048   16384      0.16
dmcrypt_writ 1895   W    8,16     3334728272   20480      0.21
dmcrypt_writ 1895   W    8,16     2080476256   20480      0.16
dmcrypt_writ 1895   W    8,16     3328225336   16384      0.20
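
For reference, the major:minor pair 8,16 from the trace can be mapped back to a device name with lsblk (major 8, minor 16 normally corresponds to /dev/sdb):

lsblk -o NAME,MAJ:MIN,TYPE,MOUNTPOINT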

What is the reason for this, and how can I prevent it from occurring?

3 Answers


  1. From https://github.com/openzfs/zfs/issues/8537#issuecomment-477361010:

    @niksfirefly if the pool is being written to then you should expect to see cpu and I/O consumed by the txg_sync thread. How much will depend on your specific hardware, the pool configuration, which features/properties are enabled, and your workload. This may be normal for your circumstances.

    And maybe this link is helpful too:
    https://serverfault.com/questions/661336/slow-performance-due-to-txg-sync-for-zfs-0-6-3-on-ubuntu-14-04

    How to check disk I/O utilization per process:

    cut -d" " -f 1,2,42 /proc/*/stat | sort -n -k 3
    

    Those fields are PID, command and cumulative IO-wait ticks. This will show your hot processes, though only if they are still running. (You probably want to ignore your filesystem journalling threads.)

    (from https://serverfault.com/a/466342/580935)
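
    To check whether the pool itself is actually receiving writes while txg_sync runs, the pool I/O statistics can be sampled directly (replace tank with your pool name):

    zpool iostat -v tank 5

    Even small metadata updates such as atime updates are enough to dirty a transaction group and trigger a sync, so these counters can be non-zero although no application is visibly reading or writing.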

  2. Another note on ZFS.
    I’m on Manjaro 20210101 with kernel 5.4 and had been seeing high load from txg_sync over the last few weeks.

    According to /var/log/pacman.log

    [2020-12-31T08:58:24+0100] [ALPM] upgraded zfs-utils (0.8.5-2 -> 2.0.0-2)
    [2020-12-31T08:58:24+0100] [ALPM] upgraded linux54-zfs (0.8.5-10 -> 2.0.0-6)

    Since then, the txg_sync process has quieted down again.

    Under Debian, ZFS packaging (and its versions) is certainly handled somewhat differently.
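
    To check which ZFS version is actually loaded on a Debian system, the kernel module and the installed packages can be inspected, for example:

    modinfo zfs | grep -iw version
    dpkg -l | grep zfs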

  3. You can change the frequency of txg_sync operations on Linux with

    echo 15 > /sys/module/zfs/parameters/zfs_txg_timeout 
    

    which sets it to 15 seconds, for example, or set it persistently in /etc/modprobe.d/zfs.conf:

    options zfs zfs_txg_timeout=$your_value
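
    To verify which value is currently active (the upstream default is 5 seconds):

    cat /sys/module/zfs/parameters/zfs_txg_timeout

    Keep in mind that a larger timeout only spaces the transaction group syncs further apart; it does not eliminate them, so the drives may still be woken up, just less often.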
    