Ubuntu - Monitoring for failure and quickly restarting systemd service

mzrt
February 16, 2024

I am running a 24/7 YouTube stream on Ubuntu. My ffmpeg command is wrapped in a systemd service. On several occasions the ffmpeg command has failed and systemd has not restarted it quickly enough to keep the YouTube stream alive. When this happens I need to run daemon-reload and restart the systemd service.

To counter this I have written a bash script that checks the log for stream-ending errors. However, it does not seem to be working: I have had failures since implementing this script, and it does not appear to have been triggered.

Two questions:

- Is there a more efficient way to do what I am doing?
- If not, can anyone identify what I am doing wrong?

```
#!/bin/bash

RESET=0

while true; do
    # Get the current time minus 1 minute
    LAST_1_MINUTE=$(date -d '1 minute ago' '+%b %e %H:%M:%S')

    # Run the command to check for the error within the last minute
    if journalctl --since "$LAST_1_MINUTE" | grep -qi "Error writing trailer"; then
        if [ $RESET -lt 1 ]; then
            # Perform actions if error is detected
            sudo systemctl daemon-reload &&
            echo "Restarting master.service by monitor.sh script at $(date)" >> /var/log/monitor.log &&
            sudo systemctl restart master.service
            RESET=2
        fi
    else
        RESET=$((RESET - 1))
    fi

    # Wait for 20 seconds before the next iteration
    sleep 20
done
```

Answers

Chosen as BEST ANSWER
- mzrt
- February 16, 2024 at 7:59 pm
- 0 votes
I encountered a frustrating issue where my FFmpeg command, managed by systemd, kept restarting instantly upon failure, yet YouTube would still drop the video feed. After some troubleshooting, I concluded that the near-instant restart was likely interfering with the teardown of the failed stream, which is why the problem persisted.

Initially, I thought about implementing error monitoring code, but soon realized it was redundant and not addressing the root cause.

Instead, I made a simple adjustment in the systemd service file. I added a 10-second delay before the service restarts upon failure. Since making this change, my stream has encountered failures four times but has stayed online without interruptions!

Something like the following:
```
[Unit]
Description=FFmpeg Stream Service
After=network.target

[Service]
Type=simple
ExecStart=/path/to/ffmpeg.sh
Restart=on-failure
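# Wait 10 seconds before restarting (this is the key change).
# Note: systemd unit files do not support trailing comments on a setting line.
RestartSec=10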

[Install]
WantedBy=multi-user.target
```
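
If the unit file already exists, the same delay can probably be added as a drop-in rather than by editing the unit itself. The following is only a sketch: the function name `write_restart_dropin`, the file name `restart-delay.conf`, and the unit name `master.service` (borrowed from the question) are assumptions, not part of the original setup.

```shell
#!/bin/bash
# write_restart_dropin DIR DELAY
# Writes a drop-in that sets Restart=on-failure and RestartSec=DELAY.
# DIR is the drop-in directory of the unit, e.g.
# /etc/systemd/system/master.service.d (assumed unit name).
write_restart_dropin() {
    local dir="$1" delay="$2"
    mkdir -p "$dir" || return
    # Comments sit on their own lines; unit files have no inline comments.
    cat > "$dir/restart-delay.conf" <<EOF
[Service]
# Wait before restarting after a failure
Restart=on-failure
RestartSec=$delay
EOF
}

# Hypothetical usage, run as root:
#   write_restart_dropin /etc/systemd/system/master.service.d 10
#   systemctl daemon-reload
#   systemctl show -p RestartUSec master.service   # check the effective value
```

Running `systemctl edit master.service` is the interactive way to create such a drop-in; the function above just makes the step scriptable.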


- user1686
- February 14, 2024 at 6:01 pm
- 0 votes
> is there a more efficient way to do what I am doing?

Yes, in plenty of ways. Most importantly, log messages can be received as a continuous stream without ever needing to re-check every X seconds. For traditional syslog, you could tail -f the log file, which would use inotify to output lines as they come; for journald you can do the same with journalctl -f. Once you have the log stream, the loop becomes while read: the read builtin blocks until data is available, consuming no resources when idle while still reacting to new lines immediately.
```
journalctl -f | while IFS= read -r line; do
    ...
done
```
(Even without this, you could avoid the need to call date by using journalctl’s built-in support for relative timestamps, i.e. journalctl -S -1m for "last minute", or by using its support for resuming at a specific cursor, i.e. journalctl --cursor-file=/tmp/cursor).
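
A polling check built on the relative timestamp might look like the sketch below. The function name `recent_error` and the `JOURNAL_CMD` override (there only so the function can be exercised without systemd) are hypothetical. As a guess rather than a diagnosis: the `'+%b %e %H:%M:%S'` string produced in the question's script may not be a format that `journalctl --since` accepts, in which case journalctl would print an error to stderr, grep would see no input, and the restart branch would never run. Letting journalctl resolve `-1m` itself sidesteps that.

```shell
#!/bin/bash
# recent_error UNIT PATTERN
# Succeeds if UNIT logged a line containing PATTERN (case-insensitive,
# fixed string) during the last minute.
# JOURNAL_CMD is a hypothetical hook for substituting a stand-in command;
# it defaults to the real journalctl.
recent_error() {
    local unit="$1" pattern="$2"
    "${JOURNAL_CMD:-journalctl}" -u "$unit" --since=-1m --no-pager -o cat |
        grep -qiF -- "$pattern"
}

# Hypothetical usage (unit name taken from the question):
#   if recent_error master.service "Error writing trailer"; then
#       systemctl restart master.service
#   fi
```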

Most likely you don’t care about all logs, so it would be more efficient to restrict output to only your service’s logs – journalctl has the -u option to match by service, -t to match by "syslog tag" (the name before [xyz]:), and you can even do journalctl /usr/bin/ffmpeg to filter by executable.
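
Putting the streaming loop and the per-unit filter together, a monitor could be sketched roughly as follows. The function name `watch_log`, the 60-second cooldown, and the unit name `master.service` (from the question) are assumptions. The cooldown plays the role of the RESET counter in the original script: it keeps one burst of matching lines from triggering several restarts.

```shell
#!/bin/bash
# watch_log PATTERN COOLDOWN CMD [ARGS...]
# Reads log lines from stdin and runs CMD whenever a line contains PATTERN
# (case-insensitive, fixed string), at most once per COOLDOWN seconds.
watch_log() {
    local pattern="$1" cooldown="$2" last="" now line
    shift 2
    # --line-buffered makes grep pass each match on immediately.
    grep -iF --line-buffered -- "$pattern" | while IFS= read -r line; do
        now=$(date +%s)
        if [ -z "$last" ] || [ $((now - last)) -ge "$cooldown" ]; then
            # Keep the action from consuming the log stream on stdin.
            "$@" < /dev/null
            last=$now
        fi
    done
}

# Hypothetical usage, run as root (-n 0 skips lines already in the journal):
#   journalctl -f -n 0 -o cat -u master.service |
#       watch_log "Error writing trailer" 60 systemctl restart master.service
```

Because the action is passed in as a command, the same loop can be tried out with `echo` before it is pointed at `systemctl restart`.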

(You might also want to have the script itself log to syslog, using logger or systemd-cat.)