I recently ran an Online Migration through YaST from SUSE Linux Enterprise Server (SLES) 15.1 to 15.2 and ended up with the following versions afterwards:
SLES 15.2
Apache 2.4.43
MariaDB 10.4.17
PHP 7.4.6
Varnish 6.2.1
The preliminary tests showed no conflicts or issues prior to the upgrade, and the server rebooted and came up just fine once it completed. Upon checking everything afterwards, I noticed that varnish.service (varnishd) had failed to start. I’ve never had an issue with Varnish not starting, whether on SUSE Linux, CentOS, Ubuntu, etc. At first I thought my custom VCL file was causing the problem, so I went back to the default configuration file that ships with the package (/etc/varnish/vcl.conf) to start fresh with the basics, but to no avail. The exact same issue happened.
Then I decided to take a shot at compiling Varnish from source. Through YaST, I removed the varnish package and all of its configuration and service files, and then downloaded the latest tar archive (varnish-6.6.0.tgz) directly from https://varnish-cache.org/. After building Varnish this way, ironically, the same issue happens when I try to start it.
With either one, compiled (v6.6.0) or packaged (v6.2.1), I get exactly the same error(s):
It reports "Child not responding to CLI, killed it.", then mentions a "CLI communication error (hdr)." and finally "Child died signal=6."
What’s most puzzling is that Varnish fails in exactly the same way no matter how I set it up. I suppose this indicates that Varnish isn’t the issue per se, but rather something in the server configuration? I’ve been through every Varnish forum I could find and found nothing this specific. I have even tried different CLI parameters (timeout settings, pool delays, etc.) to get it to start, but it still won’t. Again, this is with the most basic/default configuration file loaded and nothing else:
# Marker to tell the VCL compiler that this VCL has been adapted to the
# new 4.0 format.
vcl 4.0;
# Default backend definition. Set this to point to your content server.
backend default {
.host = "127.0.0.1";
.port = "80";
}
Now here’s the ultimate kicker… I took another (Development) server, slicked it bare, and installed SLES 15.2 from scratch and everything, including Varnish, works! So something with the in-place upgrade is stopping Varnish somehow. I can’t take the main (Production) SLES 15.2 server and start over with it like that, however, because of so many other things that are currently installed and configured on it.
I’m trying to get Varnish back up and started within the current upgraded environment, but nothing seems to be working. Also, there is nothing in the Varnish logs (/var/log/varnish/varnish.log) to give me any clue either.
I’m at a loss as to what to try or where to go next. I’ve even tried starting Varnish in Debug Mode (-d) and then trying to get a child to start that way and it’s the exact same error.
And ultimately, I can’t check for any panics because Varnish won’t even start in the first place.
So to recap, literally all I did was run the in-place upgrade from SLES 15.1 to 15.2, rebooted when it was all done, and now all other services start fine except for Varnish (which worked perfectly on 15.1).
UPDATE #1: I tried to start varnish with no vcl file and no backend (varnishd -b none) but it errored out. Then I simply substituted "none" with "localhost" and I’m right back to the same error as before.
UPDATE #2: Here is the output of the "strace -f varnishd" command.
2 Answers
VCL loop
This is a long shot, but can you please change the .port property in your backend to 8080 instead of 80? Just for testing.
Because if you start varnishd without an explicit -a, the standard listening port will be 80. But since your VCL file already connects to port 80 on localhost for its backend, you might end up in a loop.
I’m not saying the assert() that is triggered on your system is caused by this, but it’s worth the attempt.
In older versions of Varnish, the standard port was 6081, but this has changed in recent versions.
What I am sure of is that the error is caused by a file descriptor that is not available. Maybe a file descriptor that has already been closed.
Please give it a shot, and let me know.
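To rule out the loop before restarting, a small shell check can compare the port Varnish would listen on with the .port in the VCL. This is only a sketch: backend_port and ports_differ are helper names made up here, the sed pattern assumes a simple single-backend VCL like the default one, and /etc/varnish/vcl.conf is the path mentioned in the question.

```shell
#!/bin/sh
# Hypothetical helper: print the first .port value found in a VCL file.
# Assumes the simple form:  .port = "80";
backend_port() {
    sed -n 's/^[[:space:]]*\.port[[:space:]]*=[[:space:]]*"\([0-9][0-9]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed (exit 0) when the listen port differs from
# the backend port in the VCL, fail (exit 1) when they are the same.
ports_differ() {
    listen_port="$1"
    vcl_file="$2"
    [ "$listen_port" != "$(backend_port "$vcl_file")" ]
}

# Example call (path taken from the question, port 80 is the default listen port):
# ports_differ 80 /etc/varnish/vcl.conf || echo "listen port equals backend port"
```

If the two ports turn out to be the same, either change .port in the VCL or start varnishd with an explicit -a, for example -a :6081.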
Debug mode
It is also possible to enable debug mode by adding the -d runtime parameter to your varnishd command.
Please give it a try to increase the verbosity of the debug output.
Checking panics
Another thing you can do is run the following command to see if any panics occurred:
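One command that is commonly used for this is panic.show, sent through varnishadm. The wrapper below is only a sketch: show_last_panic is a made-up name, and it assumes varnishadm is installed and can reach a running varnishd management process (in a -d session, panic.show can be typed at the CLI prompt instead).

```shell
#!/bin/sh
# Hypothetical wrapper: show the last recorded panic via the Varnish CLI.
# Returns 127 if varnishadm cannot be found in PATH.
show_last_panic() {
    if ! command -v varnishadm >/dev/null 2>&1; then
        echo "varnishadm not found in PATH" >&2
        return 127
    fi
    # panic.show is a Varnish CLI command; it needs a running management process.
    varnishadm panic.show
}

# Example call:
# show_last_panic
```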
Trying out various runtime options
Apparently the error refers to the fact that it cannot load the VCL file.
Let’s try running varnishd without a VCL file to see whether or not that’s the problem.
Just try starting varnishd using the following command:
This command will start Varnish without a VCL file and without a backend. When you then try to access Varnish via HTTP, you should be getting an HTTP 503 error. That’s not perfect, but at least we know that Varnish is capable of not crashing all the time.
If that works, the next steps are:
Remove -b and add your -f parameter that refers to the VCL file
Add the -s setting
Use packages
Other than that, the only advice I can give you is to install Varnish using the official packages on a supported operating system (Debian, Ubuntu, Fedora, CentOS, RHEL).
When checking the output of the requested strace command, I found this:
Varnishd tries to change the owner of at least two files, but isn’t allowed to do so. I’m not sure about the details, but as a next step you could try to find these files (probably below /var/cache/varnish) and check the current permissions. Maybe they belong to a user which is not the user you’re running varnishd with.
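A quick way to look for such files is to list everything under the suspected directory that is not owned by the expected user. Sketch only: not_owned_by is a made-up helper, and both the varnish user name and the /var/cache/varnish path are assumptions that should be checked against the service file and the paths that actually appear in the strace output.

```shell
#!/bin/sh
# Hypothetical helper: print every file or directory under DIR
# that is NOT owned by USER (a user name or a numeric UID).
not_owned_by() {
    user="$1"
    dir="$2"
    find "$dir" ! -user "$user" -print
}

# Example call (user name and path are assumptions):
# not_owned_by varnish /var/cache/varnish
```

Empty output means everything under the directory already belongs to that user, in which case the failing chown is more likely blocked by something other than plain file ownership.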
AFAIK the daemon is started as user root and then the process switches to an unprivileged user. This assumption brings us back to my previous question: Are you running AppArmor or SELinux?
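If you are not sure, the kernel exposes both through pseudo-files, so a rough check could look like the sketch below. mac_status is a made-up helper name, and the optional argument (a root prefix) exists only so the function can be tried against a fake directory tree; on a real system call it without arguments.

```shell
#!/bin/sh
# Hypothetical helper: report whether AppArmor and/or SELinux look active.
# Prints "apparmor", "selinux-enforcing", "selinux-permissive",
# a space-separated combination, or "none".
mac_status() {
    root="${1:-}"
    status=""
    aa="$root/sys/module/apparmor/parameters/enabled"
    se="$root/sys/fs/selinux/enforce"
    # AppArmor: the module parameter reads "Y" when it is enabled.
    if [ -r "$aa" ] && [ "$(cat "$aa")" = "Y" ]; then
        status="apparmor"
    fi
    # SELinux: the enforce file exists when selinuxfs is mounted (1 = enforcing).
    if [ -r "$se" ]; then
        if [ "$(cat "$se")" = "1" ]; then mode="enforcing"; else mode="permissive"; fi
        status="${status:+$status }selinux-$mode"
    fi
    echo "${status:-none}"
}

# Example call on the real system:
# mac_status
```

If AppArmor turns out to be active, aa-status lists the loaded profiles; on SELinux systems, getenforce prints the current mode.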