I have an Nginx cache server built on Ubuntu 18 with the Docker image nginx:1.19.10-alpine.
Ubuntu 18 disk usage details are given below for reference:
ubuntu@host_name:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 126G 0 126G 0% /dev
tmpfs 26G 1.4M 26G 1% /run
/dev/mapper/vg_system-root 193G 8.9G 176G 5% /
tmpfs 126G 0 126G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/mapper/vg_data-srv 24T 369G 24T 2% /srv
/dev/sda1 453M 364M 62M 86% /boot
overlay 193G 8.9G 176G 5% /var/lib/docker/overlay2/64_characters_random/merged
tmpfs 26G 0 26G 0% /run/user/1646269961
Docker container details
ubuntu@host_name:~$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
contnr_idxyz nginx:1.19.10-alpine "/docker-entrypoint.…" 5 days ago Up 9 hours contnr_name
Nginx Configuration for reference
user@host-name:/srv/mytool/nginx/config$ cat proxy.conf
access_log off;
root /var/log/nginx;
open_log_file_cache max=100;
log_format mytoollogformat
'$time_iso8601 $remote_addr $status $request_time $body_bytes_sent '
'$upstream_cache_status "$request" "$http_user_agent"';
proxy_http_version 1.1;
client_max_body_size 10g;
# R/W timeout for origin server
proxy_read_timeout 15m;
proxy_send_timeout 15m;
# R/W timeout for clients
client_body_timeout 15m;
client_header_timeout 15m;
send_timeout 15m;
# TODO: ssl_stapling and ssl_ciphers
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:10m;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-mytool-Cache-Host $scheme://$host;
proxy_redirect off;
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mytool:10m max_size=22000g inactive=180d;
proxy_cache_key $host$uri$is_args$args$slice_range;
proxy_set_header Range $slice_range;
proxy_cache_valid 200 206 2y;
proxy_cache_revalidate on;
add_header X-Cache-Status $upstream_cache_status;
proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504 http_429;
The server block has been removed as it is out of scope for the current issue and also for security reasons.
Let me explain my issue in detail. We have a main (proxy_pass upstream) server that hosts hundreds of TBs of static files. When we set up the cache server, it filled the cache and served files from it well. But over time we noticed that the cache size does not grow beyond 344 GB:
user@host_name:/srv/mytool/nginx$ sudo du -sh ./*
344G ./cache
52K ./config
1004K ./log
user@host_name:/srv/mytool/nginx$
I wrote a script to download about 500 GB of files, but it never increased the cache size above 344 GB.
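The script itself is not shown here; below is only a minimal sketch of the idea, assuming a hypothetical urls.txt with one file path per line, fetched through the cache server:
#!/bin/sh
# Sketch only: warm the cache by requesting each file through the cache server.
# urls.txt and the host name below are placeholders, not the real script.
while read -r path; do
    curl -s -o /dev/null "https://dev.mytool-region.mycompany.com${path}"
done < urls.txt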
Experiments done so far (the resulting proxy_cache_path variants are sketched after this list):
Added max_size=100000g (along with the old value min_free=1000g).
Modified max_size=22000g (a value smaller than the 24 TB size of /srv).
Removed min_free=1000g (assuming min_free was somehow clearing the cache).
Modified proxy_cache_valid 200 206 1h; to proxy_cache_valid 200 206 2y;
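For clarity, the proxy_cache_path variants tried above would have looked roughly like this (a reconstruction of the experiments, not the exact lines from the config):
# variant 1: larger max_size together with the old min_free
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mytool:10m max_size=100000g min_free=1000g inactive=180d;
# variant 2: max_size below the 24 TB size of /srv, min_free removed
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mytool:10m max_size=22000g inactive=180d;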
For all of the above experiments, after the configuration change I restarted the Docker container and re-ran the script that downloads 500 GB of files through the cache server. Even though the cache size reached 380 to 400 GB, within an hour it suddenly dropped back to 344 GB.
I am left with no clue why the cache is not filling up completely, even though I have allocated 24 TB for /srv.
Is it an issue with Nginx? I mean, the free version of Nginx might have some limitation; should I go with Nginx Plus? Or is there a configuration mistake?
Any guess would help. Thanks in advance.
Update
What are the soft and hard limits for max open files on the cache server?
$ cat /proc/sys/fs/file-max
26375980
$ ulimit -Hn
1048576
$ ulimit -Sn
1024
Have you set limits in your nginx conf using worker_rlimit_nofile?
Currently there is no worker_rlimit_nofile setting (a sketch of how it could be set follows the config below):
/ # cat /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
    worker_connections 1024;
}
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    sendfile on;
    #tcp_nopush on;
    keepalive_timeout 65;
    #gzip on;
    include /etc/nginx/conf.d/*.conf;
}
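For reference, if worker_rlimit_nofile were set, it would go in the main (top-level) context of nginx.conf. A minimal sketch, simply mirroring the hard limit shown above (illustrative only, not part of the current config):
# illustrative only: let each worker open up to the hard limit of file descriptors
worker_rlimit_nofile 1048576;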
Do you have anything in the error logs?
Filtered/distinct log entries are given below:
$ cat /srv/mytool/nginx/log/error.log
2022/01/09 05:56:35 [warn] 22#22: *10 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/1/00/0000000001 while reading upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /dists/58.2.A.0.409/semc/binary-arm/Packages HTTP/1.1", upstream: "https://[IPv6_address]:443/masked_path/Packages", host: "dev.mytool-region.mycompany.com"
2022/01/09 06:09:21 [warn] 22#22: *35 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000006, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "POST /masked_path/all HTTP/1.1", host: "dev.mytool-region.mycompany.com"
2022/01/09 08:19:01 [error] 22#22: *120 etag mismatch in slice response while reading response header from upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", subrequest: "/masked_path/xyz.zip", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/09 18:19:12 [warn] 22#22: *1566 upstream server temporarily disabled while reading response header from upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", upstream: "https://[masked_IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/10 01:23:20 [error] 22#22: *2920 etag mismatch in slice response while reading response header from upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", subrequest: "/masked_path/xyz.zip", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/21 07:43:47 [error] 36#36: *441913 upstream timed out (110: Operation timed out) while SSL handshaking to upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/21 07:46:17 [warn] 37#37: *442070 upstream server temporarily disabled while SSL handshaking to upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
There are about 25k rows like the following within 10 days of logs:
2022/01/11 05:36:58 [alert] 70#70: ignore long locked inactive cache entry 55a25a5037f198bbec6cd49100bb1b76, count:1
2022/01/11 05:36:58 [alert] 70#70: ignore long locked inactive cache entry e996d5e104f405444a579cd491faf3a8, count:1
2022/01/11 05:36:58 [alert] 70#70: ignore long locked inactive cache entry 394517a8ed8e43949003b3f7538dc471, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry 4f92d3a72f64b7bafdbb3f0b66d8e638, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry be41b259a3e8f9698e0976639883a423, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry 1da19b571ea4bce1428251689f0a7c69, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry 2a4cac0c28ea430e7eef3f808cf1e06f, count:1
2022/01/11 05:37:18 [alert] 70#70: ignore long locked inactive cache entry 53a826f6931cf0f16020bcae100af347, count:1
Update 2:
Tried the same with the nginx:perl Docker container. It also did not work: the cache grew beyond 392 GB, but within a couple of hours it suddenly dropped back to 344 GB. The command used to start the container is given below:
sudo docker run \
  --detach \
  --restart unless-stopped \
  --volume /srv/mytool/nginx/config:/etc/nginx/conf.d:ro \
  --volume /srv/mytool/nginx/cache:/var/cache/nginx \
  --volume /srv/mytool/nginx/log:/var/log/nginx \
  nginx:perl
Update Again
Dropped the Docker container nginx:1.19.10-alpine and set up plain Nginx with a simple configuration, as given below:
sudo apt install nginx
systemctl status nginx
$ sudo ufw app list
Available applications:
Nginx Full
Nginx HTTP
Nginx HTTPS
OpenSSH
$ sudo ufw allow 'Nginx Full'
Rules updated
Rules updated (v6)
Modified default.conf in /etc/nginx/sites-available:
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=custom_cache:10m inactive=180d;
upstream origin_server {
    server dev.mytool-region.mycompany.com;
}
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name _;
    location / {
        include proxy_params;
        proxy_pass http://origin_server;
    }
    location ~ ^/(path1|path2|path3|path4)/ {
        slice 5m;
        proxy_cache custom_cache;
        proxy_pass http://origin_server;
        proxy_cache_valid 200 206 2y;
        add_header X-Proxy-Cache $upstream_cache_status;
    }
}
Downloaded about 500 GB. It worked and the cache filled as expected:
ubuntu@host_name:/var/cache$ sudo du -sh ./*
128K ./apparmor
82M ./apt
4.8M ./debconf
20K ./ldconfig
1.2M ./man
0 ./motd-news
518G ./nginx
4.0K ./pollinate
20K ./snapd
ubuntu@host_name:/var/cache$
But I still don't know the exact reason or what is wrong with my configuration. Trials are still ongoing.
One more trial
Used the old configuration (Docker image nginx:1.19.10-alpine) and moved
proxy_cache_valid 200 206 2y;
inside the
location ~ ^/(path1|path2|path3|path4)/
block. But this also did not work; a sketch of the resulting block is given below.
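For illustration, the relevant location block in the container configuration would then have looked roughly like this (a hypothetical reconstruction; the actual server block was removed above, and origin_server stands in for the real upstream):
location ~ ^/(path1|path2|path3|path4)/ {
    slice 5m;
    proxy_cache mytool;
    proxy_pass https://origin_server;
    proxy_cache_valid 200 206 2y;
}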
4 Answers
The problem with Nginx cache slicing is that when you configure a slice size of, say, 5 MB, each file ends up as multiple sliced cache files in the cache directory, and the number of entries that can be cached is directly proportional to the keys_zone size:
keys_zone=mytool:10m
Since I had 10m (10 megabytes) for the cache keys zone, it allowed a maximum of 71203 files. The documentation says a one-megabyte zone can store about 8 thousand keys.
So modifying keys_zone to a larger value,
keys_zone=mytool:1000m
fixed the issue. You can observe that the cached file count stopped growing after 71203 with
keys_zone=mytool:10m
but it started growing in size, with the cached file count increasing seamlessly, after switching to
keys_zone=mytool:1000m
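As a rough consistency check on those numbers: 71203 entries x 5 MB per slice is about 356000 MB, i.e. roughly 348 GB, which lines up with the observed 344 GB plateau (the last slice of each file is smaller than 5 MB). With the enlarged keys zone, the directive from the original configuration becomes:
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mytool:1000m max_size=22000g inactive=180d;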
You can try to configure the temporary cache directory
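Assuming this refers to proxy_temp_path (the /var/cache/nginx/proxy_temp directory visible in the warnings above), a sketch of the two usual options:
# keep buffered upstream responses on the same filesystem as the cache
proxy_temp_path /var/cache/nginx/proxy_temp 1 2;
# or skip the separate temp directory entirely via proxy_cache_path:
# proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mytool:1000m max_size=22000g inactive=180d use_temp_path=off;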
I strongly suspect that your backend services answer "yes, modified" when nginx asks them via If-Modified-Since, and that your configuration then removes the outdated cache items, as it is supposed to do. That explains why your cache can grow up to 500 GB but a bit later drops back to 344 GB.
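A quick way to test that theory, assuming the origin is reachable directly, is to send a conditional request for a file that is already cached and check whether it returns 304 Not Modified or a full 200 (the URL below is a masked placeholder):
# sketch: ask the origin whether the file changed since a given date
curl -sI "https://dev.mytool-region.mycompany.com/masked_path/xyz.zip" \
     -H "If-Modified-Since: Sun, 09 Jan 2022 00:00:00 GMT" | head -n 1
# "HTTP/1.1 304 Not Modified" -> the cached copy would be kept/refreshed
# "HTTP/1.1 200 OK"           -> the origin considers it modified; the cache entry gets replaced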
According to the documentation, the max_size parameter is optional. Removing max_size may let the cache use all available space. Try editing the proxy_cache_path directive and removing it, so that your current configuration becomes:
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mytool:10m inactive=180d;