
I have an Nginx cache server built on Ubuntu 18 using the Docker image nginx:1.19.10-alpine.

Ubuntu 18 disk usage details are given below for reference:

ubuntu@host_name:~$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
udev                        126G     0  126G   0% /dev
tmpfs                        26G  1.4M   26G   1% /run
/dev/mapper/vg_system-root  193G  8.9G  176G   5% /
tmpfs                       126G     0  126G   0% /dev/shm
tmpfs                       5.0M     0  5.0M   0% /run/lock
tmpfs                       126G     0  126G   0% /sys/fs/cgroup
/dev/mapper/vg_data-srv      24T  369G   24T   2% /srv
/dev/sda1                   453M  364M   62M  86% /boot
overlay                     193G  8.9G  176G   5% /var/lib/docker/overlay2/64_characters_random/merged
tmpfs                        26G     0   26G   0% /run/user/1646269961

Docker container details:

ubuntu@host_name:~$ sudo docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED      STATUS       PORTS     NAMES
contnr_idxyz   nginx:1.19.10-alpine   "/docker-entrypoint.…"   5 days ago   Up 9 hours             contnr_name

Nginx configuration for reference:

user@host-name:/srv/mytool/nginx/config$ cat proxy.conf
access_log          off;
root                /var/log/nginx;
open_log_file_cache max=100;

log_format mytoollogformat
    '$time_iso8601 $remote_addr $status $request_time $body_bytes_sent '
    '$upstream_cache_status "$request" "$http_user_agent"';

proxy_http_version 1.1;
client_max_body_size 10g;

# R/W timeout for origin server
proxy_read_timeout 15m;
proxy_send_timeout 15m;

# R/W timeout for clients
client_body_timeout   15m;
client_header_timeout 15m;
send_timeout          15m;

# TODO: ssl_stapling and ssl_ciphers
ssl_prefer_server_ciphers on;
ssl_session_cache         shared:SSL:10m;

proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-mytool-Cache-Host $scheme://$host;

proxy_redirect         off;

proxy_cache_path       /var/cache/nginx levels=1:2 keys_zone=mytool:10m max_size=22000g inactive=180d;
proxy_cache_key        $host$uri$is_args$args$slice_range;
proxy_set_header       Range $slice_range;
proxy_cache_valid      200 206 2y;
proxy_cache_revalidate on;

add_header X-Cache-Status $upstream_cache_status;

proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504 http_429;

The server block is omitted, as it is out of scope for the current issue and also for security reasons.

Let me explain my issue in detail. We have a main (proxy_passed) origin server holding hundreds of TBs of static files. When the cache server was first set up, it was filling the cache and serving files from it well. But over time we noticed that the cache size does not increase above 344 GB:

user@host_name:/srv/mytool/nginx$ sudo du -sh ./*
344G    ./cache
52K     ./config
1004K   ./log
user@host_name:/srv/mytool/nginx$

I wrote a script to download about 500 GB of files, but it never increased the cache size above 344 GB.

Experiments done so far:

- Added max_size=100000g (along with the old value min_free=1000g)
- Changed max_size=22000g (a value smaller than the 24 TB size of /srv)
- Removed min_free=1000g (assuming min_free was somehow clearing the cache)
- Changed proxy_cache_valid 200 206 1h; to proxy_cache_valid 200 206 2y;

For all of the above experiments, after each configuration change I restarted the Docker container and ran the script that downloads 500 GB of files through the cache server. Even though the cache size reached 380 to 400 GB, within an hour it suddenly dropped back to 344 GB.
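
For reference, this is roughly how I check whether a response is served from the cache, using the X-Cache-Status header added in proxy.conf (the path here is just a placeholder):

$ curl -sI https://dev.mytool-region.mycompany.com/masked_path/example.file | grep -i x-cache-status

It should report MISS on the first request and HIT on a repeated request for the same slice.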

I am left with no clue why the cache is not filling up completely, even though I have allocated 24 TB for /srv.

Is this an issue with Nginx? Maybe the free version of Nginx has some limitation and I should go with Nginx Plus, or maybe there is a configuration mistake.

Any guess would help. Thanks in advance.

Update

What are the soft and hard limits for max open files on the cache server?

$ cat /proc/sys/fs/file-max
26375980
$ ulimit -Hn
1048576
$ ulimit -Sn
1024
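
Note that nginx runs inside the container, so the limits that actually apply are the container's; a quick check looks something like this (container name as shown above):

$ sudo docker exec contnr_name sh -c 'ulimit -Sn; ulimit -Hn'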

Have you set limits in your nginx conf using worker_rlimit_nofile?

Currently there is no worker_rlimit_nofile setting:

/ # cat /etc/nginx/nginx.conf

user  nginx;
worker_processes  auto;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}
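
If I do need to raise the limit, my understanding is that worker_rlimit_nofile goes in the main context of nginx.conf, roughly like this (values are only illustrative, not something I have applied yet):

worker_rlimit_nofile  65535;

events {
    worker_connections  16384;    # keep this below the file descriptor limit
}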

Do you have anything in the error logs?

Filtered/distinct log entries are given below:

$ cat /srv/mytool/nginx/log/error.log
2022/01/09 05:56:35 [warn] 22#22: *10 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/1/00/0000000001 while reading upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /dists/58.2.A.0.409/semc/binary-arm/Packages HTTP/1.1", upstream: "https://[IPv6_address]:443/masked_path/Packages", host: "dev.mytool-region.mycompany.com"
2022/01/09 06:09:21 [warn] 22#22: *35 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000006, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "POST /masked_path/all HTTP/1.1", host: "dev.mytool-region.mycompany.com"
2022/01/09 08:19:01 [error] 22#22: *120 etag mismatch in slice response while reading response header from upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", subrequest: "/masked_path/xyz.zip", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/09 18:19:12 [warn] 22#22: *1566 upstream server temporarily disabled while reading response header from upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", upstream: "https://[masked_IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/10 01:23:20 [error] 22#22: *2920 etag mismatch in slice response while reading response header from upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", subrequest: "/masked_path/xyz.zip", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/21 07:43:47 [error] 36#36: *441913 upstream timed out (110: Operation timed out) while SSL handshaking to upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/21 07:46:17 [warn] 37#37: *442070 upstream server temporarily disabled while SSL handshaking to upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"

There are about 25k rows like the ones below within the 10 days of logs:
2022/01/11 05:36:58 [alert] 70#70: ignore long locked inactive cache entry 55a25a5037f198bbec6cd49100bb1b76, count:1
2022/01/11 05:36:58 [alert] 70#70: ignore long locked inactive cache entry e996d5e104f405444a579cd491faf3a8, count:1
2022/01/11 05:36:58 [alert] 70#70: ignore long locked inactive cache entry 394517a8ed8e43949003b3f7538dc471, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry 4f92d3a72f64b7bafdbb3f0b66d8e638, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry be41b259a3e8f9698e0976639883a423, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry 1da19b571ea4bce1428251689f0a7c69, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry 2a4cac0c28ea430e7eef3f808cf1e06f, count:1
2022/01/11 05:37:18 [alert] 70#70: ignore long locked inactive cache entry 53a826f6931cf0f16020bcae100af347, count:1

Update 2:
Tried the same with the nginx:perl Docker container. It also did not work: even though the cache grew beyond 392 GB, within a couple of hours it suddenly dropped back to 344 GB. The command used to start the container is given below.

sudo docker run \
    --detach \
    --restart unless-stopped \
    --volume /srv/mytool/nginx/config:/etc/nginx/conf.d:ro \
    --volume /srv/mytool/nginx/cache:/var/cache/nginx \
    --volume /srv/mytool/nginx/log:/var/log/nginx \
    nginx:perl
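
To track how the cache behaves over time, I run something like this on the host (the interval is arbitrary):

$ watch -n 60 'sudo du -sh /srv/mytool/nginx/cache; sudo find /srv/mytool/nginx/cache -type f | wc -l'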

Update Again

Avoided the nginx:1.19.10-alpine Docker container and used a simple Nginx configuration as given below.

sudo apt install nginx
systemctl status nginx

$ sudo ufw app list
Available applications:
  Nginx Full
  Nginx HTTP
  Nginx HTTPS
  OpenSSH
$ sudo ufw allow 'Nginx Full'
Rules updated
Rules updated (v6)

Modified default.conf in /etc/nginx/sites-available:

proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=custom_cache:10m inactive=180d;

upstream origin_server {
    server dev.mytool-region.mycompany.com;
}
server {
        listen 80 default_server;
        listen [::]:80 default_server;

        server_name _;

        location / {
            include proxy_params;
            proxy_pass http://origin_server;
        }

        location ~ ^/(path1|path2|path3|path4)/ {
            slice       5m;
            proxy_cache custom_cache;

            proxy_pass http://origin_server;
            proxy_cache_valid 200 206 2y;
            add_header X-Proxy-Cache $upstream_cache_status;
        }
}

Downloaded about 500 GB. It worked and the cache filled as expected:

ubuntu@host_name:/var/cache$ sudo du -sh ./*
128K    ./apparmor
82M     ./apt
4.8M    ./debconf
20K     ./ldconfig
1.2M    ./man
0       ./motd-news
518G    ./nginx
4.0K    ./pollinate
20K     ./snapd
ubuntu@host_name:/var/cache$

But I still don't know the exact reason or what is wrong with my configuration. Trials are ongoing.
One more trial
Used the old configuration (Docker nginx:1.19.10-alpine) and moved proxy_cache_valid 200 206 2y; inside

location ~ ^/(path1|path2|path3|path4)/ {

But this did not work either.


Answers


  1. Chosen as BEST ANSWER

    The problem is with nginx cache slicing: when a slice size of, say, 5 MB is configured, every large object ends up as many sliced cache files in the cache directory, so the number of cache entries grows very quickly. The number of files that can be cached is limited by the keys_zone size (keys_zone=mytool:10m). Since I had 10m (10 megabytes) for cache keys, it allowed a maximum of 71,203 files; with 5 MB slices, ~350 GB of cached content already needs on the order of 70,000 keys, which is roughly what a 10 MB zone can hold. The documentation says:

    In addition, all active keys and information about data are stored in a shared memory zone, whose name and size are configured by the keys_zone parameter. One megabyte zone can store about 8 thousand keys.

    As part of commercial subscription, the shared memory zone also stores extended cache information, thus, it is required to specify a larger zone size for the same number of keys. For example, one megabyte zone can store about 4 thousand keys.

    So changing keys_zone to a larger value, keys_zone=mytool:1000m, fixed the issue.
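
    The revised directive, with the other parameters kept as before, looks like this:

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mytool:1000m max_size=22000g inactive=180d;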

    You can observe that the cache file count stopped growing at 71,203 with keys_zone=mytool:10m:

    user@host_name:/srv/mytool/nginx$ sudo du -sh ./*
    326G    ./cache
    52K     ./config
    406M    ./log
    user@host_name:/srv/mytool/nginx$ sudo find cache/ -type f | wc -l
    71203
    

    But with keys_zone=mytool:1000m the cache size kept growing and the cached file count increased without hitting a ceiling:

    user@host_name:/srv/mytool/nginx$ sudo du -sh ./*
    518G    ./cache
    52K     ./config
    4.6M    ./log
    user@host_name:/srv/mytool/nginx$ sudo find cache/ -type f | wc -l
    107243
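
    A rough back-of-the-envelope for sizing keys_zone, assuming about 8,000 keys per megabyte as per the documentation quoted above (numbers are only illustrative):

    $ echo $(( 500 * 1024 / 5 ))      # ~500 GB of content in 5 MB slices => keys needed
    102400
    $ echo $(( 102400 / 8000 + 1 ))   # minimum keys_zone size in megabytes; 10m is clearly too small
    13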
    

  2. You can try adjusting the limit on temporary files used for buffered upstream responses:

    proxy_max_temp_file_size
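
    For example, something along these lines (the values are only illustrative):

    proxy_temp_path          /var/cache/nginx/proxy_temp;
    proxy_max_temp_file_size 2048m;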
    
  3. I strongly suspect that your backend services report changed content when nginx asks them 'If-Modified-Since',

    and your setting

    proxy_cache_revalidate on;
    

    removes outdated cache items, as it is supposed to do. That would explain why your cache can grow up to 500 GB but a bit later shrinks back to 344 GB.
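
    A quick way to rule this out (as a test only, not a recommendation for production) is to disable revalidation temporarily and re-run the download script:

    proxy_cache_revalidate off;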

  4. According to the documentation, the max_size parameter is optional:

    not specifying a value allows the cache to grow to use all available
    disk space.

    Removing max_size may let the cache use all available space. Try editing the proxy_cache_path directive and removing it; your current configuration would then become:

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mytool:10m inactive=180d;
