I have a simple pod with an nginx container that returns the text healthy on path /. Prometheus is configured to scrape port 80 on path /. When I run up == 0 in the Prometheus dashboard, this pod shows up, which means Prometheus considers it unhealthy. But when I logged into the container, nginx was running fine, and the nginx access log shows Prometheus requesting / and getting a 200 response. Any idea why?

deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  ...
spec:
  ...
  template:
    metadata:
      labels:
        ...
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/"
        prometheus.io/port: "80"
    spec:
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - name: nginx-conf
              mountPath: /etc/nginx
              readOnly: true
          ports:
            - containerPort: 80
      volumes:
        - name: nginx-conf
          configMap:
            name: nginx-conf
            items:
              - key: nginx.conf
                path: nginx.conf


nginx.conf

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-conf
data:
  nginx.conf: |
    http {
      server {
        listen 80;

        location / {
          return 200 'healthy\n';
        }
      }
    }

nginx access log

192.168.88.81 - - [xxx +0000] "GET / HTTP/1.1" 200 8 "-" "Prometheus/2.26.0"
192.168.88.81 - - [xxx +0000] "GET / HTTP/1.1" 200 8 "-" "Prometheus/2.26.0"
192.168.88.81 - - [xxx +0000] "GET / HTTP/1.1" 200 8 "-" "Prometheus/2.26.0"

2 Answers


  1. When you add these annotations to a pod, Prometheus expects the given path to return metrics in the Prometheus text exposition format. The plain text healthy is not valid in that format, so the scrape fails to parse and up is 0, even though nginx answers with a 200.

          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/path: "/"
            prometheus.io/port: "80"
    

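    Recommended fix: run nginx-prometheus-exporter as a sidecar container and point the annotations at its metrics endpoint (port 9113, path /metrics):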

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      ...
    spec:
      ...
      template:
        metadata:
          labels:
            ...
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/path: "/metrics"
            prometheus.io/port: "9113"
        spec:
          containers:
            - name: nginx
              image: nginx
              volumeMounts:
                - name: nginx-conf
                  mountPath: /etc/nginx
                  readOnly: true
              ports:
                - containerPort: 80
            - name: nginx-exporter
              args:
              - "-nginx.scrape-uri=http://localhost:80/stub_status" # nginx address
              image: nginx/nginx-prometheus-exporter:0.9.0
              ports:
                - containerPort: 9113
          volumes:
            - name: nginx-conf
              configMap:
                name: nginx-conf
                items:
                  - key: nginx.conf
                    path: nginx.conf
    

    Now try querying nginx_up in Prometheus. The nginx-prometheus-exporter project also comes with a Grafana dashboard that you can try.
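
    One thing the deployment above relies on: the exporter reads http://localhost:80/stub_status, but the nginx.conf from the question has no such location. A minimal sketch of a ConfigMap that adds it is shown below. It assumes the nginx image is built with the stub_status module; the events block and the allow/deny rules are additions that are not in the original config.

    ```yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: nginx-conf
    data:
      nginx.conf: |
        # Assumption: nginx expects an events section; the question's config omits it.
        events {}
        http {
          server {
            listen 80;

            location / {
              return 200 'healthy\n';
            }

            # Assumed addition: status page read by the exporter sidecar.
            location /stub_status {
              stub_status;
              # Only the sidecar in the same pod (localhost) may read it.
              allow 127.0.0.1;
              deny all;
            }
          }
        }
    ```

    With this in place, the exporter on port 9113 translates the stub_status counters into Prometheus metrics, and Prometheus no longer scrapes the plain-text / endpoint.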

  2. When Prometheus scrapes an endpoint, it expects metrics in its text exposition format. Typical metrics look like this:

    # HELP go_gc_duration_seconds A summary of the GC invocation durations.
    # TYPE go_gc_duration_seconds summary
    go_gc_duration_seconds{quantile="0"} 1.3234e-05
    go_gc_duration_seconds{quantile="0.25"} 1.7335e-05
    

    "healthy" doesn't meet this format, so Prometheus fails to scrape the target. For this kind of check there is the blackbox exporter, which is designed to check endpoints from a user's perspective (this is what black-box monitoring means). The exporter performs HTTP requests and turns the results into metrics. For example, it can check whether the response code was 200 or whether the response body contains certain text. Here are sample metrics returned by this exporter (note probe_success, which plays the same role as up):

    # HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
    # TYPE probe_dns_lookup_time_seconds gauge
    probe_dns_lookup_time_seconds 0.026007318
    # HELP probe_duration_seconds Returns how long the probe took to complete in seconds
    # TYPE probe_duration_seconds gauge
    probe_duration_seconds 0.550007522
    # HELP probe_failed_due_to_regex Indicates if probe failed due to regex
    # TYPE probe_failed_due_to_regex gauge
    probe_failed_due_to_regex 0
    # HELP probe_http_content_length Length of http content response
    # TYPE probe_http_content_length gauge
    probe_http_content_length -1
    # HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
    # TYPE probe_http_duration_seconds gauge
    probe_http_duration_seconds{phase="connect"} 0.098082009
    probe_http_duration_seconds{phase="processing"} 0.154402544
    probe_http_duration_seconds{phase="resolve"} 0.038066771
    probe_http_duration_seconds{phase="tls"} 0.209702302
    probe_http_duration_seconds{phase="transfer"} 0.047839785
    # HELP probe_http_redirects The number of redirects
    # TYPE probe_http_redirects gauge
    probe_http_redirects 1
    # HELP probe_http_ssl Indicates if SSL was used for the final redirect
    # TYPE probe_http_ssl gauge
    probe_http_ssl 1
    # HELP probe_http_status_code Response HTTP status code
    # TYPE probe_http_status_code gauge
    probe_http_status_code 200
    # HELP probe_http_uncompressed_body_length Length of uncompressed response body
    # TYPE probe_http_uncompressed_body_length gauge
    probe_http_uncompressed_body_length 87617
    # HELP probe_http_version Returns the version of HTTP of the probe response
    # TYPE probe_http_version gauge
    probe_http_version 2
    # HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
    # TYPE probe_ip_addr_hash gauge
    probe_ip_addr_hash 8.57979034e+08
    # HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
    # TYPE probe_ip_protocol gauge
    probe_ip_protocol 4
    # HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry in unixtime
    # TYPE probe_ssl_earliest_cert_expiry gauge
    probe_ssl_earliest_cert_expiry 1.639030838e+09
    # HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp seconds
    # TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
    probe_ssl_last_chain_expiry_timestamp_seconds 1.639030838e+09
    # HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
    # TYPE probe_ssl_last_chain_info gauge
    probe_ssl_last_chain_info{fingerprint_sha256="ef4eaeb464efb33f5332b365a350b2b06588ea71837af27f83d45b726d19af2a"} 1
    # HELP probe_success Displays whether or not the probe was a success
    # TYPE probe_success gauge
    probe_success 1
    # HELP probe_tls_version_info Contains the TLS version used
    # TYPE probe_tls_version_info gauge
    probe_tls_version_info{version="TLS 1.2"} 1
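
    To get metrics like these for the nginx pod from the question, the setup could look roughly like the sketch below. It is an illustration, not a drop-in config: the module name http_healthy, the job name, the nginx Service URL, and the exporter address blackbox-exporter:9115 are all assumptions that need to be adapted to the cluster.

    ```yaml
    # blackbox.yml (blackbox exporter configuration)
    modules:
      http_healthy:                 # hypothetical module name
        prober: http
        timeout: 5s
        http:
          valid_status_codes: [200]
          # Fail the probe unless the body contains the text served by nginx.
          fail_if_body_not_matches_regexp:
            - "healthy"
    ---
    # prometheus.yml (scrape job that asks the exporter to probe nginx)
    scrape_configs:
      - job_name: blackbox-nginx    # hypothetical job name
        metrics_path: /probe
        params:
          module: [http_healthy]
        static_configs:
          - targets:
              - http://nginx.default.svc.cluster.local:80/   # assumed Service URL
        relabel_configs:
          # Pass the target URL to the exporter as the ?target= parameter.
          - source_labels: [__address__]
            target_label: __param_target
          # Keep the probed URL as the instance label.
          - source_labels: [__param_target]
            target_label: instance
          # Send the actual scrape to the exporter itself.
          - target_label: __address__
            replacement: blackbox-exporter:9115   # assumed exporter address
    ```

    With a job like this, probe_success == 0 is the query to alert on. The prometheus.io/scrape annotations on the nginx pod would likely need to be removed as well, otherwise the direct scrape of / keeps failing and the pod keeps showing up under up == 0.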
    