I’m trying to configure the official Fluent Bit Helm chart on my EKS cluster to send application logs to AWS CloudWatch.

I configure everything with Terraform. Here’s my file:

resource "kubernetes_namespace" "fluentbit" {
  metadata {
    name = var.fluentbit_namespace
  }
}

resource "kubernetes_service_account" "fluentbit" {
  metadata {
    namespace = var.fluentbit_namespace
    name      = var.fluentbit_service_account_name
    annotations = {
      "eks.amazonaws.com/role-arn" = aws_iam_role.cloudwatch_logs_role.arn
    }
  }
}

resource "helm_release" "fluent_bit_daemonset" {
  repository = "https://fluent.github.io/helm-charts"
  chart      = "fluent-bit"
  version    = "0.15.15"

  name             = "fluent-bit"
  namespace        = var.fluentbit_namespace
  create_namespace = false
  cleanup_on_fail  = true
  values = [
    templatefile("${path.cwd}/templates/fluentbit_config_template.yaml", {
      service_account_name   = var.fluentbit_service_account_name,
      create_service_account = false
      image_version          = var.fluentbit_image_version,
      region                 = var.cluster_region
      log_group_name         = var.fluentbit_log_group_name
      log_stream_name        = var.fluentbit_log_stream_name
      role_arn               = aws_iam_role.cloudwatch_logs_role.arn
      cluster_name           = "xxxx-kubernetes-sandbox"
    }),
  ]
}

resource "aws_cloudwatch_log_group" "cloudwatch_log_group" {
  name = var.fluentbit_log_group_name
}


resource "aws_iam_role" "cloudwatch_logs_role" {
  name = "cloudwatch-logs-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = "sts:AssumeRole",
        Effect = "Allow",
        Principal = {
          Service = "logs.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_policy" "cloudwatch_logs_policy" {
  name = "fluentbit-role"

  description = "Role use to create logs from K8S to cloudwatch"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents",
          "logs:DescribeLogGroups",
          "logs:DescribeLogStreams"
        ],
        Effect   = "Allow",
        Resource = "*"
      }
    ]
  })
}

resource "aws_iam_policy_attachment" "attach_cloudwatch_logs_policy" {
  name       = "attach-cloudwatch-logs-policy"
  policy_arn = aws_iam_policy.cloudwatch_logs_policy.arn
  roles      = [aws_iam_role.cloudwatch_logs_role.name]
}

The Fluent Bit configuration, which is fairly simple for now:

image:
   tag: ${image_version}
 
serviceAccount:
   name: ${service_account_name}
   create: ${create_service_account}

config:
  service: |
    [SERVICE]
        Flush        2
        Daemon       Off
        Log_Level    info
        Parsers_File parsers.conf
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_Port    {{ .Values.service.port }}

  inputs: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10    

  filters: |
    [FILTER]
        Name                kubernetes
        Match               application.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     application.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              Off
        Annotations         Off
        Use_Kubelet         On
        Kubelet_Port        10250
        Buffer_Size         0

  outputs: |
    [OUTPUT]
        Name                cloudwatch_logs
        Match               *
        region              ${region}
        log_group_name      ${log_group_name}
        log_stream_name     ${log_stream_name}
        auto_create_group   false
        sts_endpoint        https://sts.${region}.amazonaws.com

I end up with this error:

CreateLogStream API responded with error='AccessDeniedException'  
Failed to create log stream
Failed to send events

If I add the role_arn option to the Fluent Bit configuration file, the error is different:

[2023/11/07 15:16:32] [error] [signv4] Provider returned no credentials, service=logs
[2023/11/07 15:16:32] [error] [aws_client] could not sign request
[2023/11/07 15:16:32] [error] [aws_credentials] STS assume role request failed 
[2023/11/07 15:16:32] [ warn] [aws_credentials] No cached credentials are available and a credential refresh is already in progress. The current co-routine will retry.

I tried using the module
https://registry.terraform.io/modules/terraform-aws-modules/iam/aws/latest/submodules/iam-role-for-service-accounts-eks for the service account, but it didn’t change anything.

I don’t know whether there is another problem in my cluster configuration.
Do you have any hints? Thanks.

2 Answers


  1. Chosen as BEST ANSWER

    Resolved by:

    1. Use the IRSA (IAM Roles for Service Accounts) module for Kubernetes: https://registry.terraform.io/modules/terraform-aws-modules/iam/aws/latest/submodules/iam-role-for-service-accounts-eks
    module "fluentbit_cloudwatch_irsa_role" {
      source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
      version = "5.11.1"
    
      role_name = "balyo-k8s-fluentbit-cloudwatch-irsa-role-${terraform.workspace}"
    
      oidc_providers = {
        main = {
          provider_arn               = data.terraform_remote_state.k8s-cluster.outputs.cluster_oidc_provider_arn
          namespace_service_accounts = ["${var.fluentbit_namespace}:${var.fluentbit_service_account_name}"]
        }
      }
    }
    
    2. Fix the permissions (for better security, these still need to be restricted, ideally to logs:* actions only):
    resource "aws_iam_policy" "cloudwatch_logs_policy" {
      name = "fluentbit-role"
    
      description = "Role use to create logs from K8S to cloudwatch"
    
      policy = jsonencode({
        Version = "2012-10-17",
        Statement = [
          {
            Action = [
              "ec2:DescribeTags",
              "logs:PutLogEvents",
              "cloudwatch:PutMetricData",
              "logs:DescribeLogStreams",
              "logs:DescribeLogGroups",
              "logs:CreateLogStream",
              "logs:CreateLogGroup"
            ],
            Effect   = "Allow",
            Resource = "*"
          }
        ]
      })
    }
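
    The trust policy in the question names the `logs.amazonaws.com` service with `sts:AssumeRole`, whereas a pod using IRSA assumes its role through the cluster's OIDC provider with `sts:AssumeRoleWithWebIdentity`; the module generates that trust policy. What follows is only a minimal sketch of how the two steps might be wired together, not the exact configuration used above. It assumes the resource names from the question (`aws_iam_policy.cloudwatch_logs_policy`, `aws_cloudwatch_log_group.cloudwatch_log_group`, `kubernetes_service_account.fluentbit`) and the module's v5 interface (`role_policy_arns` input, `iam_role_arn` output). The map key `cloudwatch_logs` is an arbitrary label, and the narrowed action list is an assumption that should be checked against what your Fluent Bit output actually calls.

    ```hcl
    # Policy scoped to the one log group instead of Resource = "*".
    # Assumption: Terraform creates the log group (auto_create_group is false
    # in the question), so logs:CreateLogGroup is left out here.
    resource "aws_iam_policy" "cloudwatch_logs_policy" {
      name = "fluentbit-role"

      policy = jsonencode({
        Version = "2012-10-17",
        Statement = [
          {
            Effect = "Allow",
            Action = [
              "logs:CreateLogStream",
              "logs:DescribeLogStreams",
              "logs:PutLogEvents"
            ],
            Resource = [
              # The log group itself, plus every log stream inside it.
              aws_cloudwatch_log_group.cloudwatch_log_group.arn,
              "${aws_cloudwatch_log_group.cloudwatch_log_group.arn}:*"
            ]
          }
        ]
      })
    }

    module "fluentbit_cloudwatch_irsa_role" {
      source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
      version = "5.11.1"

      role_name = "balyo-k8s-fluentbit-cloudwatch-irsa-role-${terraform.workspace}"

      # Attach the policy above to the role the module creates.
      role_policy_arns = {
        cloudwatch_logs = aws_iam_policy.cloudwatch_logs_policy.arn
      }

      oidc_providers = {
        main = {
          provider_arn               = data.terraform_remote_state.k8s-cluster.outputs.cluster_oidc_provider_arn
          namespace_service_accounts = ["${var.fluentbit_namespace}:${var.fluentbit_service_account_name}"]
        }
      }
    }

    # Point the service account at the module's role rather than the
    # hand-written aws_iam_role from the question.
    resource "kubernetes_service_account" "fluentbit" {
      metadata {
        namespace = var.fluentbit_namespace
        name      = var.fluentbit_service_account_name
        annotations = {
          "eks.amazonaws.com/role-arn" = module.fluentbit_cloudwatch_irsa_role.iam_role_arn
        }
      }
    }
    ```

    With the annotation in place, the `role_arn` option in the Fluent Bit output should not be needed, since the pod receives web-identity credentials directly. Pods that were already running have to be recreated before they pick up the new credentials.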
    

  2. I am using the config below for a Terraform Helm deployment of Fluent Bit.
    What is wrong with it? I want to log to separate log groups for application, host, and dataplane.

    serviceAccount:
    create: false
    name: ${SA_Name}

    config:

    fluent-bit.conf: |

    [SERVICE]
        Flush                     5
        Grace                     30
        Log_Level                 info
        Daemon                    off
        Parsers_File              parsers.conf
        HTTP_Server               On
        HTTP_Listen               0.0.0.0
        HTTP_Port                 2020
        storage.path              /var/fluent-bit/state/flb-storage/
        storage.sync              normal
        storage.checksum          off
        storage.backlog.mem_limit 5M
    

    application-log.conf: |

    [INPUT]
        Name                tail
        Tag                 application.*
        Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
        Path                /var/log/containers/*.log
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_container.db
        Mem_Buf_Limit       50MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
    
    
    [INPUT]
        Name                tail
        Tag                 application.*
        Path                /var/log/containers/fluent-bit*
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_log.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
    
    
    [INPUT]
        Name                tail
        Tag                 application.*
        Path                /var/log/containers/cloudwatch-agent*
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_cwagent.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
    
    
    [FILTER]
        Name                kubernetes
        Match               application.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     application.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              Off
        Annotations         Off
        Use_Kubelet         On
        Kubelet_Port        10250
        Buffer_Size         0
    
    [OUTPUT]
        Name                cloudwatch_logs
        Match               application.*
        region              ${AWS_REGION}
        log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
        auto_create_group   true
        extra_user_agent    container-insights
    

    dataplane-log.conf: |

    [INPUT]
        Name                systemd
        Tag                 dataplane.systemd.*
        Systemd_Filter      _SYSTEMD_UNIT=docker.service
        Systemd_Filter      _SYSTEMD_UNIT=containerd.service
        Systemd_Filter      _SYSTEMD_UNIT=kubelet.service
        DB                  /var/fluent-bit/state/systemd.db
        Path                /var/log/journal
        Read_From_Tail      On
    
    [INPUT]
        Name                tail
        Tag                 dataplane.tail.*
        Path                /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
        multiline.parser    docker, cri
        DB                  /var/fluent-bit/state/flb_dataplane_tail.db
        Mem_Buf_Limit       50MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
    
    
    [FILTER]
        Name                modify
        Match               dataplane.systemd.*
        Rename              _HOSTNAME                   hostname
        Rename              _SYSTEMD_UNIT               systemd_unit
        Rename              MESSAGE                     message
        Remove_regex        ^((?!hostname|systemd_unit|message).)*$
    
    [FILTER]
        Name                aws
        Match               dataplane.*
        imds_version        v2
    
    [OUTPUT]
        Name                cloudwatch_logs
        Match               dataplane.*
        region              ${AWS_REGION}
        log_group_name      /aws/containerinsights/${CLUSTER_NAME}/dataplane
        auto_create_group   true
        extra_user_agent    container-insights
    

    host-log.conf: |

    [INPUT]
        Name                tail
        Tag                 host.dmesg
        Path                /var/log/dmesg
        Key                 message
        DB                  /var/fluent-bit/state/flb_dmesg.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
    
    
    [INPUT]
        Name                tail
        Tag                 host.messages
        Path                /var/log/messages
        Parser              syslog
        DB                  /var/fluent-bit/state/flb_messages.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
    
    
    [INPUT]
        Name                tail
        Tag                 host.secure
        Path                /var/log/secure
        Parser              syslog
        DB                  /var/fluent-bit/state/flb_secure.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
    
    
    [FILTER]
        Name                aws
        Match               host.*
        imds_version        v2
    
    [OUTPUT]
        Name                cloudwatch_logs
        Match               host.*
        region              ${AWS_REGION}
        log_group_name      /aws/containerinsights/${CLUSTER_NAME}/host
        auto_create_group   true
        extra_user_agent    container-insights
    

    parsers.conf: |

    [PARSER]
        Name                syslog
        Format              regex
        Regex               ^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key            time
        Time_Format         %b %d %H:%M:%S
    
    [PARSER]
        Name                container_firstline
        Format              regex
        Regex               (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ
    
    [PARSER]
        Name                cwagent_firstline
        Format              regex
        Regex               (?<log>(?<="log":")\d{4}[\/-]\d{1,2}[\/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ
    

    Terraform, right format
