
I’m using the latest GKE version (1.22.8-gke.202) in a managed Kubernetes cluster. I also have a custom service account with the "Artifact Registry Reader" role, which should grant it permission to pull private images from the repository – I’ll call this custom-service-account.
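For reference, a minimal sketch of how that role can be granted at the project level (the project ID personal-XXXX mirrors the redacted value in the logs below; the exact command is illustrative):

    # Grant the Artifact Registry Reader role on the project (sketch)
    gcloud projects add-iam-policy-binding personal-XXXX \
      --member="serviceAccount:custom-service-account@personal-XXXX.iam.gserviceaccount.com" \
      --role="roles/artifactregistry.reader"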

I’ve validated that the nodes themselves have the custom-service-account service account attached to them in Compute Engine. Kubernetes is set up with a service account that is linked to the IAM service account of the same name through Workload Identity. However, when I try to spawn a pod that pulls from my private repo, it fails indefinitely.
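The Workload Identity link itself was set up along these lines (a sketch; be-service-account is the Kubernetes service account shown in the edit below):

    # Annotate the KSA with the GSA it should impersonate (sketch)
    kubectl annotate serviceaccount be-service-account --namespace default \
      iam.gke.io/gcp-service-account=custom-service-account@personal-XXXX.iam.gserviceaccount.com

    # Allow the KSA to impersonate the GSA via Workload Identity
    gcloud iam service-accounts add-iam-policy-binding \
      custom-service-account@personal-XXXX.iam.gserviceaccount.com \
      --role="roles/iam.workloadIdentityUser" \
      --member="serviceAccount:personal-XXXX.svc.id.goog[default/be-service-account]"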

Events:
  Type     Reason             Age                   From                Message
  ----     ------             ----                  ----                -------
  Warning  FailedScheduling   21m (x3 over 24m)     default-scheduler   0/2 nodes are available: 2 node(s) were unschedulable.
  Warning  FailedScheduling   19m                   default-scheduler   no nodes available to schedule pods
  Normal   NotTriggerScaleUp  18m (x25 over 24m)    cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) had taint {reserved-pool: true}, that the pod didn't tolerate
  Normal   Scheduled          18m                   default-scheduler   Successfully assigned default/test-service-a-deployment-5757fc5797-b54gx to gke-personal-XXXX--personal-XXXX--ac9a05b6-16sb
  Normal   Pulling            17m (x4 over 18m)     kubelet             Pulling image "us-central1-docker.pkg.dev/personal-XXXX/my-test-repo/my-test-repo-business-logic:latest"
  Warning  Failed             17m (x4 over 18m)     kubelet             Failed to pull image "us-central1-docker.pkg.dev/personal-XXXX/my-test-repo/my-test-repo-business-logic:latest": rpc error: code = Unknown desc = failed to pull and unpack image "us-central1-docker.pkg.dev/personal-XXXX/my-test-repo/my-test-repo-business-logic:latest": failed to resolve reference "us-central1-docker.pkg.dev/personal-XXXX/my-test-repo/my-test-repo-business-logic:latest": failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden
  Warning  Failed             17m (x4 over 18m)     kubelet             Error: ErrImagePull
  Warning  Failed             16m (x6 over 18m)     kubelet             Error: ImagePullBackOff
  Normal   BackOff            3m27s (x65 over 18m)  kubelet             Back-off pulling image "us-central1-docker.pkg.dev/personal-XXXX/my-test-repo/my-test-repo-business-logic:latest"

I’ve also SSH’ed into the nodes themselves, and – at least by default – a plain docker pull or crictl pull there hits this same 403 error.
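One check that separates "the account lacks permission" from "the runtime isn’t being handed credentials" (a sketch, assuming the standard GCE metadata server is reachable from the node): fetch an access token for the node’s attached service account and retry the pull with explicit credentials.

    # On the node: get an OAuth access token for the attached service account
    TOKEN=$(curl -s -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" \
      | cut -d'"' -f4)  # crude JSON parse so jq isn't needed on the node

    # Pull with explicit credentials (docker login -u oauth2accesstoken works too)
    crictl pull --creds "oauth2accesstoken:${TOKEN}" \
      us-central1-docker.pkg.dev/personal-XXXX/my-test-repo/my-test-repo-business-logic:latest

If this pull succeeds while a bare pull 403s, the service account itself has access and it is only the anonymous pull that is being rejected.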

So, the specific questions I have:

  • How is GCP injecting the service account credentials into the Kubernetes/Docker worker that pulls the images? Is it expected that the plain docker command doesn’t have these credentials?
  • Do I need to manually bootstrap some additional authentication for Kubernetes, beyond the pods inheriting the service account?

EDIT: Results of the checks suggested here:

> gcloud container clusters describe personal-XXXX-gke --zone us-central1-a --format="value(workloadIdentityConfig.workloadPool)"
personal-XXXX.svc.id.goog

> gcloud container node-pools describe personal-XXXX-gke-node-pool --cluster personal-XXXX-gke --format="value(config.workloadMetadataConfig.mode)" --zone us-central1-a
GKE_METADATA

> kubectl describe serviceaccount --namespace default be-service-account
Name:                be-service-account
Namespace:           default
Labels:              <none>
Annotations:         iam.gke.io/gcp-service-account: [email protected]
Image pull secrets:  <none>
Mountable secrets:   be-service-account-token-jmss9
Tokens:              be-service-account-token-jmss9
Events:              <none>

> gcloud iam service-accounts get-iam-policy [email protected]
bindings:
- members:
  - serviceAccount:personal-XXXX.svc.id.goog[default/be-service-account]
  role: roles/iam.workloadIdentityUser
etag: BwXjqJ9DC6A=
version: 1
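One more check that would confirm the role binding itself (a sketch; if the reader grant is in place, this should print roles/artifactregistry.reader among the results):

    gcloud projects get-iam-policy personal-XXXX \
      --flatten="bindings[].members" \
      --filter="bindings.members:serviceAccount:custom-service-account@personal-XXXX.iam.gserviceaccount.com" \
      --format="value(bindings.role)"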

2 Answers


  1. When checking for access to Artifact Registry, check both the IAM permissions and the node access scopes, as per this documentation.
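
    A quick way to inspect the scopes on an existing node pool (a sketch, mirroring the describe commands used in the question):

        # Show the OAuth access scopes attached to the node pool (sketch)
        gcloud container node-pools describe personal-XXXX-gke-node-pool \
          --cluster personal-XXXX-gke --zone us-central1-a \
          --format="value(config.oauthScopes)"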

  2. Depending on how your cluster is created, various scopes are added. https://cloud.google.com/kubernetes-engine/docs/how-to/access-scopes#create_with_sa

    In my case, I created an Autopilot cluster from the console (UI) and did everything you did with respect to linking service accounts – it turns out the default service account that gets applied does not get the cloud-platform scope.

    I ended up re-creating the cluster with the right (non-default) service account for my Autopilot nodes: https://cloud.google.com/sdk/gcloud/reference/container/clusters/create#--scopes. I’ll most likely use the CLI for future creations.
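
    For reference, a sketch of that kind of CLI creation with the standard create command (the flags are real; the cluster and account names just mirror the redacted ones in the question):

        # Create the cluster with a non-default node service account,
        # the cloud-platform scope, and Workload Identity enabled (sketch)
        gcloud container clusters create personal-XXXX-gke \
          --zone us-central1-a \
          --service-account="custom-service-account@personal-XXXX.iam.gserviceaccount.com" \
          --scopes="https://www.googleapis.com/auth/cloud-platform" \
          --workload-pool="personal-XXXX.svc.id.goog"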
