In short:

I have a Django application being served up by Apache on a Google Compute Engine VM.

I want to access a secret from Google Secret Manager in my Python code (when the Django app is initialising).

When I do ‘python manage.py runserver’, the secret is successfully retrieved. However, when I get Apache to run my application, it hangs when it sends a request to the secret manager.

Too much detail:

I followed the answer to the question ‘GCP VM Instance is not able to access secrets from Secret Manager despite of appropriate Roles’. I created a service account (not the default one) and gave it the ‘cloud-platform’ scope. I also gave it the ‘Secret Manager Admin’ role in the web console.

After initially running into trouble, I downloaded a JSON key for the service account from the web console, and set the GOOGLE_APPLICATION_CREDENTIALS env-var to point to it.

When I run the django server directly on the VM, everything works fine. When I let Apache run the application, I can see from the logs that the service account credential json is loaded successfully.
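For anyone comparing the two environments, a minimal diagnostic sketch follows. It reports which process and user the code is running as, and whether the key file named by GOOGLE_APPLICATION_CREDENTIALS is readable from there. The function name `describe_credentials_env` is hypothetical (it is not part of any Google library), and the idea of logging its result from both the dev server and Apache is an assumption about how one might use it.

```python
import os


def describe_credentials_env(environ=None):
    """Summarise how the current process sees the service account key.

    `describe_credentials_env` is a hypothetical helper for debugging only.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    return {
        "pid": os.getpid(),
        # os.getuid() only exists on POSIX systems such as a Linux VM.
        "uid": os.getuid() if hasattr(os, "getuid") else None,
        "credentials_path": path,
        # True only if the variable is set and this user may read the file.
        "readable": bool(path) and os.access(path, os.R_OK),
    }


if __name__ == "__main__":
    print(describe_credentials_env())
```

If the output is the same under both servers, the credentials are probably not what differs between them.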

However, when I make my first API call, via google.cloud.secretmanager.SecretManagerServiceClient.list_secret_versions, the application hangs. I don’t even get a 500 error in my browser, just an eternal loading icon. I traced the execution as far as:

grpc._channel._UnaryUnaryMultiCallable._blocking, line 926 : ‘call = self._channel.segregated_call(…’

It never gets past that line. I couldn’t figure out where that call goes, so I couldn’t inspect it any further than that.

Thoughts

I don’t understand GCP service accounts / API access very well. I can’t understand why this difference is occurring between the django dev server and apache, given that they’re both using the same service account credentials from json. I’m also surprised that the application just hangs in the google library rather than throwing an exception. There’s even a timeout option when sending a request, but changing this doesn’t make any difference.
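Since the library's own timeout had no effect here, one way to at least get an exception instead of an endless hang is to run the call in a daemon thread and give up after a hard deadline. The sketch below is an assumption-laden workaround, not something from the Google client library: `call_with_deadline` is a hypothetical helper, and it can only fire if the blocked call releases the GIL, which may not be the case for every kind of hang.

```python
import threading


def call_with_deadline(fn, seconds, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread; raise TimeoutError if it
    has not finished after `seconds`. Hypothetical helper for debugging."""
    outcome = {}

    def runner():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(seconds)
    if worker.is_alive():
        # The worker is still blocked; it is abandoned, not cancelled.
        raise TimeoutError(f"call did not finish within {seconds} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

With a real client this might be used as `call_with_deadline(client.list_secret_versions, 15, request={"parent": parent})`, where `client` and `parent` are whatever the application already has. The stuck thread stays stuck, so this turns a silent hang into a visible error rather than fixing the cause.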

I wonder if it’s somehow related to the fact that I’m running the django server under my own account, but apache is using whatever user account it uses?

Update

I tried changing the user/group that apache runs as to match my own. No change.

I enabled logging for gRPC itself. There is a clear difference between when I run with apache vs the django dev server.
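The post does not say how the logging was switched on. One common way, sketched here as an assumption, is to set gRPC's `GRPC_VERBOSITY` and `GRPC_TRACE` environment variables before the `grpc` module is first imported. The helper name `enable_grpc_debug_logging` is hypothetical, and `"all"` is used as the trace value because the names of individual tracers vary between gRPC versions.

```python
import os


def enable_grpc_debug_logging(trace="all"):
    """Ask gRPC core for verbose logs.

    Hypothetical helper: it has to run before `import grpc` (directly or
    via google.cloud.*), because the variables are read at initialisation.
    """
    settings = {"GRPC_VERBOSITY": "DEBUG", "GRPC_TRACE": trace}
    os.environ.update(settings)
    return settings


if __name__ == "__main__":
    enable_grpc_debug_logging()
    # Import the Google client libraries only after this point.
```

Under Apache the function would need to be called at the very top of the WSGI entry point, since variables exported in an interactive shell are not necessarily visible to the server process.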

On Django:

secure_channel_create.cc:178] grpc_secure_channel_create(creds=0x17cfda0, target=secretmanager.googleapis.com:443, args=0x7fe254620f20, reserved=(nil))
init.cc:167]                grpc_init(void)
client_channel.cc:1099]     chand=0x2299b88: creating client_channel for channel stack 0x2299b18
...
timer_manager.cc:188]       sleep for a 1001 milliseconds
...
client_channel.cc:1879]     chand=0x2299b88 calld=0x229e440: created call
...
call.cc:1980]               grpc_call_start_batch(call=0x229daa0, ops=0x20cfe70, nops=6, tag=0x7fe25463c680, reserved=(nil))
call.cc:1573]               ops[0]: SEND_INITIAL_METADATA...
call.cc:1573]               ops[1]: SEND_MESSAGE ptr=0x21f7a20
...

So, a channel is created, then a call is created, and then we see gRPC start to execute the operations for that call (as far as I read it).

On Apache:

secure_channel_create.cc:178] grpc_secure_channel_create(creds=0x7fd5bc850f70, target=secretmanager.googleapis.com:443, args=0x7fd583065c50, reserved=(nil))
init.cc:167]                grpc_init(void)
client_channel.cc:1099]     chand=0x7fd5bca91bb8: creating client_channel for channel stack 0x7fd5bca91b48
...
timer_manager.cc:188]       sleep for a 1001 milliseconds
...
timer_manager.cc:188]       sleep for a 1001 milliseconds
...

So, a channel is created… and then nothing. No call, no operations. So the Python code is sitting there waiting for gRPC to make this call, which it never does.

3 Answers


  1. Chosen as BEST ANSWER

    The problem appears to be that the forking behaviour of Apache breaks gRPC somehow. I couldn't nail down the precise cause, but after I began to suspect that forking was the issue, I found this old gRPC issue that indicates that forking is a bit of a tricky area.

    I tried to reconfigure Apache to use a different 'Multi-processing Module', but as my experience in this is limited, I couldn't get gRPC to work under any of them.

    In the end, I switched to using nginx/uwsgi instead of Apache/mod_wsgi, and I did not have the same issue. If you're trying to solve a problem like this and you have to use Apache, I'd advise further investigating Apache forking, how gRPC handles forking, and the different MPMs available for Apache.


  2. I’m facing a similar issue when running my Flask application with eventlet==0.33.0 and gunicorn (https://github.com/benoitc/gunicorn/archive/ff58e0c6da83d5520916bc4cc109a529258d76e1.zip#egg=gunicorn==20.1.0): calling secret_client.access_secret_version hangs forever.

    It used to work fine with an older eventlet version, but we needed to upgrade to the latest version of eventlet for security reasons.

  3. I experienced a similar issue and was able to solve it with the following:

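    import grpc.experimental.gevent as grpc_gevent
    from gevent import monkey
    from google.cloud import secretmanager

    # Patch the standard library so blocking calls cooperate with gevent.
    monkey.patch_all()
    # Make gRPC compatible with gevent; this has to run after the standard
    # library is patched and before any gRPC object is created.
    grpc_gevent.init_gevent()

    client = secretmanager.SecretManagerServiceClient()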
    