I’m running Apache OpenWhisk on k3s, installed using Helm.
Below are the invoker logs, taken several hours after a fresh install, with several functions set to run periodically. This message appears every few seconds once the problem starts:
[2020-03-17T13:27:12.691Z] [ERROR] [#tid_sid_invokerHealth] [ContainerPool]
Rescheduling Run message, too many message in the pool, freePoolSize: 0 containers and 0 MB,
busyPoolSize: 8 containers and 4096 MB, maxContainersMemory 4096 MB, userNamespace: whisk.system,
action: ExecutableWhiskAction/whisk.system/[email protected], needed memory: 128 MB,
waiting messages: 24
Here are the running pods. Notice all the function pods have an age of 11+ hours.
NAME                                                              READY   STATUS      RESTARTS   AGE
openwhisk-gen-certs-n965b                                         0/1     Completed   0          14h
openwhisk-init-couchdb-4s9rh                                      0/1     Completed   0          14h
openwhisk-install-packages-pnvmq                                  0/1     Completed   0          14h
openwhisk-apigateway-78c64dd7c9-2gsw6                             1/1     Running     2          14h
openwhisk-couchdb-844c6df68f-qrxq6                                1/1     Running     2          14h
openwhisk-wskadmin                                                1/1     Running     2          14h
openwhisk-redis-77494b8d44-gkmlt                                  1/1     Running     2          14h
openwhisk-zookeeper-0                                             1/1     Running     2          14h
openwhisk-kafka-0                                                 1/1     Running     2          14h
openwhisk-controller-0                                            1/1     Running     2          14h
openwhisk-nginx-5f795dd747-c228s                                  1/1     Running     4          14h
openwhisk-cloudantprovider-69fd94b6f6-x88f4                       1/1     Running     2          14h
openwhisk-kafkaprovider-544fbfdcc7-kn29p                          1/1     Running     2          14h
openwhisk-alarmprovider-58c5454cc8-q4wbw                          1/1     Running     2          14h
openwhisk-invoker-0                                               1/1     Running     2          14h
wskopenwhisk-invoker-00-1-prewarm-nodejs10                        1/1     Running     0          14h
wskopenwhisk-invoker-00-6-prewarm-nodejs10                        1/1     Running     0          13h
wskopenwhisk-invoker-00-15-whisksystem-checkuserload              1/1     Running     0          13h
wskopenwhisk-invoker-00-31-whisksystem-guacscaleup                1/1     Running     0          12h
wskopenwhisk-invoker-00-30-whisksystem-guacscaledown              1/1     Running     0          12h
wskopenwhisk-invoker-00-37-whisksystem-functionelastalertcheckd   1/1     Running     0          11h
wskopenwhisk-invoker-00-39-whisksystem-checkuserload              1/1     Running     0          11h
wskopenwhisk-invoker-00-40-whisksystem-functionelastalertcheckd   1/1     Running     0          11h
wskopenwhisk-invoker-00-42-whisksystem-guacscaleup                1/1     Running     0          11h
wskopenwhisk-invoker-00-43-whisksystem-functionelastalertcheckd   1/1     Running     0          11h
Shouldn’t OpenWhisk be killing these pods after they reach the timeout? The functions all have a timeout of either 3 or 5 minutes, but OpenWhisk doesn’t seem to enforce this.
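For reference, those limits were set per action with the wsk CLI’s --timeout flag, which takes milliseconds (the action name below is a placeholder, not one of the deployed functions):

$ wsk action update myAction --timeout 300000    # 5-minute limit, in milliseconds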
One other thing I noticed was the "timeout" annotation being set to false on the activations.
$ wsk activation get ...
{
    "annotations": [
        ...
        {
            "key": "timeout",
            "value": false
        },
        ...
    ],
    ...
}
2 Answers
Ok, I fixed this by changing the invoker container factory implementation to docker. I'm not sure why the kubernetes implementation fails to kill pods (and release memory), but we are using docker as the container runtime for k3s.

To set this, change invoker.containerFactory.impl to docker in the helm chart values: https://github.com/apache/openwhisk-deploy-kube/blob/master/helm/openwhisk/values.yaml#L261

I also increased the invoker memory (invoker.jvmHeapMB) to 1024: https://github.com/apache/openwhisk-deploy-kube/blob/master/helm/openwhisk/values.yaml#L257

Here is a link that explains the container factory setting: https://github.com/apache/openwhisk-deploy-kube/blob/master/docs/configurationChoices.md#invoker-container-factory
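As a minimal sketch, assuming the key layout of the values.yaml linked above, the overrides look like this in a Helm values file:

invoker:
  containerFactory:
    impl: "docker"     # switch from the default "kubernetes" container factory
  jvmHeapMB: "1024"    # invoker JVM heap in MB, raised from the default

Then apply it with helm upgrade, passing the file via -f; the release name, namespace, and chart path depend on how you originally installed, e.g.:

$ helm upgrade owdev ./helm/openwhisk -n openwhisk -f mycluster.yaml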
The timeout annotation is specific to a particular activation. If the value is true, it means that particular activation of the corresponding function exceeded its set maximum duration, which ranges from 100 ms to 5 minutes by default (per the docs) unless changed for the system deployment as a whole.

The pods are used to execute the functions; they will stick around for some duration while idle to facilitate future warm starts. The OpenWhisk invoker will terminate these warm pods eventually, after an idle timeout or when resources are required to run other pods.
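To check the annotation on a given activation, something like the following works (a sketch: it assumes jq is installed, that the CLI prints a one-line status header before the JSON, and the activation ID is a placeholder):

$ wsk activation get 01b2f0b9... | tail -n +2 | jq '.annotations[] | select(.key == "timeout")'
{
  "key": "timeout",
  "value": false
}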