Troubleshooting is the art of turning frustration into curiosity and chaos into clarity …
Introduction
Running applications as a non-root user is a crucial security practice, especially in containerized environments like Kubernetes. While Kubernetes provides the securityContext and its runAsUser field to simplify this, implementing it can sometimes lead to unexpected challenges. In this article, I share my journey of troubleshooting and overcoming these hurdles, ensuring that the application runs securely without elevated privileges. Whether you're encountering similar challenges or just beginning to explore the runAsUser configuration, this article offers practical insights and solutions to strengthen your skills in implementing security for Kubernetes Pods. I encourage you to read through to the end for a comprehensive understanding.
The Situation
Realizing the importance of running applications as a non-root user, and the complications this can create when pursuing business compliance, I decided to test the securityContext feature in Kubernetes with an nginx-based application. By default, the container from the nginx image runs as root. To change that behaviour, I set it to run as user ID 400 (a random UID) by adding the following securityContext under the container spec:
spec:
  containers:
  - image: nginx
    securityContext:
      runAsUser: 400
    name: app1
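For reference, here is a minimal sketch of the full Deployment this fragment sits in; the name, namespace, labels, and replica count are my assumptions, based on the kubectl output shown later:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app1
  namespace: proda
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
      - image: nginx
        name: app1
        securityContext:
          runAsUser: 400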
I expected the pod to come up, but it decided not to. It complains of permission issues: the container does not have permission to access certain files while starting up.
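To see the failure, the usual sequence is to check the pod status and then pull its logs; the pod name below is a placeholder for the generated name:
kubectl get pods -n proda
kubectl describe pod <pod-name> -n proda
kubectl logs <pod-name> -n proda
The logs show what went wrong: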
The directory /usr/share/nginx/html is not mounted.
Therefore, over-writing the default index.html file with some useful information:
tee: /usr/share/nginx/html/index.html: Permission denied
Praqma Network MultiTool (with NGINX) - app1-59b5fff48c-645xb - 10.42.3.20 - HTTP: 80 , HTTPS: 443
========================= IMPORTANT ==============================
/docker/entrypoint.sh: line 26: can't create /usr/share/nginx/html/index.html: Permission denied
cat: can't open '/root/press-release.md': Permission denied
==================================================================
nginx: [alert] could not open error log file: open() "/var/lib/nginx/logs/error.log" failed (13: Permission denied)
2025/01/09 05:17:03 [warn] 1#1: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:3
2025/01/09 05:17:03 [emerg] 1#1: cannot load certificate "/certs/server.crt": BIO_new_file() failed (SSL: error:0200100D:system library:fopen:Permission denied:fopen('/certs/server.crt','r') error:2006D002:BIO routines:BIO_new_file:system lib)
kubectl logs app1-5bffc684c4-gmlhj -n proda
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: can not modify /etc/nginx/conf.d/default.conf (read-only file system?)
/docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2025/01/10 06:08:40 [warn] 1#1: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:2
nginx: [warn] the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:2
2025/01/10 06:08:40 [emerg] 1#1: mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
nginx: [emerg] mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
kubectl logs app1-5bffc684c4-gmlhj -n proda
Defaulted container "app1" out of: app1, debugger-nk2jx (ephem), debugger-42pkr (ephem), debugger-4vblr (ephem)
Error from server: Get "https://192.168.1.7:10250/containerLogs/proda/app1-5bffc684c4-gmlhj/app1": proxy error from 127.0.0.1:6443 while dialing 192.168.1.7:10250, code 502: 502 Bad Gateway
➜ kubectl debug -n proda -it app1-5bffc684c4-gmlhj --image=busybox
Defaulting debug container name to debugger-nk2jx.
If you don't see a command prompt, try pressing enter.
warning: couldn't attach to pod/app1-5bffc684c4-gmlhj, falling back to streaming logs: error dialing backend: proxy error from 127.0.0.1:6443 while dialing 192.168.1.7:10250, code 502: 502 Bad Gateway
Error from server: Get "https://192.168.1.7:10250/containerLogs/proda/app1-5bffc684c4-gmlhj/debugger-nk2jx": proxy error from 127.0.0.1:6443 while dialing 192.168.1.7:10250, code 502: 502 Bad Gateway
I notice errors mentioning the ENTRYPOINT, permission issues, failures to open files, and so on. I also notice that for some pods it complains about a 502 Bad Gateway. WHY? This could be an unrelated issue. Maybe the images do not have the corresponding UID? But why does it complain about a Bad Gateway after some time? Let's understand the workflow when we run a kubectl logs command.
- kubectl performs basic validation, gets the Kubernetes API endpoint, user token, and certs, and constructs an HTTP request to the API server.
- The API server performs authentication and authorization, and retrieves the Pod information from etcd. From this information it learns which node the pod is currently running on.
- It then proxies the log request to the kubelet on that node. The request looks like the URL in the error log; it is sent by the API server's internal proxy:
https://192.168.1.7:10250/containerLogs/proda/app1-5bffc684c4-gmlhj/app1
- The kubelet on the node receives the request and verifies that the pod and container exist on that node. In our case the container might be down, hence the 502 response from the kubelet to the API server? Or it could be an issue with the kubelet or the node itself; a missing container should have returned a 404 instead of a 502. This needs to be verified (see the checks sketched after this list).
- If the container exists, the kubelet connects to the container runtime and streams the logs from the container log location.
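One way to verify the kubelet side directly is to probe it from a machine that can reach the node; this is a sketch, with the address and port taken from the error message above. Even an authentication error here would prove network reachability, whereas the 502 suggests the API server could not reach the kubelet at all:
# Probe the kubelet's HTTPS port directly (may return 401 without credentials)
curl -k https://192.168.1.7:10250/healthz
# On the node itself, check the kubelet service and its recent logs
systemctl status kubelet
journalctl -u kubelet --since "10 minutes ago"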
My suspicion of a kubelet issue on the node side came true after I ran the pod on another node: it never complained about the 502 gateway issue there. I drained the problem node and deleted it, along these lines:
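A typical drain-and-delete sequence; the node name is a placeholder:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <node-name>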
Focusing on the main issue (the permission errors), let's see how we can fix it.
- Find the existing UID in the application image and use it in the securityContext. Ideally, the application the container runs should already have a dedicated UID. Let me run the image locally and get the details from /etc/passwd:
➜ docker run -it --rm nginx /bin/bash
root@cdb7b7a8cb62:/# grep nginx /etc/passwd
nginx:x:101:101:nginx user:/nonexistent:/bin/false
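The same lookup also works as a one-off command, without an interactive shell; the official image's entrypoint execs non-nginx commands directly, so grep runs as-is:
docker run --rm nginx grep nginx /etc/passwd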
I am updating the UID to 101 in the pod definition and running it.
spec:
  containers:
  - image: nginx
    securityContext:
      runAsUser: 101
Still getting permission issues:
nginx: [warn] the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:2
2025/01/10 06:08:40 [emerg] 1#1: mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
nginx: [emerg] mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
It is complaining about permission issues on the /var/cache/nginx folder, as well as a warning about the user directive in the config. Before fixing this, I wanted to test what changes when I enable the securityContext at the Pod level instead -> spec.template.spec.securityContext
spec:
  securityContext:
    runAsUser: 101
  containers:
  - image: nginx
    #securityContext:
    #  runAsUser: 101
    #  runAsGroup: 101
➜ kubectl apply -f deploy.yaml -n proda
deployment.apps/app1 configured
➜ kubectl get pods -n proda
NAME READY STATUS RESTARTS AGE
app1-694d5bc75f-5pjs5 1/1 Running 0 3s
app1-694d5bc75f-bg46q 1/1 Running 0 5s
app1-694d5bc75f-2t78l 1/1 Running 0 5s
The pods are up now, but I observe that the main process runs as root while its child processes run as the nginx user (UID 101):
root@app1-694d5bc75f-2t78l:/# ps -aef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 00:51 ? 00:00:00 nginx: master process nginx -g daemon off;
nginx 30 1 0 00:51 ? 00:00:00 nginx: worker process
nginx 31 1 0 00:51 ? 00:00:00 nginx: worker process
root 32 0 0 00:51 pts/0 00:00:00 bash
This is not what I want. Let me also try the fsGroup and runAsGroup options in the securityContext. While fsGroup affects filesystem group ownership for mounted volumes, runAsGroup sets the primary group ID for container processes:
spec:
  securityContext:
    runAsUser: 101
    runAsGroup: 101
    fsGroup: 101
  containers:
  - image: nginx
I noticed that the container still started as the root user. Can we fix this by adjusting the user directive in the nginx config file? Let's add the following config file for nginx.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: proda
data:
  nginx.conf: |
    user nginx;
    events {
      worker_connections 1024;
    }
    http {
      server {
        listen 8080;
        location / {
          root /usr/share/nginx/html;
          index index.html;
        }
      }
    }
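Applying and verifying the ConfigMap; the filename is a placeholder for wherever you saved it:
kubectl apply -f nginx-configmap.yaml -n proda
kubectl get configmap nginx-config -n proda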
Note that I have added a user directive and changed the default port to 8080. I also adjusted the Deployment resource to mount the nginx ConfigMap as a volume:
...
volumeMounts:
- name: nginx-config
  mountPath: /etc/nginx/nginx.conf
  subPath: nginx.conf
volumes:
- name: nginx-config
  configMap:
    name: nginx-config
...
➜ kubectl exec -it app1-6c86f8bbfc-2v76n -n proda -- id
uid=0(root) gid=0(root) groups=0(root)
It is still starting as the root user, and the nginx directories are still owned by root.
➜ kubectl exec -it app1-6c86f8bbfc-2v76n -n proda -- ls -ld /var/cache/nginx
drwxr-xr-x 1 root root 4096 Jan 11 01:08 /var/cache/nginx
Now let me try the powers of an init container. I can change the permissions of the folders using an initContainer:
initContainers:
- name: fix-permissions
  image: busybox
  command:
  - sh
  - -c
  - |
    chown -R 101:101 /var/cache/nginx
    chown -R 101:101 /usr/share/nginx/html
  volumeMounts:
  - name: nginx-cache
    mountPath: /var/cache/nginx
  - name: nginx-html
    mountPath: /usr/share/nginx/html
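The nginx-cache and nginx-html volumes this snippet mounts are not shown above; a minimal sketch, assuming plain emptyDir volumes defined at the Pod level and mounted into the main container at the same paths:
volumes:
- name: nginx-cache
  emptyDir: {}
- name: nginx-html
  emptyDir: {}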
➜ kubectl exec -it app1-98758c996-2nn5h -n proda -- ls -ld /var/cache/nginx
drwxrwxrwx 7 nginx nginx 4096 Jan 11 02:31 /var/cache/nginx
Okay, so now the folder permissions are fixed. But the pod still starts as the root user. Is there a better way? How can I start the process as the nginx user, so that all folders come up with its permissions? The ideal solution would be to update the image's Dockerfile to include a USER directive, but here we are looking for a solution within the Kubernetes resources.
The container runtime starts the container as per the ENTRYPOINT of the container image. Can we adjust that a little, so that it starts as the nginx user? I am going to try overriding it within Kubernetes by starting a shell and running nginx inside it as an argument. Let's try the following:
containers:
- image: nginx
  command:
  - /bin/sh
  args:
  - -c
  - |
    chown -R 101:101 /var/cache/nginx /var/run /usr/share/nginx/html && \
    nginx -g "daemon off;"
  securityContext:
    runAsUser: 101
    runAsGroup: 101
The problem here is that I am asking the pod to run as the nginx user while trying to change folder permissions on /var/run, which is restricted to the root user. It requires root privileges to change permissions on the /var/run folder. I think we can leverage an initContainer to change the folder permissions and let the main container just start nginx.
containers:
- image: nginx
  command:
  - /bin/sh
  args:
  - -c
  - |
    nginx -g "daemon off;"
  securityContext:
    runAsUser: 101
    runAsGroup: 101
  name: app1
  volumeMounts:
  - name: nginx-config
    mountPath: /etc/nginx/nginx.conf
    subPath: nginx.conf
  - name: nginx-cache
    mountPath: /var/cache/nginx
  - name: nginx-html
    mountPath: /usr/share/nginx/html
  - name: nginx-run
    mountPath: /var/run
initContainers:
- name: fix-permissions
  image: busybox
  command:
  - sh
  - -c
  - |
    mkdir -p /var/run && \
    touch /var/run/nginx.pid && \
    chown -R 101:101 /var/run
    chown -R 101:101 /var/cache/nginx
    chown -R 101:101 /usr/share/nginx/html
  volumeMounts:
  - name: nginx-cache
    mountPath: /var/cache/nginx
  - name: nginx-html
    mountPath: /usr/share/nginx/html
  - name: nginx-run
    mountPath: /var/run
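For completeness, here is the Pod-level volumes block this manifest references; a sketch assuming emptyDir volumes for the writable paths, plus the ConfigMap created earlier:
volumes:
- name: nginx-config
  configMap:
    name: nginx-config
- name: nginx-cache
  emptyDir: {}
- name: nginx-html
  emptyDir: {}
- name: nginx-run
  emptyDir: {}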
This worked like a charm.
➜ kubectl get pods -n proda
NAME READY STATUS RESTARTS AGE
app1-5f9cf87669-f5jp8 1/1 Running 0 3h39m
app1-5f9cf87669-jzhtr 1/1 Running 0 3h39m
app1-5f9cf87669-w6m9h 1/1 Running 0 3h39m
app1-5f9cf87669-z8j2r 1/1 Running 0 3h39m
➜ kubectl exec -it app1-5f9cf87669-f5jp8 -n proda -c app1 -- id
uid=101(nginx) gid=101(nginx) groups=101(nginx)
➜ kubectl exec -it app1-5f9cf87669-f5jp8 -n proda -c app1 -- ls -ld /usr/share/nginx/html
drwxrwxrwx 2 nginx nginx 4096 Jan 11 08:35 /usr/share/nginx/html
Now, you might argue that running the initContainer as a root user is necessary to change folder permissions. However, it's important to remember that an initContainer is a special type of container in a Pod that executes before any main container starts. Its primary purpose is to perform initialization tasks such as setting up configurations, fixing file permissions, or ensuring dependencies are ready. The unique advantage of initContainers lies in their ability to share volumes mounted at the Pod level, and the main containers start only after the initContainer has successfully completed its tasks.
Situation Under Control?
Now that we have achieved our objective: for production environments, I would suggest fixing this at the image level. In the Dockerfile below, I am using the USER directive to make sure the container starts as the nginx user.
FROM nginx:latest
COPY nginx.conf /etc/nginx/nginx.conf
# Give the nginx user ownership of its writable paths. The PID-file lines are
# an addition I would suggest, since nginx writes /var/run/nginx.pid at startup.
RUN mkdir -p /var/cache/nginx && chown -R nginx:nginx /var/cache/nginx \
    && touch /var/run/nginx.pid && chown nginx:nginx /var/run/nginx.pid
USER nginx
This was tested in a local desktop docker setup.
➜ docker build -t nginx-local:latest .
➜ securitycontext docker run -it --rm nginx-local id
uid=101(nginx) gid=101(nginx) groups=101(nginx)
➜ securitycontext docker run -it --rm nginx-local ls -ld /var/cache/nginx
drwxr-xr-x 1 nginx nginx 4096 Nov 26 16:44 /var/cache/nginx
We can push this image and use it as the Pod's image, without any initContainers.
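A minimal sketch of how the container spec simplifies with the custom image; the registry path is a placeholder, and the securityContext now only reinforces what the image already declares:
containers:
- name: app1
  image: registry.example.com/nginx-local:latest
  securityContext:
    runAsUser: 101
    runAsGroup: 101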
Conclusion
During this troubleshooting exercise, we gained hands-on experience configuring and running containers as a non-root user using the securityContext
feature in Kubernetes. Each challenge provided valuable insights as we implemented solutions to overcome various barriers. Key aspects we addressed included understanding how kubectl logs
operates, leveraging the capabilities of initContainers
, overriding the ENTRYPOINT
set by container images within Kubernetes Pod definitions, and ultimately creating a custom Docker image with the USER
directive baked into it.
Hope you enjoyed this article. If you liked it, you can follow my publication for future articles; that gives me the motivation to share more. If you have any comments, please write to me. https://medium.com/@asishmm https://devopsforyou.com/
Thank You !