Monitoring a Kubernetes Pod's Memory Usage
The Problem
The client had a pod that kept getting killed with OOMKilled as the reason. That reason usually means one of two things: either the node ran out of memory and the kernel killed a pod to free some up, or the pod exceeded its memory limit as defined in the pod spec or its controller spec (Deployment, ReplicaSet, StatefulSet, or DaemonSet).
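As a quick sanity check, the termination reason can be read straight off the pod's status. This is only a sketch; the pod name is a placeholder, and the field path is the same one the monitoring script below reports:
# Replace <pod-name>; prints why the previous container instance terminated (e.g. OOMKilled)
kubectl get pod <pod-name> \
  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'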
The Catch
The pod was allocated 4Gi of memory, but in the metric graphs its usage only ever reached about 1.5Gi; the node had 12Gi and never went above 8Gi. So what was going on here?
Metric gathering on this Kubernetes cluster happens every 60s, so the hypothesis was that the pod allocates a huge amount of memory in under a minute, between scrapes. To test this I wrote the following bash script:
#!/bin/bash

LOG_FILE="command_log.txt"

while true; do
    {
        # Look up the pod by its label selector (specific to this workload)
        pod_name=$(kubectl get pods -l component=component_name -o custom-columns="NAME:.metadata.name" --no-headers=true)

        echo "Timestamp: $(date)"

        # Pod status, including the restart count
        echo "Check for Pod: "
        kubectl get pod "$pod_name"

        # Memory usage as the metrics pipeline reports it
        echo "Check Pod Mem Usage Using Kubectl Top: "
        kubectl top pod "$pod_name"

        # Memory usage as the kernel accounts it (cgroup v1), converted to MiB;
        # tr strips the carriage return added by the exec TTY (-t)
        echo "Check Pod Mem Usage Using cgroup Memory:"
        usage_bytes=$(kubectl exec -it "$pod_name" -- cat /sys/fs/cgroup/memory/memory.usage_in_bytes | tr -d '\r')
        echo "$((usage_bytes / 1024 / 1024))MiB"

        # Reason the previous container instance terminated (Error, OOMKilled, or <none>)
        echo "Check Pod Last State: "
        kubectl get pods "$pod_name" -o custom-columns="NAME:.metadata.name,FINISHED:.status.containerStatuses[*].lastState.terminated.reason"

        echo "----------------------------------"
    } >> "$LOG_FILE"

    # Sleep for 10 seconds
    sleep 10
done
The script just loops forever and every 10 seconds does the following:
- Record the date and time
- Check the pod (to catch restarts)
- Check the memory as reported through kubectl top (to see what Prometheus, Google Cloud metrics, and Kubernetes see)
- Check the memory as reported through the cgroup accounting (a variant that also handles cgroup v2 is sketched after this list)
- Check the reason the previous container instance terminated, looking for Error or OOMKilled. Here Error means the Java Virtual Machine is probably reporting an OutOfMemory error and exiting on its own, while OOMKilled means the pod was killed by the kernel.
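One caveat on the cgroup check: the script reads /sys/fs/cgroup/memory/memory.usage_in_bytes, which only exists on nodes using cgroup v1. On cgroup v2 nodes the equivalent file is /sys/fs/cgroup/memory.current. A hedged sketch that handles both layouts, assuming the container image ships a POSIX shell and cat:
# Read the container's memory usage in bytes on either cgroup version;
# "$pod_name" is the variable from the script above.
kubectl exec "$pod_name" -- sh -c '
  if [ -f /sys/fs/cgroup/memory/memory.usage_in_bytes ]; then
    cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # cgroup v1
  else
    cat /sys/fs/cgroup/memory.current                 # cgroup v2
  fi'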
Running this for several hours showed that memory usage was indeed spiking:
----------------------------------
Timestamp: Mon Sep 11 18:05:45 PDT 2023
Check for Pod:
NAME READY STATUS RESTARTS AGE
pod-6b4f75bb68-l8kn5 1/1 Running 0 145m
Check Pod Mem Usage Using Top:
NAME CPU(cores) MEMORY(bytes)
pod-6b4f75bb68-l8kn5 10m 1184Mi
Check Pod Mem Usage Using cgroup Memory:
1480MiB
Check Pod Last State:
NAME FINISHED
pod-6b4f75bb68-l8kn5 <none>
----------------------------------
Timestamp: Mon Sep 11 18:05:58 PDT 2023
Check for Pod:
NAME READY STATUS RESTARTS AGE
pod-6b4f75bb68-l8kn5 1/1 Running 0 145m
Check Pod Mem Usage Using Top:
NAME CPU(cores) MEMORY(bytes)
pod-6b4f75bb68-l8kn5 64m 1184Mi
Check Pod Mem Usage Using cgroup Memory:
2002MiB
Check Pod Last State:
NAME FINISHED
pod-6b4f75bb68-l8kn5 <none>
----------------------------------
Timestamp: Mon Sep 11 18:06:13 PDT 2023
Check for Pod:
NAME READY STATUS RESTARTS AGE
pod-6b4f75bb68-l8kn5 1/1 Running 0 145m
Check Pod Mem Usage Using Top:
NAME CPU(cores) MEMORY(bytes)
pod-6b4f75bb68-l8kn5 64m 1184Mi
Check Pod Mem Usage Using cgroup Memory:
3551MiB
Check Pod Last State:
NAME FINISHED
pod-6b4f75bb68-l8kn5 <none>
----------------------------------
Timestamp: Mon Sep 11 18:06:26 PDT 2023
Check for Pod:
NAME READY STATUS RESTARTS AGE
pod-6b4f75bb68-l8kn5 1/1 Running 0 145m
Check Pod Mem Usage Using Top:
NAME CPU(cores) MEMORY(bytes)
pod-6b4f75bb68-l8kn5 64m 1184Mi
Check Pod Mem Usage Using cgroup Memory:
4694MiB
Check Pod Last State:
NAME FINISHED
pod-6b4f75bb68-l8kn5 <none>
----------------------------------
Timestamp: Mon Sep 11 18:06:40 PDT 2023
Check for Pod:
NAME READY STATUS RESTARTS AGE
pod-6b4f75bb68-l8kn5 1/1 Running 1 (10s ago) 146m
Check Pod Mem Usage Using Top:
NAME CPU(cores) MEMORY(bytes)
pod-6b4f75bb68-l8kn5 1577m 3573Mi
Check Pod Mem Usage Using cgroup Memory:
789MiB
Check Pod Last State:
NAME FINISHED
pod-6b4f75bb68-l8kn5 Error
Here we can see the memory spiking between 18:05:45 and 18:06:26 (41 seconds), going from 1480MiB to 4694MiB. The limit is 4696Mi (4Gi + 600Mi), which was clearly breached just after the last sample, and the kernel killed the container (hence the restart visible at 18:06:40).
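The configured limit itself can be confirmed directly from the pod spec. A small sketch using the pod name from the log above, which should print the 4696Mi limit discussed here:
# Prints each container's memory limit exactly as written in the spec
kubectl get pod pod-6b4f75bb68-l8kn5 \
  -o jsonpath='{.spec.containers[*].resources.limits.memory}'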
So the underlying problem is an extreme memory spike that needs to be investigated. However, kubectl top, GKE metrics, and Prometheus all failed to report it: at a 60s collection rate, a spike that builds up and kills the pod in under a minute is never even sampled.
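To actually catch a spike like this, the sampling interval has to be well below the spike's duration. Below is a minimal sketch along the same lines as the script above: it samples the container's own cgroup accounting once per second. It assumes cgroup v1 and that the image ships sh, cat, date, and sleep; the log file name and 1-second interval are arbitrary choices:
# Stream one memory sample per second from inside the container into a local file
pod_name=$(kubectl get pods -l component=component_name -o custom-columns="NAME:.metadata.name" --no-headers=true)
kubectl exec "$pod_name" -- sh -c \
  'while true; do echo "$(date) $(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)"; sleep 1; done' \
  >> mem_1s.log
Note that the exec session ends whenever the pod restarts, so in practice this pairs well with the 10-second outer loop above, which notices the restart and records the last state.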