Monitoring a Kubernetes Pod's Memory Usage
The Problem
The client had a pod that kept getting killed with OOMKilled as the reason. That reason usually means one of two things: either the node ran out of memory and the kernel killed a pod to free some up, or the pod exceeded its memory limit as defined in the pod spec or its controller spec (Deployment, ReplicaSet, StatefulSet, or DaemonSet).
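As a quick sanity check, the termination reason can be read straight off the pod's status. This is only a sketch; the pod name is a placeholder, and the field path is the same one the monitoring script below reports:
# Replace <pod-name>; prints why the previous container instance terminated (e.g. OOMKilled)
kubectl get pod <pod-name> \
  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'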
The Catch
The pod was allocated 4Gi of memory, but in the metric graphs its usage only ever reached about 1.5Gi; the node had 12Gi and never went above 8Gi. So what was going on here?
Metric gathering on this Kubernetes cluster happens every 60s, so the hypothesis was that the pod allocates a huge amount of memory in under a minute, between scrapes. To test this I wrote the following bash script:
#!/bin/bash

LOG_FILE="command_log.txt"

while true; do
    {
        # Look up the pod by its label selector (specific to this workload)
        pod_name=$(kubectl get pods -l component=component_name -o custom-columns="NAME:.metadata.name" --no-headers=true)

        echo "Timestamp: $(date)"

        # Pod status, including the restart count
        echo "Check for Pod: "
        kubectl get pod "$pod_name"

        # Memory usage as the metrics pipeline reports it
        echo "Check Pod Mem Usage Using Kubectl Top: "
        kubectl top pod "$pod_name"

        # Memory usage as the kernel accounts it (cgroup v1), converted to MiB;
        # tr strips the carriage return added by the exec TTY (-t)
        echo "Check Pod Mem Usage Using cgroup Memory:"
        usage_bytes=$(kubectl exec -it "$pod_name" -- cat /sys/fs/cgroup/memory/memory.usage_in_bytes | tr -d '\r')
        echo "$((usage_bytes / 1024 / 1024))MiB"

        # Reason the previous container instance terminated (Error, OOMKilled, or <none>)
        echo "Check Pod Last State: "
        kubectl get pods "$pod_name" -o custom-columns="NAME:.metadata.name,FINISHED:.status.containerStatuses[*].lastState.terminated.reason"

        echo "----------------------------------"
    } >> "$LOG_FILE"

    # Sleep for 10 seconds
    sleep 10
done
The script just loops forever and every 10 seconds does the following:
- Record the date and time
- Check the pod (to catch restarts)
- Check the memory as reported through kubectl top (to see what Prometheus, Google Cloud metrics, and Kubernetes see)
- Check the memory as reported through the cgroup accounting (a variant that also handles cgroup v2 is sketched after this list)
- Check the reason the previous container instance terminated, looking for Error or OOMKilled. Here Error means the Java Virtual Machine is probably reporting an OutOfMemory error and exiting on its own, while OOMKilled means the pod was killed by the kernel.
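One caveat on the cgroup check: the script reads /sys/fs/cgroup/memory/memory.usage_in_bytes, which only exists on nodes using cgroup v1. On cgroup v2 nodes the equivalent file is /sys/fs/cgroup/memory.current. A hedged sketch that handles both layouts, assuming the container image ships a POSIX shell and cat:
# Read the container's memory usage in bytes on either cgroup version;
# "$pod_name" is the variable from the script above.
kubectl exec "$pod_name" -- sh -c '
  if [ -f /sys/fs/cgroup/memory/memory.usage_in_bytes ]; then
    cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # cgroup v1
  else
    cat /sys/fs/cgroup/memory.current                 # cgroup v2
  fi'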
Running this for several hours showed that memory usage was indeed spiking:
----------------------------------
Timestamp: Mon Sep 11 18:05:45 PDT 2023
Check for Pod:
NAME READY STATUS RESTARTS AGE
pod-6b4f75bb68-l8kn5 1/1 Running 0 145m
Check Pod Mem Usage Using Top:
NAME CPU(cores) MEMORY(bytes)
pod-6b4f75bb68-l8kn5 10m 1184Mi
Check Pod Mem Usage Using cgroup Memory:
1480MiB
Check Pod Last State:
NAME FINISHED
pod-6b4f75bb68-l8kn5 <none>
----------------------------------
Timestamp: Mon Sep 11 18:05:58 PDT 2023
Check for Pod:
NAME READY STATUS RESTARTS AGE
pod-6b4f75bb68-l8kn5 1/1 Running 0 145m
Check Pod Mem Usage Using Top:
NAME CPU(cores) MEMORY(bytes)
pod-6b4f75bb68-l8kn5 64m 1184Mi
Check Pod Mem Usage Using cgroup Memory:
2002MiB
Check Pod Last State:
NAME FINISHED
pod-6b4f75bb68-l8kn5 <none>
----------------------------------
Timestamp: Mon Sep 11 18:06:13 PDT 2023
Check for Pod:
NAME READY STATUS RESTARTS AGE
pod-6b4f75bb68-l8kn5 1/1 Running 0 145m
Check Pod Mem Usage Using Top:
NAME CPU(cores) MEMORY(bytes)
pod-6b4f75bb68-l8kn5 64m 1184Mi
Check Pod Mem Usage Using cgroup Memory:
3551MiB
Check Pod Last State:
NAME FINISHED
pod-6b4f75bb68-l8kn5 <none>
----------------------------------
Timestamp: Mon Sep 11 18:06:26 PDT 2023
Check for Pod:
NAME READY STATUS RESTARTS AGE
pod-6b4f75bb68-l8kn5 1/1 Running 0 145m
Check Pod Mem Usage Using Top:
NAME CPU(cores) MEMORY(bytes)
pod-6b4f75bb68-l8kn5 64m 1184Mi
Check Pod Mem Usage Using cgroup Memory:
4694MiB
Check Pod Last State:
NAME FINISHED
pod-6b4f75bb68-l8kn5 <none>
----------------------------------
Timestamp: Mon Sep 11 18:06:40 PDT 2023
Check for Pod:
NAME READY STATUS RESTARTS AGE
pod-6b4f75bb68-l8kn5 1/1 Running 1 (10s ago) 146m
Check Pod Mem Usage Using Top:
NAME CPU(cores) MEMORY(bytes)
pod-6b4f75bb68-l8kn5 1577m 3573Mi
Check Pod Mem Usage Using cgroup Memory:
789MiB
Check Pod Last State:
NAME FINISHED
pod-6b4f75bb68-l8kn5 Error
Here we can see the memory spiking between 18:05:45 and 18:06:26 (41 seconds), going from 1480MiB to 4694MiB. The limit is 4696Mi (4Gi + 600Mi), which was clearly breached just after the last sample, and the kernel killed the container (hence the restart visible at 18:06:40).
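The configured limit itself can be confirmed directly from the pod spec. A small sketch using the pod name from the log above, which should print the 4696Mi limit discussed here:
# Prints each container's memory limit exactly as written in the spec
kubectl get pod pod-6b4f75bb68-l8kn5 \
  -o jsonpath='{.spec.containers[*].resources.limits.memory}'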
So the underlying problem is an extreme memory spike that needs to be investigated. However, kubectl top, GKE metrics, and Prometheus all failed to report it: at a 60s collection rate, a spike that builds up and kills the pod in under a minute is never even sampled.
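To actually catch a spike like this, the sampling interval has to be well below the spike's duration. Below is a minimal sketch along the same lines as the script above: it samples the container's own cgroup accounting once per second. It assumes cgroup v1 and that the image ships sh, cat, date, and sleep; the log file name and 1-second interval are arbitrary choices:
# Stream one memory sample per second from inside the container into a local file
pod_name=$(kubectl get pods -l component=component_name -o custom-columns="NAME:.metadata.name" --no-headers=true)
kubectl exec "$pod_name" -- sh -c \
  'while true; do echo "$(date) $(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)"; sleep 1; done' \
  >> mem_1s.log
Note that the exec session ends whenever the pod restarts, so in practice this pairs well with the 10-second outer loop above, which notices the restart and records the last state.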