Recently I faced a situation where one of the services I work with in a Kubernetes cluster was getting restarted about once an hour on average. Checking the logs didn’t lead to anything useful, apart from showing that the JVM process had suddenly been killed with no further error. To make matters worse, the issue happened at random intervals. For instance, one time the container failed after five minutes; another time it ran for a couple of hours.
By checking the Kubernetes pod status, I found that at some point the pod status had changed from Running to OOM Killed. The next step was to take a heap dump, which is available at /heapdump in the actuator if it is not disabled.
Analyzing the heap dump didn’t shed any light on the issue. I analyzed several heap dumps, and each time I got more confused and couldn’t reach a sensible conclusion.
As a last resort, before giving up, I took a quick look at the pod configuration and saw this:
jvm:
  minHeapSize: 256m
  maxHeapSize: 256m
resources:
  cpu: 300m
  memory: 512Mi
So basically, the pod runs on 512 MiB of RAM, half of which is reserved for the JVM heap. That rang a bell: Linux might have killed the process. To confirm my hypothesis, I had a look at the kernel log with dmesg and found this:
[4368955.387661] Memory cgroup out of memory: Kill process 4883 (java) score 11 or sacrifice child
[4368955.393077] Killed process 4883 (java) total-vm:1925104kB, anon-rss:516924kB, file-rss:10852kB, shmem-rss:0kB
[4368955.966522] oom_reaper: reaped process 4883 (java), now anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
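The numbers in that log can be checked against the pod limit with a little arithmetic. The following is a minimal sketch, not part of the original investigation: the class name OomLogCheck and the helper totalRssKb are hypothetical, and it assumes that the kernel's kB values are units of 1024 bytes. The memory cgroup also charges page cache and some kernel memory, so summing the RSS fields is only a rough comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OomLogCheck {
    // Matches fields such as "anon-rss:516924kB" in the kernel's "Killed process" line.
    private static final Pattern RSS_FIELD =
            Pattern.compile("(anon-rss|file-rss|shmem-rss):(\\d+)kB");

    /** Sums the anon, file and shmem RSS values (in kB) found in one log line. */
    static long totalRssKb(String logLine) {
        Matcher m = RSS_FIELD.matcher(logLine);
        long total = 0;
        while (m.find()) {
            total += Long.parseLong(m.group(2));
        }
        return total;
    }

    public static void main(String[] args) {
        // The "Killed process" line from the dmesg output above.
        String line = "[4368955.393077] Killed process 4883 (java) total-vm:1925104kB, "
                + "anon-rss:516924kB, file-rss:10852kB, shmem-rss:0kB";
        long limitKb = 512L * 1024; // memory: 512Mi from the pod configuration
        long rssKb = totalRssKb(line);
        System.out.printf("rss=%d kB, limit=%d kB, over limit: %b%n",
                rssKb, limitKb, rssKb > limitKb);
        // prints: rss=527776 kB, limit=524288 kB, over limit: true
    }
}
```

The resident memory of the java process at the time of the kill was about 515 MiB, right at the 512Mi limit, even though the heap was capped at 256m.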
So what’s the problem?
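Well, the Linux kernel's Out Of Memory Management includes the OOM Killer, which activates when the kernel faces a severe low-memory condition. The kernel then literally kills a process, usually the one that uses the most memory. Here, the "Memory cgroup out of memory" line in the log shows that the limit that was hit was the container's own memory limit.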
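A JVM with a 256m heap can still grow past that figure, because the heap is only one part of the process footprint. As a rough illustration, the sketch below uses the standard java.lang.management API to print heap and non-heap usage from inside the application. The class name JvmMemoryReport is hypothetical. The non-heap figure covers pools such as Metaspace and the code cache, but not thread stacks or other native allocations, so the real resident size is higher than the sum shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class JvmMemoryReport {
    /** Converts bytes to whole MiB (an undefined max of -1 becomes 0). */
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    /** Builds a one-line summary of heap and non-heap memory as seen by the JVM. */
    static String report() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
        return String.format(
                "heap committed=%d MiB (max=%d MiB), non-heap committed=%d MiB",
                toMiB(heap.getCommitted()),
                toMiB(heap.getMax()),
                toMiB(nonHeap.getCommitted()));
    }

    public static void main(String[] args) {
        // Values depend on the JVM flags and the workload, so no fixed output is shown.
        System.out.println(report());
    }
}
```

Comparing these numbers with the container limit over time gives an idea of how much headroom is left for everything that lives outside the heap.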
Solution
The simplest fix is to increase the pod memory limit and enable swap. Other than that, if the application does not have the
References
- https://www.kernel.org/doc/gorman/html/understand/understand016.html
- https://plumbr.io/blog/memory-leaks/out-of-memory-kill-process-or-sacrifice-child
- https://stackoverflow.com/questions/726690/what-killed-my-process-and-why
- https://unix.stackexchange.com/questions/136291/will-linux-start-killing-my-processes-without-asking-me-if-memory-gets-short
- https://geekyhacker.com/2019/01/04/jvm-does-not-release-memory/