
Crashlooping Pod Deleter

The problem:

  • You have one or more pods stuck in state CrashLooping - Kubernetes keeps restarting them because they fail, but restarting does not appear to have any effect.

  • You cannot fix the underlying problem - or at least not yet. Perhaps the root cause is outside of your control.

  • You "know" that actually deleting the pod and letting Kubernetes create a new replacement pod is likely to fix the problem.

  • Manually finding (and restarting) such pods is cumbersome. Sure, you can write a script to do it, but you still need to be notified that pods are crashlooping, find out which ones, and then run that script - possibly in the middle of the night, probably while you are in the middle of something else. It gets annoying. (A rough sketch of this manual approach follows below.)
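
For illustration, here is a minimal sketch of the manual approach this chart automates (the pod and namespace names are made up):

    # List pods with containers currently in CrashLoopBackOff
    # (the grep is a crude but common first pass):
    kubectl get pods --all-namespaces | grep CrashLoopBackOff

    # ...and delete one of them by hand:
    kubectl delete pod my-crashlooping-pod -n my-namespace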

This Helm chart creates a job which does that automatically.

Note that this will not solve the underlying problem of "why are the pods crashlooping in the first place" - that is still up to you. But it can make life easier until you sort that out.

Neither is there any guarantee that deleting the pod and letting Kubernetes start a new pod will actually (even temporarily) solve the problem. But it is a decent first attempt, especially if your priority is to keep a system running.

Installation

For most installations, the default values.yaml should suffice: it will delete CrashLooping pods in the installation namespace every 10 minutes.
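
For example, a minimal install could look like this (the release name, chart path and namespace are illustrative - adjust them to your checkout and cluster):

    # Install with the default values, assuming a local checkout of this chart:
    helm install crashloop-deleter . --namespace my-namespace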

The main ways to tweak the behaviour are (see the sketch after this list):

  • selector: A selector (with the same syntax as kubectl get pod --selector ...) limiting which pods are examined. By default, all pods are examined.

  • allNamespaces: whether to examine pods in all namespaces, rather than just the installation namespace.
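
Both can be overridden at install time; as a sketch (the release name, chart path and selector value are assumptions):

    # Examine only pods matching a label selector, across all namespaces:
    helm upgrade --install crashloop-deleter . \
        --set selector="app=flaky-service" \
        --set allNamespaces=true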

If you have a more complicated setup, you can always install multiple instances of this Helm chart to cover the different combinations.
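
For instance, two releases covering different workloads might look like this (the release names and selectors are hypothetical):

    # One instance per workload group, each with its own selector:
    helm install crashloop-deleter-frontend . --set selector="tier=frontend"
    helm install crashloop-deleter-backend . --set selector="tier=backend"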

Why Do Pods CrashLoop?

There can be a plethora of reasons why pods end up crashlooping - it is generally very application-specific.

Your first step should be to run kubectl describe pod ... to get an idea of why Kubernetes decided to restart a container in the pod.
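
For example (the pod and namespace names are assumptions):

    # See events and restart reasons for a suspect pod:
    kubectl describe pod my-crashlooping-pod -n my-namespace

    # The logs of the previous (failed) container instance often hold the actual error:
    kubectl logs my-crashlooping-pod -n my-namespace --previous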

The key thing to remember is that when a container fails, Kubernetes restarts that individual container - it is not the pod which is restarted. And the restart is (basically) just done by having the container runtime restart the process (or processes) the container runs. Most of the time this is perfectly adequate, but there are corner cases where it is not.
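
You can see this reflected in the pod's status: the pod object stays put while the per-container restart counts climb. A quick way to check (the pod name is made up):

    # Show the restart count for each container in a pod:
    kubectl get pod my-crashlooping-pod \
        -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'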

There are some common patterns:

  • Multiple containers (or initContainers) which depend on things happening in a certain order. initContainers do run sequentially, in the order they are declared, but the regular containers in a pod come with no such ordering guarantee: Kubernetes will generally start them as soon as possible. And if one container somehow depends on another container not having done something yet, the non-deterministic nature of computer systems can get in the way.

  • If the container requires access to a disk device (e.g. /dev/sdb or similar) which gets disconnected (e.g. via a USB bus reset), the device is "lost" as far as the pod is concerned.

  • And sometimes the application is just terminally broken, and needs a "bigger kick" than merely restarting a container.

Enjoy!

Karl E. Jørgensen karl@jorgensen.org.uk