Liveness and Readiness Probes for Kubernetes in a Phoenix Application
How to implement healthchecks in Elixir for Docker containers?
Many applications that run for long periods of time eventually transition to broken states from which they cannot recover except by being restarted. When you run them inside Kubernetes, you can use liveness probes to detect and remedy such situations. Moreover, if your container has to load large data sets or configuration files, or run migrations during startup, you should also specify a readiness probe.
Liveness vs Readiness Probes
Before we begin, let’s go over a little theory.
kubelet
The kubelet is an agent that runs on each node in the cluster. It makes sure containers are running in a Pod, but it doesn’t manage containers that were not created by Kubernetes.
It takes a set of PodSpecs (e.g. from YAML files) and ensures that the containers described there are running and healthy. The kubelet has basically one job: given a set of containers to run, make sure they are all running.
liveness
The kubelet uses liveness probes to know when to restart a container. For example, a liveness probe could catch a deadlock, where an application is running but unable to make progress. Restarting a container in such a state can help make the application more available despite bugs.
readiness
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services: when a Pod is not ready, it is removed from Service load balancers.
A side note: both of these healthchecks can define initialDelaySeconds. It defaults to 0, so if you leave it undefined, both probes start at the same time, as soon as the container has started. If you want the livenessProbe to start after the readinessProbe (i.e. to wait long enough for readiness to be verified first), you need to adjust their initialDelaySeconds.
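For illustration, the two delays could be staggered as in the sketch below. The 5- and 30-second values are assumptions. Pick yours based on how long your application actually takes to start.

```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 5    # start checking readiness early
livenessProbe:
  httpGet:
    path: /health/alive
    port: 3000
  initialDelaySeconds: 30   # start liveness checks well after readiness checks
```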
Kubernetes
The basic YAML template for these probes is really very simple:
```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
livenessProbe:
  httpGet:
    path: /health/alive
    port: 3000
```
For HTTP healthchecks, you only need to define path and port. As I said previously, you can also configure the probes further with:
- initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0.
- periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. The minimum value is 1.
- timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. The minimum value is 1.
- successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. The minimum value is 1.
- failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in the case of a liveness probe means restarting the container; in the case of a readiness probe, the Pod is marked Unready. Defaults to 3. The minimum value is 1.
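The sketch below sets every field above explicitly on a liveness probe. The values are illustrative (most are simply the defaults), not recommendations.

```yaml
livenessProbe:
  httpGet:
    path: /health/alive
    port: 3000
  initialDelaySeconds: 30  # wait 30 s after the container starts
  periodSeconds: 10        # probe every 10 s (the default)
  timeoutSeconds: 1        # a probe taking longer than 1 s counts as failed (the default)
  successThreshold: 1      # must be 1 for liveness probes
  failureThreshold: 3      # restart the container after 3 consecutive failures (the default)
```

With these values, a container that stops responding is restarted only after three consecutive failed probes. That is on the order of failureThreshold × periodSeconds = 3 × 10 = 30 seconds after it starts failing.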
What is more, httpGet probes have additional fields:
- host: Hostname to connect to. Defaults to the Pod IP.
- scheme: Scheme to use for connecting to the host (HTTP or HTTPS). Defaults to HTTP.
- path: Path to access on the HTTP server.
- httpHeaders: Custom headers to set in the request.
- port: Name or number of the port to access on the container. The number must be in the range of 1 to 65535.
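As a sketch, a readiness probe using these extra fields might look like the example below. host is omitted, so it falls back to the Pod IP. The named port http assumes the container declares a port with name: http, and the X-Health-Check header is a hypothetical example.

```yaml
readinessProbe:
  httpGet:
    scheme: HTTP
    path: /health/ready
    port: http              # a named containerPort instead of a number
    httpHeaders:
      - name: X-Health-Check  # hypothetical header, e.g. to filter probe requests from logs
        value: kubelet
```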
What is even more, besides httpGet probes there are also TCP (tcpSocket) and command (exec) ones, but I’ve already told you more than is needed for the scope of this article.
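For completeness, here is a minimal sketch of the other two probe types. A tcpSocket probe succeeds if the kubelet can open a TCP connection to the port. An exec probe succeeds if the command exits with status 0. The /tmp/ready marker file is a hypothetical example.

```yaml
livenessProbe:
  tcpSocket:
    port: 3000       # healthy as long as the port accepts connections
readinessProbe:
  exec:
    command:         # ready once this command exits with status 0
      - cat
      - /tmp/ready
```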
Elixir
Now that we know what liveness and readiness probes are and how to define them in Kubernetes, let’s finally implement them in our code.
For that purpose, we will use the plug library. Plugs are composable middleware that can be mounted in controllers, as part of a Router, or for the entire Endpoint.
Forwarding
The very first approach I suggest is to use forward/2 from Plug.Router.
With a single call, you forward all requests on the given path to a particular Plug. In our case, it’s the liveness probe.
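Below is a minimal sketch of forwarding with Plug.Router. It runs as a standalone script (Mix.install/1 needs Elixir 1.12+). The module names MyAppWeb.LivenessPlug and MyAppWeb.HealthRouter are hypothetical. The plug answers every request it receives, because forward/2 only hands it requests under /health/alive.

```elixir
# Standalone sketch: run with `elixir forward.exs` (Mix.install/1 needs Elixir 1.12+).
Mix.install([{:plug, "~> 1.14"}])

defmodule MyAppWeb.LivenessPlug do
  # Hypothetical liveness plug: it answers every request it receives,
  # because the router only forwards requests under /health/alive to it.
  @behaviour Plug
  import Plug.Conn

  @impl true
  def init(opts), do: opts

  @impl true
  def call(conn, _opts) do
    conn
    |> send_resp(200, "OK")
    |> halt()
  end
end

defmodule MyAppWeb.HealthRouter do
  use Plug.Router

  plug :match
  plug :dispatch

  # Hand every request on this path over to the liveness plug.
  forward "/health/alive", to: MyAppWeb.LivenessPlug

  # Anything else is not a health check.
  match _ do
    send_resp(conn, 404, "Not Found")
  end
end
```

Phoenix.Router provides a similar forward macro if you prefer to keep this in your Phoenix router.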
Mounting
The other option is to mount a Plug directly inside your router.
It’s basically as simple as that. No additional configuration is needed, provided that your Plug handles requests correctly, i.e. answers only its own path and passes every other request through.
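Here is a sketch of the mounting approach, again as a standalone script with hypothetical module names. A mounted plug sees every request, so it has to check the path itself and leave other requests alone. In a Phoenix application, the same plug line would typically go in your Endpoint or Router.

```elixir
# Standalone sketch (Elixir 1.12+ for Mix.install/1); module names are hypothetical.
Mix.install([{:plug, "~> 1.14"}])

defmodule MyAppWeb.LivenessPlug do
  @behaviour Plug
  import Plug.Conn

  # The path to answer on is the only option in this sketch.
  @impl true
  def init(opts), do: Keyword.get(opts, :path, "/health/alive")

  # Using `path` twice in the head makes this clause match only the configured path.
  @impl true
  def call(%Plug.Conn{request_path: path} = conn, path) do
    conn
    |> send_resp(200, "OK")
    |> halt()
  end

  # Every other request continues down the pipeline untouched.
  def call(conn, _path), do: conn
end

defmodule MyAppWeb.Router do
  use Plug.Router

  # Mounted before :match, so health checks never reach the routes below.
  plug MyAppWeb.LivenessPlug, path: "/health/alive"
  plug :match
  plug :dispatch

  get "/" do
    send_resp(conn, 200, "Hello")
  end

  match _ do
    send_resp(conn, 404, "Not Found")
  end
end
```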
Configuration
I always like my libraries and dependencies to be configurable. Thus, we can provide both the liveness path and the response in our configuration:
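Such configuration could look like the sketch below. The :my_app application name, the MyAppWeb.LivenessPlug key and the option names are assumptions, not the actual library’s settings.

```elixir
# config/config.exs of the application that uses the plug
import Config

# Hypothetical keys: replace :my_app and the module with your own names.
config :my_app, MyAppWeb.LivenessPlug,
  path: "/health/alive",
  response: "OK"
```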
Later on, in our module, we can fetch them like this:
Here we fetch our configuration, falling back to defaults if nothing is found, and then turn the opts list into a map with specific keys that the call/2 function will use too.
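Below is a sketch of an init/1 that does this. It assumes the configuration lives under a hypothetical :my_app application and MyAppWeb.LivenessPlug key. Options passed to plug take precedence over config.exs, and config.exs takes precedence over the defaults; that ordering is a design choice of this sketch.

```elixir
# Standalone sketch (Elixir 1.12+ for Mix.install/1). The :my_app app name,
# the module name and the option keys are hypothetical.
Mix.install([{:plug, "~> 1.14"}])

defmodule MyAppWeb.LivenessPlug do
  @behaviour Plug

  @default_path "/health/alive"
  @default_response "OK"

  @impl true
  def init(opts) do
    # Settings from config.exs, or an empty list when nothing is configured.
    config = Application.get_env(:my_app, __MODULE__, [])

    # Options given to `plug` win over config.exs, which wins over the defaults.
    %{
      path: Keyword.get(opts, :path, Keyword.get(config, :path, @default_path)),
      response: Keyword.get(opts, :response, Keyword.get(config, :response, @default_response))
    }
  end

  # Request handling is left out of this sketch; the conn passes through unchanged.
  @impl true
  def call(conn, _opts), do: conn
end
```

Keep in mind that Plug.Builder calls init/1 at compile time by default, so values from config.exs are read when the router is compiled.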
Keep in mind that when you include an external library, its configuration must be provided in your project’s own config.exs file, because the config files of dependencies are not loaded.
Response
Finally, we have to implement a simple function that responds to the healthcheck. The response will simply be 200 "OK", but it can be configured, of course.
We check the actual request_path of the incoming conn. If it matches the one configured previously (either via config.exs or by the path option), we return a successful response and halt the connection. Otherwise, we pass the request through.
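Below is a sketch of a call/2 that implements this behaviour. To keep the script self-contained, it uses a simplified init/1 (defaults and plug options only, no config.exs lookup). The module and option names are hypothetical.

```elixir
# Standalone sketch (Elixir 1.12+ for Mix.install/1); names are hypothetical.
Mix.install([{:plug, "~> 1.14"}])

defmodule MyAppWeb.LivenessPlug do
  @behaviour Plug
  import Plug.Conn

  # Simplified: only plug options and defaults, no config.exs lookup.
  @impl true
  def init(opts) do
    %{
      path: Keyword.get(opts, :path, "/health/alive"),
      response: Keyword.get(opts, :response, "OK")
    }
  end

  # The request path equals the configured one: respond and stop the pipeline.
  @impl true
  def call(%Plug.Conn{request_path: path} = conn, %{path: path, response: response}) do
    conn
    |> send_resp(200, response)
    |> halt()
  end

  # Any other request passes through untouched.
  def call(conn, _opts), do: conn
end
```

Because the path variable appears twice in the first clause’s head, that clause only matches when request_path equals the configured path. Everything else falls through to the second clause.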
If you want to see more usage examples and the readiness probe definition, the entire code is available here:
You can use it as a dependency for your project and include both healthchecks in your application’s Router.
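The readiness probe from the repository mentioned above is not reproduced here. As one possible approach (an assumption, not the library’s code), a readiness plug can answer 503 until the application explicitly marks itself ready. For example, it could do so in your Application.start/2 callback once migrations or data loading have finished. The kubelet treats only statuses from 200 to 399 as success, so the 503 keeps the Pod out of Service load balancers. MyAppWeb.ReadinessPlug and mark_ready/0 are hypothetical names.

```elixir
# Standalone sketch (Elixir 1.12+ for Mix.install/1; :persistent_term needs OTP 21.2+).
Mix.install([{:plug, "~> 1.14"}])

defmodule MyAppWeb.ReadinessPlug do
  # Hypothetical readiness plug: 503 until mark_ready/0 is called, 200 afterwards.
  @behaviour Plug
  import Plug.Conn

  @key {__MODULE__, :ready}

  # Call this once startup work (migrations, data loading, ...) is done.
  def mark_ready, do: :persistent_term.put(@key, true)

  # Not ready until someone has called mark_ready/0.
  def ready?, do: :persistent_term.get(@key, false)

  @impl true
  def init(opts), do: Keyword.get(opts, :path, "/health/ready")

  # Answer only the configured path; the status depends on the readiness flag.
  @impl true
  def call(%Plug.Conn{request_path: path} = conn, path) do
    {status, body} = if ready?(), do: {200, "OK"}, else: {503, "Not ready"}

    conn
    |> send_resp(status, body)
    |> halt()
  end

  def call(conn, _path), do: conn
end
```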
Summary
To sum up, I’d like to share with you some best practices for liveness and readiness probes.
- Avoid checking dependencies in liveness probes. Liveness probes should be inexpensive and have response times with minimal variance.
- The initialDelaySeconds parameter should be longer than the maximum initialization time for the container.
- Regularly restart containers to exercise startup dynamics and to avoid unexpected behavioral changes during initialization.
- If the container evaluates a shared dependency in the Readiness probe, set its timeout longer than the maximum response time for that dependency.
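The sketch below applies these recommendations. All values are assumptions to be replaced with measurements from your own application.

```yaml
livenessProbe:
  httpGet:
    path: /health/alive     # cheap check: no database or external calls
    port: 3000
  initialDelaySeconds: 60   # longer than the slowest expected startup
readinessProbe:
  httpGet:
    path: /health/ready     # may evaluate shared dependencies
    port: 3000
  timeoutSeconds: 5         # longer than the dependency's maximum response time
```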
I hope you will find these Plugs useful and use them in your own applications.
Further reading:
- https://medium.com/metrosystemsro/kubernetes-readiness-liveliness-probes-best-practices-86c3cd9f0b4a
- https://blog.colinbreck.com/kubernetes-liveness-and-readiness-probes-how-to-avoid-shooting-yourself-in-the-foot/
- https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-setting-up-health-checks-with-readiness-and-liveness-probes