Liveness and Readiness Probes for Kubernetes in a Phoenix Application

How to implement healthchecks in Elixir for Docker containers?

Kamil Lelonek - Software Engineer


Many applications running for long periods of time eventually transition to broken states and cannot recover except by being restarted. When you run them inside Kubernetes, it provides liveness probes to detect and remedy such situations. Moreover, if your Container needs to load large data sets or configuration files, or run migrations during startup, specify a readiness probe.

Liveness vs Readiness Probes

Before we begin, let’s have a little bit of theory here.

kubelet

A kubelet is an agent that runs on each node in the cluster. It makes sure containers are running in a pod but it doesn’t manage containers which were not created by Kubernetes.

It takes a set of PodSpecs (provided e.g. as YAML files) and ensures that the containers described there are running and healthy. kubelet has basically one job: given a set of containers to run, make sure they are all running.

liveness

The kubelet uses liveness probes to know when to restart a Container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a Container in such a state can help to make the application more available despite bugs.

readiness

The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

A side note: both of these healthchecks can define initialDelaySeconds. If it is not set, both probes start counting at the same time, as soon as the container is started. If you want the livenessProbe to start after the readinessProbe (i.e. wait long enough for readiness to be likely verified first), you will need to adjust their initialDelaySeconds accordingly.

Kubernetes

The basic YAML template for these probes is really simple:

readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
livenessProbe:
  httpGet:
    path: /health/alive
    port: 3000

You just define a path and a port for HTTP healthchecks. As I said previously, you can also provide additional configuration for them:

  • initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated.
  • periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. The minimum value is 1.
  • timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. The minimum value is 1.
  • successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. The minimum value is 1.
  • failureThreshold: When a Pod starts and the probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of a liveness probe means restarting the container. In case of a readiness probe, the Pod will be marked Unready. Defaults to 3. The minimum value is 1.

What is more, httpGet probes have additional fields:

  • host: Hostname to connect to, defaults to the pod IP.
  • scheme: Scheme to use for connecting to the host (HTTP or HTTPS). Defaults to HTTP.
  • path: Path to access on the HTTP server.
  • httpHeaders: Custom headers to set in the request.
  • port: Name or number of the port to access on the container. The number must be in the range of 1 to 65535.

What is even more, there are not only httpGet probes but also TCP and command-based ones, but that’s already beyond the scope of this article.

Elixir

Once we know what Liveness and Readiness Probes are, and since we are able to define them for Kubernetes, let’s finally implement them in our code.

For that purpose, we will leverage the plug library. Plugs are composable middlewares that can be mounted in Controllers, as part of a Router, or defined for the entire Endpoint.
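A module plug is simply a module that implements two callbacks: init/1 and call/2. A minimal sketch of that contract (the module name here is made up for illustration):

defmodule MyApp.NoopPlug do
  @behaviour Plug

  # Receives the options given where the plug is mounted; whatever this
  # returns is passed as the second argument to call/2 on every request.
  @impl true
  def init(opts), do: opts

  # Receives the connection and must return a %Plug.Conn{};
  # this no-op version just passes every request through.
  @impl true
  def call(conn, _opts), do: conn
end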

Forwarding

The very first approach I suggest is to leverage forward/2 from Plug.Router.

With a simple function, you forward all requests on the given path to a particular Plug, in our case the liveness probe plug.
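A sketch of how this can look in a Phoenix router (Phoenix.Router exposes an analogous forward macro; Healthcheck.Liveness and Healthcheck.Readiness are placeholder names for whatever your probe plugs are called):

defmodule MyAppWeb.Router do
  use Phoenix.Router

  # All requests under these paths are handed over to the probe plugs,
  # bypassing the regular browser/API pipelines.
  forward "/health/alive", Healthcheck.Liveness
  forward "/health/ready", Healthcheck.Readiness

  # ... regular pipelines and scopes go here
end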

Mounting

The other way is to mount a Plug directly inside your router.
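In a bare Plug.Router that is a single plug macro call; in a Phoenix application, one workable placement for a path-matching plug like this is the endpoint, ahead of the router, so that every request passes through it. A sketch with the same placeholder names as before:

defmodule MyAppWeb.Endpoint do
  use Phoenix.Endpoint, otp_app: :my_app

  # The probe plugs answer requests to their configured paths and
  # pass every other request on to the router below.
  plug Healthcheck.Liveness, path: "/health/alive"
  plug Healthcheck.Readiness, path: "/health/ready"

  plug MyAppWeb.Router
end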

It’s basically as simple as that: there’s no need for any additional configuration, assuming your Plug handles requests correctly.

Configuration

I always like to have my libraries/dependencies configurable. Thus, we can provide both the liveness path and response in our configuration:
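For example, something along these lines in config.exs (the :healthchecks application name and the configuration keys are assumptions made for this sketch, not names dictated by any particular library):

# config/config.exs
import Config

config :healthchecks,
  liveness_path: "/health/alive",
  readiness_path: "/health/ready",
  # body returned by a successful probe response
  response: "OK"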

Later on, in our module, we can fetch them like that:
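A sketch of that, assuming the same :healthchecks keys as above; call/2 is filled in later, in the Response section:

defmodule Healthcheck.Liveness do
  @behaviour Plug

  @impl true
  def init(opts) do
    # Fetch the configuration, falling back to defaults if nothing is set,
    # then turn the opts keyword list into a map with the keys call/2 needs.
    default_path = Application.get_env(:healthchecks, :liveness_path, "/health/alive")
    default_response = Application.get_env(:healthchecks, :response, "OK")

    %{
      path: Keyword.get(opts, :path, default_path),
      response: Keyword.get(opts, :response, default_response)
    }
  end

  # call/2 is implemented in the Response section below.
end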

What we do here is fetch our configuration, with some defaults if nothing is found; then we turn the opts list into a map with specific keys, which will be used in the call/2 function too.

Keep in mind that when you include an external library, its configuration must be provided in your project’s own config.exs file.

Response

Finally, we have to implement a simple function that responds to the healthcheck. The response will simply be 200 "OK", but it is of course configurable.
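One way to sketch it, as the remaining part of the Healthcheck.Liveness module started above (the multi-clause pattern match is just one way to do the comparison):

  # Needed for send_resp/3 and halt/1; normally placed at the top of the module.
  import Plug.Conn

  # The request path matches the configured one: halt the plug pipeline
  # and answer the probe right away.
  @impl true
  def call(%Plug.Conn{request_path: path} = conn, %{path: path, response: response}) do
    conn
    |> send_resp(200, response)
    |> halt()
  end

  # Any other request is passed through untouched.
  def call(conn, _opts), do: conn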

As you can see, we check the actual request_path of the incoming conn. If it matches the one configured previously (either via config.exs or the path option), we halt the connection and return a successful response. Otherwise, we pass the request through.

If you want to see more usage examples and the readiness probe definition, the entire code is available here:

You can use it as a dependency for your project and include both healthchecks in your application’s Router.


Summary

To sum up, I’d like to share with you some best practices regarding Liveness and Readiness Probes.

  • Avoid checking dependencies in liveness probes. Liveness probes should be inexpensive and have response times with minimal variance.
  • The initialDelaySeconds parameter should be longer than the maximum initialization time for the container.
  • Regularly restart containers to exercise startup dynamics and to avoid unexpected behavioral changes during initialization.
  • If the container evaluates a shared dependency in the Readiness probe, set its timeout longer than the maximum response time for that dependency.

I hope you will find these Plugs useful and leverage them in your own applications.
