Kubernetes – an introduction

I decided to start learning about Kubernetes (k8s) although I have not used it in production yet. These are my notes from the first part of the comprehensive book “Kubernetes in Action”. The second part gets into k8s’s internals. I will write about that as I read more of the book.

  • k8s abstracts away the individual machines of a cluster. The cluster’s aggregate hardware capacity is exposed as a single surface on which developers deploy and scale their containerized apps. For more on containers themselves, see my post: Docker — a conceptual overview.

  • k8s also promises automatic restarts and relocation of apps on the cluster. To fulfil these ends, it encourages an opinionated usage of containers: only one process per container! But since many apps are not designed with this level of isolation in mind, k8s breaks the abstraction provided by containers via its notion of Pods — a pod is a group of containers that share resources (typically the network) and are always scheduled to run on the same machine. This makes it possible to run apps that would otherwise launch multiple processes inside one container: each process gets its own container, and the containers are grouped into the same pod.
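As a sketch (the names and images here are all made up), a two-process app split across two containers in one pod might look like this — both containers share the pod’s network namespace, so the sidecar can reach the main app on localhost:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar             # illustrative name
spec:
  containers:
  - name: web
    image: example.com/web:1.0       # placeholder image
    ports:
    - containerPort: 8080
  - name: log-shipper                # a second process in its own container
    image: example.com/shipper:1.0   # placeholder image
```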

  • k8s can also scale your app by launching new instances dynamically. To be precise, it launches entire new Pods as needed. To keep clients unaware of how many pods exist and where they run, an additional level of indirection called a Service is wrapped around the set of pods.

  • A Service provides a permanent host:port endpoint to reach your app. The Service will load balance requests to your app across one or more underlying pods. The Service IP address is a “virtual IP” (you can’t ping it for example).

  • A Service can also be configured as “headless” in which case a DNS lookup returns the IP addresses of all its associated pods.
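A minimal Service definition (names invented for illustration) might look like the following; setting `clusterIP: None` would make it headless instead:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kubia          # illustrative name
spec:
  # clusterIP: None    # uncomment to make this a headless service
  selector:
    app: kubia         # routes to pods carrying this label
  ports:
  - port: 80           # the Service's stable port
    targetPort: 8080   # the port the pod's container listens on
```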

  • You can tag pods, services, and other resources with “labels”. Most commands can operate on all resource instances carrying a given label. This can be used to mark resources as prod/dev/stage or apply any other categorization scheme you need. In fact, you can even label worker nodes, and then schedule pods only on nodes having a given label.
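For instance (label keys and values invented for illustration), a pod can be labelled and pinned to labelled nodes like so:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker
  labels:
    env: prod           # arbitrary key/value pairs
    app: trainer
spec:
  nodeSelector:
    gpu: "true"         # schedule only on nodes labelled gpu=true
  containers:
  - name: trainer
    image: example.com/trainer:1.0   # placeholder image
```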

  • Apart from Labels, k8s also provides “namespaces”, which are a scoping mechanism for names (similar to namespaces in programming languages). Namespaces help split up large clusters into distinct parts so that both name clashes and resource clashes are easily averted. They can also act as a way to implement resource usage quotas.

  • As mentioned earlier, k8s can check whether an app (to be precise: a pod) is still up and running. This can be customized to some extent via “Liveness Probes” that are defined for the pod. Typically, the probe is an HTTP or TCP endpoint that returns a static response, or performs some operation and returns success if the operation succeeds. If the probe keeps failing, k8s restarts the container.

  • Closely related are “Readiness Probes” which indicate whether a pod is ready to service user requests. While a lack of Liveness will lead to a pod being restarted, a lack of Readiness just defers user requests. It is recommended that pods always have a readiness probe.
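A sketch of both probes on a single container (the paths and ports are assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probed-app
spec:
  containers:
  - name: app
    image: example.com/app:1.0   # placeholder image
    livenessProbe:               # failing this gets the container restarted
      httpGet:
        path: /healthz           # assumed health endpoint
        port: 8080
      initialDelaySeconds: 15    # give the app time to boot first
    readinessProbe:              # failing this just removes the pod from
      httpGet:                   # the Service's endpoints (no restart)
        path: /ready             # assumed readiness endpoint
        port: 8080
```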

  • But what if an entire node crashes and takes down all its pods? This is where k8s provides a management component called the ReplicationController (RC). This component keeps checking that the desired number of pods is up and running. To take advantage of this feature, you should start your pods via an RC so that it is aware of them. Creating pods directly via the “kubectl” command bypasses the RC and is not recommended in production.

  • Using the RC is as simple as defining an RC with a matching label selector and the desired number of replicas at the top of the k8s YAML file, followed by a pod template that carries the same labels the RC selects on. Since pods can be (re)labelled while running, it’s possible to dynamically reassign them from one RC to another. Conversely, the label selector of an RC can be edited dynamically as well.
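A sketch of such an RC definition (names and image are illustrative):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: kubia-rc
spec:
  replicas: 3           # desired number of pod copies
  selector:
    app: kubia          # must match the template's labels below
  template:             # pod template the RC stamps out as needed
    metadata:
      labels:
        app: kubia
    spec:
      containers:
      - name: kubia
        image: example.com/kubia:1.0   # placeholder image
```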

  • ReplicaSets (RS) are a newer alternative to ReplicationControllers and the preferred method now. RS have more expressive label selectors compared to RCs. The expression matcher section of an RS definition is an ugly version of a SQL WHERE clause.
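The more expressive selector looks roughly like this (names invented for illustration):

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: kubia-rs
spec:
  replicas: 3
  selector:
    matchExpressions:          # the WHERE-clause-like selector form
    - key: app
      operator: In             # also: NotIn, Exists, DoesNotExist
      values:
      - kubia
  template:
    metadata:
      labels:
        app: kubia
    spec:
      containers:
      - name: kubia
        image: example.com/kubia:1.0   # placeholder image
```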

  • A DaemonSet is a controller that runs one instance of the specified pod template on all nodes in the cluster (or the nodes matching the label selector). Useful for running system level processes.

  • A “Job” is a resource that runs a pod to completion, one or more times (sequentially or in parallel); unlike regular pods, its pods are not restarted after finishing successfully. Useful for running batch jobs. There is a CronJob variant for scheduled runs as well.
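A sketch of a parallel batch Job (names and image are assumptions):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-import
spec:
  completions: 5               # run the pod to completion 5 times total
  parallelism: 2               # at most 2 pods running at once
  template:
    spec:
      restartPolicy: OnFailure # Jobs require OnFailure or Never
      containers:
      - name: importer
        image: example.com/importer:1.0   # placeholder image
```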

  • A “Service” is a single, consistent entry point (host:port) to a group of pods providing some service. Since pods are ephemeral and can be relocated, a Service becomes necessary. Internally, k8s extracts the pod-related info into a separate Endpoints resource, which is what is actually consulted when clients connect.

  • A Service is linked to its pods via label selectors. k8s also injects env vars for every existing service into each container as it starts. Eg: “service-1” generates SERVICE_1_SERVICE_HOST and SERVICE_1_SERVICE_PORT. Pods can use these to discover services.

  • But a better way is to use k8s’s inbuilt DNS mechanism. Broadly, the pod’s /etc/resolv.conf is set up so that services resolve by name, e.g. service-name.namespace.svc.cluster.local (with the default cluster domain).

  • There are a variety of mechanisms for exposing k8s services to clients outside the cluster (NodePort, LoadBalancer and Ingress resources). These mechanisms may integrate with the proprietary infrastructure provided by the cloud providers (e.g. provisioning a cloud load balancer).
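The simplest of these, NodePort, might look like this sketch (names and the chosen port are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kubia-nodeport
spec:
  type: NodePort         # expose the service on a port of every node
  selector:
    app: kubia
  ports:
  - port: 80             # the Service's internal cluster port
    targetPort: 8080     # the container's port
    nodePort: 30123      # reachable externally at <any-node-ip>:30123
```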

  • “Volumes” are k8s’s way of handling data storage. You can mount the same storage subsystem on multiple different pods. Persistent Volumes (PVs) are used to persist data. A variety of storage tech is supported: GCP, AWS, NFS etc. k8s admins can create PVs and make them available to users. This decouples pods from having to specify details of the storage tech. In this case, pods point to a separate PersistentVolumeClaim (PVC) resource instead of directly to the volume.

  • In practice, one more level of indirection (and convenience) is used: k8s admins can enable PV creation on demand by setting up relevant Storage Classes (SCs). Pods refer only to these SCs and get the relevant storage allocated. This drastically simplifies the PV workflow for the org.
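A hypothetical claim against an admin-defined storage class, and a pod mounting it (the class name, sizes and image are all assumptions):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  storageClassName: fast       # assumed admin-defined StorageClass
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi             # PV gets provisioned on demand by the SC
---
apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
  - name: db
    image: example.com/db:1.0  # placeholder image
    volumeMounts:
    - name: data
      mountPath: /var/lib/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim    # the pod refers to the claim, not the PV
```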

  • ConfigMaps and Secrets are the preferred way to handle configuration and secret management in k8s. But for config, there are a few basic mechanisms as well: overriding the command and its args specified in the Dockerfile, using the “env” field in the YAML to pass along env vars, or loading them from a file. A ConfigMap is a k8s resource that is created separately; the pod spec can then consume it through any of the above channels.
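As a sketch (keys and names invented), a ConfigMap consumed as an env var:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  log-level: debug             # arbitrary key/value config entries
---
apiVersion: v1
kind: Pod
metadata:
  name: configured-app
spec:
  containers:
  - name: app
    image: example.com/app:1.0 # placeholder image
    env:
    - name: LOG_LEVEL
      valueFrom:
        configMapKeyRef:       # pull the value out of the ConfigMap
          name: app-config
          key: log-level
```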

  • Apps are updated to a new version using a k8s resource called Deployment. It provides a declarative way to roll out the new version, and is managed by the k8s master so it’s fault tolerant. Deployments create the necessary ReplicaSets behind the scenes. A Deployment is updated by just changing its associated container image, and specifying one of the built-in rollout strategies (RollingUpdate or Recreate; patterns like blue-green can be built on top). There is an “undo” command to roll back to the previous version. And a pause/resume feature for manual checks and canaries.
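A minimal Deployment sketch (names and image are illustrative); changing the image tag is what triggers a rollout:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubia-deploy
spec:
  replicas: 3
  strategy:
    type: RollingUpdate        # replace pods gradually (vs. Recreate)
  selector:
    matchLabels:
      app: kubia
  template:
    metadata:
      labels:
        app: kubia
    spec:
      containers:
      - name: kubia
        image: example.com/kubia:1.1   # bumping this tag triggers a rollout
```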

  • StatefulSets help a pod retain the same identity across restarts, where identity is the tuple (pod name, hostname, attached storage). Note that this hostname is independent of the physical node where the pod is restarted. StatefulSets work by i) giving each pod a stable, ordinal name each time it’s started, ii) maintaining a per-pod DNS entry, via a governing headless Service, that always points to the pod’s current IP address, and iii) generating pod-specific PersistentVolumeClaims from a volume claim template, instead of all pods sharing exactly identical claims.
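These three mechanisms appear as follows in a StatefulSet sketch (names, image and sizes are assumptions):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless     # governing headless Service (assumed to exist)
  replicas: 2                  # yields stably-named pods db-0 and db-1
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: example.com/db:1.0   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumeClaimTemplates:        # one PVC per pod (data-db-0, data-db-1)
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 1Gi
```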
