During my study for EX280 I found Taints and Tolerations very hard to understand. This is because Taints and Tolerations flip the scheduling of pods the other way around. What I mean by this is the following: traditionally you would specify on a server what it should and should not run, but in OpenShift this is flipped around. Instead of keeping a list on a server that allows or disallows apps, we "taint" the node with a `key=value` pair and an effect, and let the scheduler sort out which pods are assigned to the node. How do we get a specific pod to run on a "tainted" node? We do this by "tolerating" the taint. For example: if we taint a node with `location=westus`, we can let a pod tolerate that taint and run at that location.
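As a quick sketch of what that looks like (the node name here is just a placeholder, and the effects are explained further down):

$ oc adm taint node my-node location=westus:NoSchedule

And the pod would tolerate it with:

tolerations:
- key: location
  value: westus
  operator: Equal
  effect: NoSchedule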
⚠️ FYI: this blog is written for OpenShift but the theory is the same and most of the `oc` commands can be replaced with `kubectl`.
But why?
Using taints is a great way to dynamically schedule workloads across different nodes with different `key=value` pairs. A few examples could be:
- Nodes that are in a certain datacenter: `location=datacenter01:PreferNoSchedule`
- Nodes that have a special CPU: `cpu=special:NoSchedule`
- Nodes that we have a special place for in our heart: `specialNode=true:NoExecute`
We can also combine different taints to get a specific selection of nodes that we want for our workload. Using taints can make scheduling your workloads a lot easier!
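Combining taints could look like this; one command can apply multiple taints to the same (hypothetical) node:

$ oc adm taint node worker-01 location=datacenter01:PreferNoSchedule cpu=special:NoSchedule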
Understanding the effects
A taint consists of `key=value:effect`. The effect determines how the node will handle the taint1:
- If `NoSchedule` is selected, OpenShift will not schedule pods on that node that do not tolerate the taint. Existing pods will keep running.
- If `PreferNoSchedule` is selected, OpenShift will try not to schedule pods that do not tolerate the taint on the node. This is a great way to use taints without ending up with underutilized nodes.
- If `NoExecute` is selected, OpenShift will not schedule pods that do not tolerate the taint, and will also evict existing pods from the node. Draining a node in OpenShift works much the same way: the node is cordoned and all pods are asked to get out of there. If there are pods that tolerate this taint, the additional parameter `tolerationSeconds` can be set in the toleration to allow the pod some time to shut down.
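As a sketch, a pod could tolerate a hypothetical `maintenance=true:NoExecute` taint while still accepting eviction after an hour:

tolerations:
- key: maintenance
  value: "true"
  operator: Equal
  effect: NoExecute
  tolerationSeconds: 3600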
In practice
So now we know why, on to the how. You can follow along with these commands on a CRC instance or on a local K3s cluster. Don't forget to change the commands from `oc` to `kubectl`.
Tainting a node
Tainting a node is pretty easy. It's done using the `oc adm taint` command:
$ oc adm taint node crc-hsl9k-master-0 linux=good:NoSchedule
node/crc-hsl9k-master-0 tainted
So, now only pods that "tolerate" the `linux=good` taint will be scheduled on this node.
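To double-check which taints are active on a node, we can inspect it with `oc describe` (output trimmed):

$ oc describe node crc-hsl9k-master-0 | grep Taints
Taints:             linux=good:NoSchedule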
Removing a taint
Untainting a node can easily be done by appending a `-` to the taint in our command:
$ oc adm taint node crc-hsl9k-master-0 linux=good:NoSchedule-
node/crc-hsl9k-master-0 untainted
The taint in action
Let’s spin up a pod and see the effect of our taint:
$ oc new-project tainted-love
$ oc new-app --name i-like-linux --docker-image bitnami/nginx
In my case this results in a pending pod because CRC is a one (1) node cluster. Great for this example. If you run multiple nodes and want to replicate the effect you should taint them all. Be aware that this will prevent other pods from running (even pods used by operators).
The reason my pod is in pending?
$ oc describe pod i-like-linux-968b9cdbc-n5lb6
...
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  35s (x2 over 114s)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {linux: good}, that the pod didn't tolerate.
Adding a toleration to a pod
At the time of writing there is no easy CLI command to add a toleration to a pod or deployment. That means we will have to edit the YAML of the deployment. You can do this live by using `oc edit deployment DEPLOYMENT_NAME`, or we can export the current config, edit it, and send it back to the cluster. Yeah, let's do that.
Exporting the deployment yaml
We will export the deployment using `oc get` and the `-o yaml` flag. In the past there was an easy way to get the YAML without extra data, but nowadays we have to do it ourselves:
$ oc get deployment i-like-linux -o yaml > tolerate_app.yaml
We can now edit the file and add our toleration. Look for the `spec` of the pod template:
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      deployment: i-like-linux
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        openshift.io/generated-by: OpenShiftNewApp
      creationTimestamp: null
      labels:
        deployment: i-like-linux
    spec:
      containers:
      - image: bitnami/nginx@sha256:8f5062e816099c770d98613b95c86b4e1ac8d369712237a579fc3121225e55e2
        imagePullPolicy: IfNotPresent
        name: i-like-linux
        ports:
        - containerPort: 8443
          protocol: TCP
        - containerPort: 8080
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      ....
And add a toleration like this (in this case I will be adding it under the `spec.template.spec.dnsPolicy` field):
spec:
  ....
  template:
    ....
    spec:
      dnsPolicy: ClusterFirst
      tolerations:
      - key: linux
        value: good
        operator: Equal
        effect: NoSchedule
      ....
And patch that back to the cluster:
$ oc apply -f tolerate_app.yaml
deployment.apps/i-like-linux configured
⚠️ FYI: if you want to edit the YAML again you will need to re-export it first.
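If you'd rather skip the export-edit-apply cycle, the generic `oc patch` command can add the same toleration in one go. A sketch using our deployment (note that a merge patch replaces the whole tolerations list, which is fine here because there was none yet):

$ oc patch deployment i-like-linux --type merge -p '{"spec":{"template":{"spec":{"tolerations":[{"key":"linux","value":"good","operator":"Equal","effect":"NoSchedule"}]}}}}'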
After this our pod should be running:
$ oc get pods
NAME                            READY   STATUS    RESTARTS   AGE
i-like-linux-54b9b5fb7c-sndtx   1/1     Running   0          59s
How does this work
Taints and tolerations are matched by the scheduler of the cluster. In this case we gave the toleration the "Equal" operator (`operator: Equal`) to match a node that has the same taint (key `linux` and value `good`). The effect (`NoSchedule`) must also match.
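Side by side, the match looks like this; the taint as it is stored on the node (under `spec.taints`) and the toleration in our pod template:

# on the node
spec:
  taints:
  - key: linux
    value: good
    effect: NoSchedule

# in the pod template
spec:
  tolerations:
  - key: linux
    value: good
    operator: Equal
    effect: NoSchedule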
Wildcards
You can also create a wildcard toleration on a pod. This is done with the following values:
spec:
  ....
  template:
    ....
    spec:
      tolerations:
      - effect: NoSchedule
        operator: Exists
Or
spec:
  ....
  template:
    ....
    spec:
      tolerations:
      - key: linux
        operator: Exists
In this case the `Exists` operator will only check whether the taint key or the effect exists on the node.2
There is even a way to tolerate all taints3:
spec:
  ....
  template:
    ....
    spec:
      tolerations:
      - operator: "Exists"
Wrapping up
Taints are a great way to make the scheduling of your pods more predictable. They allow you to designate specific nodes for specific workloads based on your use case. There is a lot more you can do with taints, like combining them or applying them dynamically to nodes, but I will not cover that in this blog.
1. https://docs.openshift.com/container-platform/4.9/nodes/scheduling/nodes-scheduler-taints-tolerations.html
2. https://docs.openshift.com/container-platform/4.9/nodes/scheduling/nodes-scheduler-taints-tolerations.html#nodes-scheduler-taints-tolerations-about_nodes-scheduler-taints-tolerations
3. https://docs.openshift.com/container-platform/4.9/nodes/scheduling/nodes-scheduler-taints-tolerations.html#nodes-scheduler-taints-tolerations-all_nodes-scheduler-taints-tolerations