Kubernetes Resources Management - QoS, Quota, and LimitRange
March 1, 2020
Before Kubernetes, software applications typically ran standalone in a VM and could use up all of its resources. Operators and developers needed to carefully choose the size of the VM for running them. In Kubernetes, however, pods and containers can run on any machine, which means sharing that machine's resources with other workloads. That is where QoS (Quality of Service) classes and resource quotas come in.
When you create a pod for your application, you can set requests and limits for CPU and memory for every container inside it. Properly setting these values is how you tell Kubernetes how much of each resource to reserve for your application.
    spec:
      containers:
      - image: k8s/hello-k8s
        name: hello-k8s
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
          limits:
            cpu: 200m
            memory: 400Mi
Requests: These values are used for scheduling. A request is the minimum amount of a resource that a container needs to run. The pod remains in the "Pending" state if no node has enough unreserved resources to satisfy its requests.
Limits: The maximum amount of a resource that the node allows the container to use. A container that exceeds its CPU limit is throttled, and one that exceeds its memory limit can be terminated.
A node can be overcommitted when it has pods scheduled that make no requests, or when the sum of limits across all pods on that node exceeds the available machine capacity. In an overcommitted environment, the pods on the node may attempt to use more compute resources than are available at a given point in time.
When this occurs, the node must give priority to one container over another. Containers with the lowest priority are terminated or throttled first. The property used to make this decision is referred to as the Quality of Service (QoS) class.
| Priority | QoS class | Condition |
|---|---|---|
| 1 (highest) | Guaranteed | Limits (and optionally requests) are set (not equal to 0) for CPU and memory in every container, and any requests that are set equal the limits. |
| 2 | Burstable | The pod does not meet the Guaranteed criteria, but at least one container sets a request or limit for CPU or memory. |
| 3 (lowest) | BestEffort | No container sets a request or limit for any resource. |
Therefore, if the developer does not declare CPU/memory requests and limits, the pod is classified as BestEffort and its containers will be terminated first. We should protect critical pods in production projects by setting requests and limits to equal values so they are classified as Guaranteed. BestEffort or Burstable pods should be used in development projects only.
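As a minimal sketch of a pod that would be classified as Guaranteed, the manifest below reuses the hello-k8s container from the earlier example and sets requests equal to limits for both CPU and memory. The pod name and the resource values are assumptions chosen for illustration.

```yaml
# Illustrative pod: requests equal limits for every resource in every
# container, so Kubernetes assigns the Guaranteed QoS class.
apiVersion: v1
kind: Pod
metadata:
  name: hello-k8s-guaranteed   # hypothetical name
spec:
  containers:
  - name: hello-k8s
    image: k8s/hello-k8s
    resources:
      requests:
        cpu: 200m
        memory: 400Mi
      limits:
        cpu: 200m        # same as the CPU request
        memory: 400Mi    # same as the memory request
```

Once the pod exists, the assigned class can be read from its status, for example with `kubectl get pod hello-k8s-guaranteed -o jsonpath='{.status.qosClass}'`.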
The administrator can set a quota on the project (a namespace, in Kubernetes terms) to restrict its total resource consumption. This has an additional effect: if the quota covers memory requests, then every pod must set a memory request in its definition. A new pod that would push the namespace over the quota is rejected when it is created, so it never reaches the scheduler.
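In plain Kubernetes, such a quota is expressed as a ResourceQuota object in a namespace. The following is a minimal sketch. The object name, the namespace, and all of the numbers are assumptions for illustration and should be sized to the actual cluster.

```yaml
# Illustrative quota: caps the total requests and limits of all pods
# in one namespace. Names and values are hypothetical.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"       # sum of CPU requests across the namespace
    requests.memory: 8Gi    # sum of memory requests
    limits.cpu: "8"         # sum of CPU limits
    limits.memory: 16Gi     # sum of memory limits
    pods: "20"              # maximum number of pods
```

Because this quota tracks both requests and limits for CPU and memory, every container created in the namespace has to declare all four values, either explicitly or through defaults.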
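A limit range (LimitRange) is a policy to constrain resources by Pod or Container in a namespace. It can enforce minimum and maximum compute resource usage per Pod or Container, enforce a maximum ratio between limit and request, and set default requests and limits that are injected into containers that do not declare their own.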
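A minimal sketch of such a policy is shown below. It applies per container and combines minimum, maximum, default, and ratio constraints. The object name, the namespace, and every value are assumptions for illustration.

```yaml
# Illustrative LimitRange for containers in one namespace.
# Names and values are hypothetical.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
  namespace: team-a
spec:
  limits:
  - type: Container
    min:                   # smallest request a container may declare
      cpu: 50m
      memory: 64Mi
    max:                   # largest limit a container may declare
      cpu: "1"
      memory: 1Gi
    defaultRequest:        # injected when a container sets no request
      cpu: 100m
      memory: 200Mi
    default:               # injected when a container sets no limit
      cpu: 200m
      memory: 400Mi
    maxLimitRequestRatio:  # limit may be at most 4x the request
      cpu: "4"
```

With defaults in place, containers that omit requests and limits still end up with values, which keeps them out of the BestEffort class.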
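Make sure all nodes are in the "Ready" state.
Make sure no pod is in the "Pending" state.
A good warning threshold for overall cluster resource usage would be (n-1)/n*100 percent, where n is the number of nodes. For example, with four nodes the threshold is 75%.
Over this threshold, you may not be able to reallocate your workloads to the rest of the nodes if one node becomes unavailable.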
The OS kernel invokes the OOM killer when the node comes under memory pressure.
CPU pressure throttles processes and degrades their performance.
Set a warning threshold to notify the administrator that a node may have issues or may be about to trigger its eviction policies.
Add the following warning thresholds to notify the administrator that this node may not be able to allocate new pods.
If n-1 nodes cannot allocate new pods, it is time to scale up or to check whether the CPU/memory requests are too high.
If the node runs out of disk space, it will try to free Docker disk space, with a fair chance of evicting pods.
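The eviction policies mentioned above are configured on the kubelet of each node. The fragment below is a minimal sketch of a kubelet configuration with hard and soft eviction thresholds for memory and disk. The threshold values are assumptions for illustration, not recommendations, and monitoring alerts should fire before these thresholds are reached.

```yaml
# Illustrative kubelet eviction settings. Values are hypothetical.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:                # pods are evicted immediately below these
  memory.available: "500Mi"
  nodefs.available: "10%"
  imagefs.available: "15%"
evictionSoft:                # pods are evicted after the grace period
  memory.available: "1Gi"
evictionSoftGracePeriod:     # required for every soft threshold
  memory.available: "1m30s"
```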
Kubernetes limits are set per container, not per pod, so it is not necessary to monitor resource usage per pod.
Ideally, containers should use an amount of resources close to what they request. If usage is much lower than the request, valuable resources are wasted and it may become hard to schedule new pods. Conversely, if usage is higher than the request, you might face performance issues.
It is important to make sure requests and limits are declared and tested before deploying to production. Cluster admins can set up a namespace quota to require every container of every workload in the namespace to declare a request and a limit. A good configuration of requests and limits will make your applications much more stable.
Appropriate monitoring and alerts will help the cluster admin reduce the waste of cluster resources and avoid performance issues. Ask us today if you need help monitoring your Kubernetes system! :)
© 2017-2020 Darumatic Pty Ltd. All Rights Reserved.