Today's data centers are serving more users and providing more high-bandwidth services than ever. With this incredible growth in cloud-based computing comes increased demand for computing resources.
As more individuals, businesses, and governments come to rely on cloud computing, resource utilization—exploiting the full capacity of existing resources—has become a high-priority focus area for production engineers, to ensure the services they run and maintain continue working well for their users.
This guide introduces cgroup2, a Linux kernel component that provides a mechanism to isolate, measure, and control the distribution of resources for a collection of processes on a server. It also describes how cgroup2 is being deployed in production today, along with tools and strategies built around cgroups, to drive substantial resource control gains in large data centers.
cgroup enables the grouping and structuring of workloads, to control and limit the amount of system resources assigned to each.
One of the key use cases for cgroup is isolating a core workload from background system resource needs. For example, you may want to isolate a web server from background system processes like metric collection, or separate time-critical, latency-sensitive work from long-running asynchronous jobs.
Within a comprehensive resource control framework, it’s even possible to stack different workloads on the same system without resource conflicts.
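As a minimal sketch of this use case, the following shell commands create separate groups for a main workload and for background services, then assign processes to them by writing their PIDs. The group names and PID variables are illustrative; the commands assume cgroup2 is mounted at the conventional /sys/fs/cgroup path and require root.

```shell
# Create one cgroup for the main workload and one for background
# system services (names are illustrative).
mkdir /sys/fs/cgroup/workload
mkdir /sys/fs/cgroup/system

# Move the web server into the workload group, and a metrics
# collector into the system group, by writing their PIDs to
# each group's cgroup.procs interface file.
echo "$WEBSERVER_PID" > /sys/fs/cgroup/workload/cgroup.procs
echo "$METRICS_PID" > /sys/fs/cgroup/system/cgroup.procs
```

Writing a PID to cgroup.procs moves that process (and its threads) out of its previous cgroup, so each process belongs to exactly one cgroup at a time.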
The cgroup architecture consists of two main components: the cgroup core and a set of subsystem controllers for memory, CPU, IO, PID, and RDMA.
The core is where you group processes into logical units and define hierarchical relationships. All cgroups on a system form a single hierarchy, or tree: the root cgroup, with child cgroups and subtrees that control resource use for partitions, containers, and processes.
The core includes a pseudo-filesystem, cgroupfs, conventionally mounted at /sys/fs/cgroup, where you organize processes through a set of interface files.
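You can explore these interface files directly. The commands below are read-only and assume a cgroup2 mount at the conventional /sys/fs/cgroup path:

```shell
# List the root of the hierarchy: core interface files live here,
# and each subdirectory is a child cgroup.
ls /sys/fs/cgroup/

# cgroup.controllers lists the controllers available at this level,
# e.g. "cpu io memory pids".
cat /sys/fs/cgroup/cgroup.controllers

# cgroup.procs lists the PIDs currently in this cgroup.
cat /sys/fs/cgroup/cgroup.procs
```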
The controllers manage distribution of their respective resources along the hierarchy according to their configuration.
For each type of resource controller enabled, a corresponding set of interface files is automatically created. These files are where you specify resource thresholds for memory, CPU, IO, etc. We'll look at these files in detail in the sections for each controller.
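As a brief sketch, the following enables a few controllers for the children of the root cgroup and sets thresholds through the interface files that then appear. The "workload" cgroup name and the specific limits are illustrative; the commands assume cgroup2 at /sys/fs/cgroup and require root.

```shell
# Enable the cpu, memory, and io controllers for child cgroups by
# writing to the root's cgroup.subtree_control file.
echo "+cpu +memory +io" > /sys/fs/cgroup/cgroup.subtree_control

# Each child now gets that controller's interface files automatically.
mkdir -p /sys/fs/cgroup/workload

# Cap the group's memory use at 10 GB...
echo 10G > /sys/fs/cgroup/workload/memory.max

# ...and give it ten times the default CPU weight of 100.
echo 1000 > /sys/fs/cgroup/workload/cpu.weight
```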
You’ll want to refer to the following docs for additional details:
Case study: The fbtax2 project
Throughout this guide, we'll look at a real-world case study at Facebook that used the concepts, strategies, and tools described here to chalk up significant resource control wins: the fbtax2 project.
The goal of the project was to use cgroup2 to isolate and protect a system's main workload from widely distributed system binaries and other system services that run on many Facebook hosts. The memory reserved for these system binaries was nicknamed the fbtax, which later became the name of the project.
In the following sections, we'll show how fbtax2 used the new cgroup2 features and related tools to score high-impact resource utilization gains: