Linux Administration: Setting Up Linux cgroups

cgroups (control groups) is a Linux kernel feature to limit, account and isolate resource usage (CPU, memory, disk I/O, etc.) of process groups. In late 2007 it was merged to kernel version 2.6.24.

By using cgroups, system administrators gain fine-grained control over allocating, prioritizing, denying, managing, and monitoring system resources. Hardware resources can be smartly divided up among tasks and users, increasing overall efficiency [1].

Cgroups are organized hierarchically, like processes, and child cgroups inherit some of the attributes of their parents.

Red Hat Enterprise Linux 6 provides ten cgroup subsystems, listed below by name and function:

blkio — this subsystem sets limits on input/output access to and from block devices such as physical drives (disk, solid state, USB, etc.).
cpu — this subsystem uses the scheduler to provide cgroup tasks access to the CPU.
cpuacct — this subsystem generates automatic reports on CPU resources used by tasks in a cgroup.
cpuset — this subsystem assigns individual CPUs (on a multicore system) and memory nodes to tasks in a cgroup.
devices — this subsystem allows or denies access to devices by tasks in a cgroup.
freezer — this subsystem suspends or resumes tasks in a cgroup.
memory — this subsystem sets limits on memory use by tasks in a cgroup, and generates automatic reports on memory resources used by those tasks.
net_cls — this subsystem tags network packets with a class identifier (classid) that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup task.
net_prio — this subsystem provides a way to dynamically set the priority of network traffic per network interface.
ns — the namespace subsystem.

The easiest way to work with cgroups is to install the libcgroup package, which contains a number of cgroup-related command line utilities and their associated man pages, such as the cgconfig service. It is also possible to mount hierarchies and set cgroup parameters (non-persistently) using shell commands and utilities available on any system. You can then save all the changes in a config file using the cgsnapshot utility.

	[root@server1 ~]# yum install libcgroup
	[root@server1 ~]# service cgconfig start
	Starting cgconfig service: [ OK ]

view raw gistfile1.sh hosted with ❤ by GitHub

This creates a virtual file system mounted at /cgroup containing all the sybsystems:

	[root@server1 ~]# ls -la /cgroup/
	total 12
	drwxr-xr-x 10 root root 4096 Jul 3 15:51 .
	drwxr-xr-x. 23 root root 4096 Jul 3 15:33 ..
	drwxr-xr-x 4 root root 0 Jul 3 15:51 blkio
	drwxr-xr-x 2 root root 0 Jul 3 15:51 cpu
	drwxr-xr-x 2 root root 0 Jul 3 15:51 cpuacct
	drwxr-xr-x 2 root root 0 Jul 3 15:51 cpuset
	drwxr-xr-x 2 root root 0 Jul 3 15:51 devices
	drwxr-xr-x 2 root root 0 Jul 3 15:51 freezer
	drwxr-xr-x 2 root root 0 Jul 3 15:51 memory
	drwxr-xr-x 2 root root 0 Jul 3 15:51 net_cls

view raw gistfile1.sh hosted with ❤ by GitHub

File: gistfile1.sh ------------------ [root@server1 ~]# ls -la /cgroup/ total 12 drwxr-xr-x 10 root root 4096 Jul 3 15:51 . drwxr-xr-x. 23 root root 4096 Jul 3 15:33 .. drwxr-xr-x 4 root root 0 Jul 3 15:51 blkio drwxr-xr-x 2 root root 0 Jul 3 15:51 cpu drwxr-xr-x 2 root root 0 Jul 3 15:51 cpuacct drwxr-xr-x 2 root root 0 Jul 3 15:51 cpuset drwxr-xr-x 2 root root 0 Jul 3 15:51 devices drwxr-xr-x 2 root root 0 Jul 3 15:51 freezer drwxr-xr-x 2 root root 0 Jul 3 15:51 memory drwxr-xr-x 2 root root 0 Jul 3 15:51 net_cls

There are two main configuration files in /etc - cgconfig.conf and cgrules.conf.

cgconfig.conf is the configuration file used by libcgroup to define control groups, their parameters and also mount points. The file consists of mount and group sections.

cgrules.conf configuration file is used by libcgroups to define the control groups to which the process belongs to. The file contains list of rules which assign to a defined group/user a control group in a subsystem.

For more information on the cgroups hierarchy and subsystems please refer to [2].

Now let's get our hands dirty by implementing cgroups that limit how much I/O and CPU cycles two processes running on the same system can use.

Scenario 1 - Limiting I/O throughput

Let's assume that we have two applications running on a server that are heavily I/O bound - app1 and app2. We would like to give more bandwidth to app1 during the day and to app2 during the night. This type of I/O throughput prioritization can be achieved by using the blkio subsystem.

In the following example I'll show how to do this by manually creating the file tree and then creating a persistent config file out of that.

1. Attach the blkio subsystem to the /cgroup/blkio/ cgroup if not already attached:

	[root@server1 ~]# mkdir /cgroup/blkio
	[root@server1 ~]# mount -t cgroup -o blkio blkio /cgroup/blkio

view raw gistfile1.sh hosted with ❤ by GitHub

2. Create a high and low priority cgroup:

	[root@server1 ~]# mkdir /cgroup/blkio/high_io
	[root@server1 ~]# mkdir /cgroup/blkio/low_io

view raw gistfile1.sh hosted with ❤ by GitHub

This can also be achieved by running:

[root@server1 ~]# cgcreate -g high_io,low_io:/blkio

view raw gistfile1.sh hosted with ❤ by GitHub

3. Acquire the PIDs of the processes that represent both running applications and move them to their specific cgroup:

	[root@server1 ~]# pidoff app1 \| while read pid; do echo $pid >> /cgroups/blkio/high_io/tasks; done
	[root@server1 ~]# pidoff app2 \| while read pid; do echo $pid >> /cgroups/blkio/low_io/tasks; done

view raw gistfile1.sh hosted with ❤ by GitHub

File: gistfile1.sh ------------------ [root@server1 ~]# pidoff app1 | while read pid; do echo $pid >> /cgroups/blkio/high_io/tasks; done [root@server1 ~]# pidoff app2 | while read pid; do echo $pid >> /cgroups/blkio/low_io/tasks; done

Alternatively, if the applications are not yet running, or are not controlled by the daemon() function from /etc/init.d/functions, you can start them manually and add them to the cgroups at the same time by using the cgexec utility:

	[root@server1 ~]# cgexec -g blkio:high_io app1
	[root@server1 ~]# cgexec -g blkio:low_io app2

view raw gistfile1.sh hosted with ❤ by GitHub

4. Set a ratio of 10:1 for the high_io and low_io cgroups. Processes in those cgroups, will immediately use only the resources made available to them.

	[root@server1 ~]# echo 1000 > /cgroup/blkio/high_io/blkio.weight
	[root@server1 ~]# echo 100 > /cgroup/blkio/low_io/blkio.weight

view raw gistfile1.sh hosted with ❤ by GitHub

Another way to set subsystem parameters is to use the cgset utility:

	[root@server1 ~]# cgset -r blkio.weight=1000 high_io
	[root@server1 ~]# cgset -r blkio.weight=100 low_io

view raw gistfile1.sh hosted with ❤ by GitHub

In this example, the low priority cgroup permits the low priority application - app2 - to use only about 10% of the I/O operations, whereas the high priority cgroup permits the high priority application - app1 - to use about 90% of the I/O operations.

To make this reverse during the day/night cycle create a cron job that flips the values from step 4, and the I/O utilization will reflect the specified weights.

To make these changes persistent across reboots we can create a configuration file out of the file structure we created manually in the previous steps by using the cgsnapshot command:

	[root@server1 ~]# cgsnapshot -s -f ~/cgconfig_io.conf
	# Configuration file generated by cgsnapshot
	mount {
	cpuset = /cgroup/cpuset;
	cpu = /cgroup/cpu;
	cpuacct = /cgroup/cpuacct;
	memory = /cgroup/memory;
	devices = /cgroup/devices;
	freezer = /cgroup/freezer;
	net_cls = /cgroup/net_cls;
	blkio = /cgroup/blkio;
	}

	group low_io {
	blkio {
	blkio.throttle.write_iops_device="";
	blkio.throttle.read_iops_device="";
	blkio.throttle.write_bps_device="";
	blkio.throttle.read_bps_device="";
	blkio.reset_stats="";
	blkio.weight="100";
	blkio.weight_device="";
	}
	}

	group high_io {
	blkio {
	blkio.throttle.write_iops_device="";
	blkio.throttle.read_iops_device="";
	blkio.throttle.write_bps_device="";
	blkio.throttle.read_bps_device="";
	blkio.reset_stats="";
	blkio.weight="1000";
	blkio.weight_device="";
	}
	}

view raw gistfile1.sh hosted with ❤ by GitHub

File: gistfile1.sh ------------------ [root@server1 ~]# cgsnapshot -s -f ~/cgconfig_io.conf # Configuration file generated by cgsnapshot mount { cpuset = /cgroup/cpuset; cpu = /cgroup/cpu; cpuacct = /cgroup/cpuacct; memory = /cgroup/memory; devices = /cgroup/devices; freezer = /cgroup/freezer; net_cls = /cgroup/net_cls; blkio = /cgroup/blkio; } group low_io { blkio { blkio.throttle.write_iops_device=""; blkio.throttle.read_iops_device=""; blkio.throttle.write_bps_device=""; blkio.throttle.read_bps_device=""; blkio.reset_stats=""; blkio.weight="100"; blkio.weight_device=""; } } group high_io { blkio { blkio.throttle.write_iops_device=""; blkio.throttle.read_iops_device=""; blkio.throttle.write_bps_device=""; blkio.throttle.read_bps_device=""; blkio.reset_stats=""; blkio.weight="1000"; blkio.weight_device=""; } }

To list all cgroups on the system run:

	[root@server1 ~]# lscgroup
	cpuset:/
	cpu:/
	cpuacct:/
	memory:/
	devices:/
	freezer:/
	net_cls:/
	blkio:/
	blkio:/low_io
	blkio:/high_io

view raw gistfile1.sh hosted with ❤ by GitHub

If you need to start from scratch and clear the entire cgroup file system you can unmount the directory hierarchy or use:

	[root@server1 ~]# cgclear
	[root@server1 ~]# lscgroup
	cgroups can't be listed: Cgroup is not mounted

view raw gistfile1.sh hosted with ❤ by GitHub

It's important to mention that if app1 and app2 are services like httpd for example, then they would need to have a file entry in /etc/sysconfig/httpd.conf like so:

	[root@server1 ~]# cat /etc/sysconfig/httpd.conf
	CGROUP_DAEMON="blkio:/high_io"

view raw gistfile1.sh hosted with ❤ by GitHub

where CGROUP_DAEMON="subsystem:control_group" and the service has to use the daemon() function from /etc/init.d/functions to start the service (it can be started by the service utility).

Scenario 2 - Limiting memory and CPU usage

In this scenario let's assume we have two user groups, group1 and group2, with group1 requiring more memory and CPU allocation than group2.

To achieve this we can combine the cpu, cpuacct and memory subsystems into a single subsystem called cpu_and_mem.

1. In the /etc/cgconfig.conf file, configure the following subsystems to be mounted and cgroups to be created:

	[root@server1 ~]# vi /etc/cgconfig.conf

	mount {
	cpu = /cgroup/cpu_and_mem;
	cpuacct = /cgroup/cpu_and_mem;
	memory = /cgroup/cpu_and_mem;
	}

	group group1 {
	cpu {
	cpu.shares="800";
	}
	cpuacct {
	cpuacct.usage="0";
	}
	memory {
	memory.limit_in_bytes="4G";
	memory.memsw.limit_in_bytes="6G";
	}
	}

	group group2 {
	cpu {
	cpu.shares="200";
	}
	cpuacct {
	cpuacct.usage="0";
	}
	memory {
	memory.limit_in_bytes="2G";
	memory.memsw.limit_in_bytes="4G";
	}
	}

view raw gistfile1.sh hosted with ❤ by GitHub

File: gistfile1.sh ------------------ [root@server1 ~]# vi /etc/cgconfig.conf mount { cpu = /cgroup/cpu_and_mem; cpuacct = /cgroup/cpu_and_mem; memory = /cgroup/cpu_and_mem; } group group1 { cpu { cpu.shares="800"; } cpuacct { cpuacct.usage="0"; } memory { memory.limit_in_bytes="4G"; memory.memsw.limit_in_bytes="6G"; } } group group2 { cpu { cpu.shares="200"; } cpuacct { cpuacct.usage="0"; } memory { memory.limit_in_bytes="2G"; memory.memsw.limit_in_bytes="4G"; } }

When loaded, the above configuration file mounts the cpu, cpuacct, and memory subsystems to a single cpu_and_mem cgroup.

Then, it creates a hierarchy in cpu_and_mem which contains two cgroups: group1 and group2.

In each of these cgroups, custom parameters are set for each subsystem:

cpu — the cpu.shares parameter determines the share of CPU resources available to each process in all cgroups. Setting the parameter to 800 and 200 in group1 and group2 cgroups respectively means that processes started in these groups will split the resources with a 4:1 ratio. Note that when a single process is running, it consumes as much CPU as necessary no matter which cgroup it is placed in. The CPU limitation only comes into effect when two or more processes compete for CPU resources.
cpuacct — the cpuacct.usage="0" parameter is used to reset values stored in the cpuacct.usage and cpuacct.usage_percpu files. These files report total CPU time (in nanoseconds) consumed by all processes in a cgroup.
memory — the memory.limit_in_bytes parameter represents the amount of memory that is made available to all processes within a certain cgroup. In our example, processes started in the group1 cgroup have 4 GB of memory available and processes in the group2 group have 2 GB of memory available. The memory.memsw.limit_in_bytes parameter specifies the total amount of memory and swap space processes may use. Should a process in the group1 cgroup hit the 4 GB memory limit, it is allowed to use another 2 GB of swap space, thus totaling the configured 6 GB.

2. Since we are dealing with user and groups ID's we can leverage the cgrulesengd daemon.

cgrulesengd is a daemon, which distributes processes to control groups. When any process changes its effective UID or GID, cgrulesengd inspects list of rules loaded from grules.conf file and moves the process to the appropriate control group.

To define the rules which the cgrulesengd daemon uses to move processes to specific cgroups, configure the /etc/cgrules.conf in the following way:

	[root@server1 ~]# vi /etc/cgrules.conf

	#
	@group1 cpu,cpuacct,memory group1
	@group2 cpu,cpuacct,memory group2

view raw gistfile1.sh hosted with ❤ by GitHub

The above configuration creates rules that assign a specific system group (for example, @group1) the resource controllers it may use (for example, cpu, cpuacct, memory) and a cgroup (for example, group1) which contains all processes originating from that system group. In our example, when the cgrulesengd daemon, started via the service cgred start command, detects a process that is started by a user that belongs to the group1 system group, that process is automatically moved to the /cgroup/cpu_and_mem/group1/tasks file and is subjected to the resource limitations set in the group1 cgroup.

3. Start the cgconfig service to create the hierarchy of cgroups and set the needed parameters in all created cgroups:

	[root@server1 ~]# service cgconfig start
	Starting cgconfig service: [ OK ]

view raw gistfile1.sh hosted with ❤ by GitHub

Start the cgred service to let the cgrulesengd daemon detect any processes started in system groups configured in the /etc/cgrules.conf file:

	[root@server1 ~]# service cgred start
	Starting CGroup Rules Engine Daemon: [ OK ]

view raw gistfile1.sh hosted with ❤ by GitHub

4. To make all of the above changes persistent across reboots run:

	[root@server1 ~]# chkconfig cgconfig on
	[root@server1 ~]# chkconfig cgred on

view raw gistfile1.sh hosted with ❤ by GitHub

To test whether this setup works, execute a CPU or memory intensive process and observe the results, for example, using the top utility. To test the CPU resource management, execute the following dd command under each user in both group1 and group2:

[root@server1 ~]# dd if=/dev/zero of=/dev/null bs=1024k

view raw gistfile1.sh hosted with ❤ by GitHub

The above command reads the /dev/zero and outputs it to the /dev/null in chunks of 1024 KB. When the top utility is launched, you can see results similar to these:

	PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
	7806 kivanov 10 0 98m 1233 436 R 79.9 0.1 0:04.12 dd
	1503 wcarroll 10 0 98m 1211 436 R 21.9 0.1 0:03.32 dd

view raw gistfile1.sh hosted with ❤ by GitHub

File: gistfile1.sh ------------------ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 7806 kivanov 10 0 98m 1233 436 R 79.9 0.1 0:04.12 dd 1503 wcarroll 10 0 98m 1211 436 R 21.9 0.1 0:03.32 dd

Resources:

[1] http://en.wikipedia.org/wiki/Cgroups

[2] https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/pdf/Resource_Management_Guide/Red_Hat_Enterprise_Linux-6-Resource_Management_Guide-en-US.pdf

Linux Administration

Pages

Setting Up Linux cgroups - Control Groups