The memory subsystem generates automatic reports on memory resources used by the tasks in a cgroup, and sets limits on memory use of those tasks.
Note
The memory subsystem uses 40 bytes of memory per physical page on x86_64 systems. These resources are consumed even if memory is not used in any hierarchy. If you do not plan to use the memory subsystem, you can disable it to reduce the resource consumption of the kernel.
To disable the memory subsystem permanently, open the /boot/grub/grub.conf configuration file as root and append the following text to the line that starts with the kernel keyword:
cgroup_disable=memory
For more information on /boot/grub/grub.conf, see the Configuring the GRUB Boot Loader chapter in the Red Hat Enterprise Linux 6 Deployment Guide.
To disable the memory subsystem for a single session, perform the following steps when starting the system:
Type cgroup_disable=memory at the end of the line and press Enter to exit GRUB edit mode.
With cgroup_disable=memory enabled, memory is not visible as an individually mountable subsystem, and it is not automatically mounted when mounting all cgroups in a single hierarchy. Note that memory is currently the only subsystem that can be effectively disabled with cgroup_disable to save resources. Using this option with other subsystems only disables their usage, but does not reduce their resource consumption. However, other subsystems do not consume as many resources as the memory subsystem.
The following tunable parameters are available for the memory subsystem:
Table 3.2. Values reported by memory.stat
Statistic | Description |
---|---|
cache | page cache, including tmpfs (shmem), in bytes |
rss | anonymous and swap cache, not including tmpfs (shmem), in bytes |
mapped_file | size of memory-mapped files, including tmpfs (shmem), in bytes |
pgpgin | number of pages paged into memory |
pgpgout | number of pages paged out of memory |
swap | swap usage, in bytes |
active_anon | anonymous and swap cache on active least-recently-used (LRU) list, including tmpfs (shmem), in bytes |
inactive_anon | anonymous and swap cache on inactive LRU list, including tmpfs (shmem), in bytes |
active_file | file-backed memory on active LRU list, in bytes |
inactive_file | file-backed memory on inactive LRU list, in bytes |
unevictable | memory that cannot be reclaimed, in bytes |
hierarchical_memory_limit | memory limit for the hierarchy that contains the memory cgroup, in bytes |
hierarchical_memsw_limit | memory plus swap limit for the hierarchy that contains the memory cgroup, in bytes |
Each of these statistics, other than hierarchical_memory_limit and hierarchical_memsw_limit, has a counterpart prefixed total_ that reports not only on the cgroup, but on all its children as well. For example, swap reports the swap usage by a cgroup and total_swap reports the total swap usage by the cgroup and all its child groups.
When interpreting the values reported by memory.stat, note how the various statistics inter-relate:
active_anon + inactive_anon = anonymous memory + file cache for tmpfs + swap cache
Therefore, active_anon + inactive_anon ≠ rss, because rss does not include tmpfs.
active_file + inactive_file = cache - size of tmpfs
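As a rough illustration of the relationships above, the following shell sketch sums selected counters from a memory.stat file. The stat_sum helper and the sample values are hypothetical; on a real system you would pass the path to a cgroup's memory.stat file, for example /cgroup/memory/lab1/memory.stat.

```shell
# stat_sum FILE KEY...: print the sum of the named counters in a
# memory.stat-style file. (Hypothetical helper, not part of any cgroup tool.)
stat_sum() {
    file=$1
    shift
    # Match only exact counter names, so total_* counterparts are not included.
    awk -v keys=" $* " 'index(keys, " " $1 " ") { sum += $2 }
                        END { printf "%.0f\n", sum }' "$file"
}

# Demonstration with made-up sample values written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
cache 4096
rss 8192
active_anon 6144
inactive_anon 4096
active_file 1024
inactive_file 1024
EOF

echo "anon LRU total: $(stat_sum "$sample" active_anon inactive_anon)"
echo "file LRU total: $(stat_sum "$sample" active_file inactive_file)"
echo "rss:            $(stat_sum "$sample" rss)"
rm -f "$sample"
```

In this made-up sample, the anonymous LRU total is larger than rss because it also covers tmpfs pages, which is the inequality described above.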
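The memory.limit_in_bytes parameter sets the maximum amount of user memory (including file cache). If no units are specified, the value is interpreted as bytes. However, you can use suffixes to represent larger units: k or K for kilobytes, m or M for megabytes, and g or G for gigabytes. For example, to set the limit to 1 gigabyte, execute: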
~]# echo 1G > /cgroup/memory/lab1/memory.limit_in_bytes
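To make the suffix arithmetic concrete, the sketch below converts a suffixed value to bytes. It assumes binary multiples (1K = 1024 bytes), which matches the 100 MB = 104857600 figure used in the example in this chapter. The to_bytes helper is hypothetical and only illustrates the conversion; the kernel interprets the suffix itself when you write to the file.

```shell
# to_bytes VALUE: convert a value such as 512k, 256M, or 1G to bytes.
# Hypothetical helper; assumes binary multiples (1K = 1024 bytes).
to_bytes() {
    value=$1
    number=${value%[kKmMgG]}        # strip one trailing suffix character, if any
    case $value in
        *[kK]) echo $((number * 1024)) ;;
        *[mM]) echo $((number * 1024 * 1024)) ;;
        *[gG]) echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "$number" ;;    # no suffix: the value is already in bytes
    esac
}

to_bytes 1G      # prints 1073741824
to_bytes 256M    # prints 268435456
to_bytes 4096    # prints 4096
```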
You cannot use memory.limit_in_bytes to limit the root cgroup; you can only apply values to groups lower in the hierarchy.
Write -1 to memory.limit_in_bytes to remove any existing limits.
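The memory.memsw.limit_in_bytes parameter sets the maximum amount for the sum of memory and swap usage. If no units are specified, the value is interpreted as bytes. However, you can use suffixes to represent larger units: k or K for kilobytes, m or M for megabytes, and g or G for gigabytes.
You cannot use memory.memsw.limit_in_bytes to limit the root cgroup; you can only apply values to groups lower in the hierarchy.
Write -1 to memory.memsw.limit_in_bytes to remove any existing limits.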
Important
It is important to set the memory.limit_in_bytes parameter before setting the memory.memsw.limit_in_bytes parameter: attempting to do so in the reverse order results in an error. This is because memory.memsw.limit_in_bytes becomes available only after all memory limitations (previously set in memory.limit_in_bytes) are exhausted.
For example, setting memory.limit_in_bytes = 2G and memory.memsw.limit_in_bytes = 4G for a certain cgroup allows processes in that cgroup to allocate 2 GB of memory and, once that is exhausted, another 2 GB of swap only. The memory.memsw.limit_in_bytes parameter represents the sum of memory and swap. Processes in a cgroup that does not have the memory.memsw.limit_in_bytes parameter set can potentially use up all the available swap (after exhausting the set memory limitation) and trigger an Out Of Memory situation caused by the lack of available swap.
The order in which the memory.limit_in_bytes and memory.memsw.limit_in_bytes parameters are set in the /etc/cgconfig.conf file is important as well. The following is a correct example of such a configuration:
memory {
    memory.limit_in_bytes = 1G;
    memory.memsw.limit_in_bytes = 1G;
}
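The same ordering applies when you write the parameters by hand. The sketch below wraps the two writes in a hypothetical set_limits helper that always writes memory.limit_in_bytes first. It runs here against a temporary directory that stands in for a cgroup directory such as /cgroup/memory/lab1, so it does not touch real cgroups.

```shell
# set_limits DIR MEM MEMSW: write the memory limit, then the memory+swap limit.
# Hypothetical helper; DIR is a cgroup directory of the memory subsystem.
set_limits() {
    dir=$1 mem=$2 memsw=$3
    # The memory limit must be written before the memory+swap limit.
    echo "$mem" > "$dir/memory.limit_in_bytes" || return 1
    echo "$memsw" > "$dir/memory.memsw.limit_in_bytes" || return 1
}

# Demonstration against a temporary directory instead of a real cgroup.
demo=$(mktemp -d)
set_limits "$demo" 2G 4G && echo "limits written"
cat "$demo/memory.limit_in_bytes" "$demo/memory.memsw.limit_in_bytes"
rm -rf "$demo"
```

With these values, the swap allowance is the difference between the two limits (4G - 2G = 2G), as in the example described earlier. In the temporary directory the files simply hold the text that was written.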
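memory.failcnt reports the number of times that the memory limit has reached the value set in memory.limit_in_bytes.
memory.memsw.failcnt reports the number of times that the memory plus swap space limit has reached the value set in memory.memsw.limit_in_bytes.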
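The memory.soft_limit_in_bytes parameter enables flexible sharing of memory. Under normal circumstances, control groups are allowed to use as much memory as they need, constrained only by the hard limits set with the memory.limit_in_bytes parameter. However, when the system detects memory contention or low memory, control groups are forced to restrict their consumption to their soft limits. For example, to set the soft limit to 256 MB, execute: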
~]# echo 256M > /cgroup/memory/lab1/memory.soft_limit_in_bytes
This parameter accepts the same suffixes as memory.limit_in_bytes to represent units. To have any effect, the soft limit must be set below the hard limit. If lowering the memory usage to the soft limit does not solve the contention, cgroups are pushed back as much as possible to make sure that one control group does not starve the others of memory. Note that soft limits take effect over a long period of time, since they involve reclaiming memory for balancing between memory cgroups.
When set to 0, memory.force_empty empties memory of all pages used by tasks in the cgroup. This interface can only be used when the cgroup has no tasks. If memory cannot be freed, it is moved to a parent cgroup if possible. Use the memory.force_empty parameter before removing a cgroup to avoid moving out-of-use page caches to its parent cgroup.
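memory.swappiness sets the tendency of the kernel to swap out process memory used by tasks in this cgroup instead of reclaiming pages from the page cache. This is the same tendency, calculated the same way, as set in /proc/sys/vm/swappiness for the system as a whole. The default value is 60. Values lower than 60 decrease the kernel's tendency to swap out process memory, values greater than 60 increase the kernel's tendency to swap out process memory, and values greater than 100 permit the kernel to swap out pages that are part of the address space of the processes in this cgroup.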
Note that a value of 0 does not prevent process memory from being swapped out; swapping might still happen when there is a shortage of system memory, because the global virtual memory management logic does not read the cgroup value. To lock pages completely, use mlock() instead of cgroups.
You cannot change the swappiness of the root cgroup, which uses the swappiness set in /proc/sys/vm/swappiness.
With memory.move_charge_at_immigrate enabled, the pages associated with a task are taken from the old cgroup and charged to the new cgroup when the task is moved. The following example shows how to enable memory.move_charge_at_immigrate:
~]# echo 1 > /cgroup/memory/lab1/memory.move_charge_at_immigrate
To disable memory.move_charge_at_immigrate, execute:
~]# echo 0 > /cgroup/memory/lab1/memory.move_charge_at_immigrate
memory.use_hierarchy contains a flag (0 or 1) that specifies whether memory usage should be accounted for throughout a hierarchy of cgroups. If enabled (1), the memory subsystem reclaims memory from the children of any process that exceeds its memory limit. By default (0), the subsystem does not reclaim memory from a task's children.
memory.oom_control contains a flag (0 or 1) that enables or disables the Out of Memory (OOM) killer for a cgroup. If enabled (0), tasks that attempt to consume more memory than they are allowed are immediately killed by the OOM killer. The OOM killer is enabled by default in every cgroup using the memory subsystem; to disable it, write 1 to the memory.oom_control file:
~]# echo 1 > /cgroup/memory/lab1/memory.oom_control
The memory.oom_control file also reports the OOM status of the current cgroup under the under_oom entry. If the cgroup is out of memory and tasks in it are paused, the under_oom entry reports the value 1.
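A script can check these fields directly. The sketch below reads one field from a file in the memory.oom_control format shown in this chapter. The oom_field helper and the sample file are hypothetical; on a real system you would pass a path such as /cgroup/memory/lab1/memory.oom_control.

```shell
# oom_field FILE NAME: print the value of one field (for example under_oom).
# Hypothetical helper; expects "name value" pairs, one per line.
oom_field() {
    awk -v name="$2" '$1 == name { print $2 }' "$1"
}

# Demonstration with a made-up sample file.
sample=$(mktemp)
printf 'oom_kill_disable 1\nunder_oom 1\n' > "$sample"

if [ "$(oom_field "$sample" under_oom)" = "1" ]; then
    echo "cgroup is under OOM"
fi
rm -f "$sample"
```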
The memory.oom_control file can also report an occurrence of an OOM situation using the notification API. For more information, refer to Section 2.13, “Using the Notification API” and Example 3.3, “OOM Control and Notifications”.
Example 3.3. OOM Control and Notifications
Attach the memory subsystem to a hierarchy and create a cgroup:
~]# mount -t cgroup -o memory memory /cgroup/memory
~]# mkdir /cgroup/memory/blue
Set the amount of memory that tasks in the blue cgroup can use to 100 MB:
~]# echo 104857600 > /cgroup/memory/blue/memory.limit_in_bytes
Change into the blue directory and make sure the OOM killer is enabled:
~]# cd /cgroup/memory/blue
blue]# cat memory.oom_control
oom_kill_disable 0
under_oom 0
Move the current shell process into the tasks file of the blue cgroup so that all other processes started in this shell are automatically moved to the blue cgroup:
blue]# echo $$ > tasks
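Start a test program that attempts to allocate more memory than the limit set for the blue cgroup. As soon as the blue cgroup runs out of free memory, the OOM killer kills the test program and reports Killed to the standard output: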
blue]# ~/mem-hog
Killed
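The following is the source of the mem-hog test program used above:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define KB (1024)
#define MB (1024 * KB)
#define GB (1024 * MB)

int main(int argc, char *argv[])
{
    char *p;

again:
    /* Allocate and touch memory in ever smaller chunks until malloc() fails. */
    while ((p = (char *)malloc(GB)))
        memset(p, 0, GB);

    while ((p = (char *)malloc(MB)))
        memset(p, 0, MB);

    while ((p = (char *)malloc(KB)))
        memset(p, 0, KB);

    sleep(1);
    goto again;   /* retry forever; the memory is intentionally never freed */

    return 0;
}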
Disable the OOM killer and run the test program again:
blue]# echo 1 > memory.oom_control
blue]# ~/mem-hog
Note that the under_oom state of the cgroup has changed to indicate that the cgroup is out of available memory:
~]# cat /cgroup/memory/blue/memory.oom_control
oom_kill_disable 1
under_oom 1
The following is the source of the oom_notification program:

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/eventfd.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

static inline void die(const char *msg)
{
    fprintf(stderr, "error: %s: %s(%d)\n", msg, strerror(errno), errno);
    exit(EXIT_FAILURE);
}

static inline void usage(void)
{
    fprintf(stderr, "usage: oom_notification <cgroup.event_control> <memory.oom_control>\n");
    exit(EXIT_FAILURE);
}

#define BUFSIZE 256

int main(int argc, char *argv[])
{
    char buf[BUFSIZE];
    int efd, cfd, ofd, wb;
    uint64_t u;

    if (argc != 3)
        usage();

    /* Event file descriptor on which OOM events are signalled. */
    if ((efd = eventfd(0, 0)) == -1)
        die("eventfd");

    if ((cfd = open(argv[1], O_WRONLY)) == -1)
        die("cgroup.event_control");

    if ((ofd = open(argv[2], O_RDONLY)) == -1)
        die("memory.oom_control");

    /* Register the event by writing "<event fd> <oom_control fd>". */
    if ((wb = snprintf(buf, BUFSIZE, "%d %d", efd, ofd)) >= BUFSIZE)
        die("buffer too small");

    if (write(cfd, buf, wb) == -1)
        die("write cgroup.event_control");

    if (close(cfd) == -1)
        die("close cgroup.event_control");

    /* Block until an event arrives, report it, and wait for the next one. */
    for (;;) {
        if (read(efd, &u, sizeof(uint64_t)) != sizeof(uint64_t))
            die("read eventfd");

        printf("mem_cgroup oom event received\n");
    }

    return 0;
}
The program reports each OOM event by printing the mem_cgroup oom event received string to the standard output.
Run the notification program, specifying the blue cgroup's control files as arguments:
~]$ ./oom_notification /cgroup/memory/blue/cgroup.event_control /cgroup/memory/blue/memory.oom_control
In a different console, run the mem-hog test program to create an OOM situation and see the oom_notification program report it in the standard output:
blue]# ~/mem-hog