diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/filesystems/proc.txt | 94 |
1 files changed, 57 insertions, 37 deletions
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 8fe8895894d..cf1295c2bb6 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -33,7 +33,8 @@ Table of Contents 2 Modifying System Parameters 3 Per-Process Parameters - 3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score + 3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer + score 3.2 /proc/<pid>/oom_score - Display current oom-killer score 3.3 /proc/<pid>/io - Display the IO accounting fields 3.4 /proc/<pid>/coredump_filter - Core dump filtering settings @@ -1234,42 +1235,61 @@ of the kernel. CHAPTER 3: PER-PROCESS PARAMETERS ------------------------------------------------------------------------------ -3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score ------------------------------------------------------- - -This file can be used to adjust the score used to select which processes -should be killed in an out-of-memory situation. Giving it a high score will -increase the likelihood of this process being killed by the oom-killer. Valid -values are in the range -16 to +15, plus the special value -17, which disables -oom-killing altogether for this process. - -The process to be killed in an out-of-memory situation is selected among all others -based on its badness score. This value equals the original memory size of the process -and is then updated according to its CPU time (utime + stime) and the -run time (uptime - start time). The longer it runs the smaller is the score. -Badness score is divided by the square root of the CPU time and then by -the double square root of the run time. - -Swapped out tasks are killed first. Half of each child's memory size is added to -the parent's score if they do not share the same memory. Thus forking servers -are the prime candidates to be killed. Having only one 'hungry' child will make -parent less preferable than the child. - -/proc/<pid>/oom_score shows process' current badness score. - -The following heuristics are then applied: - * if the task was reniced, its score doubles - * superuser or direct hardware access tasks (CAP_SYS_ADMIN, CAP_SYS_RESOURCE - or CAP_SYS_RAWIO) have their score divided by 4 - * if oom condition happened in one cpuset and checked process does not belong - to it, its score is divided by 8 - * the resulting score is multiplied by two to the power of oom_adj, i.e. - points <<= oom_adj when it is positive and - points >>= -(oom_adj) otherwise - -The task with the highest badness score is then selected and its children -are killed, process itself will be killed in an OOM situation when it does -not have children or some of them disabled oom like described above. +3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj- Adjust the oom-killer score +-------------------------------------------------------------------------------- + +These file can be used to adjust the badness heuristic used to select which +process gets killed in out of memory conditions. + +The badness heuristic assigns a value to each candidate task ranging from 0 +(never kill) to 1000 (always kill) to determine which process is targeted. The +units are roughly a proportion along that range of allowed memory the process +may allocate from based on an estimation of its current memory and swap use. +For example, if a task is using all allowed memory, its badness score will be +1000. If it is using half of its allowed memory, its score will be 500. + +There is an additional factor included in the badness score: root +processes are given 3% extra memory over other tasks. + +The amount of "allowed" memory depends on the context in which the oom killer +was called. If it is due to the memory assigned to the allocating task's cpuset +being exhausted, the allowed memory represents the set of mems assigned to that +cpuset. If it is due to a mempolicy's node(s) being exhausted, the allowed +memory represents the set of mempolicy nodes. If it is due to a memory +limit (or swap limit) being reached, the allowed memory is that configured +limit. Finally, if it is due to the entire system being out of memory, the +allowed memory represents all allocatable resources. + +The value of /proc/<pid>/oom_score_adj is added to the badness score before it +is used to determine which task to kill. Acceptable values range from -1000 +(OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX). This allows userspace to +polarize the preference for oom killing either by always preferring a certain +task or completely disabling it. The lowest possible value, -1000, is +equivalent to disabling oom killing entirely for that task since it will always +report a badness score of 0. + +Consequently, it is very simple for userspace to define the amount of memory to +consider for each task. Setting a /proc/<pid>/oom_score_adj value of +500, for +example, is roughly equivalent to allowing the remainder of tasks sharing the +same system, cpuset, mempolicy, or memory controller resources to use at least +50% more memory. A value of -500, on the other hand, would be roughly +equivalent to discounting 50% of the task's allowed memory from being considered +as scoring against the task. + +For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also +be used to tune the badness score. Its acceptable values range from -16 +(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17 +(OOM_DISABLE) to disable oom killing entirely for that task. Its value is +scaled linearly with /proc/<pid>/oom_score_adj. + +Writing to /proc/<pid>/oom_score_adj or /proc/<pid>/oom_adj will change the +other with its scaled value. + +Caveat: when a parent task is selected, the oom killer will sacrifice any first +generation children with seperate address spaces instead, if possible. This +avoids servers and important system daemons from being killed and loses the +minimal amount of work. + 3.2 /proc/<pid>/oom_score - Display current oom-killer score ------------------------------------------------------------- |