author | J. Bruce Fields <bfields@citi.umich.edu> | 2008-02-07 00:13:37 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@woody.linux-foundation.org> | 2008-02-07 08:42:17 -0800 |
commit | 9b8eae7248dad42091204f83ed3448e661456af1 (patch) | |
tree | 1e300d41f8aaa9c258c179024ba63799a79f5a6f /Documentation/sched-nice-design.txt | |
parent | d3cf91d0e201962a6367191e5926f5b0920b0339 (diff) |
Documentation: create new scheduler/ subdirectory
The top-level Documentation/ directory is unmanageably large, so we
should take any obvious opportunities to move stuff into subdirectories.
These sched-*.txt files seem an obvious easy case.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'Documentation/sched-nice-design.txt')
-rw-r--r-- | Documentation/sched-nice-design.txt | 108 |
1 files changed, 0 insertions, 108 deletions
diff --git a/Documentation/sched-nice-design.txt b/Documentation/sched-nice-design.txt
deleted file mode 100644
index e2bae5a577e..00000000000
--- a/Documentation/sched-nice-design.txt
+++ /dev/null
@@ -1,108 +0,0 @@
-This document explains the thinking about the revamped and streamlined
-nice-levels implementation in the new Linux scheduler.
-
-Nice levels were always pretty weak under Linux and people continuously
-pestered us to make nice +19 tasks use up much less CPU time.
-
-Unfortunately that was not that easy to implement under the old
-scheduler, (otherwise we'd have done it long ago) because nice level
-support was historically coupled to timeslice length, and timeslice
-units were driven by the HZ tick, so the smallest timeslice was 1/HZ.
-
-In the O(1) scheduler (in 2003) we changed negative nice levels to be
-much stronger than they were before in 2.4 (and people were happy about
-that change), and we also intentionally calibrated the linear timeslice
-rule so that nice +19 level would be _exactly_ 1 jiffy. To better
-understand it, the timeslice graph went like this (cheesy ASCII art
-alert!):
-
-
-                   A
-             \     | [timeslice length]
-              \    |
-               \   |
-                \  |
-                 \ |
-                  \|___100msecs
-                   |^ . _
-                   |      ^ . _
-                   |            ^ . _
- -*----------------------------------*-----> [nice level]
- -20               |                +19
-                   |
-                   |
-
-So that if someone wanted to really renice tasks, +19 would give a much
-bigger hit than the normal linear rule would do. (The solution of
-changing the ABI to extend priorities was discarded early on.)
-
-This approach worked to some degree for some time, but later on with
-HZ=1000 it caused 1 jiffy to be 1 msec, which meant 0.1% CPU usage which
-we felt to be a bit excessive. Excessive _not_ because it's too small of
-a CPU utilization, but because it causes too frequent (once per
-millisec) rescheduling. (and would thus trash the cache, etc. Remember,
-this was long ago when hardware was weaker and caches were smaller, and
-people were running number crunching apps at nice +19.)
-
-So for HZ=1000 we changed nice +19 to 5msecs, because that felt like the
-right minimal granularity - and this translates to 5% CPU utilization.
-But the fundamental HZ-sensitive property for nice+19 still remained,
-and we never got a single complaint about nice +19 being too _weak_ in
-terms of CPU utilization, we only got complaints about it (still) being
-too _strong_ :-)
-
-To sum it up: we always wanted to make nice levels more consistent, but
-within the constraints of HZ and jiffies and their nasty design level
-coupling to timeslices and granularity it was not really viable.
-
-The second (less frequent but still periodically occuring) complaint
-about Linux's nice level support was its assymetry around the origo
-(which you can see demonstrated in the picture above), or more
-accurately: the fact that nice level behavior depended on the _absolute_
-nice level as well, while the nice API itself is fundamentally
-"relative":
-
-   int nice(int inc);
-
-   asmlinkage long sys_nice(int increment)
-
-(the first one is the glibc API, the second one is the syscall API.)
-Note that the 'inc' is relative to the current nice level. Tools like
-bash's "nice" command mirror this relative API.
-
-With the old scheduler, if you for example started a niced task with +1
-and another task with +2, the CPU split between the two tasks would
-depend on the nice level of the parent shell - if it was at nice -10 the
-CPU split was different than if it was at +5 or +10.
-
-A third complaint against Linux's nice level support was that negative
-nice levels were not 'punchy enough', so lots of people had to resort to
-run audio (and other multimedia) apps under RT priorities such as
-SCHED_FIFO. But this caused other problems: SCHED_FIFO is not starvation
-proof, and a buggy SCHED_FIFO app can also lock up the system for good.
-
-The new scheduler in v2.6.23 addresses all three types of complaints:
-
-To address the first complaint (of nice levels being not "punchy"
-enough), the scheduler was decoupled from 'time slice' and HZ concepts
-(and granularity was made a separate concept from nice levels) and thus
-it was possible to implement better and more consistent nice +19
-support: with the new scheduler nice +19 tasks get a HZ-independent
-1.5%, instead of the variable 3%-5%-9% range they got in the old
-scheduler.
-
-To address the second complaint (of nice levels not being consistent),
-the new scheduler makes nice(1) have the same CPU utilization effect on
-tasks, regardless of their absolute nice levels. So on the new
-scheduler, running a nice +10 and a nice 11 task has the same CPU
-utilization "split" between them as running a nice -5 and a nice -4
-task. (one will get 55% of the CPU, the other 45%.) That is why nice
-levels were changed to be "multiplicative" (or exponential) - that way
-it does not matter which nice level you start out from, the 'relative
-result' will always be the same.
-
-The third complaint (of negative nice levels not being "punchy" enough
-and forcing audio apps to run under the more dangerous SCHED_FIFO
-scheduling policy) is addressed by the new scheduler almost
-automatically: stronger negative nice levels are an automatic
-side-effect of the recalibrated dynamic range of nice levels.
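
Editorial illustration (not part of the commit): the "multiplicative" nice rule described in the removed text can be sketched in a few lines of Python. This is only a rough model under stated assumptions. The constants NICE_0_WEIGHT = 1024 and STEP = 1.25 and the helper names weight() and cpu_split() are hypothetical choices made for this sketch and are not taken from the text above; the text itself only states the resulting 55%/45% split and the roughly 1.5% share for nice +19.

```python
# Rough model of multiplicative nice levels; illustrative only.
# NICE_0_WEIGHT and STEP are assumptions made for this sketch,
# not values stated in the document above.
NICE_0_WEIGHT = 1024
STEP = 1.25  # assumed weight ratio between adjacent nice levels


def weight(nice):
    """Hypothetical scheduling weight for a nice level in [-20, 19]."""
    if not -20 <= nice <= 19:
        raise ValueError("nice level out of range")
    return NICE_0_WEIGHT / STEP ** nice


def cpu_split(*nice_levels):
    """CPU fraction each always-runnable task would get on one CPU."""
    weights = [weight(n) for n in nice_levels]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Adjacent pairs give the same split wherever they sit on the scale.
    for pair in [(10, 11), (-5, -4), (0, 19)]:
        shares = cpu_split(*pair)
        print(pair, ["%.1f%%" % (100 * s) for s in shares])
```

Under these assumptions the split depends only on the difference between the two nice levels, so the (+10, +11) pair and the (-5, -4) pair come out identical, at roughly 56% versus 44%. A nice +19 task competing with a nice 0 task lands at about 1.4%, in the neighbourhood of the 1.5% figure quoted in the text.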