summaryrefslogtreecommitdiffstats
path: root/kernel/kexec.c
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2007-06-18 09:34:40 -0700
committerLinus Torvalds <torvalds@woody.linux-foundation.org>2007-06-18 11:52:55 -0700
commitfa490cfd15d7ce0900097cc4e60cfd7a76381138 (patch)
tree37c0294ed6f6f9e0362db974c4136979a37d9ecd /kernel/kexec.c
parenta0f98a1cb7d27c656de450ba56efd31bdc59065e (diff)
Fix possible runqueue lock starvation in wait_task_inactive()
Miklos Szeredi reported very long pauses (several seconds, sometimes more) on his T60 (with a Core2Duo) which he managed to track down to wait_task_inactive()'s open-coded busy-loop. He observed that an interrupt on one core tries to acquire the runqueue-lock but does not succeed in doing so for a very long time - while wait_task_inactive() on the other core loops waiting for the first core to deschedule a task (which it wont do while spinning in an interrupt handler). This rewrites wait_task_inactive() to do all its waiting optimistically without any locks taken at all, and then just double-check the end result with the proper runqueue lock held over just a very short section. If there were races in the optimistic wait, of a preemption event scheduled the process away, we simply re-synchronize, and start over. So the code now looks like this: repeat: /* Unlocked, optimistic looping! */ rq = task_rq(p); while (task_running(rq, p)) cpu_relax(); /* Get the *real* values */ rq = task_rq_lock(p, &flags); running = task_running(rq, p); array = p->array; task_rq_unlock(rq, &flags); /* Check them.. */ if (unlikely(running)) { cpu_relax(); goto repeat; } /* Preempted away? Yield if so.. */ if (unlikely(array)) { yield(); goto repeat; } Basically, that first "while()" loop is done entirely without any locking at all (and doesn't check for the case where the target process might have been preempted away), and so it's possibly "incorrect", but we don't really care. Both the runqueue used, and the "task_running()" check might be the wrong tests, but they won't oops - they just mean that we could possibly get the wrong results due to lack of locking and exit the loop early in the case of a race condition. So once we've exited the loop, we then get the proper (and careful) rq lock, and check the running/runnable state _safely_. And if it turns out that our quick-and-dirty and unsafe loop was wrong after all, we just go back and try it all again. (The patch also adds a lot of comments, which is the actual bulk of it all, to make it more obvious why we can do these things without holding the locks). Thanks to Miklos for all the testing and tracking it down. Tested-by: Miklos Szeredi <miklos@szeredi.hu> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'kernel/kexec.c')
0 files changed, 0 insertions, 0 deletions