path: root/kernel/rcutree.c
Age | Commit message | Author
2010-10-07 | rcu: using ACCESS_ONCE() to observe the jiffies_stall/rnp->qsmask value | Dongdong Deng
Use ACCESS_ONCE() to observe the jiffies_stall and rnp->qsmask values, because the caller does not hold the lock of the root rcu_node structure or of rnp. Omitting ACCESS_ONCE() is safe here because each loaded value is used only once, but ACCESS_ONCE() is a good documentation aid: it marks the variables as being loaded without the services of a lock.
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
CC: Dipankar Sarma <dipankar@in.ibm.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
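As a rough user-space illustration of the lockless read described above, the sketch below defines ACCESS_ONCE() as a volatile cast (the form commonly used in kernels of this era) and applies it to a stand-in structure. The names node_sketch and any_cpu_pending() are hypothetical and exist only for this example; the macro relies on the GCC/Clang __typeof__ extension.

```c
#include <assert.h>

/*
 * Force a single access to x by going through a volatile-qualified
 * lvalue. Relies on the GCC/Clang __typeof__ extension.
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for a structure whose field is normally lock-protected. */
struct node_sketch {
    unsigned long qsmask;
};

/* Lockless peek: returns 1 if any bit is still set, 0 otherwise. */
static int any_cpu_pending(struct node_sketch *np)
{
    unsigned long snapshot;

    assert(np);
    snapshot = ACCESS_ONCE(np->qsmask); /* one load, no lock held */
    return snapshot != 0;
}
```

The snapshot may be stale by the time it is used, so this pattern only suits callers that can tolerate an approximate answer, as the stall-warning checks can.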
2010-09-23 | rcu: Add tracing data to support queueing models | Paul E. McKenney
The current tracing data is not sufficient to deduce the average time that a callback spends waiting for a grace period to end. Add three per-CPU counters recording the number of callbacks invoked (ci), the number of callbacks orphaned (co), and the number of callbacks adopted (ca). Given the existing callback queue length (ql), the average wait time in absence of CPU hotplug operations is ql/ci. The units of wait time will be in terms of the duration over which ci was measured. In the presence of CPU hotplug operations, there is room for argument, but ql/(ci-co+ca) won't steer you too far wrong. Also fixes a typo called out by Lucas De Marchi <lucas.de.marchi@gmail.com>. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
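The two estimates in this message can be written out as a small calculation. This is only a sketch of the arithmetic: the structure cb_counters and both function names are hypothetical, and the result is expressed in units of the interval over which ci was measured.

```c
#include <assert.h>

/* Hypothetical container for the per-CPU counters named in the message. */
struct cb_counters {
    unsigned long ql; /* current callback queue length */
    unsigned long ci; /* callbacks invoked during the interval */
    unsigned long co; /* callbacks orphaned during the interval */
    unsigned long ca; /* callbacks adopted during the interval */
};

/* Average wait with no CPU hotplug activity: ql / ci. */
static double avg_wait_no_hotplug(const struct cb_counters *c)
{
    assert(c->ci > 0);
    return (double)c->ql / (double)c->ci;
}

/* Approximate average wait with CPU hotplug: ql / (ci - co + ca). */
static double avg_wait_hotplug(const struct cb_counters *c)
{
    /* Work in double so that co > ci cannot wrap an unsigned value. */
    double denom = (double)c->ci - (double)c->co + (double)c->ca;

    assert(denom > 0.0);
    return (double)c->ql / denom;
}
```

For example, with ql = 10 and ci = 100 over a one-second sample, the first estimate is 0.1 of that interval.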
2010-08-20 | rcu: permit suppressing current grace period's CPU stall warnings | Paul E. McKenney
When using a kernel debugger, a long sojourn in the debugger can get you lots of RCU CPU stall warnings once you resume. This might not be helpful, especially if you are using the system console. This patch therefore allows RCU CPU stall warnings to be suppressed, but only for the duration of the current set of grace periods. This differs from Jason's original patch in that it adds support for tiny RCU and preemptible RCU, and uses a slightly different method for suppressing the RCU CPU stall warning messages. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Jason Wessel <jason.wessel@windriver.com>
2010-08-20 | rcu: refer RCU CPU stall-warning victims to stallwarn.txt | Paul E. McKenney
There is some documentation on RCU CPU stall warnings contained in Documentation/RCU/stallwarn.txt, but it will not be apparent to someone who runs into such a warning while under time pressure. This commit therefore adds comments preceding the printk()s pointing out the location of this documentation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-08-19 | rcu: Allow RCU CPU stall warnings to be off at boot, but manually enablable | Paul E. McKenney
Currently, if RCU CPU stall warnings are enabled, they are enabled immediately upon boot. They can be manually disabled via /sys (and also re-enabled via /sys), and are automatically disabled upon panic. However, some users need RCU CPU stalls to be disabled at boot time, but to be enabled without rebuilding/rebooting. For example, someone running a real-time application in production might not want the additional latency of RCU CPU stall detection in normal operation, but might need to enable it at any point for fault isolation purposes. This commit therefore provides a new CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE kernel configuration parameter that maintains the current behavior (enable at boot) by default, but allows a kernel to be configured with RCU CPU stall detection built into the kernel, but disabled at boot time. Requested-by: Clark Williams <williams@redhat.com> Requested-by: John Kacur <jkacur@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-08-19 | rcu: allow RCU CPU stall warning messages to be controlled in /sys | Paul E. McKenney
Set the permissions of the rcu_cpu_stall_suppress to 644 to enable RCU CPU stall warnings to be enabled and disabled at runtime via sysfs. Suggested-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-08-19 | rcu: add boot parameter to suppress RCU CPU stall warning messages | Paul E. McKenney
Although the RCU CPU stall warning messages are a very good way to alert people to a problem, once alerted, it is sometimes helpful to shut them off in order to avoid obscuring other messages that might be being used to track down the problem. Although you can rebuild the kernel with CONFIG_RCU_CPU_STALL_DETECTOR=n, this is sometimes inconvenient. This commit therefore adds a boot parameter named "rcu_cpu_stall_suppress" that shuts these messages off without requiring a rebuild (though a reboot might be needed for those not brave enough to patch their kernel while it is running). This message-suppression was already in place for the panic case, so this commit need only rename the variable and export it via module_param(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-08-19 | rcu: simplify the usage of percpu data | Lai Jiangshan
Because &percpu_data is compatible with allocated percpu data, use it and remove the "->rda[NR_CPUS]" array, saving significant storage on systems with large numbers of CPUs. This does add an additional level of indirection and thus an additional cache line referenced, but because ->rda is not used on the read side, this is OK.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2010-06-14 | tree/tiny rcu: Add debug RCU head objects | Mathieu Desnoyers
Helps find racy users of call_rcu(), whose races result in hangs because list entries are overwritten and/or skipped.

Changelog since v4:
- Bisectability is now OK
- Now generate a WARN_ON_ONCE() for non-initialized rcu_head passed to call_rcu(). Statically initialized objects are detected with object_is_static().
- Rename rcu_head_init_on_stack to init_rcu_head_on_stack.
- Remove init_rcu_head() completely.

Changelog since v3:
- Include comments from Lai Jiangshan

This new patch version is based on the debugobjects with the newly introduced "active state" tracker. Non-initialized entries are all considered as "statically initialized". An activation fixup (triggered by call_rcu()) takes care of performing the debug object initialization without issuing any warning. Since we cannot increase the size of struct rcu_head, I don't see much room to put an identifier for statically initialized rcu_head structures. So for now, we have to live without "activation without explicit init" detection. But the main purpose of this debug option is to detect double-activations (double call_rcu() use of a rcu_head before the callback is executed), which is correctly addressed here. This also detects potential internal RCU callback corruption, which would cause the callbacks to be executed twice.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: David S. Miller <davem@davemloft.net>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: akpm@linux-foundation.org
CC: mingo@elte.hu
CC: laijs@cn.fujitsu.com
CC: dipankar@in.ibm.com
CC: josh@joshtriplett.org
CC: dvhltc@us.ibm.com
CC: niv@us.ibm.com
CC: tglx@linutronix.de
CC: peterz@infradead.org
CC: rostedt@goodmis.org
CC: Valdis.Kletnieks@vt.edu
CC: dhowells@redhat.com
CC: eric.dumazet@gmail.com
CC: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
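The double-activation check can be pictured with a toy state tracker. This is not the debugobjects mechanism itself: debugobjects keeps its state outside the tracked object precisely because struct rcu_head cannot grow, whereas this sketch stores the state inline for brevity. All names here (head_sketch, queue_head(), invoke_head()) are hypothetical.

```c
#include <assert.h>

/* A zero-filled head counts as idle, mirroring "statically initialized". */
enum head_state { HEAD_IDLE = 0, HEAD_QUEUED };

/* Hypothetical tracked object; the real tracker keeps state out of line. */
struct head_sketch {
    enum head_state state;
};

/* Model of queueing a callback: returns 0 on success, -1 on double activation. */
static int queue_head(struct head_sketch *h)
{
    if (h->state == HEAD_QUEUED)
        return -1; /* queued again before the callback ran */
    h->state = HEAD_QUEUED;
    return 0;
}

/* Model of invoking the callback, which makes the head reusable. */
static void invoke_head(struct head_sketch *h)
{
    assert(h->state == HEAD_QUEUED); /* catches a callback executed twice */
    h->state = HEAD_IDLE;
}
```

A second queue_head() on the same object before invoke_head() returns -1, which corresponds to the condition the debug option is meant to flag.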
2010-05-11 | rcu: remove all rcu head initializations, except on_stack initializations | Paul E. McKenney
Remove all rcu head inits. We don't care about the RCU head state before passing it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can keep track of objects on stack. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 | rcu: reduce the number of spurious RCU_SOFTIRQ invocations | Paul E. McKenney
Lai Jiangshan noted that up to 10% of the RCU_SOFTIRQ are spurious, and traced this down to the fact that the current grace-period machinery will uselessly raise RCU_SOFTIRQ when a given CPU needs to go through a quiescent state, but has not yet done so. In this situation, there might well be nothing that RCU_SOFTIRQ can do, and the overhead can be worth worrying about in the ksoftirqd case. This patch therefore avoids raising RCU_SOFTIRQ in this situation.

Changes since v1 (http://lkml.org/lkml/2010/3/30/122 from Lai Jiangshan):
o Omit the rcu_qs_pending() prechecks, as they aren't that much less expensive than the quiescent-state checks.
o Merge with the set_need_resched() patch that reduces IPIs.
o Add the new n_rp_report_qs field to the rcu_pending tracing output.
o Update the tracing documentation accordingly.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 | rcu: permit discontiguous cpu_possible_mask CPU numbering | Paul E. McKenney
TREE_RCU assumes that CPU numbering is contiguous, but some users need large holes in the numbering to better map to hardware layout. This patch makes TREE_RCU (and TREE_PREEMPT_RCU) tolerate large holes in the CPU numbering. However, NR_CPUS must still be greater than the largest CPU number. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 | rcu: improve RCU CPU stall-warning messages | Paul E. McKenney
The existing RCU CPU stall-warning messages can be confusing, especially in the case where one CPU detects a single other stalled CPU. In addition, the console messages did not say which flavor of RCU detected the stall, which can make it difficult to work out exactly what is causing the stall. This commit improves these messages. Requested-by: Dhaval Giani <dhaval.giani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 | rcu: print boot-time console messages if RCU configs out of ordinary | Paul E. McKenney
Print boot-time messages if tracing is enabled, if fanout is set to non-default values, if exact fanout is specified, if accelerated dyntick-idle grace periods have been enabled, if RCU-lockdep is enabled, if rcutorture has been boot-time enabled, if the CPU stall detector has been disabled, or if four-level hierarchy has been enabled. This is all for TREE_RCU and TREE_PREEMPT_RCU. TINY_RCU will be handled separately, if at all. Suggested-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 | rcu: disable CPU stall warnings upon panic | Paul E. McKenney
The current RCU CPU stall warnings remain enabled even after a panic occurs, which some people have found to be a bit counterproductive. This patch therefore uses a notifier to disable stall warnings once a panic occurs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 | rcu: slim down rcutiny by removing rcu_scheduler_active and friends | Paul E. McKenney
TINY_RCU does not need rcu_scheduler_active unless CONFIG_DEBUG_LOCK_ALLOC. So conditionally compile rcu_scheduler_active in order to slim down rcutiny a bit more. Also gets rid of an EXPORT_SYMBOL_GPL, which is responsible for most of the slimming. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 | rcu: refactor RCU's context-switch handling | Paul E. McKenney
The addition of preemptible RCU to treercu resulted in a bit of confusion and inefficiency surrounding the handling of context switches for RCU-sched and for RCU-preempt. For RCU-sched, a context switch is a quiescent state, pure and simple, just like it always has been. For RCU-preempt, a context switch is in no way a quiescent state, but special handling is required when a task blocks in an RCU read-side critical section. However, the callout from the scheduler and the outer loop in ksoftirqd still calls something named rcu_sched_qs(), whose name is no longer accurate. Furthermore, when rcu_check_callbacks() notes an RCU-sched quiescent state, it ends up unnecessarily (though harmlessly, aside from the performance hit) enqueuing the current task if it happens to be running in an RCU-preempt read-side critical section. This not only increases the maximum latency of scheduler_tick(), it also needlessly increases the overhead of the next outermost rcu_read_unlock() invocation. This patch addresses this situation by separating the notion of RCU's context-switch handling from that of RCU-sched's quiescent states. The context-switch handling is covered by rcu_note_context_switch() in general and by rcu_preempt_note_context_switch() for preemptible RCU. This permits rcu_sched_qs() to handle quiescent states and only quiescent states. It also reduces the maximum latency of scheduler_tick(), though probably by much less than a microsecond. Finally, it means that tasks within preemptible-RCU read-side critical sections avoid incurring the overhead of queuing unless there really is a context switch. Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org>
2010-05-10 | rcu: move some code from macro to function | Lai Jiangshan
Shrink the RCU_INIT_FLAVOR() macro by moving all but the initialization of the ->rda[] array to rcu_init_one(). The call to rcu_init_one() can then be moved to the end of the RCU_INIT_FLAVOR() macro, which is required because rcu_boot_init_percpu_data(), which is now called from rcu_init_one(), depends on the initialization of the ->rda[] array. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 | rcu: make dead code really dead | Lai Jiangshan
cleanup: make dead code really dead Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-10 | rcu: substitute set_need_resched for sending resched IPIs | Paul E. McKenney
This patch adds a check to __rcu_pending() that does a local set_need_resched() if the current CPU is holding up the current grace period and if force_quiescent_state() will be called soon. The goal is to reduce the probability that force_quiescent_state() will need to do smp_send_reschedule(), which sends an IPI and is therefore more expensive on most architectures. Signed-off-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
2010-02-27 | rcu: Fix accelerated grace periods for last non-dynticked CPU | Paul E. McKenney
It is invalid to invoke __rcu_process_callbacks() with irqs disabled, so do it indirectly via raise_softirq(). This requires a state-machine implementation to cycle through the grace-period machinery the required number of times. Located-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1267231138-27856-1-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-26 | rcu: Make rcu_read_lock_sched_held() take boot time into account | Paul E. McKenney
Before the scheduler starts, all tasks are non-preemptible by definition. So, during that time, rcu_read_lock_sched_held() needs to always return "true". This patch makes that be so. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1267135607-7056-2-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-25 | rcu: Add RCU_CPU_STALL_VERBOSE to dump detailed per-task information | Paul E. McKenney
When RCU detects a grace-period stall, it currently just prints out the PID of any tasks doing the stalling. This patch adds RCU_CPU_STALL_VERBOSE, which enables the more-verbose reporting from sched_show_task(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-21-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-25 | rcu: Fix deadlock in TREE_PREEMPT_RCU CPU stall detection | Paul E. McKenney
Under TREE_PREEMPT_RCU, print_other_cpu_stall() invokes rcu_print_task_stall() with the root rcu_node structure's ->lock held, and rcu_print_task_stall() acquires that same lock for self-deadlock. Fix this by removing the lock acquisition from rcu_print_task_stall(), and making all callers acquire the lock instead. Tested-by: John Kacur <jkacur@redhat.com> Tested-by: Thomas Gleixner <tglx@linutronix.de> Located-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-19-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-25 | rcu: Convert to raw_spinlocks | Paul E. McKenney
The spinlocks in rcutree need to be real spinlocks in preempt-rt. Convert them to raw_spinlocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-18-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-25 | rcu: Stop overflowing signed integers | Paul E. McKenney
The C standard does not specify the result of an operation that overflows a signed integer, so such operations need to be avoided. This patch changes the type of several fields from "long" to "unsigned long" and adjusts operations as needed. ULONG_CMP_GE() and ULONG_CMP_LT() macros are introduced to do the modular comparisons that are appropriate given that overflow is an expected event. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-17-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
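The modular comparisons mentioned here can be sketched in user space as follows. The two macro bodies are written in the form commonly seen in kernel headers, but they should be read as an illustration of the technique rather than a verbatim copy of this patch; gp_has_reached() and ulong_cmp_selfcheck() are hypothetical helpers. The idea is that unsigned subtraction wraps modulo 2^N, so "a is at or after b" holds when a - b falls in the lower half of the range.

```c
#include <assert.h>
#include <limits.h>

/* True when a is at or after b, treating the counters as modular. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
/* True when a is strictly before b, treating the counters as modular. */
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/* Hypothetical helper: has the counter reached the target number? */
static int gp_has_reached(unsigned long counter, unsigned long target)
{
    return ULONG_CMP_GE(counter, target);
}

/* Small self-check showing that wraparound is handled as expected. */
static void ulong_cmp_selfcheck(void)
{
    unsigned long before_wrap = ULONG_MAX - 1;
    unsigned long after_wrap = before_wrap + 4; /* wraps around to 2 */

    assert(after_wrap == 2UL);
    assert(gp_has_reached(after_wrap, before_wrap));
    assert(ULONG_CMP_LT(before_wrap, after_wrap));
}
```

The comparison is only meaningful while the two counters are less than half the range apart, which is a reasonable assumption for counters that advance slowly relative to their width.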
2010-02-25 | rcu: Accelerate grace period if last non-dynticked CPU | Paul E. McKenney
Currently, rcu_needs_cpu() simply checks whether the current CPU has an outstanding RCU callback, which means that the last CPU to go into dyntick-idle mode might wait a few ticks for the relevant grace periods to complete. However, if all the other CPUs are in dyntick-idle mode, and if this CPU is in a quiescent state (which it is for RCU-bh and RCU-sched any time that we are considering going into dyntick-idle mode), then the grace period is instantly complete. This patch therefore repeatedly invokes the RCU grace-period machinery in order to force any needed grace periods to complete quickly. It does so a limited number of times in order to prevent starvation by an RCU callback function that might pass itself to call_rcu(). However, if any CPU other than the current one is not in dyntick-idle mode, fall back to simply checking (with fix to bug noted by Lai Jiangshan). Also, take advantage of last grace-period forcing, the opportunity to do so noted by Steve Rostedt. And apply simplified #ifdef condition suggested by Frederic Weisbecker. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-15-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-16 | rcu: Fix sparse warnings | Paul E. McKenney
Rename local variable "i" in rcu_init() to avoid conflict with RCU_INIT_FLAVOR(), restrict the scope of RCU_TREE_NONCORE, and make __synchronize_srcu() static. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12635142581560-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 | rcu: Give different levels of the rcu_node hierarchy distinct lockdep names | Paul E. McKenney
Previously, each level of the rcu_node hierarchy had the same rather unimaginative name: "&rcu_node_class[i]". This makes lockdep diagnostics involving these lockdep classes less helpful than would be nice. This patch fixes this by giving each level of the rcu_node hierarchy a distinct name: "rcu_node_level_0", "rcu_node_level_1", and so on. This version of the patch includes improved diagnostics suggested by Josh Triplett and Peter Zijlstra. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12626498421830-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 | rcu: Add force_quiescent_state() testing to rcutorture | Paul E. McKenney
Add force_quiescent_state() testing to rcutorture, with a separate thread that repeatedly invokes force_quiescent_state() in bursts. This can greatly increase the probability of encountering certain types of race conditions. Suggested-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1262646551116-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 | rcu: Make force_quiescent_state() start grace period if needed | Paul E. McKenney
Grace periods cannot be started while force_quiescent_state() is active. This is OK in that the affected CPUs will try again later, but it does induce needless grace-period delays. This patch causes rcu_start_gp() to record a failed attempt to start a grace period. When force_quiescent_state() prepares to return, it then starts the grace period if there was such a failed attempt. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12626465501854-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
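The deferred start described above can be modeled with two flags. The sketch below is a single-threaded toy with no locking: fqs_active, gpnum, and completed are names that appear in neighboring commit messages, while fqs_need_gp, rcu_state_sketch, and the function names are assumptions made for this example and are not claimed to match the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, lock-free model of the relevant rcu_state fields. */
struct rcu_state_sketch {
    unsigned long gpnum;     /* most recently started grace period */
    unsigned long completed; /* most recently completed grace period */
    bool fqs_active;         /* forcing of quiescent states in progress */
    bool fqs_need_gp;        /* a start attempt was refused meanwhile */
};

static bool gp_in_progress(const struct rcu_state_sketch *rsp)
{
    return rsp->completed != rsp->gpnum;
}

/* Try to start a grace period; record the attempt if forcing is active. */
static bool start_gp(struct rcu_state_sketch *rsp)
{
    if (rsp->fqs_active) {
        rsp->fqs_need_gp = true;
        return false;
    }
    if (gp_in_progress(rsp))
        return false;
    rsp->gpnum++;
    return true;
}

static void fqs_enter(struct rcu_state_sketch *rsp)
{
    assert(!rsp->fqs_active);
    rsp->fqs_active = true;
}

/* On the way out, start any grace period that was refused earlier. */
static void fqs_exit(struct rcu_state_sketch *rsp)
{
    rsp->fqs_active = false;
    if (rsp->fqs_need_gp) {
        rsp->fqs_need_gp = false;
        start_gp(rsp);
    }
}
```

In this model a start attempt made between fqs_enter() and fqs_exit() fails immediately but is honored at exit, instead of waiting for the affected CPU to retry later.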
2010-01-13 | rcu: Remove redundant grace-period check | Paul E. McKenney
The rcu_process_dyntick() function checks twice for the end of the current grace period. However, it holds the current rcu_node structure's ->lock field throughout, and doesn't get to the second call to rcu_gp_in_progress() unless there is at least one CPU corresponding to this rcu_node structure that has not yet checked in for the current grace period, which would prevent the current grace period from ending. So the current grace period cannot have ended, and the second check is redundant, so remove it. Also, given that this function is used even with !CONFIG_NO_HZ, its name is quite misleading. Change from rcu_process_dyntick() to force_qs_rnp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1262646550562-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 | rcu: Remove leg of force_quiescent_state() switch statement | Paul E. McKenney
The comparisons of rsp->gpnum and rsp->completed in rcu_process_dyntick() and force_quiescent_state() can be replaced by the much clearer rcu_gp_in_progress() predicate function. After doing this, it becomes clear that the RCU_SAVE_COMPLETED leg of the force_quiescent_state() function's switch statement is almost completely a no-op. A small change to the RCU_SAVE_DYNTICK leg renders it a complete no-op, after which it can be removed. Doing so also eliminates the forcenow local variable from force_quiescent_state().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12626465501781-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 | rcu: Eliminate rcu_process_dyntick() return value | Paul E. McKenney
Because a new grace period cannot start while we are executing within the force_quiescent_state() function's switch statement, if any test within that switch statement or within any function called from that switch statement shows that the current grace period has ended, we can safely re-do that test any time before we leave the switch statement. This means that we no longer need a return value from rcu_process_dyntick(), as we can simply invoke rcu_gp_in_progress() to check whether the old grace period has finished -- there is no longer any need to worry about whether or not a new grace period has been started. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12626465501857-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 | rcu: Eliminate second argument of rcu_process_dyntick() | Paul E. McKenney
At this point, the second argument to all calls to rcu_process_dyntick() is a function of the same field of the structure passed in as the first argument, namely, rsp->gpnum-1. So propagate rsp->gpnum-1 to all uses of the second argument within rcu_process_dyntick() and then eliminate the second argument. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12626465503786-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 | rcu: Eliminate local variable lastcomp from force_quiescent_state() | Paul E. McKenney
Because rsp->fqs_active is set to 1 across force_quiescent_state()'s switch statement, rcu_start_gp() will refrain from starting a new grace period during this time. Therefore, rsp->gpnum is constant, and can be propagated to all uses of lastcomp, eliminating this local variable. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12626465502985-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 | rcu: Eliminate local variable signaled from force_quiescent_state() | Paul E. McKenney
Because the root rcu_node lock is held across entry to the switch statement in force_quiescent_state(), it is no longer necessary to snapshot rsp->signaled to a local variable. Eliminate both the snapshotting and the local variable. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1262646550602-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 | rcu: Prohibit starting new grace periods while forcing quiescent states | Paul E. McKenney
Reduce the number and variety of race conditions by prohibiting the start of a new grace period while force_quiescent_state() is active. A new fqs_active flag in the rcu_state structure is used to trace whether or not force_quiescent_state() is active, and this new flag is tested by rcu_start_gp(). If the CPU that closed out the last grace period needs another grace period, this new grace period may be delayed up to one scheduling-clock tick, but it will eventually get started. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <126264655052-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 | rcu: Adjust force_quiescent_state() locking, step 2 | Paul E. McKenney
This patch releases rnp->lock after the end of force_quiescent_state()'s switch statement. This is a second step towards prohibiting starting grace periods while force_quiescent_state() is executing, which will reduce the number and complexity of races that force_quiescent_state() is involved in. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12626465501994-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 | rcu: Adjust force_quiescent_state() locking, step 1 | Paul E. McKenney
This causes rnp->lock to be held on entry to force_quiescent_state()'s switch statement. This is a first step towards prohibiting starting grace periods while force_quiescent_state() is executing, which will reduce the number and complexity of races that force_quiescent_state() is involved in. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12626465501455-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03rcu: Add expedited grace-period support for preemptible RCUPaul E. McKenney
Implement a synchronize_rcu_expedited() for preemptible RCU that actually is expedited. This uses synchronize_sched_expedited() to force all threads currently running in a preemptible-RCU read-side critical section onto the appropriate ->blocked_tasks[] list, then takes a snapshot of all of these lists and waits for them to drain. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1259784616158-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03rcu: Enable fourth level of TREE_RCU hierarchyPaul E. McKenney
Enable a fourth level of rcu_node hierarchy for TREE_RCU and TREE_PREEMPT_RCU. This is for stress-testing and experimental purposes only, although in theory this would enable 16,777,216 CPUs on 64-bit systems, though only 1,048,576 CPUs on 32-bit systems. Experimental use of this fourth level will normally set CONFIG_RCU_FANOUT=2, requiring a 16-CPU system, though the more adventurous (and more fortunate) experimenters may wish to choose CONFIG_RCU_FANOUT=3 for 81-CPU systems or even CONFIG_RCU_FANOUT=4 for 256-CPU systems. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12597846161257-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
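The CPU counts quoted in this commit all follow from raising the fanout to the number of levels. A small sketch of that arithmetic, using a hypothetical helper name (rcu_tree_capacity() is not a kernel function) and assuming the maximum fanout is 64 on 64-bit systems and 32 on 32-bit systems:

```c
#include <assert.h>

/* CPUs addressable by `levels` levels of rcu_node, each with `fanout` children. */
unsigned long rcu_tree_capacity(unsigned int fanout, unsigned int levels)
{
	unsigned long cpus = 1;

	assert(fanout >= 2 && levels >= 1);
	while (levels--)
		cpus *= fanout; /* each added level multiplies capacity by the fanout */
	return cpus;
}
```

With four levels, a fanout of 64 gives 16,777,216 and a fanout of 32 gives 1,048,576, while the experimental fanouts 2, 3, and 4 give 16, 81, and 256 respectively.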
2009-12-03rcu: Rename "quiet" functionsPaul E. McKenney
The number of "quiet" functions has grown recently, and the names are no longer very descriptive. The point of all of these functions is to do some portion of the task of reporting a quiescent state, so rename them accordingly: o cpu_quiet() becomes rcu_report_qs_rdp(), which reports a quiescent state to the per-CPU rcu_data structure. If this turns out to be a new quiescent state for this grace period, then rcu_report_qs_rnp() will be invoked to propagate the quiescent state up the rcu_node hierarchy. o cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports a quiescent state for a given CPU (or possibly a set of CPUs) up the rcu_node hierarchy. o cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which reports a full set of quiescent states to the global rcu_state structure. o task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports a quiescent state due to a task exiting an RCU read-side critical section that had previously blocked in that same critical section. As indicated by the new name, this type of quiescent state is reported up the rcu_node hierarchy (using rcu_report_qs_rnp() to do so). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12597846163698-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-22rcu: Re-arrange code to reduce #ifdef painPaul E. McKenney
Remove #ifdefs from kernel/rcupdate.c and include/linux/rcupdate.h by moving code to include/linux/rcutiny.h, include/linux/rcutree.h, and kernel/rcutree.c. Also remove some definitions that are no longer used. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1258908830885-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-22rcu: Eliminate unneeded function wrappingPaul E. McKenney
The function rcu_init() is a wrapper for __rcu_init(), and also sets up the CPU-hotplug notifier for rcu_barrier_cpu_hotplug(). But TINY_RCU doesn't need CPU-hotplug notification, and rcu_barrier_cpu_hotplug() is a simple wrapper for rcu_cpu_notify(). So push rcu_init() out to kernel/rcutree.c and kernel/rcutiny.c and get rid of the wrapper function rcu_barrier_cpu_hotplug(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12589088302320-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-22rcu: Fix grace-period-stall bug on large systems with CPU hotplugPaul E. McKenney
When the last CPU of a given leaf rcu_node structure goes offline, all of the tasks queued on that leaf rcu_node structure (due to having blocked in their current RCU read-side critical sections) are requeued onto the root rcu_node structure. This requeuing is carried out by rcu_preempt_offline_tasks(). However, it is possible that these queued tasks are the only thing preventing the leaf rcu_node structure from reporting a quiescent state up the rcu_node hierarchy. Unfortunately, the old code would fail to do this reporting, resulting in a grace-period stall given the following sequence of events: 1. Kernel built for more than 32 CPUs on 32-bit systems or for more than 64 CPUs on 64-bit systems, so that there is more than one rcu_node structure. (Or CONFIG_RCU_FANOUT is artificially set to a number smaller than CONFIG_NR_CPUS.) 2. The kernel is built with CONFIG_TREE_PREEMPT_RCU. 3. A task running on a CPU associated with a given leaf rcu_node structure blocks while in an RCU read-side critical section -and- that CPU has not yet passed through a quiescent state for the current RCU grace period. This will cause the task to be queued on the leaf rcu_node's blocked_tasks[] array, in particular, on the element of this array corresponding to the current grace period. 4. Each of the remaining CPUs corresponding to this same leaf rcu_node structure passes through a quiescent state. However, the task is still in its RCU read-side critical section, so these quiescent states cannot be reported further up the rcu_node hierarchy. Nevertheless, all bits in the leaf rcu_node structure's ->qsmask field are now zero. 5. Each of the remaining CPUs goes offline. (The events in steps #4 and #5 can happen in any order as long as each CPU passes through a quiescent state before going offline.) 6. When the last CPU goes offline, __rcu_offline_cpu() will invoke rcu_preempt_offline_tasks(), which will move the task to the root rcu_node structure, but without reporting a quiescent state up the rcu_node hierarchy (and this failure to report a quiescent state is the bug). But because this leaf rcu_node structure's ->qsmask field is already zero and its ->blocked_tasks[] entries are all empty, force_quiescent_state() will skip this rcu_node structure. Therefore, grace periods are now hung. This patch abstracts some code out of rcu_read_unlock_special(), calling the result task_quiet() by analogy with cpu_quiet(), and invokes task_quiet() from both rcu_read_unlock_special() and __rcu_offline_cpu(). Invoking task_quiet() from __rcu_offline_cpu() reports the quiescent state up the rcu_node hierarchy, fixing the bug. This ends up requiring a separate lock_class_key per level of the rcu_node hierarchy, which this patch also provides. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12589088301770-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
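The stall sequence above can be modeled with a heavily simplified userspace sketch. Everything here is hypothetical: the toy_* names, the single blocked-task counter standing in for ->blocked_tasks[], and the boolean that records an upward report. It only illustrates why moving the tasks without reporting leaves the leaf permanently unreported.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one rcu_node structure. */
struct toy_rcu_node {
	unsigned long qsmask; /* bit set: that CPU still owes a quiescent state */
	int nblocked;         /* readers blocked within a read-side critical section */
	bool reported;        /* has this node reported up the hierarchy? */
};

/* Report upward once neither a CPU nor a blocked reader holds the node back. */
void toy_report_if_quiet(struct toy_rcu_node *rnp)
{
	if (rnp->qsmask == 0 && rnp->nblocked == 0)
		rnp->reported = true;
}

/* Last CPU of a leaf goes offline: requeue its blocked readers onto the root. */
void toy_offline_last_cpu(struct toy_rcu_node *leaf, struct toy_rcu_node *root,
			  bool with_fix)
{
	root->nblocked += leaf->nblocked;
	leaf->nblocked = 0;
	if (with_fix)
		toy_report_if_quiet(leaf); /* the report the old code omitted */
}
```

Without the fix, the leaf ends up with an empty qsmask and no blocked readers yet has never reported, so nothing later prompts it to do so. In this model the root still carries the moved reader, which is assumed to be what keeps the grace period from ending prematurely.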
2009-11-14rcu: Eliminate __rcu_pending() false positivesPaul E. McKenney
Now that there are both ->gpnum and ->completed fields in the rcu_node structure, __rcu_pending() should check rdp->gpnum and rdp->completed against rnp->gpnum and rnp->completed, respectively, instead of the prior comparison against the rcu_state fields rsp->gpnum and rsp->completed. Given the old comparison, __rcu_pending() could return 1, resulting in a needless raise_softirq(RCU_SOFTIRQ). This useless work would happen if RCU responded to a scheduling-clock interrupt after the rcu_state fields had been updated, but before the rcu_node fields had been updated. Changing the comparison from the rcu_state fields to the rcu_node fields prevents this useless work from happening. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12581706991966-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
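The false positive can be shown with a small sketch that compares a per-CPU view of the grace-period counters against either the global or the per-node view. The struct and function names are hypothetical, and the real __rcu_pending() performs other checks that are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy pair of grace-period counters, as held at each level (rsp, rnp, rdp). */
struct toy_gp_counters {
	unsigned long gpnum;     /* most recently started grace period */
	unsigned long completed; /* most recently completed grace period */
};

/* Nonzero if the per-CPU view (rdp) differs from the reference view (ref). */
int toy_gp_work_pending(const struct toy_gp_counters *rdp,
			const struct toy_gp_counters *ref)
{
	assert(rdp != NULL && ref != NULL);
	return rdp->gpnum != ref->gpnum || rdp->completed != ref->completed;
}
```

If the global counters have already advanced to a new grace period but the CPU's rcu_node has not yet been updated, comparing against the global view signals work that the CPU cannot act on yet, while comparing against the node view correctly signals none.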
2009-11-14rcu: Further cleanups of use of lastcompPaul E. McKenney
Now that a copy of the rsp->completed flag is available in all rcu_node structures, make full use of it. It is still legitimate to access rsp->completed while holding the root rcu_node structure's lock, however. Also, tighten up force_quiescent_state()'s checks for end of current grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1258170699933-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-13rcu: Simplify association of forced quiescent states with grace periodsPaul E. McKenney
The force_quiescent_state() function also took a snapshot of the ->completed field, which was as obnoxious as it was in rcu_sched_qs() and friends. So snapshot ->gpnum-1. Also, since the dyntick_record_completed() and dyntick_recall_completed() functions are now simple assignments that are independent of CONFIG_NO_HZ, and since their names are now misleading, get rid of them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12580941042308-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-13rcu: Accelerate callback processing on CPUs not detecting GP endPaul E. McKenney
An earlier fix for a race resulted in a situation where the CPUs other than the CPU that detected the end of the grace period would not process their callbacks until the next grace period started. This means that these other CPUs would unnecessarily demand that an extra grace period be started. This patch eliminates this extra grace period and speeds callback processing by propagating rsp->completed to the rcu_node structures in the case where the CPU detecting the end of the grace period sees no reason to start a new grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1258094104417-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>