From 4c54005ca438a8b46dd542b497d4f0dc2ca375e8 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 14 Jan 2010 16:10:57 -0800 Subject: rcu: 1Q2010 update for RCU documentation Add expedited functions. Review documentation and update obsolete verbiage. Also fix the advice for the RCU CPU-stall kernel configuration parameter, and document RCU CPU-stall warnings. Signed-off-by: Paul E. McKenney Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <12635142581866-git-send-email-> Signed-off-by: Ingo Molnar --- Documentation/RCU/checklist.txt | 200 ++++++++++++++++++++++++---------------- 1 file changed, 119 insertions(+), 81 deletions(-) (limited to 'Documentation/RCU/checklist.txt') diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 51525a30e8b..767cf06a427 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -8,13 +8,12 @@ would cause. This list is based on experiences reviewing such patches over a rather long period of time, but improvements are always welcome! 0. Is RCU being applied to a read-mostly situation? If the data - structure is updated more than about 10% of the time, then - you should strongly consider some other approach, unless - detailed performance measurements show that RCU is nonetheless - the right tool for the job. Yes, you might think of RCU - as simply cutting overhead off of the readers and imposing it - on the writers. That is exactly why normal uses of RCU will - do much more reading than updating. + structure is updated more than about 10% of the time, then you + should strongly consider some other approach, unless detailed + performance measurements show that RCU is nonetheless the right + tool for the job. Yes, RCU does reduce read-side overhead by + increasing write-side overhead, which is exactly why normal uses + of RCU will do much more reading than updating. Another exception is where performance is not an issue, and RCU provides a simpler implementation. An example of this situation @@ -35,13 +34,13 @@ over a rather long period of time, but improvements are always welcome! If you choose #b, be prepared to describe how you have handled memory barriers on weakly ordered machines (pretty much all of - them -- even x86 allows reads to be reordered), and be prepared - to explain why this added complexity is worthwhile. If you - choose #c, be prepared to explain how this single task does not - become a major bottleneck on big multiprocessor machines (for - example, if the task is updating information relating to itself - that other tasks can read, there by definition can be no - bottleneck). + them -- even x86 allows later loads to be reordered to precede + earlier stores), and be prepared to explain why this added + complexity is worthwhile. If you choose #c, be prepared to + explain how this single task does not become a major bottleneck on + big multiprocessor machines (for example, if the task is updating + information relating to itself that other tasks can read, there + by definition can be no bottleneck). 2. Do the RCU read-side critical sections make proper use of rcu_read_lock() and friends? These primitives are needed @@ -51,8 +50,10 @@ over a rather long period of time, but improvements are always welcome! actuarial risk of your kernel. As a rough rule of thumb, any dereference of an RCU-protected - pointer must be covered by rcu_read_lock() or rcu_read_lock_bh() - or by the appropriate update-side lock. + pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(), + rcu_read_lock_sched(), or by the appropriate update-side lock. + Disabling of preemption can serve as rcu_read_lock_sched(), but + is less readable. 3. Does the update code tolerate concurrent accesses? @@ -62,25 +63,27 @@ over a rather long period of time, but improvements are always welcome! of ways to handle this concurrency, depending on the situation: a. Use the RCU variants of the list and hlist update - primitives to add, remove, and replace elements on an - RCU-protected list. Alternatively, use the RCU-protected - trees that have been added to the Linux kernel. + primitives to add, remove, and replace elements on + an RCU-protected list. Alternatively, use the other + RCU-protected data structures that have been added to + the Linux kernel. This is almost always the best approach. b. Proceed as in (a) above, but also maintain per-element locks (that are acquired by both readers and writers) that guard per-element state. Of course, fields that - the readers refrain from accessing can be guarded by the - update-side lock. + the readers refrain from accessing can be guarded by + some other lock acquired only by updaters, if desired. This works quite well, also. c. Make updates appear atomic to readers. For example, - pointer updates to properly aligned fields will appear - atomic, as will individual atomic primitives. Operations - performed under a lock and sequences of multiple atomic - primitives will -not- appear to be atomic. + pointer updates to properly aligned fields will + appear atomic, as will individual atomic primitives. + Sequences of perations performed under a lock will -not- + appear to be atomic to RCU readers, nor will sequences + of multiple atomic primitives. This can work, but is starting to get a bit tricky. @@ -98,9 +101,9 @@ over a rather long period of time, but improvements are always welcome! a new structure containing updated values. 4. Weakly ordered CPUs pose special challenges. Almost all CPUs - are weakly ordered -- even i386 CPUs allow reads to be reordered. - RCU code must take all of the following measures to prevent - memory-corruption problems: + are weakly ordered -- even x86 CPUs allow later loads to be + reordered to precede earlier stores. RCU code must take all of + the following measures to prevent memory-corruption problems: a. Readers must maintain proper ordering of their memory accesses. The rcu_dereference() primitive ensures that @@ -113,14 +116,21 @@ over a rather long period of time, but improvements are always welcome! The rcu_dereference() primitive is also an excellent documentation aid, letting the person reading the code know exactly which pointers are protected by RCU. - - The rcu_dereference() primitive is used by the various - "_rcu()" list-traversal primitives, such as the - list_for_each_entry_rcu(). Note that it is perfectly - legal (if redundant) for update-side code to use - rcu_dereference() and the "_rcu()" list-traversal - primitives. This is particularly useful in code - that is common to readers and updaters. + Please note that compilers can also reorder code, and + they are becoming increasingly aggressive about doing + just that. The rcu_dereference() primitive therefore + also prevents destructive compiler optimizations. + + The rcu_dereference() primitive is used by the + various "_rcu()" list-traversal primitives, such + as the list_for_each_entry_rcu(). Note that it is + perfectly legal (if redundant) for update-side code to + use rcu_dereference() and the "_rcu()" list-traversal + primitives. This is particularly useful in code that + is common to readers and updaters. However, neither + rcu_dereference() nor the "_rcu()" list-traversal + primitives can substitute for a good concurrency design + coordinating among multiple updaters. b. If the list macros are being used, the list_add_tail_rcu() and list_add_rcu() primitives must be used in order @@ -135,11 +145,14 @@ over a rather long period of time, but improvements are always welcome! readers. Similarly, if the hlist macros are being used, the hlist_del_rcu() primitive is required. - The list_replace_rcu() primitive may be used to - replace an old structure with a new one in an - RCU-protected list. + The list_replace_rcu() and hlist_replace_rcu() primitives + may be used to replace an old structure with a new one + in their respective types of RCU-protected lists. + + d. Rules similar to (4b) and (4c) apply to the "hlist_nulls" + type of RCU-protected linked lists. - d. Updates must ensure that initialization of a given + e. Updates must ensure that initialization of a given structure happens before pointers to that structure are publicized. Use the rcu_assign_pointer() primitive when publicizing a pointer to a structure that can @@ -151,16 +164,31 @@ over a rather long period of time, but improvements are always welcome! it cannot block. 6. Since synchronize_rcu() can block, it cannot be called from - any sort of irq context. Ditto for synchronize_sched() and - synchronize_srcu(). - -7. If the updater uses call_rcu(), then the corresponding readers - must use rcu_read_lock() and rcu_read_unlock(). If the updater - uses call_rcu_bh(), then the corresponding readers must use - rcu_read_lock_bh() and rcu_read_unlock_bh(). If the updater - uses call_rcu_sched(), then the corresponding readers must - disable preemption. Mixing things up will result in confusion - and broken kernels. + any sort of irq context. The same rule applies for + synchronize_rcu_bh(), synchronize_sched(), synchronize_srcu(), + synchronize_rcu_expedited(), synchronize_rcu_bh_expedited(), + synchronize_sched_expedite(), and synchronize_srcu_expedited(). + + The expedited forms of these primitives have the same semantics + as the non-expedited forms, but expediting is both expensive + and unfriendly to real-time workloads. Use of the expedited + primitives should be restricted to rare configuration-change + operations that would not normally be undertaken while a real-time + workload is running. + +7. If the updater uses call_rcu() or synchronize_rcu(), then the + corresponding readers must use rcu_read_lock() and + rcu_read_unlock(). If the updater uses call_rcu_bh() or + synchronize_rcu_bh(), then the corresponding readers must + use rcu_read_lock_bh() and rcu_read_unlock_bh(). If the + updater uses call_rcu_sched() or synchronize_sched(), then + the corresponding readers must disable preemption, possibly + by calling rcu_read_lock_sched() and rcu_read_unlock_sched(). + If the updater uses synchronize_srcu(), the the corresponding + readers must use srcu_read_lock() and srcu_read_unlock(), + and with the same srcu_struct. The rules for the expedited + primitives are the same as for their non-expedited counterparts. + Mixing things up will result in confusion and broken kernels. One exception to this rule: rcu_read_lock() and rcu_read_unlock() may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh() @@ -212,6 +240,8 @@ over a rather long period of time, but improvements are always welcome! e. Periodically invoke synchronize_rcu(), permitting a limited number of updates per grace period. + The same cautions apply to call_rcu_bh() and call_rcu_sched(). + 9. All RCU list-traversal primitives, which include rcu_dereference(), list_for_each_entry_rcu(), list_for_each_continue_rcu(), and list_for_each_safe_rcu(), @@ -229,7 +259,8 @@ over a rather long period of time, but improvements are always welcome! 10. Conversely, if you are in an RCU read-side critical section, and you don't hold the appropriate update-side lock, you -must- use the "_rcu()" variants of the list macros. Failing to do so - will break Alpha and confuse people reading your code. + will break Alpha, cause aggressive compilers to generate bad code, + and confuse people trying to read your code. 11. Note that synchronize_rcu() -only- guarantees to wait until all currently executing rcu_read_lock()-protected RCU read-side @@ -239,15 +270,21 @@ over a rather long period of time, but improvements are always welcome! rcu_read_lock()-protected read-side critical sections, do -not- use synchronize_rcu(). - If you want to wait for some of these other things, you might - instead need to use synchronize_irq() or synchronize_sched(). + Similarly, disabling preemption is not an acceptable substitute + for rcu_read_lock(). Code that attempts to use preemption + disabling where it should be using rcu_read_lock() will break + in real-time kernel builds. + + If you want to wait for interrupt handlers, NMI handlers, and + code under the influence of preempt_disable(), you instead + need to use synchronize_irq() or synchronize_sched(). 12. Any lock acquired by an RCU callback must be acquired elsewhere with softirq disabled, e.g., via spin_lock_irqsave(), spin_lock_bh(), etc. Failing to disable irq on a given - acquisition of that lock will result in deadlock as soon as the - RCU callback happens to interrupt that acquisition's critical - section. + acquisition of that lock will result in deadlock as soon as + the RCU softirq handler happens to run your RCU callback while + interrupting that acquisition's critical section. 13. RCU callbacks can be and are executed in parallel. In many cases, the callback code simply wrappers around kfree(), so that this @@ -265,29 +302,30 @@ over a rather long period of time, but improvements are always welcome! not the case, a self-spawning RCU callback would prevent the victim CPU from ever going offline.) -14. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) - may only be invoked from process context. Unlike other forms of - RCU, it -is- permissible to block in an SRCU read-side critical - section (demarked by srcu_read_lock() and srcu_read_unlock()), - hence the "SRCU": "sleepable RCU". Please note that if you - don't need to sleep in read-side critical sections, you should - be using RCU rather than SRCU, because RCU is almost always - faster and easier to use than is SRCU. +14. SRCU (srcu_read_lock(), srcu_read_unlock(), synchronize_srcu(), + and synchronize_srcu_expedited()) may only be invoked from + process context. Unlike other forms of RCU, it -is- permissible + to block in an SRCU read-side critical section (demarked by + srcu_read_lock() and srcu_read_unlock()), hence the "SRCU": + "sleepable RCU". Please note that if you don't need to sleep + in read-side critical sections, you should be using RCU rather + than SRCU, because RCU is almost always faster and easier to + use than is SRCU. Also unlike other forms of RCU, explicit initialization and cleanup is required via init_srcu_struct() and cleanup_srcu_struct(). These are passed a "struct srcu_struct" that defines the scope of a given SRCU domain. Once initialized, the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock() - and synchronize_srcu(). A given synchronize_srcu() waits only - for SRCU read-side critical sections governed by srcu_read_lock() - and srcu_read_unlock() calls that have been passd the same - srcu_struct. This property is what makes sleeping read-side - critical sections tolerable -- a given subsystem delays only - its own updates, not those of other subsystems using SRCU. - Therefore, SRCU is less prone to OOM the system than RCU would - be if RCU's read-side critical sections were permitted to - sleep. + synchronize_srcu(), and synchronize_srcu_expedited(). A given + synchronize_srcu() waits only for SRCU read-side critical + sections governed by srcu_read_lock() and srcu_read_unlock() + calls that have been passed the same srcu_struct. This property + is what makes sleeping read-side critical sections tolerable -- + a given subsystem delays only its own updates, not those of other + subsystems using SRCU. Therefore, SRCU is less prone to OOM the + system than RCU would be if RCU's read-side critical sections + were permitted to sleep. The ability to sleep in read-side critical sections does not come for free. First, corresponding srcu_read_lock() and @@ -311,12 +349,12 @@ over a rather long period of time, but improvements are always welcome! destructive operation, and -only- -then- invoke call_rcu(), synchronize_rcu(), or friends. - Because these primitives only wait for pre-existing readers, - it is the caller's responsibility to guarantee safety to - any subsequent readers. + Because these primitives only wait for pre-existing readers, it + is the caller's responsibility to guarantee that any subsequent + readers will execute safely. -16. The various RCU read-side primitives do -not- contain memory - barriers. The CPU (and in some cases, the compiler) is free - to reorder code into and out of RCU read-side critical sections. - It is the responsibility of the RCU update-side primitives to - deal with this. +16. The various RCU read-side primitives do -not- necessarily contain + memory barriers. You should therefore plan for the CPU + and the compiler to freely reorder code into and out of RCU + read-side critical sections. It is the responsibility of the + RCU update-side primitives to deal with this. -- cgit v1.2.3-70-g09d2 From c598a070bc581aea8a518b460dae8c0cf8e74344 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 22 Feb 2010 17:04:57 -0800 Subject: rcu: Documentation update for CONFIG_PROVE_RCU Adds a lockdep.txt file and updates checklist.txt and whatisRCU.txt to reflect the new lockdep-enabled capabilities of RCU. Signed-off-by: Paul E. McKenney Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-13-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar --- Documentation/RCU/00-INDEX | 2 ++ Documentation/RCU/checklist.txt | 34 ++++++++++++--------- Documentation/RCU/lockdep.txt | 67 +++++++++++++++++++++++++++++++++++++++++ Documentation/RCU/whatisRCU.txt | 13 +++++--- 4 files changed, 97 insertions(+), 19 deletions(-) create mode 100644 Documentation/RCU/lockdep.txt (limited to 'Documentation/RCU/checklist.txt') diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX index 0a27ea9621f..71b6f500ddb 100644 --- a/Documentation/RCU/00-INDEX +++ b/Documentation/RCU/00-INDEX @@ -6,6 +6,8 @@ checklist.txt - Review Checklist for RCU Patches listRCU.txt - Using RCU to Protect Read-Mostly Linked Lists +lockdep.txt + - RCU and lockdep checking NMI-RCU.txt - Using RCU to Protect Dynamic NMI Handlers rcubarrier.txt diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 767cf06a427..cbc180f9019 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -127,10 +127,14 @@ over a rather long period of time, but improvements are always welcome! perfectly legal (if redundant) for update-side code to use rcu_dereference() and the "_rcu()" list-traversal primitives. This is particularly useful in code that - is common to readers and updaters. However, neither - rcu_dereference() nor the "_rcu()" list-traversal - primitives can substitute for a good concurrency design - coordinating among multiple updaters. + is common to readers and updaters. However, lockdep + will complain if you access rcu_dereference() outside + of an RCU read-side critical section. See lockdep.txt + to learn what to do about this. + + Of course, neither rcu_dereference() nor the "_rcu()" + list-traversal primitives can substitute for a good + concurrency design coordinating among multiple updaters. b. If the list macros are being used, the list_add_tail_rcu() and list_add_rcu() primitives must be used in order @@ -249,7 +253,9 @@ over a rather long period of time, but improvements are always welcome! must be protected by appropriate update-side locks. RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), or by similar primitives such as - rcu_read_lock_bh() and rcu_read_unlock_bh(). + rcu_read_lock_bh() and rcu_read_unlock_bh(), in which case + the matching rcu_dereference() primitive must be used in order + to keep lockdep happy, in this case, rcu_dereference_bh(). The reason that it is permissible to use RCU list-traversal primitives when the update-side lock is held is that doing so @@ -302,15 +308,15 @@ over a rather long period of time, but improvements are always welcome! not the case, a self-spawning RCU callback would prevent the victim CPU from ever going offline.) -14. SRCU (srcu_read_lock(), srcu_read_unlock(), synchronize_srcu(), - and synchronize_srcu_expedited()) may only be invoked from - process context. Unlike other forms of RCU, it -is- permissible - to block in an SRCU read-side critical section (demarked by - srcu_read_lock() and srcu_read_unlock()), hence the "SRCU": - "sleepable RCU". Please note that if you don't need to sleep - in read-side critical sections, you should be using RCU rather - than SRCU, because RCU is almost always faster and easier to - use than is SRCU. +14. SRCU (srcu_read_lock(), srcu_read_unlock(), srcu_dereference(), + synchronize_srcu(), and synchronize_srcu_expedited()) may only + be invoked from process context. Unlike other forms of RCU, it + -is- permissible to block in an SRCU read-side critical section + (demarked by srcu_read_lock() and srcu_read_unlock()), hence the + "SRCU": "sleepable RCU". Please note that if you don't need + to sleep in read-side critical sections, you should be using + RCU rather than SRCU, because RCU is almost always faster and + easier to use than is SRCU. Also unlike other forms of RCU, explicit initialization and cleanup is required via init_srcu_struct() and diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt new file mode 100644 index 00000000000..fe24b58627b --- /dev/null +++ b/Documentation/RCU/lockdep.txt @@ -0,0 +1,67 @@ +RCU and lockdep checking + +All flavors of RCU have lockdep checking available, so that lockdep is +aware of when each task enters and leaves any flavor of RCU read-side +critical section. Each flavor of RCU is tracked separately (but note +that this is not the case in 2.6.32 and earlier). This allows lockdep's +tracking to include RCU state, which can sometimes help when debugging +deadlocks and the like. + +In addition, RCU provides the following primitives that check lockdep's +state: + + rcu_read_lock_held() for normal RCU. + rcu_read_lock_bh_held() for RCU-bh. + rcu_read_lock_sched_held() for RCU-sched. + srcu_read_lock_held() for SRCU. + +These functions are conservative, and will therefore return 1 if they +aren't certain (for example, if CONFIG_DEBUG_LOCK_ALLOC is not set). +This prevents things like WARN_ON(!rcu_read_lock_held()) from giving false +positives when lockdep is disabled. + +In addition, a separate kernel config parameter CONFIG_PROVE_RCU enables +checking of rcu_dereference() primitives: + + rcu_dereference(p): + Check for RCU read-side critical section. + rcu_dereference_bh(p): + Check for RCU-bh read-side critical section. + rcu_dereference_sched(p): + Check for RCU-sched read-side critical section. + srcu_dereference(p, sp): + Check for SRCU read-side critical section. + rcu_dereference_check(p, c): + Use explicit check expression "c". + rcu_dereference_raw(p) + Don't check. (Use sparingly, if at all.) + +The rcu_dereference_check() check expression can be any boolean +expression, but would normally include one of the rcu_read_lock_held() +family of functions and a lockdep expression. However, any boolean +expression can be used. For a moderately ornate example, consider +the following: + + file = rcu_dereference_check(fdt->fd[fd], + rcu_read_lock_held() || + lockdep_is_held(&files->file_lock) || + atomic_read(&files->count) == 1); + +This expression picks up the pointer "fdt->fd[fd]" in an RCU-safe manner, +and, if CONFIG_PROVE_RCU is configured, verifies that this expression +is used in: + +1. An RCU read-side critical section, or +2. with files->file_lock held, or +3. on an unshared files_struct. + +In case (1), the pointer is picked up in an RCU-safe manner for vanilla +RCU read-side critical sections, in case (2) the ->file_lock prevents +any change from taking place, and finally, in case (3) the current task +is the only task accessing the file_struct, again preventing any change +from taking place. + +There are currently only "universal" versions of the rcu_assign_pointer() +and RCU list-/tree-traversal primitives, which do not (yet) check for +being in an RCU read-side critical section. In the future, separate +versions of these primitives might be created. diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 469a58b2e67..1dc00ee9716 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -323,15 +323,17 @@ used as follows: Defer Protect a. synchronize_rcu() rcu_read_lock() / rcu_read_unlock() - call_rcu() + call_rcu() rcu_dereference() b. call_rcu_bh() rcu_read_lock_bh() / rcu_read_unlock_bh() + rcu_dereference_bh() c. synchronize_sched() rcu_read_lock_sched() / rcu_read_unlock_sched() preempt_disable() / preempt_enable() local_irq_save() / local_irq_restore() hardirq enter / hardirq exit NMI enter / NMI exit + rcu_dereference_sched() These three mechanisms are used as follows: @@ -781,9 +783,8 @@ Linux-kernel source code, but it helps to have a full list of the APIs, since there does not appear to be a way to categorize them in docbook. Here is the list, by category. -RCU pointer/list traversal: +RCU list traversal: - rcu_dereference list_for_each_entry_rcu hlist_for_each_entry_rcu hlist_nulls_for_each_entry_rcu @@ -809,7 +810,7 @@ RCU: Critical sections Grace period Barrier rcu_read_lock synchronize_net rcu_barrier rcu_read_unlock synchronize_rcu - synchronize_rcu_expedited + rcu_dereference synchronize_rcu_expedited call_rcu @@ -817,7 +818,7 @@ bh: Critical sections Grace period Barrier rcu_read_lock_bh call_rcu_bh rcu_barrier_bh rcu_read_unlock_bh synchronize_rcu_bh - synchronize_rcu_bh_expedited + rcu_dereference_bh synchronize_rcu_bh_expedited sched: Critical sections Grace period Barrier @@ -826,12 +827,14 @@ sched: Critical sections Grace period Barrier rcu_read_unlock_sched call_rcu_sched [preempt_disable] synchronize_sched_expedited [and friends] + rcu_dereference_sched SRCU: Critical sections Grace period Barrier srcu_read_lock synchronize_srcu N/A srcu_read_unlock synchronize_srcu_expedited + srcu_dereference SRCU: Initialization/cleanup init_srcu_struct -- cgit v1.2.3-70-g09d2