summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2010-09-09sched: Move sched_avg_update() to update_cpu_load()Suresh Siddha
Currently sched_avg_update() (which updates rt_avg stats in the rq) is getting called from scale_rt_power() (in the load balance context) which doesn't take rq->lock. Fix it by moving the sched_avg_update() to more appropriate update_cpu_load() where the CFS load gets updated as well. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1282596171.2694.3.camel@sbsiddha-MOBL3> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09perf: Fix CPU hotplugPeter Zijlstra
Since we have UP_PREPARE, we should also have UP_CANCELED. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09perf, trace: Fix module leakLi Zefan
Commit 1c024eca (perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events) caused a module refcount leak. Reported-And-Tested-by: Avi Kivity <avi@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4C7E1F12.8030304@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-08Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: gcc-4.6: kernel/*: Fix unused but set warnings mutex: Fix annotations to include it in kernel-locking docbook pid: make setpgid() system call use RCU read-side critical section MAINTAINERS: Add RCU's public git tree
2010-09-08Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Try to handle unknown nmis with an enabled PMU perf, x86: Fix handle_irq return values perf, x86: Fix accidentally ack'ing a second event on intel perf counter oprofile, x86: fix init_sysfs() function stub lockup_detector: Sync touch_*_watchdog back to old semantics tracing: Fix a race in function profile oprofile, x86: fix init_sysfs error handling perf_events: Fix time tracking for events with pid != -1 and cpu != -1 perf: Initialize callchains roots's childen hits oprofile: fix crash when accessing freed task structs
2010-09-08tracing/kprobe: Fix handling of C-unlike argument namesMasami Hiramatsu
Check the argument name whether it is invalid (not C-like symbol name). This makes event format simple. Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113912.22882.62313.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-09-08tracing/kprobes: Fix handling of argument namesMasami Hiramatsu
Set "argN" name for each argument automatically if it has no specified name. Since dynamic trace event(kprobe_events) accepts special characters for its argument, its format can show those special characters (e.g. '$', '%', '+'). However, perf can't parse those format because of the character (especially '%') mess up the format. This sets "argX" name for those arguments if user omitted the argument names. E.g. # echo 'p do_fork %ax IP=%ip $stack' > tracing/kprobe_events # cat tracing/kprobe_events p:kprobes/p_do_fork_0 do_fork arg1=%ax IP=%ip arg3=$stack Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113906.22882.59312.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-09-08tracing/kprobe: Fix a memory leak in error caseMasami Hiramatsu
Fix a memory leak which happens when a field name conflicts with others. In error case, free_trace_probe() will free all arguments until nr_args, so this increments nr_args the begining of the loop instead of the end. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113846.22882.12670.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-09-08tracing: Do not allow llseek to set_ftrace_filterSteven Rostedt
Reading the file set_ftrace_filter does three things. 1) shows whether or not filters are set for the function tracer 2) shows what functions are set for the function tracer 3) shows what triggers are set on any functions 3 is independent from 1 and 2. The way this file currently works is that it is a state machine, and as you read it, it may change state. But this assumption breaks when you use lseek() on the file. The state machine gets out of sync and the t_show() may use the wrong pointer and cause a kernel oops. Luckily, this will only kill the app that does the lseek, but the app dies while holding a mutex. This prevents anyone else from using the set_ftrace_filter file (or any other function tracing file for that matter). A real fix for this is to rewrite the code, but that is too much for a -rc release or stable. This patch simply disables llseek on the set_ftrace_filter() file for now, and we can do the proper fix for the next major release. Reported-by: Robert Swiecki <swiecki@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Tavis Ormandy <taviso@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: vendor-sec@lst.de Cc: <stable@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-08sched: Remove unnecessary #ifdef CONFIG_SMPChristian Dietrich
The CONFIG_SMP ifdef isn't necessary at this point, because it is checked in an outer ifdef level already and has no effect here. Cleanup only, no functional effect. Signed-off-by: Christian Dietrich <qy03fugy@stud.informatik.uni-erlangen.de> Cc: vamos-dev@i4.informatik.uni-erlangen.de Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <7a3a39ef3f765a4473cb026b1f204059568a7098.1283782701.git.qy03fugy@stud.informatik.uni-erlangen.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-07Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: use zalloc_cpumask_var() for gcwq->mayday_mask workqueue: fix GCWQ_DISASSOCIATED initialization workqueue: Add a workqueue chapter to the tracepoint docbook workqueue: fix cwq->nr_active underflow workqueue: improve destroy_workqueue() debuggability workqueue: mark lock acquisition on worker_maybe_bind_and_lock() workqueue: annotate lock context change workqueue: free rescuer on destroy_workqueue
2010-09-05gcc-4.6: kernel/*: Fix unused but set warningsAndi Kleen
No real bugs I believe, just some dead code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: andi@firstfloor.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03mutex: Fix annotations to include it in kernel-locking docbookRandy Dunlap
Fix kernel-doc notation in linux/mutex.h and kernel/mutex.c, then add these 2 files to the kernel-locking docbook as the Mutex API reference chapter. Add one API function to mutex-design.txt and correct a typo in that file. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <20100902154816.6cc2f9ad.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-02rcu: fix _oddness handling of verbose stall warningsPaul E. McKenney
CONFIG_RCU_CPU_STALL_VERBOSE depends on CONFIG_TREE_PREEMPT_RCU, but rcu_bootup_announce_oddness() complains if CONFIG_RCU_CPU_STALL_VERBOSE is not set even in the case of CONFIG_TREE_RCU. This commit therefore fixes rcu_bootup_announce_oddness() to avoid insisting on impossibilities. Reported-by: Guy Martin <gmsoft@tuxicoman.be> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-09-02ARM: 6314/1: ftrace: allow build without frame pointers on ARMRabin Vincent
With a new enough GCC, ARM function tracing can be supported without the need for frame pointers. This is essential for Thumb-2 support, since frame pointers aren't available then. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-09-01ring-buffer: Place duplicate expression into a single functionSteven Rostedt
While discussing the strictness of the 80 character limit on the Kernel Summit Discussion mailing list, I showed examples that I broke that limit slightly with some algorithms. In discussing with John Linville, what looked better, I realized that two of the 80 char breaking culprits were an identical expression. As a clean up, this patch moves the identical expression into its own helper function and that is used instead. As a side effect, the offending code is now under the 80 character limit. :-) This clean up code also changes the expression from (A - B) - C to A - (B + C) This makes the code look a little nicer too. Cc: John W. Linville <linville@tuxdriver.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-01lockup_detector: Sync touch_*_watchdog back to old semanticsDon Zickus
During my rewrite, the semantics of touch_nmi_watchdog and touch_softlockup_watchdog changed enough to break some drivers (mostly over preemptable regions). These are cases where long delays on one CPU (due to print_delay for example) can cause long delays on other CPUs - so we must 'touch' the nmi_watchdog flag of those other CPUs as well. This change brings those touch_*_watchdog() functions back in line with to how they used to work. Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: peterz@infradead.org Cc: fweisbec@gmail.com LKML-Reference: <1283310009-22168-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-01lockup_detector: Remove unused panic_notifierAkinobu Mita
The panic notifer in lockup_detector just set did_panic to 1. But did_panic is not used anywhere so we can just remove it. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: peterz@infradead.org Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com LKML-Reference: <1283310009-22168-4-git-send-email-dzickus@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-01lockup_detector: Convert cpu notifier to return encapsulate errno valueAkinobu Mita
By the commit e6bde73b07edeb703d4c89c1daabc09c303de11f ("cpu-hotplug: return better errno on cpu hotplug failure"), the cpu notifier can return encapsulate errno value, resulting in more meaningful error codes for CPU hotplug failures. This converts the cpu notifier to return encapsulate errno value for the lockup_detector as well. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: peterz@infradead.org Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com LKML-Reference: <1283310009-22168-3-git-send-email-dzickus@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-31pid: make setpgid() system call use RCU read-side critical sectionPaul E. McKenney
[ 23.584719] [ 23.584720] =================================================== [ 23.585059] [ INFO: suspicious rcu_dereference_check() usage. ] [ 23.585176] --------------------------------------------------- [ 23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection! [ 23.585176] [ 23.585176] other info that might help us debug this: [ 23.585176] [ 23.585176] [ 23.585176] rcu_scheduler_active = 1, debug_locks = 1 [ 23.585176] 1 lock held by rc.sysinit/728: [ 23.585176] #0: (tasklist_lock){.+.+..}, at: [<ffffffff8104771f>] sys_setpgid+0x5f/0x193 [ 23.585176] [ 23.585176] stack backtrace: [ 23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2 [ 23.585176] Call Trace: [ 23.585176] [<ffffffff8105b436>] lockdep_rcu_dereference+0x99/0xa2 [ 23.585176] [<ffffffff8104c324>] find_task_by_pid_ns+0x50/0x6a [ 23.585176] [<ffffffff8104c35b>] find_task_by_vpid+0x1d/0x1f [ 23.585176] [<ffffffff81047727>] sys_setpgid+0x67/0x193 [ 23.585176] [<ffffffff810029eb>] system_call_fastpath+0x16/0x1b [ 24.959669] type=1400 audit(1282938522.956:4): avc: denied { module_request } for pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas It turns out that the setpgid() system call fails to enter an RCU read-side critical section before doing a PID-to-task_struct translation. This commit therefore does rcu_read_lock() before the translation, and also does rcu_read_unlock() after the last use of the returned pointer. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: David Howells <dhowells@redhat.com>
2010-08-31tracing: Fix a race in function profileLi Zefan
While we are reading trace_stat/functionX and someone just disabled function_profile at that time, we can trigger this: divide error: 0000 [#1] PREEMPT SMP ... EIP is at function_stat_show+0x90/0x230 ... This fix just takes the ftrace_profile_lock and checks if rec->counter is 0. If it's 0, we know the profile buffer has been reset. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: stable@kernel.org LKML-Reference: <4C723644.4040708@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-31workqueue: use zalloc_cpumask_var() for gcwq->mayday_maskTejun Heo
alloc_mayday_mask() was using alloc_cpumask_var() making gcwq->mayday_mask contain garbage after initialization on CONFIG_CPUMASK_OFFSTACK=y configurations. This combined with the previously fixed GCWQ_DISASSOCIATED initialization bug could make rescuers fall into infinite loop trying to bind to an offline cpu. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: CAI Qian <caiqian@redhat.com>
2010-08-31workqueue: fix GCWQ_DISASSOCIATED initializationTejun Heo
init_workqueues() incorrectly marks workqueues for all possible CPUs associated. Combined with mayday_mask initialization bug, this can make rescuers keep trying to bind to an offline gcwq indefinitely. Fix init_workqueues() such that only online CPUs have their gcwqs have GCWQ_DISASSOCIATED cleared. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: CAI Qian <caiqian@redhat.com>
2010-08-31Merge commit 'v2.6.36-rc3' into x86/memblockIngo Molnar
Conflicts: arch/x86/kernel/trampoline.c mm/memblock.c Merge reason: Resolve the conflicts, update to latest upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-30perf_events: Fix time tracking for events with pid != -1 and cpu != -1Stephane Eranian
Per-thread events with a cpu filter, i.e., cpu != -1, were not reporting correct timings when the thread never ran on the monitored cpu. The time enabled was reported as a negative value. This patch fixes the problem by updating tstamp_stopped, tstamp_running in event_sched_out() for events with filters and which are marked as INACTIVE. The function group_sched_out() is modified to systematically call into event_sched_out() to avoid duplicating the timing adjustment code twice. With the patch, I now get: $ task_cpu -i -e unhalted_core_cycles,unhalted_core_cycles noploop 2 noploop for 2 seconds CPU0 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU0 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU1 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU1 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU2 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU2 0 unhalted_core_cycles (ena=1,991,136,594, run=0) CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594) CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594) Signed-off-by: Stephane Eranian <eranian@gmail.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus@samba.org Cc: davem@davemloft.net Cc: fweisbec@gmail.com Cc: perfmon2-devel@lists.sf.net Cc: eranian@google.com LKML-Reference: <4c76802d.aae9d80a.115d.70fe@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-28Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM QoS: Fix inline documentation. PM QoS: Fix kzalloc() parameters swapped in pm_qos_power_open()
2010-08-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: pxa27x_keypad - remove input_free_device() in pxa27x_keypad_remove() Input: mousedev - fix regression of inverting axes Input: uinput - add devname alias to allow module on-demand load Input: hil_kbd - fix compile error USB: drop tty argument from usb_serial_handle_sysrq_char() Input: sysrq - drop tty argument form handle_sysrq() Input: sysrq - drop tty argument from sysrq ops handlers
2010-08-27x86: Remove not used early_res codeYinghai Lu
and some functions in e820.c that are not used anymore Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27rcu: performance fixes to TINY_PREEMPT_RCU callback checkingPaul E. McKenney
This commit tightens up checks in rcu_preempt_check_callbacks() to avoid unnecessary special handling at rcu_read_unlock() time. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-08-27Merge branch 'perf/urgent' into perf/coreFrederic Weisbecker
Conflicts: tools/perf/util/callchain.h Merge reason: Fix a non-trivial conflict with latest fixes
2010-08-26PM QoS: Fix inline documentation.Saravana Kannan
Fix the pm_qos_add_request() kerneldoc comment that doesn't reflect the behavior of the function after the last PM QoS update. Signed-off-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: mark gross <markgross@thegnar.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-08-25Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag tracing/trace_stack: Fix stack trace on ppc64
2010-08-25Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states sched: Fix rq->clock synchronization when migrating tasks
2010-08-25Merge branch 'linus' into perf/coreIngo Molnar
Merge reason: pick up perf fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25tracing/trace_stack: Fix stack trace on ppc64Anton Blanchard
save_stack_trace() stores the instruction pointer, not the function descriptor. On ppc64 the trace stack code currently dereferences the instruction pointer and shows 8 bytes of instructions in our backtraces: # cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (26 entries) ----- ---- -------- 0) 5424 112 0x6000000048000004 1) 5312 160 0x60000000ebad01b0 2) 5152 160 0x2c23000041c20030 3) 4992 240 0x600000007c781b79 4) 4752 160 0xe84100284800000c 5) 4592 192 0x600000002fa30000 6) 4400 256 0x7f1800347b7407e0 7) 4144 208 0xe89f0108f87f0070 8) 3936 272 0xe84100282fa30000 Since we aren't dealing with function descriptors, use %pS instead of %pF to fix it: # cat /sys/kernel/debug/tracing/stack_trace Depth Size Location (26 entries) ----- ---- -------- 0) 5424 112 ftrace_call+0x4/0x8 1) 5312 160 .current_io_context+0x28/0x74 2) 5152 160 .get_io_context+0x48/0xa0 3) 4992 240 .cfq_set_request+0x94/0x4c4 4) 4752 160 .elv_set_request+0x60/0x84 5) 4592 192 .get_request+0x2d4/0x468 6) 4400 256 .get_request_wait+0x7c/0x258 7) 4144 208 .__make_request+0x49c/0x610 8) 3936 272 .generic_make_request+0x390/0x434 Signed-off-by: Anton Blanchard <anton@samba.org> Cc: rostedt@goodmis.org Cc: fweisbec@gmail.com LKML-Reference: <20100825013238.GE28360@kryten> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25workqueue: fix cwq->nr_active underflowTejun Heo
cwq->nr_active is used to keep track of how many work items are active for the cpu workqueue, where 'active' is defined as either pending on global worklist or executing. This is used to implement the max_active limit and workqueue freezing. If a work item is queued after nr_active has already reached max_active, the work item doesn't increment nr_active and is put on the delayed queue and gets activated later as previous active work items retire. try_to_grab_pending() which is used in the cancellation path unconditionally decremented nr_active whether the work item being cancelled is currently active or delayed, so cancelling a delayed work item makes nr_active underflow. This breaks max_active enforcement and triggers BUG_ON() in destroy_workqueue() later on. This patch fixes this bug by adding a flag WORK_STRUCT_DELAYED, which is set while a work item in on the delayed list and making try_to_grab_pending() decrement nr_active iff the work item is currently active. The addition of the flag enlarges cwq alignment to 256 bytes which is getting a bit too large. It's scheduled to be reduced back to 128 bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in the next devel cycle. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Johannes Berg <johannes@sipsolutions.net>
2010-08-24Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: watchdog: Don't throttle the watchdog tracing: Fix timer tracing
2010-08-24Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: mutex: Improve the scalability of optimistic spinning
2010-08-24PM QoS: Fix kzalloc() parameters swapped in pm_qos_power_open()David Alan Gilbert
sparse spotted that the kzalloc() in pm_qos_power_open() in the current Linus' git tree had its parameters swapped. Fix this. Signed-off-by: David Alan Gilbert <linux@treblig.org> Acked-by: mark gross <markgross@thegnar.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-08-24workqueue: improve destroy_workqueue() debuggabilityTejun Heo
Now that the worklist is global, having works pending after wq destruction can easily lead to oops and destroy_workqueue() have several BUG_ON()s to catch these cases. Unfortunately, BUG_ON() doesn't tell much about how the work became pending after the final flush_workqueue(). This patch adds WQ_DYING which is set before the final flush begins. If a work is requested to be queued on a dying workqueue, WARN_ON_ONCE() is triggered and the request is ignored. This clearly indicates which caller is trying to queue a work on a dying workqueue and keeps the system working in most cases. Locking rule comment is updated such that the 'I' rule includes modifying the field from destruction path. Signed-off-by: Tejun Heo <tj@kernel.org>
2010-08-23workqueue: mark lock acquisition on worker_maybe_bind_and_lock()Namhyung Kim
worker_maybe_bind_and_lock() actually grabs gcwq->lock but was missing proper annotation. Add it. So this patch will remove following sparse warnings: kernel/workqueue.c:1214:13: warning: context imbalance in 'worker_maybe_bind_and_lock' - wrong count at exit arch/x86/include/asm/irqflags.h:44:9: warning: context imbalance in 'worker_rebind_fn' - unexpected unlock kernel/workqueue.c:1991:17: warning: context imbalance in 'rescuer_thread' - unexpected unlock Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-08-23workqueue: annotate lock context changeNamhyung Kim
Some of internal functions called within gcwq->lock context releases and regrabs the lock but were missing proper annotations. Add it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-08-23Merge branch 'rcu/next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu
2010-08-23mutex: Improve the scalability of optimistic spinningTim Chen
There is a scalability issue for current implementation of optimistic mutex spin in the kernel. It is found on a 8 node 64 core Nehalem-EX system (HT mode). The intention of the optimistic mutex spin is to busy wait and spin on a mutex if the owner of the mutex is running, in the hope that the mutex will be released soon and be acquired, without the thread trying to acquire mutex going to sleep. However, when we have a large number of threads, contending for the mutex, we could have the mutex grabbed by other thread, and then another ……, and we will keep spinning, wasting cpu cycles and adding to the contention. One possible fix is to quit spinning and put the current thread on wait-list if mutex lock switch to a new owner while we spin, indicating heavy contention (see the patch included). I did some testing on a 8 socket Nehalem-EX system with a total of 64 cores. Using Ingo's test-mutex program that creates/delete files with 256 threads (http://lkml.org/lkml/2006/1/8/50) , I see the following speed up after putting in the mutex spin fix: ./mutex-test V 256 10 Ops/sec 2.6.34 62864 With fix 197200 Repeating the test with Aim7 fserver workload, again there is a speed up with the fix: Jobs/min 2.6.34 91657 With fix 149325 To look at the impact on the distribution of mutex acquisition time, I collected the mutex acquisition time on Aim7 fserver workload with some instrumentation. The average acquisition time is reduced by 48% and number of contentions reduced by 32%. #contentions Time to acquire mutex (cycles) 2.6.34 72973 44765791 With fix 49210 23067129 The histogram of mutex acquisition time is listed below. The acquisition time is in 2^bin cycles. We see that without the fix, the acquisition time is mostly around 2^26 cycles. With the fix, we the distribution get spread out a lot more towards the lower cycles, starting from 2^13. However, there is an increase of the tail distribution with the fix at 2^28 and 2^29 cycles. It seems a small price to pay for the reduced average acquisition time and also getting the cpu to do useful work. Mutex acquisition time distribution (acq time = 2^bin cycles): 2.6.34 With Fix bin #occurrence % #occurrence % 11 2 0.00% 120 0.24% 12 10 0.01% 790 1.61% 13 14 0.02% 2058 4.18% 14 86 0.12% 3378 6.86% 15 393 0.54% 4831 9.82% 16 710 0.97% 4893 9.94% 17 815 1.12% 4667 9.48% 18 790 1.08% 5147 10.46% 19 580 0.80% 6250 12.70% 20 429 0.59% 6870 13.96% 21 311 0.43% 1809 3.68% 22 255 0.35% 2305 4.68% 23 317 0.44% 916 1.86% 24 610 0.84% 233 0.47% 25 3128 4.29% 95 0.19% 26 63902 87.69% 122 0.25% 27 619 0.85% 286 0.58% 28 0 0.00% 3536 7.19% 29 0 0.00% 903 1.83% 30 0 0.00% 0 0.00% I've done similar experiments with 2.6.35 kernel on smaller boxes as well. One is on a dual-socket Westmere box (12 cores total, with HT). Another experiment is on an old dual-socket Core 2 box (4 cores total, no HT) On the 12-core Westmere box, I see a 250% increase for Ingo's mutex-test program with my mutex patch but no significant difference in aim7's fserver workload. On the 4-core Core 2 box, I see the difference with the patch for both mutex-test and aim7 fserver are negligible. So far, it seems like the patch has not caused regression on smaller systems. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: <stable@kernel.org> # .35.x LKML-Reference: <1282168827.9542.72.camel@schen9-DESK> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-23watchdog: Don't throttle the watchdogPeter Zijlstra
Stephane reported that when the machine locks up, the regular ticks, which are responsible to resetting the throttle count, stop too. Hence the NMI watchdog can end up being throttled before it reports on the locked up state, and we end up being sad.. Cure this by having the watchdog overflow reset its own throttle count. Reported-by: Stephane Eranian <eranian@google.com> Tested-by: Stephane Eranian <eranian@google.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1282215916.1926.4696.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-21workqueue: Add basic tracepoints to track workqueue executionArjan van de Ven
With the introduction of the new unified work queue thread pools, we lost one feature: It's no longer possible to know which worker is causing the CPU to wake out of idle. The result is that PowerTOP now reports a lot of "kworker/a:b" instead of more readable results. This patch adds a pair of tracepoints to the new workqueue code, similar in style to the timer/hrtimer tracepoints. With this pair of tracepoints, the next PowerTOP can correctly report which work item caused the wakeup (and how long it took): Interrupt (43) i915 time 3.51ms wakeups 141 Work ieee80211_iface_work time 0.81ms wakeups 29 Work do_dbs_timer time 0.55ms wakeups 24 Process Xorg time 21.36ms wakeups 4 Timer sched_rt_period_timer time 0.01ms wakeups 1 Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-21mm: make the vma list be doubly linkedLinus Torvalds
It's a really simple list, and several of the users want to go backwards in it to find the previous vma. So rather than have to look up the previous entry with 'find_vma_prev()' or something similar, just make it doubly linked instead. Tested-by: Ian Campbell <ijc@hellion.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-21Input: sysrq - drop tty argument form handle_sysrq()Dmitry Torokhov
Sysrq operations do not accept tty argument anymore so no need to pass it to us. [Stephen Rothwell <sfr@canb.auug.org.au>: fix build breakage in drm code caused by sysrq using bool but not including linux/types.h] [Sachin Sant <sachinp@in.ibm.com>: fix build breakage in s390 keyboadr driver] Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-08-20kfifo: implement missing __kfifo_skip_r()Andrea Righi
kfifo_skip() is currently broken, due to the missing of the internal helper function. Add it. Signed-off-by: Andrea Righi <arighi@develer.com> Cc: Greg KH <greg@kroah.com> Acked-by: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-20rcu: apply TINY_PREEMPT_RCU read-side speedup to TREE_PREEMPT_RCUPaul E. McKenney
Replace one of the ACCESS_ONCE() calls in each of __rcu_read_lock() and __rcu_read_unlock() with barrier() as suggested by Steve Rostedt in order to avoid the potential compiler-optimization-induced bug noted by Mathieu Desnoyers. Located-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>