summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2011-01-06Merge branch 'timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: MAINTAINERS: Update timer related entries timers: Use this_cpu_read timerqueue: Make timerqueue_getnext() static inline hrtimer: fix timerqueue conversion flub hrtimers: Convert hrtimers to use timerlist infrastructure timers: Fixup allmodconfig build issue timers: Rename timerlist infrastructure to timerqueue timers: Introduce timerlist infrastructure. hrtimer: Remove stale comment on curr_timer timer: Warn when del_timer_sync() is called in hardirq context timer: Del_timer_sync() can be used in softirq context timer: Make try_to_del_timer_sync() the same on SMP and UP posix-timers: Annotate lock_timer() timer: Permit statically-declared work with deferrable timers time: Use ARRAY_SIZE macro in timecompare.c timer: Initialize the field slack of timer_list timer_list: Remove alignment padding on 64 bit when CONFIG_TIMER_STATS time: Compensate for rounding on odd-frequency clocksources Fix up trivial conflict in MAINTAINERS
2011-01-06Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits) sched: Change wait_for_completion_*_timeout() to return a signed long sched, autogroup: Fix reference leak sched, autogroup: Fix potential access to freed memory sched: Remove redundant CONFIG_CGROUP_SCHED ifdef sched: Fix interactivity bug by charging unaccounted run-time on entity re-weight sched: Move periodic share updates to entity_tick() printk: Use this_cpu_{read|write} api on printk_pending sched: Make pushable_tasks CONFIG_SMP dependant sched: Add 'autogroup' scheduling feature: automated per session task groups sched: Fix unregister_fair_sched_group() sched: Remove unused argument dest_cpu to migrate_task() mutexes, sched: Introduce arch_mutex_cpu_relax() sched: Add some clock info to sched_debug cpu: Remove incorrect BUG_ON cpu: Remove unused variable sched: Fix UP build breakage sched: Make task dump print all 15 chars of proc comm sched: Update tg->shares after cpu.shares write sched: Allow update_cfs_load() to update global load sched: Implement demand based update_cfs_load() ...
2011-01-06Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (146 commits) tools, perf: Documentation for the power events API perf: Add calls to suspend trace point perf script: Make some lists static perf script: Use the default lost event handler perf session: Warn about errors when processing pipe events too perf tools: Fix perf_event.h header usage perf test: Clarify some error reports in the open syscall test x86, NMI: Add touch_nmi_watchdog to io_check_error delay x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time x86: Only call smp_processor_id in non-preempt cases perf timechart: Adjust perf timechart to the new power events perf: Clean up power events by introducing new, more generic ones perf: Do not export power_frequency, but power_start event perf test: Add test for counting open syscalls perf evsel: Auto allocate resources needed for some methods perf evsel: Use {cpu,thread}_map to shorten list of parameters perf tools: Refactor all_tids to hold nr and the map perf tools: Refactor cpumap to hold nr and the map perf evsel: Introduce per cpu and per thread open helpers perf evsel: Steal the counter reading routines from stat ...
2011-01-06Merge branch 'core-futexes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Add futex_q static initializer futex: Replace fshared and clockrt with combined flags futex: Cleanup stale fshared flag interfaces
2011-01-06Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: remove unused __list_for_each_rcu() macro rculist: fix borked __list_for_each_rcu() macro rcu: reduce __call_rcu()-induced contention on rcu_node structures rcu: limit rcu_node leaf-level fanout rcu: fine-tune grace-period begin/end checks rcu: Keep gpnum and completed fields synchronized rcu: Stop chasing QS if another CPU did it for us rcu: increase synchronize_sched_expedited() batching rcu: Make synchronize_srcu_expedited() fast if running readers rcu: fix race condition in synchronize_sched_expedited() rcu: update documentation/comments for Lai's adoption patch rcu,cleanup: simplify the code when cpu is dying rcu,cleanup: move synchronize_sched_expedited() out of sched.c rcu: get rid of obsolete "classic" names in TREE_RCU tracing rcu: Distinguish between boosting and boosted rcu: document TINY_RCU and TINY_PREEMPT_RCU tracing. rcu: add tracing for TINY_RCU and TINY_PREEMPT_RCU rcu: priority boosting for TINY_PREEMPT_RCU rcu: move TINY_RCU from softirq to kthread rcu: add priority-inversion testing to rcutorture
2011-01-05perf: Add calls to suspend trace pointJean Pihet
Uses the machine_suspend trace point, called from the generic kernel suspend_devices_and_enter function. Signed-off-by: Jean Pihet <j-pihet@ti.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Arjan van de Ven <arjan@linux.intel.com> CC: Thomas Renninger <trenn@suse.de> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linux-pm@lists.linux-foundation.org LKML-Reference: <1294253342-29056-2-git-send-email-j-pihet@ti.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-05Merge commit 'v2.6.37' into perf/coreIngo Molnar
Merge reason: Add the final .37 tree. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-05sched: Change wait_for_completion_*_timeout() to return a signed longNeilBrown
wait_for_completion_*_timeout() can return: 0: if the wait timed out -ve: if the wait was interrupted +ve: if the completion was completed. As they currently return an 'unsigned long', the last two cases are not easily distinguished which can easily result in buggy code, as is the case for the recently added wait_for_completion_interruptible_timeout() call in net/sunrpc/cache.c So change them both to return 'long'. As MAX_SCHEDULE_TIMEOUT is LONG_MAX, a large +ve return value should never overflow. Signed-off-by: NeilBrown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110105125016.64ccab0e@notabene.brown> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-05Merge commit 'v2.6.37' into sched/coreIngo Molnar
Merge reason: Merge the final .37 tree. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-04sched, autogroup: Fix reference leakMike Galbraith
The cgroup exit mess also uncovered a struct autogroup reference leak. copy_process() was simply freeing vs putting the signal_struct, stranding a reference. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> LKML-Reference: <1293784350.6839.2.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-04sched, autogroup: Fix potential access to freed memoryMike Galbraith
Oleg pointed out that the /proc interface kref_get() useage may race with the final put during autogroup_move_group(). A signal->autogroup assignment may be in flight when the /proc interface dereference, leaving them taking a reference to an already dead group. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1292508592.5940.28.camel@maggy.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-04sched: Remove redundant CONFIG_CGROUP_SCHED ifdefYong Zhang
CONFIG_[FAIR|RT]_GROUP_SCHED always means CONFIG_CGROUP_SCHED Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1293803938-8157-1-git-send-email-yong.zhang0@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-04perf: Clean up power events by introducing new, more generic onesThomas Renninger
Add these new power trace events: power:cpu_idle power:cpu_frequency power:machine_suspend The old C-state/idle accounting events: power:power_start power:power_end Have now a replacement (but we are still keeping the old tracepoints for compatibility): power:cpu_idle and power:power_frequency is replaced with: power:cpu_frequency power:machine_suspend is newly introduced. Jean Pihet has a patch integrated into the generic layer (kernel/power/suspend.c) which will make use of it. the type= field got removed from both, it was never used and the type is differed by the event type itself. perf timechart userspace tool gets adjusted in a separate patch. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Jean Pihet <jean.pihet@newoldbits.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: rjw@sisk.pl LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
2011-01-04perf: Do not export power_frequency, but power_start eventThomas Renninger
power_frequency moved to drivers/cpufreq/cpufreq.c which has to be compiled in, no need to export it. intel_idle can a be module though... Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jean Pihet <jean.pihet@newoldbits.com> Cc: Jean Pihet <j-pihet@ti.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: rjw@sisk.pl LKML-Reference: <1294073445-14812-2-git-send-email-trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
2011-01-04Merge commit 'v2.6.37-rc8' into perf/coreIngo Molnar
Merge reason: pick up latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-03watchdog: Improve initialisation error message and documentationBen Hutchings
The error message 'NMI watchdog failed to create perf event...' does not make it clear that this is a fatal error for the watchdog. It also currently prints the error value as a pointer, rather than extracting the error code with PTR_ERR(). Fix that. Add a note to the description of the 'nowatchdog' kernel parameter to associate it with this message. Reported-by: Cesare Leonardi <celeonar@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: 599368@bugs.debian.org Cc: 608138@bugs.debian.org Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: <stable@kernel.org> # .37.x and later LKML-Reference: <1294009362.3167.126.camel@localhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-29fix freeing user_struct in user cacheHillf Danton
When racing on adding into user cache, the new allocated from mm slab is freed without putting user namespace. Since the user namespace is already operated by getting, putting has to be issued. Signed-off-by: Hillf Danton <dhillf@gmail.com> Acked-by: Serge Hallyn <serge@hallyn.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-28Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ring_buffer: Off-by-one and duplicate events in ring_buffer_read_page
2010-12-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: print out alloc information with KERN_DEBUG instead of KERN_INFO kthread_work: make lockdep happy
2010-12-23ring_buffer: Off-by-one and duplicate events in ring_buffer_read_pageDavid Sharp
Fix two related problems in the event-copying loop of ring_buffer_read_page. The loop condition for copying events is off-by-one. "len" is the remaining space in the caller-supplied page. "size" is the size of the next event (or two events). If len == size, then there is just enough space for the next event. size was set to rb_event_ts_length, which may include the size of two events if the first event is a time-extend, in order to assure time- extends are kept together with the event after it. However, rb_advance_reader always advances by one event. This would result in the event after any time-extend being duplicated. Instead, get the size of a single event for the memcpy, but use rb_event_ts_length for the loop condition. Signed-off-by: David Sharp <dhsharp@google.com> LKML-Reference: <1293064704-8101-1-git-send-email-dhsharp@google.com> LKML-Reference: <AANLkTin7nLrRPc9qGjdjHbeVDDWiJjAiYyb-L=gH85bx@mail.gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-12-23Merge branch 'rcu/next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu
2010-12-22taskstats: pad taskstats netlink response for aligment issues on ia64Jeff Mahoney
The taskstats structure is internally aligned on 8 byte boundaries but the layout of the aggregrate reply, with two NLA headers and the pid (each 4 bytes), actually force the entire structure to be unaligned. This causes the kernel to issue unaligned access warnings on some architectures like ia64. Unfortunately, some software out there doesn't properly unroll the NLA packet and assumes that the start of the taskstats structure will always be 20 bytes from the start of the netlink payload. Aligning the start of the taskstats structure breaks this software, which we don't want. So, for now the alignment only happens on architectures that require it and those users will have to update to fixed versions of those packages. Space is reserved in the packet only when needed. This ifdef should be removed in several years e.g. 2012 once we can be confident that fixed versions are installed on most systems. We add the padding before the aggregate since the aggregate is already a defined type. Commit 85893120 ("delayacct: align to 8 byte boundary on 64-bit systems") previously addressed the alignment issues by padding out the pid field. This was supposed to be a compatible change but the circumstances described above mean that it wasn't. This patch backs out that change, since it was a hack, and introduces a new NULL attribute type to provide the padding. Padding the response with 4 bytes avoids allocating an aligned taskstats structure and copying it back. Since the structure weighs in at 328 bytes, it's too big to do it on the stack. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: Brian Rogers <brian@xyzw.org> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Guillaume Chazarain <guichaz@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-22Merge branch 'tip/perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2010-12-22Merge commit 'v2.6.37-rc7' into perf/coreIngo Molnar
Merge reason: Pick up the latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-22kthread_work: make lockdep happyYong Zhang
spinlock in kthread_worker and wait_queue_head in kthread_work both should be lockdep sensible, so change the interface to make it suiltable for CONFIG_LOCKDEP. tj: comment update Reported-by: Nicolas <nicolas.mailhot@laposte.net> Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Andy Walls <awalls@md.metrocast.net> Tested-by: Andy Walls <awalls@md.metrocast.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-20Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Remove debugging check
2010-12-19sched: Remove debugging checkIngo Molnar
Linus reported that the new warning introduced by commit f26f9aff6aaf "Sched: fix skip_clock_update optimization" triggers. The need_resched flag can be set by other CPUs asynchronously so this debug check is bogus - remove it. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <AANLkTinJ8hAG1TpyC+CSYPR47p48+1=E7fiC45hMXT_1@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-19Merge branches 'x86-fixes-for-linus' and 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32: Make sure we can map all of lowmem if we need to x86, vt-d: Handle previous faults after enabling fault handling x86: Enable the intr-remap fault handling after local APIC setup x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem() bootmem: Add alloc_bootmem_align() x86, gcc-4.6: Use gcc -m options when building vdso x86: HPET: Chose a paranoid safe value for the ETIME check x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix off by one in perf_swevent_init() perf: Fix duplicate events with multiple-pmu vs software events ftrace: Have recordmcount honor endianness in fn_ELF_R_INFO scripts/tags.sh: Add magic for trace-events tracing: Fix panic when lseek() called on "trace" opened for writing
2010-12-19Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix the irqtime code for 32bit sched: Fix the irqtime code to deal with u64 wraps nohz: Fix get_next_timer_interrupt() vs cpu hotplug Sched: fix skip_clock_update optimization sched: Cure more NO_HZ load average woes
2010-12-19sched: Fix interactivity bug by charging unaccounted run-time on entity ↵Paul Turner
re-weight Mike Galbraith reported poor interactivity[*] when the new shares distribution code was combined with autogroups. The root cause turns out to be a mis-ordering of accounting accrued execution time and shares updates. Since update_curr() is issued hierarchically, updating the parent entity weights to reflect child enqueue/dequeue results in the parent's unaccounted execution time then being accrued (vs vruntime) at the new weight as opposed to the weight present at accumulation. While this doesn't have much effect on processes with timeslices that cross a tick, it is particularly problematic for an interactive process (e.g. Xorg) which incurs many (tiny) timeslices. In this scenario almost all updates are at dequeue which can result in significant fairness perturbation (especially if it is the only thread, resulting in potential {tg->shares, MIN_SHARES} transitions). Correct this by ensuring unaccounted time is accumulated prior to manipulating an entity's weight. [*] http://xkcd.com/619/ is perversely Nostradamian here. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20101216031038.159704378@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-19sched: Move periodic share updates to entity_tick()Paul Turner
Long running entities that do not block (dequeue) require periodic updates to maintain accurate share values. (Note: group entities with several threads are quite likely to be non-blocking in many circumstances). By virtue of being long-running however, we will see entity ticks (otherwise the required update occurs in dequeue/put and we are done). Thus we can move the detection (and associated work) for these updates into the periodic path. This restores the 'atomicity' of update_curr() with respect to accounting. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101216031038.067028969@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-19Merge commit 'v2.6.37-rc6' into sched/coreIngo Molnar
Merge reason: Update to the latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: x86: avoid high BIOS area when allocating address space x86: avoid E820 regions when allocating address space x86: avoid low BIOS area when allocating address space resources: add arch hook for preventing allocation in reserved areas Revert "resources: support allocating space within a region from the top down" Revert "PCI: allocate bus resources from the top down" Revert "x86/PCI: allocate space from the end of a region, not the beginning" Revert "x86: allocate space within a region top-down" Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode" PCI: Update MCP55 quirk to not affect non HyperTransport variants
2010-12-17rcu: reduce __call_rcu()-induced contention on rcu_node structuresPaul E. McKenney
When the current __call_rcu() function was written, the expedited APIs did not exist. The __call_rcu() implementation therefore went to great lengths to detect the end of old grace periods and to start new ones, all in the name of reducing grace-period latency. Now the expedited APIs do exist, and the usage of __call_rcu() has increased considerably. This commit therefore causes __call_rcu() to avoid worrying about grace periods unless there are a large number of RCU callbacks stacked up on the current CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-12-17rcu: limit rcu_node leaf-level fanoutPaul E. McKenney
Some recent benchmarks have indicated possible lock contention on the leaf-level rcu_node locks. This commit therefore limits the number of CPUs per leaf-level rcu_node structure to 16, in other words, there can be at most 16 rcu_data structures fanning into a given rcu_node structure. Prior to this, the limit was 32 on 32-bit systems and 64 on 64-bit systems. Note that the fanout of non-leaf rcu_node structures is unchanged. The organization of accesses to the rcu_node tree is such that references to non-leaf rcu_node structures are much less frequent than to the leaf structures. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-12-17rcu: fine-tune grace-period begin/end checksPaul E. McKenney
Use the CPU's bit in rnp->qsmask to determine whether or not the CPU should try to report a quiescent state. Handle overflow in the check for rdp->gpnum having fallen behind. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-12-17rcu: Keep gpnum and completed fields synchronizedFrederic Weisbecker
When a CPU that was in an extended quiescent state wakes up and catches up with grace periods that remote CPUs completed on its behalf, we update the completed field but not the gpnum that keeps a stale value of a backward grace period ID. Later, note_new_gpnum() will interpret the shift between the local CPU and the node grace period ID as some new grace period to handle and will then start to hunt quiescent state. But if every grace periods have already been completed, this interpretation becomes broken. And we'll be stuck in clusters of spurious softirqs because rcu_report_qs_rdp() will make this broken state run into infinite loop. The solution, as suggested by Lai Jiangshan, is to ensure that the gpnum and completed fields are well synchronized when we catch up with completed grace periods on their behalf by other cpus. This way we won't start noting spurious new grace periods. Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-12-17rcu: Stop chasing QS if another CPU did it for usFrederic Weisbecker
When a CPU is idle and others CPUs handled its extended quiescent state to complete grace periods on its behalf, it will catch up with completed grace periods numbers when it wakes up. But at this point there might be no more grace period to complete, but still the woken CPU always keeps its stale qs_pending value and will then continue to chase quiescent states even if its not needed anymore. This results in clusters of spurious softirqs until a new real grace period is started. Because if we continue to chase quiescent states but we have completed every grace periods, rcu_report_qs_rdp() is puzzled and makes that state run into infinite loops. As suggested by Lai Jiangshan, just reset qs_pending if someone completed every grace periods on our behalf. Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-12-17rcu: increase synchronize_sched_expedited() batchingTejun Heo
The fix in commit #6a0cc49 requires more than three concurrent instances of synchronize_sched_expedited() before batching is possible. This patch uses a ticket-counter-like approach that is also not unrelated to Lai Jiangshan's Ring RCU to allow sharing of expedited grace periods even when there are only two concurrent instances of synchronize_sched_expedited(). This commit builds on Tejun's original posting, which may be found at http://lkml.org/lkml/2010/11/9/204, adding memory barriers, avoiding overflow of signed integers (other than via atomic_t), and fixing the detection of batching. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-12-17resources: add arch hook for preventing allocation in reserved areasBjorn Helgaas
This adds arch_remove_reservations(), which an arch can implement if it needs to protect part of the address space from allocation. Sometimes that can be done by just putting a region in the resource tree, but there are cases where that doesn't work well. For example, x86 BIOS E820 reservations are not related to devices, so they may overlap part of, all of, or more than a device resource, so they may not end up at the correct spot in the resource tree. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17Revert "resources: support allocating space within a region from the top down"Bjorn Helgaas
This reverts commit e7f8567db9a7f6b3151b0b275e245c1cef0d9c70. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-16PM / Hibernate: Restore old swap signature to avoid user space breakageRafael J. Wysocki
Commit 3624eb0 (PM / Hibernate: Modify signature used to mark swap) attempted to modify hibernate signature used to mark swap partitions containing hibernation images, so that old kernels don't try to handle compressed images. However, this change broke resume from hibernation on Fedora 14 that apparently doesn't pass the resume= argument to the kernel and tries to trigger resume from early user space. This doesn't work, because the signature is now different, so the old signature has to be restored to avoid the problem. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=22732 . Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Reported-by: Zhang Rui <rui.zhang@intel.com> Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-12-16PM / Hibernate: Fix PM_POST_* notification with user-space suspendTakashi Iwai
The user-space hibernation sends a wrong notification after the image restoration because of thinko for the file flag check. RDONLY corresponds to hibernation and WRONLY to restoration, confusingly. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org
2010-12-16perf: Sysfs enumerationPeter Zijlstra
Simple sysfs emumeration of the PMUs. Use a "event_source" bus, and add PMU devices using their name. Each PMU device has a type attribute which contrains the value needed for perf_event_attr::type to identify this PMU. This is the minimal stub needed to start using this interface, we'll consider extending the sysfs usage later. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222056.316982569@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16perf: Dynamic pmu typesPeter Zijlstra
Extend the perf_pmu_register() interface to allow for named and dynamic pmu types. Because we need to support the existing static types we cannot use dynamic types for everything, hence provide a type argument. If we want to enumerate the PMUs they need a name, provide one. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222056.259707703@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16perf: Move perf_event_init() into main.cPeter Zijlstra
Currently we call perf_event_init() from sched_init(). In order to make it more obvious move it to the cannnonical location. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222056.093629821@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge reason: We want to apply a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16Merge branch 'tip/perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
2010-12-16sched: Fix the irqtime code for 32bitPeter Zijlstra
Since the irqtime accounting is using non-atomic u64 and can be read from remote cpus (writes are strictly cpu local, reads are not) we have to deal with observing partial updates. When we do observe partial updates the clock movement (in particular, ->clock_task movement) will go funny (in either direction), a subsequent clock update (observing the full update) will make it go funny in the oposite direction. Since we rely on these clocks to be strictly monotonic we cannot suffer backwards motion. One possible solution would be to simply ignore all backwards deltas, but that will lead to accounting artefacts, most notable: clock_task + irq_time != clock, this inaccuracy would end up in user visible stats. Therefore serialize the reads using a seqcount. Reviewed-by: Venkatesh Pallipadi <venki@google.com> Reported-by: Mikael Pettersson <mikpe@it.uu.se> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1292242434.6803.200.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16sched: Fix the irqtime code to deal with u64 wrapsPeter Zijlstra
Some ARM systems have a short sched_clock() [ which needs to be fixed too ], but this exposed a bug in the irq_time code as well, it doesn't deal with wraps at all. Fix the irq_time code to deal with u64 wraps by re-writing the code to only use delta increments, which avoids the whole issue. Reviewed-by: Venkatesh Pallipadi <venki@google.com> Reported-by: Mikael Pettersson <mikpe@it.uu.se> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1292242433.6803.199.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>