summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2011-04-30Merge branch 'fixes-2.6.39' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'fixes-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix deadlock in worker_maybe_bind_and_lock() workqueue: Document debugging tricks Fix up trivial spelling conflict in kernel/workqueue.c
2011-04-29Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86, nmi: Move LVT un-masking into irq handlers perf events, x86: Work around the Nehalem AAJ80 erratum perf, x86: Fix BTS condition ftrace: Build without frame pointers on Microblaze
2011-04-29Merge branch 'timer-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer: Initialize CLOCK_ID to HRTIMER_BASE table statically rtc: max8925: Call dev_set_drvdata before rtc_device_register
2011-04-29workqueue: fix deadlock in worker_maybe_bind_and_lock()Tejun Heo
If a rescuer and stop_machine() bringing down a CPU race with each other, they may deadlock on non-preemptive kernel. The CPU won't accept a new task, so the rescuer can't migrate to the target CPU, while stop_machine() can't proceed because the rescuer is holding one of the CPU retrying migration. GCWQ_DISASSOCIATED is never cleared and worker_maybe_bind_and_lock() retries indefinitely. This problem can be reproduced semi reliably while the system is entering suspend. http://thread.gmane.org/gmane.linux.kernel/1122051 A lot of kudos to Thilo-Alexander for reporting this tricky issue and painstaking testing. stable: This affects all kernels with cmwq, so all kernels since and including v2.6.36 need this fix. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Thilo-Alexander Ginkel <thilo@ginkel.com> Tested-by: Thilo-Alexander Ginkel <thilo@ginkel.com> Cc: stable@kernel.org
2011-04-29hrtimer: Initialize CLOCK_ID to HRTIMER_BASE table staticallyThomas Gleixner
Sedat and Bruno reported RCU stalls which turned out to be caused by the following; sched_init() calls init_rt_bandwidth() which calls hrtimer_init() _BEFORE_ hrtimers_init() is called. While not entirely correct this worked because hrtimer_init() only accessed statically initialized data (hrtimer_bases.clock_base[CLOCK_MONOTONIC]) Commit e06383db9 (hrtimers: extend hrtimer base code to handle more then 2 clockids) added an indirection to the hrtimer_bases.clock_base lookup to avoid gap handling in the hot path. The table which is used for the translataion from CLOCK_ID to HRTIMER_BASE index is initialized at runtime in hrtimers_init(). So the early call of the scheduler code translates CLOCK_MONOTONIC to HRTIMER_BASE_REALTIME. Thus the rt_bandwith timer ends up on CLOCK_REALTIME. If the timer is armed and the wall clock time is set (e.g. ntpdate in the early boot process - which also gives the problem deterministic behaviour i.e. magic recovery after N hours), then the timer ends up with an expiry time far into the future. That breaks the RT throttler mechanism as rt runtime is accumulated and never cleared, so the rt throttler detects a false cpu hog condition and blocks all RT tasks until the timer finally expires. That in turn stalls the RCU thread of TINYRCU which leads to an huge amount of RCU callbacks piling up. Make the translation table statically initialized, so we are back to the status of <= 2.6.39. Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Cc: John stultz <johnstul@us.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104282353140.3005%40ionos%3E Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-04-28kernel/watchdog.c: disable nmi perf event in the error path of enabling watchdogHillf Danton
In corner cases where softlockup watchdog is not setup successfully, the relevant nmi perf event for hardlockup watchdog could be disabled, then the status of the underlying hardware remains unchanged. Also, if the kthread doesn't start then the hrtimer won't run and the hardlockup detector will falsely fire. Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-27Merge branch 'tip/perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
2011-04-23Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM: Add missing syscore_suspend() and syscore_resume() calls PM: Fix error code paths executed after failing syscore_suspend()
2011-04-21ftrace: Build without frame pointers on MicroblazeMichal Simek
Microblaze doesn't need/support FRAME_POINTERS in order to have a working function tracer. The patch remove Kconfig warning. Warning log: warning: (LOCKDEP && FAULT_INJECTION_STACKTRACE_FILTER && LATENCYTOP && FUNCTION_TRACER && KMEMCHECK) selects FRAME_POINTER which has unmet direct dependencies (DEBUG_KERNEL && (CRIS || M68K || FRV || UML || AVR32 || SUPERH || BLACKFIN || MN10300) || ARCH_WANT_FRAME_POINTERS) Signed-off-by: Michal Simek <monstr@monstr.eu> Link: http://lkml.kernel.org/r/1301908812-8119-2-git-send-email-monstr@monstr.eu CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-20PM: Add missing syscore_suspend() and syscore_resume() callsRafael J. Wysocki
Device suspend/resume infrastructure is used not only by the suspend and hibernate code in kernel/power, but also by APM, Xen and the kexec jump feature. However, commit 40dc166cb5dddbd36aa4ad11c03915ea (PM / Core: Introduce struct syscore_ops for core subsystems PM) failed to add syscore_suspend() and syscore_resume() calls to that code, which generally leads to breakage when the features in question are used. To fix this problem, add the missing syscore_suspend() and syscore_resume() calls to arch/x86/kernel/apm_32.c, kernel/kexec.c and drivers/xen/manage.c. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
2011-04-19Merge branch 'timer-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: RTC: rtc-omap: Fix a leak of the IRQ during init failure posix clocks: Replace mutex with reader/writer semaphore
2011-04-18PM: Fix error code paths executed after failing syscore_suspend()Rafael J. Wysocki
If syscore_suspend() fails in suspend_enter(), create_image() or resume_target_kernel(), it is necessary to call sysdev_resume(), because sysdev_suspend() has been called already and succeeded and we are going to abort the transition. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-18next_pidmap: fix overflow conditionLinus Torvalds
next_pidmap() just quietly accepted whatever 'last' pid that was passed in, which is not all that safe when one of the users is /proc. Admittedly the proc code should do some sanity checking on the range (and that will be the next commit), but that doesn't mean that the helper functions should just do that pidmap pointer arithmetic without checking the range of its arguments. So clamp 'last' to PID_MAX_LIMIT. The fact that we then do "last+1" doesn't really matter, the for-loop does check against the end of the pidmap array properly (it's only the actual pointer arithmetic overflow case we need to worry about, and going one bit beyond isn't going to overflow). [ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ] Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com> Analyzed-by: Robert Święcki <robert@swiecki.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-18posix clocks: Replace mutex with reader/writer semaphoreRichard Cochran
A dynamic posix clock is protected from asynchronous removal by a mutex. However, using a mutex has the unwanted effect that a long running clock operation in one process will unnecessarily block other processes. For example, one process might call read() to get an external time stamp coming in at one pulse per second. A second process calling clock_gettime would have to wait for almost a whole second. This patch fixes the issue by using a reader/writer semaphore instead of a mutex. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/%3C20110330132421.GA31771%40riccoc20.at.omicron.at%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-04-16Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: make unplug timer trace event correspond to the schedule() unplug block: let io_schedule() flush the plug inline
2011-04-16Merge branches 'core-fixes-for-linus', 'perf-fixes-for-linus', ↵Linus Torvalds
'sched-fixes-for-linus', 'timer-fixes-for-linus' and 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setup * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf_event: Fix cgrp event scheduling bug in perf_enable_on_exec() perf: Fix a build error with some GCC versions * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix erroneous all_pinned logic sched: Fix sched-domain avg_load calculation * 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: RTC: rtc-mrst: follow on to the change of rtc_device_register() RTC: add missing "return 0" in new alarm func for rtc-bfin.c RTC: Fix s3c compile error due to missing s3c_rtc_setpie RTC: Fix early irqs caused by calling rtc_set_alarm too early * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, amd: Disable GartTlbWlkErr when BIOS forgets it x86, NUMA: Fix fakenuma boot failure x86/mrst: Fix boot crash caused by incorrect pin to irq mapping x86/ce4100: Add reg property to bridges
2011-04-16block: make unplug timer trace event correspond to the schedule() unplugJens Axboe
It's a pretty close match to what we had before - the timer triggering would mean that nobody unplugged the plug in due time, in the new scheme this matches very closely what the schedule() unplug now is. It's essentially the difference between an explicit unplug (IO unplug) or an implicit unplug (timer unplug, we scheduled with pending IO queued). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-16block: let io_schedule() flush the plug inlineJens Axboe
Linus correctly observes that the most important dispatch cases are now done from kblockd, this isn't ideal for latency reasons. The original reason for switching dispatches out-of-line was to avoid too deep a stack, so by _only_ letting the "accidental" flush directly in schedule() be guarded by offload to kblockd, we should be able to get the best of both worlds. So add a blk_schedule_flush_plug() that offloads to kblockd, and only use that from the schedule() path. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-15Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: only force kblockd unplugging from the schedule() path block: cleanup the block plug helper functions block, blk-sysfs: Use the variable directly instead of a function call block: move queue run on unplug to kblockd block: kill queue_sync_plugs() block: readd plug trace event block: add callback function for unplug notification block: add comment on why we save and disable interrupts in flush_plug_list() block: fixup block IO unplug trace call block: remove block_unplug_timer() trace point block: splice plug list to local context
2011-04-15futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setupDarren Hart
The FLAGS_HAS_TIMEOUT flag was not getting set, causing the restart_block to restart futex_wait() without a timeout after a signal. Commit b41277dc7a18ee332d in 2.6.38 introduced the regression by accidentally removing the the FLAGS_HAS_TIMEOUT assignment from futex_wait() during the setup of the restart block. Restore the originaly behavior. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=32922 Reported-by: Tim Smith <tsmith201104@yahoo.com> Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Kacur <jkacur@redhat.com> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/%3Cdaac0eb3af607f72b9a4d3126b2ba8fb5ed3b883.1302820917.git.dvhart%40linux.intel.com%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-04-13block: don't flush plugged IO on forced preemtion schedulingLinus Torvalds
We really only want to unplug the pending IO when the process actually goes to sleep. So move the test for flushing the plug up to the place where we actually deactivate the task - where we have properly checked for preemption and for the process really sleeping. Acked-by: Jens Axboe <jaxboe@fusionio.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-12block: fixup block IO unplug trace callJens Axboe
It was removed with the on-stack plugging, readd it and track the depth of requests added when flushing the plug. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12block: remove block_unplug_timer() trace pointJens Axboe
We no longer have an unplug timer running, so no point in keeping the trace point. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-11fix XEN_SAVE_RESTORE Kconfig dependenciesShriram Rajagopalan
Make XEN_SAVE_RESTORE select HIBERNATE_CALLBACKS. Remove XEN_SAVE_RESTORE dependency from PM_SLEEP. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-04-11PM / Hibernate: Introduce CONFIG_HIBERNATE_CALLBACKSRafael J. Wysocki
Xen save/restore is going to use hibernate device callbacks for quiescing devices and putting them back to normal operations and it would need to select CONFIG_HIBERNATION for this purpose. However, that also would cause the hibernate interfaces for user space to be enabled, which might confuse user space, because the Xen kernels don't support hibernation. Moreover, it would be wasteful, as it would make the Xen kernels include a substantial amount of code that they would never use. To address this issue introduce new power management Kconfig option CONFIG_HIBERNATE_CALLBACKS, such that it will only select the code that is necessary for the hibernate device callbacks to work and make CONFIG_HIBERNATION select it. Then, Xen save/restore will be able to select CONFIG_HIBERNATE_CALLBACKS without dragging the entire hibernate code along with it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
2011-04-11sched: Fix erroneous all_pinned logicKen Chen
The scheduler load balancer has specific code to deal with cases of unbalanced system due to lots of unmovable tasks (for example because of hard CPU affinity). In those situation, it excludes the busiest CPU that has pinned tasks for load balance consideration such that it can perform second 2nd load balance pass on the rest of the system. This all works as designed if there is only one cgroup in the system. However, when we have multiple cgroups, this logic has false positives and triggers multiple load balance passes despite there are actually no pinned tasks at all. The reason it has false positives is that the all pinned logic is deep in the lowest function of can_migrate_task() and is too low level: load_balance_fair() iterates each task group and calls balance_tasks() to migrate target load. Along the way, balance_tasks() will also set a all_pinned variable. Given that task-groups are iterated, this all_pinned variable is essentially the status of last group in the scanning process. Task group can have number of reasons that no load being migrated, none due to cpu affinity. However, this status bit is being propagated back up to the higher level load_balance(), which incorrectly think that no tasks were moved. It kick off the all pinned logic and start multiple passes attempt to move load onto puller CPU. To fix this, move the all_pinned aggregation up at the iterator level. This ensures that the status is aggregated over all task-groups, not just last one in the list. Signed-off-by: Ken Chen <kenchen@google.com> Cc: stable@kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/BANLkTi=ernzNawaR5tJZEsV_QVnfxqXmsQ@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-11sched: Fix sched-domain avg_load calculationKen Chen
In function find_busiest_group(), the sched-domain avg_load isn't calculated at all if there is a group imbalance within the domain. This will cause erroneous imbalance calculation. The reason is that calculate_imbalance() sees sds->avg_load = 0 and it will dump entire sds->max_load into imbalance variable, which is used later on to migrate entire load from busiest CPU to the puller CPU. This has two really bad effect: 1. stampede of task migration, and they won't be able to break out of the bad state because of positive feedback loop: large load delta -> heavier load migration -> larger imbalance and the cycle goes on. 2. severe imbalance in CPU queue depth. This causes really long scheduling latency blip which affects badly on application that has tight latency requirement. The fix is to have kernel calculate domain avg_load in both cases. This will ensure that imbalance calculation is always sensible and the target is usually half way between busiest and puller CPU. Signed-off-by: Ken Chen <kenchen@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/20110408002322.3A0D812217F@elm.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-11perf_event: Fix cgrp event scheduling bug in perf_enable_on_exec()Stephane Eranian
There is a bug in perf_event_enable_on_exec() when cgroup events are active on a CPU: the cgroup events may be scheduled twice causing event state corruptions which eventually may lead to kernel panics. The reason is that the function needs to first schedule out the cgroup events, just like for the per-thread events. The cgroup event are scheduled back in automatically from the perf_event_context_sched_in() function. The patch also adds a WARN_ON_ONCE() is perf_cgroup_switch() to catch any bogus state. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110406005454.GA1062@quad Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-08signal.c: fix erroneous syscall kernel-docRandy Dunlap
Fix erroneous syscall kernel-doc comments in kernel/signal.c. Reported-by: Matt Fleming <matt@console-pimps.org> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-07Merge branches 'x86-fixes-for-linus', 'sched-fixes-for-linus', ↵Linus Torvalds
'timers-fixes-for-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, fpu: Fix FPU exception handling on non-SSE systems x86, hibernate: Initialize mmu_cr4_features during boot x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change x86: visws: Fixup irq overhaul fallout * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Clean up rebalance_domains() load-balance interval calculation * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/mrst/vrtc: Fix boot crash in mrst_rtc_init() rtc, x86/mrst/vrtc: Fix boot crash in rtc_read_alarm() * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Fix cpumask leak in __setup_irq() * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf probe: Fix listing incorrect line number with inline function perf probe: Fix to find recursively inlined function perf probe: Fix multiple --vars options behavior perf probe: Fix to remove redundant close perf probe: Fix to ensure function declared file
2011-04-07Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6Linus Torvalds
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings
2011-04-05sched: Clean up rebalance_domains() load-balance interval calculationPeter Zijlstra
Instead of the possible multiple-evaluation of num_online_cpus() in rebalance_domains() that Linus reported, avoid it altogether in the normal case since it's implemented with a Hamming weight function over a cpu bitmask which can be darn expensive for those with big iron. This also makes it cleaner, smaller and documents the code. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1301991265.2225.12.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-04kernel/signal.c: add kernel-doc notation to syscallsRandy Dunlap
Add kernel-doc to syscalls in signal.c. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-04kernel/signal.c: fix typos and coding styleRandy Dunlap
General coding style and comment fixes; no code changes: - Use multi-line-comment coding style. - Put some function signatures completely on one line. - Hyphenate some words. - Spell Posix as POSIX. - Correct typos & spellos in some comments. - Drop trailing whitespace. - End sentences with periods. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-04Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix rebalance interval calculation sched, doc: Beef up load balancing description sched: Leave sched_setscheduler() earlier if possible, do not disturb SCHED_FIFO tasks
2011-04-04Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix task_struct reference leak perf: Fix task context scheduling perf: mmap 512 kiB by default perf: Rebase max unprivileged mlock threshold on top of page size perf tools: Fix NO_NEWT=1 python build error perf symbols: Properly align symbol_conf.priv_size perf tools: Emit clearer message for sys_perf_event_open ENOENT return perf tools: Fixup exit path when not able to open events perf symbols: Fix vsyscall symbol lookup oprofile, x86: Allow setting EDGE/INV/CMASK for counter events
2011-04-04ntp: fix non privileged system time shiftingRichard Cochran
The ADJ_SETOFFSET bit added in commit 094aa188 ("ntp: Add ADJ_SETOFFSET mode bit") also introduced a way for any user to change the system time. Sneaky or buggy calls to adjtimex() could set ADJ_OFFSET_SS_READ | ADJ_SETOFFSET which would result in a successful call to timekeeping_inject_offset(). This patch fixes the issue by adding the capability check. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-02genirq: Fix cpumask leak in __setup_irq()Xiaotian Feng
The allocated cpumask should be freed in __setup_irq(). Signed-off-by: Xiaotian Feng <dfeng@redhat.com> LKML-Reference: <1301744375-6812-1-git-send-email-dfeng@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-04-01kdump: Allow shrinking of kdump region to be overriddenAnton Blanchard
On ppc64 the crashkernel region almost always overlaps an area of firmware. This works fine except when using the sysfs interface to reduce the kdump region. If we free the firmware area we are guaranteed to crash. Rename free_reserved_phys_range to crash_free_reserved_phys_range and make it a weak function so we can override it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31perf: Fix task_struct reference leakPeter Zijlstra
sys_perf_event_open() had an imbalance in the number of task refs it took causing memory leakage Cc: Jiri Olsa <jolsa@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: stable@kernel.org # .37+ Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31perf: Rebase max unprivileged mlock threshold on top of page sizeFrederic Weisbecker
Ensure we allow 512 kiB + 1 page for user control without assuming a 4096 bytes page size. Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: <stable@kernel.org> LKML-Reference: <1301535209-9679-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31sched: Fix rebalance interval calculationSisir Koppaka
The interval for checking scheduling domains if they are due to be balanced currently depends on boot state NR_CPUS, which may not accurately reflect the number of online CPUs at the time of check. Thus replace NR_CPUS with num_online_cpus(). (ed: Should only affect those who set NR_CPUS really high, such as 4096 or so :-) Signed-off-by: Sisir Koppaka <sisir.koppaka@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <AANLkTikqHWid2Q93F5U5Qw5snJH8C5PXoa7J6=6hYO94@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31sched: Leave sched_setscheduler() earlier if possible, do not disturb ↵Dario Faggioli
SCHED_FIFO tasks sched_setscheduler() (in sched.c) is called in order of changing the scheduling policy and/or the real-time priority of a task. Thus, if we find out that neither of those are actually being modified, it is possible to return earlier and save the overhead of a full deactivate+activate cycle of the task in question. Beside that, if we have more than one SCHED_FIFO task with the same priority on the same rq (which means they share the same priority queue) having one of them changing its position in the priority queue because of a sched_setscheduler (as it happens by means of the deactivate+activate) that does not actually change the priority violates POSIX which states, for SCHED_FIFO: "If a thread whose policy or priority has been modified by pthread_setschedprio() is a running thread or is runnable, the effect on its position in the thread list depends on the direction of the modification, as follows: a. <...> b. If the priority is unchanged, the thread does not change position in the thread list. c. <...>" http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html (ed: And the POSIX specification here does, briefly and somewhat unexpectedly, match what common sense tells us as well. ) Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1300971618.3960.82.camel@Palantir> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-30genirq: Remove the now obsolete config options and select statementsThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29genirq: Fix misnamed label in handle_edge_eoi_irqThomas Gleixner
Reported-by: michael@ellerman.id.au Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org
2011-03-29genirq: Remove move_*irq leftoversThomas Gleixner
All users converted to new interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29genirq: Remove compat codeThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29alpha: Use generic show_interrupts()Thomas Gleixner
The only subtle difference is that alpha uses ACTUAL_NR_IRQS and prints the IRQF_DISABLED flag. Change the generic implementation to deal with ACTUAL_NR_IRQS if defined. The IRQF_DISABLED printing is pointless, as we nowadays run all interrupts with irqs disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29genirq: Fix harmless typoThomas Gleixner
The late night fixup missed to convert the data type from irq_desc to irq_data, which results in a harmless but annoying warning. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>