path: root/kernel
Age    Commit message    Author
2010-12-17Merge branch 'this_cpu_ops' into for-2.6.38Tejun Heo
2010-12-17core: Replace __get_cpu_var with __this_cpu_read if not used for an address.Christoph Lameter
__get_cpu_var() can be replaced with __this_cpu_read(), which will then use a single read instruction with an implied address calculation to access the correct per-cpu instance. However, the address of a per-cpu variable passed to __this_cpu_read() cannot be determined (since it's an implied address conversion through segment prefixes). Therefore apply this only to uses of __get_cpu_var where the address of the variable is not used. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Hugh Dickins <hughd@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
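A minimal sketch of the kind of conversion this covers; the per-cpu variable name is hypothetical, and the point is that the value is only read, never taken by address:

    /* Hypothetical per-cpu counter, for illustration only. */
    DEFINE_PER_CPU(int, sample_counter);

    static int read_sample_counter(void)
    {
        /*
         * Old form:  return __get_cpu_var(sample_counter);
         * New form:  a single segment-prefixed read, no address taken.
         */
        return __this_cpu_read(sample_counter);
    }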
2010-12-17kprobes: Use this_cpu_opsChristoph Lameter
Use this_cpu ops in various places to optimize per cpu data access. Cc: Jason Baron <jbaron@redhat.com> Cc: Namhyung Kim <namhyung@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-16tty: add 'active' sysfs attribute to tty0 and console deviceKay Sievers
Userspace can query the currently active virtual console, and the configured console devices behind /dev/tty0 and /dev/console. The last entry in the list of devices is the active device, analogous to the console= kernel command line option. The attribute supports poll(), which is raised when the virtual console is changed or /dev/console is reconfigured. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
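A hedged userspace sketch of consuming the attribute; the sysfs path /sys/class/tty/tty0/active is an assumption based on the class name, and the usual sysfs poll() pattern (re-read from offset 0 after each wakeup) is applied:

    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[64];
        ssize_t n;
        /* Assumed path; /sys/class/tty/console/active would work the same way. */
        int fd = open("/sys/class/tty/tty0/active", O_RDONLY);

        if (fd < 0)
            return 1;
        for (;;) {
            struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };

            lseek(fd, 0, SEEK_SET);        /* sysfs: rewind before re-reading */
            n = read(fd, buf, sizeof(buf) - 1);
            if (n > 0) {
                buf[n] = '\0';
                printf("active: %s", buf);
            }
            poll(&pfd, 1, -1);             /* wakes up when the value changes */
        }
    }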
2010-12-16PM / Hibernate: Restore old swap signature to avoid user space breakageRafael J. Wysocki
Commit 3624eb0 (PM / Hibernate: Modify signature used to mark swap) attempted to modify the hibernate signature used to mark swap partitions containing hibernation images, so that old kernels don't try to handle compressed images. However, this change broke resume from hibernation on Fedora 14, which apparently doesn't pass the resume= argument to the kernel and tries to trigger resume from early user space. This doesn't work, because the signature is now different, so the old signature has to be restored to avoid the problem. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=22732 . Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Reported-by: Zhang Rui <rui.zhang@intel.com> Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-12-16PM / Hibernate: Fix PM_POST_* notification with user-space suspendTakashi Iwai
The user-space hibernation code sends the wrong notification after image restoration because of a thinko in the file flag check: RDONLY corresponds to hibernation and WRONLY to restoration, confusingly. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org
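A rough sketch of the corrected mapping as described above, where data->mode stands for the mode the snapshot device was opened with; the surrounding release handler is paraphrased, not quoted:

    /* Sketch: pick the post-hibernation vs. post-restore notification from
     * the open mode, per the commit text (O_RDONLY corresponds to hibernation). */
    pm_notifier_call_chain(data->mode == O_RDONLY ?
                           PM_POST_HIBERNATION : PM_POST_RESTORE);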
2010-12-16perf: Sysfs enumerationPeter Zijlstra
Simple sysfs enumeration of the PMUs. Use an "event_source" bus, and add PMU devices using their name. Each PMU device has a type attribute which contains the value needed for perf_event_attr::type to identify this PMU. This is the minimal stub needed to start using this interface; we'll consider extending the sysfs usage later. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222056.316982569@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
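A hedged userspace sketch of how a tool could pick up such a type value; the PMU name "cpu" in the usage note is an assumption used purely for illustration:

    #include <stdio.h>

    /* Read the 'type' attribute of a named PMU for use as perf_event_attr::type. */
    static int read_pmu_type(const char *pmu_name)
    {
        char path[128];
        int type = -1;
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/bus/event_source/devices/%s/type", pmu_name);
        f = fopen(path, "r");
        if (!f)
            return -1;
        if (fscanf(f, "%d", &type) != 1)
            type = -1;
        fclose(f);
        return type;        /* e.g. read_pmu_type("cpu") */
    }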
2010-12-16perf: Dynamic pmu typesPeter Zijlstra
Extend the perf_pmu_register() interface to allow for named and dynamic pmu types. Because we need to support the existing static types we cannot use dynamic types for everything, hence provide a type argument. If we want to enumerate the PMUs they need a name, so provide one. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222056.259707703@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16perf: Move perf_event_init() into main.cPeter Zijlstra
Currently we call perf_event_init() from sched_init(). In order to make it more obvious, move it to the canonical location. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222056.093629821@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge reason: We want to apply a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16Merge branch 'tip/perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
2010-12-16sched: Fix the irqtime code for 32bitPeter Zijlstra
Since the irqtime accounting uses a non-atomic u64 and can be read from remote cpus (writes are strictly cpu-local, reads are not) we have to deal with observing partial updates. When we do observe partial updates the clock movement (in particular, ->clock_task movement) will go funny (in either direction); a subsequent clock update (observing the full update) will make it go funny in the opposite direction. Since we rely on these clocks to be strictly monotonic we cannot suffer backwards motion. One possible solution would be to simply ignore all backwards deltas, but that will lead to accounting artefacts, most notably: clock_task + irq_time != clock, and this inaccuracy would end up in user visible stats. Therefore serialize the reads using a seqcount. Reviewed-by: Venkatesh Pallipadi <venki@google.com> Reported-by: Mikael Pettersson <mikpe@it.uu.se> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1292242434.6803.200.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
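A generic sketch of the seqcount pattern the fix relies on; the symbol names here are illustrative rather than the actual scheduler ones. The writer is strictly CPU-local, and remote readers retry until they observe a consistent value:

    static DEFINE_PER_CPU(seqcount_t, irqtime_seq);
    static DEFINE_PER_CPU(u64, cpu_irq_time);        /* illustrative */

    static void account_irq_delta(int cpu, u64 delta)
    {
        write_seqcount_begin(&per_cpu(irqtime_seq, cpu));
        per_cpu(cpu_irq_time, cpu) += delta;
        write_seqcount_end(&per_cpu(irqtime_seq, cpu));
    }

    static u64 read_irq_time(int cpu)
    {
        unsigned int seq;
        u64 val;

        do {
            seq = read_seqcount_begin(&per_cpu(irqtime_seq, cpu));
            val = per_cpu(cpu_irq_time, cpu);
        } while (read_seqcount_retry(&per_cpu(irqtime_seq, cpu), seq));

        return val;
    }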
2010-12-16sched: Fix the irqtime code to deal with u64 wrapsPeter Zijlstra
Some ARM systems have a short sched_clock() [ which needs to be fixed too ], but this exposed a bug in the irq_time code as well: it doesn't deal with wraps at all. Fix the irq_time code to deal with u64 wraps by re-writing the code to only use delta increments, which avoids the whole issue. Reviewed-by: Venkatesh Pallipadi <venki@google.com> Reported-by: Mikael Pettersson <mikpe@it.uu.se> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1292242433.6803.199.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
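A minimal sketch of the delta-only style, with illustrative names: unsigned subtraction of two samples is well defined modulo 2^64, so each increment stays correct even if the clock wrapped between samples:

    static u64 last_stamp;        /* previous clock sample, illustrative */
    static u64 total_time;        /* accumulated time, only ever incremented */

    static void account_tick(u64 now)
    {
        u64 delta = now - last_stamp;        /* wrap-safe modulo-2^64 arithmetic */

        last_stamp = now;
        total_time += delta;
    }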
2010-12-16perf: Fix off by one in perf_swevent_init()Dan Carpenter
The perf_swevent_enabled[] array has PERF_COUNT_SW_MAX elements. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101024195041.GT5985@bicker> Signed-off-by: Ingo Molnar <mingo@elte.hu>
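The shape of the fix, sketched: an array with PERF_COUNT_SW_MAX elements has valid indices 0 .. PERF_COUNT_SW_MAX - 1, so the bound must be checked with >=, not >:

    /* Sketch of the bounds check in perf_swevent_init(). */
    if (event_id >= PERF_COUNT_SW_MAX)        /* was: event_id > PERF_COUNT_SW_MAX */
        return -ENOENT;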
2010-12-14Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: It is likely that WORKER_NOT_RUNNING is true MAINTAINERS: Add workqueue entry workqueue: check the allocation of system_unbound_wq
2010-12-14kbuild: fix interaction of CONFIG_IKCONFIG and KCONFIG_CONFIGBen Gardiner
If you try to build a kernel with KCONFIG_CONFIG set (to a value not equal to .config) and that config sets CONFIG_IKCONFIG then the build will fail with: make[1]: *** No rule to make target `.config', needed by \ `kernel/config_data.gz'. Stop. because the kernel/Makefile contains a direct reference to .config. This issue has been present since the introduction of KCONFIG_CONFIG in 14cdd3c402bf7c66f0bcd76e290f0770a54a4b21. Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca> CC: Roman Zippel <zippel@linux-m68k.org> CC: Michal Marek <mmarek@suse.cz> Reviewed-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Michal Marek <mmarek@suse.cz>
2010-12-14workqueue: It is likely that WORKER_NOT_RUNNING is trueSteven Rostedt
Running the annotate branch profiler on three boxes, including my main box that runs firefox, evolution, xchat, and is part of the distcc farm, showed this with the likely()s in the workqueue code: correct incorrect % Function File Line ------- --------- - -------- ---- ---- 96 996253 99 wq_worker_sleeping workqueue.c 703 96 996247 99 wq_worker_waking_up workqueue.c 677 The likely()s in this case were assuming that WORKER_NOT_RUNNING will most likely be false. But this is not the case. The reason (as shown by adding trace_printks and testing it) is that most of the time WORKER_PREP is set. In worker_thread() we have: worker_clr_flags(worker, WORKER_PREP); [ do work stuff ] worker_set_flags(worker, WORKER_PREP, false); (that 'false' means not to wake up an idle worker) wq_worker_sleeping() is called from schedule when a worker thread is putting itself to sleep, which happens most of the time outside of that [ do work stuff ]. wq_worker_waking_up() is called by the worker wakeup code, which is also called outside that [ do work stuff ]. Thus, the likely and unlikely used by those two functions are actually backwards. Remove the annotation and let gcc figure it out. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-13x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resumeSuresh Siddha
During suspend, we disable all the non-boot cpus, and during resume we bring them all back again. So there is no need to do alternatives_smp_switch() in between. On my core 2 based laptop, this speeds up the suspend path by 15 msec and the resume path by 5 msec (suspend/resume speed-up differences can be attributed to the different P-states that the cpu is in during suspend/resume). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1290557500.4946.8.camel@sbsiddha-MOBL3.sc.intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-12timers: Use this_cpu_readChristoph Lameter
Eric asked for this. [tglx: Because it generates faster code according to Eric ] Signed-off-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tejun Heo <tj@kernel.org> Cc: linux-mm@kvack.org LKML-Reference: <alpine.DEB.2.00.1011301404490.4039@router.home> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-10hrtimer: fix timerqueue conversion flubJohn Stultz
In converting the hrtimers to the timerqueue, I missed a spot in hrtimer_run_queues() where we loop running timers. We end up not pulling the new next value out and instead just using the last next value, causing boot time hangs in some cases. The proper fix is to call timerqueue_getnext() on each iteration instead of using a cached local next value. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: John Stultz <john.stultz@linaro.org>
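A sketch of the loop shape after the fix, written against the hrtimer structures as converted to the timerqueue: the next expiring timer is re-fetched from the queue on every pass instead of being cached before the loop:

    struct timerqueue_node *node;

    /* Re-read the head each iteration; expired timers are removed inside the loop. */
    while ((node = timerqueue_getnext(&base->active))) {
        struct hrtimer *timer = container_of(node, struct hrtimer, node);

        if (base->softirq_time.tv64 <= hrtimer_get_expires_tv64(timer))
            break;
        __run_hrtimer(timer, &base->softirq_time);
    }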
2010-12-10hrtimers: Convert hrtimers to use timerlist infrastructureJohn Stultz
Converts the hrtimer code to use the new timerlist infrastructure Signed-off-by: John Stultz <john.stultz@linaro.org> LKML Reference: <1290136329-18291-3-git-send-email-john.stultz@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> CC: Alessandro Zummo <a.zummo@towertech.it> CC: Thomas Gleixner <tglx@linutronix.de> CC: Richard Cochran <richardcochran@gmail.com>
2010-12-10x86, NMI: Add back unknown_nmi_panic and nmi_watchdog sysctlsDon Zickus
Originally adapted from Huang Ying's patch which moved the unknown_nmi_panic to the traps.c file. Because the old nmi watchdog was deleted before this change happened, the unknown_nmi_panic sysctl was lost. This re-adds it. Also, the nmi_watchdog sysctl was re-implemented and its documentation updated accordingly. Patch-inspired-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: fweisbec@gmail.com LKML-Reference: <1291068437-5331-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-09syslog: check cap_syslog when dmesg_restrictSerge E. Hallyn
Eric Paris pointed out that it doesn't make sense to require both CAP_SYS_ADMIN and CAP_SYSLOG for certain syslog actions. So require CAP_SYSLOG, not CAP_SYS_ADMIN, when dmesg_restrict is set. (I'm also consolidating the now common error path) Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: Eric Paris <eparis@redhat.com> Acked-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
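A hedged sketch of the resulting permission check; the consolidated error path is elided:

    /* Sketch: with dmesg_restrict set, CAP_SYSLOG alone is now sufficient. */
    if (dmesg_restrict && !capable(CAP_SYSLOG))
        return -EPERM;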
2010-12-08perf: Stop all counters on rebootPeter Zijlstra
Use the reboot notifier to detach all running counters on reboot, this solves a problem with kexec where the new kernel doesn't expect running counters (rightly so). It will however decrease the coverage of the NMI watchdog. Making a kexec specific reboot notifier callback would be best, however that would require touching all notifier callback handlers as they are not properly structured to deal with new state. As a compromise, place the perf reboot notifier at the very last position in the list. Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Don Zickus <dzickus@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
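A sketch of the notifier registration described above; giving it the lowest possible priority places the callback at the end of the chain, and the callback body is elided:

    static int perf_reboot(struct notifier_block *nb, unsigned long val, void *v)
    {
        /* detach all running counters here (elided) */
        return NOTIFY_OK;
    }

    static struct notifier_block perf_reboot_notifier = {
        .notifier_call = perf_reboot,
        .priority      = INT_MIN,        /* run last in the reboot chain */
    };

    /* somewhere during init: register_reboot_notifier(&perf_reboot_notifier); */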
2010-12-08printk: Use this_cpu_{read|write} api on printk_pendingEric Dumazet
__get_cpu_var() is a bit inefficient; let's use __this_cpu_read() and __this_cpu_write() to manipulate printk_pending. printk_needs_cpu(cpu) is called only for the current cpu: use the faster __this_cpu_read(). Remove the redundant unlikely on the (cpu_is_offline(cpu)) test: # size kernel/printk.o* text data bss dec hex filename 9942 756 263488 274186 42f0a kernel/printk.o.new 9990 756 263488 274234 42f3a kernel/printk.o.old Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290788536.2855.237.camel@edumazet-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08sched: Make pushable_tasks CONFIG_SMP dependentDario Faggioli
As noted by Peter Zijlstra at https://lkml.org/lkml/2010/11/10/391 (while reviewing other stuff, though), tracking pushable tasks only makes sense on SMP systems. Signed-off-by: Dario Faggioli <raistlin@linux.it> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1291143093.2697.298.camel@Palantir> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08Merge branch 'linus' into sched/coreIngo Molnar
Merge reason: we want to queue up dependent cleanup Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08nohz: Fix get_next_timer_interrupt() vs cpu hotplugHeiko Carstens
This fixes a bug as seen on 2.6.32 based kernels where timers got enqueued on offline cpus. If a cpu goes offline it might still have pending timers. These will be migrated during CPU_DEAD handling after the cpu is offline. However while the cpu is going offline it will schedule the idle task which will then call tick_nohz_stop_sched_tick(). That function in turn will call get_next_timer_interrupt() to figure out if the tick of the cpu can be stopped or not. If it turns out that the next tick is just one jiffy off (delta_jiffies == 1) tick_nohz_stop_sched_tick() incorrectly assumes that the tick should not stop and takes an early exit and thus it won't update the load balancer cpu. Just afterwards the cpu will be killed and the load balancer cpu could be the offline cpu. On 2.6.32 based kernels get_nohz_load_balancer() gets called to decide on which cpu a timer should be enqueued (see __mod_timer()), which leads to the possibility that timers get enqueued on an offline cpu. These will never expire and can cause a system hang. This has been observed on 2.6.32 kernels. On current kernels __mod_timer() uses get_nohz_timer_target() which doesn't have that problem. However there might be other problems because of the too early exit from tick_nohz_stop_sched_tick() in case a cpu goes offline. The easiest and probably safest fix seems to be to let get_next_timer_interrupt() just lie and let it say there isn't any pending timer if the current cpu is offline. I also thought of moving migrate_[hr]timers() from CPU_DEAD to CPU_DYING, but seeing that there already have been fixes at least in the hrtimer code in this area I'm afraid that this could add new subtle bugs. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101201091109.GA8984@osiris.boeblingen.de.ibm.com> Cc: stable@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
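The "just lie" fix, sketched as the early exit it adds near the top of get_next_timer_interrupt():

    /* Sketch: pretend no timer is pending on an offline cpu; pending timers
     * will be migrated later during CPU_DEAD handling. */
    if (cpu_is_offline(smp_processor_id()))
        return now + NEXT_TIMER_MAX_DELTA;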
2010-12-08Sched: fix skip_clock_update optimizationMike Galbraith
idle_balance() drops/retakes rq->lock, leaving the previous task vulnerable to set_tsk_need_resched(). Clear it after we return from balancing instead, and in setup_thread_stack() as well, so no successfully descheduled or never scheduled task has it set. Need resched confused the skip_clock_update logic, which assumes that the next call to update_rq_clock() will come nearly immediately after being set. Make the optimization robust against the case of waking a sleeper before it successfully deschedules by checking that the current task has not been dequeued before setting the flag, since it is that useless clock update we're trying to save, and clear unconditionally in schedule() proper instead of conditionally in put_prev_task(). Signed-off-by: Mike Galbraith <efault@gmx.de> Reported-by: Bjoern B. Brandenburg <bbb.lst@gmail.com> Tested-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org LKML-Reference: <1291802742.1417.9.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08sched: Cure more NO_HZ load average woesPeter Zijlstra
There's a long-running regression that proved difficult to fix and which is hitting certain people and is rather annoying in its effects. Damien reported that after 74f5187ac8 (sched: Cure load average vs NO_HZ woes) his load average is unnaturally high; he also noted that even with that patch reverted the load average numbers are not correct. The problem is that the previous patch only solved half the NO_HZ problem: it addressed the part of going into NO_HZ mode, not of coming out of NO_HZ mode. This patch implements that missing half. When coming out of NO_HZ mode there are two important things to take care of: - Folding the pending idle delta into the global active count. - Correctly aging the averages for the idle-duration. So with this patch the NO_HZ interaction should be complete and behaviour between CONFIG_NO_HZ=[yn] should be equivalent. Furthermore, this patch slightly changes the load average computation by adding a rounding term to the fixed point multiplication. Reported-by: Damien Wyart <damien.wyart@free.fr> Reported-by: Tim McGrath <tmhikaru@gmail.com> Tested-by: Damien Wyart <damien.wyart@free.fr> Tested-by: Orion Poplawski <orion@cora.nwra.com> Tested-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Cc: Chase Douglas <chase.douglas@canonical.com> LKML-Reference: <1291129145.32004.874.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08perf: Fix duplicate events with multiple-pmu vs software eventsPeter Zijlstra
Because the multi-pmu bits can share contexts between struct pmu instances we could get duplicate events by iterating the pmu list. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08Merge branches 'x86-fixes-for-linus', 'perf-fixes-for-linus' and ↵Linus Torvalds
'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/pvclock: Zero last_value on resume * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf record: Fix eternal wait for stillborn child perf header: Don't assume there's no attr info if no sample ids is provided perf symbols: Figure out start address of kernel map from kallsyms perf symbols: Fix kallsyms kernel/module map splitting * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: nohz: Fix printk_needs_cpu() return value on offline cpus printk: Fix wake_up_klogd() vs cpu hotplug
2010-12-07Merge branch 'irq-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Fix incorrect proc spurious output
2010-12-07Merge branch 'perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core
2010-12-07Merge commit 'v2.6.37-rc5' into perf/coreIngo Molnar
Merge reason: Pick up the latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Hibernate: Fix memory corruption related to swap PM / Hibernate: Use async I/O when reading compressed hibernation image
2010-12-06PM / Hibernate: Fix memory corruption related to swapRafael J. Wysocki
There is a problem that swap pages allocated before the creation of a hibernation image can be released and used for storing the contents of different memory pages while the image is being saved. Since the kernel stored in the image doesn't know of that, it causes memory corruption to occur after resume from hibernation, especially on systems with relatively small RAM that need to swap often. This issue can be addressed by keeping the GFP_IOFS bits clear in gfp_allowed_mask during the entire hibernation, including the saving of the image, until the system is finally turned off or the hibernation is aborted. Unfortunately, for this purpose it's necessary to rework the way in which the hibernate and suspend code manipulates gfp_allowed_mask. This change is based on an earlier patch from Hugh Dickins. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-by: Ondrej Zary <linux@rainbow-software.org> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: stable@kernel.org
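A rough sketch of the gfp-mask manipulation the rework boils down to; the helper names are illustrative, only the masked bits (GFP_IOFS = __GFP_IO | __GFP_FS) follow the commit text:

    static gfp_t saved_gfp_mask;        /* illustrative storage */

    static void hibernate_restrict_gfp(void)
    {
        /* Keep the GFP_IOFS bits clear for the whole hibernation cycle. */
        saved_gfp_mask = gfp_allowed_mask;
        gfp_allowed_mask &= ~GFP_IOFS;
    }

    static void hibernate_restore_gfp(void)
    {
        gfp_allowed_mask = saved_gfp_mask;
    }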
2010-12-06PM / Hibernate: Use async I/O when reading compressed hibernation imageBojan Smojver
This is a fix for reading the LZO compressed image using async I/O. Essentially, instead of having just one page into which we keep reading blocks from swap, we allocate enough of them to cover the largest compressed size and then let block I/O pick them all up. Once we have them all (and here we wait), we decompress them, as usual. Obviously, the very first block we still pick up synchronously, because we need to know the size of the lot before we pick up the rest. Also fixed the copyright line, which I had forgotten before. Signed-off-by: Bojan Smojver <bojan@rexursive.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-12-06kprobes: Use text_poke_smp_batch for unoptimizingMasami Hiramatsu
Use text_poke_smp_batch() on the unoptimization path to reduce the number of stop_machine() calls. If the number of probes to unoptimize is more than MAX_OPTIMIZE_PROBES (=256), kprobes unoptimizes the first MAX_OPTIMIZE_PROBES probes and kicks the optimizer for the remaining probes. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20101203095434.2961.22657.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06kprobes: Use text_poke_smp_batch for optimizingMasami Hiramatsu
Use text_poke_smp_batch() in the optimization path to reduce the number of stop_machine() calls. If the number of probes to optimize is more than MAX_OPTIMIZE_PROBES (=256), kprobes optimizes the first MAX_OPTIMIZE_PROBES probes and kicks the optimizer for the remaining probes. Changes in v5: - Use kick_kprobe_optimizer() instead of directly calling schedule_delayed_work(). - Reschedule the optimizer outside of the kprobe mutex lock. Changes in v2: - Allocate the code buffer and parameters in arch_init_kprobes() instead of using static arrays. - Merge the previous max optimization limit patch into this patch, so this patch introduces an upper limit on the number of probes optimized at once. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20101203095428.2961.8994.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06kprobes: Reuse unused kprobeMasami Hiramatsu
Reuse an unused kprobe (one waiting for unoptimization and with no user handler) on a given address instead of returning -EBUSY when registering a new kprobe. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095416.2961.39080.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06kprobes: Support delayed unoptimizingMasami Hiramatsu
Unoptimization occurs when a probe is unregistered or disabled, and is heavy because it recovers instructions by using stop_machine(). This patch delays unoptimization operations and unoptimizes several probes at once by using text_poke_smp_batch(). This can avoid unexpected system slowdown coming from stop_machine(). Changes in v5: - Split this patch into several cleanup patches and this patch. - Fix some missed text_mutex locking. - Use bool instead of int for behavior flags. - Add additional comments for the (un)optimizing path. Changes in v2: - Use dynamically allocated buffers and params. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095409.2961.82733.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06kprobes: Separate kprobe optimizing code from optimizerMasami Hiramatsu
Separate the kprobe optimizing code from the optimizer; this will make it easy to introduce unoptimizing code into the optimizer. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095403.2961.91201.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06kprobes: Cleanup disabling and unregistering pathMasami Hiramatsu
Merge the kprobe disabling code into the kprobe unregistering function and add comments for the disabling/unregistering process. The current unregistering code disables (disarms) kprobes after checking the target kprobe's status. This patch changes it to disable the kprobe first and then change the kprobe's state. This allows the probe disabling code to be shared between disable_kprobe() and unregister_kprobe(). Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095356.2961.30152.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06kprobes: Rename old_p to more appropriate nameMasami Hiramatsu
Rename irrelevant uses of "old_p" to more appropriate names. Originally, "old_p" just meant "the old kprobe on the given address", but current code uses that name as "just another kprobe" or something like that. This patch renames those pointers to more appropriate names for maintainability. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095350.2961.48110.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-04perf events: Make sample_type identity fields available in all PERF_RECORD_ ↵Arnaldo Carvalho de Melo
events If perf_event_attr.sample_id_all is set it will add the PERF_SAMPLE_ identity info (TID, TIME, ID, CPU, STREAM_ID) as a trailer, so that older perf tools can process new files, just ignoring the extra payload. With this it's possible to do further analysis on problems in the event stream, like detecting reordering of MMAP and FORK events, etc. V2: Fix up the header size in comm, mmap and task processing, as we have to take into account different sample_types for each matching event, noticed by Thomas Gleixner. Thomas also noticed a problem in v2 where if we didn't have space in the buffer we wouldn't restore the header size. Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
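A hedged userspace sketch of opting in to the identity trailer; the event choice is arbitrary and error handling is minimal:

    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_SOFTWARE;
        attr.config = PERF_COUNT_SW_CPU_CLOCK;
        attr.sample_period = 100000;
        attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME;
        attr.sample_id_all = 1;        /* identity trailer on MMAP/COMM/etc. */
        attr.mmap = 1;
        attr.comm = 1;

        /* perf_event_open(attr, pid, cpu, group_fd, flags) */
        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0) < 0;
    }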
2010-12-04perf events: Separate the routines handling the PERF_SAMPLE_ identity fieldsArnaldo Carvalho de Melo
Those will be made available in sample-like events such as MMAP, EXEC, etc. in a followup patch. So precalculate the extra id header space and have a separate routine to fill it in. V2: Thomas noticed that the id header needs to be precalculated at inherit_events too: LKML-Reference: <alpine.LFD.2.00.1012031245220.2653@localhost6.localdomain6> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1291318772-30880-2-git-send-email-acme@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-04perf events: Fix event inherit fallout of precalculated headersThomas Gleixner
The precalculated header size is not updated when an event is inherited. That results in bogus sample entries for all child events. Bug introduced in c320c7b. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <alpine.LFD.2.00.1012031245220.2653@localhost6.localdomain6> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02do_exit(): make sure that we run with get_fs() == USER_DSNelson Elhage
If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not otherwise reset before do_exit(). do_exit may later (via mm_release in fork.c) do a put_user to a user-controlled address, potentially allowing a user to leverage an oops into a controlled write into kernel memory. This is only triggerable in the presence of another bug, but this potentially turns a lot of DoS bugs into privilege escalations, so it's worth fixing. I have proof-of-concept code which uses this bug along with CVE-2010-3849 to write a zero to an arbitrary kernel address, so I've tested that this is not theoretical. A more logical place to put this fix might be when we know an oops has occurred, before we call do_exit(), but that would involve changing every architecture, in multiple places. Let's just stick it in do_exit instead. [akpm@linux-foundation.org: update code comment] Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
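The shape of the fix, sketched; the real change is a single call early in do_exit():

    /* Sketched: early in do_exit(), before anything that may put_user().
     * Drop a KERNEL_DS left over from an oopsing code path. */
    set_fs(USER_DS);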
2010-12-01genirq: Fix incorrect proc spurious outputKenji Kaneshige
Since commit a1afb637 (switch /proc/irq/*/spurious to seq_file) all /proc/irq/XX/spurious files show the information of irq 0. Current irq_spurious_proc_open() passes NULL as the 3rd argument, which is used as an IRQ number in irq_spurious_proc_show(), to single_open(). Because of this, all /proc/irq/XX/spurious files show IRQ 0 information regardless of the IRQ number. To fix the problem, irq_spurious_proc_open() must pass the appropriate data (IRQ number) to single_open(). Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Reviewed-by: Yong Zhang <yong.zhang0@gmail.com> LKML-Reference: <4CF4B778.90604@jp.fujitsu.com> Cc: stable@kernel.org [2.6.33+] Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
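A sketch of the corrected open routine: single_open() is handed the proc entry's data (the irq information) instead of NULL, so the show routine no longer defaults to IRQ 0:

    static int irq_spurious_proc_open(struct inode *inode, struct file *file)
    {
        /* was: single_open(file, irq_spurious_proc_show, NULL); */
        return single_open(file, irq_spurious_proc_show, PDE(inode)->data);
    }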