Age | Commit message (Collapse) | Author |
|
1. perf_event_mmap(vma) is never called with a gate_vma-like arg,
remove the "if (!vma->vm_mm)" code.
2. arch_vma_name() can use the chached value of mmap_event->vma.
3. Change the code to not call arch_vma_name() twice.
4. Purely cosmetic, but since we use "goto got_name" all the time
remove "else" from "[stack]" branch just for symmetry.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016200945.GB23214@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
There's nothing atomic about atomic_set vs atomic_read; so remove the
atomic_t usage.
Also, make running_sample_length static as it really is (and should
be) local to this translation unit.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Cc: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/n/tip-vw9lg588x1ic248whybjon0c@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
throttle_cfs_rq() doesn't check to make sure that period_timer is running,
and while update_curr/assign_cfs_runtime does, a concurrently running
period_timer on another cpu could cancel itself between this cpu's
update_curr and throttle_cfs_rq(). If there are no other cfs_rqs running
in the tg to restart the timer, this causes the cfs_rq to be stranded
forever.
Fix this by calling __start_cfs_bandwidth() in throttle if the timer is
inactive.
(Also add some sched_debug lines for cfs_bandwidth.)
Tested: make a run/sleep task in a cgroup, loop switching the cgroup
between 1ms/100ms quota and unlimited, checking for timer_active=0 and
throttled=1 as a failure. With the throttle_cfs_rq() change commented out
this fails, with the full patch it passes.
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181632.22647.84174.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently, group entity load-weights are initialized to zero. This
admits some races with respect to the first time they are re-weighted in
earlty use. ( Let g[x] denote the se for "g" on cpu "x". )
Suppose that we have root->a and that a enters a throttled state,
immediately followed by a[0]->t1 (the only task running on cpu[0])
blocking:
put_prev_task(group_cfs_rq(a[0]), t1)
put_prev_entity(..., t1)
check_cfs_rq_runtime(group_cfs_rq(a[0]))
throttle_cfs_rq(group_cfs_rq(a[0]))
Then, before unthrottling occurs, let a[0]->b[0]->t2 wake for the first
time:
enqueue_task_fair(rq[0], t2)
enqueue_entity(group_cfs_rq(b[0]), t2)
enqueue_entity_load_avg(group_cfs_rq(b[0]), t2)
account_entity_enqueue(group_cfs_ra(b[0]), t2)
update_cfs_shares(group_cfs_rq(b[0]))
< skipped because b is part of a throttled hierarchy >
enqueue_entity(group_cfs_rq(a[0]), b[0])
...
We now have b[0] enqueued, yet group_cfs_rq(a[0])->load.weight == 0
which violates invariants in several code-paths. Eliminate the
possibility of this by initializing group entity weight.
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016181627.22647.47543.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
__start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock,
waiting for the hrtimer to finish. However, if sched_cfs_period_timer
runs for another loop iteration, the hrtimer can attempt to take
rq->lock, resulting in deadlock.
Fix this by ensuring that cfs_b->timer_active is cleared only if the
_latest_ call to do_sched_cfs_period_timer is returning as idle. Then
__start_cfs_bandwidth can just call hrtimer_try_to_cancel and wait for
that to succeed or timer_active == 1.
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181622.22647.16643.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
hrtimer_expires_remaining does not take internal hrtimer locks and thus
must be guarded against concurrent __hrtimer_start_range_ns (but
returning HRTIMER_RESTART is safe). Use cfs_b->lock to make it safe.
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181617.22647.73829.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When we transition cfs_bandwidth_used to false, any currently
throttled groups will incorrectly return false from cfs_rq_throttled.
While tg_set_cfs_bandwidth will unthrottle them eventually, currently
running code (including at least dequeue_task_fair and
distribute_cfs_runtime) will cause errors.
Fix this by turning off cfs_bandwidth_used only after unthrottling all
cfs_rqs.
Tested: toggle bandwidth back and forth on a loaded cgroup. Caused
crashes in minutes without the patch, hasn't crashed with it.
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181611.22647.80365.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.
When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.
Reported-by: Victor Kaplansky <victork@il.ibm.com>
Tested-by: Victor Kaplansky <victork@il.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
tools/perf/builtin-record.c
tools/perf/builtin-top.c
tools/perf/util/hist.h
|
|
Commit 6acce3ef8:
sched: Remove get_online_cpus() usage
has left one extra put_online_cpus() inside sched_setaffinity(),
remove it to fix the WARN:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 3166 at kernel/cpu.c:84 put_online_cpus+0x43/0x70()
...
[<ffffffff810c3fef>] put_online_cpus+0x43/0x70 [
[<ffffffff810efd59>] sched_setaffinity+0x7d/0x1f9 [
...
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/526DD0EE.1090309@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In commit ee23871389 ("genirq: Set irq thread to RT priority on
creation") we moved the assigment of the thread's priority from the
thread's function into __setup_irq(). That function may run in user
context for instance if the user opens an UART node and then driver
calls requests in the ->open() callback. That user may not have
CAP_SYS_NICE and so the irq thread won't run with the SCHED_OTHER
policy.
This patch uses sched_setscheduler_nocheck() so we omit the CAP_SYS_NICE
check which is otherwise required for the SCHED_OTHER policy.
[bigeasy: Rewrite the changelog]
Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Cc: Ivo Sieben <meltedpianoman@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1381489240-29626-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
* pm-sleep:
PM / Hibernate: Use bool for boolean fields of struct snapshot_data
PM / Sleep: Detect device suspend/resume lockup and log event
|
|
* pm-qos:
PM / QoS: simplify pm_qos_power_write()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"This tree contains a clockevents regression fix for certain ARM
subarchitectures"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clockevents: Sanitize ticks to nsec conversion
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"The tree contains three fixes:
- Two tooling fixes
- Reversal of the new 'MMAP2' extended mmap record ABI, introduced in
this merge window. (Patches were proposed to fix it but it was all
a bit late and we felt it's safer to just delay the ABI one more
kernel release and do it right)"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Disable PERF_RECORD_MMAP2 support
perf scripting perl: Fix build error on Fedora 12
perf probe: Fix to initialize fname always before use it
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"This tree fixes a boot crash in CONFIG_DEBUG_MUTEXES=y kernels, on
kernels built with GCC 3.x (there are still such distros)"
Side note: it's not just a fix for old gcc versions, it's also removing
an incredibly broken/subtle check that LLVM had issues with, and that
made no sense.
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mutex: Avoid gcc version dependent __builtin_constant_p() usage
|
|
This issue was introduced by 454c79999f7e ("sched/rt: Fix SCHED_RR
across cgroups") that missed the word 'not'. Fix it.
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Cc: <guohanjun@huawei.com>
Cc: <xiexiuqi@huawei.com>
Cc: <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1382357743-54136-1-git-send-email-huawei.libin@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from
"These fix two bugs in the intel_pstate driver, a hibernate bug leading
to nasty resume failures sometimes and acpi-cpufreq initialization bug
that causes problems to happen during module unload when intel_pstate
is in use.
Specifics:
- Fix for rounding errors in intel_pstate causing CPU utilization to
be underestimated from Brennan Shacklett.
- intel_pstate fix to always use the correct max pstate value when
computing the min pstate from Dirk Brandewie.
- Hibernation fix for deadlocking resume in cases when the probing of
the device containing the image is deferred from Russ Dill.
- acpi-cpufreq fix to prevent the module from staying in memory when
the driver cannot be registered and then attempting to unregister
things that have never been registered on exit"
* tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
acpi-cpufreq: Fail initialization if driver cannot be registered
PM / hibernate: Move software_resume to late_initcall_sync
intel_pstate: Correct calculation of min pstate value
intel_pstate: Improve accuracy by not truncating until final result
|
|
This patch makes use of the newly defined common hash algorithm info,
replacing, for example, PKEY_HASH with HASH_ALGO.
Changelog:
- Lindent fixes - Mimi
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
|
|
blk-mq reuses the request potentially immediately, since the most
cache hot is always given out first. This means that rq->csd could
be reused between csd->func() being called and csd_unlock() being
called. This isn't a problem, since we never use wait == 1 for
the smp call function. Add CSD_FLAG_WAIT to be able to tell the
difference, retaining the warning for other cases.
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The blk-mq core and the blk-mq null driver uses it.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
makes procfs ->premission() instances safety in RCU mode independent
from vfsmount_lock.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
software_resume is being called after deferred_probe_initcall in
drivers base. If the probing of the device that contains the resume
image is deferred, and the system has been instructed to wait for
it to show up, this wait will occur in software_resume. This causes
a deadlock.
Move software_resume into late_initcall_sync so that it happens
after all the other late_initcalls.
Signed-off-by: Russ Dill <Russ.Dill@ti.com>
Acked-by: Pavel Machek <Pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
All the callers of irq_create_of_mapping() pass the contents of a struct
of_phandle_args structure to the function. Since all the callers already
have an of_phandle_args pointer, why not pass it directly to
irq_create_of_mapping()?
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Tony Lindgren <tony@atomide.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Conflicts:
drivers/net/usb/qmi_wwan.c
include/net/dst.h
Trivial merge conflicts, both were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Marc Kleine-Budde pointed out, that commit 77cc982 "clocksource: use
clockevents_config_and_register() where possible" caused a regression
for some of the converted subarchs.
The reason is, that the clockevents core code converts the minimal
hardware tick delta to a nanosecond value for core internal
usage. This conversion is affected by integer math rounding loss, so
the backwards conversion to hardware ticks will likely result in a
value which is less than the configured hardware limitation. The
affected subarchs used their own workaround (SIGH!) which got lost in
the conversion.
The solution for the issue at hand is simple: adding evt->mult - 1 to
the shifted value before the integer divison in the core conversion
function takes care of it. But this only works for the case where for
the scaled math mult/shift pair "mult <= 1 << shift" is true. For the
case where "mult > 1 << shift" we can apply the rounding add only for
the minimum delta value to make sure that the backward conversion is
not less than the given hardware limit. For the upper bound we need to
omit the rounding add, because the backwards conversion is always
larger than the original latch value. That would violate the upper
bound of the hardware device.
Though looking closer at the details of that function reveals another
bogosity: The upper bounds check is broken as well. Checking for a
resulting "clc" value greater than KTIME_MAX after the conversion is
pointless. The conversion does:
u64 clc = (latch << evt->shift) / evt->mult;
So there is no sanity check for (latch << evt->shift) exceeding the
64bit boundary. The latch argument is "unsigned long", so on a 64bit
arch the handed in argument could easily lead to an unnoticed shift
overflow. With the above rounding fix applied the calculation before
the divison is:
u64 clc = (latch << evt->shift) + evt->mult - 1;
So we need to make sure, that neither the shift nor the rounding add
is overflowing the u64 boundary.
[ukl: move assignment to rnd after eventually changing mult, fix build
issue and correct comment with the right math]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: nicolas.ferre@atmel.com
Cc: Marc Pignat <marc.pignat@hevs.ch>
Cc: john.stultz@linaro.org
Cc: kernel@pengutronix.de
Cc: Ronald Wahl <ronald.wahl@raritan.com>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Two late fixes for cgroup.
One fixes descendant walk introduced during this rc1 cycle. The other
fixes a post 3.9 bug during task attach which can lead to hang. Both
fixes are critical and the fixes are relatively straight-forward"
* 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix to break the while loop in cgroup_attach_task() correctly
cgroup: fix cgroup post-order descendant walk of empty subtree
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
" * Fix build error on Fedora 12.
* Fix to initialize fname always before use it, bug introduced
during this merge window, from Masami Hiramatsu.
* Disable PERF_RECORD_MMAP2 support, from Stephane Eranian. "
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Usage of the static key primitives to toggle a branch must not be used
before jump_label_init() is called from init/main.c. jump_label_init
reorganizes and wires up the jump_entries so usage before that could
have unforeseen consequences.
Following primitives are now checked for correct use:
* static_key_slow_inc
* static_key_slow_dec
* static_key_slow_dec_deferred
* jump_label_rate_limit
The x86 architecture already checks this by testing if the default_nop
was already replaced with an optimal nop or with a branch instruction. It
will panic then. Other architectures don't check for this.
Because we need to relax this check for the x86 arch to allow code to
transition from default_nop to the enabled state and other architectures
did not check for this at all this patch introduces checking on the
static_key primitives in a non-arch dependent manner.
All checked functions are considered slow-path so the additional check
does no harm to performance.
The warnings are best observed with earlyprintk.
Based on a patch from Andi Kleen.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We want these fixes here too.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The set_graph_notrace filter is analogous to set_ftrace_notrace and
can be used for eliminating uninteresting part of function graph trace
output. It also works with set_graph_function nicely.
# cd /sys/kernel/debug/tracing/
# echo do_page_fault > set_graph_function
# perf ftrace live true
2) | do_page_fault() {
2) | __do_page_fault() {
2) 0.381 us | down_read_trylock();
2) 0.055 us | __might_sleep();
2) 0.696 us | find_vma();
2) | handle_mm_fault() {
2) | handle_pte_fault() {
2) | __do_fault() {
2) | filemap_fault() {
2) | find_get_page() {
2) 0.033 us | __rcu_read_lock();
2) 0.035 us | __rcu_read_unlock();
2) 1.696 us | }
2) 0.031 us | __might_sleep();
2) 2.831 us | }
2) | _raw_spin_lock() {
2) 0.046 us | add_preempt_count();
2) 0.841 us | }
2) 0.033 us | page_add_file_rmap();
2) | _raw_spin_unlock() {
2) 0.057 us | sub_preempt_count();
2) 0.568 us | }
2) | unlock_page() {
2) 0.084 us | page_waitqueue();
2) 0.126 us | __wake_up_bit();
2) 1.117 us | }
2) 7.729 us | }
2) 8.397 us | }
2) 8.956 us | }
2) 0.085 us | up_read();
2) + 12.745 us | }
2) + 13.401 us | }
...
# echo handle_mm_fault > set_graph_notrace
# perf ftrace live true
1) | do_page_fault() {
1) | __do_page_fault() {
1) 0.205 us | down_read_trylock();
1) 0.041 us | __might_sleep();
1) 0.344 us | find_vma();
1) 0.069 us | up_read();
1) 4.692 us | }
1) 5.311 us | }
...
Link: http://lkml.kernel.org/r/1381739066-7531-5-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The parser set up is just a generic utility that uses local variables
allocated by the function. There's no need to hold the graph_lock for
this set up.
This also makes the code simpler.
Link: http://lkml.kernel.org/r/1381739066-7531-4-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The struct ftrace_graph_data is for generalizing the access to
set_graph_function file. This is a preparation for adding support to
set_graph_notrace.
Link: http://lkml.kernel.org/r/1381739066-7531-3-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The ftrace_graph_filter_enabled means that user sets function filter
and it always has same meaning of ftrace_graph_count > 0.
Link: http://lkml.kernel.org/r/1381739066-7531-2-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Andrey reported the following report:
ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3
ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3)
Accessed by thread T13003:
#0 ffffffff810dd2da (asan_report_error+0x32a/0x440)
#1 ffffffff810dc6b0 (asan_check_region+0x30/0x40)
#2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20)
#3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260)
#4 ffffffff812a1065 (__fput+0x155/0x360)
#5 ffffffff812a12de (____fput+0x1e/0x30)
#6 ffffffff8111708d (task_work_run+0x10d/0x140)
#7 ffffffff810ea043 (do_exit+0x433/0x11f0)
#8 ffffffff810eaee4 (do_group_exit+0x84/0x130)
#9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30)
#10 ffffffff81928782 (system_call_fastpath+0x16/0x1b)
Allocated by thread T5167:
#0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0)
#1 ffffffff8128337c (__kmalloc+0xbc/0x500)
#2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90)
#3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0)
#4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40)
#5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430)
#6 ffffffff8129b668 (finish_open+0x68/0xa0)
#7 ffffffff812b66ac (do_last+0xb8c/0x1710)
#8 ffffffff812b7350 (path_openat+0x120/0xb50)
#9 ffffffff812b8884 (do_filp_open+0x54/0xb0)
#10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0)
#11 ffffffff8129d4b7 (SyS_open+0x37/0x50)
#12 ffffffff81928782 (system_call_fastpath+0x16/0x1b)
Shadow bytes around the buggy address:
ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb
ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap redzone: fa
Heap kmalloc redzone: fb
Freed heap region: fd
Shadow gap: fe
The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'
Although the crash happened in ftrace_regex_open() the real bug
occurred in trace_get_user() where there's an incrementation to
parser->idx without a check against the size. The way it is triggered
is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop
that reads the last character stores it and then breaks out because
there is no more characters. Then the last character is read to determine
what to do next, and the index is incremented without checking size.
Then the caller of trace_get_user() usually nulls out the last character
with a zero, but since the index is equal to the size, it writes a nul
character after the allocated space, which can corrupt memory.
Luckily, only root user has write access to this file.
Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
sysfs_get_uname() is erroneously declared as returning size_t even
though it may return a negative value, specifically -EINVAL. Its
callers then check whether its return value is less than zero and indeed
that is never the case for size_t.
This patch changes sysfs_get_uname() to return ssize_t and makes sure
its callers use ssize_t accordingly.
Signed-off-by: Patrick Palka <patrick@parcs.ath.cx>
[jstultz: Didn't apply cleanly, as a similar partial fix was also applied
so had to resolve the collisions]
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
Fix some typos in timekeeping comments.
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[jstultz: Commit message tweaks]
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
Fedora Ruby maintainer reported latest Ruby doesn't work on Fedora Rawhide
on ARM. (http://bugs.ruby-lang.org/issues/9008)
Because of, commit 1c6b39ad3f (alarmtimers: Return -ENOTSUPP if no
RTC device is present) intruduced to return ENOTSUPP when
clock_get{time,res} can't find a RTC device. However this is incorrect.
First, ENOTSUPP isn't exported to userland (ENOTSUP or EOPNOTSUP are the
closest userland equivlents).
Second, Posix and Linux man pages agree that clock_gettime and
clock_getres should return EINVAL if clk_id argument is invalid.
While the arugment that the clockid is valid, but just not supported
on this hardware could be made, this is just a technicality that
doesn't help userspace applicaitons, and only complicates error
handling.
Thus, this patch changes the code to use EINVAL.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: stable <stable@vger.kernel.org> #3.0 and up
Reported-by: Vit Ondruch <v.ondruch@tiscali.cz>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
[jstultz: Tweaks to commit message to include full rational]
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
The snapshot_data structure used internally by the hibernate user
space interface code in user.c has three char fields that are used
to store boolean values. Change their data type to bool and use
true and false instead of 1 and 0, respectively, in assignments
involving those fields.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Commit 040a0a37 ("mutex: Add support for wound/wait style locks")
used "!__builtin_constant_p(p == NULL)" but gcc 3.x cannot
handle such expression correctly, leading to boot failure when
built with CONFIG_DEBUG_MUTEXES=y.
Fix it by explicitly passing a bool which tells whether p != NULL
or not.
[ PeterZ: This is a sad patch, but provided it actually generates
similar code I suppose its the best we can do bar whole
sale deprecating gcc-3. ]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: peterz@infradead.org
Cc: imirkin@alum.mit.edu
Cc: daniel.vetter@ffwll.ch
Cc: robdclark@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/201310171945.AGB17114.FSQVtHOJFOOFML@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[jkosina@suse.cz: fix 'explicitly', noticed by Randy Dunlap]
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Rather than hard-lock the kernel, dump the suspend/resume thread stack
and panic() to capture a message in pstore when a driver takes too long
to suspend/resume. Default suspend/resume watchdog timeout is set to 12
seconds to be longer than the usbhid 10 second timeout, but could be
changed at compile time.
Exclude from the watchdog the time spent waiting for children that
are resumed asynchronously and time every device, whether or not they
resumed synchronously.
This patch is targeted for mobile devices where a suspend/resume lockup
could cause a system reboot. Information about failing device can be
retrieved in subsequent boot session by mounting pstore and inspecting
the log. Laptops with EFI-enabled pstore could also benefit from
this feature.
The hardware watchdog timer is likely suspended during this time and
couldn't be relied upon. The soft-lockup detector would eventually tell
that tasks are not scheduled, but would provide little context as to why.
The patch hence uses system timer and assumes it is still active while the
devices are suspended/resumed.
This feature can be enabled/disabled during kernel configuration.
This change is based on earlier work by San Mehat.
Signed-off-by: Benoit Goby <benoit@android.com>
Signed-off-by: Zoran Markovic <zoran.markovic@linaro.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney.
Major changes:
" 1. Update RCU documentation. These were posted to LKML at
http://article.gmane.org/gmane.linux.kernel/1566994.
2. Miscellaneous fixes. These were posted to LKML at
http://article.gmane.org/gmane.linux.kernel/1567027.
3. Grace-period-related changes, primarily to aid in debugging,
inspired by a -rt debugging session. These were posted to
LKML at http://article.gmane.org/gmane.linux.kernel/1567076.
4. Idle entry/exit changes, primarily to address issues located
by Tibor Billes. These were posted to LKML at
http://article.gmane.org/gmane.linux.kernel/1567096.
5. Code reorganization moving RCU's source files from kernel
to kernel/rcu. This was posted to LKML at
http://article.gmane.org/gmane.linux.kernel/1577344."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Let kstrtos32_from_user() do the necessary calls and checks.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
For now, we disable the extended MMAP record support (MMAP2).
We have identified cases where it would not report the correct mapping
information, clone(VM_CLONE) but with separate pids. We will revisit
the support once we find a solution for this case.
The patch changes the kernel to return EINVAL if attr->mmap2 is set. The
patch also modifies the perf tool to use regular PERF_RECORD_MMAP for
synthetic events and it also prevents the tool from requesting
attr->mmap2 mode because the kernel would reject it.
The support will be revisited once the kenrel interface is updated.
In V2, we reduce the patch to the strict minimum.
In V3, we avoid calling perf_event_open() with mmap2 set because we know
it will fail and require fallback retry.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131017173215.GA8820@quad
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This adds the .init_array section as yet another section with constructors. This
is needed because gcc could add __gcov_init calls to .init_array or .ctors
section, depending on gcc (and binutils) version .
v2: - reuse mod->ctors for .init_array section for modules, because gcc uses
.ctors or .init_array, but not both at the same time
v3: - fail to load if that does happen somehow.
Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Turn the initial value of sysctl kernel.sysrq (SYSRQ_DEFAULT_ENABLE)
into a Kconfig variable.
Original version by Bastian Blank <waldi@debian.org>.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add the new helper, prepare_to_wait_event() which should only be used
by ___wait_event().
prepare_to_wait_event() returns -ERESTARTSYS if signal_pending_state()
is true, otherwise it does prepare_to_wait/exclusive. This allows to
uninline the signal-pending checks in wait_event*() macros.
Also, it can initialize wait->private/func. We do not care if they were
already initialized, the values are the same. This also shaves a couple
of insns from the inlined code.
This obviously makes prepare_*() path a little bit slower, but we are
likely going to sleep anyway, so I think it makes sense to shrink .text:
text data bss dec hex filename
===================================================
before: 5126092 2959248 10117120 18202460 115bf5c vmlinux
after: 5124618 2955152 10117120 18196890 115a99a vmlinux
on my build.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131007161824.GA29757@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Remove get_online_cpus() usage from the scheduler; there's 4 sites that
use it:
- sched_init_smp(); where its completely superfluous since we're in
'early' boot and there simply cannot be any hotplugging.
- sched_getaffinity(); we already take a raw spinlock to protect the
task cpus_allowed mask, this disables preemption and therefore
also stabilizes cpu_online_mask as that's modified using
stop_machine. However switch to active mask for symmetry with
sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
mask stability by inserting sync_rcu/sched() into _cpu_down.
- sched_setaffinity(); we don't appear to need get_online_cpus()
either, there's two sites where hotplug appears relevant:
* cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
for the cpuset case we hold task_lock, which is a spinlock and
thus for mainline disables preemption (might cause pain on RT).
* set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
preemption properly disabled; also it already deals with hotplug
races explicitly where it releases them.
- migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
us with a little trickery. By adding a sync_sched/rcu() after the
CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
cpu_active_mask. Use these to validate that both our cpus are active
when queueing the stop work before we queue the stop_machine works
for take_cpu_down().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|