From bf26c018490c2fce7fe9b629083b96ce0e6ad019 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Thu, 7 Apr 2011 16:53:20 +0200 Subject: ptrace: Prepare to fix racy accesses on task breakpoints When a task is traced and is in a stopped state, the tracer may execute a ptrace request to examine the tracee state and get its task struct. Right after, the tracee can be killed and thus its breakpoints released. This can happen concurrently when the tracer is in the middle of reading or modifying these breakpoints, leading to dereferencing a freed pointer. Hence, to prepare the fix, create a generic breakpoint reference holding API. When a reference on the breakpoints of a task is held, the breakpoints won't be released until the last reference is dropped. After that, no more ptrace request on the task's breakpoints can be serviced for the tracer. Reported-by: Oleg Nesterov Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Will Deacon Cc: Prasad Cc: Paul Mundt Cc: v2.6.33.. Link: http://lkml.kernel.org/r/1302284067-7860-2-git-send-email-fweisbec@gmail.com --- kernel/exit.c | 2 +- kernel/ptrace.c | 17 +++++++++++++++++ 2 files changed, 18 insertions(+), 1 deletion(-) (limited to 'kernel') diff --git a/kernel/exit.c b/kernel/exit.c index f5d2f63bae0..8dd87418154 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1016,7 +1016,7 @@ NORET_TYPE void do_exit(long code) /* * FIXME: do that only when needed, using sched_exit tracepoint */ - flush_ptrace_hw_breakpoint(tsk); + ptrace_put_breakpoints(tsk); exit_notify(tsk, group_dead); #ifdef CONFIG_NUMA diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 0fc1eed28d2..dc7ab65f3b3 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -22,6 +22,7 @@ #include #include #include +#include /* @@ -879,3 +880,19 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid, return ret; } #endif /* CONFIG_COMPAT */ + +#ifdef CONFIG_HAVE_HW_BREAKPOINT +int ptrace_get_breakpoints(struct task_struct *tsk) +{ + if (atomic_inc_not_zero(&tsk->ptrace_bp_refcnt)) + return 0; + + return -1; +} + +void ptrace_put_breakpoints(struct task_struct *tsk) +{ + if (atomic_dec_and_test(&tsk->ptrace_bp_refcnt)) + flush_ptrace_hw_breakpoint(tsk); +} +#endif /* CONFIG_HAVE_HW_BREAKPOINT */ -- cgit v1.2.3-70-g09d2 From 1409f141ac719b994d2832911b1e9ec928943fc2 Mon Sep 17 00:00:00 2001 From: Hillf Danton Date: Wed, 27 Apr 2011 15:26:55 -0700 Subject: kernel/watchdog.c: disable nmi perf event in the error path of enabling watchdog In corner cases where softlockup watchdog is not setup successfully, the relevant nmi perf event for hardlockup watchdog could be disabled, then the status of the underlying hardware remains unchanged. Also, if the kthread doesn't start then the hrtimer won't run and the hardlockup detector will falsely fire. Signed-off-by: Hillf Danton Signed-off-by: Don Zickus Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/watchdog.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'kernel') diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 140dce75045..14733d4d156 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -430,9 +430,12 @@ static int watchdog_enable(int cpu) p = kthread_create(watchdog, (void *)(unsigned long)cpu, "watchdog/%d", cpu); if (IS_ERR(p)) { printk(KERN_ERR "softlockup watchdog for %i failed\n", cpu); - if (!err) + if (!err) { /* if hardlockup hasn't already set this */ err = PTR_ERR(p); + /* and disable the perf event */ + watchdog_nmi_disable(cpu); + } goto out; } kthread_bind(p, cpu); -- cgit v1.2.3-70-g09d2 From ce31332d3c77532d6ea97ddcb475a2b02dd358b4 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 29 Apr 2011 00:02:00 +0200 Subject: hrtimer: Initialize CLOCK_ID to HRTIMER_BASE table statically MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sedat and Bruno reported RCU stalls which turned out to be caused by the following; sched_init() calls init_rt_bandwidth() which calls hrtimer_init() _BEFORE_ hrtimers_init() is called. While not entirely correct this worked because hrtimer_init() only accessed statically initialized data (hrtimer_bases.clock_base[CLOCK_MONOTONIC]) Commit e06383db9 (hrtimers: extend hrtimer base code to handle more then 2 clockids) added an indirection to the hrtimer_bases.clock_base lookup to avoid gap handling in the hot path. The table which is used for the translataion from CLOCK_ID to HRTIMER_BASE index is initialized at runtime in hrtimers_init(). So the early call of the scheduler code translates CLOCK_MONOTONIC to HRTIMER_BASE_REALTIME. Thus the rt_bandwith timer ends up on CLOCK_REALTIME. If the timer is armed and the wall clock time is set (e.g. ntpdate in the early boot process - which also gives the problem deterministic behaviour i.e. magic recovery after N hours), then the timer ends up with an expiry time far into the future. That breaks the RT throttler mechanism as rt runtime is accumulated and never cleared, so the rt throttler detects a false cpu hog condition and blocks all RT tasks until the timer finally expires. That in turn stalls the RCU thread of TINYRCU which leads to an huge amount of RCU callbacks piling up. Make the translation table statically initialized, so we are back to the status of <= 2.6.39. Reported-and-tested-by: Sedat Dilek Reported-by: Bruno Prémont Cc: John stultz Cc: Mike Galbraith Cc: Paul E. McKenney Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104282353140.3005%40ionos%3E Reviewed-by: Ingo Molnar Signed-off-by: Thomas Gleixner --- kernel/hrtimer.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'kernel') diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index 9017478c5d4..87fdb3f8db1 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -81,7 +81,11 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) = } }; -static int hrtimer_clock_to_base_table[MAX_CLOCKS]; +static int hrtimer_clock_to_base_table[MAX_CLOCKS] = { + [CLOCK_REALTIME] = HRTIMER_BASE_REALTIME, + [CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC, + [CLOCK_BOOTTIME] = HRTIMER_BASE_BOOTTIME, +}; static inline int hrtimer_clockid_to_base(clockid_t clock_id) { @@ -1722,10 +1726,6 @@ static struct notifier_block __cpuinitdata hrtimers_nb = { void __init hrtimers_init(void) { - hrtimer_clock_to_base_table[CLOCK_REALTIME] = HRTIMER_BASE_REALTIME; - hrtimer_clock_to_base_table[CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC; - hrtimer_clock_to_base_table[CLOCK_BOOTTIME] = HRTIMER_BASE_BOOTTIME; - hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE, (void *)(long)smp_processor_id()); register_cpu_notifier(&hrtimers_nb); -- cgit v1.2.3-70-g09d2 From 5035b20fa5cd146b66f5f89619c20a4177fb736d Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 29 Apr 2011 18:08:37 +0200 Subject: workqueue: fix deadlock in worker_maybe_bind_and_lock() If a rescuer and stop_machine() bringing down a CPU race with each other, they may deadlock on non-preemptive kernel. The CPU won't accept a new task, so the rescuer can't migrate to the target CPU, while stop_machine() can't proceed because the rescuer is holding one of the CPU retrying migration. GCWQ_DISASSOCIATED is never cleared and worker_maybe_bind_and_lock() retries indefinitely. This problem can be reproduced semi reliably while the system is entering suspend. http://thread.gmane.org/gmane.linux.kernel/1122051 A lot of kudos to Thilo-Alexander for reporting this tricky issue and painstaking testing. stable: This affects all kernels with cmwq, so all kernels since and including v2.6.36 need this fix. Signed-off-by: Tejun Heo Reported-by: Thilo-Alexander Ginkel Tested-by: Thilo-Alexander Ginkel Cc: stable@kernel.org --- kernel/workqueue.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'kernel') diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 04ef830690e..e3378e8d3a5 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1291,8 +1291,14 @@ __acquires(&gcwq->lock) return true; spin_unlock_irq(&gcwq->lock); - /* CPU has come up inbetween, retry migration */ + /* + * We've raced with CPU hot[un]plug. Give it a breather + * and retry migration. cond_resched() is required here; + * otherwise, we might deadlock against cpu_stop trying to + * bring down the CPU on non-preemptive kernel. + */ cpu_relax(); + cond_resched(); } } -- cgit v1.2.3-70-g09d2 From 94b2c363dcf732a4edab4ed66041cb36e7f28fbf Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Sat, 30 Apr 2011 22:56:20 +0200 Subject: genirq: Fix typo CONFIG_GENIRC_IRQ_SHOW_LEVEL commit ab7798ffcf98b11a9525cf65bacdae3fd58d357f ("genirq: Expand generic show_interrupts()") added the Kconfig option GENERIC_IRQ_SHOW_LEVEL to accomodate PowerPC, but this doesn't actually enable the functionality due to a typo in the #ifdef check. Signed-off-by: Geert Uytterhoeven Cc: Linux/PPC Development Link: http://lkml.kernel.org/r/%3Calpine.DEB.2.00.1104302251370.19068%40ayla.of.borg%3E Signed-off-by: Thomas Gleixner --- kernel/irq/proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'kernel') diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c index dd201bd3510..834899f2500 100644 --- a/kernel/irq/proc.c +++ b/kernel/irq/proc.c @@ -419,7 +419,7 @@ int show_interrupts(struct seq_file *p, void *v) } else { seq_printf(p, " %8s", "None"); } -#ifdef CONFIG_GENIRC_IRQ_SHOW_LEVEL +#ifdef CONFIG_GENERIC_IRQ_SHOW_LEVEL seq_printf(p, " %-8s", irqd_is_level_type(&desc->irq_data) ? "Level" : "Edge"); #endif if (desc->name) -- cgit v1.2.3-70-g09d2 From a3a4a5acd3bd2f6f1e102e1f1b9d2e2bb320a7fd Mon Sep 17 00:00:00 2001 From: Arjan van de Ven Date: Thu, 5 May 2011 23:55:18 -0400 Subject: Regression: partial revert "tracing: Remove lock_depth from event entry" This partially reverts commit e6e1e2593592a8f6f6380496655d8c6f67431266. That commit changed the structure layout of the trace structure, which in turn broke PowerTOP (1.9x generation) quite badly. I appreciate not wanting to expose the variable in question, and PowerTOP was not using it, so I've replaced the variable with just a padding field - that way if in the future a new field is needed it can just use this padding field. Signed-off-by: Arjan van de Ven Signed-off-by: Linus Torvalds --- include/linux/ftrace_event.h | 1 + kernel/trace/trace.c | 1 + kernel/trace/trace_events.c | 1 + 3 files changed, 3 insertions(+) (limited to 'kernel') diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index 22b32af1b5e..b5a550a39a7 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -37,6 +37,7 @@ struct trace_entry { unsigned char flags; unsigned char preempt_count; int pid; + int padding; }; #define FTRACE_MAX_EVENT \ diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index d38c16a06a6..1cb49be7c7f 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1110,6 +1110,7 @@ tracing_generic_entry_update(struct trace_entry *entry, unsigned long flags, entry->preempt_count = pc & 0xff; entry->pid = (tsk) ? tsk->pid : 0; + entry->padding = 0; entry->flags = #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT (irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) | diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index e88f74fe1d4..2fe11034135 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -116,6 +116,7 @@ static int trace_define_common_fields(void) __common_field(unsigned char, flags); __common_field(unsigned char, preempt_count); __common_field(int, pid); + __common_field(int, padding); return ret; } -- cgit v1.2.3-70-g09d2