summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2013-11-05audit: loginuid functions coding styleEric Paris
This is just a code rework. It makes things more readable. It does not make any functional changes. It does change the log messages to include both the old session id as well the new and it includes a new res field, which means we get messages even when the user did not have permission to change the loginuid. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: implement generic feature setting and retrievingEric Paris
The audit_status structure was not designed with extensibility in mind. Define a new AUDIT_SET_FEATURE message type which takes a new structure of bits where things can be enabled/disabled/locked one at a time. This structure should be able to grow in the future while maintaining forward and backward compatibility (based loosly on the ideas from capabilities and prctl) This does not actually add any features, but is just infrastructure to allow new on/off types of audit system features. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: change decimal constant to macro for invalid uidRichard Guy Briggs
SFR reported this 2013-05-15: > After merging the final tree, today's linux-next build (i386 defconfig) > produced this warning: > > kernel/auditfilter.c: In function 'audit_data_to_entry': > kernel/auditfilter.c:426:3: warning: this decimal constant is unsigned only > in ISO C90 [enabled by default] > > Introduced by commit 780a7654cee8 ("audit: Make testing for a valid > loginuid explicit") from Linus' tree. Replace this decimal constant in the code with a macro to make it more readable (add to the unsigned cast to quiet the warning). Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: printk USER_AVC messages when audit isn't enabledTyler Hicks
When the audit=1 kernel parameter is absent and auditd is not running, AUDIT_USER_AVC messages are being silently discarded. AUDIT_USER_AVC messages should be sent to userspace using printk(), as mentioned in the commit message of 4a4cd633 ("AUDIT: Optimise the audit-disabled case for discarding user messages"). When audit_enabled is 0, audit_receive_msg() discards all user messages except for AUDIT_USER_AVC messages. However, audit_log_common_recv_msg() refuses to allocate an audit_buffer if audit_enabled is 0. The fix is to special case AUDIT_USER_AVC messages in both functions. It looks like commit 50397bd1 ("[AUDIT] clean up audit_receive_msg()") introduced this bug. Cc: <stable@kernel.org> # v2.6.25+ Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: linux-audit@redhat.com Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit_alloc: clear TIF_SYSCALL_AUDIT if !audit_contextOleg Nesterov
If audit_filter_task() nacks the new thread it makes sense to clear TIF_SYSCALL_AUDIT which can be copied from parent by dup_task_struct(). A wrong TIF_SYSCALL_AUDIT is not really bad but it triggers the "slow" audit paths in entry.S to ensure the task can not miss audit_syscall_*() calls, this is pointless if the task has no ->audit_context. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Steve Grubb <sgrubb@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05Audit: remove duplicate commentsGao feng
Remove it. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: remove newline accidentally added during session id helper refactorRichard Guy Briggs
A newline was accidentally added during session ID helper refactorization in commit 4d3fb709. This needlessly uses up buffer space, messes up syslog formatting and makes userspace processing less efficient. Remove it. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: remove duplicate inclusion of the netlink headerIlya V. Matveychikov
Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05audit: format user messages to size of MAX_AUDIT_MESSAGE_LENGTHRichard Guy Briggs
Messages of type AUDIT_USER_TTY were being formatted to 1024 octets, truncating messages approaching MAX_AUDIT_MESSAGE_LENGTH (8970 octets). Set the formatting to 8560 characters, given maximum estimates for prefix and suffix budgets. See the problem discussion: https://www.redhat.com/archives/linux-audit/2009-January/msg00030.html And the new size rationale: https://www.redhat.com/archives/linux-audit/2013-September/msg00016.html Test ~8k messages with: auditctl -m "$(for i in $(seq -w 001 820);do echo -n "${i}0______";done)" Reported-by: LC Bruzenak <lenny@magitekltd.com> Reported-by: Justin Stephenson <jstephen@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04Merge branch 'perf/urgent' into perf/core to fix conflictsIngo Molnar
Conflicts: tools/perf/bench/numa.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-01Merge branch 'linus' into sched/coreIngo Molnar
Resolve cherry-picking conflicts: Conflicts: mm/huge_memory.c mm/memory.c mm/mprotect.c See this upstream merge commit for more details: 52469b4fcd4f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-31hung_task debugging: Add tracepoint to report the hangOleg Nesterov
Currently check_hung_task() prints a warning if it detects the problem, but it is not convenient to watch the system logs if user-space wants to be notified about the hang. Add the new trace_sched_process_hang() into check_hung_task(), this way a user-space monitor can easily wait for the hang and potentially resolve a problem. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Dave Sullivan <dsulliva@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20131019161828.GA7439@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-30kernel/system_certificate.S: use real contents instead of macro GLOBAL()Chen Gang
If a macro is only used within 2 times, and also its contents are within 2 lines, recommend to expand it to shrink code line. For our case, the macro is not portable either: some architectures' assembler may use another character to mark newline in a macro (e.g. '`' for arc), which will cause issue. If still want to use macro and let it portable enough, it will also need include additional header file (e.g "#include <linux/linkage.h>", although it also need be fixed). Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: David Howells <dhowells@redhat.com>
2013-10-30padata: make the sequence counter an atomic_tMathias Krause
Using a spinlock to atomically increase a counter sounds wrong -- we've atomic_t for this! Also move 'seq_nr' to a different cache line than 'lock' to reduce cache line trashing. This has the nice side effect of decreasing the size of struct parallel_data from 192 to 128 bytes for a x86-64 build, e.g. occupying only two instead of three cache lines. Those changes results in a 5% performance increase on an IPsec test run using pcrypt. Btw. the seq_lock spinlock was never explicitly initialized -- one more reason to get rid of it. Signed-off-by: Mathias Krause <mathias.krause@secunet.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-10-29uprobes: Teach uprobe_copy_process() to handle CLONE_VFORKOleg Nesterov
uprobe_copy_process() does nothing if the child shares ->mm with the forking process, but there is a special case: CLONE_VFORK. In this case it would be more correct to do dup_utask() but avoid dup_xol(). This is not that important, the child should not unwind its stack too much, this can corrupt the parent's stack, but at least we need this to allow to ret-probe __vfork() itself. Note: in theory, it would be better to check task_pt_regs(p)->sp instead of CLONE_VFORK, we need to dup_utask() if and only if the child can return from the function called by the parent. But this needs the arch-dependant helper, and I think that nobody actually does clone(same_stack, CLONE_VM). Reported-by: Martin Cermak <mcermak@redhat.com> Reported-by: David Smith <dsmith@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-10-29uprobes: Change uprobe_copy_process() to dup xol_areaOleg Nesterov
This finally fixes the serious bug in uretprobes: a forked child crashes if the parent called fork() with the pending ret probe. Trivial test-case: # perf probe -x /lib/libc.so.6 __fork%return # perf record -e probe_libc:__fork perl -le 'fork || print "OK"' (the child doesn't print "OK", it is killed by SIGSEGV) If the child returns from the probed function it actually returns to trampoline_vaddr, because it got the copy of parent's stack mangled by prepare_uretprobe() when the parent entered this func. It crashes because a) this address is not mapped and b) until the previous change it doesn't have the proper->return_instances info. This means that uprobe_copy_process() has to create xol_area which has the trampoline slot, and its vaddr should be equal to parent's xol_area->vaddr. Unfortunately, uprobe_copy_process() can not simply do __create_xol_area(child, xol_area->vaddr). This could actually work but perf_event_mmap() doesn't expect the usage of foreign ->mm. So we offload this to task_work_run(), and pass the argument via not yet used utask->vaddr. We know that this vaddr is fine for install_special_mapping(), the necessary hole was recently "created" by dup_mmap() which skips the parent's VM_DONTCOPY area, and nobody else could use the new mm. Unfortunately, this also means that we can not handle the errors properly, we obviously can not abort the already completed fork(). So we simply print the warning if GFP_KERNEL allocation (the only possible reason) fails. Reported-by: Martin Cermak <mcermak@redhat.com> Reported-by: David Smith <dsmith@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29uprobes: Change uprobe_copy_process() to dup return_instancesOleg Nesterov
uprobe_copy_process() assumes that the new child doesn't need ->utask, it should be allocated by demand. But this is not true if the forking task has the pending ret- probes, the child should report them as well and thus it needs the copy of parent's ->return_instances chain. Otherwise the child crashes when it returns from the probed function. Alternatively we could cleanup the child's stack, but this needs per-arch changes and this is not what we want. At least systemtap expects a .return in the child too. Note: this change alone doesn't fix the problem, see the next change. Reported-by: Martin Cermak <mcermak@redhat.com> Reported-by: David Smith <dsmith@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29uprobes: Teach __create_xol_area() to accept the predefined vaddrOleg Nesterov
Currently xol_add_vma() uses get_unmapped_area() for area->vaddr, but the next patches need to use the fixed address. So this patch adds the new "vaddr" argument to __create_xol_area() which should be used as area->vaddr if it is nonzero. xol_add_vma() doesn't bother to verify that the predefined addr is not used, insert_vm_struct() should fail if find_vma_links() detects the overlap with the existing vma. Also, __create_xol_area() doesn't need __GFP_ZERO to allocate area. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29uprobes: Introduce __create_xol_area()Oleg Nesterov
No functional changes, preparation. Extract the code which actually allocates/installs the new area into the new helper, __create_xol_area(). While at it remove the unnecessary "ret = ENOMEM" and "ret = 0" in xol_add_vma(), they both have no effect. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29uprobes: Change the callsite of uprobe_copy_process()Oleg Nesterov
Preparation for the next patches. Move the callsite of uprobe_copy_process() in copy_process() down to the succesfull return. We do not care if copy_process() fails, uprobe_free_utask() won't be called in this case so the wrong ->utask != NULL doesn't matter. OTOH, with this change we know that copy_process() can't fail when uprobe_copy_process() is called, the new task should either return to user-mode or call do_exit(). This way uprobe_copy_process() can: 1. setup p->utask != NULL if necessary 2. setup uprobes_state.xol_area 3. use task_work_add(p) Also, move the definition of uprobe_copy_process() down so that it can see get_utask(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29perf: Fix the perf context switch optimizationPeter Zijlstra
Currently we only optimize the context switch between two contexts that have the same parent; this forgoes the optimization between parent and child context, even though these contexts could be equivalent too. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Shishkin, Alexander <alexander.shishkin@intel.com> Link: http://lkml.kernel.org/r/20131007164257.GH3081@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29perf: Change zero-padding of strings in perf_event_mmap_event()Peter Zijlstra
Oleg complained about the excessive 0-ing in perf_event_mmap_event(), so try and be smarter about it while keeping it fairly fool proof and avoid leaking random bits out to userspace. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-8jirlm99m6if2z13wd6rbyu6@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29perf: Do not waste PAGE_SIZE bytes for ALIGN(8) in perf_event_mmap_event()Oleg Nesterov
perf_event_mmap_event() does kzalloc(PATH_MAX + sizeof(u64)) to ensure we can align the size later. However this means that we actually allocate PAGE_SIZE * 2 buffer, seems too much. Change this code to allocate PATH_MAX==PAGE_SIZE bytes, but tell d_path() to not use the last sizeof(u64) bytes. Note: it is not clear why do we need __GFP_ZERO, see the next patch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131016201004.GC23214@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29perf: Kill the dead !vma->vm_mm code in perf_event_mmap_event()Oleg Nesterov
1. perf_event_mmap(vma) is never called with a gate_vma-like arg, remove the "if (!vma->vm_mm)" code. 2. arch_vma_name() can use the chached value of mmap_event->vma. 3. Change the code to not call arch_vma_name() twice. 4. Purely cosmetic, but since we use "goto got_name" all the time remove "else" from "[stack]" branch just for symmetry. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131016200945.GB23214@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29perf: Remove useless atomic_tPeter Zijlstra
There's nothing atomic about atomic_set vs atomic_read; so remove the atomic_t usage. Also, make running_sample_length static as it really is (and should be) local to this translation unit. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: eranian@google.com Cc: Don Zickus <dzickus@redhat.com> Cc: jmario@redhat.com Cc: acme@infradead.org Cc: dave.hansen@linux.intel.com Link: http://lkml.kernel.org/n/tip-vw9lg588x1ic248whybjon0c@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29sched: Avoid throttle_cfs_rq() racing with period_timer stoppingBen Segall
throttle_cfs_rq() doesn't check to make sure that period_timer is running, and while update_curr/assign_cfs_runtime does, a concurrently running period_timer on another cpu could cancel itself between this cpu's update_curr and throttle_cfs_rq(). If there are no other cfs_rqs running in the tg to restart the timer, this causes the cfs_rq to be stranded forever. Fix this by calling __start_cfs_bandwidth() in throttle if the timer is inactive. (Also add some sched_debug lines for cfs_bandwidth.) Tested: make a run/sleep task in a cgroup, loop switching the cgroup between 1ms/100ms quota and unlimited, checking for timer_active=0 and throttled=1 as a failure. With the throttle_cfs_rq() change commented out this fails, with the full patch it passes. Signed-off-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: pjt@google.com Link: http://lkml.kernel.org/r/20131016181632.22647.84174.stgit@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29sched: Guarantee new group-entities always have weightPaul Turner
Currently, group entity load-weights are initialized to zero. This admits some races with respect to the first time they are re-weighted in earlty use. ( Let g[x] denote the se for "g" on cpu "x". ) Suppose that we have root->a and that a enters a throttled state, immediately followed by a[0]->t1 (the only task running on cpu[0]) blocking: put_prev_task(group_cfs_rq(a[0]), t1) put_prev_entity(..., t1) check_cfs_rq_runtime(group_cfs_rq(a[0])) throttle_cfs_rq(group_cfs_rq(a[0])) Then, before unthrottling occurs, let a[0]->b[0]->t2 wake for the first time: enqueue_task_fair(rq[0], t2) enqueue_entity(group_cfs_rq(b[0]), t2) enqueue_entity_load_avg(group_cfs_rq(b[0]), t2) account_entity_enqueue(group_cfs_ra(b[0]), t2) update_cfs_shares(group_cfs_rq(b[0])) < skipped because b is part of a throttled hierarchy > enqueue_entity(group_cfs_rq(a[0]), b[0]) ... We now have b[0] enqueued, yet group_cfs_rq(a[0])->load.weight == 0 which violates invariants in several code-paths. Eliminate the possibility of this by initializing group entity weight. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131016181627.22647.47543.stgit@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29sched: Fix hrtimer_cancel()/rq->lock deadlockBen Segall
__start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock, waiting for the hrtimer to finish. However, if sched_cfs_period_timer runs for another loop iteration, the hrtimer can attempt to take rq->lock, resulting in deadlock. Fix this by ensuring that cfs_b->timer_active is cleared only if the _latest_ call to do_sched_cfs_period_timer is returning as idle. Then __start_cfs_bandwidth can just call hrtimer_try_to_cancel and wait for that to succeed or timer_active == 1. Signed-off-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: pjt@google.com Link: http://lkml.kernel.org/r/20131016181622.22647.16643.stgit@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29sched: Fix cfs_bandwidth misuse of hrtimer_expires_remainingBen Segall
hrtimer_expires_remaining does not take internal hrtimer locks and thus must be guarded against concurrent __hrtimer_start_range_ns (but returning HRTIMER_RESTART is safe). Use cfs_b->lock to make it safe. Signed-off-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: pjt@google.com Link: http://lkml.kernel.org/r/20131016181617.22647.73829.stgit@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29sched: Fix race on toggling cfs_bandwidth_usedBen Segall
When we transition cfs_bandwidth_used to false, any currently throttled groups will incorrectly return false from cfs_rq_throttled. While tg_set_cfs_bandwidth will unthrottle them eventually, currently running code (including at least dequeue_task_fair and distribute_cfs_runtime) will cause errors. Fix this by turning off cfs_bandwidth_used only after unthrottling all cfs_rqs. Tested: toggle bandwidth back and forth on a loaded cgroup. Caused crashes in minutes without the patch, hasn't crashed with it. Signed-off-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: pjt@google.com Link: http://lkml.kernel.org/r/20131016181611.22647.80365.stgit@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29perf: Fix perf ring buffer memory orderingPeter Zijlstra
The PPC64 people noticed a missing memory barrier and crufty old comments in the perf ring buffer code. So update all the comments and add the missing barrier. When the architecture implements local_t using atomic_long_t there will be double barriers issued; but short of introducing more conditional barrier primitives this is the best we can do. Reported-by: Victor Kaplansky <victork@il.ibm.com> Tested-by: Victor Kaplansky <victork@il.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: michael@ellerman.id.au Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: anton@samba.org Cc: benh@kernel.crashing.org Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29Merge branch 'perf/urgent' into perf/coreIngo Molnar
Conflicts: tools/perf/builtin-record.c tools/perf/builtin-top.c tools/perf/util/hist.h
2013-10-28sched: Remove extra put_online_cpus() inside sched_setaffinity()Michael wang
Commit 6acce3ef8: sched: Remove get_online_cpus() usage has left one extra put_online_cpus() inside sched_setaffinity(), remove it to fix the WARN: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3166 at kernel/cpu.c:84 put_online_cpus+0x43/0x70() ... [<ffffffff810c3fef>] put_online_cpus+0x43/0x70 [ [<ffffffff810efd59>] sched_setaffinity+0x7d/0x1f9 [ ... Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/526DD0EE.1090309@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-28genirq: Set the irq thread policy without checking CAP_SYS_NICEThomas Pfaff
In commit ee23871389 ("genirq: Set irq thread to RT priority on creation") we moved the assigment of the thread's priority from the thread's function into __setup_irq(). That function may run in user context for instance if the user opens an UART node and then driver calls requests in the ->open() callback. That user may not have CAP_SYS_NICE and so the irq thread won't run with the SCHED_OTHER policy. This patch uses sched_setscheduler_nocheck() so we omit the CAP_SYS_NICE check which is otherwise required for the SCHED_OTHER policy. [bigeasy: Rewrite the changelog] Signed-off-by: Thomas Pfaff <tpfaff@pcs.com> Cc: Ivo Sieben <meltedpianoman@gmail.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1381489240-29626-1-git-send-email-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-10-28Merge branch 'pm-sleep'Rafael J. Wysocki
* pm-sleep: PM / Hibernate: Use bool for boolean fields of struct snapshot_data PM / Sleep: Detect device suspend/resume lockup and log event
2013-10-28Merge branch 'pm-qos'Rafael J. Wysocki
* pm-qos: PM / QoS: simplify pm_qos_power_write()
2013-10-27Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "This tree contains a clockevents regression fix for certain ARM subarchitectures" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clockevents: Sanitize ticks to nsec conversion
2013-10-27Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "The tree contains three fixes: - Two tooling fixes - Reversal of the new 'MMAP2' extended mmap record ABI, introduced in this merge window. (Patches were proposed to fix it but it was all a bit late and we felt it's safer to just delay the ABI one more kernel release and do it right)" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Disable PERF_RECORD_MMAP2 support perf scripting perl: Fix build error on Fedora 12 perf probe: Fix to initialize fname always before use it
2013-10-27Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "This tree fixes a boot crash in CONFIG_DEBUG_MUTEXES=y kernels, on kernels built with GCC 3.x (there are still such distros)" Side note: it's not just a fix for old gcc versions, it's also removing an incredibly broken/subtle check that LLVM had issues with, and that made no sense. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mutex: Avoid gcc version dependent __builtin_constant_p() usage
2013-10-26sched/rt: Fix task_tick_rt() commentLi Bin
This issue was introduced by 454c79999f7e ("sched/rt: Fix SCHED_RR across cgroups") that missed the word 'not'. Fix it. Signed-off-by: Li Bin <huawei.libin@huawei.com> Cc: <guohanjun@huawei.com> Cc: <xiexiuqi@huawei.com> Cc: <peterz@infradead.org> Link: http://lkml.kernel.org/r/1382357743-54136-1-git-send-email-huawei.libin@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-26Merge tag 'pm+acpi-3.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from "These fix two bugs in the intel_pstate driver, a hibernate bug leading to nasty resume failures sometimes and acpi-cpufreq initialization bug that causes problems to happen during module unload when intel_pstate is in use. Specifics: - Fix for rounding errors in intel_pstate causing CPU utilization to be underestimated from Brennan Shacklett. - intel_pstate fix to always use the correct max pstate value when computing the min pstate from Dirk Brandewie. - Hibernation fix for deadlocking resume in cases when the probing of the device containing the image is deferred from Russ Dill. - acpi-cpufreq fix to prevent the module from staying in memory when the driver cannot be registered and then attempting to unregister things that have never been registered on exit" * tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: acpi-cpufreq: Fail initialization if driver cannot be registered PM / hibernate: Move software_resume to late_initcall_sync intel_pstate: Correct calculation of min pstate value intel_pstate: Improve accuracy by not truncating until final result
2013-10-25keys: change asymmetric keys to use common hash definitionsDmitry Kasatkin
This patch makes use of the newly defined common hash algorithm info, replacing, for example, PKEY_HASH with HASH_ALGO. Changelog: - Lindent fixes - Mimi CC: David Howells <dhowells@redhat.com> Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25smp: don't warn about csd->flags having CSD_FLAG_LOCK cleared for !waitJens Axboe
blk-mq reuses the request potentially immediately, since the most cache hot is always given out first. This means that rq->csd could be reused between csd->func() being called and csd_unlock() being called. This isn't a problem, since we never use wait == 1 for the smp call function. Add CSD_FLAG_WAIT to be able to tell the difference, retaining the warning for other cases. Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25smp: export __smp_call_function_single()Jens Axboe
The blk-mq core and the blk-mq null driver uses it. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-24pid_namespace: make freeing struct pid_namespace rcu-delayedAl Viro
makes procfs ->premission() instances safety in RCU mode independent from vfsmount_lock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-25PM / hibernate: Move software_resume to late_initcall_syncRuss Dill
software_resume is being called after deferred_probe_initcall in drivers base. If the probing of the device that contains the resume image is deferred, and the system has been instructed to wait for it to show up, this wait will occur in software_resume. This causes a deadlock. Move software_resume into late_initcall_sync so that it happens after all the other late_initcalls. Signed-off-by: Russ Dill <Russ.Dill@ti.com> Acked-by: Pavel Machek <Pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-24of/irq: simplify args to irq_create_of_mappingGrant Likely
All the callers of irq_create_of_mapping() pass the contents of a struct of_phandle_args structure to the function. Since all the callers already have an of_phandle_args pointer, why not pass it directly to irq_create_of_mapping()? Signed-off-by: Grant Likely <grant.likely@linaro.org> Acked-by: Michal Simek <monstr@monstr.eu> Acked-by: Tony Lindgren <tony@atomide.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23clockevents: Sanitize ticks to nsec conversionThomas Gleixner
Marc Kleine-Budde pointed out, that commit 77cc982 "clocksource: use clockevents_config_and_register() where possible" caused a regression for some of the converted subarchs. The reason is, that the clockevents core code converts the minimal hardware tick delta to a nanosecond value for core internal usage. This conversion is affected by integer math rounding loss, so the backwards conversion to hardware ticks will likely result in a value which is less than the configured hardware limitation. The affected subarchs used their own workaround (SIGH!) which got lost in the conversion. The solution for the issue at hand is simple: adding evt->mult - 1 to the shifted value before the integer divison in the core conversion function takes care of it. But this only works for the case where for the scaled math mult/shift pair "mult <= 1 << shift" is true. For the case where "mult > 1 << shift" we can apply the rounding add only for the minimum delta value to make sure that the backward conversion is not less than the given hardware limit. For the upper bound we need to omit the rounding add, because the backwards conversion is always larger than the original latch value. That would violate the upper bound of the hardware device. Though looking closer at the details of that function reveals another bogosity: The upper bounds check is broken as well. Checking for a resulting "clc" value greater than KTIME_MAX after the conversion is pointless. The conversion does: u64 clc = (latch << evt->shift) / evt->mult; So there is no sanity check for (latch << evt->shift) exceeding the 64bit boundary. The latch argument is "unsigned long", so on a 64bit arch the handed in argument could easily lead to an unnoticed shift overflow. With the above rounding fix applied the calculation before the divison is: u64 clc = (latch << evt->shift) + evt->mult - 1; So we need to make sure, that neither the shift nor the rounding add is overflowing the u64 boundary. [ukl: move assignment to rnd after eventually changing mult, fix build issue and correct comment with the right math] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: nicolas.ferre@atmel.com Cc: Marc Pignat <marc.pignat@hevs.ch> Cc: john.stultz@linaro.org Cc: kernel@pengutronix.de Cc: Ronald Wahl <ronald.wahl@raritan.com> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: Ludovic Desroches <ludovic.desroches@atmel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>