summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2012-09-17audit: Don't pass pid or uid to audit_log_common_recv_msgEric W. Biederman
The only place we use the uid and the pid that we calculate in audit_receive_msg is in audit_log_common_recv_msg so move the calculation of these values into the audit_log_common_recv_msg. Simplify the calcuation of the current pid and uid by reading them from current instead of reading them from NETLINK_CREDS. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-17audit: Remove the unused uid parameter from audit_receive_filterEric W. Biederman
Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-17audit: Properly set the origin port id of audit messages.Eric W. Biederman
For user generated audit messages set the portid field in the netlink header to the netlink port where the user generated audit message came from. Reporting the process id in a port id field was just nonsense. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-17audit: Simply AUDIT_TTY_SET and AUDIT_TTY_GETEric W. Biederman
Use current instead of looking up the current up the current task by process identifier. Netlink requests are processed in trhe context of the sending task so this is safe. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-17audit: kill audit_prepare_user_ttyEric W. Biederman
Now that netlink messages are processed in the context of the sender tty_audit_push_task can be called directly and audit_prepare_user_tty which only added looking up the task of the tty by process id is not needed. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-17audit: Use current instead of NETLINK_CREDS() in audit_filterEric W. Biederman
Get caller process uid and gid and pid values from the current task instead of the NETLINK_CB. This is simpler than passing NETLINK_CREDS from from audit_receive_msg to audit_filter_user_rules and avoid the chance of being hit by the occassional bugs in netlink uid/gid credential passing. This is a safe changes because all netlink requests are processed in the task of the sending process. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-17audit: Limit audit requests to processes in the initial pid and user namespaces.Eric W. Biederman
This allows the code to safely make the assumption that all of the uids gids and pids that need to be send in audit messages are in the initial namespaces. If someone cares we may lift this restriction someday but start with limiting access so at least the code is always correct. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-17Merge branch 'for-3.6-fixes' of ↵Tejun Heo
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq into for-3.7 This merge is necessary as Lai's CPU hotplug restructuring series depends on the CPU hotplug bug fixes in for-3.6-fixes. The merge creates one trivial conflict between the following two commits. 96e65306b8 "workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic" e2b6a6d570 "workqueue: use system_highpri_wq for highpri workers in rebind_workers()" Both add local variable definitions to the same block and can be merged in any order. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-09-17Merge branch 'for-3.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull another workqueue fix from Tejun Heo: "Unfortunately, yet another late fix. This too is discovered and fixed by Lai. This bug was introduced during this merge window by commit 25511a477657 ("workqueue: reimplement CPU online rebinding to handle idle workers") which started using WORKER_REBIND flag for idle rebind too. The bug is relatively easy to trigger if the CPU rapidly goes through off, on and then off (and stay off). The fix is on the safer side. This hasn't been on linux-next yet but I'm pushing early so that it can get more exposure before v3.6 release." * 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()
2012-09-17workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()Lai Jiangshan
busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed (CPU is down again). This used to be okay because the flag wasn't used for anything else. However, after 25511a477 "workqueue: reimplement CPU online rebinding to handle idle workers", WORKER_REBIND is also used to command idle workers to rebind. If not cleared, the worker may confuse the next CPU_UP cycle by having REBIND spuriously set or oops / get stuck by prematurely calling idle_worker_rebind(). WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5 00() Hardware name: Bochs Modules linked in: test_wq(O-) Pid: 33, comm: kworker/1:1 Tainted: G O 3.6.0-rc1-work+ #3 Call Trace: [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500 [<ffffffff810bc16e>] kthread+0xbe/0xd0 [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10 ---[ end trace e977cf20f4661968 ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500 PGD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: test_wq(O-) CPU 0 Pid: 33, comm: kworker/1:1 Tainted: G W O 3.6.0-rc1-work+ #3 Bochs Bochs RIP: 0010:[<ffffffff810b3db0>] [<ffffffff810b3db0>] worker_thread+0x360/0x500 RSP: 0018:ffff88001e1c9de0 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009 RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580 R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900 FS: 0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900) Stack: ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010 ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900 ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340 Call Trace: [<ffffffff810bc16e>] kthread+0xbe/0xd0 [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10 Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b RIP [<ffffffff810b3db0>] worker_thread+0x360/0x500 RSP <ffff88001e1c9de0> CR2: 0000000000000000 There was no reason to keep WORKER_REBIND on failure in the first place - WORKER_UNBOUND is guaranteed to be set in such cases preventing incorrectly activating concurrency management. Always clear WORKER_REBIND. tj: Updated comment and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-09-17pid-namespace: limit value of ns_last_pid to (0, max_pid)Andrew Vagin
The kernel doesn't check the pid for negative values, so if you try to write -2 to /proc/sys/kernel/ns_last_pid, you will get a kernel panic. The crash happens because the next pid is -1, and alloc_pidmap() will try to access to a nonexistent pidmap. map = &pid_ns->pidmap[pid/BITS_PER_PAGE]; Signed-off-by: Andrew Vagin <avagin@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17Merge branch 'pm-qos'Rafael J. Wysocki
* pm-qos: PM / QoS: Add return code to pm_qos_get_value function.
2012-09-17Merge branch 'pm-sleep'Rafael J. Wysocki
* pm-sleep: properly __init-annotate pm_sysrq_init() PM / wakeup: Use irqsave/irqrestore for events_lock PM / Freezer: Fix small typo "regrigerator" PM / Sleep: Print name of wakeup source that aborts suspend
2012-09-17Merge branch 'pm-timers'Rafael J. Wysocki
* pm-timers: PM: Do not use the syscore flag for runtime PM sh: MTU2: Basic runtime PM support sh: CMT: Basic runtime PM support sh: TMU: Basic runtime PM support PM / Domains: Do not measure start time for "irq safe" devices PM / Domains: Move syscore flag from subsys data to struct device PM / Domains: Rename the always_on device flag to syscore PM / Runtime: Allow helpers to be called by early platform drivers PM: Reorganize device PM initialization sh: MTU2: Introduce clock events suspend/resume routines sh: CMT: Introduce clocksource/clock events suspend/resume routines sh: TMU: Introduce clocksource/clock events suspend/resume routines timekeeping: Add suspend and resume of clock event devices PM / Domains: Add power off/on function for system core suspend stage PM / Domains: Introduce simplified power on routine for system resume
2012-09-17arm64: Add support for /proc/sys/debug/exception-traceCatalin Marinas
This patch allows setting of the show_unhandled_signals variable via /proc/sys/debug/exception-trace. The default value is currently 1 showing unhandled user faults (undefined instructions, data aborts) and invalid signal stack frames. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
2012-09-16Revert "sched: Improve scalability via 'CPU buddies', which withstand random ↵Linus Torvalds
perturbations" This reverts commit 970e178985cadbca660feb02f4d2ee3a09f7fdda. Nikolay Ulyanitsky reported thatthe 3.6-rc5 kernel has a 15-20% performance drop on PostgreSQL 9.2 on his machine (running "pgbench"). Borislav Petkov was able to reproduce this, and bisected it to this commit 970e178985ca ("sched: Improve scalability via 'CPU buddies' ...") apparently because the new single-idle-buddy model simply doesn't find idle CPU's to reschedule on aggressively enough. Mike Galbraith suspects that it is likely due to the user-mode spinlocks in PostgreSQL not reacting well to preemption, but we don't really know the details - I'll just revert the commit for now. There are hopefully other approaches to improve scheduler scalability without it causing these kinds of downsides. Reported-by: Nikolay Ulyanitsky <lystor@gmail.com> Bisected-by: Borislav Petkov <bp@alien8.de> Acked-by: Mike Galbraith <efault@gmx.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/netfilter/nfnetlink_log.c net/netfilter/xt_LOG.c Rather easy conflict resolution, the 'net' tree had bug fixes to make sure we checked if a socket is a time-wait one or not and elide the logging code if so. Whereas on the 'net-next' side we are calculating the UID and GID from the creds using different interfaces due to the user namespace changes from Eric Biederman. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-15uprobes: Introduce arch_uprobe_enable/disable_step()Sebastian Andrzej Siewior
As Oleg pointed out in [0] uprobe should not use the ptrace interface for enabling/disabling single stepping. [0] http://lkml.kernel.org/r/20120730141638.GA5306@redhat.com Add the new "__weak arch" helpers which simply call user_*_single_step() as a preparation. This is only needed to not break the powerpc port, we will fold this logic into arch_uprobe_pre/post_xol() hooks later. We should also change handle_singlestep(), _disable_step(&uprobe->arch) should be called before put_uprobe(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: Teach find_active_uprobe() to clear MMF_HAS_UPROBESOleg Nesterov
The wrong MMF_HAS_UPROBES doesn't really hurt, just it triggers the "slow" and unnecessary handle_swbp() path if the task hits the non-uprobe breakpoint. So this patch changes find_active_uprobe() to check every valid vma and clear MMF_HAS_UPROBES if no uprobes were found. This is adds the slow O(n) path, but it is only called in unlikely case when the task hits the normal breakpoint first time after uprobe_unregister(). Note the "not strictly accurate" comment in mmf_recalc_uprobes(). We can fix this, we only need to teach vma_has_uprobes() to return a bit more more info, but I am not sure this worth the trouble. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: Introduce MMF_RECALC_UPROBESOleg Nesterov
Add the new MMF_RECALC_UPROBES flag, it means that MMF_HAS_UPROBES can be false positive after remove_breakpoint() or uprobe_munmap(). It is also set by uprobe_dup_mmap(), this is not optimal but simple. We could add the new hook, uprobe_dup_vma(), to set MMF_HAS_UPROBES only if the new mm actually has uprobes, but I don't think this makes sense. The next patch will use this flag to clear MMF_HAS_UPROBES. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: uprobes_treelock should not disable irqsOleg Nesterov
Nobody plays with uprobes_tree/uprobes_treelock in interrupt context, no need to disable irqs. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: Don't put NULL pointer in uprobe_register()Sebastian Andrzej Siewior
alloc_uprobe() might return a NULL pointer, put_uprobe() can't deal with this. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-14Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Smaller fixlets" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix kernel-doc warnings in kernel/sched/fair.c sched: Unthrottle rt runqueues in __disable_runtime() sched: Add missing call to calc_load_exit_idle() sched: Fix load avg vs cpu-hotplug
2012-09-14Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This tree includes various fixes" Ingo really needs to improve on the whole "explain git pull" part. "Various fixes" indeed. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabled perf/x86: Enable Intel Cedarview Atom suppport perf_event: Switch to internal refcount, fix race with close() oprofile, s390: Fix uninitialized memory access when writing to oprofilefs perf/x86: Fix microcode revision check for SNB-PEBS
2012-09-14cgroup: mark subsystems with broken hierarchy support and whine if cgroups ↵Tejun Heo
are nested for them Currently, cgroup hierarchy support is a mess. cpu related subsystems behave correctly - configuration, accounting and control on a parent properly cover its children. blkio and freezer completely ignore hierarchy and treat all cgroups as if they're directly under the root cgroup. Others show yet different behaviors. These differing interpretations of cgroup hierarchy make using cgroup confusing and it impossible to co-mount controllers into the same hierarchy and obtain sane behavior. Eventually, we want full hierarchy support from all subsystems and probably a unified hierarchy. Users using separate hierarchies expecting completely different behaviors depending on the mounted subsystem is deterimental to making any progress on this front. This patch adds cgroup_subsys.broken_hierarchy and sets it to %true for controllers which are lacking in hierarchy support. The goal of this patch is two-fold. * Move users away from using hierarchy on currently non-hierarchical subsystems, so that implementing proper hierarchy support on those doesn't surprise them. * Keep track of which controllers are broken how and nudge the subsystems to implement proper hierarchy support. For now, start with a single warning message. We can whine louder later on. v2: Fixed a typo spotted by Michal. Warning message updated. v3: Updated memcg part so that it doesn't generate warning in the cases where .use_hierarchy=false doesn't make the behavior different from root.use_hierarchy=true. Fixed a typo spotted by Glauber. v4: Check ->broken_hierarchy after cgroup creation is complete so that ->create() can affect the result per Michal. Dropped unnecessary memcg root handling per Michal. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Thomas Graf <tgraf@suug.ch> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2012-09-14cgroup: Assign subsystem IDs during compile timeDaniel Wagner
WARNING: With this change it is impossible to load external built controllers anymore. In case where CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m is set, corresponding subsys_id should also be a constant. Up to now, net_prio_subsys_id and net_cls_subsys_id would be of the type int and the value would be assigned during runtime. By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN to IS_ENABLED, all *_subsys_id will have constant value. That means we need to remove all the code which assumes a value can be assigned to net_prio_subsys_id and net_cls_subsys_id. A close look is necessary on the RCU part which was introduces by following patch: commit f845172531fb7410c7fb7780b1a6e51ee6df7d52 Author: Herbert Xu <herbert@gondor.apana.org.au> Mon May 24 09:12:34 2010 Committer: David S. Miller <davem@davemloft.net> Mon May 24 09:12:34 2010 cls_cgroup: Store classid in struct sock Tis code was added to init_cgroup_cls() /* We can't use rcu_assign_pointer because this is an int. */ smp_wmb(); net_cls_subsys_id = net_cls_subsys.subsys_id; respectively to exit_cgroup_cls() net_cls_subsys_id = -1; synchronize_rcu(); and in module version of task_cls_classid() rcu_read_lock(); id = rcu_dereference(net_cls_subsys_id); if (id >= 0) classid = container_of(task_subsys_state(p, id), struct cgroup_cls_state, css)->classid; rcu_read_unlock(); Without an explicit explaination why the RCU part is needed. (The rcu_deference was fixed by exchanging it to rcu_derefence_index_check() in a later commit, but that is a minor detail.) So here is my pondering why it was introduced and why it safe to remove it now. Note that this code was copied over to net_prio the reasoning holds for that subsystem too. The idea behind the RCU use for net_cls_subsys_id is to make sure we get a valid pointer back from task_subsys_state(). task_subsys_state() is just blindly accessing the subsys array and returning the pointer. Obviously, passing in -1 as id into task_subsys_state() returns an invalid value (out of lower bound). So this code makes sure that only after module is loaded and the subsystem registered, the id is assigned. Before unregistering the module all old readers must have left the critical section. This is done by assigning -1 to the id and issuing a synchronized_rcu(). Any new readers wont call task_subsys_state() anymore and therefore it is safe to unregister the subsystem. The new code relies on the same trick, but it looks at the subsys pointer return by task_subsys_state() (remember the id is constant and therefore we allways have a valid index into the subsys array). No precautions need to be taken during module loading module. Eventually, all CPUs will get a valid pointer back from task_subsys_state() because rebind_subsystem() which is called after the module init() function will assigned subsys[net_cls_subsys_id] the newly loaded module subsystem pointer. When the subsystem is about to be removed, rebind_subsystem() will called before the module exit() function. In this case, rebind_subsys() will assign subsys[net_cls_subsys_id] a NULL pointer and then it calls synchronize_rcu(). All old readers have left by then the critical section. Any new reader wont access the subsystem anymore. At this point we are safe to unregister the subsystem. No synchronize_rcu() call is needed. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org
2012-09-14cgroup: Do not depend on a given order when populating the subsys arrayDaniel Wagner
The *_subsys_id will be used as index to access the subsys. Therefore we need to care we populate the subsystem at the correct position by using designated initialization. With this change we are able to interleave builtin and modules in the subsys array. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org
2012-09-14cgroup: Wrap subsystem selection macroDaniel Wagner
Before we are able to define all subsystem ids at compile time we need a more fine grained control what gets defined when we include cgroup_subsys.h. For example we define the enums for the subsystems or to declare for struct cgroup_subsys (builtin subsystem) by including cgroup_subsys.h and defining SUBSYS accordingly. Currently, the decision if a subsys is used is defined inside the header by testing if CONFIG_*=y is true. By moving this test outside of cgroup_subsys.h we are able to control it on the include level. This is done by introducing IS_SUBSYS_ENABLED which then is defined according the task, e.g. is CONFIG_*=y or CONFIG_*=m. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org
2012-09-14cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNTDaniel Wagner
CGROUP_BUILTIN_SUBSYS_COUNT is used as start index or stop index when looping over the subsys array looking either at the builtin or the module subsystems. Since all the builtin subsystems have an id which is lower then CGROUP_BUILTIN_SUBSYS_COUNT we know that any module will have an id larger than CGROUP_BUILTIN_SUBSYS_COUNT. In short the ids are sorted. We are about to change id assignment to happen only at compile time later in this series. That means we can't rely on the above trick since all ids will always be defined at compile time. Furthermore, ordering the builtin subsystems and the module subsystems is not really necessary. So we need a different way to know which subsystem is a builtin or a module one. We can use the subsys[]->module pointer for this. Any place where we need to know if a subsys is module we just check for the pointer. If it is NULL then the subsystem is a builtin one. With this we are able to drop the CGROUP_BUILTIN_SUBSYS_COUNT enum. Though we need to introduce a temporary placeholder so that we don't get a compilation error when only CONFIG_CGROUP is selected and no single controller. An empty enum definition is not valid. Later in this series we are able to remove the placeholder again. And with this change we get a fix for this: kernel/cgroup.c: In function ‘cgroup_load_subsys’: kernel/cgroup.c:4326:38: warning: array subscript is below array bounds [-Warray-bounds] when CONFIG_CGROUP=y and no built in controller was enabled. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org
2012-09-14Merge branch 'tip/perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull tracing updates from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13kprobes/x86: Fix to support jprobes on ftrace-based kprobeMasami Hiramatsu
Fix kprobes/x86 to support jprobes on ftrace-based kprobes. Because of -mfentry support of ftrace, ftrace is now put on the beginning of function where jprobes are put. Originally ftrace-based kprobes doesn't support jprobe because it will change regs->ip and ftrace doesn't support changing IP and ftrace itself doesn't conflict jprobe. However, ftrace -mfentry support moves mcount call on the top of functions where jprobes are put. This means that jprobe always conflicts with ftrace-based kprobe and fails. This patch allows ftrace-based kprobes to support jprobes by allowing to modify regs->ip and kprobes breakpoint handler also allows to skip singlestepping because there is a ftrace call (not an original instruction). Link: http://lkml.kernel.org/r/20120905143125.10329.90836.stgit@localhost.localdomain Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-13trace: Stop compiling in trace_clock unconditionallyJosh Triplett
Commit 56449f437 "tracing: make the trace clocks available generally", in April 2009, made trace_clock available unconditionally, since CONFIG_X86_DS used it too. Commit faa4602e47 "x86, perf, bts, mm: Delete the never used BTS-ptrace code", in March 2010, removed CONFIG_X86_DS, and now only CONFIG_RING_BUFFER (split out from CONFIG_TRACING for general use) has a dependency on trace_clock. So, only compile in trace_clock with CONFIG_RING_BUFFER or CONFIG_TRACING enabled. Link: http://lkml.kernel.org/r/20120903024513.GA19583@leaf Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-13tracing: Skip printing "OK" if failed to disable eventYuanhan Liu
No acutal case found. But logically, we should skip "OK" in case any error met. Link: http://lkml.kernel.org/r/1346051625-25231-1-git-send-email-yuanhan.liu@linux.intel.com Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-13locking: Adjust spin lock inlining Kconfig optionsJan Beulich
Break out the DEBUG_SPINLOCK dependency (requires moving up UNINLINE_SPIN_UNLOCK, as this was the only one in that block not depending on that option). Avoid putting values not selected into the resulting .config - they are not useful for anything, make the output less legible, and just consume space: Use "depends on" rather than directly setting the default from the combined dependency values. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/504DF2AC020000780009A2DF@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13time: Fix timeekeping_get_ns overflow on 32bit systemsJohn Stultz
Daniel Lezcano reported seeing multi-second stalls from keyboard input on his T61 laptop when NOHZ and CPU_IDLE were enabled on a 32bit kernel. He bisected the problem down to commit 1e75fa8be9fb6 ("time: Condense timekeeper.xtime into xtime_sec"). After reproducing this issue, I narrowed the problem down to the fact that timekeeping_get_ns() returns a 64bit nsec value that hasn't been accumulated. In some cases this value was being then stored in timespec.tv_nsec (which is a long). On 32bit systems, with idle times larger then 4 seconds (or less, depending on the value of xtime_nsec), the returned nsec value would overflow 32bits. This limited kept time from increasing, causing timers to not expire. The fix is to make sure we don't directly store the result of timekeeping_get_ns() into a tv_nsec field, instead using a 64bit nsec value which can then be added into the timespec via timespec_add_ns(). Reported-and-bisected-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Link: http://lkml.kernel.org/r/1347405963-35715-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13Merge branch 'core/rcu' into perf/coreIngo Molnar
Steve Rostedt asked for the merge of a single commit, into both the RCU and the perf/tracing tree: | Josh made a change to the tracing code that affects both the | work Paul McKenney and I are currently doing. At the last | Kernel Summit back in August, Linus said when such a case | exists, it is best to make a separate branch based off of his | tree and place the change there. This way, the repositories | that need to share the change can both pull them in and the | SHA1 will match for both. Whichever branch is pulled in first | by Linus will also pull in the necessary change for the other | branch as well. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13lockdep: Check if nested lock is actually heldMaarten Lankhorst
It is considered good form to lock the lock you claim to be nested in. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> [ removed nest_lock arg to print_lock_nested_lock_not_held in favour of hlock->nest_lock, also renamed the lock arg to hlock since its a held_lock type ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/5051A9E7.5040501@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13sched: cpu_power: enable ARCH_POWERVincent Guittot
Heteregeneous ARM platform uses arch_scale_freq_power function to reflect the relative capacity of each core Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1341826026-6504-6-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13sched/nohz: Clean up select_nohz_load_balancer()Alex Shi
There is no load_balancer to be selected now. It just sets the state of the nohz tick to stop. So rename the function, pass the 'cpu' as a parameter and then remove the useless call from tick_nohz_restart_sched_tick(). [ s/set_nohz_tick_stopped/nohz_balance_enter_idle/g s/clear_nohz_tick_stopped/nohz_balance_exit_idle/g ] Signed-off-by: Alex Shi <alex.shi@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1347261059-24747-1-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13sched: Fix load avg vs. cpu-hotplugPeter Zijlstra
Commit f319da0c68 ("sched: Fix load avg vs cpu-hotplug") was an incomplete fix: In particular, the problem is that at the point it calls calc_load_migrate() nr_running := 1 (the stopper thread), so move the call to CPU_DEAD where we're sure that nr_running := 0. Also note that we can call calc_load_migrate() without serialization, we know the state of rq is stable since its cpu is dead, and we modify the global state using appropriate atomic ops. Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1346882630.2600.59.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13sched: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSWPeter Zijlstra
Now that the last architecture to use this has stopped doing so (ARM, thanks Catalin!) we can remove this complexity from the scheduler core. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: http://lkml.kernel.org/n/tip-g9p2a1w81xxbrze25v9zpzbf@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13sched: Fix nohz_idle_balance()Vincent Guittot
On tickless systems, one CPU runs load balance for all idle CPUs. The cpu_load of this CPU is updated before starting the load balance of each other idle CPUs. We should instead update the cpu_load of the balance_cpu. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1347509486-8688-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13task_work: Simplify the usage in ptrace_notify() and get_signal_to_deliver()Oleg Nesterov
ptrace_notify() and get_signal_to_deliver() do unnecessary things before task_work_run(): 1. smp_mb__after_clear_bit() is not needed, test_and_clear_bit() implies mb(). 2. And we do not need the barrier at all, in this case we only care about the "synchronous" works added by the task itself. 3. No need to clear TIF_NOTIFY_RESUME, and we should not assume task_works is the only user of this flag. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20120826191217.GA4238@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13task_work: task_work_add() should not succeed after exit_task_work()Oleg Nesterov
ed3e694d "move exit_task_work() past exit_files() et.al" destroyed the add/exit synchronization we had, the caller itself should ensure task_work_add() can't race with the exiting task. However, this is not convenient/simple, and the only user which tries to do this is buggy (see the next patch). Unless the task is current, there is simply no way to do this in general. Change exit_task_work()->task_work_run() to use the dummy "work_exited" entry to let task_work_add() know it should fail. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20120826191211.GA4228@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13task_work: Make task_work_add() locklessOleg Nesterov
Change task_work's to use llist-like code to avoid pi_lock in task_work_add(), this makes it useable under rq->lock. task_work_cancel() and task_work_run() still use pi_lock to synchronize with each other. (This is in preparation for a deadlock fix.) Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20120826191209.GA4221@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-12audit: export audit_log_task_infoPeter Moody
At the suggestion of eparis@redhat.com, move this chunk of task logging from audit_log_exit to audit_log_task_info and export this function so it's usuable elsewhere in the kernel. This patch is against git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity#next-ima-appraisal Changelog v2: - add empty audit_log_task_info if CONFIG_AUDITSYSCALL isn't set. Changelog v1: - Initial post. Signed-off-by: Peter Moody <pmoody@google.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2012-09-12Merge branch 'for-3.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "It's later than I'd like but well the timing just didn't work out this time. There are three bug fixes. One from before 3.6-rc1 and two from the new CPU hotplug code. Kudos to Lai for discovering all of them and providing fixes. * Atomicity bug when clearing a flag and setting another. The two operation should have been atomic but wasn't. This bug has existed for a long time but is unlikely to have actually happened. Fix is safe. Marked for -stable. * If CPU hotplug cycles happen back-to-back before workers finish the previous cycle, the states could get out of sync and it could get stuck. Fixed by waiting for workers to complete before finishing hotplug cycle. * While CPU hotplug is in progress, idle workers could be depleted which can then lead to deadlock. I think both happening together is highly unlikely but still better to fix it and the fix isn't too scary. There's another workqueue related regression which reported a few days ago: https://bugzilla.kernel.org/show_bug.cgi?id=47301 It's a bit of head scratcher but there is a semi-reliable reproduce case, so I'm hoping to resolve it soonish." * 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix possible idle worker depletion across CPU hotplug workqueue: restore POOL_MANAGING_WORKERS workqueue: fix possible deadlock in idle worker rebinding workqueue: move WORKER_REBIND clearing in rebind_workers() to the end of the function workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic
2012-09-10netlink: Rename pid to portid to avoid confusionEric W. Biederman
It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10properly __init-annotate pm_sysrq_init()Jan Beulich
This is used only as argument to subsys_initcall(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-10workqueue: fix possible idle worker depletion across CPU hotplugLai Jiangshan
To simplify both normal and CPU hotplug paths, worker management is prevented while CPU hoplug is in progress. This is achieved by CPU hotplug holding the same exclusion mechanism used by workers to ensure there's only one manager per pool. If someone else seems to be performing the manager role, workers proceed to execute work items. CPU hotplug using the same mechanism can lead to idle worker depletion because all workers could proceed to execute work items while CPU hotplug is in progress and CPU hotplug itself wouldn't actually perform the worker management duty - it doesn't guarantee that there's an idle worker left when it releases management. This idle worker depletion, under extreme circumstances, can break forward-progress guarantee and thus lead to deadlock. This patch fixes the bug by using separate mechanisms for manager exclusion among workers and hotplug exclusion. For manager exclusion, POOL_MANAGING_WORKERS which was restored by the previous patch is used. pool->manager_mutex is now only used for exclusion between the elected manager and CPU hotplug. The elected manager won't proceed without holding pool->manager_mutex. This ensures that the worker which won the manager position can't skip managing while CPU hotplug is in progress. It will block on manager_mutex and perform management after CPU hotplug is complete. Note that hotplug may happen while waiting for manager_mutex. A manager isn't either on idle or busy list and thus the hoplug code can't unbind/rebind it. Make the manager handle its own un/rebinding. tj: Updated comment and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>