summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2014-03-28time: Revert to calling clock_was_set_delayed() while in irq contextJohn Stultz
In commit 47a1b796306356f35 ("tick/timekeeping: Call update_wall_time outside the jiffies lock"), we moved to calling clock_was_set() due to the fact that we were no longer holding the timekeeping or jiffies lock. However, there is still the problem that clock_was_set() triggers an IPI, which cannot be done from the timer's hard irq context, and will generate WARN_ON warnings. Apparently in my earlier testing, I'm guessing I didn't bump the dmesg log level, so I somehow missed the WARN_ONs. Thus we need to revert back to calling clock_was_set_delayed(). Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1395963049-11923-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-26Merge tag 'trace-fixes-v3.14-rc7-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "While on my flight to Linux Collaboration Summit, I was working on my slides for the event trigger tutorial. I booted a 3.14-rc7 kernel to perform what I wanted to teach and cut and paste it into my slides. When I tried the traceon event trigger with a condition attached to it (turns tracing on only if a field of the trigger event matches a condition set by the user), nothing happened. Tracing would not turn on. I stopped working on my presentation in order to find what was wrong. It ended up being the way trace event triggers work when they have conditions. Instead of copying the fields, the condition code just looks at the fields that were copied into the ring buffer. This works great, unless tracing is off. That's because when the event is reserved on the ring buffer, the ring buffer returns a NULL pointer, this tells the tracing code that the ring buffer is disabled. This ends up being a problem for the traceon trigger if it is using this information to check its condition. Luckily the code that checks if tracing is on returns the ring buffer to use (because the ring buffer is determined by the event file also passed to that field). I was able to easily solve this bug by checking in that helper function if the returned ring buffer entry is NULL, and if so, also check the file flag if it has a trace event trigger condition, and if so, to pass back a temp ring buffer to use. This will allow the trace event trigger condition to still test the event fields, but nothing will be recorded" * tag 'trace-fixes-v3.14-rc7-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix traceon trigger condition to actually turn tracing on
2014-03-25tracing: Fix traceon trigger condition to actually turn tracing onSteven Rostedt (Red Hat)
While working on my tutorial for 2014 Linux Collaboration Summit I found that the traceon trigger did not work when conditions were used. The other triggers worked fine though. Looking into it, it is because of the way the triggers use the ring buffer to store the fields it will use for the condition. But if tracing is off, nothing is stored in the buffer, and the tracepoint exits before calling the trigger to test the condition. This is fine for all the triggers that only work when tracing is on, but for traceon trigger that is to work when tracing is off, nothing happens. The fix is simple, just use a temp ring buffer to record the event if tracing is off and the event has a trace event conditional trigger enabled. The rest of the tracepoint code will work just fine, but the tracepoint wont be recorded in the other buffers. Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-26tick: Remove code duplication in tick_handle_periodic()Viresh Kumar
tick_handle_periodic() is calling ktime_add() at two places, first before the infinite loop and then at the end of infinite loop. We can rearrange code a bit to fix code duplication here. It looks quite simple and shouldn't break anything, I guess :) Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: fweisbec@gmail.com Link: http://lkml.kernel.org/r/be3481e8f3f71df694a4b43623254fc93ca51b59.1395735873.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-26tick: Fix spelling mistake in tick_handle_periodic()Viresh Kumar
One of the comments in tick_handle_periodic() had 'when' instead of 'which' (My guess :)). Fix it. Also fix spelling mistake in 'Possible'. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: skarafotis@gmail.com Link: http://lkml.kernel.org/r/2b29ca4230c163e44179941d7c7a16c1474385c2.1395743878.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-25workqueue: Provide destroy_delayed_work_on_stack()Thomas Gleixner
If a delayed or deferrable work is on stack we need to tell debug objects that we are destroying the timer and the work. Otherwise we leak the tracking object. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Acked-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20140323141939.911487677@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-25Merge branch 'rcu/next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU update from Paul E. McKenney: " [...] one late-breaking commit. This one was requested for 3.15 by Peter Zijlstra. It is low risk because it adds a new in-kernel API with minimal changes to the existing code. Those minimal changes are the addition of memory barriers and ACCESS_ONCE() macro calls, neither of which should be able to break things. This commit has passed significant rcutorture testing, with these additional additions to rcutorture slated for 3.16. This commit has also been exposed to -next testing. " Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-24tracing: Add BUG_ON when stack end location is over writtenAaron Tomlin
It is difficult to detect a stack overrun when it actually occurs. We have observed that this type of corruption is often silent and can go unnoticed. Once the corrupted region is examined, the outcome is undefined and often results in sporadic system crashes. When the stack tracing feature is enabled, let's check for this condition and take appropriate action. Note: init_task doesn't get its stack end location set to STACK_END_MAGIC. Link: http://lkml.kernel.org/r/1395669837-30209-1-git-send-email-atomlin@redhat.com Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-24cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.cMonam Agarwal
This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL) The rcu_assign_pointer() ensures that the initialization of a structure is carried out before storing a pointer to that structure. And in the case of the NULL pointer, there is no structure to initialize. So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL) Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-03-22genirq: Export symbol no_action()Alexander Shiyan
This will allow to use the dummy IRQ handler no_action() from drivers compiled as module. Drivers which use ARM FIQ interrupts can use this to request the interrupt via the normal request_irq() mechanism w/o having to copy the dummy handler to their own code. Signed-off-by: Alexander Shiyan <shc_work@mail.ru> Link: http://lkml.kernel.org/r/1395476431-16070-1-git-send-email-shc_work@mail.ru Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-21tracepoint: Remove unused API functionsMathieu Desnoyers
After the following commit: commit b75ef8b44b1cb95f5a26484b0e2fe37a63b12b44 Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Wed Aug 10 15:18:39 2011 -0400 Tracepoint: Dissociate from module mutex The following functions became unnecessary: - tracepoint_probe_register_noupdate, - tracepoint_probe_unregister_noupdate, - tracepoint_probe_update_all. In fact, none of the in-kernel tracers, nor LTTng, nor SystemTAP use them. Remove those. Moreover, the functions: - tracepoint_iter_start, - tracepoint_iter_next, - tracepoint_iter_stop, - tracepoint_iter_reset. are unused by in-kernel tracers, LTTng and SystemTAP. Remove those too. Link: http://lkml.kernel.org/r/1395379142-2118-2-git-send-email-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-21Revert "tracing: Move event storage for array from macro to standalone function"Steven Rostedt (Red Hat)
I originally wrote commit 35bb4399bd0e to shrink the size of the overhead of tracepoints by several kilobytes. Later, I received a patch from Vaibhav Nagarnaik that fixed a bug in the same code that this commit touches. Not only did it fix a bug, it also removed code and shrunk the size of the overhead of trace events even more than this commit did. Since this commit is scheduled for 3.15 and Vaibhav's patch is already in mainline, I need to revert this patch in order to keep it from conflicting with Vaibhav's patch. Not to mention, Vaibhav's patch makes this patch obsolete. Link: http://lkml.kernel.org/r/20140320225637.0226041b@gandalf.local.home Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-20futex: revert back to the explicit waiter counting codeLinus Torvalds
Srikar Dronamraju reports that commit b0c29f79ecea ("futexes: Avoid taking the hb->lock if there's nothing to wake up") causes java threads getting stuck on futexes when runing specjbb on a power7 numa box. The cause appears to be that the powerpc spinlocks aren't using the same ticket lock model that we use on x86 (and other) architectures, which in turn result in the "spin_is_locked()" test in hb_waiters_pending() occasionally reporting an unlocked spinlock even when there are pending waiters. So this reinstates Davidlohr Bueso's original explicit waiter counting code, which I had convinced Davidlohr to drop in favor of figuring out the pending waiters by just using the existing state of the spinlock and the wait queue. Reported-and-tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Original-code-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-20Merge tag 'trace-fixes-v3.14-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull trace fix from Steven Rostedt: "Vaibhav Nagarnaik discovered that since 3.10 a clean-up patch made the array index in the trace event format bogus. He supplied an elegant solution that uses __stringify() and also removes the need for the event_storage and event_storage_mutex and also cuts off a few K of overhead from the trace events" * tag 'trace-fixes-v3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix array size mismatch in format string
2014-03-20rcu: Provide grace-period piggybacking APIPaul E. McKenney
The following pattern is currently not well supported by RCU: 1. Make data element inaccessible to RCU readers. 2. Do work that probably lasts for more than one grace period. 3. Do something to make sure RCU readers in flight before #1 above have completed. Here are some things that could currently be done: a. Do a synchronize_rcu() unconditionally at either #1 or #3 above. This works, but imposes needless work and latency. b. Post an RCU callback at #1 above that does a wakeup, then wait for the wakeup at #3. This works well, but likely results in an extra unneeded grace period. Open-coding this is also a bit more semi-tricky code than would be good. This commit therefore adds get_state_synchronize_rcu() and cond_synchronize_rcu() APIs. Call get_state_synchronize_rcu() at #1 above and pass its return value to cond_synchronize_rcu() at #3 above. This results in a call to synchronize_rcu() if no grace period has elapsed between #1 and #3, but requires only a load, comparison, and memory barrier if a full grace period did elapse. Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org>
2014-03-20Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPECDave Jones
Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC, so we can repurpose the flag to encompass a wider range of pushing the CPU beyond its warrany. Signed-off-by: Dave Jones <davej@fedoraproject.org> Link: http://lkml.kernel.org/r/20140226154949.GA770@redhat.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-03-20tracing: Fix array size mismatch in format stringVaibhav Nagarnaik
In event format strings, the array size is reported in two locations. One in array subscript and then via the "size:" attribute. The values reported there have a mismatch. For e.g., in sched:sched_switch the prev_comm and next_comm character arrays have subscript values as [32] where as the actual field size is 16. name: sched_switch ID: 301 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1;signed:0; field:int common_pid; offset:4; size:4; signed:1; field:char prev_comm[32]; offset:8; size:16; signed:1; field:pid_t prev_pid; offset:24; size:4; signed:1; field:int prev_prio; offset:28; size:4; signed:1; field:long prev_state; offset:32; size:8; signed:1; field:char next_comm[32]; offset:40; size:16; signed:1; field:pid_t next_pid; offset:56; size:4; signed:1; field:int next_prio; offset:60; size:4; signed:1; After bisection, the following commit was blamed: 92edca0 tracing: Use direct field, type and system names This commit removes the duplication of strings for field->name and field->type assuming that all the strings passed in __trace_define_field() are immutable. This is not true for arrays, where the type string is created in event_storage variable and field->type for all array fields points to event_storage. Use __stringify() to create a string constant for the type string. Also, get rid of event_storage and event_storage_mutex that are not needed anymore. also, an added benefit is that this reduces the overhead of events a bit more: text data bss dec hex filename 8424787 2036472 1302528 11763787 b3804b vmlinux 8420814 2036408 1302528 11759750 b37086 vmlinux.patched Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com Cc: Laurent Chavey <chavey@google.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-20cgroup: break kernfs active_ref protection in cgroup directory operationsTejun Heo
cgroup_tree_mutex should nest above the kernfs active_ref protection; however, cgroup_create() and cgroup_rename() were grabbing cgroup_tree_mutex while under kernfs active_ref protection. This has actualy possibility to lead to deadlocks in case these operations race against cgroup_rmdir() which invokes kernfs_remove() on directory kernfs_node while holding cgroup_tree_mutex. Neither cgroup_create() or cgroup_rename() requires active_ref protection. The former already has enough synchronization through cgroup_lock_live_group() and the latter doesn't care, so this can be fixed by updating both functions to break all active_ref protections before grabbing cgroup_tree_mutex. While this patch fixes the immediate issue, it probably needs further work in the long term - kernfs directories should enable lockdep annotations and maybe the better way to handle this is marking directory nodes as not needing active_ref protection rather than breaking it in each operation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-20Merge branches 'pm-runtime' and 'pm-sleep'Rafael J. Wysocki
* pm-runtime: PM / Runtime: Update runtime_idle() documentation for return value meaning * pm-sleep: PM / sleep: Correct whitespace errors in <linux/pm.h> PM: Add missing "freeze" state PM / Hibernate: Spelling s/anonymouns/anonymous/ PM / Runtime: Add missing "it" in comment PM / suspend: Remove unnecessary !! PCI / PM: Resume runtime-suspended devices later during system suspend ACPI / PM: Resume runtime-suspended devices later during system suspend PM / sleep: Set pm_generic functions to NULL for !CONFIG_PM_SLEEP PM: fix typo in comment PM / hibernate: use name_to_dev_t to parse resume PM / wakeup: Include appropriate header file in kernel/power/wakelock.c PM / sleep: Move prototype declaration to header file kernel/power/power.h PM / sleep: Asynchronous threads for suspend_late PM / sleep: Asynchronous threads for suspend_noirq PM / sleep: Asynchronous threads for resume_early PM / sleep: Asynchronous threads for resume_noirq PM / sleep: Two flags for async suspend_noirq and suspend_late
2014-03-20Merge branches 'pm-qos', 'pm-domains' and 'pm-drivers'Rafael J. Wysocki
* pm-qos: PM / QoS: Add type to dev_pm_qos_add_ancestor_request() arguments ACPI / LPSS: Support for device latency tolerance PM QoS ACPI / scan: Add bind/unbind callbacks to struct acpi_scan_handler PM / QoS: Introcuce latency tolerance device PM QoS type PM / QoS: Add no_constraints_value field to struct pm_qos_constraints PM / QoS: Rename device resume latency QoS items * pm-domains: PM / domains: Turn latency warning into debug message * pm-drivers: PM: Add pm_runtime_suspend|resume_force functions PM / runtime: Fetch runtime PM callbacks using a macro
2014-03-20timer: Remove code redundancy while calling get_nohz_timer_target()Viresh Kumar
There are only two users of get_nohz_timer_target(): timer and hrtimer. Both call it under same circumstances, i.e. #ifdef CONFIG_NO_HZ_COMMON if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu)) return get_nohz_timer_target(); #endif So, it makes more sense to get all this as part of get_nohz_timer_target() instead of duplicating code at two places. For this another parameter is required to be passed to this routine, pinned. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: fweisbec@gmail.com Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1e1b53537217d58d48c2d7a222a9c3ac47d5b64c.1395140107.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-20timer: Use variable head instead of &work_list in __run_timers()Viresh Kumar
We already have a variable 'head' that points to '&work_list', and so we should use that instead wherever possible. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/r/0d8645a6efc8360c4196c9797d59343abbfdcc5e.1395129136.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-19Merge branch 'for-3.14-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "One really late cgroup patch to fix error path in create_css(). Hitting this bug would be pretty rare but still possible and it gets delayed we'd need to backport it through -stable anyway. It only updates error path in create_css() and has low chance of new breakages" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix a failure path in create_css()
2014-03-19cgroup: fix cgroup_taskset walking orderTejun Heo
cgroup_taskset is used to track and iterate target tasks while migrating a task or process and should guarantee that the first task iterated is the task group leader if a process is being migrated. b3dc094e9390 ("cgroup: use css_set->mg_tasks to track target tasks during migration") replaced flex array cgroup_taskset->tc_array with css_set->mg_tasks list to remove process size limit and dynamic allocation during migration; unfortunately, it incorrectly used list operations which don't preserve order breaking the guarantee that cgroup_taskset_first() returns the leader for a process target. Fix it by using order preserving list operations. Note that as multiple src_csets may map to a single dst_cset, the iteration order may change across cgroup_task_migrate(); however, the leader is still guaranteed to be the first entry. The switch to list_splice_tail_init() at the end of cgroup_migrate() isn't strictly necessary. Let's still do it for consistency. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-03-19resources: Set type in __request_region()Bjorn Helgaas
We don't set the type (I/O, memory, etc.) of resources added by __request_region(), which leads to confusing messages like this: address space collision: [io 0x1000-0x107f] conflicts with ACPI CPU throttle [??? 0x00001010-0x00001015 flags 0x80000000] Set the type of a new resource added by __request_region() (used by request_region() and request_mem_region()) to the type of its parent. This makes the resource tree internally consistent and fixes messages like the above, where the ACPI CPU throttle resource really is an I/O port region, but request_region() didn't fill in the type, so %pR didn't know how to print it. Sample dmesg showing the issue at the link below. Link: https://bugzilla.kernel.org/show_bug.cgi?id=71611 Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-03-19cgroup: implement CFTYPE_ONLY_ON_DFLTejun Heo
This cftype flag makes the file only appear on the default hierarchy. This will later be used for cgroup.controllers file. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19cgroup: make cgrp_dfl_root mountableTejun Heo
cgrp_dfl_root will be used as the default unified hierarchy. This patch makes cgrp_dfl_root mountable by making the following changes. * cgroup_init_early() now initializes cgrp_dfl_root w/ CGRP_ROOT_SANE_BEHAVIOR. The default hierarchy is always sane. * parse_cgroupfs_options() and cgroup_mount() are updated such that cgrp_dfl_root is mounted if sane_behavior is specified w/o any subsystems. * rebind_subsystems() now populates the root directory of cgrp_dfl_root. Note that the function still guarantees success of rebinding subsystems to cgrp_dfl_root. If populating fails while rebinding to cgrp_dfl_root, it whines but ignores the error. * For backward compatibility, the default hierarchy shows up in /proc/$PID/cgroup only after it's explicitly mounted so that userland which doesn't make use of it doesn't see any change. * "current_css_set_cg_links" file of debug cgroup now treats the default hierarchy the same as other hierarchies. This is visible to userland. Given that it's for debug controller, this should be fine. * While at it, implement cgroup_on_dfl() which tests whether a give cgroup is on the default hierarchy or not. The above changes make cgrp_dfl_root mostly equivalent to other controllers but the actual unified hierarchy behaviors are not implemented yet. Let's plug child cgroup creation in cgrp_dfl_root from create_cgroup() for now. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19cgroup: drop const from @buffer of cftype->write_string()Tejun Heo
cftype->write_string() just passes on the writeable buffer from kernfs and there's no reason to add const restriction on the buffer. The only thing const achieves is unnecessarily complicating parsing of the buffer. Drop const from @buffer. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
2014-03-19cgroup: rename cgroup_dummy_root and related namesTejun Heo
The dummy root will be repurposed to serve as the default unified hierarchy. Let's rename things in preparation. * s/cgroup_dummy_root/cgrp_dfl_root/ * s/cgroupfs_root/cgroup_root/ as we don't do fs part directly anymore * s/cgroup_root->top_cgroup/cgroup_root->cgrp/ for brevity This is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19cgroup: move ->subsys_mask from cgroupfs_root to cgroupTejun Heo
cgroupfs_root->subsys_mask represents the controllers attached to the hierarchy. This patch moves the field to cgroup. Subsystem initialization and rebinding updates the top cgroup's subsys_mask. For !root cgroups, the subsys_mask bits are set from create_css() and cleared from kill_css(), which effectively means that all cgroups will have the same subsys_mask as the top cgroup. While this doesn't make any difference now, this will help implementation of the default unified hierarchy where !root cgroups may have subsets of the top_cgroup's subsys_mask. While at it, __kill_css() is split out of kill_css(). The former doesn't care about the subsys_mask while the latter becomes noop if the controller is already killed and clears the matching bit if not before proceeding to killing the css. This will be used later by the default unified hierarchy implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebindingTejun Heo
Currently, while rebinding, cgroup_dummy_root serves as the anchor point. In addition to the target root, rebind_subsystems() takes @added_mask and @removed_mask. The subsystems specified in the former are expected to be on the dummy root and then moved to the target root. The ones in the latter are moved from non-dummy root to dummy. Now that the dummy root is a fully functional one and we're planning to use it for the default unified hierarchy, this level of distinction between dummy and non-dummy roots is quite awkward. This patch updates rebind_subsystems() to take the target root and one subsystem mask and move the specified subsystmes to the target root which may or may not be the dummy root. IOW, unbinding now becomes moving the subsystems to the dummy root and binding to non-dummy root. This makes the dummy root mostly equivalent to other hierarchies in terms of the mechanism of moving subsystems around; however, we still retain all the semantical restrictions so that this patch doesn't introduce any visible behavior differences. Another noteworthy detail is that rebind_subsystems() guarantees that moving a subsystem to the dummy root never fails so that valid unmounting attempts always succeed. This unifies binding and unbinding of subsystems. The invocation points of ->bind() were inconsistent between the two and now moved after whole rebinding is complete. This doesn't break the current users and generally makes more sense. All rebind_subsystems() users are converted accordingly. Note that cgroup_remount() now makes two calls to rebind_subsystems() to bind and then unbind the requested subsystems. This will allow repurposing of the dummy hierarchy as the default unified hierarchy and shouldn't make any userland visible behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19cgroup: use cgroup_setup_root() to initialize cgroup_dummy_rootTejun Heo
cgroup_dummy_root is used to host controllers which aren't attached to any other hierarchy. The root is minimally set up during kernfs bootstrap and didn't go through full hierarchy initialization. We're planning to use cgroup_dummy_root for the default unified hierarchy and thus want it to be fully functional. Replace the special initialization, which was collected into cgroup_init() by the previous patch, with an invocation of cgroup_setup_root(). This simplifies the init path and makes cgroup_dummy_root a full hierarchy with its own kernfs_root and all. As this puts the dummy hierarchy on the cgroup_roots list, rename for_each_active_root() to for_each_root() and update its users to skip the dummy root for now. This patch doesn't cause any userland visible behavior changes at this point. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19cgroup: reorganize cgroup bootstrappingTejun Heo
* Fields of init_css_set and css_set_count are now set using initializer instead of programmatically from cgroup_init_early(). * init_cgroup_root() now also takes @opts and performs the optional part of initialization too. The leftover part of cgroup_root_from_opts() is collapsed into its only caller - cgroup_mount(). * Initialization of cgroup_root_count and linking of init_css_set are moved from cgroup_init_early() to to cgroup_init(). None of the early_init users depends on init_css_set being linked. * Subsystem initializations are moved after dummy hierarchy init and init_css_set linking. These changes reorganize the bootstrap logic so that the dummy hierarchy can share the usual hierarchy init path and be made more normal. These changes don't make noticeable behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19cgroup: relocate setting of CGRP_DEADTejun Heo
In cgroup_destroy_locked(), move setting of CGRP_DEAD above invocations of kill_css(). This doesn't make any visible behavior difference now but will be used to inhibit manipulating controller enable states of a dying cgroup on the unified hierarchy. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19genirq: procfs: Make smp_affinity values go+rChema Gonzalez
Includes: - /proc/irq/default_smp_affinity - /proc/irq/*/affinity_hint - /proc/irq/*/smp_affinity - /proc/irq/*/smp_affinity_list Users can distill the same information by reading /proc/interrupts. Signed-off-by: Chema Gonzalez <chema@google.com> Cc: Eric Dumazet <edumazet@google.com> Link: http://lkml.kernel.org/r/1394765455-1217-1-git-send-email-chema@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-19softirq: Add linux/irq.h to make it compile againThomas Gleixner
On Sparc and S390 the removal of irq.h from kernel_stat.h causes: kernel/softirq.c:774:9: error: 'NR_IRQS_LEGACY' undeclared Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-18cgroup: fix a failure path in create_css()Li Zefan
If online_css() fails, we should remove cgroup files belonging to css->ss. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-03-18uprobes: allow ignoring of probe hitsDavid A. Long
Allow arches to decided to ignore a probe hit. ARM will use this to only call handlers if the conditions to execute a conditionally executed instruction are satisfied. Signed-off-by: David A. Long <dave.long@linaro.org> Acked-by: Oleg Nesterov <oleg@redhat.com>
2014-03-18uprobes: Kconfig dependency fixDavid A. Long
Suggested change from Oleg Nesterov. Fixes incomplete dependencies for uprobes feature. Signed-off-by: David A. Long <dave.long@linaro.org> Acked-by: Oleg Nesterov <oleg@redhat.com>
2014-03-16Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Three small fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock: Prevent tracing recursion in sched_clock_cpu() stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() sched/deadline: Deny unprivileged users to set/change SCHED_DEADLINE policy
2014-03-14genirq: Add a new IRQCHIP_EOI_THREADED flagThomas Gleixner
The flag is necessary for interrupt chips which require an ACK/EOI after the handler has run. In case of threaded handlers this needs to happen after the threaded handler has completed before the unmask of the interrupt. The flag is only unseful in combination with the handle_fasteoi_irq flow control handler. It can be combined with the flag IRQCHIP_EOI_IF_HANDLED, so the EOI is not issued when the interrupt is disabled or in progress. Tested-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-sunxi@googlegroups.com Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Link: http://lkml.kernel.org/r/1394733834-26839-2-git-send-email-hdegoede@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-13block: remove old blk_iopoll_enabled variableJens Axboe
This was a debugging measure to toggle enabled/disabled when testing. But for real production setups, it's not safe to toggle this setting without either reloading drivers of quiescing IO first. Neither of which the toggle enforces. Additionally, it makes drivers deal with the conditional state. Remove it completely. It's up to the driver whether iopoll is enabled or not. Signed-off-by: Jens Axboe <axboe@fb.com>
2014-03-13sched: Remove needless round trip nsecs <-> tick conversion of steal timeFrederic Weisbecker
When update_rq_clock_task() accounts the pending steal time for a task, it converts the steal delta from nsecs to tick then from tick to nsecs. There is no apparent good reason for doing that though because both the task clock and the prev steal delta are u64 and store values in nsecs. So lets remove the needless conversion. Cc: Ingo Molnar <mingo@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-03-13cputime: Fix jiffies based cputime assumption on steal accountingFrederic Weisbecker
The steal guest time accounting code assumes that cputime_t is based on jiffies. So when CONFIG_NO_HZ_FULL=y, which implies that cputime_t is based on nsecs, steal_account_process_tick() passes the delta in jiffies to account_steal_time() which then accounts it as if it's a value in nsecs. As a result, accounting 1 second of steal time (with HZ=100 that would be 100 jiffies) is spuriously accounted as 100 nsecs. As such /proc/stat may report 0 values of steal time even when two guests have run concurrently for a few seconds on the same host and same CPU. In order to fix this, lets convert the nsecs based steal delta to cputime instead of jiffies by using the right conversion API. Given that the steal time is stored in cputime_t and this type can have a smaller granularity than nsecs, we only account the rounded converted value and leave the remaining nsecs for the next deltas. Reported-by: Huiqingding <huding@redhat.com> Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-03-13Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULEMathieu Desnoyers
Users have reported being unable to trace non-signed modules loaded within a kernel supporting module signature. This is caused by tracepoint.c:tracepoint_module_coming() refusing to take into account tracepoints sitting within force-loaded modules (TAINT_FORCED_MODULE). The reason for this check, in the first place, is that a force-loaded module may have a struct module incompatible with the layout expected by the kernel, and can thus cause a kernel crash upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y. Tracepoints, however, specifically accept TAINT_OOT_MODULE and TAINT_CRAP, since those modules do not lead to the "very likely system crash" issue cited above for force-loaded modules. With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed module is tainted re-using the TAINT_FORCED_MODULE taint flag. Unfortunately, this means that Tracepoints treat that module as a force-loaded module, and thus silently refuse to consider any tracepoint within this module. Since an unsigned module does not fit within the "very likely system crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag to specifically address this taint behavior, and accept those modules within Tracepoints. We use the letter 'X' as a taint flag character for a module being loaded that doesn't know how to sign its name (proposed by Steven Rostedt). Also add the missing 'O' entry to trace event show_module_flags() list for the sake of completeness. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> NAKed-by: Ingo Molnar <mingo@redhat.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: David Howells <dhowells@redhat.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13module: use pr_contJiri Slaby
When dumping loaded modules, we print them one by one in separate printks. Let's use pr_cont as they are continuation prints. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-12Merge branch 'irq/for-gpio' into irq/coreThomas Gleixner
Merge the request/release callbacks which are in a separate branch for consumption by the gpio folks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-12genirq: Provide irq_request/release_resources chip callbacksThomas Gleixner
For certain irq types, e.g. gpios, it's necessary to request resources before starting up the irq. This might fail so we cannot use the irq_startup() callback because we might call the irq_set_type() callback before that which does not make sense when the resource is not available. Calling irq_startup() before irq_set_type() can lead to spurious interrupts which is not desired either. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jean-Jacques Hiblot <jjhiblot@traphandler.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1403080857160.18573@ionos.tec.linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-12locking/mutex: Fix debug checksPeter Zijlstra
OK, so commit: 1d8fe7dc8078 ("locking/mutexes: Unlock the mutex without the wait_lock") generates this boot warning when CONFIG_DEBUG_MUTEXES=y: WARNING: CPU: 0 PID: 139 at /usr/src/linux-2.6/kernel/locking/mutex-debug.c:82 debug_mutex_unlock+0x155/0x180() DEBUG_LOCKS_WARN_ON(lock->owner != current) And that makes sense, because as soon as we release the lock a new owner can come in... One would think that !__mutex_slowpath_needs_to_unlock() implementations suffer the same, but for DEBUG we fall back to mutex-null.h which has an unconditional 1 for that. The mutex debug code requires the mutex to be unlocked after doing the debug checks, otherwise it can find inconsistent state. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: jason.low2@hp.com Link: http://lkml.kernel.org/r/20140312122442.GB27965@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-12sched: Clean up the task_hot() functionAlex Shi
task_hot() doesn't need the 'sched_domain' parameter, so remove it. Signed-off-by: Alex Shi <alex.shi@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394607111-1904-1-git-send-email-alex.shi@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>