path: root/kernel
2014-09-03  seccomp: Allow arch code to provide seccomp_data  (Andy Lutomirski)
populate_seccomp_data is expensive: it works by inspecting task_pt_regs and various other bits to piece together all the information, and it does so in multiple partially redundant steps. Arch-specific code in the syscall entry path can do much better. Admittedly this adds a bit of additional room for error, but the speedup should be worth it. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Kees Cook <keescook@chromium.org>
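For reference, this is the structure the arch entry code now gets to populate directly (from the seccomp UAPI):

    #include <linux/types.h>

    struct seccomp_data {
            int nr;                     /* syscall number */
            __u32 arch;                 /* AUDIT_ARCH_* value */
            __u64 instruction_pointer;  /* CPU IP at syscall entry */
            __u64 args[6];              /* up to 6 syscall arguments */
    };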
2014-09-03  seccomp: Refactor the filter callback and the API  (Andy Lutomirski)
The reason I did this is to add a seccomp API that will be usable for an x86 fast path. The x86 entry code needs to use a rather expensive slow path for a syscall that might be visible to things like ptrace. By splitting seccomp into two phases, we can check whether we need the slow path and then use the fast path if the filter allows the syscall or just returns some errno. As a side effect, I think the new code is much easier to understand than the old code. This has one user-visible effect: the audit record written for SECCOMP_RET_TRACE is now a simple indication that SECCOMP_RET_TRACE happened. It used to depend in a complicated way on what the tracer did. I couldn't make much sense of it. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Kees Cook <keescook@chromium.org>
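A minimal sketch of the split entry path as this commit describes it; the seccomp_phase1()/seccomp_phase2() names follow the series, but the exact constants and signatures here are from memory and should be checked against the tree:

    u32 phase1 = seccomp_phase1(sd);        /* cheap: no ptrace machinery */

    if (phase1 == SECCOMP_PHASE1_OK) {
            /* filter allowed the syscall: stay on the fast path */
    } else if (phase1 == SECCOMP_PHASE1_SKIP) {
            /* filter killed the task or forced an errno: skip the syscall */
    } else {
            /* e.g. SECCOMP_RET_TRACE: fall back to the slow path */
            if (seccomp_phase2(phase1) < 0)
                    /* skip the syscall */;
    }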
2014-09-03  seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing  (Andy Lutomirski)
The secure_computing function took a syscall number parameter, but it only paid any attention to that parameter if seccomp mode 1 was enabled. Rather than coming up with a kludge to get the parameter to work in mode 2, just remove the parameter. To avoid churn in arches that don't have seccomp filters (and may not even support syscall_get_nr right now), this leaves the parameter in secure_computing_strict, which is now a real function. For ARM, this is a bit ugly due to the fact that ARM conditionally supports seccomp filters. Fixing that would probably only be a couple of lines of code, but it should be coordinated with the audit maintainers. This will be a slight slowdown on some arches. The right fix is to pass in all of seccomp_data instead of trying to make just the syscall nr part be fast. This is a prerequisite for making two-phase seccomp work cleanly. Cc: Russell King <linux@arm.linux.org.uk> Cc: linux-arm-kernel@lists.infradead.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: x86@kernel.org Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Kees Cook <keescook@chromium.org>
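A hedged before/after sketch of the interface change (prototypes as I recall them from this series):

    /* before: every caller had to dig out the syscall number first */
    int secure_computing(int this_syscall);

    /* after: the filter path derives everything, including the nr, itself */
    int secure_computing(void);

    /* mode-1-only arches keep the cheap variant, now a real function */
    void secure_computing_strict(int this_syscall);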
2014-09-03  genirq: Add irq_domain-aware core IRQ handler  (Marc Zyngier)
Calling irq_find_mapping from outside an irq_{enter,exit} section is unsafe and produces ugly messages if CONFIG_PROVE_RCU is enabled: if coming from the idle state, the rcu_read_lock call in irq_find_mapping will generate an unpleasant warning:
<quote>
===============================
[ INFO: suspicious RCU usage. ]
3.16.0-rc1+ #135 Not tainted
-------------------------------
include/linux/rcupdate.h:871 rcu_read_lock() used illegally while idle!
other info that might help us debug this:
RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
1 lock held by swapper/0/0:
 #0: (rcu_read_lock){......}, at: [<ffffffc00010206c>] irq_find_mapping+0x4c/0x198
</quote>
As this issue is fairly widespread and involves at least three different architectures, a possible solution is to add a new handle_domain_irq entry point into the generic IRQ code that the interrupt controller code can call. This new function takes an irq_domain, and calls into irq_find_mapping inside the irq_{enter,exit} block. An additional "lookup" parameter is used to allow non-domain architecture code to be replaced by this as well. Interrupt controllers can then be updated to use the new mechanism. This code is sitting behind a new CONFIG_HANDLE_DOMAIN_IRQ, as not all architectures implement set_irq_regs (yes, mn10300, I'm looking at you...). Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1409047421-27649-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
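A hedged sketch of how an interrupt controller driver would use the new entry point; intc_base, INTC_PENDING and my_domain are hypothetical names:

    static void my_handle_irq(struct pt_regs *regs)
    {
            u32 hwirq = readl_relaxed(intc_base + INTC_PENDING);

            /*
             * irq_find_mapping() now runs between irq_enter() and
             * irq_exit(), so RCU is watching and PROVE_RCU stays quiet.
             */
            handle_domain_irq(my_domain, hwirq, regs);
    }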
2014-09-03  Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent  (Ingo Molnar)
Pull an RCU fix from Paul E. McKenney: "This series contains a single commit fixing an initialization bug reported by Amit Shah and fixed by Pranith Kumar (and tested by Amit). This bug results in a boot-time hang in callback-offloaded configurations where callbacks were posted before the offloading ('rcuo') kthreads were created." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-03  PM / sleep: Fix test_suspend= command line option  (Rafael J. Wysocki)
After commit d431cbc53cb7 (PM / sleep: Simplify sleep states sysfs interface code) the pm_states[] array is not populated initially, which causes setup_test_suspend() to always fail and the suspend testing during boot doesn't work any more. Fix the problem by using pm_labels[] instead of pm_states[] in setup_test_suspend() and storing a pointer to the label of the sleep state to test rather than the number representing it, because the connection between the state numbers and labels is only established by suspend_set_ops(). Fixes: d431cbc53cb7 (PM / sleep: Simplify sleep states sysfs interface code) Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-01  Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull irq handling fixlet from Thomas Gleixner: "Just an export for an interrupt flow handler which is now used in gpio modules"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irq: Export handle_fasteoi_irq
2014-09-01  genirq: Simplify wakeup mechanism  (Thomas Gleixner)
Currently we suspend wakeup interrupts by lazily disabling them and checking later whether the interrupt has fired, but that's not sufficient for suspend to idle, as there is no way to check that once we have transitioned into the CPU idle state. So we change the mechanism in the following way:
1) Leave the wakeup interrupts enabled across suspend.
2) Add a check to irq_may_run(), which is called at the beginning of each flow handler, for whether the interrupt is an armed wakeup source. This check is basically free as it just extends the existing check for IRQD_IRQ_INPROGRESS, so there is no new conditional in the hot path. If the IRQD_WAKEUP_ARMED flag is set, then the interrupt is disabled, marked as pending/suspended and the pm core is notified about the wakeup event.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ rjw: syscore.c and put irq_pm_check_wakeup() into pm.c ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
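A hedged sketch of the resulting entry check; it mirrors the shape of the irq_may_run() helper this series introduces:

    static bool irq_may_run(struct irq_desc *desc)
    {
            unsigned int mask = IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED;

            /* One test covers both bits: no new conditional in the hot path. */
            if (!irqd_has_set(&desc->irq_data, mask))
                    return true;

            /* Armed wakeup source: disable, mark pending, notify the PM core. */
            if (irq_pm_check_wakeup(desc))
                    return false;

            /* Otherwise handle an eventual concurrent poll on another CPU. */
            return irq_check_poll(desc);
    }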
2014-09-01  genirq: Mark wakeup sources as armed on suspend  (Thomas Gleixner)
This allows us to utilize this information in the irq_may_run() check without adding another conditional to the fast path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-01  genirq: Create helper for flow handler entry check  (Thomas Gleixner)
All flow handlers - except the per cpu ones - check for an interrupt in progress and an eventual concurrent polling on another cpu. Create a helper function for the repeated code pattern. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-01  genirq: Distangle edge handler entry  (Thomas Gleixner)
If the interrupt is disabled or has no action, then we should not call the poll check. Separate the checks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-01  genirq: Avoid double loop on suspend  (Thomas Gleixner)
We can synchronize the suspended interrupts right away. No need for an extra loop. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-01  genirq: Move MASK_ON_SUSPEND handling into suspend_device_irqs()  (Thomas Gleixner)
There is no reason why we should delay the masking of interrupts whose interrupt chip requests MASK_ON_SUSPEND to the point where we check the wakeup interrupts. We can do it right at the point where we mark the interrupt as suspended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-01  genirq: Make use of pm misfeature accounting  (Thomas Gleixner)
Use the accounting fields which got introduced for sanity checking of the various PM options. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-01  genirq: Add sanity checks for PM options on shared interrupt lines  (Thomas Gleixner)
Account the IRQF_NO_SUSPEND and IRQF_RESUME_EARLY actions on shared interrupt lines and yell loudly if there is a mismatch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-01  genirq: Move suspend/resume logic into irq/pm code  (Thomas Gleixner)
No functional change. Preparatory patch for cleaning up the suspend abort functionality. Update the comments while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-01  PM / sleep: Mechanism for aborting system suspends unconditionally  (Rafael J. Wysocki)
It sometimes may be necessary to abort a system suspend in progress or wake up the system from suspend-to-idle even if the pm_wakeup_event()/pm_stay_awake() mechanism is not enabled. For this purpose, introduce a new global variable pm_abort_suspend and make pm_wakeup_pending() check its value. Also add routines for manipulating that variable. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
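A hedged sketch of the new mechanism; the helper names are per this commit as I recall them, and wakeup_source_pending() stands in for the existing wakeup-count check:

    static bool pm_abort_suspend __read_mostly;

    void pm_system_wakeup(void)
    {
            pm_abort_suspend = true;
            freeze_wake();          /* also kicks suspend-to-idle out of its wait */
    }

    void pm_wakeup_clear(void)
    {
            pm_abort_suspend = false;
    }

    bool pm_wakeup_pending(void)
    {
            bool pending = wakeup_source_pending();   /* hypothetical name */

            return pending || pm_abort_suspend;
    }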
2014-08-29  kexec: create a new config option CONFIG_KEXEC_FILE for new syscall  (Vivek Goyal)
Currently the new system call kexec_file_load() and all the associated code compile if CONFIG_KEXEC=y. But the new syscall also compiles purgatory code which currently uses the gcc option -mcmodel=large. This option seems to be available only from gcc 4.4 onwards. Hiding the new functionality behind a new config option will not break existing users of old gcc. Those who wish to enable the new functionality will require new gcc. Having said that, I am trying to figure out how I can move away from using -mcmodel=large, but that can take a while. I think there are other advantages to introducing this new config option. As this option will be enabled only on x86_64, other arches don't have to compile generic kexec code which will never be used. This new code selects CRYPTO=y and CRYPTO_SHA256=y, and previously all other arches had to do this for CONFIG_KEXEC. Now with the introduction of the new config option, we can remove the crypto dependency from other arches. Now CONFIG_KEXEC_FILE is available only on x86_64. So wherever I had CONFIG_X86_64 defined, I got rid of that. For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed it to "depends on CRYPTO=y". This should be safer as "select" is not recursive. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Tested-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29  resource: fix the case of null pointer access  (Vivek Goyal)
Richard and Daniel reported that UML is broken due to changes to the resource traversal functions. The problem is that iomem_resource.child can be null and the new code does not consider that possibility. The old code used a for loop, and that loop will not even execute if p is null. Revert back to the for() loop logic and bail out if p is null. I also moved the sibling_only check out of resource_lock. There is no reason to keep it inside the lock. Following is the backtrace of the UML crash:
RIP: 0033:[<0000000060039b9f>] RSP: 0000000081459da0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000219b3fff RCX: 000000006010d1d9
RDX: 0000000000000001 RSI: 00000000602dfb94 RDI: 0000000081459df8
RBP: 0000000081459de0 R08: 00000000601b59f4 R09: ffffffff0000ff00
R10: ffffffff0000ff00 R11: 0000000081459e88 R12: 0000000081459df8
R13: 00000000219b3fff R14: 00000000602dfb94 R15: 0000000000000000
Kernel panic - not syncing: Segfault with no mm
CPU: 0 PID: 1 Comm: swapper Not tainted 3.16.0-10454-g58d08e3 #13
Stack: 00000000 000080d0 81459df0 219b3fff 81459e70 6010d1d9 ffffffff 6033e010 81459e50 6003a269 81459e30 00000000
Call Trace:
[<6010d1d9>] ? kclist_add_private+0x0/0xe7
[<6003a269>] walk_system_ram_range+0x61/0xb7
[<6000e859>] ? proc_kcore_init+0x0/0xf1
[<6010d574>] kcore_update_ram+0x4c/0x168
[<6010d72e>] ? kclist_add+0x0/0x2e
[<6000e943>] proc_kcore_init+0xea/0xf1
[<6000e859>] ? proc_kcore_init+0x0/0xf1
[<6000e859>] ? proc_kcore_init+0x0/0xf1
[<600189f0>] do_one_initcall+0x13c/0x204
[<6004ca46>] ? parse_args+0x1df/0x2e0
[<6004c82d>] ? parameq+0x0/0x3a
[<601b5990>] ? strcpy+0x0/0x18
[<60001e1a>] kernel_init_freeable+0x240/0x31e
[<6026f1c0>] kernel_init+0x12/0x148
[<60019fad>] new_thread_handler+0x81/0xa3
Fixes: 8c86e70acead629aacb4a ("resource: provide new functions to walk through resources")
Reported-by: Daniel Walter <sahne@0x90.at> Tested-by: Richard Weinberger <richard@nod.at> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Daniel Walter <sahne@0x90.at> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
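A hedged sketch of the restored traversal: a for () loop whose body simply never runs when iomem_resource.child is NULL (as on UML); next_resource() is the walk helper from this fix:

    struct resource *p;

    read_lock(&resource_lock);
    for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
            /* p is guaranteed non-NULL here, unlike in the broken version */
            if (p->start > end)
                    break;
            /* ... match the resource against [start, end] ... */
    }
    read_unlock(&resource_lock);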
2014-08-28  genirq: fix reference in devm_request_threaded_irq comment  (Emilio López)
It should be request_threaded_irq, not request_irq [jkosina@suse.cz: not that it would matter, as both have the same set of arguments anyway, but for sake of consistency ...] Signed-off-by: Emilio López <emilio@elopez.com.ar> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-08-28  rcu: Make nocb leader kthreads process pending callbacks after spawning  (Pranith Kumar)
The nocb callbacks generated before the nocb kthreads are spawned are enqueued in the nocb queue for later processing. Commit fbce7497ee5af ("rcu: Parallelize and economize NOCB kthread wakeups") introduced nocb leader kthreads which checked the nocb_leader_wake flag to see if there were any such pending callbacks. A case was reported in which newly spawned leader kthreads were not processing the pending callbacks as this flag was not set, which led to a boot hang. The following commit ensures that the newly spawned nocb kthreads process the pending callbacks by allowing the kthreads to run immediately after spawning instead of waiting. This is done by inverting the logic of nocb_leader_wake tests to nocb_leader_sleep which allows us to use the default initialization of this flag to 0 to let the kthreads run. Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Link: http://www.spinics.net/lists/kernel/msg1802899.html [ paulmck: Backported to v3.17-rc2. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Amit Shah <amit.shah@redhat.com>
2014-08-28  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t  (Christoph Lameter)
__get_cpu_var can paper over differences in the definitions of cpumask_var_t and either use the address of the cpumask variable directly or perform a fetch of the address of the struct cpumask allocated elsewhere. This is important particularly when using per cpu cpumask_var_t declarations because in one case we have an offset into a per cpu area to handle and in the other case we need to fetch a pointer from the offset. This patch introduces a new macro this_cpu_cpumask_var_ptr() that is defined where cpumask_var_t is defined and performs the proper actions. All use cases where __get_cpu_var is used with cpumask_var_t are converted to the use of this_cpu_cpumask_var_ptr(). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
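The new accessor, per the description above (sketch; exact whitespace and placement hedged, but the two variants are exactly what the changelog calls for):

    #ifdef CONFIG_CPUMASK_OFFSTACK
    /* cpumask_var_t is a struct cpumask *: the per cpu slot holds a
     * pointer, so fetch its value. */
    #define this_cpu_cpumask_var_ptr(x)     this_cpu_read(x)
    #else
    /* cpumask_var_t is a struct cpumask[1]: the per cpu slot is the mask
     * itself, so take its address. */
    #define this_cpu_cpumask_var_ptr(x)     this_cpu_ptr(x)
    #endif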
2014-08-27  Merge tag 'trace-fixes-v3.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace  (Linus Torvalds)
Pull trace buffer epoll hang fix from Steven Rostedt: "Josef Bacik found a bug in the ring_buffer_poll_wait() where the condition variable (waiters_pending) was set before being added to the poll queue via poll_wait(). This allowed for a small race window to happen where an event could come in, check the condition variable, see it set to true, clear it, and then wake all the waiters. But because the waiter set the variable before adding itself to the queue, the waker could have cleared the variable after it was set and then miss waking it up as it wasn't added to the queue yet. Discussing this bug, we realized that a memory barrier needed to be added too, for the rare case that something polls for a single trace event to happen (and just one, no more to come in), and misses the wakeup due to memory ordering. Ideally, a memory barrier needs to be added on the writer side too, but as that will kill tracing performance and this is for a situation that tracing wasn't even designed for (who traces one instance of an event, use a printk instead!), this isn't worth adding the barrier. But we can in the future add the barrier for when the buffer goes from empty to the first event, as that would cover this case"
* tag 'trace-fixes-v3.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  trace: Fix epoll hang when we race with new entries
2014-08-27  param: check for tainting before calling set op.  (Rusty Russell)
This means that each set op doesn't need to call it, and the check can move into params.c. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-08-27  module: make it possible to have unsafe, tainting module params  (Jani Nikula)
Add flags field to struct kernel_params, and add the first flag: unsafe parameter. Modifying a kernel parameter with the unsafe flag set, either via the kernel command line or sysfs, will issue a warning and taint the kernel. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jean Delvare <khali@linux-fr.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Jon Mason <jon.mason@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
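A hedged sketch of the new field and flag; member order and the union are from memory and should be treated as assumptions:

    #define KERNEL_PARAM_FL_UNSAFE  (1 << 0)

    struct kernel_param {
            const char *name;
            const struct kernel_param_ops *ops;
            u16 perm;
            s8 level;
            u8 flags;   /* new: setting an FL_UNSAFE param warns and taints */
            union {
                    void *arg;
                    const struct kparam_string *str;
                    const struct kparam_array *arr;
            };
    };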
2014-08-27  module: rename KERNEL_PARAM_FL_NOARG to avoid confusion  (Jani Nikula)
Make it clear this is about kernel_param_ops, not kernel_param (which will soon have a flags field of its own). No functional changes. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jean Delvare <khali@linux-fr.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Jon Mason <jon.mason@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-08-26  Merge branch 'nohz/drop-double-write-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core  (Ingo Molnar)
Pull nohz fixes from Frederic Weisbecker: "The tick reschedules itself unconditionally. It's relevant in periodic mode but not in dynticks mode, where it results in spurious double clock writes and even spurious periodic behaviour for the low-res case. This set fixes that:
* 1st patch removes low-res periodic tick rescheduling in nohz mode. This fixes spurious periodic behaviour.
* 2nd patch does the same for high-res mode. Here there is no such spurious periodic behaviour but it still spares a double clock write in some cases."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-26  irqchips: Replace __this_cpu_ptr uses  (Christoph Lameter)
[ARM specific] These are generally replaced with raw_cpu_ptr. However, in gic_get_percpu_base() we immediately dereference the pointer. This is equivalent to a raw_cpu_read. So use that operation there. Cc: nicolas.pitre@linaro.org Cc: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26  watchdog: Replace __raw_get_cpu_var uses  (Christoph Lameter)
Most of these are the uses of &__raw_get_cpu_var for address calculation. touch_softlockup_watchdog_sync() uses __raw_get_cpu_var to write to per cpu variables. Use __this_cpu_write instead. Cc: Wim Van Sebroeck <wim@iguana.be> Cc: linux-watchdog@vger.kernel.org Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26  scheduler: Replace __get_cpu_var with this_cpu_ptr  (Christoph Lameter)
Convert all uses of __get_cpu_var for address calculation to use this_cpu_ptr instead. [Uses of __get_cpu_var with cpumask_var_t are no longer handled by this patch] Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26  time: Convert a bunch of &__get_cpu_var introduced in the 3.16 merge period  (Christoph Lameter)
Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26  time: Replace __get_cpu_var uses  (Christoph Lameter)
Convert uses of __get_cpu_var for creating an address from a percpu offset to this_cpu_ptr. The two cases where get_cpu_var is used to actually access a percpu variable are changed to use this_cpu_read/raw_cpu_read. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
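A hedged sketch of the two conversion patterns used throughout this series; demo_counter is a hypothetical per cpu variable:

    DEFINE_PER_CPU(int, demo_counter);

    static void demo(void)
    {
            /* address calculation: &__get_cpu_var(v)  ->  this_cpu_ptr(&v) */
            int *p = this_cpu_ptr(&demo_counter);

            /* plain value access:  __get_cpu_var(v)   ->  this_cpu_read(v) */
            int val = this_cpu_read(demo_counter);

            *p = val + 1;
    }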
2014-08-26  kernel misc: Replace __get_cpu_var uses  (Christoph Lameter)
Replace uses of __get_cpu_var for address calculation with this_cpu_ptr. Cc: akpm@linux-foundation.org Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-25  trace: Fix epoll hang when we race with new entries  (Josef Bacik)
Epoll on trace_pipe can sometimes hang in a weird case. If the ring buffer is empty when we set waiters_pending but an event shows up exactly at that moment, we can miss being woken up by the ring buffer's irq work. Since ring_buffer_empty() is inherently racy, we will sometimes think that the buffer is not empty. So we don't get woken up and we don't think there are any events even though there were some ready when we added the watch, which makes us hang. This patch fixes this by making sure that we are actually on the wait list before we set waiters_pending, and adds a memory barrier to make sure ring_buffer_empty() is going to be correct. Link: http://lkml.kernel.org/p/1408989581-23727-1-git-send-email-jbacik@fb.com Cc: stable@vger.kernel.org # 3.10+ Cc: Martin Lau <kafai@fb.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
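A hedged sketch of the corrected ordering in ring_buffer_poll_wait():

    /* 1. Get on the wait queue *before* publishing the flag. */
    poll_wait(filp, &work->waiters, poll_table);
    work->waiters_pending = true;

    /*
     * 2. Full barrier so the waiters_pending store cannot be reordered
     *    past the emptiness re-check below.
     */
    smp_mb();

    /* 3. An event that slipped in before the flag was set is caught here. */
    if (!ring_buffer_empty(buffer))
            return POLLIN | POLLRDNORM;
    return 0;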
2014-08-25  crash_dump: Make is_kdump_kernel() accessible from modules  (Amir Vadai)
In order to make is_kdump_kernel() accessible from modules, we need to export elfcorehdr_addr. This was rejected in the past [1] because reset_devices was preferred in that context (resetting the device in the kdump kernel), but now there are some network drivers that need to reduce memory usage when loaded from a kdump kernel. And in that context, is_kdump_kernel() suits better. [1] - https://lkml.org/lkml/2011/1/27/341 CC: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-25  Merge tag 'trace-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace  (Linus Torvalds)
Pull fix for ftrace function tracer/profiler conflict from Steven Rostedt: "The rewrite of the ftrace code that makes it possible to allow for separate trampolines had a design flaw with the interaction between the function and function_graph tracers. The main flaw was the simplification of the use of multiple tracers having the same filter (like function and function_graph, that use the set_ftrace_filter file to filter their code). The design assumed that the two tracers could never run simultaneously, as only one tracer can be used at a time. The problem with this assumption was that the function profiler could be implemented on top of the function graph tracer, and the function profiler could run at the same time as the function tracer. This caused the assumption to be broken, and when ftrace detected this failed assumption it would spit out a nasty warning and shut itself down. Instead of using a single ftrace_ops that switches between the function and function_graph callbacks, the two tracers can again use their own ftrace_ops. But instead of having a complex hierarchy of ftrace_ops, the filter fields are placed in their own structure and the ftrace_ops can carefully use the same filter. This change took a bit to be able to allow for this and currently only the global_ops can share the same filter, but this new design can easily be modified to allow for any ftrace_ops to share its filter with another ftrace_ops. The first four patches deal with the change of allowing the ftrace_ops to share the filter (and this needs to go to 3.16 as well). The fifth patch fixes a bug that was also caused by the new changes but only for archs other than x86, and only if those archs implement a direct call to the function_graph tracer, which they do not do yet but will in the future. It does not need to go to stable, but needs to be fixed before the other archs update their code to allow direct calls to the function_graph trampoline"
* tag 'trace-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Use current addr when converting to nop in __ftrace_replace_code()
  ftrace: Fix function_profiler and function tracer together
  ftrace: Fix up trampoline accounting with looping on hash ops
  ftrace: Update all ftrace_ops for a ftrace_hash_ops update
  ftrace: Allow ftrace_ops to use the hashes from other ops
2014-08-25  irq: Export handle_fasteoi_irq  (Vincent Stehlé)
Export handle_fasteoi_irq to be able to use it in e.g. the Zynq gpio driver since commit 6dd859508336 ("gpio: zynq: Fix IRQ handlers"). This fixes the following link issue: ERROR: "handle_fasteoi_irq" [drivers/gpio/gpio-zynq.ko] undefined! Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: linux-arm-kernel@lists.infradead.org Cc: Vincent Stehle <vincent.stehle@laposte.net> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Linus Walleij <linus.walleij@linaro.org> Link: http://lkml.kernel.org/r/1408663880-29179-1-git-send-email-vincent.stehle@laposte.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-08-25  cgroup: fix a typo in comment.  (Dongsheng Yang)
There is no function named cgroup_enable_task_cg_links(). Instead, the correct function name in this comment should be cgroup_enable_task_cg_lists(). Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-25  sched: Add function single_task_running to let a task check if it is the only task running on a cpu  (Tim Chen)
This function will help an async task processing batched jobs from a workqueue decide if it wants to keep processing more chunks of batched work that can be delayed, or to accumulate more work for more efficient batched processing later. If no other tasks are running on the cpu, the batching process can take advantage of the available cpu cycles to make a decision to continue processing the existing accumulated work to minimize delay, otherwise it will yield. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
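A hedged usage sketch of the batching pattern the changelog describes; have_more_chunks(), process_chunk() and defer_rest() are hypothetical helpers:

    while (have_more_chunks()) {
            process_chunk();
            if (!single_task_running()) {
                    defer_rest();   /* someone else wants this cpu: yield */
                    break;
            }
    }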
2014-08-24  Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull perf fixes from Ingo Molnar: "A kprobes and a perf compat ioctl fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Handle compat ioctl
  kprobes: Skip kretprobe hit in NMI context to avoid deadlock
2014-08-24  Merge branch 'linus' into perf/core, to fix conflicts  (Ingo Molnar)
Conflicts: arch/x86/kernel/cpu/perf_event_intel_uncore*.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-24  perf: Add PERF_EVENT_STATE_EXIT state for events with exited task  (Jiri Olsa)
Add a new perf event state to indicate that the monitored task has exited. In this case the event stays alive until the owner task exits or closes the event fd, while providing the last data through the read syscall and ring buffer. Instead it needs to propagate the error info (monitored task has died) via the poll and read syscalls by returning POLLHUP and 0 respectively. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140811120102.GY9918@twins.programming.kicks-ass.net Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-t5y3w8jjx6tfo5w8y6oajsjq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
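A hedged userspace sketch of what the new state enables: poll() reports POLLHUP once the monitored task has exited, and read() eventually returns 0:

    #include <poll.h>
    #include <unistd.h>

    static void drain_until_task_exit(int perf_fd)
    {
            struct pollfd pfd = { .fd = perf_fd, .events = POLLIN };
            char buf[4096];

            while (poll(&pfd, 1, -1) > 0) {
                    if (pfd.revents & POLLIN)
                            while (read(perf_fd, buf, sizeof(buf)) > 0)
                                    ;       /* consume the last data */
                    if (pfd.revents & POLLHUP)
                            break;          /* monitored task is gone */
            }
    }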
2014-08-24  perf: Fix perf_poll to return proper POLLHUP value  (Jiri Olsa)
Currently perf_poll returns POLL_HUP in case of error, which is wrong, because the poll syscall expects POLLHUP. POLL_HUP is meant to be used for the SIGIO state. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140811120102.GY9918@twins.programming.kicks-ass.net Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-0ywfthh4lh65swe15f6w2x2q@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-08-22  ftrace: Use current addr when converting to nop in __ftrace_replace_code()  (Steven Rostedt (Red Hat))
In __ftrace_replace_code(), when converting the call to a nop in a function, it needs to compare against the "curr" (current) value of the ftrace ops, and not the "new" one. It currently does not affect x86, which is the only arch to do the trampolines with the function graph tracer, but when other archs that do depend on this code implement the function graph trampoline, it can crash. Here's an example when ARM uses the trampolines (in the future):
------------[ cut here ]------------
WARNING: CPU: 0 PID: 9 at kernel/trace/ftrace.c:1716 ftrace_bug+0x17c/0x1f4()
Modules linked in: omap_rng rng_core ipv6
CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.16.0-test-10959-gf0094b28f303-dirty #52
[<c02188f4>] (unwind_backtrace) from [<c021343c>] (show_stack+0x20/0x24)
[<c021343c>] (show_stack) from [<c095a674>] (dump_stack+0x78/0x94)
[<c095a674>] (dump_stack) from [<c02532a0>] (warn_slowpath_common+0x7c/0x9c)
[<c02532a0>] (warn_slowpath_common) from [<c02532ec>] (warn_slowpath_null+0x2c/0x34)
[<c02532ec>] (warn_slowpath_null) from [<c02cbac4>] (ftrace_bug+0x17c/0x1f4)
[<c02cbac4>] (ftrace_bug) from [<c02cc44c>] (ftrace_replace_code+0x80/0x9c)
[<c02cc44c>] (ftrace_replace_code) from [<c02cc658>] (ftrace_modify_all_code+0xb8/0x164)
[<c02cc658>] (ftrace_modify_all_code) from [<c02cc718>] (__ftrace_modify_code+0x14/0x1c)
[<c02cc718>] (__ftrace_modify_code) from [<c02c7244>] (multi_cpu_stop+0xf4/0x134)
[<c02c7244>] (multi_cpu_stop) from [<c02c6e90>] (cpu_stopper_thread+0x54/0x130)
[<c02c6e90>] (cpu_stopper_thread) from [<c0271cd4>] (smpboot_thread_fn+0x1ac/0x1bc)
[<c0271cd4>] (smpboot_thread_fn) from [<c026ddf0>] (kthread+0xe0/0xfc)
[<c026ddf0>] (kthread) from [<c020f318>] (ret_from_fork+0x14/0x20)
---[ end trace dc9ce72c5b617d8f ]---
[ 65.047264] ftrace failed to modify [<c0208580>] asm_do_IRQ+0x10/0x1c
[ 65.054070] actual: 85:1b:00:eb
Fixes: 7413af1fb70e7 ("ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-22  ftrace: Fix function_profiler and function tracer together  (Steven Rostedt (Red Hat))
The latest rewrite of ftrace removed the separate ftrace_ops of the function tracer and the function graph tracer and had them share the same ftrace_ops. This simplified the accounting by removing the multiple layers of functions called, where the global_ops func would call a special list that would iterate over the other ops that were registered within it (like function and function graph), which itself was registered to the ftrace ops list of all functions currently active. If that sounds confusing, the code that implemented it was also confusing and its removal is a good thing. The problem with this change was that it assumed that the function and function graph tracer can never be used at the same time. This is mostly true, but there is an exception. That is when the function profiler uses the function graph tracer to profile. The function profiler can be activated the same time as the function tracer, and this breaks the assumption and the result is that ftrace will crash (it detects the error and shuts itself down, it does not cause a kernel oops). To solve this issue, a previous change allowed the hash tables for the functions traced by a ftrace_ops to be a pointer and let multiple ftrace_ops share the same hash. This allows the function and function_graph tracer to have separate ftrace_ops, but still share the hash, which is what is done. Now the function and function graph tracers have separate ftrace_ops again, and the function tracer can be run while the function_profile is active. Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out) Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-22  ftrace: Fix up trampoline accounting with looping on hash ops  (Steven Rostedt (Red Hat))
Now that a ftrace_hash can be shared by multiple ftrace_ops, they can decrement the rec->flags by more than once (one per those that share the ftrace_hash). This means that the tramp_hash may not have a hash item when it was added. For example, if two ftrace_ops share a hash for a ftrace record, and the first ops has a trampoline, when it adds itself it will set the rec->flags TRAMP flag and increment its nr_trampolines counter. When the second ops is added, it must clear that tramp flag but also decrement the other ops that shares its hash. As the update to the function callbacks has not yet been performed, the other ops will not have the tramp hash set yet and it can not be used to know to decrement its nr_trampolines. Luckily, the tramp_hash does not need to be used. As the ftrace_mutex is held, an ops with a trampoline to a record during an update of another ops that shares the record will have its func_hash pointing to it. Since a trampoline can only be set for a record if only one ops is attached to it, we can just check if the record has a trampoline (the FTRACE_FL_TRAMP flag is set) and then find the ops that has this record in its hashes. Also added some output to help debug when things go wrong. Cc: stable@vger.kernel.org # 3.16+ (apply after 3.17-rc4 is out) Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-22  ftrace: Update all ftrace_ops for a ftrace_hash_ops update  (Steven Rostedt (Red Hat))
When updating what an ftrace_ops traces, if it is registered (that is, actively tracing), and that ftrace_ops uses the shared global_ops local_hash, then we need to update all tracers that are active and also share the global_ops' ftrace_hash_ops. Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out) Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-22  cgroup: Display legacy cgroup files on default hierarchy  (Vivek Goyal)
The kernel command line parameter cgroup__DEVEL__legacy_files_on_dfl forces legacy cgroup files to show up on the default hierarchy if the subsystem does not have any files defined for the default hierarchy. But this seems to be working only if legacy files are defined in ss->legacy_cftypes. If one adds some cftypes later using cgroup_add_legacy_cftypes(), these files don't show up on the default hierarchy. Update the function accordingly so that the dynamically added legacy files also show up in the default hierarchy if the target subsystem is also using the base legacy files for the default hierarchy. tj: Patch description and comment updates. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-22  ftrace: Allow ftrace_ops to use the hashes from other ops  (Steven Rostedt (Red Hat))
Currently the top level debug file system function tracer shares its ftrace_ops with the function graph tracer. This was thought to be fine because the tracers are not used together, as one can only enable function or function_graph tracer in the current_tracer file. But that assumption proved to be incorrect. The function profiler can use the function graph tracer when function tracing is enabled. Since all function graph users use the function tracing ftrace_ops, this causes a conflict, and when a user enables both function profiling as well as the function tracer it will crash ftrace and disable it. The quick solution so far is to move them as separate ftrace_ops like it was earlier. The problem though is to synchronize the functions that are traced, because both function and function_graph tracer are limited by the selections made in the set_ftrace_filter and set_ftrace_notrace files. To handle this, a new structure is made called ftrace_ops_hash. This structure will now hold the filter_hash and notrace_hash, and the ftrace_ops will point to this structure. That will allow two ftrace_ops to share the same hashes. Since most ftrace_ops do not share the hashes, and to keep allocation simple, the ftrace_ops structure will include both a pointer to the ftrace_ops_hash called func_hash, as well as the structure itself, called local_hash. When the ops are registered, the func_hash pointer will be initialized to point to the local_hash within the ftrace_ops structure. Some of the ftrace internal ftrace_ops will be initialized statically. This will allow for the function and function_graph tracer to have separate ops but still share the same hash tables that determine what functions they trace. Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out) Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
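A hedged sketch of the indirection described above (struct abridged; only the hash-related members shown):

    struct ftrace_hash;                     /* opaque for this sketch */

    struct ftrace_ops_hash {
            struct ftrace_hash *notrace_hash;
            struct ftrace_hash *filter_hash;
    };

    struct ftrace_ops {
            /* ... other members elided ... */
            struct ftrace_ops_hash local_hash;  /* this ops' own hashes */
            struct ftrace_ops_hash *func_hash;  /* normally &local_hash; two
                                                   ops sharing filters point
                                                   at the same hash instead */
    };

    /* On registration (per the commit): ops->func_hash = &ops->local_hash; */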
2014-08-22  nohz: Avoid tick's double reprogramming in highres mode  (Viresh Kumar)
In highres mode, the tick reschedules itself unconditionally to the next jiffy. However, while this clock reprogramming is relevant when the tick is in periodic mode, it's not that interesting when we run in dynticks mode, because irq exit is likely going to overwrite the next tick to some randomly deferred future. So let's just get rid of this tick self-rescheduling in dynticks mode. This way we can avoid some clockevents double writes in favourable scenarios, like when we stop the tick completely in idle while no other hrtimer is pending. Suggested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>