summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2014-11-04sched: Refactor task_struct to use numa_faults instead of numa_* pointersIulia Manda
This patch simplifies task_struct by removing the four numa_* pointers in the same array and replacing them with the array pointer. By doing this, on x86_64, the size of task_struct is reduced by 3 ulong pointers (24 bytes on x86_64). A new parameter is added to the task_faults_idx function so that it can return an index to the correct offset, corresponding with the old precalculated pointers. All of the code in sched/ that depended on task_faults_idx and numa_* was changed in order to match the new logic. Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: mgorman@suse.de Cc: dave@stgolabs.net Cc: riel@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141031001331.GA30662@winterfell Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched/deadline: Don't check CONFIG_SMP in switched_from_dl()Wanpeng Li
There are both UP and SMP version of pull_dl_task(), so don't need to check CONFIG_SMP in switched_from_dl(); Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-6-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched/deadline: Reschedule from switched_from_dl() after a successful pullWanpeng Li
In switched_from_dl() we have to issue a resched if we successfully pulled some task from other cpus. This patch also aligns the behavior with -rt. Suggested-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-5-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched/deadline: Push task away if the deadline is equal to curr during wakeupWanpeng Li
This patch pushes task away if the dealine of the task is equal to current during wake up. The same behavior as rt class. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-4-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched/deadline: Add deadline rq status printWanpeng Li
This patch add deadline rq status print. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-3-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched/deadline: Fix artificial overrun introduced by yield_task_dl()Wanpeng Li
The yield semantic of deadline class is to reduce remaining runtime to zero, and then update_curr_dl() will stop it. However, comsumed bandwidth is reduced from the budget of yield task again even if it has already been set to zero which leads to artificial overrun. This patch fix it by make sure we don't steal some more time from the task that yielded in update_curr_dl(). Suggested-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-2-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched/rt: Clean up check_preempt_equal_prio()Wanpeng Li
This patch checks if current can be pushed/pulled somewhere else in advance to make logic clear, the same behavior as dl class. - If current can't be migrated, useless to reschedule, let's hope task can move out. - If task is migratable, so let's not schedule it and see if it can be pushed or pulled somewhere else. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched/core: Use dl_bw_of() under rcu_read_lock_sched()Juri Lelli
As per commit f10e00f4bf36 ("sched/dl: Use dl_bw_of() under rcu_read_lock_sched()"), dl_bw_of() has to be protected by rcu_read_lock_sched(). Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414497286-28824-1-git-send-email-juri.lelli@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched: Check if we got a shallowest_idle_cpu before searching for ↵Yao Dongdong
least_loaded_cpu Idle cpu is idler than non-idle cpu, so we needn't search for least_loaded_cpu after we have found an idle cpu. Signed-off-by: Yao Dongdong <yaodongdong@huawei.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414469286-6023-1-git-send-email-yaodongdong@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()Kirill Tkhai
Currently used hrtimer_try_to_cancel() is racy: raw_spin_lock(&rq->lock) ... dl_task_timer raw_spin_lock(&rq->lock) ... raw_spin_lock(&rq->lock) ... switched_from_dl() ... ... hrtimer_try_to_cancel() ... ... switched_to_fair() ... ... ... ... ... ... ... ... raw_spin_unlock(&rq->lock) ... (asquired) ... ... ... ... ... ... do_exit() ... ... schedule() ... ... raw_spin_lock(&rq->lock) ... raw_spin_unlock(&rq->lock) ... ... ... raw_spin_unlock(&rq->lock) ... raw_spin_lock(&rq->lock) ... ... (asquired) put_task_struct() ... ... free_task_struct() ... ... ... ... raw_spin_unlock(&rq->lock) ... (asquired) ... ... ... ... ... (use after free) ... So, let's implement 100% guaranteed way to cancel the timer and let's be sure we are safe even in very unlikely situations. rq unlocking does not limit the area of switched_from_dl() use, because this has already been possible in pull_dl_task() below. Let's consider the safety of of this unlocking. New code in the patch is working when hrtimer_try_to_cancel() fails. This means the callback is running. In this case hrtimer_cancel() is just waiting till the callback is finished. Two 1) Since we are in switched_from_dl(), new class is not dl_sched_class and new prio is not less MAX_DL_PRIO. So, the callback returns early; it's right after !dl_task() check. After that hrtimer_cancel() returns back too. The above is: raw_spin_lock(rq->lock); ... ... dl_task_timer() ... raw_spin_lock(rq->lock); switched_from_dl() ... hrtimer_try_to_cancel() ... raw_spin_unlock(rq->lock); ... hrtimer_cancel() ... ... raw_spin_unlock(rq->lock); ... return HRTIMER_NORESTART; ... ... raw_spin_lock(rq->lock); ... 2) But the below is also possible: dl_task_timer() raw_spin_lock(rq->lock); ... raw_spin_unlock(rq->lock); raw_spin_lock(rq->lock); ... switched_from_dl() ... hrtimer_try_to_cancel() ... ... return HRTIMER_NORESTART; raw_spin_unlock(rq->lock); ... hrtimer_cancel(); ... raw_spin_lock(rq->lock); ... In this case hrtimer_cancel() returns immediately. Very unlikely case, just to mention. Nobody can manipulate the task, because check_class_changed() is always called with pi_lock locked. Nobody can force the task to participate in (concurrent) priority inheritance schemes (the same reason). All concurrent task operations require pi_lock, which is held by us. No deadlocks with dl_task_timer() are possible, because it returns right after !dl_task() check (it does nothing). If we receive a new dl_task during the time of unlocked rq, we just don't have to do pull_dl_task() in switched_from_dl() further. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> [ Added comments] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414420852.19914.186.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING testPeter Zijlstra
In some cases this can trigger a true flood of output. Requested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04audit, sched/wait: Fixup kauditd_thread() wait loopPeter Zijlstra
The kauditd_thread wait loop is a bit iffy; it has a number of problems: - calls try_to_freeze() before schedule(); you typically want the thread to re-evaluate the sleep condition when unfreezing, also freeze_task() issues a wakeup. - it unconditionally does the {add,remove}_wait_queue(), even when the sleep condition is false. Use wait_event_freezable() that does the right thing. Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Eric Paris <eparis@redhat.com> Cc: oleg@redhat.com Cc: Eric Paris <eparis@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141002102251.GA6324@worktop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched/wait: Fix a kthread race with wait_woken()Peter Zijlstra
There is a race between kthread_stop() and the new wait_woken() that can result in a lack of progress. CPU 0 | CPU 1 | rfcomm_run() | kthread_stop() ... | if (!test_bit(KTHREAD_SHOULD_STOP)) | | set_bit(KTHREAD_SHOULD_STOP) | wake_up_process() wait_woken() | wait_for_completion() set_current_state(INTERRUPTIBLE) | if (!WQ_FLAG_WOKEN) | schedule_timeout() | | After which both tasks will wait.. forever. Fix this by having wait_woken() check for kthread_should_stop() but only for kthreads (obviously). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04sched: Remove lockdep check in sched_move_task()Kirill Tkhai
sched_move_task() is the only interface to change sched_task_group: cpu_cgrp_subsys methods and autogroup_move_group() use it. Everything is synchronized by task_rq_lock(), so cpu_cgroup_attach() is ordered with other users of sched_move_task(). This means we do no need RCU here: if we've dereferenced a tg here, the .attach method hasn't been called for it yet. Thus, we should pass "true" to task_css_check() to silence lockdep warnings. Fixes: eeb61e53ea19 ("sched: Fix race between task_group and sched_task_group") Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414473874.8574.2.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-03rcutorture: Fix rcu_torture_cbflood() memory leakPaul E. McKenney
Commit 38706bc5a29a (rcutorture: Add callback-flood test) vmalloc()ed a bunch of RCU callbacks, but failed to free them. This commit fixes that oversight. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03rcutorture: Add early boot self testsPranith Kumar
Add early boot self tests for RCU under CONFIG_PROVE_RCU. Currently the only test is adding a dummy callback which increments a counter which we then later verify after calling rcu_barrier*(). Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-03cpu: Avoid puts_pending overflowPaul E. McKenney
A long string of get_online_cpus() with each followed by a put_online_cpu() that fails to acquire cpu_hotplug.lock can result in overflow of the cpu_hotplug.puts_pending counter. Although this is perhaps improbably, a system with absolutely no CPU-hotplug operations will have an arbitrarily long time in which this overflow could occur. This commit therefore adds overflow checks to get_online_cpus() and try_get_online_cpus(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03rcu: Remove "cpu" argument to rcu_cleanup_after_idle()Paul E. McKenney
The "cpu" argument to rcu_cleanup_after_idle() is always the current CPU, so drop it. This moves the smp_processor_id() from the caller to rcu_cleanup_after_idle(), saving argument-passing overhead. Again, the anticipated cross-CPU uses of these functions has been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03rcu: Remove "cpu" argument to rcu_prepare_for_idle()Paul E. McKenney
The "cpu" argument to rcu_prepare_for_idle() is always the current CPU, so drop it. This in turn allows two of the uses of "cpu" in this function to be replaced with a this_cpu_ptr() and the third by smp_processor_id(), replacing that of the call to rcu_prepare_for_idle(). Again, the anticipated cross-CPU uses of these functions has been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03rcu: Remove "cpu" argument to rcu_needs_cpu()Paul E. McKenney
The "cpu" argument to rcu_needs_cpu() is always the current CPU, so drop it. This in turn allows the "cpu" argument to rcu_cpu_has_callbacks() to be removed, which allows the uses of "cpu" in both functions to be replaced with a this_cpu_ptr(). Again, the anticipated cross-CPU uses of these functions has been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03rcu: Remove "cpu" argument to rcu_note_context_switch()Paul E. McKenney
The "cpu" argument to rcu_note_context_switch() is always the current CPU, so drop it. This in turn allows the "cpu" argument to rcu_preempt_note_context_switch() to be removed, which allows the sole use of "cpu" in both functions to be replaced with a this_cpu_ptr(). Again, the anticipated cross-CPU uses of these functions has been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03rcu: Remove "cpu" argument to rcu_preempt_check_callbacks()Paul E. McKenney
Because rcu_preempt_check_callbacks()'s argument is guaranteed to always be the current CPU, drop the argument and replace per_cpu() with __this_cpu_read(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03rcu: Remove "cpu" argument to rcu_pending()Paul E. McKenney
Because rcu_pending()'s argument is guaranteed to always be the current CPU, drop the argument and replace per_cpu_ptr() with this_cpu_ptr(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03rcu: Remove "cpu" argument to rcu_check_callbacks()Paul E. McKenney
The "cpu" argument was kept around on the off-chance that RCU might offload scheduler-clock interrupts. However, this offload approach has been replaced by NO_HZ_FULL, which offloads -all- RCU processing from qualifying CPUs. It is therefore time to remove the "cpu" argument to rcu_check_callbacks(), which this commit does. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03rcu: Use DEFINE_PER_CPU_SHARED_ALIGNED for rcu_dataPaul E. McKenney
The rcu_data per-CPU variable has a number of fields that are atomically manipulated, potentially by any CPU. This situation can result in false sharing with per-CPU variables that have the misfortune of being allocated adjacent to rcu_data in memory. This commit therefore changes the DEFINE_PER_CPU() to DEFINE_PER_CPU_SHARED_ALIGNED() in order to avoid this false sharing. Reported-by: Christoph Lameter <cl@linux.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03rcu: Remove rcu_dynticks * parameters when they are always ↵Christoph Lameter
this_cpu_ptr(&rcu_dynticks) For some functions in kernel/rcu/tree* the rdtp parameter is always this_cpu_ptr(rdtp). Remove the parameter if constant and calculate the pointer in function. This will have the advantage that it is obvious that the address are all per cpu offsets and thus it will enable the use of this_cpu_ops in the future. Signed-off-by: Christoph Lameter <cl@linux.com> [ paulmck: Forward-ported to rcu/dev, whitespace adjustment. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03move d_rcu from overlapping d_child to overlapping d_aliasAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-03PM / Hibernate: Migrate to ktime_tTina Ruchandani
This patch migrates swsusp_show_speed and its callers to using ktime_t instead of 'struct timeval' which suffers from the y2038 problem. Changes to swsusp_show_speed: - use ktime_t for start and stop times - pass start and stop times by value Calling functions affected: - load_image - load_image_lzo - save_image - save_image_lzo - hibernate_preallocate_memory Design decisions: - use ktime_t to preserve same granularity of reporting as before - use centisecs logic as before to avoid 'div by zero' issues caused by using seconds and nanoseconds directly - use monotonic time (ktime_get()) since we only care about elapsed time. Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/phy/marvell.c Simple overlapping changes in drivers/net/phy/marvell.c Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31Merge tag 'pm+acpi-3.18-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "These are fixes received after my previous pull request plus one that has been in the works for quite a while, but its previous version caused problems to happen, so it's been deferred till now. Fixed are two recent regressions (MFD enumeration and cpufreq-dt), ACPI EC regression introduced in 3.17, system suspend error code path regression introduced in 3.15, an older bug related to recovery from failing resume from hibernation and a cpufreq-dt driver issue related to operation performance points. Specifics: - Fix a crash on r8a7791/koelsch during resume from system suspend caused by a recent cpufreq-dt commit (Geert Uytterhoeven). - Fix an MFD enumeration problem introduced by a recent commit adding ACPI support to the MFD subsystem that exposed a weakness in the ACPI core causing ACPI enumeration to be applied to all devices associated with one ACPI companion object, although it should be used for one of them only (Mika Westerberg). - Fix an ACPI EC regression introduced during the 3.17 cycle causing some Samsung laptops to misbehave as a result of a workaround targeted at some Acer machines. That includes a revert of a commit that went too far and a quirk for the Acer machines in question. From Lv Zheng. - Fix a regression in the system suspend error code path introduced during the 3.15 cycle that causes it to fail to take errors from asychronous execution of "late" suspend callbacks into account (Imre Deak). - Fix a long-standing bug in the hibernation resume error code path that fails to roll back everything correcty on "freeze" callback errors and leaves some devices in a "suspended" state causing more breakage to happen subsequently (Imre Deak). - Make the cpufreq-dt driver disable operation performance points that are not supported by the VR connected to the CPU voltage plane with acceptable tolerance instead of constantly failing voltage scaling later on (Lucas Stach)" * tag 'pm+acpi-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / EC: Fix regression due to conflicting firmware behavior between Samsung and Acer. Revert "ACPI / EC: Add support to disallow QR_EC to be issued before completing previous QR_EC" cpufreq: cpufreq-dt: Restore default cpumask_setall(policy->cpus) PM / Sleep: fix recovery during resuming from hibernation PM / Sleep: fix async suspend_late/freeze_late error handling ACPI: Use ACPI companion to match only the first physical device cpufreq: cpufreq-dt: disable unsupported OPPs
2014-10-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "A bit has accumulated, but it's been a week or so since my last batch of post-merge-window fixes, so... 1) Missing module license in netfilter reject module, from Pablo. Lots of people ran into this. 2) Off by one in mac80211 baserate calculation, from Karl Beldan. 3) Fix incorrect return value from ax88179_178a driver's set_mac_addr op, which broke use of it with bonding. From Ian Morgan. 4) Checking of skb_gso_segment()'s return value was not all encompassing, it can return an SKB pointer, a pointer error, or NULL. Fix from Florian Westphal. This is crummy, and longer term will be fixed to just return error pointers or a real SKB. 6) Encapsulation offloads not being handled by skb_gso_transport_seglen(). From Florian Westphal. 7) Fix deadlock in TIPC stack, from Ying Xue. 8) Fix performance regression from using rhashtable for netlink sockets. The problem was the synchronize_net() invoked for every socket destroy. From Thomas Graf. 9) Fix bug in eBPF verifier, and remove the strong dependency of BPF on NET. From Alexei Starovoitov. 10) In qdisc_create(), use the correct interface to allocate ->cpu_bstats, otherwise the u64_stats_sync member isn't initialized properly. From Sabrina Dubroca. 11) Off by one in ip_set_nfnl_get_byindex(), from Dan Carpenter. 12) nf_tables_newchain() was erroneously expecting error pointers from netdev_alloc_pcpu_stats(). It only returna a valid pointer or NULL. From Sabrina Dubroca. 13) Fix use-after-free in _decode_session6(), from Li RongQing. 14) When we set the TX flow hash on a socket, we mistakenly do so before we've nailed down the final source port. Move the setting deeper to fix this. From Sathya Perla. 15) NAPI budget accounting in amd-xgbe driver was counting descriptors instead of full packets, fix from Thomas Lendacky. 16) Fix total_data_buflen calculation in hyperv driver, from Haiyang Zhang. 17) Fix bcma driver build with OF_ADDRESS disabled, from Hauke Mehrtens. 18) Fix mis-use of per-cpu memory in TCP md5 code. The problem is that something that ends up being vmalloc memory can't be passed to the crypto hash routines via scatter-gather lists. From Eric Dumazet. 19) Fix regression in promiscuous mode enabling in cdc-ether, from Olivier Blin. 20) Bucket eviction and frag entry killing can race with eachother, causing an unlink of the object from the wrong list. Fix from Nikolay Aleksandrov. 21) Missing initialization of spinlock in cxgb4 driver, from Anish Bhatt. 22) Do not cache ipv4 routing failures, otherwise if the sysctl for forwarding is subsequently enabled this won't be seen. From Nicolas Cavallari" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (131 commits) drivers: net: cpsw: Support ALLMULTI and fix IFF_PROMISC in switch mode drivers: net: cpsw: Fix broken loop condition in switch mode net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0 stmmac: pci: set default of the filter bins net: smc91x: Fix gpios for device tree based booting mpls: Allow mpls_gso to be built as module mpls: Fix mpls_gso handler. r8152: stop submitting intr for -EPROTO netfilter: nft_reject_bridge: restrict reject to prerouting and input netfilter: nft_reject_bridge: don't use IP stack to reject traffic netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing drivers/net: macvtap and tun depend on INET drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets drivers/net: Disable UFO through virtio net: skb_fclone_busy() needs to detect orphaned skb gre: Use inner mac length when computing tunnel length mlx4: Avoid leaking steering rules on flow creation error flow net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN ...
2014-10-31Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Various scheduler fixes all over the place: three SCHED_DL fixes, three sched/numa fixes, two generic race fixes and a comment fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/dl: Fix preemption checks sched: Update comments for CLONE_NEWNS sched: stop the unbound recursion in preempt_schedule_context() sched/fair: Fix division by zero sysctl_numa_balancing_scan_size sched/fair: Care divide error in update_task_scan_period() sched/numa: Fix unsafe get_task_struct() in task_numa_assign() sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer() sched/deadline: Don't replenish from a !SCHED_DEADLINE entity sched: Fix race between task_group and sched_task_group
2014-10-31Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, plus on the kernel side: - a revert for a newly introduced PMU driver which isn't complete yet and where we ran out of time with fixes (to be tried again in v3.19) - this makes up for a large chunk of the diffstat. - compilation warning fixes - a printk message fix - event_idx usage fixes/cleanups" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf probe: Trivial typo fix for --demangle perf tools: Fix report -F dso_from for data without branch info perf tools: Fix report -F dso_to for data without branch info perf tools: Fix report -F symbol_from for data without branch info perf tools: Fix report -F symbol_to for data without branch info perf tools: Fix report -F mispredict for data without branch info perf tools: Fix report -F in_tx for data without branch info perf tools: Fix report -F abort for data without branch info perf tools: Make CPUINFO_PROC an array to support different kernel versions perf callchain: Use global caching provided by libunwind perf/x86/intel: Revert incomplete and undocumented Broadwell client support perf/x86: Fix compile warnings for intel_uncore perf: Fix typos in sample code in the perf_event.h header perf: Fix and clean up initialization of pmu::event_idx perf: Fix bogus kernel printk perf diff: Add missing hists__init() call at tool start
2014-10-31Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull futex fixes from Ingo Molnar: "This contains two futex fixes: one fixes a race condition, the other clarifies shared/private futex comments" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Fix a race condition between REQUEUE_PI and task death futex: Mention key referencing differences between shared and private futexes
2014-10-31Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: "The tree contains two RCU fixes and a compiler quirk comment fix" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Make rcu_barrier() understand about missing rcuo kthreads compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles rcu: More on deadlock between CPU hotplug and expedited grace periods
2014-10-31Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "As you requested in the rc2 release mail the timer department serves you a few real bug fixes: - Fix the probe logic of the architected arm/arm64 timer - Plug a stack info leak in posix-timers - Prevent a shift out of bounds issue in the clockevents core" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ARM/ARM64: arch-timer: fix arch_timer_probed logic clockevents: Prevent shift out of bounds posix-timers: Fix stack info leak in timer_create()
2014-10-31Merge tag 'trace-fixes-v3.18-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "ARM has system calls outside the NR_syscalls range, and the generic tracing system does not support that and without checks, it can cause an oops to be reported. Rabin Vincent added checks in the return code on syscall events to make sure that the system call number is within the range that tracing knows about, and if not, simply ignores the system call. The system call tracing infrastructure needs to be rewritten to handle these cases better, but for now, to keep from oopsing, this patch will do" * tag 'trace-fixes-v3.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/syscalls: Ignore numbers outside NR_syscalls' range
2014-10-31ftrace/x86: Show trampoline call function in enabled_functionsSteven Rostedt (Red Hat)
The file /sys/kernel/debug/tracing/eneabled_functions is used to debug ftrace function hooks. Add to the output what function is being called by the trampoline if the arch supports it. Add support for this feature in x86_64. Cc: H. Peter Anvin <hpa@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-10-31ftrace/x86: Add dynamic allocated trampoline for ftrace_opsSteven Rostedt (Red Hat)
The current method of handling multiple function callbacks is to register a list function callback that calls all the other callbacks based on their hash tables and compare it to the function that the callback was called on. But this is very inefficient. For example, if you are tracing all functions in the kernel and then add a kprobe to a function such that the kprobe uses ftrace, the mcount trampoline will switch from calling the function trace callback to calling the list callback that will iterate over all registered ftrace_ops (in this case, the function tracer and the kprobes callback). That means for every function being traced it checks the hash of the ftrace_ops for function tracing and kprobes, even though the kprobes is only set at a single function. The kprobes ftrace_ops is checked for every function being traced! Instead of calling the list function for functions that are only being traced by a single callback, we can call a dynamically allocated trampoline that calls the callback directly. The function graph tracer already uses a direct call trampoline when it is being traced by itself but it is not dynamically allocated. It's trampoline is static in the kernel core. The infrastructure that called the function graph trampoline can also be used to call a dynamically allocated one. For now, only ftrace_ops that are not dynamically allocated can have a trampoline. That is, users such as function tracer or stack tracer. kprobes and perf allocate their ftrace_ops, and until there's a safe way to free the trampoline, it can not be used. The dynamically allocated ftrace_ops may, although, use the trampoline if the kernel is not compiled with CONFIG_PREEMPT. But that will come later. Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-10-30tracing/syscalls: Ignore numbers outside NR_syscalls' rangeRabin Vincent
ARM has some private syscalls (for example, set_tls(2)) which lie outside the range of NR_syscalls. If any of these are called while syscall tracing is being performed, out-of-bounds array access will occur in the ftrace and perf sys_{enter,exit} handlers. # trace-cmd record -e raw_syscalls:* true && trace-cmd report ... true-653 [000] 384.675777: sys_enter: NR 192 (0, 1000, 3, 4000022, ffffffff, 0) true-653 [000] 384.675812: sys_exit: NR 192 = 1995915264 true-653 [000] 384.675971: sys_enter: NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1) true-653 [000] 384.675988: sys_exit: NR 983045 = 0 ... # trace-cmd record -e syscalls:* true [ 17.289329] Unable to handle kernel paging request at virtual address aaaaaace [ 17.289590] pgd = 9e71c000 [ 17.289696] [aaaaaace] *pgd=00000000 [ 17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 17.290169] Modules linked in: [ 17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21 [ 17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000 [ 17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8 [ 17.290866] LR is at syscall_trace_enter+0x124/0x184 Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers. Commit cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls" added the check for less than zero, but it should have also checked for greater than NR_syscalls. Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in Fixes: cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls" Cc: stable@vger.kernel.org # 2.6.33+ Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-10-30audit: AUDIT_FEATURE_CHANGE message format missing delimiting spaceRichard Guy Briggs
Add a space between subj= and feature= fields to make them parsable. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-10-30bpf: reduce verifier memory consumptionAlexei Starovoitov
verifier keeps track of register state spilled to stack. registers are 8-byte wide and always aligned, so instead of tracking them in every byte-sized stack slot, use MAX_BPF_STACK / 8 array to track spilled register state. Though verifier runs in user context and its state freed immediately after verification, it makes sense to reduce its memory usage. This optimization reduces sizeof(struct verifier_state) from 12464 to 1712 on 64-bit and from 6232 to 1112 on 32-bit. Note, this patch doesn't change existing limits, which are there to bound time and memory during verification: 4k total number of insns in a program, 1k number of jumps (states to visit) and 32k number of processed insn (since an insn may be visited multiple times). Theoretical worst case memory during verification is 1712 * 1k = 17Mbyte. Out-of-memory situation triggers cleanup and rejects the program. Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30Merge branch 'urgent-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull two RCU fixes from Paul E. McKenney: " - Complete the work of commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and expedited grace periods), which was intended to allow synchronize_sched_expedited() to be safely used when holding locks acquired by CPU-hotplug notifiers. This commit makes the put_online_cpus() avoid the deadlock instead of just handling the get_online_cpus(). - Complete the work of commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs), which was intended to allow RCU to avoid allocating unneeded kthreads on systems where the firmware says that there are more CPUs than are really present. This commit makes rcu_barrier() aware of the mismatch, so that it doesn't hang waiting for non-existent CPUs. " Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-29kernel/kmod: fix use-after-free of the sub_info structureMartin Schwidefsky
Found this in the message log on a s390 system: BUG kmalloc-192 (Not tainted): Poison overwritten Disabling lock debugging due to kernel taint INFO: 0x00000000684761f4-0x00000000684761f7. First byte 0xff instead of 0x6b INFO: Allocated in call_usermodehelper_setup+0x70/0x128 age=71 cpu=2 pid=648 __slab_alloc.isra.47.constprop.56+0x5f6/0x658 kmem_cache_alloc_trace+0x106/0x408 call_usermodehelper_setup+0x70/0x128 call_usermodehelper+0x62/0x90 cgroup_release_agent+0x178/0x1c0 process_one_work+0x36e/0x680 worker_thread+0x2f0/0x4f8 kthread+0x10a/0x120 kernel_thread_starter+0x6/0xc kernel_thread_starter+0x0/0xc INFO: Freed in call_usermodehelper_exec+0x110/0x1b8 age=71 cpu=2 pid=648 __slab_free+0x94/0x560 kfree+0x364/0x3e0 call_usermodehelper_exec+0x110/0x1b8 cgroup_release_agent+0x178/0x1c0 process_one_work+0x36e/0x680 worker_thread+0x2f0/0x4f8 kthread+0x10a/0x120 kernel_thread_starter+0x6/0xc kernel_thread_starter+0x0/0xc There is a use-after-free bug on the subprocess_info structure allocated by the user mode helper. In case do_execve() returns with an error ____call_usermodehelper() stores the error code to sub_info->retval, but sub_info can already have been freed. Regarding UMH_NO_WAIT, the sub_info structure can be freed by __call_usermodehelper() before the worker thread returns from do_execve(), allowing memory corruption when do_execve() failed after exec_mmap() is called. Regarding UMH_WAIT_EXEC, the call to umh_complete() allows call_usermodehelper_exec() to continue which then frees sub_info. To fix this race the code needs to make sure that the call to call_usermodehelper_freeinfo() is always done after the last store to sub_info->retval. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-29gcov: add ARM64 to GCOV_PROFILE_ALLRiku Voipio
Following up the arm testing of gcov, turns out gcov on ARM64 works fine as well. Only change needed is adding ARM64 to Kconfig depends. Tested with qemu and mach-virt Signed-off-by: Riku Voipio <riku.voipio@linaro.org> Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-29rcu: Fix for rcuo online-time-creation reorganization bugPaul E. McKenney
Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs) contains checks for the case where CPUs are brought online out of order, re-wiring the rcuo leader-follower relationships as needed. Unfortunately, this rewiring was broken. This apparently went undetected due to the tendency of systems to bring CPUs online in order. This commit nevertheless fixes the rewiring. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-10-29rcu: Kick rcuo kthreads after their CPU goes offlinePaul E. McKenney
If a no-CBs CPU were to post an RCU callback with interrupts disabled after it entered the idle loop for the last time, there might be no deferred wakeup for the corresponding rcuo kthreads. This commit therefore adds a set of calls to do_nocb_deferred_wakeup() after the CPU has gone completely offline. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-10-29rcu: Remove redundant TREE_PREEMPT_RCU config optionPranith Kumar
PREEMPT_RCU and TREE_PREEMPT_RCU serve the same function after TINY_PREEMPT_RCU has been removed. This patch removes TREE_PREEMPT_RCU and uses PREEMPT_RCU config option in its place. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-10-29rcu: Unify boost and kthread prioritiesClark Williams
Rename CONFIG_RCU_BOOST_PRIO to CONFIG_RCU_KTHREAD_PRIO and use this value for both the per-CPU kthreads (rcuc/N) and the rcu boosting threads (rcub/n). Also, create the module_parameter rcutree.kthread_prio to be used on the kernel command line at boot to set a new value (rcutree.kthread_prio=N). Signed-off-by: Clark Williams <clark.williams@gmail.com> [ paulmck: Ported to rcu/dev, applied Paul Bolle and Peter Zijlstra feedback. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-10-29signal: Document the RCU protection of ->sighandOleg Nesterov
__cleanup_sighand() frees sighand without RCU grace period. This is correct but this looks "obviously buggy" and constantly confuses the readers, add the comments to explain how this works. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>