summaryrefslogtreecommitdiffstats
path: root/include/linux
AgeCommit message (Collapse)Author
2014-06-04Merge branch 'for-3.15-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu fix from Tejun Heo: "It is very late but this is an important percpu-refcount fix from Sebastian Ott. The problem is that percpu_ref_*() used __this_cpu_*() instead of this_cpu_*(). The difference between the two is that the latter is atomic on the local cpu while the former is not. this_cpu_inc() is guaranteed to increment the percpu counter on the cpu that the operation is executed on without any synchronization; however, __this_cpu_inc() doesn't and if the local cpu invokes the function from different contexts (e.g. process and irq) of the same CPU, it's not guaranteed to actually increment as it may be implemented as rmw. This bug existed from the get-go but it hasn't been noticed earlier probably because on x86 __this_cpu_inc() is equivalent to this_cpu_inc() as both get translated into single instruction; however, s390 uses the generic rmw implementation and gets affected by the bug. Kudos to Sebastian and Heiko for diagnosing it. The change is very low risk and fixes a critical issue on the affected architectures, so I think it's a good candidate for inclusion although it's very late in the devel cycle. On the other hand, this has been broken since v3.11, so backporting it through -stable post -rc1 won't be the end of the world. I'll ping Christoph whether __this_cpu_*() ops can be better annotated so that it can trigger lockdep warning when used from multiple contexts" * 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu-refcount: fix usage of this_cpu_ops
2014-06-04percpu-refcount: fix usage of this_cpu_opsSebastian Ott
The percpu-refcount infrastructure uses the underscore variants of this_cpu_ops in order to modify percpu reference counters. (e.g. __this_cpu_inc()). However the underscore variants do not atomically update the percpu variable, instead they may be implemented using read-modify-write semantics (more than one instruction). Therefore it is only safe to use the underscore variant if the context is always the same (process, softirq, or hardirq). Otherwise it is possible to lose updates. This problem is something that Sebastian has seen within the aio subsystem which uses percpu refcounters both in process and softirq context leading to reference counts that never dropped to zeroes; even though the number of "get" and "put" calls matched. Fix this by using the non-underscore this_cpu_ops variant which provides correct per cpu atomic semantics and fixes the corrupted reference counts. Cc: Kent Overstreet <kmo@daterainc.com> Cc: <stable@vger.kernel.org> # v3.11+ Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org> References: http://lkml.kernel.org/g/alpine.LFD.2.11.1406041540520.21183@denkbrett
2014-06-03kernfs: move the last knowledge of sysfs out from kernfsJianyu Zhan
There is still one residue of sysfs remaining: the sb_magic SYSFS_MAGIC. However this should be kernfs user specific, so this patch moves it out. Kerrnfs user should specify their magic number while mouting. Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Unbreak zebra and other netlink apps, from Eric W Biederman. 2) Some new qmi_wwan device IDs, from Aleksander Morgado. 3) Fix info leak in DCB netlink handler of qlcnic driver, from Dan Carpenter. 4) inet_getid() and ipv6_select_ident() do not generate monotonically increasing ID numbers, fix from Eric Dumazet. 5) Fix memory leak in __sk_prepare_filter(), from Leon Yu. 6) Netlink leftover bytes warning message is user triggerable, rate limit it. From Michal Schmidt. 7) Fix non-linear SKB panic in ipvs, from Peter Christensen. 8) Congestion window undo needs to be performed even if only never retransmitted data is SACK'd, fix from Yuching Cheng. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits) net: filter: fix possible memory leak in __sk_prepare_filter() net: ec_bhf: Add runtime dependencies tcp: fix cwnd undo on DSACK in F-RTO netlink: Only check file credentials for implicit destinations ipheth: Add support for iPad 2 and iPad 3 team: fix mtu setting net: fix inet_getid() and ipv6_select_ident() bugs net: qmi_wwan: interface #11 in Sierra Wireless MC73xx is not QMI net: qmi_wwan: add additional Sierra Wireless QMI devices bridge: Prevent insertion of FDB entry with disallowed vlan netlink: rate-limit leftover bytes warning and print process name bridge: notify user space after fdb update net: qmi_wwan: add Netgear AirCard 341U net: fix wrong mac_len calculation for vlans batman-adv: fix NULL pointer dereferences net/mlx4_core: Reset RoCE VF gids when guest driver goes down emac: aggregation of v1-2 PLB errors for IER register emac: add missing support of 10mbit in emac/rgmii can: only rename enabled led triggers when changing the netdev name ipvs: Fix panic due to non-linear skb ...
2014-06-02netlink: Only check file credentials for implicit destinationsEric W. Biederman
It was possible to get a setuid root or setcap executable to write to it's stdout or stderr (which has been set made a netlink socket) and inadvertently reconfigure the networking stack. To prevent this we check that both the creator of the socket and the currentl applications has permission to reconfigure the network stack. Unfortunately this breaks Zebra which always uses sendto/sendmsg and creates it's socket without any privileges. To keep Zebra working don't bother checking if the creator of the socket has privilege when a destination address is specified. Instead rely exclusively on the privileges of the sender of the socket. Note from Andy: This is exactly Eric's code except for some comment clarifications and formatting fixes. Neither I nor, I think, anyone else is thrilled with this approach, but I'm hesitant to wait on a better fix since 3.15 is almost here. Note to stable maintainers: This is a mess. An earlier series of patches in 3.15 fix a rather serious security issue (CVE-2014-0181), but they did so in a way that breaks Zebra. The offending series includes: commit aa4cf9452f469f16cea8c96283b641b4576d4a7b Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Apr 23 14:28:03 2014 -0700 net: Add variants of capable for use on netlink messages If a given kernel version is missing that series of fixes, it's probably worth backporting it and this patch. if that series is present, then this fix is critical if you care about Zebra. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02team: fix mtu settingJiri Pirko
Now it is not possible to set mtu to team device which has a port enslaved to it. The reason is that when team_change_mtu() calls dev_set_mtu() for port device, notificator for NETDEV_PRECHANGEMTU event is called and team_device_event() returns NOTIFY_BAD forbidding the change. So fix this by returning NOTIFY_DONE here in case team is changing mtu in team_change_mtu(). Introduced-by: 3d249d4c "net: introduce ethernet teaming device" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-29Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM fixes from Russell King: "The usual random collection of relatively small ARM fixes" * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: 8063/1: bL_switcher: fix individual online status reporting of removed CPUs ARM: 8064/1: fix v7-M signal return ARM: 8057/1: amba: Add Qualcomm vendor ID. ARM: 8052/1: unwind: Fix handling of "Pop r4-r[4+nnn],r14" opcode ARM: 8051/1: put_user: fix possible data corruption in put_user ARM: 8048/1: fix v7-M setup stack location
2014-05-27Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull slave-dmaengine fixes from Vinod Koul: "We have three small fixes. First one from Andy reverts the devm_request irq as we need to ensure the tasklet is killed after irq is freed, so we need to do free irq in our code. Other two from Arnd are fixing the compilation issue in omap and sa11x0 drivers with ARM randconfigs" * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: sa11x0: remove broken #ifdef dmaengine: omap: hide filter_fn for built-in drivers dmaengine: dw: went back to plain {request,free}_irq() calls
2014-05-25ARM: 8057/1: amba: Add Qualcomm vendor ID.srinik
This patch adds Qualcomm amba vendor Id to the list. This ID is used in mmci driver. The ID selected in same lines like 0x41 is "A" for ARM, 0x51 is "Q" for Qualcomm. As there are no physical register on Qcom SOC for amba vendor id, this is a fake ID assigned based on "Q" prefix from Qualcomm. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-05-23Merge tag 'dmaengine-fixes-3.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine Pull dmaengine fixes from Dan Williams: "Two fixes for -stable: - async_mult() sometimes maps less buffers than initially requested. We end up freeing dmaengine_unmap_data on an invalid pool. - mv_xor: register write ordering fix" * tag 'dmaengine-fixes-3.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine: dmaengine: fix dmaengine_unmap failure dma: mv_xor: Flush descriptors before activating a channel
2014-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "It looks like a sizeble collection but this is nearly 3 weeks of bug fixing while you were away. 1) Fix crashes over IPSEC tunnels with NAT, the latter can reroute the packet through a non-IPSEC protected path and the code has to be able to handle SKBs attached to routes lacking an attached xfrm state. From Steffen Klassert. 2) Fix OOPSs in ipv4 and ipv6 ipsec layers for unsupported sub-protocols, also from Steffen Klassert. 3) Set local_df on fragmented netfilter skbs otherwise we won't be able to forward successfully, from Florian Westphal. 4) cdc_mbim ipv6 neighbour code does __vlan_find_dev_deep without holding RCU lock, from Bjorn Mork. 5) local_df test in ip_may_fragment is inverted, from Florian Westphal. 6) jme driver doesn't check for DMA mapping failures, from Neil Horman. 7) qlogic driver doesn't calculate number of TX queues properly, from Shahed Shaikh. 8) fib_info_cnt can drift irreversibly positive if we fail to allocate the fi->fib_metrics array, from Sergey Popovich. 9) Fix use after free in ip6_route_me_harder(), also from Sergey Popovich. 10) When SYSCTL is disabled, we don't handle local_port_range and ping_group_range defaults properly at all, from Cong Wang. 11) Unaccelerated VLAN tagged frames improperly handled by cdc_mbim driver, fix from Bjorn Mork. 12) cassini driver needs nested lock annotations for TX locking, from Emil Goode. 13) On init error ipv6 VTI driver can unregister pernet ops twice, oops. Fix from Mahtias Krause. 14) If macvlan device is down, don't propagate IFF_ALLMULTI changes, from Peter Christensen. 15) Missing NULL pointer check while parsing netlink config options in ip6_tnl_validate(). From Susant Sahani. 16) Fix handling of neighbour entries during ipv6 router reachability probing, from Duan Jiong. 17) x86 and s390 JIT address randomization has some address calculation bugs leading to crashes, from Alexei Starovoitov and Heiko Carstens. 18) Clear up those uglies with nop patching and net_get_random_once(), from Hannes Frederic Sowa. 19) Option length miscalculated in ip6_append_data(), fix also from Hannes Frederic Sowa. 20) A while ago we fixed a race during device unregistry when a namespace went down, turns out there is a second place that needs similar protection. From Cong Wang. 21) In the new Altera TSE driver multicast filtering isn't working, disable it and just use promisc mode until the cause is found. From Vince Bridgers. 22) When we disable router enabling in ipv6 we have to flush the cached routes explicitly, from Duan Jiong. 23) NBMA tunnels should not cache routes on the tunnel object because the key is variable, from Timo Teräs. 24) With stacked devices GRO information in skb->cb[] can be not setup properly, make sure it is in all code paths. From Eric Dumazet. 25) Really fix stacked vlan locking, multiple levels of nesting with intervening non-vlan devices are possible. From Vlad Yasevich. 26) Fallback ipip tunnel device's mtu is not setup properly, from Steffen Klassert. 27) The packet scheduler's tcindex filter can crash because we structure copy objects with list_head's inside, oops. From Cong Wang. 28) Fix CHECKSUM_COMPLETE handling for ipv6 GRE tunnels, from Eric Dumazet. 29) In some configurations 'itag' in __mkroute_input() can end up being used uninitialized because of how fib_validate_source() works. Fix it by explitly initializing itag to zero like all the other fib_validate_source() callers do, from Li RongQing" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) batman: fix a bogus warning from batadv_is_on_batman_iface() ipv4: initialise the itag variable in __mkroute_input bonding: Send ALB learning packets using the right source bonding: Don't assume 802.1Q when sending alb learning packets. net: doc: Update references to skb->rxhash stmmac: Remove unbalanced clk_disable call ipv6: gro: fix CHECKSUM_COMPLETE support net_sched: fix an oops in tcindex filter can: peak_pci: prevent use after free at netdev removal ip_tunnel: Initialize the fallback device properly vlan: Fix build error wth vlan_get_encap_level() can: c_can: remove obsolete STRICT_FRAME_ORDERING Kconfig option MAINTAINERS: Pravin Shelar is Open vSwitch maintainer. bnx2x: Convert return 0 to return rc bonding: Fix alb mode to only use first level vlans. bonding: Fix stacked device detection in arp monitoring macvlan: Fix lockdep warnings with stacked macvlan devices vlan: Fix lockdep warning with stacked vlan devices. net: Allow for more then a single subclass for netif_addr_lock net: Find the nesting level of a given device by type. ...
2014-05-23Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "The biggest commit is an irqtime accounting loop latency fix, the rest are misc fixes all over the place: deadline scheduling, docs, numa, balancer and a bad to-idle latency fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/numa: Initialize newidle balance stats in sd_numa_init() sched: Fix updating rq->max_idle_balance_cost and rq->next_balance in idle_balance() sched: Skip double execution of pick_next_task_fair() sched: Use CPUPRI_NR_PRIORITIES instead of MAX_RT_PRIO in cpupri check sched/deadline: Fix memory leak sched/deadline: Fix sched_yield() behavior sched: Sanitize irq accounting madness sched/docbook: Fix 'make htmldocs' warnings caused by missing description
2014-05-23Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "The biggest changes are fixes for races that kept triggering Trinity crashes, plus liblockdep build fixes and smaller misc fixes. The liblockdep bits in perf/urgent are a pull mistake - they should have been in locking/urgent - but by the time I noticed other commits were added and testing was done :-/ Sorry about that" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix a race between ring_buffer_detach() and ring_buffer_attach() perf: Prevent false warning in perf_swevent_add perf: Limit perf_event_attr::sample_period to 63 bits tools/liblockdep: Remove all build files when doing make clean tools/liblockdep: Build liblockdep from tools/Makefile perf/x86/intel: Fix Silvermont's event constraints perf: Fix perf_event_init_context() perf: Fix race in removing an event
2014-05-23wait: swap EXIT_ZOMBIE(Z) and EXIT_DEAD(X) chars in TASK_STATE_TO_CHAR_STRMasatake YAMATO
In commit ad86622b478e ("wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space") the order of task state definitions were changed: EXIT_DEAD and EXIT_ZOMBIE were swapped. Though the charterers for the states in TASK_STATE_TO_CHAR_STR string were not updated. This patch synchronizes the string to the order of definitions. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-21dmaengine: fix dmaengine_unmap failureXuelin Shi
The count which is used to get_unmap_data maybe not the same as the count computed in dmaengine_unmap which causes to free data in a wrong pool. This patch fixes this issue by keeping the map count with unmap_data structure and use this count to get the pool. Cc: <stable@vger.kernel.org> Signed-off-by: Xuelin Shi <xuelin.shi@freescale.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-05-21Merge tag 'driver-core-3.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are two driver core (well, sysfs) fixes for 3.15-rc6 that resolve some reported issues and a regression from 3.13" * tag 'driver-core-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: sysfs: make sure read buffer is zeroed kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs
2014-05-21Merge branch 'for-3.15-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull more cgroup fixes from Tejun Heo: "Three more patches to fix cgroup_freezer breakage due to the recent cgroup internal locking changes - an operation cgroup_freezer was using now requires sleepable context and cgroup_freezer was invoking that while holding a spin lock. cgroup_freezer was using an overly elaborate hierarchical locking scheme. While it's possible to convert the hierarchical spinlocks directly to mutexes, this patch simplifies the overall locking so that it uses a global mutex. This has the added benefit of avoiding iterating potentially huge number of tasks under a spinlock. While the patch is on the larger side in the devel cycle, the changes made are mostly straight-forward and the locking logic is a lot simpler afterwards" * 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix rcu_read_lock() leak in update_if_frozen() cgroup_freezer: replace freezer->lock with freezer_mutex cgroup: introduce task_css_is_root()
2014-05-21Merge tag 'dt-for-linus' of git://git.secretlab.ca/git/linuxLinus Torvalds
Pull device tree fixes from Grant Likely: "Drivercore bugfixes for v3.15 This branch contains bug fixes important to get into v3.15. There is a fix for modifying properties seen during early boot, a fix for an incorrect prototype when CONFIG_OF=n, and a couple of corrections to device tree memory nodes on a few platforms" * tag 'dt-for-linus' of git://git.secretlab.ca/git/linux: mips: dts: Fix missing device_type="memory" property in memory nodes arm: dts: Fix missing device_type="memory" for ste-ccu8540 of: fix CONFIG_OF=n prototype of of_node_full_name() of: make of_update_property() usable earlier in the boot process
2014-05-21dmaengine: omap: hide filter_fn for built-in driversArnd Bergmann
It is not possible to reference the omap_dma_filter_fn filter function from a built-in driver if the dmaengine driver itself is a loadable module, which is a valid configuration otherwise. This provides only the dummy alternative if the function is referenced by a built-in driver to allow a successful build. The filter function is only required by ATAGS based platforms, which will continue to be broken after this change for the bogus configuration. When booting from DT, with the dma channels correctly listed there, it will work fine. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Tony Lindgren <tony@atomide.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Vinod Koul <vinod.koul@intel.com> Cc: dmaengine@vger.kernel.org Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2014-05-20vlan: Fix build error wth vlan_get_encap_level()Vlad Yasevich
The new function vlan_get_encap_level() uses vlan_dev_priv() which is only conditionally avaialble when VLAN support is enabled. Make vlan_get_encap_level() conditionally available as well. Fixes: 44a4085538c8 ("bonding: Fix stacked device detection in arp monitoring") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> CC: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-20Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "Two small updates from the irq departement: - Provide missing inline stub for a SMP only function - Add sub-maintainer for the drivers/irqchip/ part of the irq subsystem. YAY!" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Add co-maintainer for drivers/irqchip genirq: Provide irq_force_affinity fallback for non-SMP
2014-05-19perf: Fix a race between ring_buffer_detach() and ring_buffer_attach()Peter Zijlstra
Alexander noticed that we use RCU iteration on rb->event_list but do not use list_{add,del}_rcu() to add,remove entries to that list, nor do we observe proper grace periods when re-using the entries. Merge ring_buffer_detach() into ring_buffer_attach() such that attaching to the NULL buffer is detaching. Furthermore, ensure that between any 'detach' and 'attach' of the same event we observe the required grace period, but only when strictly required. In effect this means that only ioctl(.request = PERF_EVENT_IOC_SET_OUTPUT) will wait for a grace period, while the normal initial attach and final detach will not be delayed. This patch should, I think, do the right thing under all circumstances, the 'normal' cases all should never see the extra grace period, but the two cases: 1) PERF_EVENT_IOC_SET_OUTPUT on an event which already has a ring_buffer set, will now observe the required grace period between removing itself from the old and attaching itself to the new buffer. This case is 'simple' in that both buffers are present in perf_event_set_output() one could think an unconditional synchronize_rcu() would be sufficient; however... 2) an event that has a buffer attached, the buffer is destroyed (munmap) and then the event is attached to a new/different buffer using PERF_EVENT_IOC_SET_OUTPUT. This case is more complex because the buffer destruction does: ring_buffer_attach(.rb = NULL) followed by the ioctl() doing: ring_buffer_attach(.rb = foo); and we still need to observe the grace period between these two calls due to us reusing the event->rb_entry list_head. In order to make 2 happen we use Paul's latest cond_synchronize_rcu() call. Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140507123526.GD13658@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16bonding: Fix stacked device detection in arp monitoringVlad Yasevich
Prior to commit fbd929f2dce460456807a51e18d623db3db9f077 bonding: support QinQ for bond arp interval the arp monitoring code allowed for proper detection of devices stacked on top of vlans. Since the above commit, the code can still detect a device stacked on top of single vlan, but not a device stacked on top of Q-in-Q configuration. The search will only set the inner vlan tag if the route device is the vlan device. However, this is not always the case, as it is possible to extend the stacked configuration. With this patch it is possible to provision devices on top Q-in-Q vlan configuration that should be used as a source of ARP monitoring information. For example: ip link add link bond0 vlan10 type vlan proto 802.1q id 10 ip link add link vlan10 vlan100 type vlan proto 802.1q id 100 ip link add link vlan100 type macvlan Note: This patch limites the number of stacked VLANs to 2, just like before. The original, however had another issue in that if we had more then 2 levels of VLANs, we would end up generating incorrectly tagged traffic. This is no longer possible. Fixes: fbd929f2dce460456807a51e18d623db3db9f077 (bonding: support QinQ for bond arp interval) CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@redhat.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Ding Tianhong <dingtianhong@huawei.com> CC: Patric McHardy <kaber@trash.net> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16macvlan: Fix lockdep warnings with stacked macvlan devicesVlad Yasevich
Macvlan devices try to avoid stacking, but that's not always successfull or even desired. As an example, the following configuration is perefectly legal and valid: eth0 <--- macvlan0 <---- vlan0.10 <--- macvlan1 However, this configuration produces the following lockdep trace: [ 115.620418] ====================================================== [ 115.620477] [ INFO: possible circular locking dependency detected ] [ 115.620516] 3.15.0-rc1+ #24 Not tainted [ 115.620540] ------------------------------------------------------- [ 115.620577] ip/1704 is trying to acquire lock: [ 115.620604] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.620686] but task is already holding lock: [ 115.620723] (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40 [ 115.620795] which lock already depends on the new lock. [ 115.620853] the existing dependency chain (in reverse order) is: [ 115.620894] -> #1 (&macvlan_netdev_addr_lock_key){+.....}: [ 115.620935] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.620974] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621019] [<ffffffffa07296c3>] vlan_dev_set_rx_mode+0x53/0x110 [8021q] [ 115.621066] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621105] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621143] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b [ 115.621174] -> #0 (&vlan_netdev_addr_lock_key/1){+.....}: [ 115.621174] [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60 [ 115.621174] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.621174] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621174] [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan] [ 115.621174] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621174] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621174] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b [ 115.621174] other info that might help us debug this: [ 115.621174] Possible unsafe locking scenario: [ 115.621174] CPU0 CPU1 [ 115.621174] ---- ---- [ 115.621174] lock(&macvlan_netdev_addr_lock_key); [ 115.621174] lock(&vlan_netdev_addr_lock_key/1); [ 115.621174] lock(&macvlan_netdev_addr_lock_key); [ 115.621174] lock(&vlan_netdev_addr_lock_key/1); [ 115.621174] *** DEADLOCK *** [ 115.621174] 2 locks held by ip/1704: [ 115.621174] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff815e6dbb>] rtnetlink_rcv+0x1b/0x40 [ 115.621174] #1: (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40 [ 115.621174] stack backtrace: [ 115.621174] CPU: 3 PID: 1704 Comm: ip Not tainted 3.15.0-rc1+ #24 [ 115.621174] Hardware name: Hewlett-Packard HP xw8400 Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010 [ 115.621174] ffffffff82339ae0 ffff880465f79568 ffffffff816ee20c ffffffff82339ae0 [ 115.621174] ffff880465f795a8 ffffffff816e9e1b ffff880465f79600 ffff880465b019c8 [ 115.621174] 0000000000000001 0000000000000002 ffff880465b019c8 ffff880465b01230 [ 115.621174] Call Trace: [ 115.621174] [<ffffffff816ee20c>] dump_stack+0x4d/0x66 [ 115.621174] [<ffffffff816e9e1b>] print_circular_bug+0x200/0x20e [ 115.621174] [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60 [ 115.621174] [<ffffffff810d3172>] ? trace_hardirqs_on_caller+0xb2/0x1d0 [ 115.621174] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.621174] [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621174] [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan] [ 115.621174] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621174] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621174] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff811e1db1>] ? mem_cgroup_bad_page_check+0x21/0x30 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff810d394c>] ? __lock_acquire+0x37c/0x1a60 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815ea169>] ? rtnl_newlink+0xe9/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff810d329d>] ? trace_hardirqs_on+0xd/0x10 [ 115.621174] [<ffffffff815e6dbb>] ? rtnetlink_rcv+0x1b/0x40 [ 115.621174] [<ffffffff815e6de0>] ? rtnetlink_rcv+0x40/0x40 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0 [ 115.621174] [<ffffffff8119d4f8>] ? might_fault+0xa8/0xb0 [ 115.621174] [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0 [ 115.621174] [<ffffffff815cb51e>] ? verify_iovec+0x5e/0xe0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff816faa0d>] ? __do_page_fault+0x11d/0x570 [ 115.621174] [<ffffffff810cfe9f>] ? up_read+0x1f/0x40 [ 115.621174] [<ffffffff816fab04>] ? __do_page_fault+0x214/0x570 [ 115.621174] [<ffffffff8120a10b>] ? mntput_no_expire+0x6b/0x1c0 [ 115.621174] [<ffffffff8120a0b7>] ? mntput_no_expire+0x17/0x1c0 [ 115.621174] [<ffffffff8120a284>] ? mntput+0x24/0x40 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b Fix this by correctly providing macvlan lockdep class. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16vlan: Fix lockdep warning with stacked vlan devices.Vlad Yasevich
This reverts commit dc8eaaa006350d24030502a4521542e74b5cb39f. vlan: Fix lockdep warning when vlan dev handle notification Instead we use the new new API to find the lock subclass of our vlan device. This way we can support configurations where vlans are interspersed with other devices: bond -> vlan -> macvlan -> vlan Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16net: Allow for more then a single subclass for netif_addr_lockVlad Yasevich
Currently netif_addr_lock_nested assumes that there can be only a single nesting level between 2 devices. However, if we have multiple devices of the same type stacked, this fails. For example: eth0 <-- vlan0.10 <-- vlan0.10.20 A more complicated configuration may stack more then one type of device in different order. Ex: eth0 <-- vlan0.10 <-- macvlan0 <-- vlan1.10.20 <-- macvlan1 This patch adds an ndo_* function that allows each stackable device to report its nesting level. If the device doesn't provide this function default subclass of 1 is used. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16net: Find the nesting level of a given device by type.Vlad Yasevich
Multiple devices in the kernel can be stacked/nested and they need to know their nesting level for the purposes of lockdep. This patch provides a generic function that determines a nesting level of a particular device by its type (ex: vlan, macvlan, etc). We only care about nesting of the same type of devices. For example: eth0 <- vlan0.10 <- macvlan0 <- vlan1.20 The nesting level of vlan1.20 would be 1, since there is another vlan in the stack under it. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16net/mlx4_core: Add UPDATE_QP SRIOV wrapper supportMatan Barak
This patch adds UPDATE_QP SRIOV wrapper support. The mechanism is a general one, but currently only source MAC index changes are allowed for VFs. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-15rtnetlink: wait for unregistering devices in rtnl_link_unregister()Cong Wang
From: Cong Wang <cwang@twopensource.com> commit 50624c934db18ab90 (net: Delay default_device_exit_batch until no devices are unregistering) introduced rtnl_lock_unregistering() for default_device_exit_batch(). Same race could happen we when rmmod a driver which calls rtnl_link_unregister() as we call dev->destructor without rtnl lock. For long term, I think we should clean up the mess of netdev_run_todo() and net namespce exit code. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-15of: fix CONFIG_OF=n prototype of of_node_full_name()Stephen Rothwell
Make the CONFIG_OF=n prototpe of of_node_full_name() mateh the CONFIG_OF=y version. Fixes compile warnings like this: sound/soc/soc-core.c: In function 'soc_check_aux_dev': sound/soc/soc-core.c:1667:3: warning: passing argument 1 of 'of_node_full_name' discards 'const' qualifier from pointer target type [enabled by default] codecname = of_node_full_name(aux_dev->codec_of_node); when CONFIG_OF is not defined. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Grant Likely <grant.likely@linaro.org>
2014-05-14net: avoid dependency of net_get_random_once on nop patchingHannes Frederic Sowa
net_get_random_once depends on the static keys infrastructure to patch up the branch to the slow path during boot. This was realized by abusing the static keys api and defining a new initializer to not enable the call site while still indicating that the branch point should get patched up. This was needed to have the fast path considered likely by gcc. The static key initialization during boot up normally walks through all the registered keys and either patches in ideal nops or enables the jump site but omitted that step on x86 if ideal nops where already placed at static_key branch points. Thus net_get_random_once branches not always became active. This patch switches net_get_random_once to the ordinary static_key api and thus places the kernel fast path in the - by gcc considered - unlikely path. Microbenchmarks on Intel and AMD x86-64 showed that the unlikely path actually beats the likely path in terms of cycle cost and that different nop patterns did not make much difference, thus this switch should not be noticeable. Fixes: a48e42920ff38b ("net: introduce new macro net_get_random_once") Reported-by: Tuomas Räsänen <tuomasjjrasanen@tjjr.fi> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13cgroup: introduce task_css_is_root()Tejun Heo
Determining the css of a task usually requires RCU read lock as that's the only thing which keeps the returned css accessible till its reference is acquired; however, testing whether a task belongs to the root can be performed without dereferencing the returned css by comparing the returned pointer against the root one in init_css_set[] which never changes. Implement task_css_is_root() which can be invoked in any context. This will be used by the scheduled cgroup_freezer change. v2: cgroup no longer supports modular controllers. No need to export init_css_set. Pointed out by Li. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-13kernfs, sysfs, cgroup: restrict extra perm check on open to sysfsTejun Heo
The kernfs open method - kernfs_fop_open() - inherited extra permission checks from sysfs. While the vfs layer allows ignoring the read/write permissions checks if the issuer has CAP_DAC_OVERRIDE, sysfs explicitly denied open regardless of the cap if the file doesn't have any of the UGO perms of the requested access or doesn't implement the requested operation. It can be debated whether this was a good idea or not but the behavior is too subtle and dangerous to change at this point. After cgroup got converted to kernfs, this extra perm check also got applied to cgroup breaking libcgroup which opens write-only files with O_RDWR as root. This patch gates the extra open permission check with a new flag KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK and enables it for sysfs. For sysfs, nothing changes. For cgroup, root now can perform any operation regardless of the permissions as it was before kernfs conversion. Note that kernfs still fails unimplemented operations with -EINVAL. While at it, add comments explaining KERNFS_ROOT flags. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andrey Wagin <avagin@gmail.com> Tested-by: Andrey Wagin <avagin@gmail.com> Cc: Li Zefan <lizefan@huawei.com> References: http://lkml.kernel.org/g/CANaxB-xUm3rJ-Cbp72q-rQJO5mZe1qK6qXsQM=vh0U8upJ44+A@mail.gmail.com Fixes: 2bd59d48ebfb ("cgroup: convert to kernfs") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-09Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "A somewhat unpleasantly large collection of small fixes. The big ones are the __visible tree sweep and a fix for 'earlyprintk=efi,keep'. It was using __init functions with predictably suboptimal results. Another key fix is a build fix which would produce output that simply would not decompress correctly in some configuration, due to the existing Makefiles picking up an unfortunate local label and mistaking it for the global symbol _end. Additional fixes include the handling of 64-bit numbers when setting the vdso data page (a latent bug which became manifest when i386 started exporting a vdso with time functions), a fix to the new MSR manipulation accessors which would cause features to not get properly unblocked, a build fix for 32-bit userland, and a few new platform quirks" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall() x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macro x86: Fix typo preventing msr_set/clear_bit from having an effect x86/intel: Add quirk to disable HPET for the Baytrail platform x86/hpet: Make boot_hpet_disable extern x86-64, build: Fix stack protector Makefile breakage with 32-bit userland x86/reboot: Add reboot quirk for Certec BPC600 asmlinkage: Add explicit __visible to drivers/*, lib/*, kernel/* asmlinkage, x86: Add explicit __visible to arch/x86/* asmlinkage: Revert "lto: Make asmlinkage __visible" x86, build: Don't get confused by local symbols x86/efi: earlyprintk=efi,keep fix
2014-05-08Merge tag 'mfd-mmc-fixes-3.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull mmc/rtsx revert from Lee Jones. * tag 'mfd-mmc-fixes-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mmc: rtsx: Revert "mmc: rtsx: add support for pre_req and post_req"
2014-05-08mmc: rtsx: Revert "mmc: rtsx: add support for pre_req and post_req"Micky Ching
This reverts commit c42deffd5b53c9e583d83c7964854ede2f12410d. commit <mmc: rtsx: add support for pre_req and post_req> did use mutex_unlock() in tasklet, but mutex_unlock() can't be used in tasklet(atomic context). The driver needs to use mutex to avoid concurrency, so we can't use tasklet here, the patch need to be removed. The spinlock host->lock and pcr->lock may deadlock, one way to solve the deadlock is remove host->lock in sd_isr_done_transfer(), but if using workqueue the we can avoid using the spinlock and also avoid the problem. Signed-off-by: Micky Ching <micky_ching@realsil.com.cn> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2014-05-07net: mdio: of_mdiobus_register(): fall back to mdiobus_register() for !CONFIG_OFDaniel Mack
If CONFIG_OF is not set, make of_mdiobus_register() call mdiobus_register() instead of returning -ENOSYS. This way, we can just call of_mdiobus_register() from all DT-enabled drivers to handle the compat cases. Signed-off-by: Daniel Mack <zonque@gmail.com> Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-07Revert "net: core: introduce netif_skb_dev_features"Florian Westphal
This reverts commit d206940319c41df4299db75ed56142177bb2e5f6, there are no more callers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-07sched/deadline: Fix sched_yield() behaviorJuri Lelli
yield_task_dl() is broken: o it forces current to be throttled setting its runtime to zero; o it sets current's dl_se->dl_new to one, expecting that dl_task_timer() will queue it back with proper parameters at replenish time. Unfortunately, dl_task_timer() has this check at the very beginning: if (!dl_task(p) || dl_se->dl_new) goto unlock; So, it just bails out and the task is never replenished. It actually yielded forever. To fix this, introduce a new flag indicating that the task properly yielded the CPU before its current runtime expired. While this is a little overdoing at the moment, the flag would be useful in the future to discriminate between "good" jobs (of which remaining runtime could be reclaimed, i.e. recycled) and "bad" jobs (for which dl_throttled task has been set) that needed to be stopped. Reported-by: yjay.kim <yjay.kim@lge.com> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140429103953.e68eba1b2ac3309214e3dc5a@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07genirq: Provide irq_force_affinity fallback for non-SMPArnd Bergmann
Patch 01f8fa4f01d "genirq: Allow forcing cpu affinity of interrupts" added an irq_force_affinity() function, and 30ccf03b4a6 "clocksource: Exynos_mct: Use irq_force_affinity() in cpu bringup" subsequently uses it. However, the driver can be used with CONFIG_SMP disabled, but the function declaration is only available for CONFIG_SMP, leading to this build error: drivers/clocksource/exynos_mct.c:431:3: error: implicit declaration of function 'irq_force_affinity' [-Werror=implicit-function-declaration] irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu)); This patch introduces a dummy helper function for the non-SMP case that always returns success, to get rid of the build error. Since the patches causing the problem are marked for stable backports, this one should be as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Kukjin Kim <kgene.kim@samsung.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/5619084.0zmrrIUZLV@wuerfel Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-06Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: agp: info leak in agpioc_info_wrap() fs/affs/super.c: bugfix / double free fanotify: fix -EOVERFLOW with large files on 64-bit slub: use sysfs'es release mechanism for kmem_cache revert "mm: vmscan: do not swap anon pages just because free+file is low" autofs: fix lockref lookup mm: filemap: update find_get_pages_tag() to deal with shadow entries mm/compaction: make isolate_freepages start at pageblock boundary MAINTAINERS: zswap/zbud: change maintainer email address mm/page-writeback.c: fix divide by zero in pos_ratio_polynom hugetlb: ensure hugepage access is denied if hugepages are not supported slub: fix memcg_propagate_slab_attrs drivers/rtc/rtc-pcf8523.c: fix month definition
2014-05-06slub: use sysfs'es release mechanism for kmem_cacheChristoph Lameter
debugobjects warning during netfilter exit: ------------[ cut here ]------------ WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G W 3.11.0-next-20130906-sasha #3984 Workqueue: netns cleanup_net Call Trace: dump_stack+0x52/0x87 warn_slowpath_common+0x8c/0xc0 warn_slowpath_fmt+0x46/0x50 debug_print_object+0x8d/0xb0 __debug_check_no_obj_freed+0xa5/0x220 debug_check_no_obj_freed+0x15/0x20 kmem_cache_free+0x197/0x340 kmem_cache_destroy+0x86/0xe0 nf_conntrack_cleanup_net_list+0x131/0x170 nf_conntrack_pernet_exit+0x5d/0x70 ops_exit_list+0x5e/0x70 cleanup_net+0xfb/0x1c0 process_one_work+0x338/0x550 worker_thread+0x215/0x350 kthread+0xe7/0xf0 ret_from_fork+0x7c/0xb0 Also during dcookie cleanup: WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408 Call Trace: dump_stack (lib/dump_stack.c:52) warn_slowpath_common (kernel/panic.c:430) warn_slowpath_fmt (kernel/panic.c:445) debug_print_object (lib/debugobjects.c:262) __debug_check_no_obj_freed (lib/debugobjects.c:697) debug_check_no_obj_freed (lib/debugobjects.c:726) kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717) kmem_cache_destroy (mm/slab_common.c:363) dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343) event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153) __fput (fs/file_table.c:217) ____fput (fs/file_table.c:253) task_work_run (kernel/task_work.c:125 (discriminator 1)) do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751) int_signal (arch/x86/kernel/entry_64.S:807) Sysfs has a release mechanism. Use that to release the kmem_cache structure if CONFIG_SYSFS is enabled. Only slub is changed - slab currently only supports /proc/slabinfo and not /sys/kernel/slab/*. We talked about adding that and someone was working on it. [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build] [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more] Signed-off-by: Christoph Lameter <cl@linux.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Greg KH <greg@kroah.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Pekka Enberg <penberg@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-06hugetlb: ensure hugepage access is denied if hugepages are not supportedNishanth Aravamudan
Currently, I am seeing the following when I `mount -t hugetlbfs /none /dev/hugetlbfs`, and then simply do a `ls /dev/hugetlbfs`. I think it's related to the fact that hugetlbfs is properly not correctly setting itself up in this state?: Unable to handle kernel paging request for data at address 0x00000031 Faulting instruction address: 0xc000000000245710 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA pSeries .... In KVM guests on Power, in a guest not backed by hugepages, we see the following: AnonHugePages: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 64 kB HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages are not supported at boot-time, but this is only checked in hugetlb_init(). Extract the check to a helper function, and use it in a few relevant places. This does make hugetlbfs not supported (not registered at all) in this environment. I believe this is fine, as there are no valid hugepages and that won't change at runtime. [akpm@linux-foundation.org: use pr_info(), per Mel] [akpm@linux-foundation.org: fix build when HPAGE_SHIFT is undefined] Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "dcache fixes + kvfree() (uninlined, exported by mm/util.c) + posix_acl bugfix from hch" The dcache fixes are for a subtle LRU list corruption bug reported by Miklos Szeredi, where people inside IBM saw list corruptions with the LTP/host01 test. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nick kvfree() from apparmor posix_acl: handle NULL ACL in posix_acl_equiv_mode dcache: don't need rcu in shrink_dentry_list() more graceful recovery in umount_collect() don't remove from shrink list in select_collect() dentry_kill(): don't try to remove from shrink list expand the call of dentry_lru_del() in dentry_kill() new helper: dentry_free() fold try_prune_one_dentry() fold d_kill() and d_free() fix races between __d_instantiate() and checks of dentry flags
2014-05-06nick kvfree() from apparmorAl Viro
too many places open-code it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-05asmlinkage: Revert "lto: Make asmlinkage __visible"Andi Kleen
As requested by Linus, revert adding __visible to asmlinkage. Instead we add __visible explicitely to all the symbols that need it. This reverts commit 128ea04a9885af9629059e631ddf0cab4815b589. Link: http://lkml.kernel.org/r/1398984278-29319-2-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) e1000e computes header length incorrectly wrt vlans, fix from Vlad Yasevich. 2) ns_capable() check in sock_diag netlink code, from Andrew Lutomirski. 3) Fix invalid queue pairs handling in virtio_net, from Amos Kong. 4) Checksum offloading busted in sxgbe driver due to incorrect descriptor layout, fix from Byungho An. 5) Fix build failure with SMC_DEBUG set to 2 or larger, from Zi Shen Lim. 6) Fix uninitialized A and X registers in BPF interpreter, from Alexei Starovoitov. 7) Fix arch dependencies of candence driver. 8) Fix netlink capabilities checking tree-wide, from Eric W Biederman. 9) Don't dump IFLA_VF_PORTS if netlink request didn't ask for it in IFLA_EXT_MASK, from David Gibson. 10) IPV6 FIB dump restart doesn't handle table changes that happen meanwhile, causing the code to loop forever or emit dups, fix from Kumar Sandararajan. 11) Memory leak on VF removal in bnx2x, from Yuval Mintz. 12) Bug fixes for new Altera TSE driver from Vince Bridgers. 13) Fix route lookup key in SCTP, from Xugeng Zhang. 14) Use BH blocking spinlocks in SLIP, as per a similar fix to CAN/SLCAN driver. From Oliver Hartkopp. 15) TCP doesn't bump retransmit counters in some code paths, fix from Eric Dumazet. 16) Clamp delayed_ack in tcp_cubic to prevent theoretical divides by zero. Fix from Liu Yu. 17) Fix locking imbalance in error paths of HHF packet scheduler, from John Fastabend. 18) Properly reference the transport module when vsock_core_init() runs, from Andy King. 19) Fix buffer overflow in cdc_ncm driver, from Bjørn Mork. 20) IP_ECN_decapsulate() doesn't see a correct SKB network header in ip_tunnel_rcv(), fix from Ying Cai. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits) net: macb: Fix race between HW and driver net: macb: Remove 'unlikely' optimization net: macb: Re-enable RX interrupt only when RX is done net: macb: Clear interrupt flags net: macb: Pass same size to DMA_UNMAP as used for DMA_MAP ip_tunnel: Set network header properly for IP_ECN_decapsulate() e1000e: Restrict MDIO Slow Mode workaround to relevant parts e1000e: Fix issue with link flap on 82579 e1000e: Expand workaround for 10Mb HD throughput bug e1000e: Workaround for dropped packets in Gig/100 speeds on 82579 net/mlx4_core: Don't issue PCIe speed/width checks for VFs net/mlx4_core: Load the Eth driver first net/mlx4_core: Fix slave id computation for single port VF net/mlx4_core: Adjust port number in qp_attach wrapper when detaching net: cdc_ncm: fix buffer overflow Altera TSE: ALTERA_TSE should depend on HAS_DMA vsock: Make transport the proto owner net: sched: lock imbalance in hhf qdisc net: mvmdio: Check for a valid interrupt instead of an error net phy: Check for aneg completion before setting state to PHY_RUNNING ...
2014-05-05Merge tag 'tty-3.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are some tty and serial driver fixes for things reported recently" * tag 'tty-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: Fix lockless tty buffer race Revert "tty: Fix race condition between __tty_buffer_request_room and flush_to_ldisc" drivers/tty/hvc: don't free hvc_console_setup after init n_tty: Fix n_tty_write crash when echoing in raw mode tty: serial: 8250_core.c Bug fix for Exar chips.
2014-05-03Revert "tty: Fix race condition between __tty_buffer_request_room and ↵Peter Hurley
flush_to_ldisc" This reverts commit 6a20dbd6caa2358716136144bf524331d70b1e03. Although the commit correctly identifies an unsafe race condition between __tty_buffer_request_room() and flush_to_ldisc(), the commit fixes the race with an unnecessary spinlock in a lockless algorithm. The follow-on commit, "tty: Fix lockless tty buffer race" fixes the race locklessly. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-03Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "This udpate delivers: - A fix for dynamic interrupt allocation on x86 which is required to exclude the GSI interrupts from the dynamic allocatable range. This was detected with the newfangled tablet SoCs which have GPIOs and therefor allocate a range of interrupts. The MSI allocations already excluded the GSI range, so we never noticed before. - The last missing set_irq_affinity() repair, which was delayed due to testing issues - A few bug fixes for the armada SoC interrupt controller - A memory allocation fix for the TI crossbar interrupt controller - A trivial kernel-doc warning fix" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: irq-crossbar: Not allocating enough memory irqchip: armanda: Sanitize set_irq_affinity() genirq: x86: Ensure that dynamic irq allocation does not conflict linux/interrupt.h: fix new kernel-doc warnings irqchip: armada-370-xp: Fix releasing of MSIs irqchip: armada-370-xp: implement the ->check_device() msi_chip operation irqchip: armada-370-xp: fix invalid cast of signed value into unsigned variable