summaryrefslogtreecommitdiffstats
path: root/include/linux
AgeCommit message (Collapse)Author
2012-09-27net: struct napi_struct fields reorderingEric Dumazet
Remove two holes on 64bit arches, and put dev_list at the end of napi_struct since its not used in fast path. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24filter: add XOR instruction for use with X/KDaniel Borkmann
SKF_AD_ALU_XOR_X has been added a while ago, but as an 'ancillary' operation that is invoked through a negative offset in K within BPF load operations. Since BPF_MOD has recently been added, BPF_XOR should also be part of the common ALU operations. Removing SKF_AD_ALU_XOR_X might not be an option since this is exposed to user space. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24net: use a per task frag allocatorEric Dumazet
We currently use a per socket order-0 page cache for tcp_sendmsg() operations. This page is used to build fragments for skbs. Its done to increase probability of coalescing small write() into single segments in skbs still in write queue (not yet sent) But it wastes a lot of memory for applications handling many mostly idle sockets, since each socket holds one page in sk->sk_sndmsg_page Its also quite inefficient to build TSO 64KB packets, because we need about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit page allocator more than wanted. This patch adds a per task frag allocator and uses bigger pages, if available. An automatic fallback is done in case of memory pressure. (up to 32768 bytes per frag, thats order-3 pages on x86) This increases TCP stream performance by 20% on loopback device, but also benefits on other network devices, since 8x less frags are mapped on transmit and unmapped on tx completion. Alexander Duyck mentioned a probable performance win on systems with IOMMU enabled. Its possible some SG enabled hardware cant cope with bigger fragments, but their ndo_start_xmit() should already handle this, splitting a fragment in sub fragments, since some arches have PAGE_SIZE=65536 Successfully tested on various ethernet devices. (ixgbe, igb, bnx2x, tg3, mellanox mlx4) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Vijay Subramanian <subramanian.vijay@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24Merge branch 'master' of git://1984.lsi.us.es/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== This patchset contains updates for your net-next tree, they are: * Mostly fixes for the recently pushed IPv6 NAT support: - Fix crash while removing nf_nat modules from Patrick McHardy. - Fix unbalanced rcu_read_unlock from Ulrich Weber. - Merge NETMAP and REDIRECT into one single xt_target module, from Jan Engelhardt. - Fix Kconfig for IPv6 NAT, which allows inconsistent configurations, from myself. * Updates for ipset, all of the from Jozsef Kadlecsik: - Add the new "nomatch" option to obtain reverse set matching. - Support for /0 CIDR in hash:net,iface set type. - One non-critical fix for a rare crash due to pass really wrong configuration parameters. - Coding style cleanups. - Sparse fixes. - Add set revision supported via modinfo.i * One extension for the xt_time match, to support matching during the transition between two days with one single rule, from Florian Westphal. * Fix maximum packet length supported by nfnetlink_queue and add NFQA_CAP_LEN attribute, from myself. You can notice that this batch contains a couple of fixes that may go to 3.6-rc but I don't consider them critical to push them: * The ipset fix for the /0 cidr case, which is triggered with one inconsistent command line invocation of ipset. * The nfnetlink_queue maximum packet length supported since it requires the new NFQA_CAP_LEN attribute to provide a full workaround for the described problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24netfilter: nfnetlink_queue: add NFQA_CAP_LEN attributePablo Neira Ayuso
This patch adds the NFQA_CAP_LEN attribute that allows us to know what is the real packet size from user-space (even if we decided to retrieve just a few bytes from the packet instead of all of it). Security software that inspects packets should always check for this new attribute to make sure that it is inspecting the entire packet. This also helps to provide a workaround for the problem described in: http://marc.info/?l=netfilter-devel&m=134519473212536&w=2 Original idea from Florian Westphal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-24netfilter: nf_ct_ftp: add sequence tracking pickup facility for injected entriesPablo Neira Ayuso
This patch allows the FTP helper to pickup the sequence tracking from the first packet seen. This is useful to fix the breakage of the first FTP command after the failover while using conntrackd to synchronize states. The seq_aft_nl_num field in struct nf_ct_ftp_info has been shrinked to 16-bits (enough for what it does), so we can use the remaining 16-bits to store the flags while using the same size for the private FTP helper data. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-24netfilter: xt_time: add support to ignore day transitionFlorian Westphal
Currently, if you want to do something like: "match Monday, starting 23:00, for two hours" You need two rules, one for Mon 23:00 to 0:00 and one for Tue 0:00-1:00. The rule: --weekdays Mo --timestart 23:00 --timestop 01:00 looks correct, but it will first match on monday from midnight to 1 a.m. and then again for another hour from 23:00 onwards. This permits userspace to explicitly ignore the day transition and match for a single, continuous time period instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-23netlink: Rearrange netlink_kernel_cfg to save space on 64-bit.David S. Miller
Suggested by Jan Engelhardt. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-22netfilter: ipset: Support to match elements marked with "nomatch"Jozsef Kadlecsik
Exceptions can now be matched and we can branch according to the possible cases: a. match in the set if the element is not flagged as "nomatch" b. match in the set if the element is flagged with "nomatch" c. no match i.e. iptables ... -m set --match-set ... -j ... iptables ... -m set --match-set ... --nomatch-entries -j ... ... Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2012-09-22netfilter: ipset: Coding style fixesJozsef Kadlecsik
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2012-09-22netfilter: ipset: Include supported revisions in module descriptionJozsef Kadlecsik
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2012-09-22netfilter: ipset: Rewrite cidr book keeping to handle /0Jozsef Kadlecsik
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2012-09-22ptp: clarify the clock_name sysfs attributeRichard Cochran
There has been some confusion among PHC driver authors about the intended purpose of the clock_name attribute. This patch expands the documation in order to clarify how the clock_name field should be understood. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-22ptp: link the phc device to its parent deviceRichard Cochran
PTP Hardware Clock devices appear as class devices in sysfs. This patch changes the registration API to use the parent device, clarifying the clock's relationship to the underlying device. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-21netlink: use <linux/export.h> instead of <linux/module.h>Pablo Neira Ayuso
Since (9f00d97 netlink: hide struct module parameter in netlink_kernel_create), linux/netlink.h includes linux/module.h because of the use of THIS_MODULE. Use linux/export.h instead, as suggested by Stephen Rothwell, which is significantly smaller and defines THIS_MODULES. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20ipv4: Don't add TCP-code in inet_sock_destructChristoph Paasch
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: H.K. Jerry Chu <hkchu@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20IB/ipoib: Add rtnl_link_ops supportOr Gerlitz
Add rtnl_link_ops to IPoIB, with the first usage being child device create/delete through them. Childs devices are now either legacy ones, created/deleted through the ipoib sysfs entries, or RTNL ones. Adding support for RTNL childs involved refactoring of ipoib_vlan_add which is now used by both the sysfs and the link_ops code. Also, added ndo_uninit entry to support calling unregister_netdevice_queue from the rtnl dellink entry. This required removal of calls to ipoib_dev_cleanup from the driver in flows which use unregister_netdevice, since the networking core will invoke ipoib_uninit which does exactly that. Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next Ben Hutchings says: ==================== 1. Extension to PPS/PTP to allow for PHC devices where pulses are subject to a variable but measurable delay. 2. PPS/PTP/PHC support for Solarflare boards with a timestamping peripheral. 3. MTD support for updating the timestamping peripheral on those boards. 4. Fix for potential over-length requests to firmware. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19netpoll: call ->ndo_select_queue() in tx pathAmerigo Wang
In netpoll tx path, we miss the chance of calling ->ndo_select_queue(), thus could cause problems when bonding is involved. This patch makes dev_pick_tx() extern (and rename it to netdev_pick_tx()) to let netpoll call it in netpoll_send_skb_on_dev(). Reported-by: Sylvain Munaut <s.munaut@whatever-company.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Tested-by: Sylvain Munaut <s.munaut@whatever-company.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19netdev: make address const in device address managementstephen hemminger
The internal functions for add/deleting addresses don't change their argument. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/netfilter/nfnetlink_log.c net/netfilter/xt_LOG.c Rather easy conflict resolution, the 'net' tree had bug fixes to make sure we checked if a socket is a time-wait one or not and elide the logging code if so. Whereas on the 'net-next' side we are calculating the UID and GID from the creds using different interfaces due to the user namespace changes from Eric Biederman. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-14Merge branch 'i2c-embedded/for-current' of ↵Linus Torvalds
git://git.pengutronix.de/git/wsa/linux Pull i2c embedded fixes from Wolfram Sang: "The last bunch of (typical) i2c-embedded driver fixes for 3.6. Also update the MAINTAINERS file to point to my tree since people keep asking where to find their patches." * 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux: i2c: algo: pca: Fix mode selection for PCA9665 MAINTAINERS: fix tree for current i2c-embedded development i2c: mxs: correctly setup speed for non devicetree i2c: pnx: Fix read transactions of >= 2 bytes i2c: pnx: Fix bit definitions
2012-09-14Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This tree includes various fixes" Ingo really needs to improve on the whole "explain git pull" part. "Various fixes" indeed. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabled perf/x86: Enable Intel Cedarview Atom suppport perf_event: Switch to internal refcount, fix race with close() oprofile, s390: Fix uninitialized memory access when writing to oprofilefs perf/x86: Fix microcode revision check for SNB-PEBS
2012-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Use after free and new device IDs in bluetooth from Andre Guedes, Yevgeniy Melnichuk, Gustavo Padovan, and Henrik Rydberg. 2) Fix crashes with short packet lengths and VLAN in pktgen, from Nishank Trivedi. 3) mISDN calls flush_work_sync() with locks held, fix from Karsten Keil. 4) Packet scheduler gred parameters are reported to userspace improperly scaled, and WRED idling is not performed correctly. All from David Ward. 5) Fix TCP socket refcount problem in ipv6, from Julian Anastasov. 6) ibmveth device has RX queue alignment requirements which are not being explicitly met resulting in sporadic failures, fix from Santiago Leon. 7) Netfilter needs to take care when interpreting sockets attached to socket buffers, they could be time-wait minisockets. Fix from Eric Dumazet. 8) sock_edemux() has the same issue as netfilter did in #7 above, fix from Eric Dumazet. 9) Avoid infinite loops in CBQ scheduler with some configurations, from Eric Dumazet. 10) Deal with "Reflection scan: an Off-Path Attack on TCP", from Jozsef Kadlecsik. 11) SCTP overcharges socket for TX packets, fix from Thomas Graf. 12) CODEL packet scheduler should not reset it's state every time it builds a new flow, fix from Eric Dumazet. 13) Fix memory leak in nl80211, from Wei Yongjun. 14) NETROM doesn't check skb_copy_datagram_iovec() return values, from Alan Cox. 15) l2tp ethernet was using sizeof(ETH_HLEN) instead of plain ETH_HLEN, oops. From Eric Dumazet. 16) Fix selection of ath9k chips on which PA linearization and AM2PM predistoration are used, from Felix Fietkau. 17) Flow steering settings in mlx4 driver need to be validated properly, from Hadar Hen Zion. 18) bnx2x doesn't show the correct link duplex setting, from Yaniv Rosner. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits) pktgen: fix crash with vlan and packet size less than 46 bnx2x: Add missing afex code bnx2x: fix registers dumped bnx2x: correct advertisement of pause capabilities bnx2x: display the correct duplex value bnx2x: prevent timeouts when using PFC bnx2x: fix stats copying logic bnx2x: Avoid sending multiple statistics queries net: qmi_wwan: call subdriver with control intf only net_sched: gred: actually perform idling in WRED mode net_sched: gred: fix qave reporting via netlink net_sched: gred: eliminate redundant DP prio comparisons net_sched: gred: correct comment about qavg calculation in RIO mode mISDN: Fix wrong usage of flush_work_sync while holding locks netfilter: log: Fix log-level processing net-sched: sch_cbq: avoid infinite loop net: qmi_wwan: fix Gobi device probing for un2430 net: fix net/core/sock.c build error ixp4xx_hss: fix build failure due to missing linux/module.h inclusion caif: move the dereference below the NULL test ...
2012-09-14Merge tag 'driver-core-3.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fix from Greg Kroah-Hartman: "Here is one fix for 3.6-rc6 for the kobject.h file. It fixes a reported oops if CONFIG_HOTPLUG is disabled. It's been in the linux-next tree for a while now. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'driver-core-3.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: kobject: fix oops with "input0: bad kobj_uevent_env content in show_uevent()"
2012-09-13mISDN: Fix wrong usage of flush_work_sync while holding locksKarsten Keil
It is a bad idea to hold a spinlock and call flush_work_sync. Move the workqueue cleanup outside the spinlock and use cancel_work_sync, on closing the channel this seems to be the more correct function. Remove the never used and constant return value of mISDN_freebchannel. Signed-off-by: Karsten Keil <keil@b1-systems.de> Cc: <stable@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-13Merge tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - Final (hopefully) fix for the range checking code in NFSv4 getacl. This should fix the Oopses being seen when the acl size is close to PAGE_SIZE. - Fix a regression with the legacy binary mount code - Fix a regression in the readdir cookieverf initialisation - Fix an RPC over UDP regression - Ensure that we report all errors in the NFSv4 open code - Ensure that fsync() reports all relevant synchronisation errors. * tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: fsync() must exit with an error if page writeback failed SUNRPC: Fix a UDP transport regression NFS: return error from decode_getfh in decode open NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached NFSv4: Fix range checking in __nfs4_get_acl_uncached and __nfs4_proc_set_acl NFS: Fix a problem with the legacy binary mount code NFS: Fix the initialisation of the readdir 'cookieverf' array
2012-09-12i2c: pnx: Fix read transactions of >= 2 bytesRoland Stigge
On transactions with n>=2 bytes, the controller actually wrongly clocks in n+1 bytes. This is caused by the (wrong) assumption that RFE in the Status Register is 1 iff there is no byte already ordered (via a dummy TX byte). This lead to the implementation of synchronized byte ordering, e.g.: Dummy-TX - RX - Dummy-TX - RX - ... But since RFE actually stays high after some Dummy-TX, it rather looks like: Dummy-TX - Dummy-TX - RX - Dummy-TX - RX - (RX) The last RX byte is clocked in by the bus controller, but ignored by the kernel when filling the userspace buffer. This patch fixes the issue by asking for RX via Dummy-TX asynchronously. Introducing a separate counter for TX bytes. Signed-off-by: Roland Stigge <stigge@antcom.de> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
2012-09-10etherdevice: introduce help function eth_zero_addr()Duan Jiong
a lot of code has either the memset or an inefficient copy from a static array that contains the all-zeros Ethernet address. Introduce help function eth_zero_addr() to fill an address with all zeros, making the code clearer and allowing us to get rid of some constant arrays. Signed-off-by: Duan Jiong <djduanjiong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10filter: add MOD operationEric Dumazet
Add a new ALU opcode, to compute a modulus. Commit ffe06c17afbbb used an ancillary to implement XOR_X, but here we reserve one of the available ALU opcode to implement both MOD_X and MOD_K Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: George Bakos <gbakos@alpinista.org> Cc: Jay Schulist <jschlst@samba.org> Cc: Jiri Pirko <jpirko@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10netlink: Rename pid to portid to avoid confusionEric W. Biederman
It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-08netlink: hide struct module parameter in netlink_kernel_createPablo Neira Ayuso
This patch defines netlink_kernel_create as a wrapper function of __netlink_kernel_create to hide the struct module *me parameter (which seems to be THIS_MODULE in all existing netlink subsystems). Suggested by David S. Miller. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-08netlink: kill netlink_set_nonrootPablo Neira Ayuso
Replace netlink_set_nonroot by one new field `flags' in struct netlink_kernel_cfg that is passed to netlink_kernel_create. This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since now the flags field in nl_table is generic (so we can add more flags if needed in the future). Also adjust all callers in the net-next tree to use these flags instead of netlink_set_nonroot. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07pps/ptp: Allow PHC devices to adjust PPS events for known delayBen Hutchings
Initial version by Stuart Hodgson <smhodgson@solarflare.com> Some PHC device drivers may deliver PPS events with a significant and variable delay, but still be able to measure precisely what that delay is. Add a pps_sub_ts() function for subtracting a delay from the timestamp(s) in a PPS event, and a PTP event type (PTP_CLOCK_PPSUSR) for which the caller provides a complete PPS event. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-09-07scm: Don't use struct ucred in NETLINK_CB and struct scm_cookie.Eric W. Biederman
Passing uids and gids on NETLINK_CB from a process in one user namespace to a process in another user namespace can result in the wrong uid or gid being presented to userspace. Avoid that problem by passing kuids and kgids instead. - define struct scm_creds for use in scm_cookie and netlink_skb_parms that holds uid and gid information in kuid_t and kgid_t. - Modify scm_set_cred to fill out scm_creds by heand instead of using cred_to_ucred to fill out struct ucred. This conversion ensures userspace does not get incorrect uid or gid values to look at. - Modify scm_recv to convert from struct scm_creds to struct ucred before copying credential values to userspace. - Modify __scm_send to populate struct scm_creds on in the scm_cookie, instead of just copying struct ucred from userspace. - Modify netlink_sendmsg to copy scm_creds instead of struct ucred into the NETLINK_CB. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07net/mlx4_core: Add security check / enforcement for flow steering rules set ↵Hadar Hen Zion
for VMs Since VFs may be mapped to VMs which aren't trusted entities, flow steering rules attached through the wrapper on behalf of VFs must be checked to make sure that their L2 specification relate to MAC address assigned to that VF, and add L2 specification if its missing. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07net/mlx4_core: Put Firmware flow steering structures in common header filesHadar Hen Zion
To allow for usage of the flow steering Firmware structures in more locations over the driver, such as the resource tracker, move them from mcg.c to common header files. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07SUNRPC: Fix a UDP transport regressionTrond Myklebust
Commit 43cedbf0e8dfb9c5610eb7985d5f21263e313802 (SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing hangs in the case of NFS over UDP mounts. Since neither the UDP or the RDMA transport mechanism use dynamic slot allocation, we can skip grabbing the socket lock for those transports. Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA case. Note that the NFSv4.1 back channel assigns the slot directly through rpc_run_bc_task, so we can ignore that case. Reported-by: Dick Streefland <dick.streefland@altium.nl> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.1]
2012-09-06kobject: fix oops with "input0: bad kobj_uevent_env content in show_uevent()"Bjørn Mork
Fengguang Wu <fengguang.wu@intel.com> writes: > After the __devinit* removal series, I can still get kernel panic in > show_uevent(). So there are more sources of bug.. > > Debug patch: > > @@ -343,8 +343,11 @@ static ssize_t show_uevent(struct device > goto out; > > /* copy keys to file */ > - for (i = 0; i < env->envp_idx; i++) > + dev_err(dev, "uevent %d env[%d]: %s/.../%s\n", env->buflen, env->envp_idx, top_kobj->name, dev->kobj.name); > + for (i = 0; i < env->envp_idx; i++) { > + printk(KERN_ERR "uevent %d env[%d]: %s\n", (int)count, i, env->envp[i]); > count += sprintf(&buf[count], "%s\n", env->envp[i]); > + } > > Oops message, the env[] is again not properly initilized: > > [ 44.068623] input input0: uevent 61 env[805306368]: input0/.../input0 > [ 44.069552] uevent 0 env[0]: (null) This is a completely different CONFIG_HOTPLUG problem, only demonstrating another reason why CONFIG_HOTPLUG should go away. I had a hard time trying to disable it anyway ;-) The problem this time is lots of code assuming that a call to add_uevent_var() will guarantee that env->buflen > 0. This is not true if CONFIG_HOTPLUG is unset. So things like this end up overwriting env->envp_idx because the array index is -1: if (add_uevent_var(env, "MODALIAS=")) return -ENOMEM; len = input_print_modalias(&env->buf[env->buflen - 1], sizeof(env->buf) - env->buflen, dev, 0); Don't know what the best action is, given that there seem to be a *lot* of this around the kernel. This patch "fixes" the problem for me, but I don't know if it can be considered an appropriate fix. [ It is the correct fix for now, for 3.7 forcing CONFIG_HOTPLUG to always be on is the longterm fix, but it's too late for 3.6 and older kernels to resolve this that way - gregkh ] Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-06Merge tag 'hwmon-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull a hwmon fix from Guenter Roeck: "One patch, fixing DIV_ROUND_CLOSEST to support negative dividends. While the changes are not in the drivers/hwmon directory, the problem primarily affects hwmon drivers, and it makes sense to push the patch through the hwmon tree." * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: linux/kernel.h: Fix DIV_ROUND_CLOSEST to support negative dividends
2012-09-06NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncachedTrond Myklebust
Pass the checks made by decode_getacl back to __nfs4_get_acl_uncached so that it knows if the acl has been truncated. The current overflow checking is broken, resulting in Oopses on user-triggered nfs4_getfacl calls, and is opaque to the point where several attempts at fixing it have failed. This patch tries to clean up the code in addition to fixing the Oopses by ensuring that the overflow checks are performed in a single place (decode_getacl). If the overflow check failed, we will still be able to report the acl length, but at least we will no longer attempt to cache the acl or copy the truncated contents to user space. Reported-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Sachin Prabhu <sprabhu@redhat.com>
2012-09-05Merge tag 'mmc-fixes-for-3.6-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc Pull MMC fixes from Chris Ball: - a firmware bug on several Samsung MoviNAND eMMC models causes permanent corruption on the device when secure erase and secure trim requests are made, so we disable those requests on these eMMC devices. - atmel-mci: fix a hang with some SD cards by waiting for not-busy flag. - dw_mmc: low-power mode breaks SDIO interrupts; fix PIO error handling; fix handling of error interrupts. - mxs-mmc: fix deadlocks; fix compile error due to dma.h arch change. - omap: fix broken PIO mode causing memory corruption. - sdhci-esdhc: fix card detection. * tag 'mmc-fixes-for-3.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: mmc: omap: fix broken PIO mode mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption. mmc: dw_mmc: Disable low power mode if SDIO interrupts are used mmc: dw_mmc: fix error handling in PIO mode mmc: dw_mmc: correct mishandling error interrupt mmc: dw_mmc: amend using error interrupt status mmc: atmel-mci: not busy flag has also to be used for read operations mmc: sdhci-esdhc: break out early if clock is 0 mmc: mxs-mmc: fix deadlock caused by recursion loop mmc: mxs-mmc: fix deadlock in SDIO IRQ case mmc: bfin_sdh: fix dma_desc_array build error
2012-09-05net: qdisc busylock needs lockdep annotationsEric Dumazet
It seems we need to provide ability for stacked devices to use specific lock_class_key for sch->busylock We could instead default l2tpeth tx_queue_len to 0 (no qdisc), but a user might use a qdisc anyway. (So same fixes are probably needed on non LLTX stacked drivers) Noticed while stressing L2TPV3 setup : ====================================================== [ INFO: possible circular locking dependency detected ] 3.6.0-rc3+ #788 Not tainted ------------------------------------------------------- netperf/4660 is trying to acquire lock: (l2tpsock){+.-...}, at: [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core] but task is already holding lock: (&(&sch->busylock)->rlock){+.-...}, at: [<ffffffff81596595>] dev_queue_xmit+0xd75/0xe00 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&sch->busylock)->rlock){+.-...}: [<ffffffff810a5df0>] lock_acquire+0x90/0x200 [<ffffffff817499fc>] _raw_spin_lock_irqsave+0x4c/0x60 [<ffffffff81074872>] __wake_up+0x32/0x70 [<ffffffff8136d39e>] tty_wakeup+0x3e/0x80 [<ffffffff81378fb3>] pty_write+0x73/0x80 [<ffffffff8136cb4c>] tty_put_char+0x3c/0x40 [<ffffffff813722b2>] process_echoes+0x142/0x330 [<ffffffff813742ab>] n_tty_receive_buf+0x8fb/0x1230 [<ffffffff813777b2>] flush_to_ldisc+0x142/0x1c0 [<ffffffff81062818>] process_one_work+0x198/0x760 [<ffffffff81063236>] worker_thread+0x186/0x4b0 [<ffffffff810694d3>] kthread+0x93/0xa0 [<ffffffff81753e24>] kernel_thread_helper+0x4/0x10 -> #0 (l2tpsock){+.-...}: [<ffffffff810a5288>] __lock_acquire+0x1628/0x1b10 [<ffffffff810a5df0>] lock_acquire+0x90/0x200 [<ffffffff817498c1>] _raw_spin_lock+0x41/0x50 [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core] [<ffffffffa021a802>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth] [<ffffffff815952b2>] dev_hard_start_xmit+0x502/0xa70 [<ffffffff815b63ce>] sch_direct_xmit+0xfe/0x290 [<ffffffff81595a05>] dev_queue_xmit+0x1e5/0xe00 [<ffffffff815d9d60>] ip_finish_output+0x3d0/0x890 [<ffffffff815db019>] ip_output+0x59/0xf0 [<ffffffff815da36d>] ip_local_out+0x2d/0xa0 [<ffffffff815da5a3>] ip_queue_xmit+0x1c3/0x680 [<ffffffff815f4192>] tcp_transmit_skb+0x402/0xa60 [<ffffffff815f4a94>] tcp_write_xmit+0x1f4/0xa30 [<ffffffff815f5300>] tcp_push_one+0x30/0x40 [<ffffffff815e6672>] tcp_sendmsg+0xe82/0x1040 [<ffffffff81614495>] inet_sendmsg+0x125/0x230 [<ffffffff81576cdc>] sock_sendmsg+0xdc/0xf0 [<ffffffff81579ece>] sys_sendto+0xfe/0x130 [<ffffffff81752c92>] system_call_fastpath+0x16/0x1b Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&sch->busylock)->rlock); lock(l2tpsock); lock(&(&sch->busylock)->rlock); lock(l2tpsock); *** DEADLOCK *** 5 locks held by netperf/4660: #0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e581c>] tcp_sendmsg+0x2c/0x1040 #1: (rcu_read_lock){.+.+..}, at: [<ffffffff815da3e0>] ip_queue_xmit+0x0/0x680 #2: (rcu_read_lock_bh){.+....}, at: [<ffffffff815d9ac5>] ip_finish_output+0x135/0x890 #3: (rcu_read_lock_bh){.+....}, at: [<ffffffff81595820>] dev_queue_xmit+0x0/0xe00 #4: (&(&sch->busylock)->rlock){+.-...}, at: [<ffffffff81596595>] dev_queue_xmit+0xd75/0xe00 stack backtrace: Pid: 4660, comm: netperf Not tainted 3.6.0-rc3+ #788 Call Trace: [<ffffffff8173dbf8>] print_circular_bug+0x1fb/0x20c [<ffffffff810a5288>] __lock_acquire+0x1628/0x1b10 [<ffffffff810a334b>] ? check_usage+0x9b/0x4d0 [<ffffffff810a3f44>] ? __lock_acquire+0x2e4/0x1b10 [<ffffffff810a5df0>] lock_acquire+0x90/0x200 [<ffffffffa0208db2>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core] [<ffffffff817498c1>] _raw_spin_lock+0x41/0x50 [<ffffffffa0208db2>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core] [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core] [<ffffffffa021a802>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth] [<ffffffff815952b2>] dev_hard_start_xmit+0x502/0xa70 [<ffffffff81594e0e>] ? dev_hard_start_xmit+0x5e/0xa70 [<ffffffff81595961>] ? dev_queue_xmit+0x141/0xe00 [<ffffffff815b63ce>] sch_direct_xmit+0xfe/0x290 [<ffffffff81595a05>] dev_queue_xmit+0x1e5/0xe00 [<ffffffff81595820>] ? dev_hard_start_xmit+0xa70/0xa70 [<ffffffff815d9d60>] ip_finish_output+0x3d0/0x890 [<ffffffff815d9ac5>] ? ip_finish_output+0x135/0x890 [<ffffffff815db019>] ip_output+0x59/0xf0 [<ffffffff815da36d>] ip_local_out+0x2d/0xa0 [<ffffffff815da5a3>] ip_queue_xmit+0x1c3/0x680 [<ffffffff815da3e0>] ? ip_local_out+0xa0/0xa0 [<ffffffff815f4192>] tcp_transmit_skb+0x402/0xa60 [<ffffffff815fa25e>] ? tcp_md5_do_lookup+0x18e/0x1a0 [<ffffffff815f4a94>] tcp_write_xmit+0x1f4/0xa30 [<ffffffff815f5300>] tcp_push_one+0x30/0x40 [<ffffffff815e6672>] tcp_sendmsg+0xe82/0x1040 [<ffffffff81614495>] inet_sendmsg+0x125/0x230 [<ffffffff81614370>] ? inet_create+0x6b0/0x6b0 [<ffffffff8157e6e2>] ? sock_update_classid+0xc2/0x3b0 [<ffffffff8157e750>] ? sock_update_classid+0x130/0x3b0 [<ffffffff81576cdc>] sock_sendmsg+0xdc/0xf0 [<ffffffff81162579>] ? fget_light+0x3f9/0x4f0 [<ffffffff81579ece>] sys_sendto+0xfe/0x130 [<ffffffff810a69ad>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8174a0b0>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff810757e3>] ? finish_task_switch+0x83/0xf0 [<ffffffff810757a6>] ? finish_task_switch+0x46/0xf0 [<ffffffff81752cb7>] ? sysret_check+0x1b/0x56 [<ffffffff81752c92>] system_call_fastpath+0x16/0x1b Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-05tcp: add generic netlink support for tcp_metricsJulian Anastasov
Add support for genl "tcp_metrics". No locking is changed, only that now we can unlink and delete entries after grace period. We implement get/del for single entry and dump to support show/flush filtering in user space. Del without address attribute causes flush for all addresses, sadly under genl_mutex. v2: - remove rcu_assign_pointer as suggested by Eric Dumazet, it is not needed because there are no other writes under lock - move the flushing code in tcp_metrics_flush_all v3: - remove synchronize_rcu on flush as suggested by Eric Dumazet Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04NFS: Fix the initialisation of the readdir 'cookieverf' arrayTrond Myklebust
When the NFS_COOKIEVERF helper macro was converted into a static inline function in commit 99fadcd764 (nfs: convert NFS_*(inode) helpers to static inline), we broke the initialisation of the readdir cookies, since that depended on doing a memset with an argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *). At this point, NFS_COOKIEVERF seems to be more of an obfuscation than a helper, so the best thing would be to just get rid of it. Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881 Reported-by: Andi Kleen <andi@firstfloor.org> Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2012-09-04mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.Ian Chen
For several MoviNAND eMMC parts, there are known issues with secure erase and secure trim. For these specific MoviNAND devices, we skip these operations. Specifically, there is a bug in the eMMC firmware that causes unrecoverable corruption when the MMC is erased with MMC_CAP_ERASE enabled. References: http://forum.xda-developers.com/showthread.php?t=1644364 https://plus.google.com/111398485184813224730/posts/21pTYfTsCkB#111398485184813224730/posts/21pTYfTsCkB Signed-off-by: Ian Chen <ian.cy.chen@samsung.com> Reviewed-by: Namjae Jeon <linkinjeon@gmail.com> Acked-by: Jaehoon Chung <jh80.chung@samsung.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Cc: stable <stable@vger.kernel.org> [3.0+] Signed-off-by: Chris Ball <cjb@laptop.org>
2012-09-04perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabledK.Prasad
While debugging a warning message on PowerPC while using hardware breakpoints, it was discovered that when perf_event_disable is invoked through hw_breakpoint_handler function with interrupts disabled, a subsequent IPI in the code path would trigger a WARN_ON_ONCE message in smp_call_function_single function. This patch calls __perf_event_disable() when interrupts are already disabled, instead of perf_event_disable(). Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com> Signed-off-by: K.Prasad <Prasad.Krishnan@gmail.com> [naveen.n.rao@linux.vnet.ibm.com: v3: Check to make sure we target current task] Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120802081635.5811.17737.stgit@localhost.localdomain [ Fixed build error on MIPS. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04perf_event: Switch to internal refcount, fix race with close()Al Viro
Don't mess with file refcounts (or keep a reference to file, for that matter) in perf_event. Use explicit refcount of its own instead. Deal with the race between the final reference to event going away and new children getting created for it by use of atomic_long_inc_not_zero() in inherit_event(); just have the latter free what it had allocated and return NULL, that works out just fine (children of siblings of something doomed are created as singletons, same as if the child of leader had been created and immediately killed). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120820135925.GG23464@ZenIV.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextPablo Neira Ayuso
This merges (3f509c6 netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation) to Patrick McHardy's IPv6 NAT changes.
2012-09-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) NLA_PUT* --> nla_put_* conversion got one case wrong in nfnetlink_log, fix from Patrick McHardy. 2) Missed error return check in ipw2100 driver, from Julia Lawall. 3) PMTU updates in ipv4 were setting the expiry time incorrectly, fix from Eric Dumazet. 4) SFC driver erroneously reversed src and dst when reporting filters via ethtool. 5) Memory leak in CAN protocol and wrong setting of IRQF_SHARED in sja1000 can platform driver, from Alexey Khoroshilov and Sven Schmitt. 6) Fix multicast traffic scaling regression in ipv4_dst_destroy, only take the lock when we really need to. From Eric Dumazet. 7) Fix non-root process spoofing in netlink, from Pablo Neira Ayuso. 8) CWND reduction in TCP is done incorrectly during non-SACK recovery, fix from Yuchung Cheng. 9) Revert netpoll change, and fix what was actually a driver specific problem. From Amerigo Wang. This should cure bootup hangs with netconsole some people reported. 10) Fix xen-netfront invoking __skb_fill_page_desc() with a NULL page pointer. From Ian Campbell. 11) SIP NAT fix for expectiontation creation, from Pablo Neira Ayuso. 12) __ip_rt_update_pmtu() needs RCU locking, from Eric Dumazet. 13) Fix usbnet deadlock on resume, can't use GFP_KERNEL in this situation. From Oliver Neukum. 14) The davinci ethernet driver triggers an OOPS on removal because it frees an MDIO object before unregistering it. Fix from Bin Liu. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits) net: qmi_wwan: add several new Gobi devices fddi: 64 bit bug in smt_add_para() net: ethernet: fix kernel OOPS when remove davinci_mdio module net/xfrm/xfrm_state.c: fix error return code net: ipv6: fix error return code net: qmi_wwan: new device: Foxconn/Novatel E396 usbnet: fix deadlock in resume cs89x0 : packet reception not working netfilter: nf_conntrack: fix racy timer handling with reliable events bnx2x: Correct the ndo_poll_controller call bnx2x: Move netif_napi_add to the open call ipv4: must use rcu protection while calling fib_lookup bnx2x: fix 57840_MF pci id net: ipv4: ipmr_expire_timer causes crash when removing net namespace e1000e: DoS while TSO enabled caused by link partner with small MSS l2tp: avoid to use synchronize_rcu in tunnel free function gianfar: fix default tx vlan offload feature flag netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation xen-netfront: use __pskb_pull_tail to ensure linear area is big enough on RX netfilter: nfnetlink_log: fix error return code in init path ...