summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2012-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/ipv6/route.c This deals with a merge conflict between the net-next addition of the inetpeer network namespace ops, and Thomas Graf's bug fix in 2a0c451ade8e1783c5d453948289e4a978d417c9 which makes sure we don't register /proc/net/ipv6_route before it is actually safe to do so. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_routeThomas Graf
/proc/net/ipv6_route reflects the contents of fib_table_hash. The proc handler is installed in ip6_route_net_init() whereas fib_table_hash is allocated in fib6_net_init() _after_ the proc handler has been installed. This opens up a short time frame to access fib_table_hash with its pants down. fib6_init() as a whole can't be moved to an earlier position as it also registers the rtnetlink message handlers which should be registered at the end. Therefore split it into fib6_init() which is run early and fib6_init_late() to register the rtnetlink message handlers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15net: remove skb_orphan_try()Eric Dumazet
Orphaning skb in dev_hard_start_xmit() makes bonding behavior unfriendly for applications sending big UDP bursts : Once packets pass the bonding device and come to real device, they might hit a full qdisc and be dropped. Without orphaning, the sender is automatically throttled because sk->sk_wmemalloc reaches sk->sk_sndbuf (assuming sk_sndbuf is not too big) We could try to defer the orphaning adding another test in dev_hard_start_xmit(), but all this seems of little gain, now that BQL tends to make packets more likely to be parked in Qdisc queues instead of NIC TX ring, in cases where performance matters. Reverts commits : fc6055a5ba31 net: Introduce skb_orphan_try() 87fd308cfc6b net: skb_tx_hash() fix relative to skb_orphan_try() and removes SKBTX_DRV_NEEDS_SK_REF flag Reported-and-bisected-by: Jean-Michel Hautbois <jhautbois@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15ipv6: Handle PMTU in ICMP error handlers.David S. Miller
One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts to handle the error pass the 32-bit info cookie in network byte order whereas ipv4 passes it around in host byte order. Like the ipv4 side, we have two helper functions. One for when we have a socket context and one for when we do not. ip6ip6 tunnels are not handled here, because they handle PMTU events by essentially relaying another ICMP packet-too-big message back to the original sender. This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all kinds of situations that simply cannot happen when we do the PMTU update directly using a fully resolved route. In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very likely be removed or changed into a BUG_ON() check. We should never have a prefixed ipv6 route when we get there. Another piece of strange history here is that TCP and DCCP, unlike in ipv4, never invoke the update_pmtu() method from their ICMP error handlers. This is incredibly astonishing since this is the context where we have the most accurate context in which to make a PMTU update, namely we have a fully connected socket and associated cached socket route. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-14ipv4: Handle PMTU in all ICMP error handlers.David S. Miller
With ip_rt_frag_needed() removed, we have to explicitly update PMTU information in every ICMP error handler. Create two helper functions to facilitate this. 1) ipv4_sk_update_pmtu() This updates the PMTU when we have a socket context to work with. 2) ipv4_update_pmtu() Raw version, used when no socket context is available. For this interface, we essentially just pass in explicit arguments for the flow identity information we would have extracted from the socket. And you'll notice that ipv4_sk_update_pmtu() is simply implemented in terms of ipv4_update_pmtu() Note that __ip_route_output_key() is used, rather than something like ip_route_output_flow() or ip_route_output_key(). This is because we absolutely do not want to end up with a route that does IPSEC encapsulation and the like. Instead, we only want the route that would get us to the node described by the outermost IP header. Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-14dcbnl: Use BUG_ON() instead of BUG()Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-14dcbnl: Silence harmless gcc warning about uninitialized reply_nlhThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-13netpoll: fix netpoll_send_udp() bugsEric Dumazet
Bogdan Hamciuc diagnosed and fixed following bug in netpoll_send_udp() : "skb->len += len;" instead of "skb_put(skb, len);" Meaning that _if_ a network driver needs to call skb_realloc_headroom(), only packet headers would be copied, leaving garbage in the payload. However the skb_realloc_headroom() must be avoided as much as possible since it requires memory and netpoll tries hard to work even if memory is exhausted (using a pool of preallocated skbs) It appears netpoll_send_udp() reserved 16 bytes for the ethernet header, which happens to work for typicall drivers but not all. Right thing is to use LL_RESERVED_SPACE(dev) (And also add dev->needed_tailroom of tailroom) This patch combines both fixes. Many thanks to Bogdan for raising this issue. Reported-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-13dcbnl: Use type safe nlmsg_data()Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-13dcbnl: Move dcb app allocation into dcb_app_add()Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-13dcbnl: Move dcb app lookup code into dcb_app_lookup()Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-13dcbnl: Return consistent error codesThomas Graf
EMSGSIZE - ran out of space while constructing message EOPNOTSUPP - driver/hardware does not support operation ENODEV - network device not found EINVAL - invalid message Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-13dcbnl: Use dcbnl_newmsg() where possibleThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-13dcbnl: Remove now unused dcbnl_reply()Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-13dcbnl: Shorten all command handling functionsThomas Graf
Allocating and sending the skb in dcb_doit() allows for much shorter and cleaner command handling functions. The huge switch statement is replaced with an array based definition of the handling function and reply message type. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-13dcbnl: Prepare framework to shorten handling functionsThomas Graf
There is no need to allocate and send the reply message in each handling function separately. Instead, the reply skb can be allocated and sent in dcb_doit() directly. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: MAINTAINERS drivers/net/wireless/iwlwifi/pcie/trans.c The iwlwifi conflict was resolved by keeping the code added in 'net' that turns off the buggy chip feature. The MAINTAINERS conflict was merely overlapping changes, one change updated all the wireless web site URLs and the other changed some GIT trees to be Johannes's instead of John's. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12ethtool: Make more commands available to unprivileged processesBen Hutchings
'Get' commands should generally not require CAP_NET_ADMIN, with the exception of those that expose internal state. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12net-next: add dev_loopback_xmit() to avoid duplicate codeMichel Machado
Add dev_loopback_xmit() in order to deduplicate functions ip_dev_loopback_xmit() (in net/ipv4/ip_output.c) and ip6_dev_loopback_xmit() (in net/ipv6/ip6_output.c). I was about to reinvent the wheel when I noticed that ip_dev_loopback_xmit() and ip6_dev_loopback_xmit() do exactly what I need and are not IP-only functions, but they were not available to reuse elsewhere. ip6_dev_loopback_xmit() does not have line "skb_dst_force(skb);", but I understand that this is harmless, and should be in dev_loopback_xmit(). Signed-off-by: Michel Machado <michel@digirati.com.br> CC: "David S. Miller" <davem@davemloft.net> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jpirko@redhat.com> CC: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12ipv4: Add interface option to enable routing of 127.0.0.0/8Thomas Graf
Routing of 127/8 is tradtionally forbidden, we consider packets from that address block martian when routing and do not process corresponding ARP requests. This is a sane default but renders a huge address space practically unuseable. The RFC states that no address within the 127/8 block should ever appear on any network anywhere but it does not forbid the use of such addresses outside of the loopback device in particular. For example to address a pool of virtual guests behind a load balancer. This patch adds a new interface option 'route_localnet' enabling routing of the 127/8 address block and processing of ARP requests on a specific interface. Note that for the feature to work, the default local route covering 127/8 dev lo needs to be removed. Example: $ sysctl -w net.ipv4.conf.eth0.route_localnet=1 $ ip route del 127.0.0.0/8 dev lo table local $ ip addr add 127.1.0.1/16 dev eth0 $ ip route flush cache V2: Fix invalid check to auto flush cache (thanks davem) Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12Merge branch 'master' of git://1984.lsi.us.es/netDavid S. Miller
2012-06-12Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
2012-06-11af_packet: use sizeof instead of constant in spkt_devicedanborkmann@iogearbox.net
This small patch removes access to the last element of the spkt_device array through a constant. Instead, it is accessed by sizeof() to respect possible changes in if_packet.h. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11netfilter: nf_ct_tcp, udp: fix compilation with sysctl disabledPablo Neira Ayuso
This patch fixes the compilation of the TCP and UDP trackers with sysctl compilation disabled: net/netfilter/nf_conntrack_proto_udp.c: In function ‘udp_init_net_data’: net/netfilter/nf_conntrack_proto_udp.c:279:13: error: ‘struct nf_proto_net’ has no member named ‘user’ net/netfilter/nf_conntrack_proto_tcp.c:1606:9: error: ‘struct nf_proto_net’ has no member named ‘user’ net/netfilter/nf_conntrack_proto_tcp.c:1643:9: error: ‘struct nf_proto_net’ has no member named ‘user’ Reported-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11Merge branch 'master' of git://1984.lsi.us.es/net-nextDavid S. Miller
2012-06-11Merge tag 'nfc-next-3.6-1' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-3.0
2012-06-11Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-06-11inet: Avoid potential NULL peer dereference.David S. Miller
We handle NULL in rt{,6}_set_peer but then our caller will try to pass that NULL pointer into inet_putpeer() which isn't ready for it. Fix this by moving the NULL check one level up, and then remove the now unnecessary NULL check from inetpeer_ptr_set_peer(). Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11inet: Use FIB table peer roots in routes.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11inet: Add inetpeer tree roots to the FIB tables.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11inet: Add family scope inetpeer flushes.David S. Miller
This implementation can deal with having many inetpeer roots, which is a necessary prerequisite for per-FIB table rooted peer tables. Each family (AF_INET, AF_INET6) has a sequence number which we bump when we get a family invalidation request. Each peer lookup cheaply checks whether the flush sequence of the root we are using is out of date, and if so flushes it and updates the sequence number. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11ipv4: Kill ip_rt_frag_needed().David S. Miller
There is zero point to this function. It's only real substance is to perform an extremely outdated BSD4.2 ICMP check, which we can safely remove. If you really have a MTU limited link being routed by a BSD4.2 derived system, here's a nickel go buy yourself a real router. The other actions of ip_rt_frag_needed(), checking and conditionally updating the peer, are done by the per-protocol handlers of the ICMP event. TCP, UDP, et al. have a handler which will receive this event and transmit it back into the associated route via dst_ops->update_pmtu(). This simplification is important, because it eliminates the one place where we do not have a proper route context in which to make an inetpeer lookup. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11inet: Hide route peer accesses behind helpers.David S. Miller
We encode the pointer(s) into an unsigned long with one state bit. The state bit is used so we can store the inetpeer tree root to use when resolving the peer later. Later the peer roots will be per-FIB table, and this change works to facilitate that. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09inet: Pass inetpeer root into inet_getpeer*() interfaces.David S. Miller
Otherwise we reference potentially non-existing members when ipv6 is disabled. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09af_unix: remove unix_iter_stateEric Dumazet
As pointed out by Michael Tokarev , struct unix_iter_state is no longer needed. Suggested-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09ipv6: Do not mark ipv6_inetpeer_ops as __net_initdata.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09inet: Consolidate inetpeer_invalidate_tree() interfaces.David S. Miller
We only need one interface for this operation, since we always know which inetpeer root we want to flush. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09inet: Initialize per-netns inetpeer roots in net/ipv{4,6}/route.cDavid S. Miller
Instead of net/ipv4/inetpeer.c Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09[PATCH] tcp: Cache inetpeer in timewait socket, and only when necessary.David S. Miller
Since it's guarenteed that we will access the inetpeer if we're trying to do timewait recycling and TCP options were enabled on the connection, just cache the peer in the timewait socket. In the future, inetpeer lookups will be context dependent (per routing realm), and this helps facilitate that as well. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09tcp: Get rid of inetpeer special cases.David S. Miller
The get_peer method TCP uses is full of special cases that make no sense accommodating, and it also gets in the way of doing more reasonable things here. First of all, if the socket doesn't have a usable cached route, there is no sense in trying to optimize timewait recycling. Likewise for the case where we have IP options, such as SRR enabled, that make the IP header destination address (and thus the destination address of the route key) differ from that of the connection's destination address. Just return a NULL peer in these cases, and thus we're also able to get rid of the clumsy inetpeer release logic. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08inet: Create and use rt{,6}_get_peer_create().David S. Miller
There's a lot of places that open-code rt{,6}_get_peer() only because they want to set 'create' to one. So add an rt{,6}_get_peer_create() for their sake. There were also a few spots open-coding plain rt{,6}_get_peer() and those are transformed here as well. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08net/core: fix kernel-doc warningsRandy Dunlap
Fix kernel-doc warnings in net/core: Warning(net/core/skbuff.c:3368): No description found for parameter 'delta_truesize' Warning(net/core/filter.c:628): No description found for parameter 'pfp' Warning(net/core/filter.c:628): Excess function parameter 'sk' description in 'sk_unattached_filter_create' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08l2tp: fix a race in l2tp_ip_sendmsg()Eric Dumazet
Commit 081b1b1bb27f (l2tp: fix l2tp_ip_sendmsg() route handling) added a race, in case IP route cache is disabled. In this case, we should not do the dst_release(&rt->dst), since it'll free the dst immediately, instead of waiting a RCU grace period. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Cc: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08af_unix: speedup /proc/net/unixEric Dumazet
/proc/net/unix has quadratic behavior, and can hold unix_table_lock for a while if high number of unix sockets are alive. (90 ms for 200k sockets...) We already have a hash table, so its quite easy to use it. Problem is unbound sockets are still hashed in a single hash slot (unix_socket_table[UNIX_HASH_TABLE]) This patch also spreads unbound sockets to 256 hash slots, to speedup both /proc/net/unix and unix_diag. Time to read /proc/net/unix with 200k unix sockets : (time dd if=/proc/net/unix of=/dev/null bs=4k) before : 520 secs after : 2 secs Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08inetpeer: add parameter net for inet_getpeer_v4,v6Gao feng
add struct net as a parameter of inet_getpeer_v[4,6], use net to replace &init_net. and modify some places to provide net for inet_getpeer_v[4,6] Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08inetpeer: add namespace support for inetpeerGao feng
now inetpeer doesn't support namespace,the information will be leaking across namespace. this patch move the global vars v4_peers and v6_peers to netns_ipv4 and netns_ipv6 as a field peers. add struct pernet_operations inetpeer_ops to initial pernet inetpeer data. and change family_to_base and inet_getpeer to support namespace. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08mac80211: add back channel change flagStanislaw Gruszka
commit 24398e39c8ee4a9d9123eed322b859ece4d16cac Author: Johannes Berg <johannes.berg@intel.com> Date: Wed Mar 28 10:58:36 2012 +0200 mac80211: set HT channel before association removed IEEE80211_CONF_CHANGE_CHANNEL argument from ieee80211_hw_config, which is required by iwl4965 driver, otherwise that driver does not configure channel properly and is not able to associate. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-08NFC: Fix possible NULL ptr deref when getting the name of a socketSasha Levin
llcp_sock_getname() might get called before the LLCP socket was created. This condition isn't checked, and llcp_sock_getname will simply deref a NULL ptr in that case. This exists starting with d646960 ("NFC: Initial LLCP support"). Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-06-07snmp: fix OutOctets counter to include forwarded datagramsVincent Bernat
RFC 4293 defines ipIfStatsOutOctets (similar definition for ipSystemStatsOutOctets): The total number of octets in IP datagrams delivered to the lower layers for transmission. Octets from datagrams counted in ipIfStatsOutTransmits MUST be counted here. And ipIfStatsOutTransmits: The total number of IP datagrams that this entity supplied to the lower layers for transmission. This includes datagrams generated locally and those forwarded by this entity. Therefore, IPSTATS_MIB_OUTOCTETS must be incremented when incrementing IPSTATS_MIB_OUTFORWDATAGRAMS. IP_UPD_PO_STATS is not used since ipIfStatsOutRequests must not include forwarded datagrams: The total number of IP datagrams that local IP user-protocols (including ICMP) supplied to IP in requests for transmission. Note that this counter does not include any datagrams counted in ipIfStatsOutForwDatagrams. Signed-off-by: Vincent Bernat <bernat@luffy.cx> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-07Added kernel support in EEE Ethtool commandsYuval Mintz
This patch extends the kernel's ethtool interface by adding support for 2 new EEE commands - get_eee and set_eee. Thanks goes to Giuseppe Cavallaro for his original patch adding this support. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>