summaryrefslogtreecommitdiffstats
path: root/include/net/udp.h
AgeCommit message (Collapse)Author
2011-03-01udp: Switch to ip_finish_skbHerbert Xu
This patch converts UDP to use the new ip_finish_skb API. This would then allows us to more easily use ip_make_skb which allows UDP to run without a socket lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24net: change netdev->features to u32Michał Mirosław
Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-10net: avoid limits overflowEric Dumazet
Robin Holt tried to boot a 16TB machine and found some limits were reached : sysctl_tcp_mem[2], sysctl_udp_mem[2] We can switch infrastructure to use long "instead" of "int", now atomic_long_t primitives are available for free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Robin Holt <holt@sgi.com> Reviewed-by: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21tproxy: added udp6_lib_lookup functionBalazs Scheidler
Just like with IPv4, we need access to the UDP hash table to look up local sockets, but instead of exporting the global udp_table, export a lookup function. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-09-08udp: add rehash on connect()Eric Dumazet
commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation) added a secondary hash on UDP, hashed on (local addr, local port). Problem is that following sequence : fd = socket(...) connect(fd, &remote, ...) not only selects remote end point (address and port), but also sets local address, while UDP stack stored in secondary hash table the socket while its local address was INADDR_ANY (or ipv6 equivalent) Sequence is : - autobind() : choose a random local port, insert socket in hash tables [while local address is INADDR_ANY] - connect() : set remote address and port, change local address to IP given by a route lookup. When an incoming UDP frame comes, if more than 10 sockets are found in primary hash table, we switch to secondary table, and fail to find socket because its local address changed. One solution to this problem is to rehash datagram socket if needed. We add a new rehash(struct socket *) method in "struct proto", and implement this method for UDP v4 & v6, using a common helper. This rehashing only takes care of secondary hash table, since primary hash (based on local port only) is not changed. Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-12net: cleanupsChangli Gao
remove useless blanks. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/inet_common.h | 55 ++++------- include/net/tcp.h | 222 +++++++++++++++++----------------------------- include/net/udp.h | 38 +++---- 3 files changed, 123 insertions(+), 192 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10udp: bind() optimisationEric Dumazet
UDP bind() can be O(N^2) in some pathological cases. Thanks to secondary hash tables, we can make it O(N) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08udp: secondary hash on (local port, local address)Eric Dumazet
Extends udp_table to contain a secondary hash table. socket anchor for this second hash is free, because UDP doesnt use skc_bind_node : We define an union to hold both skc_bind_node & a new hlist_nulls_node udp_portaddr_node udp_lib_get_port() inserts sockets into second hash chain (additional cost of one atomic op) udp_lib_unhash() deletes socket from second hash chain (additional cost of one atomic op) Note : No spinlock lockdep annotation is needed, because lock for the secondary hash chain is always get after lock for primary hash chain. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08udp: add a counter into udp_hslotEric Dumazet
Adds a counter in udp_hslot to keep an accurate count of sockets present in chain. This will permit to upcoming UDP lookup algo to chose the shortest chain when secondary hash is added. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-07udp: dynamically size hash tables at boot timeEric Dumazet
UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for several setups. 4000 active UDP sockets -> 32 sockets per chain in average. An incoming frame has to lookup all sockets to find best match, so long chains hurt latency. Instead of a fixed size hash table that cant be perfect for every needs, let UDP stack choose its table size at boot time like tcp/ip route, using alloc_large_system_hash() helper Add an optional boot parameter, uhash_entries=x so that an admin can force a size between 256 and 65536 if needed, like thash_entries and rhash_entries. dmesg logs two new lines : [ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes) [ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes) Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non debugging spinlocks. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30net: Make setsockopt() optlen be unsigned.David S. Miller
This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12udpv4: Handle large incoming UDP/IPv4 packets and support software UFO.Sridhar Samudrala
- validate and forward GSO UDP/IPv4 packets from untrusted sources. - do software UFO if the outgoing device doesn't support UFO. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-11ipv6: Fix NULL pointer dereference with time-wait socketsVlad Yasevich
Commit b2f5e7cd3dee2ed721bf0675e1a1ddebb849aee6 (ipv6: Fix conflict resolutions during ipv6 binding) introduced a regression where time-wait sockets were not treated correctly. This resulted in the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000062 IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70 ... Call Trace: [<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6] [<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6] [<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400 [<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6] [<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0 [<ffffffff8056ed49>] sys_bind+0x89/0x100 [<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c [<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b Tested-by: Brian Haley <brian.haley@hp.com> Tested-by: Ed Tomlinson <edt@aei.ca> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24ipv6: Fix conflict resolutions during ipv6 bindingVlad Yasevich
The ipv6 version of bind_conflict code calls ipv6_rcv_saddr_equal() which at times wrongly identified intersections between addresses. It particularly broke down under a few instances and caused erroneous bind conflicts. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-16udp: Use hlist_nulls in UDP RCU codeEric Dumazet
This is a straightforward patch, using hlist_nulls infrastructure. RCUification already done on UDP two weeks ago. Using hlist_nulls permits us to avoid some memory barriers, both at lookup time and delete time. Patch is large because it adds new macros to include/net/sock.h. These macros will be used by TCP & DCCP in next patch. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29udp: introduce struct udp_table and multiple spinlocksEric Dumazet
UDP sockets are hashed in a 128 slots hash table. This hash table is protected by *one* rwlock. This rwlock is readlocked each time an incoming UDP message is handled. This rwlock is writelocked each time a socket must be inserted in hash table (bind time), or deleted from this table (close time) This is not scalable on SMP machines : 1) Even in read mode, lock() and unlock() are atomic operations and must dirty a contended cache line, shared by all cpus. 2) A writer might be starved if many readers are 'in flight'. This can happen on a machine with some NIC receiving many UDP messages. User process can be delayed a long time at socket creation/dismantle time. This patch prepares RCU migration, by introducing 'struct udp_table and struct udp_hslot', and using one spinlock per chain, to reduce contention on central rwlock. Introducing one spinlock per chain reduces latencies, for port randomization on heavily loaded UDP servers. This also speedup bindings to specific ports. udp_lib_unhash() was uninlined, becoming to big. Some cleanups were done to ease review of following patch (RCUification of UDP Unicast lookups) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07netns: make uplitev6 mib per/namespaceDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07netns: make udpv6 mib per/namespaceDenis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01udp: Export UDP socket lookup functionKOVACS Krisztian
The iptables tproxy code has to be able to do UDP socket hash lookups, so we have to provide an exported lookup function for this purpose. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18mib: put udplite statistics on struct netPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18mib: put udp statistics on struct netPavel Emelyanov
Similar to... ouch, I repeat myself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05MIB: add struct net to UDP6_INC_STATS_BHPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05MIB: add struct net to UDP6_INC_STATS_USERPavel Emelyanov
As simple as the patch #1 in this set. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05MIB: add struct net to UDP_INC_STATS_BHPavel Emelyanov
Two special cases here - one is rxrpc - I put init_net there explicitly, since we haven't touched this part yet. The second place is in __udp4_lib_rcv - we already have a struct net there, but I have to move its initialization above to make it ready at the "drop" label. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05MIB: add struct net to UDP_INC_STATS_USERPavel Emelyanov
Nothing special - all the places already have a struct sock at hands, so use the sock_net() net. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-13udp: reorder udp_iter_state to remove padding on 64bit buildsRichard Kennedy
reorder udp_iter_state to remove padding on 64bit builds shrinks from 24 to 16 bytes, moving to a smaller slab when CONFIG_NET_NS is undefined & seq_net_private = {} Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-05[IPV6]: inet_sk(sk)->cork.opt leakDenis V. Lunev
IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data actually. In this case ip_flush_pending_frames should be called instead of ip6_flush_pending_frames. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-31[SOCK][NETNS]: Add a struct net argument to sock_prot_inuse_add and _get.Pavel Emelyanov
This counter is about to become per-proto-and-per-net, so we'll need two arguments to determine which cell in this "table" to work with. All the places, but proc already pass proper net to it - proc will be tuned a bit later. Some indentation with spaces in proc files is done to keep the file coding style consistent. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28[UDP]: Remove owner from udp_seq_afinfo.Denis V. Lunev
Move it to udp_seq_afinfo->seq_fops as should be. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28[UDP]: Place file operations directly into udp_seq_afinfo.Denis V. Lunev
No need to have separate never-used variable. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28[UDP]: Move seq_ops from udp_iter_state to udp_seq_afinfo.Denis V. Lunev
No need to create seq_operations for each instance of 'netstat'. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28[UDP]: Replace struct net on udp_iter_state with seq_net_private.Denis V. Lunev
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22[UDP]: Make full use of proto.h.udp_hash innovation.Pavel Emelyanov
After this we have only udp_lib_get_port to get the port and two stubs for ipv4 and ipv6. No difference in udp and udplite except for initialized h.udp_hash member. I tried to find a graceful way to drop the only difference between udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison routine), but adding one more callback on the struct proto didn't appear such :( Maybe later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21[NETNS][IPV6] udp6 - make proc per namespaceDaniel Lezcano
The proc init/exit functions take a new network namespace parameter in order to register/unregister /proc/net/udp6 for a namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21[NETNS][IPV6] udp - make proc handle the network namespaceDaniel Lezcano
This patch makes the common udp proc functions to take care of which socket they should show taking into account the namespace it belongs. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NET]: prot_inuse cleanups and optimizationsEric Dumazet
1) Cleanups (all functions are prefixed by sock_prot_inuse) sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,-1) sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1) sock_prot_inuse() -> sock_prot_inuse_get() New functions : sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use. 2) if CONFIG_PROC_FS=n, we can zap 'inuse' member from "struct proto", since nobody wants to read the inuse value. This saves 1372 bytes on i386/SMP and some cpu cycles. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[UDP]: Add memory accounting.Hideo Aoki
Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[UDP]: Restore missing inDatagrams incrementsHerbert Xu
The previous move of the the UDP inDatagrams counter caused the counting of encapsulated packets, SUNRPC data (as opposed to call) packets and RXRPC packets to go missing. This patch restores all of these. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07[UDP]: Revert 2-pass hashing changes.David S. Miller
This reverts changesets: 6aaf47fa48d3c44280810b1b470261d340e4ed87 b7b5f487ab39bc10ed0694af35651a03d9cb97ff de34ed91c4ffa4727964a832c46e624dd1495cf5 fc038410b4b1643766f8033f4940bcdb1dace633 There are still some correctness issues recently discovered which do not have a known fix that doesn't involve doing a full hash table scan on port bind. So revert for now. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10[UDP]: Fix AF-specific references in AF-agnostic code.David S. Miller
__udp_lib_port_inuse() cannot make direct references to inet_sk(sk)->rcv_saddr as that is ipv4 specific state and this code is used by ipv6 too. Use an operations vector to solve this, and this also paves the way for ipv6 support for non-wild saddr hashing in UDP. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARYHerbert Xu
When a transmitted packet is looped back directly, CHECKSUM_PARTIAL maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should treat it as such in the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce skb_transport_header(skb)Arnaldo Carvalho de Melo
For the places where we need a pointer to the transport header, it is still legal to touch skb->h.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[UDP]: Clean up UDP-Lite receive checksumHerbert Xu
This patch eliminates some duplicate code for the verification of receive checksums between UDP-Lite and UDP. It does this by introducing __skb_checksum_complete_head which is identical to __skb_checksum_complete_head apart from the fact that it takes a length parameter rather than computing the first skb->len bytes. As a result UDP-Lite will be able to use hardware checksum offload for packets which do not use partial coverage checksums. It also means that UDP-Lite loopback no longer does unnecessary checksum verification. If any NICs start support UDP-Lite this would also start working automatically. This patch removes the assumption that msg_flags has MSG_TRUNC clear upon entry in recvmsg. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-04[PATCH] severing skbuff.h -> poll.hAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-12-02[UDP(-Lite)]: consolidate v4 and v6 get|setsockopt codeGerrit Renker
This patch consolidates set/getsockopt code between UDP(-Lite) v4 and 6. The justification is that UDP(-Lite) is a transport-layer protocol and therefore the socket option code (at least in theory) should be AF-independent. Furthermore, there is the following code reduplication: * do_udp{,v6}_getsockopt is 100% identical between v4 and v6 * do_udp{,v6}_setsockopt is identical up to the following differerence --v4 in contrast to v4 additionally allows the experimental encapsulation types UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE --the remainder is identical between v4 and v6 I believe that this difference is of little relevance. The advantages in not duplicating twice almost completely identical code. The patch further simplifies the interface of udp{,v6}_push_pending_frames, since for the second argument (struct udp_sock *up) it always holds that up = udp_sk(sk); where sk is the first function argument. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NET]: Fix assorted misannotations (from md5 and udplite merges).Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NET]: Annotate __skb_checksum_complete() and friends.Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NET]: annotate csum_ipv6_magic() callers in net/*Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NET]: Supporting UDP-Lite (RFC 3828) in LinuxGerrit Renker
This is a revision of the previously submitted patch, which alters the way files are organized and compiled in the following manner: * UDP and UDP-Lite now use separate object files * source file dependencies resolved via header files net/ipv{4,6}/udp_impl.h * order of inclusion files in udp.c/udplite.c adapted accordingly [NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828) This patch adds support for UDP-Lite to the IPv4 stack, provided as an extension to the existing UDPv4 code: * generic routines are all located in net/ipv4/udp.c * UDP-Lite specific routines are in net/ipv4/udplite.c * MIB/statistics support in /proc/net/snmp and /proc/net/udplite * shared API with extensions for partial checksum coverage [NET/IPv6]: Extension for UDP-Lite over IPv6 It extends the existing UDPv6 code base with support for UDP-Lite in the same manner as per UDPv4. In particular, * UDPv6 generic and shared code is in net/ipv6/udp.c * UDP-Litev6 specific extensions are in net/ipv6/udplite.c * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6 * support for IPV6_ADDRFORM * aligned the coding style of protocol initialisation with af_inet6.c * made the error handling in udpv6_queue_rcv_skb consistent; to return `-1' on error on all error cases * consolidation of shared code [NET]: UDP-Lite Documentation and basic XFRM/Netfilter support The UDP-Lite patch further provides * API documentation for UDP-Lite * basic xfrm support * basic netfilter support for IPv4 and IPv6 (LOG target) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22[UDP]: saddr_cmp function should take const socket pointersDavid S. Miller
This also kills a warning while building ipv6: net/ipv6/udp.c: In function ‘udp_v6_get_port’: net/ipv6/udp.c:66: warning: passing argument 3 of ‘udp_get_port’ from incompatible pointer type Signed-off-by: David S. Miller <davem@davemloft.net>