summaryrefslogtreecommitdiffstats
path: root/net/dccp
AgeCommit message (Collapse)Author
2011-08-07Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-08-06net: Compute protocol sequence numbers and fragment IDs using MD5.David S. Miller
Computers have become a lot faster since we compromised on the partial MD4 hash which we use currently for performance reasons. MD5 is a much safer choice, and is inline with both RFC1948 and other ISS generators (OpenBSD, Solaris, etc.) Furthermore, only having 24-bits of the sequence number be truly unpredictable is a very serious limitation. So the periodic regeneration and 8-bit counter have been removed. We compute and use a full 32-bit sequence number. For ipv6, DCCP was found to use a 32-bit truncated initial sequence number (it needs 43-bits) and that is fixed here as well. Reported-by: Dan Kaminsky <dan@doxpara.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01dccp ccid-2: check Ack Ratio when reducing cwndSamuel Jero
This patch causes CCID-2 to check the Ack Ratio after reducing the congestion window. If the Ack Ratio is greater than the congestion window, it is reduced. This prevents timeouts caused by an Ack Ratio larger than the congestion window. In this situation, we choose to set the Ack Ratio to half the congestion window (or one if that's zero) so that if we loose one ack we don't trigger a timeout. Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-08-01dccp ccid-2: increment cwnd correctlySamuel Jero
This patch fixes an issue where CCID-2 will not increase the congestion window for numerous RTTs after an idle period, application-limited period, or a loss once the algorithm is in Congestion Avoidance. What happens is that, when CCID-2 is in Congestion Avoidance mode, it will increase hc->tx_packets_acked by one for every packet and will increment cwnd every cwnd packets. However, if there is now an idle period in the connection, cwnd will be reduced, possibly below the slow start threshold. This will cause the connection to go into Slow Start. However, in Slow Start CCID-2 performs this test to increment cwnd every second ack: ++hc->tx_packets_acked == 2 Unfortunately, this will be incorrect, if cwnd previous to the idle period was larger than 2 and if tx_packets_acked was close to cwnd. For example: cwnd=50 and tx_packets_acked=45. In this case, the current code, will increment tx_packets_acked until it equals two, which will only be once tx_packets_acked (an unsigned 32-bit integer) overflows. My fix is simply to change that test for tx_packets_acked greater than or equal to two in slow start. Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-08-01dccp ccid-2: prevent cwnd > Sequence WindowSamuel Jero
Add a check to prevent CCID-2 from increasing the cwnd greater than the Sequence Window. When the congestion window becomes bigger than the Sequence Window, CCID-2 will attempt to keep more data in the network than the DCCP Sequence Window code considers possible. This results in the Sequence Window code issuing a Sync, thereby inducing needless overhead. Further, if this occurs at the sender, CCID-2 will never detect the problem because the Acks it receives will indicate no losses. I have seen this cause a drop of 1/3rd in throughput for a connection. Also add code to adjust the Sequence Window to be about 5 times the number of packets in the network (RFC 4340, 7.5.2) and to adjust the Ack Ratio so that the remote Sequence Window will hold about 5 times the number of packets in the network. This allows the congestion window to increase correctly without being limited by the Sequence Window. Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-08-01dccp ccid-2: use feature-negotiation to report Ack Ratio changesGerrit Renker
This uses the new feature-negotiation framework to signal Ack Ratio changes, as required by RFC 4341, sec. 6.1.2. That raises some problems with CCID-2, which at the moment can not cope gracefully with Ack Ratios > 1. Since these issues are not directly related to feature negotiation, they are marked by a FIXME. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
2011-08-01dccp: send Confirm options only onceSamuel Jero
If a connection is in the OPEN state, remove feature negotiation Confirm options from the list of options after sending them once; as such options are NOT supposed to be retransmitted and are ONLY supposed to be sent in response to a Change option (RFC 4340 6.2). Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-08-01dccp: support for exchanging of NN options in established state 2/2Gerrit Renker
This patch adds the receiver side and the (fast-path) activation part for dynamic changes of non-negotiable (NN) parameters in (PART)OPEN state. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
2011-08-01dccp: support for the exchange of NN options in established state 1/2Gerrit Renker
In contrast to static feature negotiation at the begin of a connection, this patch introduces support for exchange of dynamically changing options. Such an update/exchange is necessary in at least two cases: * CCID-2's Ack Ratio (RFC 4341, 6.1.2) which changes during the connection; * Sequence Window values that, as per RFC 4340, 7.5.2, should be sent "as the connection progresses". Both are non-negotiable (NN) features, which means that no new capabilities are negotiated, but rather that changes in known parameters are brought up-to-date at either end. Thse characteristics are reflected by the implementation: * only NN options can be exchanged after connection setup; * an ack is scheduled directly after activation to speed up the update; * CCIDs may request changes to an NN feature even if a negotiation for that feature is already underway: this is required by CCID-2, where changes in cwnd necessitate Ack Ratio changes, such that the previous Ack Ratio (which is still being negotiated) would cause irrecoverable RTO timeouts (thanks to work by Samuel Jero). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
2011-07-04dccp ccid-2: Perform congestion-window validationGerrit Renker
CCID-2's cwnd increases like TCP during slow-start, which has implications for * the local Sequence Window value (should be > cwnd), * the Ack Ratio value. Hence an exponential growth, if it does not reflect the actual network conditions, can quickly lead to instability. This patch adds congestion-window validation (RFC2861) to CCID-2: * cwnd is constrained if the sender is application limited; * cwnd is reduced after a long idle period, as suggested in the '90 paper by Van Jacobson, in RFC 2581 (sec. 4.1); * cwnd is never reduced below the RFC 3390 initial window. As marked in the comments, the code is actually almost a direct copy of the TCP congestion-window-validation algorithms. By continuing this work, it may in future be possible to use the TCP code (not possible at the moment). The mechanism can be turned off using a module parameter. Sampling of the currently-used window (moving-maximum) is however done constantly; this is used to determine the expected window, which can be exploited to regulate DCCP's Sequence Window value. This patch also sets slow-start-after-idle (RFC 4341, 5.1), i.e. it behaves like TCP when net.ipv4.tcp_slow_start_after_idle = 1. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-07-04dccp ccid-2: Use existing function to test for data packetsGerrit Renker
This replaces a switch statement with a test, using the equivalent function dccp_data_packet(skb). It also doubles the range of the field `rx_num_data_pkts' by changing the type from `int' to `u32', avoiding signed/unsigned comparison with the u16 field `dccps_r_ack_ratio'. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-07-04dccp ccid-2: move rfc 3390 function into header fileGerrit Renker
This moves CCID-2's initial window function into the header file, since several parts throughout the CCID-2 code need to call it (CCID-2 still uses RFC 3390). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Leandro Melo de Sales <leandro@ic.ufal.br>
2011-07-04dccp: cosmetics of info messageGerrit Renker
Change the CCID (de)activation message to start with the protocol name, as 'CCID' is already in there. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-07-04dccp: combine the functionality of enqeueing and cloningGerrit Renker
Realising the following call pattern, * first dccp_entail() is called to enqueue a new skb and * then skb_clone() is called to transmit a clone of that skb, this patch integrates both into the same function. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-07-04dccp: Clean up slow-path input processingGerrit Renker
This patch rearranges the order of statements of the slow-path input processing (i.e. any other state than OPEN), to resolve the following issues. 1. Dependencies: the order of statements now better matches RFC 4340, 8.5, i.e. step 7 is before step 9 (previously 9 was before 7), and parsing options in step 8 (which may consume resources) now comes after step 7. 2. Sequence number checks are omitted if in state LISTEN/REQUEST, due to the note underneath the table in RFC 4340, 7.5.3. As a result, CCID processing is now indeed confined to OPEN/PARTOPEN states, i.e. congestion control is performed only on the flow of data packets. This avoids pathological cases of doing congestion control on those messages which set up and terminate the connection. 3. Packets are now passed on to Ack Vector / CCID processing only after - step 7 (receive unexpected packets), - step 9 (receive Reset), - step 13 (receive CloseReq), - step 14 (receive Close) and only if the state is PARTOPEN. This simplifies CCID processing: - in LISTEN/CLOSED the CCIDs are non-existent; - in RESPOND/REQUEST the CCIDs have not yet been negotiated; - in CLOSEREQ and active-CLOSING the node has already closed this socket; - in passive-CLOSING the client is waiting for its Reset. In the last case, RFC 4340, 8.3 leaves it open to ignore further incoming data, which is the approach taken here. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-05-18ipv4: Make caller provide flowi4 key to inet_csk_route_req().David S. Miller
This way the caller can get at the fully resolved fl4->{daddr,saddr} etc. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-11Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-3.6 Conflicts: drivers/net/benet/be_main.c
2011-05-08inet: Pass flowi to ->queue_xmit().David S. Miller
This allows us to acquire the exact route keying information from the protocol, however that might be managed. It handles all of the possibilities, from the simplest case of storing the key in inet->cork.fl to the more complex setup SCTP has where individual transports determine the flow. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.David S. Miller
Operation order is now transposed, we first create the child socket then we try to hook up the route. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08dccp: Use cork flow in dccp_v4_connect()David S. Miller
Since this is invoked from inet_stream_connect() the socket is locked and therefore this usage is safe. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-06dccp: handle invalid feature options lengthDan Rosenberg
A length of zero (after subtracting two for the type and len fields) for the DCCPO_{CHANGE,CONFIRM}_{L,R} options will cause an underflow due to the subtraction. The subsequent code may read past the end of the options value buffer when parsing. I'm unsure of what the consequences of this might be, but it's probably not good. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: stable@kernel.org Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03dccp: Use flowi4->saddr in dccp_v4_connect()David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28ipv4: Get route daddr from flow key in dccp_v4_connect().David S. Miller
Now that output route lookups update the flow with destination address selection, we can fetch it from fl4->daddr instead of rt->rt_dst Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28inet: add RCU protection to inet->optEric Dumazet
We lack proper synchronization to manipulate inet->opt ip_options Problem is ip_make_skb() calls ip_setup_cork() and ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options), without any protection against another thread manipulating inet->opt. Another thread can change inet->opt pointer and free old one under us. Use RCU to protect inet->opt (changed to inet->inet_opt). Instead of handling atomic refcounts, just copy ip_options when necessary, to avoid cache line dirtying. We cant insert an rcu_head in struct ip_options since its included in skb->cb[], so this patch is large because I had to introduce a new ip_options_rcu structure. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-27ipv4: Sanitize and simplify ip_route_{connect,newports}()David S. Miller
These functions are used together as a unit for route resolution during connect(). They address the chicken-and-egg problem that exists when ports need to be allocated during connect() processing, yet such port allocations require addressing information from the routing code. It's currently more heavy handed than it needs to be, and in particular we allocate and initialize a flow object twice. Let the callers provide the on-stack flow object. That way we only need to initialize it once in the ip_route_connect() call. Later, if ip_route_newports() needs to do anything, it re-uses that flow object as-is except for the ports which it updates before the route re-lookup. Also, describe why this set of facilities are needed and how it works in a big comment. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-04-22inet: constify ip headers and in6_addrEric Dumazet
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-12net: Put fl6_* macros to struct flowi6 and use them again.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12ipv6: Convert to use flowi6 where applicable.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12net: Put fl4_* macros to struct flowi4 and use them again.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12ipv4: Use flowi4 in public route lookup interfaces.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12net: Make flowi ports AF dependent.David S. Miller
Create two sets of port member accessors, one set prefixed by fl4_* and the other prefixed by fl6_* This will let us to create AF optimal flow instances. It will work because every context in which we access the ports, we have to be fully aware of which AF the flowi is anyways. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12net: Put flowi_* prefix on AF independent members of struct flowiDavid S. Miller
I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common. This is the first step to move in that direction. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h
2011-03-02ipv4: Make output route lookup return rtable directly.David S. Miller
Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01dccp: fix oops on Reset after closeGerrit Renker
This fixes a bug in the order of dccp_rcv_state_process() that still permitted reception even after closing the socket. A Reset after close thus causes a NULL pointer dereference by not preventing operations on an already torn-down socket. dccp_v4_do_rcv() | | state other than OPEN v dccp_rcv_state_process() | | DCCP_PKT_RESET v dccp_rcv_reset() | v dccp_time_wait() WARNING: at net/ipv4/inet_timewait_sock.c:141 __inet_twsk_hashdance+0x48/0x128() Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah [<c0038850>] (unwind_backtrace+0x0/0xec) from [<c0055364>] (warn_slowpath_common) [<c0055364>] (warn_slowpath_common+0x4c/0x64) from [<c0055398>] (warn_slowpath_n) [<c0055398>] (warn_slowpath_null+0x1c/0x24) from [<c02b72d0>] (__inet_twsk_hashd) [<c02b72d0>] (__inet_twsk_hashdance+0x48/0x128) from [<c031caa0>] (dccp_time_wai) [<c031caa0>] (dccp_time_wait+0x40/0xc8) from [<c031c15c>] (dccp_rcv_state_proces) [<c031c15c>] (dccp_rcv_state_process+0x120/0x538) from [<c032609c>] (dccp_v4_do_) [<c032609c>] (dccp_v4_do_rcv+0x11c/0x14c) from [<c0286594>] (release_sock+0xac/0) [<c0286594>] (release_sock+0xac/0x110) from [<c031fd34>] (dccp_close+0x28c/0x380) [<c031fd34>] (dccp_close+0x28c/0x380) from [<c02d9a78>] (inet_release+0x64/0x70) The fix is by testing the socket state first. Receiving a packet in Closed state now also produces the required "No connection" Reset reply of RFC 4340, 8.3.1. Reported-and-tested-by: Johan Hovold <jhovold@gmail.com> Cc: stable@kernel.org Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv4: Kill can_sleep arg to ip_route_output_flow()David S. Miller
This boolean state is now available in the flow flags. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep"David S. Miller
Since that is what the current vague "flags" argument means. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv4: Can final ip_route_connect() arg to boolean "can_sleep".David S. Miller
Since that's what the current vague "flags" thing means. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv6: Consolidate route lookup sequences.David S. Miller
Route lookups follow a general pattern in the ipv6 code wherein we first find the non-IPSEC route, potentially override the flow destination address due to ipv6 options settings, and then finally make an IPSEC search using either xfrm_lookup() or __xfrm_lookup(). __xfrm_lookup() is used when we want to generate a blackhole route if the key manager needs to resolve the IPSEC rules (in this case -EREMOTE is returned and the original 'dst' is left unchanged). Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC resolution is necessary, we simply fail the lookup completely. All of these cases are encapsulated into two routines, ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which handles unconnected UDP datagram sockets. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-25dccp: newdp is declared/assigned but never be usedHagen Paul Pfeifer
Declaration and assignment of newdp is removed. Usage of dccp_sk() exhibit no side effects. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-24ipv4: Rearrange how ip_route_newports() gets port keys.David S. Miller
ip_route_newports() is the only place in the entire kernel that cares about the port members in the routing cache entry's lookup flow key. Therefore the only reason we store an entire flow inside of the struct rtentry is for this one special case. Rewrite ip_route_newports() such that: 1) The caller passes in the original port values, so we don't need to use the rth->fl.fl_ip_{s,d}port values to remember them. 2) The lookup flow is constructed by hand instead of being copied from the routing cache entry's flow. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02tcp: Increase the initial congestion window to 10.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Nandita Dukkipati <nanditad@google.com>
2011-01-13Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-07dccp: make upper bound for seq_window consistent on 32/64 bitGerrit Renker
The 'seq_window' sysctl sets the initial value for the DCCP Sequence Window, which may range from 32..2^46-1 (RFC 4340, 7.5.2). The patch sets the upper bound consistently to 2^32-1 on both 32 and 64 bit systems, which should be sufficient - with a RTT of 1sec and 1-byte packets, a seq_window of 2^32-1 corresponds to a link speed of 34 Gbps. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-01-07dccp: fix bug in updating the GSRSamuel Jero
Currently dccp_check_seqno allows any valid packet to update the Greatest Sequence Number Received, even if that packet's sequence number is less than the current GSR. This patch adds a check to make sure that the new packet's sequence number is greater than GSR. Signed-off-by: Samuel Jero <sj323707@ohio.edu> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-01-07dccp: fix return value for sequence-invalid packetsSamuel Jero
Currently dccp_check_seqno returns 0 (indicating a valid packet) if the acknowledgment number is out of bounds and the sync that RFC 4340 mandates at this point is currently being rate-limited. This function should return -1, indicating an invalid packet. Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-12-22Merge branch 'master' into for-nextJiri Kosina
Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.
2010-12-10dccp: remove unused macrosShan Wei
Remove macros which have been unused since the initial implementation (commit 7c657876b63cb1d8a2ec06f8fc6c37bb8412e66c, [DCCP]: Initial implementation from Tue Aug 9 20:14:34 2005 -0700). Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-12-08Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath/ath9k/ar9003_eeprom.c net/llc/af_llc.c