summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2008-06-23sunrpc: remove sv_kill_signal field from svc_serv structJeff Layton
Since we no longer make any distinction between shutdown signals with nfsd, then it becomes easier to just standardize on a particular signal to use to bring it down (SIGINT, in this case). Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23knfsd: convert knfsd to kthread APIJeff Layton
This patch is rather large, but I couldn't figure out a way to break it up that would remain bisectable. It does several things: - change svc_thread_fn typedef to better match what kthread_create expects - change svc_pool_map_set_cpumask to be more kthread friendly. Make it take a task arg and and get rid of the "oldmask" - have svc_set_num_threads call kthread_create directly - eliminate __svc_create_thread Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown ↵Neil Brown
locking. This removes the BKL from the RPC service creation codepath. The BKL really isn't adequate for this job since some of this info needs protection across sleeps. Also, add some comments to try and clarify how the locking should work and to make it clear that the BKL isn't necessary as long as there is adequate locking between tasks when touching the svc_serv fields. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-05-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: fix error path during early mount 9p: make cryptic unknown error from server less scary 9p: fix flags length in net 9p: Correct fidpool creation failure in p9_client_create 9p: use struct mutex instead of struct semaphore 9p: propagate parse_option changes to client and transports fs/9p/v9fs.c (v9fs_parse_options): Handle kstrdup and match_strdup failure. 9p: Documentation updates add match_strlcpy() us it to make v9fs make uname and remotename parsing more robust
2008-05-14net/irda/irnet/irnet_irda.c needs unaligned.hAndrew Morton
net/irda/irnet/irnet_irda.c: In function 'irnet_discovery_indication': net/irda/irnet/irnet_irda.c:1676: error: implicit declaration of function 'get_unaligned' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-149p: fix error path during early mountEric Van Hensbergen
There was some cleanup issues during early mount which would trigger a kernel bug for certain types of failure. This patch reorganizes the cleanup to get rid of the bad behavior. This also merges the 9pnet and 9pnet_fd modules for the purpose of configuration and initialization. Keeping the fd transport separate from the core 9pnet code seemed like a good idea at the time, but in practice has caused more harm and confusion than good. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-05-149p: make cryptic unknown error from server less scaryEric Van Hensbergen
Right now when we get an error string from the server that we can't map we report a cryptic error that actually makes it look like we are reporting a problem with the client. This changes the text of the log message to clarify where the error is coming from. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-05-149p: fix flags length in netSteven Rostedt
Some files in the net/9p directory uses "int" for flags. This can cause hard to find bugs on some architectures. This patch converts the flags to use "long" instead. This bug was discovered by doing an allyesconfig make on the -rt kernel where checks are done to ensure all flags are of size sizeof(long). Signed-off-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-05-149p: Correct fidpool creation failure in p9_client_createJosef 'Jeff' Sipek
On error, p9_idpool_create returns an ERR_PTR-encoded errno. Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-05-149p: use struct mutex instead of struct semaphoreJosef 'Jeff' Sipek
Replace semaphores protecting use flags with a mutex. Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-05-149p: propagate parse_option changes to client and transportsEric Van Hensbergen
Propagate changes that were made to the parse_options code to the other parse options pieces present in the other modules. Looks like the client parse options was probably corrupting the parse string and causing problems for others. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-05-149p: Documentation updatesEric Van Hensbergen
The kernel-doc comments of much of the 9p system have been in disarray since reorganization. This patch fixes those problems, adds additional documentation and a template book which collects the 9p information. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-05-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (73 commits) net: Fix typo in net/core/sock.c. ppp: Do not free not yet unregistered net device. netfilter: xt_iprange: module aliases for xt_iprange netfilter: ctnetlink: dump conntrack ID in event messages irda: Fix a misalign access issue. (v2) sctp: Fix use of uninitialized pointer cipso: Relax too much careful cipso hash function. tcp FRTO: work-around inorder receivers tcp FRTO: Fix fallback to conventional recovery New maintainer for Intel ethernet adapters DM9000: Use delayed work to update MII PHY state DM9000: Update and fix driver debugging messages DM9000: Add __devinit and __devexit attributes to probe and remove sky2: fix simple define thinko [netdrvr] sfc: sfc: Add self-test support [netdrvr] sfc: Increment rx_reset when reported as driver event [netdrvr] sfc: Remove unused macro EFX_XAUI_RETRAIN_MAX [netdrvr] sfc: Fix code formatting [netdrvr] sfc: Remove kernel-doc comments for removed members of struct efx_nic [netdrvr] sfc: Remove garbage from comment ...
2008-05-14net: Fix typo in net/core/sock.c.Rami Rosen
In sock_queue_rcv_skb() (net/core/sock.c) it should be: "Cast sk->rcvbuf ..." instead of: "Cast skb->rcvbuf ..." Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-13netfilter: xt_iprange: module aliases for xt_iprangePhil Oester
Using iptables 1.3.8 with kernel 2.6.25, rules which include '-m iprange' don't automatically pull in xt_iprange module. Below patch adds module aliases to fix that. Patch against latest -git, but seems like a good candidate for -stable also. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-13netfilter: ctnetlink: dump conntrack ID in event messagesEric Leblond
Conntrack ID is not put (anymore ?) in event messages. This causes current ulogd2 code to fail because it uses the ID to build a hash in userspace. This hash is used to be able to output the starting time of a connection. Conntrack ID can be used in userspace application to maintain an easy match between kernel connections list and userspace one. It may worth to add it if there is no performance related issue. [ Patrick: it was never included in events, but really should be ] Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-13irda: Fix a misalign access issue. (v2)Graf Yang
Replace u16ho with put/get_unaligned functions Signed-off-by: Graf Yang <graf.yang@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-13sctp: Fix use of uninitialized pointerPatrick McHardy
Introduced by c4492586 (sctp: Add address type check while process paramaters of ASCONF chunk): net/sctp/sm_make_chunk.c: In function 'sctp_process_asconf': net/sctp/sm_make_chunk.c:2828: warning: 'addr_param' may be used uninitialized in this function net/sctp/sm_make_chunk.c:2828: note: 'addr_param' was declared here Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-13cipso: Relax too much careful cipso hash function.Pavel Emelyanov
The cipso_v4_cache is allocated to contain CIPSO_V4_CACHE_BUCKETS buckets. The CIPSO_V4_CACHE_BUCKETS = 1 << CIPSO_V4_CACHE_BUCKETBITS, where CIPSO_V4_CACHE_BUCKETBITS = 7. The bucket-selection function for this hash is calculated like this: bkt = hash & (CIPSO_V4_CACHE_BUCKETBITS - 1); ^^^ i.e. picking only 4 buckets of possible 128 :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-13tcp FRTO: work-around inorder receiversIlpo Järvinen
If receiver consumes segments successfully only in-order, FRTO fallback to conventional recovery produces RTO loop because FRTO's forward transmissions will always get dropped and need to be resent, yet by default they're not marked as lost (which are the only segments we will retransmit in CA_Loss). Price to pay about this is occassionally unnecessarily retransmitting the forward transmission(s). SACK blocks help a bit to avoid this, so it's mainly a concern for NewReno case though SACK is not fully immune either. This change has a side-effect of fixing SACKFRTO problem where it didn't have snd_nxt of the RTO time available anymore when fallback become necessary (this problem would have only occured when RTO would occur for two or more segments and ECE arrives in step 3; no need to figure out how to fix that unless the TODO item of selective behavior is considered in future). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Damon L. Chesser <damon@damtek.com> Tested-by: Damon L. Chesser <damon@damtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-13tcp FRTO: Fix fallback to conventional recoveryIlpo Järvinen
It seems that commit 009a2e3e4ec ("[TCP] FRTO: Improve interoperability with other undo_marker users") run into another land-mine which caused fallback to conventional recovery to break: 1. Cumulative ACK arrives after FRTO retransmission 2. tcp_try_to_open sees zero retrans_out, clears retrans_stamp which should be kept like in CA_Loss state it would be 3. undo_marker change allowed tcp_packet_delayed to return true because of the cleared retrans_stamp once FRTO is terminated causing LossUndo to occur, which means all loss markings FRTO made are reverted. This means that the conventional recovery basically recovered one loss per RTT, which is not that efficient. It was quite unobvious that the undo_marker change broken something like this, I had a quite long session to track it down because of the non-intuitiviness of the bug (luckily I had a trivial reproducer at hand and I was also able to learn to use kprobes in the process as well :-)). This together with the NewReno+FRTO fix and FRTO in-order workaround this fixes Damon's problems, this and the first mentioned are enough to fix Bugzilla #10063. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Damon L. Chesser <damon@damtek.com> Tested-by: Damon L. Chesser <damon@damtek.com> Tested-by: Sebastian Hyrwall <zibbe@cisko.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12mac80211: Use skb_header_cloned() on TX path.David S. Miller
When skb_header_cloned() returns false you can change the headers however you like. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12mac80211: assign needed_headroom/tailroom for netdevsJohannes Berg
This assigns the netdev's needed_headroom/tailroom members to take advantage of pre-allocated space for 802.11 headers. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12net: Allow netdevices to specify needed head/tailroomJohannes Berg
This patch adds needed_headroom/needed_tailroom members to struct net_device and updates many places that allocate sbks to use them. Not all of them can be converted though, and I'm sure I missed some (I mostly grepped for LL_RESERVED_SPACE) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12mac80211: add missing newlines in printk()Pavel Roskin
Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-12mac80211: fix association with some APsHelmut Schaa
Some APs refuse association if the supported rates contained in the association request do not match its own supported rates. This patch introduces a new function which builds the intersection between the AP's supported rates and the client's supported rates to work around such problems. The same approach is already used in ipw2200 for example. Signed-off-by: Helmut Schaa <hschaa@suse.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-12Fix potential scheduling while atomic in mesh_path_add.Pavel Emelyanov
Calling synchronize_rcu() under write-lock-ed pathtbl_resize_lock may result in this warning (and other side effects). It looks safe just dropping this lock before calling synchronize_rcu. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-12Fix not checked kmalloc() result.Pavel Emelyanov
The new_node kmallocation is not checked for success, so add this check. BTW, it also happens under the read_lock. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-12Fix GFP_KERNEL allocation under read lock.Pavel Emelyanov
The mesh_path_add() read-locks the pathtbl_resize_lock and calls kmalloc with GFP_KERNEL mask. Fix it and move the endadd2 label lower. It should be _before_ the if() beyond, but it makes no sense for it being there, so I move it right after this if(). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-12mac80211: mesh hwmp: fix kfree(skb)Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-12mac80211: fix access to null skbLuis Carlos Cobo
Without this patch, if xmit_skb is null but net_ratelimit() returns 0 we would go to the else branch and access the null xmit_skb. Pointed out by Johannes Berg. Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-12mac80211: fix incorrect mesh header lengthLuis Carlos Cobo
This should have been updated at the same time we were transitioning from 3 byte to 4 byte mesh sequence number. Pointed out by Johannes Berg. Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-12mac80211: Don't encrypt beaconsIvo van Doorn
mac80211 should set the IEEE80211_TX_CTL_DO_NOT_ENCRYPT flag in tx_control structure to inform drivers not to encrypt the beacon. Drivers that only check for that flag before accessing the hw_key field, will otherwise cause a NULL pointer dereference since that field is not configured for beacons. Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-12mac80211: fix debugfs default key oopsJohannes Berg
Under certain circumstances (in AP mode) the debugfs function that is supposed to add the default key symlink can encounter a NULL default_key pointer. This patch makes it handle that situtation gracefully. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-12fix irq flags in mac80211 codeSteven Rostedt
A file in the net/mac80211 directory uses "int" for flags. This can cause hard to find bugs on some architectures. This patch converts the flags to use "long" instead. This bug was discovered by doing an allyesconfig make on the -rt kernel where checks are done to ensure all flags are of size sizeof(long). Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-12sctp: Add address type check while process paramaters of ASCONF chunkWei Yongjun
If socket is create by AF_INET type, add IPv6 address to asoc will cause kernel panic while packet is transmitted on that transport. This patch add address type check before process paramaters of ASCONF chunk. If peer is not support this address type, return with error invald parameter. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12sctp: Do not enable peer IPv6 address support on PF_INET socketWei Yongjun
If socket is create by PF_INET type, it can not used IPv6 address to send/recv DATA, So we can not used IPv6 address even if peer tell us it support IPv6 address. This patch fix to only enabled peer IPv6 address support on PF_INET6 socket. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: sit: Add missing kfree_skb() on pskb_may_pull() failure. tipc: Increase buffer header to support worst-case device
2008-05-08sit: Add missing kfree_skb() on pskb_may_pull() failure.David S. Miller
Noticed by Paul Marks <paul@pmarks.net>. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08tipc: Increase buffer header to support worst-case deviceAllan Stephens
This patch increases the headroom TIPC reserves in each sk_buff to accommodate the largest possible link level device header. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits) net: Added ASSERT_RTNL() to dev_open() and dev_close(). can: Fix can_send() handling on dev_queue_xmit() failures netns: Fix arbitrary net_device-s corruptions on net_ns stop. netfilter: Kconfig: default DCCP/SCTP conntrack support to the protocol config values netfilter: nf_conntrack_sip: restrict RTP expect flushing on error to last request macvlan: Fix memleak on device removal/crash on module removal net/ipv4: correct RFC 1122 section reference in comment tcp FRTO: SACK variant is errorneously used with NewReno e1000e: don't return half-read eeprom on error ucc_geth: Don't use RX clock as TX clock. cxgb3: Use CAP_SYS_RAWIO for firmware pcnet32: delete non NAPI code from driver. fs_enet: Fix a memory leak in fs_enet_mdio_probe [netdrvr] eexpress: IPv6 fails - multicast problems 3c59x: use netstats in net_device structure 3c980-TX needs EXTRA_PREAMBLE fix warning in drivers/net/appletalk/cops.c e1000e: Add support for BM PHYs on ICH9 uli526x: fix endianness issues in the setup frame uli526x: initialize the hardware prior to requesting interrupts ...
2008-05-08Remove duplicated include in net/sunrpc/svc.cHuang Weiyi
<linux/sched.h> we included twice. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-08fix irq flags in mac80211 codeSteven Rostedt
A file in the net/mac80211 directory uses "int" for flags. This can cause hard to find bugs on some architectures. This patch converts the flags to use "long" instead. This bug was discovered by doing an allyesconfig make on the -rt kernel where checks are done to ensure all flags are of size sizeof(long). Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: "John W. Linville" <linville@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-08net: Added ASSERT_RTNL() to dev_open() and dev_close().Ben Hutchings
dev_open() and dev_close() must be called holding the RTNL, since they call device functions and netdevice notifiers that are promised the RTNL. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08can: Fix can_send() handling on dev_queue_xmit() failuresOliver Hartkopp
The tx packet counting and the local loopback of CAN frames should only happen in the case that the CAN frame has been enqueued to the netdevice tx queue successfully. Thanks to Andre Naujoks <nautsch@gmail.com> for reporting this issue. Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Urs Thuermann <urs@isnogud.escape.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08netns: Fix arbitrary net_device-s corruptions on net_ns stop.Pavel Emelyanov
When a net namespace is destroyed, some devices (those, not killed on ns stop explicitly) are moved back to init_net. The problem, is that this net_ns change has one point of failure - the __dev_alloc_name() may be called if a name collision occurs (and this is easy to trigger). This allocator performs a likely-to-fail GFP_ATOMIC allocation to find a suitable number. Other possible conditions that may cause error (for device being ns local or not registered) are always false in this case. So, when this call fails, the device is unregistered. But this is *not* the right thing to do, since after this the device may be released (and kfree-ed) improperly. E. g. bridges require more actions (sysfs update, timer disarming, etc.), some other devices want to remove their private areas from lists, etc. I. e. arbitrary use-after-free cases may occur. The proposed fix is the following: since the only reason for the dev_change_net_namespace to fail is the name generation, we may give it a unique fall-back name w/o %d-s in it - the dev<ifindex> one, since ifindexes are still unique. So make this change, raise the failure-case printk loglevel to EMERG and replace the unregister_netdevice call with BUG(). [ Use snprintf() -DaveM ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08netfilter: Kconfig: default DCCP/SCTP conntrack support to the protocol ↵Patrick McHardy
config values When conntrack and DCCP/SCTP protocols are enabled, chances are good that people also want DCCP/SCTP conntrack and NAT support. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08netfilter: nf_conntrack_sip: restrict RTP expect flushing on error to last ↵Patrick McHardy
request Some Inovaphone PBXs exhibit very stange behaviour: when dialing for example "123", the device sends INVITE requests for "1", "12" and "123" back to back. The first requests will elicit error responses from the receiver, causing the SIP helper to flush the RTP expectations even though we might still see a positive response. Note the sequence number of the last INVITE request that contained a media description and only flush the expectations when receiving a negative response for that sequence number. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08net/ipv4: correct RFC 1122 section reference in commentJ.H.M. Dassen (Ray)
RFC 1122 does not have a section 3.1.2.2. The requirement to silently discard datagrams with a bad checksum is in section 3.2.1.2 instead. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=10611 Signed-off-by: J.H.M. Dassen (Ray) <jdassen@debian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08tcp FRTO: SACK variant is errorneously used with NewRenoIlpo Järvinen
Note: there's actually another bug in FRTO's SACK variant, which is the causing failure in NewReno case because of the error that's fixed here. I'll fix the SACK case separately (it's a separate bug really, though related, but in order to fix that I need to audit tp->snd_nxt usage a bit). There were two places where SACK variant of FRTO is getting incorrectly used even if SACK wasn't negotiated by the TCP flow. This leads to incorrect setting of frto_highmark with NewReno if a previous recovery was interrupted by another RTO. An eventual fallback to conventional recovery then incorrectly considers one or couple of segments as forward transmissions though they weren't, which then are not LOST marked during fallback making them "non-retransmittable" until the next RTO. In a bad case, those segments are really lost and are the only one left in the window. Thus TCP needs another RTO to continue. The next FRTO, however, could again repeat the same events making the progress of the TCP flow extremely slow. In order for these events to occur at all, FRTO must occur again in FRTOs step 3 while the key segments must be lost as well, which is not too likely in practice. It seems to most frequently with some small devices such as network printers that *seem* to accept TCP segments only in-order. In cases were key segments weren't lost, things get automatically resolved because those wrongly marked segments don't need to be retransmitted in order to continue. I found a reproducer after digging up relevant reports (few reports in total, none at netdev or lkml I know of), some cases seemed to indicate middlebox issues which seems now to be a false assumption some people had made. Bugzilla #10063 _might_ be related. Damon L. Chesser <damon@damtek.com> had a reproducable case and was kind enough to tcpdump it for me. With the tcpdump log it was quite trivial to figure out. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>