summaryrefslogtreecommitdiffstats
path: root/net/ipv4
AgeCommit message (Collapse)Author
2011-10-24TCP: remove TCP_DEBUGFlavio Leitner
It was enabled by default and the messages guarded by the define are useful. Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAITEric Dumazet
There is a long standing bug in linux tcp stack, about ACK messages sent on behalf of TIME_WAIT sockets. In the IP header of the ACK message, we choose to reflect TOS field of incoming message, and this might break some setups. Example of things that were broken : - Routing using TOS as a selector - Firewalls - Trafic classification / shaping We now remember in timewait structure the inet tos field and use it in ACK generation, and route lookup. Notes : - We still reflect incoming TOS in RST messages. - We could extend MuraliRaja Muniraju patch to report TOS value in netlink messages for TIME_WAIT sockets. - A patch is needed for IPv6 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24tcp: md5: add more const attributesEric Dumazet
Now tcp_md5_hash_header() has a const tcphdr argument, we can add more const attributes to callers. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24tcp: md5: dont write skb head in tcp_md5_hash_header()Eric Dumazet
tcp_md5_hash_header() writes into skb header a temporary zero value, this might confuse other users of this area. Since tcphdr is small (20 bytes), copy it in a temporary variable and make the change in the copy. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-22net: use INET_ECN_MASK instead of hardcoded 3Maciej Żenczykowski
Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21tcp: add const qualifiers where possibleEric Dumazet
Adding const qualifiers to pointers can ease code review, and spot some bugs. It might allow compiler to optimize code further. For example, is it legal to temporary write a null cksum into tcphdr in tcp_md5_hash_header() ? I am afraid a sniffer could catch the temporary null value... Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20net: allow CAP_NET_RAW to set socket options IP{,V6}_TRANSPARENTMaciej Żenczykowski
Up till now the IP{,V6}_TRANSPARENT socket options (which actually set the same bit in the socket struct) have required CAP_NET_ADMIN privileges to set or clear the option. - we make clearing the bit not require any privileges. - we allow CAP_NET_ADMIN to set the bit (as before this change) - we allow CAP_NET_RAW to set this bit, because raw sockets already pretty much effectively allow you to emulate socket transparency. Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20tcp: remove unused tcp_fin() parametersEric Dumazet
tcp_fin() only needs socket pointer, we can remove skb and th params. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20tcp: use TCP_DEFAULT_INIT_RCVWND in tcp_fixup_rcvbuf()Eric Dumazet
Since commit 356f039822b (TCP: increase default initial receive window.), we allow sender to send 10 (TCP_DEFAULT_INIT_RCVWND) segments. Change tcp_fixup_rcvbuf() to reflect this change, even if no real change is expected, since sysctl_tcp_rmem[1] = 87380 and this value is bigger than tcp_fixup_rcvbuf() computed rcvmem (~23720) Note: Since commit 356f039822b limited default window to maximum of 10*1460 and 2*MSS, we use same heuristic in this patch. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19ipv4: compat_ioctl is local to af_inet.c, make it staticGerrit Renker
ipv4: compat_ioctl is local to af_inet.c, make it static Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19tcp: use TCP_INIT_CWND in tcp_fixup_sndbuf()Eric Dumazet
Initial cwnd being 10 (TCP_INIT_CWND) instead of 3, change tcp_fixup_sndbuf() to get more than 16384 bytes (sysctl_tcp_wmem[1]) in initial sk_sndbuf Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19net: add skb frag size accessorsEric Dumazet
To ease skb->truesize sanitization, its better to be able to localize all references to skb frags size. Define accessors : skb_frag_size() to fetch frag size, and skb_frag_size_{set|add|sub}() to manipulate it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-18macvlan: handle fragmented multicast framesEric Dumazet
Fragmented multicast frames are delivered to a single macvlan port, because ip defrag logic considers other samples are redundant. Implement a defrag step before trying to send the multicast frame. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-13net: more accurate skb truesizeEric Dumazet
skb truesize currently accounts for sk_buff struct and part of skb head. kmalloc() roundings are also ignored. Considering that skb_shared_info is larger than sk_buff, its time to take it into account for better memory accounting. This patch introduces SKB_TRUESIZE(X) macro to centralize various assumptions into a single place. At skb alloc phase, we put skb_shared_info struct at the exact end of skb head, to allow a better use of memory (lowering number of reallocations), since kmalloc() gives us power-of-two memory blocks. Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are aligned to cache lines, as before. Note: This patch might trigger performance regressions because of misconfigured protocol stacks, hitting per socket or global memory limits that were previously not reached. But its a necessary step for a more accurate memory accounting. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <ak@linux.intel.com> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-12net-netlink: Add a new attribute to expose TOS values via netlinkMurali Raja
This patch exposes the tos value for the TCP sockets when the TOS flag is requested in the ext_flags for the inet_diag request. This would mainly be used to expose TOS values for both for TCP and UDP sockets. Currently it is supported for TCP. When netlink support for UDP would be added the support to expose the TOS values would alse be done. For IPV4 tos value is exposed and for IPV6 tclass value is exposed. Signed-off-by: Murali Raja <muralira@google.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-11cipso: remove an unneeded NULL check in cipso_v4_doi_add()Dan Carpenter
We dereference doi_def on the line before the NULL check. It has been this way since 2008. I checked all the callers and doi_def is always non-NULL here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-07Merge branch 'master' of github.com:davem330/netDavid S. Miller
Conflicts: net/batman-adv/soft-interface.c
2011-10-04tcp: properly update lost_cnt_hint during shiftingYan, Zheng
lost_skb_hint is used by tcp_mark_head_lost() to mark the first unhandled skb. lost_cnt_hint is the number of packets or sacked packets before the lost_skb_hint; When shifting a skb that is before the lost_skb_hint, if tcp_is_fack() is ture, the skb has already been counted in the lost_cnt_hint; if tcp_is_fack() is false, tcp_sacktag_one() will increase the lost_cnt_hint. So tcp_shifted_skb() does not need to adjust the lost_cnt_hint by itself. When shifting a skb that is equal to lost_skb_hint, the shifted packets will not be counted by tcp_mark_head_lost(). So tcp_shifted_skb() should adjust the lost_cnt_hint even tcp_is_fack(tp) is true. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-04tcp: properly handle md5sig_pool referencesYan, Zheng
tcp_v4_clear_md5_list() assumes that multiple tcp md5sig peers only hold one reference to md5sig_pool. but tcp_v4_md5_do_add() increases use count of md5sig_pool for each peer. This patch makes tcp_v4_md5_do_add() only increases use count for the first tcp md5sig peer. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-03ipv4: NET_IPV4_ROUTE_GC_INTERVAL removalVasily Averin
removing obsoleted sysctl, ip_rt_gc_interval variable no longer used since 2.6.38 Signed-off-by: Vasily Averin <vvs@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-03tcp: report ECN_SEEN in tcp_infoEric Dumazet
Allows ss command (iproute2) to display "ecnseen" if at least one packet with ECT(0) or ECT(1) or ECN was received by this socket. "ecn" means ECN was negotiated at session establishment (TCP level) "ecnseen" means we received at least one packet with ECT fields set (IP level) ss -i ... ESTAB 0 0 192.168.20.110:22 192.168.20.144:38016 ino:5950 sk:f178e400 mem:(r0,w0,f0,t0) ts sack ecn ecnseen bic wscale:7,8 rto:210 rtt:12.5/7.5 cwnd:10 send 9.3Mbps rcv_space:14480 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27tcp: rename tcp_skb_cb flagsEric Dumazet
Rename struct tcp_skb_cb "flags" to "tcp_flags" to ease code review and maintenance. Its content is a combination of FIN/SYN/RST/PSH/ACK/URG/ECE/CWR flags Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27tcp: unalias tcp_skb_cb flags and ip_dsfieldEric Dumazet
struct tcp_skb_cb contains a "flags" field containing either tcp flags or IP dsfield depending on context (input or output path) Introduce ip_dsfield to make the difference clear and ease maintenance. If later we want to save space, we can union flags/ip_dsfield Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27tcp: ECN blackhole should not force quickack modeEric Dumazet
While playing with a new ADSL box at home, I discovered that ECN blackhole can trigger suboptimal quickack mode on linux : We send one ACK for each incoming data frame, without any delay and eventual piggyback. This is because TCP_ECN_check_ce() considers that if no ECT is seen on a segment, this is because this segment was a retransmit. Refine this heuristic and apply it only if we seen ECT in a previous segment, to detect ECN blackhole at IP level. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jamal Hadi Salim <jhs@mojatatu.com> CC: Jerry Chu <hkchu@google.com> CC: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> CC: Jim Gettys <jg@freedesktop.org> CC: Dave Taht <dave.taht@gmail.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-22Merge branch 'master' of github.com:davem330/netDavid S. Miller
Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c
2011-09-18tcp: fix validation of D-SACKZheng Yan
D-SACK is allowed to reside below snd_una. But the corresponding check in tcp_is_sackblock_valid() is the exact opposite. It looks like a typo. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-17tcp: md5: remove one indirection level in tcp_md5sig_poolEric Dumazet
tcp_md5sig_pool is currently an 'array' (a percpu object) of pointers to struct tcp_md5sig_pool. Only the pointers are NUMA aware, but objects themselves are all allocated on a single node. Remove this extra indirection to get proper percpu memory (NUMA aware) and make code simpler. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-16ipv4: Fix fib_info->fib_metrics leakYan, Zheng
Commit 4670994d(net,rcu: convert call_rcu(fc_rport_free_rcu) to kfree_rcu()) introduced a memory leak. This patch reverts it. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-16Merge branch 'master' of ../netdev/David S. Miller
2011-09-15tcp: Change possible SYN flooding messagesEric Dumazet
"Possible SYN flooding on port xxxx " messages can fill logs on servers. Change logic to log the message only once per listener, and add two new SNMP counters to track : TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client TCPReqQFullDrop : number of times a SYN request was dropped because syncookies were not enabled. Based on a prior patch from Tom Herbert, and suggestions from David. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-30net: ipv4: relax AF_INET check in bind()Eric Dumazet
commit d0733d2e29b65 (Check for mistakenly passed in non-IPv4 address) added regression on legacy apps that use bind() with AF_UNSPEC family. Relax the check, but make sure the bind() is done on INADDR_ANY addresses, as AF_UNSPEC has probably no sane meaning for other addresses. Bugzilla reference : https://bugzilla.kernel.org/show_bug.cgi?id=42012 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-and-bisected-by: Rene Meier <r_meier@freenet.de> CC: Marcus Meissner <meissner@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-30Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2011-08-30netfilter: nf_queue: reject NF_STOLEN verdicts from userspaceFlorian Westphal
A userspace listener may send (bogus) NF_STOLEN verdict, which causes skb leak. This problem was previously fixed via 64507fdbc29c3a622180378210ecea8659b14e40 (netfilter: nf_queue: fix NF_STOLEN skb leak) but this had to be reverted because NF_STOLEN can also be returned by a netfilter hook when iterating the rules in nf_reinject. Reject userspace NF_STOLEN verdict, as suggested by Michal Miroslaw. This is complementary to commit fad54440438a7c231a6ae347738423cbabc936d9 (netfilter: avoid double free in nf_reinject). Cc: Julian Anastasov <ja@ssi.bg> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-08-24Proportional Rate Reduction for TCP.Nandita Dukkipati
This patch implements Proportional Rate Reduction (PRR) for TCP. PRR is an algorithm that determines TCP's sending rate in fast recovery. PRR avoids excessive window reductions and aims for the actual congestion window size at the end of recovery to be as close as possible to the window determined by the congestion control algorithm. PRR also improves accuracy of the amount of data sent during loss recovery. The patch implements the recommended flavor of PRR called PRR-SSRB (Proportional rate reduction with slow start reduction bound) and replaces the existing rate halving algorithm. PRR improves upon the existing Linux fast recovery under a number of conditions including: 1) burst losses where the losses implicitly reduce the amount of outstanding data (pipe) below the ssthresh value selected by the congestion control algorithm and, 2) losses near the end of short flows where application runs out of data to send. As an example, with the existing rate halving implementation a single loss event can cause a connection carrying short Web transactions to go into the slow start mode after the recovery. This is because during recovery Linux pulls the congestion window down to packets_in_flight+1 on every ACK. A short Web response often runs out of new data to send and its pipe reduces to zero by the end of recovery when all its packets are drained from the network. Subsequent HTTP responses using the same connection will have to slow start to raise cwnd to ssthresh. PRR on the other hand aims for the cwnd to be as close as possible to ssthresh by the end of recovery. A description of PRR and a discussion of its performance can be found at the following links: - IETF Draft: http://tools.ietf.org/html/draft-mathis-tcpm-proportional-rate-reduction-01 - IETF Slides: http://www.ietf.org/proceedings/80/slides/tcpm-6.pdf http://tools.ietf.org/agenda/81/slides/tcpm-2.pdf - Paper to appear in Internet Measurements Conference (IMC) 2011: Improving TCP Loss Recovery Nandita Dukkipati, Matt Mathis, Yuchung Cheng Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24net: ipv4: convert to SKB frag APIsIan Campbell
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-24mcast: Fix source address selection for multicast listener reportYan, Zheng
Should check use count of include mode filter instead of total number of include mode filters. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-20Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-08-17net: remove ndo_set_multicast_list callbackJiri Pirko
Remove no longer used operation. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17rps: Add flag to skb to indicate rxhash is based on L4 tupleTom Herbert
The l4_rxhash flag was added to the skb structure to indicate that the rxhash value was computed over the 4 tuple for the packet which includes the port information in the encapsulated transport packet. This is used by the stack to preserve the rxhash value in __skb_rx_tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-12net: cleanup some rcu_dereference_rawEric Dumazet
RCU api had been completed and rcu_access_pointer() or rcu_dereference_protected() are better than generic rcu_dereference_raw() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-11ipv4: some rt_iif -> rt_route_iif conversionsJulian Anastasov
As rt_iif represents input device even for packets coming from loopback with output route, it is not an unique key specific to input routes. Now rt_route_iif has such role, it was fl.iif in 2.6.38, so better to change the checks at some places to save CPU cycles and to restore 2.6.38 semantics. compare_keys: - input routes: only rt_route_iif matters, rt_iif is same - output routes: only rt_oif matters, rt_iif is not used for matching in __ip_route_output_key - now we are back to 2.6.38 state ip_route_input_common: - matching rt_route_iif implies input route - compared to 2.6.38 we eliminated one rth->fl.oif check because it was not needed even for 2.6.38 compare_hash_inputs: Only the change here is not an optimization, it has effect only for output routes. I assume I'm restoring the original intention to ignore oif, it was using fl.iif - now we are back to 2.6.38 state Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-10tcp: initialize variable ecn_ok in syncookies pathMike Waychison
Using a gcc 4.4.3, warnings are emitted for a possibly uninitialized use of ecn_ok. This can happen if cookie_check_timestamp() returns due to not having seen a timestamp. Defaulting to ecn off seems like a reasonable thing to do in this case, so initialized ecn_ok to false. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-08-07ipv4: use dst with ref during bcast/mcast loopbackJulian Anastasov
Make sure skb dst has reference when moving to another context. Currently, I don't see protocols that can hit it when sending broadcasts/multicasts to loopback using noref dsts, so it is just a precaution. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07ipv4: route non-local sources for raw socketJulian Anastasov
The raw sockets can provide source address for routing but their privileges are not considered. We can provide non-local source address, make sure the FLOWI_FLAG_ANYSRC flag is set if socket has privileges for this, i.e. based on hdrincl (IP_HDRINCL) and transparent flags. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07netfilter: TCP and raw fix for ip_route_me_harderJulian Anastasov
TCP in some cases uses different global (raw) socket to send RST and ACK. The transparent flag is not set there. Currently, it is a problem for rerouting after the previous change. Fix it by simplifying the checks in ip_route_me_harder and use FLOWI_FLAG_ANYSRC even for sockets. It looks safe because the initial routing allowed this source address to be used and now we just have to make sure the packet is rerouted. As a side effect this also allows rerouting for normal raw sockets that use spoofed source addresses which was not possible even before we eliminated the ip_route_input call. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07ipv4: Fix ip_getsockopt for IP_PKTOPTIONSDaniel Baluta
IP_PKTOPTIONS is broken for 32-bit applications running in COMPAT mode on 64-bit kernels. This happens because msghdr's msg_flags field is always set to zero. When running in COMPAT mode this should be set to MSG_CMSG_COMPAT instead. Signed-off-by: Tiberiu Szocs-Mihai <tszocs@ixiacom.com> Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07ipv4: fix the reusing of routing cache entriesJulian Anastasov
compare_keys and ip_route_input_common rely on rt_oif for distinguishing of input and output routes with same keys values. But sometimes the input route has also same hash chain (keyed by iif != 0) with the output routes (keyed by orig_oif=0). Problem visible if running with small number of rhash_entries. Fix them to use rt_route_iif instead. By this way input route can not be returned to users that request output route. The patch fixes the ip_rt_bug errors that were reported in ip_local_out context, mostly for 255.255.255.255 destinations. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-06net: Compute protocol sequence numbers and fragment IDs using MD5.David S. Miller
Computers have become a lot faster since we compromised on the partial MD4 hash which we use currently for performance reasons. MD5 is a much safer choice, and is inline with both RFC1948 and other ISS generators (OpenBSD, Solaris, etc.) Furthermore, only having 24-bits of the sequence number be truly unpredictable is a very serious limitation. So the periodic regeneration and 8-bit counter have been removed. We compute and use a full 32-bit sequence number. For ipv6, DCCP was found to use a 32-bit truncated initial sequence number (it needs 43-bits) and that is fixed here as well. Reported-by: Dan Kaminsky <dan@doxpara.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-03net: fix NULL dereferences in check_peer_redir()Eric Dumazet
Gergely Kalman reported crashes in check_peer_redir(). It appears commit f39925dbde778 (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>