summaryrefslogtreecommitdiffstats
path: root/net/ipv4
AgeCommit message (Collapse)Author
2011-05-22ipv4: Include linux/prefetch.h in fib_trie.cDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits) macvlan: fix panic if lowerdev in a bond tg3: Add braces around 5906 workaround. tg3: Fix NETIF_F_LOOPBACK error macvlan: remove one synchronize_rcu() call networking: NET_CLS_ROUTE4 depends on INET irda: Fix error propagation in ircomm_lmp_connect_response() irda: Kill set but unused variable 'bytes' in irlan_check_command_param() irda: Kill set but unused variable 'clen' in ircomm_connect_indication() rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport() be2net: Kill set but unused variable 'req' in lancer_fw_download() irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication() atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined. rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer(). rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler() rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection() rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window() pkt_sched: Kill set but unused variable 'protocol' in tc_classify() isdn: capi: Use pr_debug() instead of ifdefs. tg3: Update version to 3.119 tg3: Apply rx_discards fix to 5719/5720 ... Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c as per Davem.
2011-05-19Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (78 commits) Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree batman,rcu: convert call_rcu(softif_neigh_free_rcu) to kfree_rcu batman,rcu: convert call_rcu(neigh_node_free_rcu) to kfree() batman,rcu: convert call_rcu(gw_node_free_rcu) to kfree_rcu net,rcu: convert call_rcu(kfree_tid_tx) to kfree_rcu() net,rcu: convert call_rcu(xt_osf_finger_free_rcu) to kfree_rcu() net/mac80211,rcu: convert call_rcu(work_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(wq_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(phonet_device_rcu_free) to kfree_rcu() perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu() perf,rcu: convert call_rcu(free_ctx) to kfree_rcu() net,rcu: convert call_rcu(__nf_ct_ext_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(net_generic_release) to kfree_rcu() net,rcu: convert call_rcu(netlbl_unlhsh_free_addr6) to kfree_rcu() net,rcu: convert call_rcu(netlbl_unlhsh_free_addr4) to kfree_rcu() security,rcu: convert call_rcu(sel_netif_free) to kfree_rcu() net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu() net,rcu: convert call_rcu(xps_map_release) to kfree_rcu() net,rcu: convert call_rcu(rps_map_release) to kfree_rcu() ...
2011-05-19ipconfig wait for carrierMicha Nelissen
v3 -> v4: fix return boolean false instead of 0 for ic_is_init_dev Currently the ip auto configuration has a hardcoded delay of 1 second. When (ethernet) link takes longer to come up (e.g. more than 3 seconds), nfs root may not be found. Remove the hardcoded delay, and wait for carrier on at least one network device. Signed-off-by: Micha Nelissen <micha@neli.hopto.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-19net: ping: fix the coding styleChangli Gao
The characters in a line should be no more than 80. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-19net: ping: make local functions staticChangli Gao
As these functions are only used in this file. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-18ipv4: Pass explicit destination address to rt_bind_peer().David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-18ipv4: Pass explicit destination address to rt_get_peer().David S. Miller
This will next trickle down to rt_bind_peer(). Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-18ipv4: Make caller provide flowi4 key to inet_csk_route_req().David S. Miller
This way the caller can get at the fully resolved fl4->{daddr,saddr} etc. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-18ipv4: Kill RT_CACHE_DEBUGDavid S. Miller
It's way past it's usefulness. And this gets rid of a bunch of stray ->rt_{dst,src} references. Even the comment documenting the macro was inaccurate (stated default was 1 when it's 0). If reintroduced, it should be done properly, with dynamic debug facilities. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-17ipv4: Don't use enums as bitmasks in ip_fragment.cDavid S. Miller
Noticed by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-17net: ping: fix build failureVasiliy Kulikov
If CONFIG_PROC_SYSCTL=n the building process fails: ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net' Moved inet_get_ping_group_range_net() to ping.c. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16ipv4: more compliant RFC 3168 supportEric Dumazet
Commit 6623e3b24a5e (ipv4: IP defragmentation must be ECN aware) was an attempt to not lose "Congestion Experienced" (CE) indications when performing datagram defragmentation. Stefanos Harhalakis raised the point that RFC 3168 requirements were not completely met by this commit. In particular, we MUST detect invalid combinations and eventually drop illegal frames. Reported-by: Stefanos Harhalakis <v13@v13.gr> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16ipv4: Trivial rt->rt_src conversions in net/ipv4/route.cDavid S. Miller
At these points we have a fully filled in value via the IP header the form of ip_hdr(skb)->saddr Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16net: ping: dont call udp_ioctl()Eric Dumazet
udp_ioctl() really handles UDP and UDPLite protocols. 1) It can increment UDP_MIB_INERRORS in case first_packet_length() finds a frame with bad checksum. 2) It has a dependency on sizeof(struct udphdr), not applicable to ICMP/PING If ping sockets need to handle SIOCINQ/SIOCOUTQ ioctl, this should be done differently. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15net: ping: small changesEric Dumazet
ping_table is not __read_mostly, since it contains one rwlock, and is static to ping.c ping_port_rover & ping_v4_lookup are static Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13ipv4: Remove rt->rt_dst reference from ip_forward_options().David S. Miller
At this point iph->daddr equals what rt->rt_dst would hold. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13ipv4: Remove route key identity dependencies in ip_rt_get_source().David S. Miller
Pass in the sk_buff so that we can fetch the necessary keys from the packet header when working with input routes. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13ipv4: Always call ip_options_build() after rest of IP header is filled in.David S. Miller
This will allow ip_options_build() to reliably look at the values of iph->{daddr,saddr} Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13ipv4: Kill spurious write to iph->daddr in ip_forward_options().David S. Miller
This code block executes when opt->srr_is_hit is set. It will be set only by ip_options_rcv_srr(). ip_options_rcv_srr() walks until it hits a matching nexthop in the SRR option addresses, and when it matches one 1) looks up the route for that nexthop and 2) on route lookup success it writes that nexthop value into iph->daddr. ip_forward_options() runs later, and again walks the SRR option addresses looking for the option matching the destination of the route stored in skb_rtable(). This route will be the same exact one looked up for the nexthop by ip_options_rcv_srr(). Therefore "rt->rt_dst == iph->daddr" must be true. All it really needs to do is record the route's source address in the matching SRR option adddress. It need not write iph->daddr again, since that has already been done by ip_options_rcv_srr() as detailed above. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13net: ipv4: add IPPROTO_ICMP socket kindVasiliy Kulikov
This patch adds IPPROTO_ICMP socket kind. It makes it possible to send ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages without any special privileges. In other words, the patch makes it possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In order not to increase the kernel's attack surface, the new functionality is disabled by default, but is enabled at bootup by supporting Linux distributions, optionally with restriction to a group or a group range (see below). Similar functionality is implemented in Mac OS X: http://www.manpagez.com/man/4/icmp/ A new ping socket is created with socket(PF_INET, SOCK_DGRAM, PROT_ICMP) Message identifiers (octets 4-5 of ICMP header) are interpreted as local ports. Addresses are stored in struct sockaddr_in. No port numbers are reserved for privileged processes, port 0 is reserved for API ("let the kernel pick a free number"). There is no notion of remote ports, remote port numbers provided by the user (e.g. in connect()) are ignored. Data sent and received include ICMP headers. This is deliberate to: 1) Avoid the need to transport headers values like sequence numbers by other means. 2) Make it easier to port existing programs using raw sockets. ICMP headers given to send() are checked and sanitized. The type must be ICMP_ECHO and the code must be zero (future extensions might relax this, see below). The id is set to the number (local port) of the socket, the checksum is always recomputed. ICMP reply packets received from the network are demultiplexed according to their id's, and are returned by recv() without any modifications. IP header information and ICMP errors of those packets may be obtained via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source quenches and redirects are reported as fake errors via the error queue (IP_RECVERR); the next hop address for redirects is saved to ee_info (in network order). socket(2) is restricted to the group range specified in "/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning that nobody (not even root) may create ping sockets. Setting it to "100 100" would grant permissions to the single group (to either make /sbin/ping g+s and owned by this group or to grant permissions to the "netadmins" group), "0 4294967295" would enable it for the world, "100 4294967295" would enable it for the users, but not daemons. The existing code might be (in the unlikely case anyone needs it) extended rather easily to handle other similar pairs of ICMP messages (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply etc.). Userspace ping util & patch for it: http://openwall.info/wiki/people/segoon/ping For Openwall GNU/*/Linux it was the last step on the road to the setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels) is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs: http://mirrors.kernel.org/openwall/Owl/current/iso/ Initially this functionality was written by Pavel Kankovsky for Linux 2.4.32, but unfortunately it was never made public. All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with the patch. PATCH v3: - switched to flowi4. - minor changes to be consistent with raw sockets code. PATCH v2: - changed ping_debug() to pr_debug(). - removed CONFIG_IP_PING. - removed ping_seq_fops.owner field (unused for procfs). - switched to proc_net_fops_create(). - switched to %pK in seq_printf(). PATCH v1: - fixed checksumming bug. - CAP_NET_RAW may not create icmp sockets anymore. RFC v2: - minor cleanups. - introduced sysctl'able group range to restrict socket(2). Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12ipv4: Fix 'iph' use before set.David S. Miller
I swear none of my compilers warned about this, yet it is so obvious. > net/ipv4/ip_forward.c: In function 'ip_forward': > net/ipv4/ip_forward.c:87: warning: 'iph' may be used uninitialized in this function Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12ipv4: Elide use of rt->rt_dst in ip_forward()David S. Miller
No matter what kind of header mangling occurs due to IP options processing, rt->rt_dst will always equal iph->daddr in the packet. So we can safely use iph->daddr instead of rt->rt_dst here. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12ipv4: Simplify iph->daddr overwrite in ip_options_rcv_srr().David S. Miller
We already copy the 4-byte nexthop from the options block into local variable "nexthop" for the route lookup. Re-use that variable instead of memcpy()'ing again when assigning to iph->daddr after the route lookup succeeds. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12ipv4: Kill spurious opt->srr check in ip_options_rcv_srr().David S. Miller
All call sites conditionalize the call to ip_options_rcv_srr() with a check of opt->srr, so no need to check it again there. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-11Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-3.6 Conflicts: drivers/net/benet/be_main.c
2011-05-10xfrm: Assign the inner mode output function to the dst entrySteffen Klassert
As it is, we assign the outer modes output function to the dst entry when we create the xfrm bundle. This leads to two problems on interfamily scenarios. We might insert ipv4 packets into ip6_fragment when called from xfrm6_output. The system crashes if we try to fragment an ipv4 packet with ip6_fragment. This issue was introduced with git commit ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed). The second issue is, that we might insert ipv4 packets in netfilter6 and vice versa on interfamily scenarios. With this patch we assign the inner mode output function to the dst entry when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner mode is used and the right fragmentation and netfilter functions are called. We switch then to outer mode with the output_finish functions. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10net: fix two lockdep splatsEric Dumazet
Commit e67f88dd12f6 (net: dont hold rtnl mutex during netlink dump callbacks) switched rtnl protection to RCU, but we forgot to adjust two rcu_dereference() lockdep annotations : inet_get_link_af_size() or inet_fill_link_af() might be called with rcu_read_lock or rtnl held, so use rcu_dereference_rtnl() instead of rtnl_dereference() Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10ipv4: xfrm: Eliminate ->rt_src reference in policy code.David S. Miller
Rearrange xfrm4_dst_lookup() so that it works by calling a helper function __xfrm_dst_lookup() that takes an explicit flow key storage area as an argument. Use this new helper in xfrm4_get_saddr() so we can fetch the selected source address from the flow instead of from rt->rt_src Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10ipv4: udp: Eliminate remaining uses of rt->rt_srcDavid S. Miller
We already track and pass around the correct flow key, so simply use it in udp_send_skb(). Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10ipv4: icmp: Eliminate remaining uses of rt->rt_srcDavid S. Miller
On input packets, rt->rt_src always equals ip_hdr(skb)->saddr Anything that mangles or otherwise changes the IP header must relookup the route found at skb_rtable(). Therefore this invariant must always hold true. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10ipv4: Pass explicit daddr arg to ip_send_reply().David S. Miller
This eliminates an access to rt->rt_src. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08ipv4: Pass flow key down into ip_append_*().David S. Miller
This way rt->rt_dst accesses are unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08ipv4: Pass flow keys down into datagram packet building engine.David S. Miller
This way ip_output.c no longer needs rt->rt_{src,dst}. We already have these keys sitting, ready and waiting, on the stack or in a socket structure. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08udp: Use flow key information instead of rt->rt_{src,dst}David S. Miller
We have two cases. Either the socket is in TCP_ESTABLISHED state and connect() filled in the inet socket cork flow, or we looked up the route here and used an on-stack flow. Track which one it was, and use it to obtain src/dst addrs. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08tcp_cubic: limit delayed_ack ratio to prevent divide errorstephen hemminger
TCP Cubic keeps a metric that estimates the amount of delayed acknowledgements to use in adjusting the window. If an abnormally large number of packets are acknowledged at once, then the update could wrap and reach zero. This kind of ACK could only happen when there was a large window and huge number of ACK's were lost. This patch limits the value of delayed ack ratio. The choice of 32 is just a conservative value since normally it should be range of 1 to 4 packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08tcp: Use cork flow info instead of rt->rt_dst in tcp_v4_get_peer()David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit().David S. Miller
Now we can pick it out of the provided flow key. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08inet: Pass flowi to ->queue_xmit().David S. Miller
This allows us to acquire the exact route keying information from the protocol, however that might be managed. It handles all of the possibilities, from the simplest case of storing the key in inet->cork.fl to the more complex setup SCTP has where individual transports determine the flow. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.David S. Miller
Operation order is now transposed, we first create the child socket then we try to hook up the route. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08ipv4: Create inet_csk_route_child_sock().David S. Miller
This is just like inet_csk_route_req() except that it operates after we've created the new child socket. In this way we can use the new socket's cork flow for proper route key storage. This will be used by DCCP and TCP child socket creation handling. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08ipv4: Use cork flow in ip_queue_xmit()David S. Miller
All invokers of ip_queue_xmit() must make certain that the socket is locked. All of SCTP, TCP, DCCP, and L2TP now make sure this is the case. Therefore we can use the cork flow during output route lookup in ip_queue_xmit() when the socket route check fails. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08ipv4: Use cork flow in inet_sk_{reselect_saddr,rebuild_header}()David S. Miller
These two functions must be invoked only when the socket is locked (because socket identity modifications are made non-atomically). Therefore we can use the cork flow for output route lookups. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08ipv4: Lock socket and use cork flow in ip4_datagram_connect().David S. Miller
This is to make sure that an l2tp socket's inet cork flow is fully filled in, when it's encapsulated in UDP. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08tcp: Use cork flow in tcp_v4_connect()David S. Miller
Since this is invoked from inet_stream_connect() the socket is locked and therefore this usage is safe. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-07net,rcu: convert call_rcu(ip_mc_socklist_reclaim) to kfree_rcu()Lai Jiangshan
The rcu callback ip_mc_socklist_reclaim() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(ip_mc_socklist_reclaim). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07net,rcu: convert call_rcu(ip_sf_socklist_reclaim) to kfree_rcu()Lai Jiangshan
The rcu callback ip_sf_socklist_reclaim() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(ip_sf_socklist_reclaim). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07net,rcu: convert call_rcu(ip_mc_list_reclaim) to kfree_rcu()Lai Jiangshan
The rcu callback ip_mc_list_reclaim() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(ip_mc_list_reclaim). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07net,rcu: convert call_rcu(__leaf_info_free_rcu) to kfree_rcu()Lai Jiangshan
The rcu callback __leaf_info_free_rcu() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(__leaf_info_free_rcu). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07net,rcu: convert call_rcu(fc_rport_free_rcu) to kfree_rcu()Lai Jiangshan
The rcu callback fc_rport_free_rcu() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(fc_rport_free_rcu). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>