summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2012-08-14gre: Support GRE over IPv6xeb@mail.ru
GRE over IPv6 implementation. Signed-off-by: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14net: remove netdev_bonding_change()Amerigo Wang
I don't see any benifits to use netdev_bonding_change() than using call_netdevice_notifiers() directly. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14net: move and rename netif_notify_peers()Amerigo Wang
I believe net/core/dev.c is a better place for netif_notify_peers(), because other net event notify functions also stay in this file. And rename it to netdev_notify_peers(). Cc: David S. Miller <davem@davemloft.net> Cc: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-08-14netfilter: ctnetlink: fix missing locking while changing conntrack from nfqueuePablo Neira Ayuso
Since 9cb017665 netfilter: add glue code to integrate nfnetlink_queue and ctnetlink, we can modify the conntrack entry via nfnl_queue. However, the change of the conntrack entry via nfnetlink_queue requires appropriate locking to avoid concurrent updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-14netfilter: PTR_RET can be usedWu Fengguang
This quiets the coccinelle warnings: net/bridge/netfilter/ebtable_filter.c:107:1-3: WARNING: PTR_RET can be used net/bridge/netfilter/ebtable_nat.c:107:1-3: WARNING: PTR_RET can be used net/ipv6/netfilter/ip6table_filter.c:65:1-3: WARNING: PTR_RET can be used net/ipv6/netfilter/ip6table_mangle.c:100:1-3: WARNING: PTR_RET can be used net/ipv6/netfilter/ip6table_raw.c:44:1-3: WARNING: PTR_RET can be used net/ipv6/netfilter/ip6table_security.c:62:1-3: WARNING: PTR_RET can be used net/ipv4/netfilter/iptable_filter.c:72:1-3: WARNING: PTR_RET can be used net/ipv4/netfilter/iptable_mangle.c:107:1-3: WARNING: PTR_RET can be used net/ipv4/netfilter/iptable_raw.c:51:1-3: WARNING: PTR_RET can be used net/ipv4/netfilter/iptable_security.c:70:1-3: WARNING: PTR_RET can be used Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-13TTY: ircomm_tty, add tty installJiri Slaby
This has two outcomes: * we give the TTY layer a tty_port * we do not find the info structure every time open is called on that tty Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Samuel Ortiz <samuel@sortiz.org> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-13TTY: use tty_port_register_deviceJiri Slaby
Currently we have no way to assign tty->port while performing tty installation. There are two ways to provide the link tty_struct => tty_port. Either by calling tty_port_install from tty->ops->install or tty_port_register_device called instead of tty_register_device when the device is being set up after connected. In this patch we modify most of the drivers to do the latter. When the drivers use tty_register_device and we have tty_port already, we switch to tty_port_register_device. So we have the tty_struct => tty_port link for free for those. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-13workqueue: use mod_delayed_work() instead of cancel + queueTejun Heo
Convert delayed_work users doing cancel_delayed_work() followed by queue_delayed_work() to mod_delayed_work(). Most conversions are straight-forward. Ones worth mentioning are, * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always use mod_delayed_work() and cancel loop in edac_mc_reset_delay_period() is dropped. * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether watchdog is active or not. @fan_watchdog_active and related code dropped. * drivers/power/charger-manager.c: Seemingly a lot of delayed_work_pending() abuse going on here. [delayed_]work_pending() are unsynchronized and racy when used like this. I converted one instance in fullbatt_handler(). Please conver the rest so that it invokes workqueue APIs for the intended target state rather than trying to game work item pending state transitions. e.g. if timer should be modified - call mod_delayed_work(), canceled - call cancel_delayed_work[_sync](). * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling() simplified. Note that round_jiffies() calls in this function are meaningless. round_jiffies() work on absolute jiffies not delta delay used by delayed_work. v2: Tomi pointed out that __cancel_delayed_work() users can't be safely converted to mod_delayed_work(). They could be calling it from irq context and if that happens while delayed_work_timer_fn() is running, it could deadlock. __cancel_delayed_work() users are dropped. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Doug Thompson <dougthompson@xmission.com> Cc: David Airlie <airlied@linux.ie> Cc: Roland Dreier <roland@kernel.org> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Len Brown <len.brown@intel.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Johannes Berg <johannes@sipsolutions.net>
2012-08-13mac80211: fix unnecessary beacon update after peering status changeMarco Porsch
ieee80211_bss_info_change_notify is called everytime a peer link is established or closed, because the accepting_plinks flag in the meshconf IE *might* have changed. With this patch the corresponding functions return the BSS_CHANGED_BEACON flag when a beacon update is necessary. Also it makes mesh_accept_plinks_update the common place to update the accepting_plinks flag. mesh_accept_plinks_update is called upon plink change and also periodically from ieee80211_mesh_housekeeping. Thus, it also picks up changes of local->num_sta. Signed-off-by: Marco Porsch <marco.porsch@etit.tu-chemnitz.de> Acked-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-08-12af_packet: remove BUG statement in tpacket_destruct_skbdanborkmann@iogearbox.net
Here's a quote of the comment about the BUG macro from asm-generic/bug.h: Don't use BUG() or BUG_ON() unless there's really no way out; one example might be detecting data structure corruption in the middle of an operation that can't be backed out of. If the (sub)system can somehow continue operating, perhaps with reduced functionality, it's probably not BUG-worthy. If you're tempted to BUG(), think again: is completely giving up really the *only* solution? There are usually better options, where users don't need to reboot ASAP and can mostly shut down cleanly. In our case, the status flag of a ring buffer slot is managed from both sides, the kernel space and the user space. This means that even though the kernel side might work as expected, the user space screws up and changes this flag right between the send(2) is triggered when the flag is changed to TP_STATUS_SENDING and a given skb is destructed after some time. Then, this will hit the BUG macro. As David suggested, the best solution is to simply remove this statement since it cannot be used for kernel side internal consistency checks. I've tested it and the system still behaves /stable/ in this case, so in accordance with the above comment, we should rather remove it. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-10Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Here is a handful of fixes intended for 3.6. Daniel Drake offers a cfg80211 fix to consume pending events before taking a wireless device down. This prevents a resource leak. Stanislaw Gruszka gives us a fix for a NULL pointer dereference in rt61pci. Johannes Berg provides an iwlwifi patch to disable "greenfield" mode. Use of that mode was causing a rate scaling problem in for iwlwifi. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-10ipv4: fix ip_send_skb()Eric Dumazet
ip_send_skb() can send orphaned skb, so we must pass the net pointer to avoid possible NULL dereference in error path. Bug added by commit 3a7c384ffd57 (ipv4: tcp: unicast_sock should not land outside of TCP stack) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-10tty: localise the lockAlan Cox
The termios and other changes mean the other protections needed on the driver tty arrays should be adequate. Turn it all back on. This contains pieces folded in from the fixes made to the original patches | From: Geert Uytterhoeven <geert@linux-m68k.org> (fix m68k) | From: Paul Gortmaker <paul.gortmaker@windriver.com> (fix cris) | From: Jiri Kosina <jkosina@suze.cz> (lockdep) | From: Eric Dumazet <eric.dumazet@gmail.com> (lockdep) Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-10Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
2012-08-10Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
2012-08-10Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-08-10netfilter: nf_nat_sip: fix via header translation with multiple parametersPatrick McHardy
Via-headers are parsed beginning at the first character after the Via-address. When the address is translated first and its length decreases, the offset to start parsing at is incorrect and header parameters might be missed. Update the offset after translating the Via-address to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-10netfilter: nf_ct_sip: fix IPv6 address parsingPatrick McHardy
Within SIP messages IPv6 addresses are enclosed in square brackets in most cases, with the exception of the "received=" header parameter. Currently the helper fails to parse enclosed addresses. This patch: - changes the SIP address parsing function to enforce square brackets when required, and accept them when not required but present, as recommended by RFC 5118. - adds a new SDP address parsing function that never accepts square brackets since SDP doesn't use them. With these changes, the SIP helper correctly parses all test messages from RFC 5118 (Session Initiation Protocol (SIP) Torture Test Messages for Internet Protocol Version 6 (IPv6)). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-10netfilter: nf_ct_sip: fix helper namePatrick McHardy
Commit 3a8fc53a (netfilter: nf_ct_helper: allocate 16 bytes for the helper and policy names) introduced a bug in the SIP helper, the helper name is sprinted to the sip_names array instead of instead of into the helper structure. This breaks the helper match and the /proc/net/nf_conntrack_expect output. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-09net: tcp: ipv6_mapped needs sk_rx_dst_set methodEric Dumazet
commit 5d299f3d3c8a2fb (net: ipv6: fix TCP early demux) added a regression for ipv6_mapped case. [ 67.422369] SELinux: initialized (dev autofs, type autofs), uses genfs_contexts [ 67.449678] SELinux: initialized (dev autofs, type autofs), uses genfs_contexts [ 92.631060] BUG: unable to handle kernel NULL pointer dereference at (null) [ 92.631435] IP: [< (null)>] (null) [ 92.631645] PGD 0 [ 92.631846] Oops: 0010 [#1] SMP [ 92.632095] Modules linked in: autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log dm_multipath dm_mod video sbs sbshc battery ac lp parport sg snd_hda_intel snd_hda_codec snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device pcspkr snd_pcm_oss snd_mixer_oss snd_pcm snd_timer serio_raw button floppy snd i2c_i801 i2c_core soundcore snd_page_alloc shpchp ide_cd_mod cdrom microcode ehci_hcd ohci_hcd uhci_hcd [ 92.634294] CPU 0 [ 92.634294] Pid: 4469, comm: sendmail Not tainted 3.6.0-rc1 #3 [ 92.634294] RIP: 0010:[<0000000000000000>] [< (null)>] (null) [ 92.634294] RSP: 0018:ffff880245fc7cb0 EFLAGS: 00010282 [ 92.634294] RAX: ffffffffa01985f0 RBX: ffff88024827ad00 RCX: 0000000000000000 [ 92.634294] RDX: 0000000000000218 RSI: ffff880254735380 RDI: ffff88024827ad00 [ 92.634294] RBP: ffff880245fc7cc8 R08: 0000000000000001 R09: 0000000000000000 [ 92.634294] R10: 0000000000000000 R11: ffff880245fc7bf8 R12: ffff880254735380 [ 92.634294] R13: ffff880254735380 R14: 0000000000000000 R15: 7fffffffffff0218 [ 92.634294] FS: 00007f4516ccd6f0(0000) GS:ffff880256600000(0000) knlGS:0000000000000000 [ 92.634294] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 92.634294] CR2: 0000000000000000 CR3: 0000000245ed1000 CR4: 00000000000007f0 [ 92.634294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 92.634294] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 92.634294] Process sendmail (pid: 4469, threadinfo ffff880245fc6000, task ffff880254b8cac0) [ 92.634294] Stack: [ 92.634294] ffffffff813837a7 ffff88024827ad00 ffff880254b6b0e8 ffff880245fc7d68 [ 92.634294] ffffffff81385083 00000000001d2680 ffff8802547353a8 ffff880245fc7d18 [ 92.634294] ffffffff8105903a ffff88024827ad60 0000000000000002 00000000000000ff [ 92.634294] Call Trace: [ 92.634294] [<ffffffff813837a7>] ? tcp_finish_connect+0x2c/0xfa [ 92.634294] [<ffffffff81385083>] tcp_rcv_state_process+0x2b6/0x9c6 [ 92.634294] [<ffffffff8105903a>] ? sched_clock_cpu+0xc3/0xd1 [ 92.634294] [<ffffffff81059073>] ? local_clock+0x2b/0x3c [ 92.634294] [<ffffffff8138caf3>] tcp_v4_do_rcv+0x63a/0x670 [ 92.634294] [<ffffffff8133278e>] release_sock+0x128/0x1bd [ 92.634294] [<ffffffff8139f060>] __inet_stream_connect+0x1b1/0x352 [ 92.634294] [<ffffffff813325f5>] ? lock_sock_nested+0x74/0x7f [ 92.634294] [<ffffffff8104b333>] ? wake_up_bit+0x25/0x25 [ 92.634294] [<ffffffff813325f5>] ? lock_sock_nested+0x74/0x7f [ 92.634294] [<ffffffff8139f223>] ? inet_stream_connect+0x22/0x4b [ 92.634294] [<ffffffff8139f234>] inet_stream_connect+0x33/0x4b [ 92.634294] [<ffffffff8132e8cf>] sys_connect+0x78/0x9e [ 92.634294] [<ffffffff813fd407>] ? sysret_check+0x1b/0x56 [ 92.634294] [<ffffffff81088503>] ? __audit_syscall_entry+0x195/0x1c8 [ 92.634294] [<ffffffff811cc26e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 92.634294] [<ffffffff813fd3e2>] system_call_fastpath+0x16/0x1b [ 92.634294] Code: Bad RIP value. [ 92.634294] RIP [< (null)>] (null) [ 92.634294] RSP <ffff880245fc7cb0> [ 92.634294] CR2: 0000000000000000 [ 92.648982] ---[ end trace 24e2bed94314c8d9 ]--- [ 92.649146] Kernel panic - not syncing: Fatal exception in interrupt Fix this using inet_sk_rx_dst_set(), and export this function in case IPv6 is modular. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09ipv4: tcp: unicast_sock should not land outside of TCP stackEric Dumazet
commit be9f4a44e7d41cee (ipv4: tcp: remove per net tcp_sock) added a selinux regression, reported and bisected by John Stultz selinux_ip_postroute_compat() expect to find a valid sk->sk_security pointer, but this field is NULL for unicast_sock It turns out that unicast_sock are really temporary stuff to be able to reuse part of IP stack (ip_append_data()/ip_push_pending_frames()) Fact is that frames sent by ip_send_unicast_reply() should be orphaned to not fool LSM. Note IPv6 never had this problem, as tcp_v6_send_response() doesnt use a fake socket at all. I'll probably implement tcp_v4_send_response() to remove these unicast_sock in linux-3.7 Reported-by: John Stultz <johnstul@us.ibm.com> Bisected-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Paris <eparis@parisplace.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-10ipvs: add pmtu_disc option to disable IP DF for TUN packetsJulian Anastasov
Disabling PMTU discovery can increase the output packet rate but some users have enough resources and prefer to fragment than to drop traffic. By default, we copy the DF bit but if pmtu_disc is disabled we do not send FRAG_NEEDED messages anymore. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10ipvs: implement passive PMTUD for IPIP packetsJulian Anastasov
IPVS is missing the logic to update PMTU in routing for its IPIP packets. We monitor the dst_mtu and can return FRAG_NEEDED messages but if the tunneled packets get ICMP error we can not rely on other traffic to save the lowest MTU. The following patch adds ICMP handling for IPIP packets in incoming direction, from some remote host to our local IP used as saddr in the outer header. By this way we can forward any related ICMP traffic if it is for IPVS TUN connection. For the special case of PMTUD we update the routing and if client requested DF we can forward the error. To properly update the routing we have to bind the cached route (dest->dst_cache) to the selected saddr because ipv4_update_pmtu uses saddr for dst lookup. Add IP_VS_RT_MODE_CONNECT flag to force such binding with second route. Update ip_vs_tunnel_xmit to provide IP_VS_RT_MODE_CONNECT and change the code to copy DF. For now we prefer not to force PMTU discovery (outer DF=1) because we don't have configuration option to enable or disable PMTUD. As we do not keep any packets to resend, we prefer not to play games with packets without DF bit because the sender is not informed when they are rejected. Also, change ops->update_pmtu to be called only for local clients because there is no point to update MTU for input routes, in our case skb->dst->dev is lo. It seems the code is copied from ipip.c where the skb dst points to tunnel device. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10ipvs: fixed sparse warningClaudiu Ghioc
Removed the following sparse warnings, wether CONFIG_SYSCTL is defined or not: * warning: symbol 'ip_vs_control_net_init_sysctl' was not declared. Should it be static? * warning: symbol 'ip_vs_control_net_cleanup_sysctl' was not declared. Should it be static? Signed-off-by: Claudiu Ghioc <claudiu.ghioc@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10ipvs: generalize app registration in netnsJulian Anastasov
Get rid of the ftp_app pointer and allow applications to be registered without adding fields in the netns_ipvs structure. v2: fix coding style as suggested by Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10ipvs: ip_vs_ftp depends on nf_conntrack_ftp helperJulian Anastasov
The FTP application indirectly depends on the nf_conntrack_ftp helper for proper NAT support. If the module is not loaded, IPVS can resize the packets for the command connection, eg. PASV response but the SEQ adjustment logic in ipv4_confirm is not called without helper. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-09net: Loopback ifindex is constant nowPavel Emelyanov
As pointed out, there are places, that access net->loopback_dev->ifindex and after ifindex generation is made per-net this value becomes constant equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use it where appropriate. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09net: Make ifindex generation per-net namespacePavel Emelyanov
Strictly speaking this is only _really_ required for checkpoint-restore to make loopback device always have the same index. This change appears to be safe wrt "ifindex should be unique per-system" concept, as all the ifindex usage is either already made per net namespace of is explicitly limited with init_net only. There are two cool side effects of this. The first one -- ifindices of devices in container are always small, regardless of how many containers we've started (and re-started) so far. The second one is -- we can speed up the loopback ifidex access as shown in the next patch. v2: Place ifindex right after dev_base_seq : avoid two holes and use the same cache line, dirtied in list_netdevice()/unlist_netdevice() Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09net: Allow to create links with given ifindexPavel Emelyanov
Currently the RTM_NEWLINK results in -EOPNOTSUPP if the ifinfomsg->ifi_index is not zero. I propose to allow requesting ifindices on link creation. This is required by the checkpoint-restore to correctly restore a net namespace (i.e. -- a container). Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09time: jiffies_delta_to_clock_t() helper to the rescueEric Dumazet
Various /proc/net files sometimes report crazy timer values, expressed in clock_t units. This happens when an expired timer delta (expires - jiffies) is passed to jiffies_to_clock_t(). This function has an overflow in : return div_u64((u64)x * TICK_NSEC, NSEC_PER_SEC / USER_HZ); commit cbbc719fccdb8cb (time: Change jiffies_to_clock_t() argument type to unsigned long) only got around the problem. As we cant output negative values in /proc/net/tcp without breaking various tools, I suggest adding a jiffies_delta_to_clock_t() wrapper that caps the negative delta to a 0 value. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: hank <pyu@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09tcp: must free metrics at net dismantleEric Dumazet
We currently leak all tcp metrics at struct net dismantle time. tcp_net_metrics_exit() frees the hash table, we must first iterate it to free all metrics. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-08net/core: Fix potential memory leak in dev_set_alias()Alexey Khoroshilov
Do not leak memory by updating pointer with potentially NULL realloc return value. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-08batman-adv: Fix mem leak in the batadv_tt_local_event() functionJesper Juhl
Memory is allocated for 'tt_change_node' with kmalloc(). 'tt_change_node' may go out of scope really being used for anything (except have a few members initialized) if we hit the 'del:' label. This patch makes sure we free the memory in that case. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-08sched: add missing group change to qfq_change_classPaolo Valente
[Resending again, as the text was corrupted by the email client] To speed up operations, QFQ internally divides classes into groups. Which group a class belongs to depends on the ratio between the maximum packet length and the weight of the class. Unfortunately the function qfq_change_class lacks the steps for changing the group of a class when the ratio max_pkt_len/weight of the class changes. For example, when the last of the following three commands is executed, the group of class 1:1 is not correctly changed: tc disc add dev XXX root handle 1: qfq tc class add dev XXX parent 1: qfq classid 1:1 weight 1 tc class change dev XXX parent 1: classid 1:1 qfq weight 4 Not changing the group of a class does not affect the long-term bandwidth guaranteed to the class, as the latter is independent of the maximum packet length, and correctly changes (only) if the weight of the class changes. In contrast, if the group of the class is not updated, the class is still guaranteed the short-term bandwidth and packet delay related to its old group, instead of the guarantees that it should receive according to its new weight and/or maximum packet length. This may also break service guarantees for other classes. This patch adds the missing operations. Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-08net: force dst_default_metrics to const sectionEric Dumazet
While investigating on network performance problems, I found this little gem : $ nm -v vmlinux | grep -1 dst_default_metrics ffffffff82736540 b busy.46605 ffffffff82736560 B dst_default_metrics ffffffff82736598 b dst_busy_list Apparently, declaring a const array without initializer put it in (writeable) bss section, in middle of possibly often dirtied cache lines. Since we really want dst_default_metrics be const to avoid any possible false sharing and catch any buggy writes, I force a null initializer. ffffffff818a4c20 R dst_default_metrics Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-08net: fib: fix incorrect call_rcu_bh()Eric Dumazet
After IP route cache removal, I believe rcu_bh() has very little use and we should remove this RCU variant, since it adds some cycles in fast path. Anyway, the call_rcu_bh() use in fib_true is obviously wrong, since some users only assert rcu_read_lock(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-08af_packet: Quiet sparse noise about using plain integer as NULL pointerYing Xue
Quiets the sparse warning: warning: Using plain integer as NULL pointer Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Missed rcu_assign_pointer() in mac80211 scanning, from Johannes Berg. 2) Allow devices to limit the number of segments that an individual TCP TSO packet can use at a time, to deal with device and/or driver specific limitations. From Ben Hutchings. 3) Fix unexpected hard IPSEC expiration after setting the date. From Fan Du. 4) Memory leak fix in bxn2x driver, from Jesper Juhl. 5) Fix two memory leaks in libertas driver, from Daniel Drake. 6) Fix deref of out-of-range array index in packet scheduler generic actions layer. From Hiroaki SHIMODA. 7) Fix TX flow control errors in mlx4 driver, from Yevgeny Petrilin. 8) Fix CRIS eth_v10.c driver build, from Randy Dunlap. 9) Fix wrong SKB freeing in LLC protocol layer, from Sorin Dumitru. 10) The IP output path checks neigh lookup errors incorrectly, it needs to use IS_ERR(). From Vasiliy Kulikov. 11) An estimator leak leads to deref of freed memory in timer handler, fix from Hiroaki SHIMODA. 12) TCP early demux in ipv6 needs to use DST cookies in order to validate the RX route properly. Fix from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits) net: ipv6: fix TCP early demux net: Use PTR_RET rather than if(IS_ERR(.. [1] net_sched: act: Delete estimator in error path. ip: fix error handling in ip_finish_output2() llc: free the right skb ixp4xx_eth: fix ptp_ixp46x build failure drivers/atm/iphase.c: fix error return code tcp_output: fix sparse warning for tcp_wfree drivers/net/phy/mdio-mux-gpio.c: drop devm_kfree of devm_kzalloc'd data batman-adv: select an internet gateway if none was chosen mISDN: Bugfix for layer2 fixed TEI mode igb: don't break user visible strings over multiple lines in igb_ethtool.c igb: correct hardware type (i210/i211) check in igb_loopback_test() igb: Fix for failure to init on some 82576 devices. cris: fix eth_v10.c build error cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN isdnloop: fix and simplify isdnloop_init() hyperv: Move wait completion msg code into rndis_filter_halt_device() net/mlx4_core: Remove port type restrictions net/mlx4_en: Fixing TX queue stop/wake flow ...
2012-08-07fib: use __fls() on non null argumentEric Dumazet
__fls(x) is a bit faster than fls(x), granted we know x is non null. As Ben Hutchings pointed out, fls(x) = __fls(x) + 1 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06openvswitch: Relax set header validation.Jesse Gross
When installing a flow with an action to set a particular field we need to validate that the packets that are part of the flow actually contain that header. With IP we use zeroed addresses and with TCP/UDP the check is for zeroed ports. This check is overly broad and can catch packets like DHCP requests that have a zero source address in a legitimate header. This changes the check to look for a zeroed protocol number for IP or for both ports be zero for TCP/UDP before considering the header to not exist. Reported-by: Ethan Jackson <ethan@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-08-06tcp: ecn: dont delay ACKS after CEEric Dumazet
While playing with CoDel and ECN marking, I discovered a non optimal behavior of receiver of CE (Congestion Encountered) segments. In pathological cases, sender has reduced its cwnd to low values, and receiver delays its ACK (by 40 ms). While RFC 3168 6.1.3 (The TCP Receiver) doesn't explicitly recommend to send immediate ACKS, we believe its better to not delay ACKS, because a CE segment should give same signal than a dropped segment, and its quite important to reduce RTT to give ECE/CWR signals as fast as possible. Note we already call tcp_enter_quickack_mode() from TCP_ECN_check_ce() if we receive a retransmit, for the same reason. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06net: tcp: GRO should be ECN friendlyEric Dumazet
While doing TCP ECN tests, I discovered GRO was reordering packets if it receives one packet with CE set, while previous packets in same NAPI run have ECT(0) for the same flow : 09:25:25.857620 IP (tos 0x2,ECT(0), ttl 64, id 27893, offset 0, flags [DF], proto TCP (6), length 4396) 172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq 233801:238145, ack 1, win 115, options [nop,nop,TS val 3397779 ecr 1990627], length 4344 09:25:25.857626 IP (tos 0x3,CE, ttl 64, id 27892, offset 0, flags [DF], proto TCP (6), length 1500) 172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq 232353:233801, ack 1, win 115, options [nop,nop,TS val 3397779 ecr 1990627], length 1448 09:25:25.857638 IP (tos 0x0, ttl 64, id 34581, offset 0, flags [DF], proto TCP (6), length 64) 172.30.42.13.44139 > 172.30.42.19.54550: Flags [.], cksum 0xac8f (incorrect -> 0xca69), ack 232353, win 1271, options [nop,nop,TS val 1990627 ecr 3397779,nop,nop,sack 1 {233801:238145}], length 0 We have two problems here : 1) GRO reorders packets If NIC gave packet1, then packet2, which happen to be from "different flows" GRO feeds stack with packet2, then packet1. I have yet to understand how to solve this problem. 2) GRO is not ECN friendly Delivering packets out of order makes TCP stack not as fast as it could be. In this patch I suggest we make the tos test not part of the 'same_flow' determination, but part of the 'should flush' logic Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06net: ipv6: fix TCP early demuxEric Dumazet
IPv6 needs a cookie in dst_check() call. We need to add rx_dst_cookie and provide a family independent sk_rx_dst_set(sk, skb) method to properly support IPv6 TCP early demux. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06net_sched: act: Delete estimator in error path.Hiroaki SHIMODA
Some action modules free struct tcf_common in their error path while estimator is still active. This results in est_timer() dereference freed memory. Add gen_kill_estimator() in ipt, pedit and simple action. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06ip: fix error handling in ip_finish_output2()Vasiliy Kulikov
__neigh_create() returns either a pointer to struct neighbour or PTR_ERR(). But the caller expects it to return either a pointer or NULL. Replace the NULL check with IS_ERR() check. The bug was introduced in a263b3093641fb1ec377582c90986a7fd0625184 ("ipv4: Make neigh lookups directly in output packet path."). Signed-off-by: Vasily Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06llc: free the right skbSorin Dumitru
We are freeing skb instead of nskb, resulting in a double free on skb and a leak from nskb. Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06tcp_output: fix sparse warning for tcp_wfreeSilviu-Mihai Popescu
Fix sparse warning: * symbol 'tcp_wfree' was not declared. Should it be static? Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06batman-adv: select an internet gateway if none was chosenMarek Lindner
This is a regression introduced by: 2265c141086474bbae55a5bb3afa1ebb78ccaa7c ("batman-adv: gateway election code refactoring") Reported-by: Nicolás Echániz <nicoechaniz@codigosur.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06cfg80211: process pending events when unregistering net deviceDaniel Drake
libertas currently calls cfg80211_disconnected() when it is being brought down. This causes an event to be allocated, but since the wdev is already removed from the rdev by the time that the event processing work executes, the event is never processed or freed. http://article.gmane.org/gmane.linux.kernel.wireless.general/95666 Fix this leak, and other possible situations, by processing the event queue when a device is being unregistered. Thanks to Johannes Berg for the suggestion. Signed-off-by: Daniel Drake <dsd@laptop.org> Cc: stable@vger.kernel.org Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>