summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2011-03-04ipv4: Optimize flow initialization in output route lookup.David S. Miller
We burn a lot of useless cycles, cpu store buffer traffic, and memory operations memset()'ing the on-stack flow used to perform output route lookups in __ip_route_output_key(). Only the first half of the flow object members even matter for output route lookups in this context, specifically: FIB rules matching cares about: dst, src, tos, iif, oif, mark FIB trie lookup cares about: dst FIB semantic match cares about: tos, scope, oif Therefore only initialize these specific members and elide the memset entirely. On Niagara2 this kills about ~300 cycles from the output route lookup path. Likely, we can take things further, since all callers of output route lookups essentially throw away the on-stack flow they use. So they don't care if we use it as a scratch-pad to compute the final flow key. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-03-04inetpeer: seqlock optimizationEric Dumazet
David noticed : ------------------ Eric, I was profiling the non-routing-cache case and something that stuck out is the case of calling inet_getpeer() with create==0. If an entry is not found, we have to redo the lookup under a spinlock to make certain that a concurrent writer rebalancing the tree does not "hide" an existing entry from us. This makes the case of a create==0 lookup for a not-present entry really expensive. It is on the order of 600 cpu cycles on my Niagara2. I added a hack to not do the relookup under the lock when create==0 and it now costs less than 300 cycles. This is now a pretty common operation with the way we handle COW'd metrics, so I think it's definitely worth optimizing. ----------------- One solution is to use a seqlock instead of a spinlock to protect struct inet_peer_base. After a failed avl tree lookup, we can easily detect if a writer did some changes during our lookup. Taking the lock and redo the lookup is only necessary in this case. Note: Add one private rcu_deref_locked() macro to place in one spot the access to spinlock included in seqlock. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-04Merge branch 'for-davem' of ↵David S. Miller
ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2011-03-04libceph: fix msgr standby handlingSage Weil
The standby logic used to be pretty dependent on the work requeueing behavior that changed when we switched to WQ_NON_REENTRANT. It was also very fragile. Restructure things so that: - We clear WRITE_PENDING when we set STANDBY. This ensures we will requeue work when we wake up later. - con_work backs off if STANDBY is set. There is nothing to do if we are in standby. - clear_standby() helper is called by both con_send() and con_keepalive(), the two actions that can wake us up again. Move the connect_seq++ logic here. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04libceph: fix msgr keepalive flagSage Weil
There was some broken keepalive code using a dead variable. Shift to using the proper bit flag. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04libceph: fix msgr backoffSage Weil
With commit f363e45f we replaced a bunch of hacky workqueue mutual exclusion logic with the WQ_NON_REENTRANT flag. One pieces of fallout is that the exponential backoff breaks in certain cases: * con_work attempts to connect. * we get an immediate failure, and the socket state change handler queues immediate work. * con_work calls con_fault, we decide to back off, but can't queue delayed work. In this case, we add a BACKOFF bit to make con_work reschedule delayed work next time it runs (which should be immediately). Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
2011-03-04mac80211: Remove redundant preamble and RTS flag setup in minstrel_htHelmut Schaa
mac80211 does the same afterwards anyway. Hence, just drop this redundant code. Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-04Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-next-2.6
2011-03-03Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h
2011-03-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: DNS: Fix a NULL pointer deref when trying to read an error key [CVE-2011-1076]
2011-03-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits) MAINTAINERS: Add Andy Gospodarek as co-maintainer. r8169: disable ASPM RxRPC: Fix v1 keys AF_RXRPC: Handle receiving ACKALL packets cnic: Fix lost interrupt on bnx2x cnic: Prevent status block race conditions with hardware net: dcbnl: check correct ops in dcbnl_ieee_set() e1000e: disable broken PHY wakeup for ICH10 LOMs, use MAC wakeup instead igb: fix sparse warning e1000: fix sparse warning netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values dccp: fix oops on Reset after close ipvs: fix dst_lock locking on dest update davinci_emac: Add Carrier Link OK check in Davinci RX Handler bnx2x: update driver version to 1.62.00-6 bnx2x: properly calculate lro_mss bnx2x: perform statistics "action" before state transition. bnx2x: properly configure coefficients for MinBW algorithm (NPAR mode). bnx2x: Fix ethtool -t link test for MF (non-pmf) devices. bnx2x: Fix nvram test for single port devices. ...
2011-03-04DNS: Fix a NULL pointer deref when trying to read an error key [CVE-2011-1076]David Howells
When a DNS resolver key is instantiated with an error indication, attempts to read that key will result in an oops because user_read() is expecting there to be a payload - and there isn't one [CVE-2011-1076]. Give the DNS resolver key its own read handler that returns the error cached in key->type_data.x[0] as an error rather than crashing. Also make the kenter() at the beginning of dns_resolver_instantiate() limit the amount of data it prints, since the data is not necessarily NUL-terminated. The buggy code was added in: commit 4a2d789267e00b5a1175ecd2ddefcc78b83fbf09 Author: Wang Lei <wang840925@gmail.com> Date: Wed Aug 11 09:37:58 2010 +0100 Subject: DNS: If the DNS server returns an error, allow that to be cached [ver #2] This can trivially be reproduced by any user with the following program compiled with -lkeyutils: #include <stdlib.h> #include <keyutils.h> #include <err.h> static char payload[] = "#dnserror=6"; int main() { key_serial_t key; key = add_key("dns_resolver", "a", payload, sizeof(payload), KEY_SPEC_SESSION_KEYRING); if (key == -1) err(1, "add_key"); if (keyctl_read(key, NULL, 0) == -1) err(1, "read_key"); return 0; } What should happen is that keyctl_read() reports error 6 (ENXIO) to the user: dns-break: read_key: No such device or address but instead the kernel oopses. This cannot be reproduced with the 'keyutils add' or 'keyutils padd' commands as both of those cut the data down below the NUL termination that must be included in the data. Without this dns_resolver_instantiate() will return -EINVAL and the key will not be instantiated such that it can be read. The oops looks like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff811b99f7>] user_read+0x4f/0x8f PGD 3bdf8067 PUD 385b9067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/irq CPU 0 Modules linked in: Pid: 2150, comm: dns-break Not tainted 2.6.38-rc7-cachefs+ #468 /DG965RY RIP: 0010:[<ffffffff811b99f7>] [<ffffffff811b99f7>] user_read+0x4f/0x8f RSP: 0018:ffff88003bf47f08 EFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff88003b5ea378 RCX: ffffffff81972368 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003b5ea378 RBP: ffff88003bf47f28 R08: ffff88003be56620 R09: 0000000000000000 R10: 0000000000000395 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffffffffa1 FS: 00007feab5751700(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 000000003de40000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process dns-break (pid: 2150, threadinfo ffff88003bf46000, task ffff88003be56090) Stack: ffff88003b5ea378 ffff88003b5ea3a0 0000000000000000 0000000000000000 ffff88003bf47f68 ffffffff811b708e ffff88003c442bc8 0000000000000000 00000000004005a0 00007fffba368060 0000000000000000 0000000000000000 Call Trace: [<ffffffff811b708e>] keyctl_read_key+0xac/0xcf [<ffffffff811b7c07>] sys_keyctl+0x75/0xb6 [<ffffffff81001f7b>] system_call_fastpath+0x16/0x1b Code: 75 1f 48 83 7b 28 00 75 18 c6 05 58 2b fb 00 01 be bb 00 00 00 48 c7 c7 76 1c 75 81 e8 13 c2 e9 ff 4c 8b b3 e0 00 00 00 4d 85 ed <41> 0f b7 5e 10 74 2d 4d 85 e4 74 28 e8 98 79 ee ff 49 39 dd 48 RIP [<ffffffff811b99f7>] user_read+0x4f/0x8f RSP <ffff88003bf47f08> CR2: 0000000000000010 Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> cc: Wang Lei <wang840925@gmail.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-03-03libceph: retry after authorization failureSage Weil
If we mark the connection CLOSED we will give up trying to reconnect to this server instance. That is appropriate for things like a protocol version mismatch that won't change until the server is restarted, at which point we'll get a new addr and reconnect. An authorization failure like this is probably due to the server not properly rotating it's secret keys, however, and should be treated as transient so that the normal backoff and retry behavior kicks in. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03libceph: fix handling of short returns from get_user_pagesSage Weil
get_user_pages() can return fewer pages than we ask for. We were returning a bogus pointer/error code in that case. Instead, loop until we get all the pages we want or get an error we can return to the caller. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03netlink: kill eff_cap from struct netlink_skb_parmsPatrick McHardy
Netlink message processing in the kernel is synchronous these days, capabilities can be checked directly in security_netlink_recv() from the current process. Signed-off-by: Patrick McHardy <kaber@trash.net> Reviewed-by: James Morris <jmorris@namei.org> [chrisw: update to include pohmelfs and uvesafb] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03ipv6: Use ERR_CAST in addrconf_dst_alloc.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03ipv4: Fix __ip_dev_find() to use ifa_local instead of ifa_address.David S. Miller
Reported-by: Stephen Hemminger <shemminger@vyatta.com> Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03net_sched: reduce fifo qdisc sizeEric Dumazet
Because of various alignements [SLUB / qdisc], we use 512 bytes of memory for one {p|b}fifo qdisc, instead of 256 bytes on 64bit arches and 192 bytes on 32bit ones. Move the "u32 limit" inside "struct Qdisc" (no impact on other qdiscs) Change qdisc_alloc(), first trying a regular allocation before an oversized one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parmsPatrick McHardy
Netlink message processing in the kernel is synchronous these days, the session information can be collected when needed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-03ipv4: Fix crash in dst_release when udp_sendmsg route lookup fails.David S. Miller
As reported by Eric: [11483.697233] IP: [<c12b0638>] dst_release+0x18/0x60 ... [11483.697741] Call Trace: [11483.697764] [<c12fc9d2>] udp_sendmsg+0x282/0x6e0 [11483.697790] [<c12a1c01>] ? memcpy_toiovec+0x51/0x70 [11483.697818] [<c12dbd90>] ? ip_generic_getfrag+0x0/0xb0 The pointer passed to dst_release() is -EINVAL, that's because we leave an error pointer in the local variable "rt" by accident. NULL it out to fix the bug. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02AF_RXRPC: Handle receiving ACKALL packetsDavid Howells
The OpenAFS server is now sending ACKALL packets, so we need to handle them. Otherwise we report a protocol error and abort. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02dcbnl: add support for retrieving peer configuration - ceeShmulik Ravid
This patch adds the support for retrieving the remote or peer DCBX configuration via dcbnl for embedded DCBX stacks supporting the CEE DCBX standard. Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02dcbnl: add support for retrieving peer configuration - ieeeShmulik Ravid
These 2 patches add the support for retrieving the remote or peer DCBX configuration via dcbnl for embedded DCBX stacks. The peer configuration is part of the DCBX MIB and is useful for debugging and diagnostics of the overall DCB configuration. The first patch add this support for IEEE 802.1Qaz standard the second patch add the same support for the older CEE standard. Diff for v2 - the peer-app-info is CEE specific. Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02net: dcbnl: check correct ops in dcbnl_ieee_set()John Fastabend
The incorrect ops routine was being tested for in DCB_ATTR_IEEE_PFC attributes. This patch corrects it. Currently, every driver implementing ieee_setets also implements ieee_setpfc so this bug is not actualized yet. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02ipv4: ip_route_output_key() is better as an inline.David S. Miller
This avoid a stack frame at zero cost. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02ipv4: Make output route lookup return rtable directly.David S. Miller
Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02xfrm: Return dst directly from xfrm_lookup()David S. Miller
Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2011-03-02Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2011-03-02netfilter: nf_log: avoid oops in (un)bind with invalid nfproto valuesJan Engelhardt
Like many other places, we have to check that the array index is within allowed limits, or otherwise, a kernel oops and other nastiness can ensue when we access memory beyond the end of the array. [ 5954.115381] BUG: unable to handle kernel paging request at 0000004000000000 [ 5954.120014] IP: __find_logger+0x6f/0xa0 [ 5954.123979] nf_log_bind_pf+0x2b/0x70 [ 5954.123979] nfulnl_recv_config+0xc0/0x4a0 [nfnetlink_log] [ 5954.123979] nfnetlink_rcv_msg+0x12c/0x1b0 [nfnetlink] ... The problem goes back to v2.6.30-rc1~1372~1342~31 where nf_log_bind was decoupled from nf_log_register. Reported-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>, via irc.freenode.net/#netfilter Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-01dccp: fix oops on Reset after closeGerrit Renker
This fixes a bug in the order of dccp_rcv_state_process() that still permitted reception even after closing the socket. A Reset after close thus causes a NULL pointer dereference by not preventing operations on an already torn-down socket. dccp_v4_do_rcv() | | state other than OPEN v dccp_rcv_state_process() | | DCCP_PKT_RESET v dccp_rcv_reset() | v dccp_time_wait() WARNING: at net/ipv4/inet_timewait_sock.c:141 __inet_twsk_hashdance+0x48/0x128() Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah [<c0038850>] (unwind_backtrace+0x0/0xec) from [<c0055364>] (warn_slowpath_common) [<c0055364>] (warn_slowpath_common+0x4c/0x64) from [<c0055398>] (warn_slowpath_n) [<c0055398>] (warn_slowpath_null+0x1c/0x24) from [<c02b72d0>] (__inet_twsk_hashd) [<c02b72d0>] (__inet_twsk_hashdance+0x48/0x128) from [<c031caa0>] (dccp_time_wai) [<c031caa0>] (dccp_time_wait+0x40/0xc8) from [<c031c15c>] (dccp_rcv_state_proces) [<c031c15c>] (dccp_rcv_state_process+0x120/0x538) from [<c032609c>] (dccp_v4_do_) [<c032609c>] (dccp_v4_do_rcv+0x11c/0x14c) from [<c0286594>] (release_sock+0xac/0) [<c0286594>] (release_sock+0xac/0x110) from [<c031fd34>] (dccp_close+0x28c/0x380) [<c031fd34>] (dccp_close+0x28c/0x380) from [<c02d9a78>] (inet_release+0x64/0x70) The fix is by testing the socket state first. Receiving a packet in Closed state now also produces the required "No connection" Reset reply of RFC 4340, 8.3.1. Reported-and-tested-by: Johan Hovold <jhovold@gmail.com> Cc: stable@kernel.org Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01inet: Replace left-over references to inet->corkHerbert Xu
The patch to replace inet->cork with cork left out two spots in __ip_append_data that can result in bogus packet construction. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01pfkey: fix warningStephen Hemminger
If CONFIG_NET_KEY_MIGRATE is not defined the arguments of pfkey_migrate stub do not match causing warning. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv6: Make icmp route lookup code a bit clearer.David S. Miller
The route lookup code in icmpv6_send() is slightly tricky as a result of having to handle all of the requirements of RFC 4301 host relookups. Pull the route resolution into a seperate function, so that the error handling and route reference counting is hopefully easier to see and contained wholly within this new routine. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01Bluetooth: Fix some small code style issues in mgmt.cSzymon Janc
Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-03-01Bluetooth: Use variable name instead of type in sizeof()Szymon Janc
As written in the CodingStyle doc. Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-03-01Bluetooth: Remove unused code from get_connectionsSzymon Janc
Command pointer was a leftover after moving controller index to mgmt_hdr. Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-03-01Bluetooth: Log all parameters in cmd_status for easier debuggingSzymon Janc
Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-03-01Bluetooth: Fix possible NULL pointer dereference in cmd_completeSzymon Janc
It is now possible to create command complete event without specific reply data by passing NULL as reply with len 0. Check pointer before calling memcpy to avoid undefined behaviour. Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-03-01ipv4: Make icmp route lookup code a bit clearer.David S. Miller
The route lookup code in icmp_send() is slightly tricky as a result of having to handle all of the requirements of RFC 4301 host relookups. Pull the route resolution into a seperate function, so that the error handling and route reference counting is hopefully easier to see and contained wholly within this new routine. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01xfrm: Handle blackhole route creation via afinfo.David S. Miller
That way we don't have to potentially do this in every xfrm_lookup() caller. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02ipvs: fix dst_lock locking on dest updateJulian Anastasov
Fix dst_lock usage in __ip_vs_update_dest. We need _bh locking because destination is updated in user context. Can cause lockups on frequent destination updates. Problem reported by Simon Kirby. Bug was introduced in 2.6.37 from the "ipvs: changes for local real server" change. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-01ipv6: Normalize arguments to ip6_dst_blackhole().David S. Miller
Return a dst pointer which is potentitally error encoded. Don't pass original dst pointer by reference, pass a struct net instead of a socket, and elide the flow argument since it is unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01xfrm: Kill XFRM_LOOKUP_WAIT flag.David S. Miller
This can be determined from the flow flags instead. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv6: Change final dst lookup arg name to "can_sleep"David S. Miller
Since it indicates whether we are invoked from a sleepable context or not. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv4: Kill can_sleep arg to ip_route_output_flow()David S. Miller
This boolean state is now available in the flow flags. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01net: Add FLOWI_FLAG_CAN_SLEEP.David S. Miller
And set is in contexts where the route resolution can sleep. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep"David S. Miller
Since that is what the current vague "flags" argument means. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-01ipv4: Can final ip_route_connect() arg to boolean "can_sleep".David S. Miller
Since that's what the current vague "flags" thing means. Signed-off-by: David S. Miller <davem@davemloft.net>