summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2010-02-10sunrpc: parse and return errors reported by gssdJeff Layton
The kernel currently ignores any error code sent by gssd and always considers it to be -EACCES. In order to better handle the situation of an expired KRB5 TGT, the kernel needs to be able to parse and deal with the errors that gssd sends. Aside from -EACCES the only error we care about is -EKEYEXPIRED, which we're using to indicate that the upper layers should retry the call a little later. To maintain backward compatibility with older gssd's, any error other than -EKEYEXPIRED is interpreted as -EACCES. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10Merge branch 'master' of /repos/git/net-next-2.6Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-09Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2010-02-09mac80211: Retry null data frame for power save.Vivek Natarajan
Even if the null data frame is not acked by the AP, mac80211 goes into power save. This might lead to loss of frames from the AP. Prevent this by restarting dynamic_ps_timer when ack is not received for null data frames. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-09tree-wide: Assorted spelling fixesDaniel Mack
In particular, several occurances of funny versions of 'success', 'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address', 'beginning', 'desirable', 'separate' and 'necessary' are fixed. Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Joe Perches <joe@perches.com> Cc: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-08Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6
2010-02-08net/sched: Fix module name in KconfigJan Luebbe
The action modules have been prefixed with 'act_', but the Kconfig description was not changed. Signed-off-by: Jan Luebbe <jluebbe@debian.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-089p: fix p9_client_destroy unconditional calling v9fs_put_transEric Van Hensbergen
restructure client create code to handle error cases better and only cleanup initialized portions of the stack. Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-089p: Fix the kernel crash on a failed mountAneesh Kumar K.V
The patch fix the crash repoted below [ 15.149907] BUG: unable to handle kernel NULL pointer dereference at 00000001 [ 15.150806] IP: [<c140b886>] p9_virtio_close+0x18/0x24 ..... .... [ 15.150806] Call Trace: [ 15.150806] [<c1408e78>] ? p9_client_destroy+0x3f/0x163 [ 15.150806] [<c1409342>] ? p9_client_create+0x25f/0x270 [ 15.150806] [<c1063b72>] ? trace_hardirqs_on+0xb/0xd [ 15.150806] [<c11ed4e8>] ? match_token+0x64/0x164 [ 15.150806] [<c1175e8d>] ? v9fs_session_init+0x2f1/0x3c8 [ 15.150806] [<c109cfc9>] ? kmem_cache_alloc+0x98/0xb8 [ 15.150806] [<c1063b72>] ? trace_hardirqs_on+0xb/0xd [ 15.150806] [<c1173dd1>] ? v9fs_get_sb+0x47/0x1e8 [ 15.150806] [<c1173dea>] ? v9fs_get_sb+0x60/0x1e8 [ 15.150806] [<c10a2e77>] ? vfs_kern_mount+0x81/0x11a [ 15.150806] [<c10a2f55>] ? do_kern_mount+0x33/0xbe [ 15.150806] [<c10b40b9>] ? do_mount+0x654/0x6b3 [ 15.150806] [<c1038949>] ? do_page_fault+0x0/0x284 [ 15.150806] [<c10b28ec>] ? copy_mount_options+0x73/0xd2 [ 15.150806] [<c10b4179>] ? sys_mount+0x61/0x94 [ 15.150806] [<c14284e9>] ? syscall_call+0x7/0xb .... [ 15.203562] ---[ end trace 1dd159357709eb4b ]--- [ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08dst: call cond_resched() in dst_gc_task()Eric Dumazet
Kernel bugzilla #15239 On some workloads, it is quite possible to get a huge dst list to process in dst_gc_task(), and trigger soft lockup detection. Fix is to call cond_resched(), as we run in process context. Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Tested-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-089p: fix option parsingEric Van Hensbergen
Options pointer is being moved before calling kfree() which seems to cause problems. This uses a separate pointer to track and free original allocation. Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>w
2010-02-08mac80211: Reset dynamic ps timer in Rx path.Vivek Natarajan
The current mac80211 implementation enables power save if there is no Tx traffic for a specific timeout. Hence, PS is triggered even if there is a continuous Rx only traffic(like UDP) going on. This makes the drivers to wait on the tim bit in the next beacon to awake which leads to redundant sleep-wake cycles. Fix this by restarting the dynamic ps timer on receiving every data packet. Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com> CC: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08mac80211: make rate_control_alloc staticAndres Salomon
rate_control_alloc is not used by anything outside of ieee80211_init_rate_ctrl_alg. Both are in rate.c; there's no reason to make rate_control_alloc visible outside of it. Signed-off-by: Andres Salomon <dilinger@collabora.co.uk> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08mac80211: remove get_tx_stats() driver opKalle Valo
get_tx_stats() driver operation is not currently used anywhere in mac80211 and there are no plans to use it in the not-so-near future. So it can go without anyone missing it. Signed-off-by: Kalle Valo <kalle.valo@iki.fi> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08mac80211: fix deauth raceJohannes Berg
When userspace requests a deauth while the authentication work is pending in the auth (not probe) state, we do not properly abort the work and then things get confused. Fix that and also improve the checks here to include the correct virtual interface, just in case two virtual interfaces would ever try to connect to the same BSS. Also fix a bug -- need to use list_del_rcu instead of just list_del to free a work item. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08mac80211: fix bss_conf.dtim_periodJohannes Berg
In AP mode, the only mode where the parameter is supposed to be valid, we never assign it! Fix that to allow drivers to avoid parsing the TIM IE for the value. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08mac80211: Added a new debugfs file for reading channel_typeBenoit Papillault
This file helps debugging HT channels since it displays if we are on ht20 or ht40+/ht40- Signed-off-by: Benoit Papillault <benoit.papillault@free.fr> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08mac80211: tear down all agg queues when restart/reconfig hwWey-Yi Guy
When there is a need to restart/reconfig hw, tear down all the aggregation queues and let the mac80211 and driver get in-sync to have the opportunity to re-establish the aggregation queues again. Need to wait until driver re-establish all the station information before tear down the aggregation queues, driver(at least iwlwifi driver) will reject the stop aggregation queue request if station is not ready. But also need to make sure the aggregation queues are tear down before waking up the queues, so mac80211 will not sending frames with aggregation bit set. Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08mac80211: allow station add/remove to sleepJohannes Berg
Many drivers would like to sleep during station addition and removal, and currently have a high complexity there from not being able to. This introduces two new callbacks sta_add() and sta_remove() that drivers can implement instead of using sta_notify() and that can sleep, and the new sta_add() callback is also allowed to fail. The reason we didn't do this previously is that the IBSS code wants to insert stations from the RX path, which is a tasklet, so cannot sleep. This patch will keep the station allocation in that path, but moves adding the station to the driver out of line. Since the addition can now fail, we can have IBSS peer structs the driver rejected -- in that case we still talk to the station but never tell the driver about it in the control.sta pointer. If there will ever be a driver that has a low limit on the number of stations and that cannot talk to any stations that are not known to it, we need to do come up with a new strategy of handling larger IBSSs, maybe quicker expiry or rejecting peers. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08mac80211: don't probe if we have probe responseJohannes Berg
We can now easily determine whether we already have probe response information for the BSS we are asked to connect to, in which case there's little point in probing the BSS again. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08wireless: update radiotap parserJohannes Berg
Upstream radiotap has adopted the namespace proposal David Young made and I then took care of, for which I had adapted the radiotap parser as a library outside the kernel. This brings the in-kernel parser up to speed. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 Conflicts: net/mac80211/scan.c
2010-02-08mac80211: fix deferred hardware scan requestsJohannes Berg
Reinette found the reason for the warnings that happened occasionally when a hw-offloaded scan finished; her description of the problem: mac80211 will defer the handling of scan requests if it is busy with management work at the time. The scan requests are deferred and run after the work has completed. When this occurs there are currently two problems. * The scan request for hardware scan is not fully populated with the band and channels to scan not initialized. * When the scan is queued the state is not correctly updated to reflect that a scan is in progress. The problem here is that when the driver completes the scan and calls ieee80211_scan_completed() a warning will be triggered since mac80211 was not aware that a scan was in progress. The reason is that the queued scan work will start the hw scan right away when the hw_scan_req struct has already been allocated. However, in the first pass it will not have been filled, which happens at the same time as setting the bits. To fix this, simply move the allocation after the pending work test as well, so that the first iteration of the scan work will call __ieee80211_start_scan() even in the hardware scan case. Bug-identified-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08mac80211: Fix probe request filtering in IBSS modeBenoit Papillault
We only reply to probe request if either the requested SSID is the broadcast SSID or if the requested SSID matches our own SSID. This latter case was not properly handled since we were replying to different SSID with the same length as our own SSID. Signed-off-by: Benoit Papillault <benoit.papillault@free.fr> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-02-08net/9p: fix statsize inside twstatEric Van Hensbergen
stat structures contain a size prefix. In our twstat messages we were including the size of the size prefix in the prefix, which is not what the protocol wants, and Inferno servers would complain. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08net/9p: fail when user specifies a transport which we can't findEric Van Hensbergen
If the user specifies a transport and we can't find it, we failed back to the default trainsport silently. This patch will make the code complain more loudly and return an error code. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08net/9p: fix virtio transport to correctly update status on connectEric Van Hensbergen
The 9p virtio transport was not updating its connection status correctly preventing it from being able to mount the server. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08netfilter: nf_conntrack: fix hash resizing with namespacesPatrick McHardy
As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash size is global and not per namespace, but modifiable at runtime through /sys/module/nf_conntrack/hashsize. Changing the hash size will only resize the hash in the current namespace however, so other namespaces will use an invalid hash size. This can cause crashes when enlarging the hashsize, or false negative lookups when shrinking it. Move the hash size into the per-namespace data and only use the global hash size to initialize the per-namespace value when instanciating a new namespace. Additionally restrict hash resizing to init_net for now as other namespaces are not handled currently. Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08netfilter: xtables: compat out of scope fixAlexey Dobriyan
As per C99 6.2.4(2) when temporary table data goes out of scope, the behaviour is undefined: if (compat) { struct foo tmp; ... private = &tmp; } [dereference private] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-08netfilter: nf_conntrack: restrict runtime expect hashsize modificationsAlexey Dobriyan
Expectation hashtable size was simply glued to a variable with no code to rehash expectations, so it was a bug to allow writing to it. Make "expect_hashsize" readonly. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-08netfilter: nf_conntrack: per netns nf_conntrack_cachepEric Dumazet
nf_conntrack_cachep is currently shared by all netns instances, but because of SLAB_DESTROY_BY_RCU special semantics, this is wrong. If we use a shared slab cache, one object can instantly flight between one hash table (netns ONE) to another one (netns TWO), and concurrent reader (doing a lookup in netns ONE, 'finding' an object of netns TWO) can be fooled without notice, because no RCU grace period has to be observed between object freeing and its reuse. We dont have this problem with UDP/TCP slab caches because TCP/UDP hashtables are global to the machine (and each object has a pointer to its netns). If we use per netns conntrack hash tables, we also *must* use per netns conntrack slab caches, to guarantee an object can not escape from one namespace to another one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> [Patrick: added unique slab name allocation] Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-08netfilter: nf_conntrack: fix memory corruption with multiple namespacesPatrick McHardy
As discovered by Jon Masters <jonathan@jonmasters.org>, the "untracked" conntrack, which is located in the data section, might be accidentally freed when a new namespace is instantiated while the untracked conntrack is attached to a skb because the reference count it re-initialized. The best fix would be to use a seperate untracked conntrack per namespace since it includes a namespace pointer. Unfortunately this is not possible without larger changes since the namespace is not easily available everywhere we need it. For now move the untracked conntrack initialization to the init_net setup function to make sure the reference count is not re-initialized and handle cleanup in the init_net cleanup function to make sure namespaces can exit properly while the untracked conntrack is in use in other namespaces. Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08netfilter: fix build failure with CONNTRACK=y NAT=nFlorian Westphal
net/ipv4/netfilter/nf_defrag_ipv4.c: In function 'ipv4_conntrack_defrag': net/ipv4/netfilter/nf_defrag_ipv4.c:62: error: implicit declaration of function 'nf_ct_is_template' Signed-off-by: Florian Westphal <fwestphal@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-05packet: Kill CONFIG_PACKET_MMAP.David S. Miller
Early on this was an experimental facility that few people other than Alexey Kuznetsov played with. Now it's a pretty fundamental thing and as people add more features to AF_PACKET sockets this config options creates ifdef spaghetti. So kill it off. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-05Bluetooth: Keep a copy of each HID device's report descriptorMichael Poole
The report descriptor is read by user space (via the Service Discovery Protocol), so it is only available during the ioctl to connect. However, the HID probe function that needs the descriptor might not be called until a specific module is loaded. Keep a copy of the descriptor so it is available for later use. Signed-off-by: Michael Poole <mdpoole@troilus.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-02-04bridge: Remove unused age_listHerbert Xu
This patch removes the unused age_list member from the net_bridge structure. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-04packet: Add GSO/csum offload support.Sridhar Samudrala
This patch adds GSO/checksum offload to af_packet sockets using virtio_net_hdr. Based on Rusty's patch to add this support to tun. It allows GSO/checksum offload to be enabled when using raw socket backend with virtio_net. Adds PACKET_VNET_HDR socket option to prepend virtio_net_hdr in the receive path and process/skip virtio_net_hdr in the send path. This option is only allowed with SOCK_RAW sockets attached to ethernet type devices. v2 updates ---------- Michael's Comments - Perform length check in packet_snd() when GSO is off even when vnet_hdr is present. - Check for SKB_GSO_FCOE type and return -EINVAL - don't allow tx/rx ring when vnet_hdr is enabled. Herbert's Comments - Removed ethernet specific code. - protocol value is assumed to be passed in by the caller. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-04ipv4: obsolete config in kernel source (IP_ROUTE_PERVASIVE)Christoph Egger
CONFIG_IP_ROUTE_PERVASIVE is missing a corresponding config IP_ROUTE_PERVASIVE somewhere in KConfig (and missing it for ages already) so it looks like some aging artefact no longer needed. Therefor this patch kills of the only remaining reference to that config Item removing the already unrechable code snipet. Signed-off-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-04pktgen: Fix freezing problemRafael J. Wysocki
Add missing try_to_freeze() to one of the pktgen_thread_worker() code paths so that it doesn't block suspend/hibernation. Fixes http://bugzilla.kernel.org/show_bug.cgi?id=15006 Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-and-tested-by: Ciprian Dorin Craciun <ciprian.craciun@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-04Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2010-02-03net: maintain namespace isolation between vlan and real deviceArnd Bergmann
In the vlan and macvlan drivers, the start_xmit function forwards data to the dev_queue_xmit function for another device, which may potentially belong to a different namespace. To make sure that classification stays within a single namespace, this resets the potentially critical fields. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-03net/rds: remove uses of NIPQUAD, use %pI4Joe Perches
Signed-off-by: Joe Perches <joe@perches.com> Cc: Andy Grover <andy.grover@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-03irda: add missing BKL in irnet_ppp ioctlThadeu Lima de Souza Cascardo
One ioctl has been forgotten when the BKL was push down into irnet_ppp ioctl function. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-03irda: unbalanced lock_kernel in irnet_pppThadeu Lima de Souza Cascardo
Add the missing unlock_kernel in one ioctl operation. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-03Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2010-02-03Bluetooth: Enter active mode before establishing a SCO link.Nick Pelly
When in sniff mode with a long interval time (1.28s) it can take 4+ seconds to establish a SCO link. Fix by requesting active mode before requesting SCO connection. This improves SCO setup time to ~500ms. Bluetooth headsets that use a long interval time, and exhibit the long SCO connection time include Motorola H790, HX1 and H17. They have a CSR 2.1 chipset. Verified this behavior and fix with host Bluetooth chipsets: BCM4329 and TI1271. 2009-10-13 14:17:46.183722 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 1 mode 0x02 interval 2048 Mode: Sniff 2009-10-13 14:17:53.436285 < HCI Command: Setup Synchronous Connection (0x01|0x0028) plen 17 handle 1 voice setting 0x0060 2009-10-13 14:17:53.445593 > HCI Event: Command Status (0x0f) plen 4 Setup Synchronous Connection (0x01|0x0028) status 0x00 ncmd 1 2009-10-13 14:17:57.788855 > HCI Event: Synchronous Connect Complete 0x2c) plen 17 status 0x00 handle 257 bdaddr 00:1A:0E:F1:A4:7F type eSCO Air mode: CVSD Signed-off-by: Nick Pelly <npelly@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-02-03dccp: fix auto-loading of dccp(_probe)Gerrit Renker
This fixes commit (38ff3e6bb987ec583268da8eb22628293095d43b) ("dccp_probe: Fix module load dependencies between dccp and dccp_probe", from 15 Jan). It fixes the construction of the first argument of try_then_request_module(), where only valid return codes from the first argument should be returned. What we do now is assign the result of register_jprobe() to ret, without the side effect of the comparison. Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-03dccp: fix bug in cache allocationGerrit Renker
This fixes a bug introduced in commit de4ef86cfce60d2250111f34f8a084e769f23b16 ("dccp: fix dccp rmmod when kernel configured to use slub", 17 Jan): the vsnprintf used sizeof(slab_name_fmt), which became truncated to 4 bytes, since slab_name_fmt is now a 4-byte pointer and no longer a 32-character array. This lead to error messages such as FATAL: Error inserting dccp: No buffer space available >> kernel: [ 1456.341501] kmem_cache_create: duplicate cache cci generated due to the truncation after the 3rd character. Fixed for the moment by introducing a symbolic constant. Tested to fix the bug. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-03netlink: fix for too early rmmodAlexey Dobriyan
Netlink code does module autoload if protocol userspace is asking for is not ready. However, module can dissapear right after it was autoloaded. Example: modprobe/rmmod stress-testing and xfrm_user.ko providing NETLINK_XFRM. netlink_create() in such situation _will_ create userspace socket and _will_not_ pin module. Now if module was removed and we're going to call ->netlink_rcv into nothing: BUG: unable to handle kernel paging request at ffffffffa02f842a ^^^^^^^^^^^^^^^^ modules are loaded near these addresses here IP: [<ffffffffa02f842a>] 0xffffffffa02f842a PGD 161f067 PUD 1623063 PMD baa12067 PTE 0 Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent CPU 1 Pid: 11515, comm: ip Not tainted 2.6.33-rc5-netns-00594-gaaa5728-dirty #6 P5E/P5E RIP: 0010:[<ffffffffa02f842a>] [<ffffffffa02f842a>] 0xffffffffa02f842a RSP: 0018:ffff8800baa3db48 EFLAGS: 00010292 RAX: ffff8800baa3dfd8 RBX: ffff8800be353640 RCX: 0000000000000000 RDX: ffffffff81959380 RSI: ffff8800bab7f130 RDI: 0000000000000001 RBP: ffff8800baa3db58 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000011 R13: ffff8800be353640 R14: ffff8800bcdec240 R15: ffff8800bd488010 FS: 00007f93749656f0(0000) GS:ffff880002300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffa02f842a CR3: 00000000ba82b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ip (pid: 11515, threadinfo ffff8800baa3c000, task ffff8800bab7eb30) Stack: ffffffff813637c0 ffff8800bd488000 ffff8800baa3dba8 ffffffff8136397d <0> 0000000000000000 ffffffff81344adc 7fffffffffffffff 0000000000000000 <0> ffff8800baa3ded8 ffff8800be353640 ffff8800bcdec240 0000000000000000 Call Trace: [<ffffffff813637c0>] ? netlink_unicast+0x100/0x2d0 [<ffffffff8136397d>] netlink_unicast+0x2bd/0x2d0 netlink_unicast_kernel: nlk->netlink_rcv(skb); [<ffffffff81344adc>] ? memcpy_fromiovec+0x6c/0x90 [<ffffffff81364263>] netlink_sendmsg+0x1d3/0x2d0 [<ffffffff8133975b>] sock_sendmsg+0xbb/0xf0 [<ffffffff8106cdeb>] ? __lock_acquire+0x27b/0xa60 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0 [<ffffffff8106db22>] ? __lock_release+0x82/0x170 [<ffffffff810a190e>] ? might_fault+0xbe/0xd0 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0 [<ffffffff81344c77>] ? verify_iovec+0x47/0xd0 [<ffffffff8133a509>] sys_sendmsg+0x1a9/0x360 [<ffffffff813c2be5>] ? _raw_spin_unlock_irqrestore+0x65/0x70 [<ffffffff8106aced>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff813c2bc2>] ? _raw_spin_unlock_irqrestore+0x42/0x70 [<ffffffff81197004>] ? __up_read+0x84/0xb0 [<ffffffff8106ac95>] ? trace_hardirqs_on_caller+0x145/0x190 [<ffffffff813c207f>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8100262b>] system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [<ffffffffa02f842a>] 0xffffffffa02f842a RSP <ffff8800baa3db48> CR2: ffffffffa02f842a If module was quickly removed after autoloading, return -E. Return -EPROTONOSUPPORT if module was quickly removed after autoloading. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-03af_key: fix netns ops ordering on module load/unloadAlexey Dobriyan
1. After sock_register() returns, it's possible to create sockets, even if module still not initialized fully (blame generic module code for that!) 2. Consequently, pfkey_create() can be called with pfkey_net_id still not initialized which will BUG_ON in net_generic(): kernel BUG at include/net/netns/generic.h:43! 3. During netns shutdown, netns ops should be unregistered after key manager unregistered because key manager calls can be triggered from xfrm_user module: general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC pfkey_broadcast+0x111/0x210 [af_key] pfkey_send_notify+0x16a/0x300 [af_key] km_state_notify+0x41/0x70 xfrm_flush_sa+0x75/0x90 [xfrm_user] 4. Unregister netns ops after socket ops just in case and for symmetry. Reported by Luca Tettamanti. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Tested-by: Luca Tettamanti <kronos.it@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>