asmadeus/linux.git - The linux kernel

Age	Commit message (Collapse)	Author
2005-08-29	[NETFILTER]: reduce netfilter sk_buff enlargement	Harald Welte
	As discussed at netconf'05, we're trying to save every bit in sk_buff. The patch below makes sk_buff 8 bytes smaller. I did some basic testing on my notebook and it seems to work. The only real in-tree user of nfcache was IPVS, who only needs a single bit. Unfortunately I couldn't find some other free bit in sk_buff to stuff that bit into, so I introduced a separate field for them. Maybe the IPVS guys can resolve that to further save space. Initially I wanted to shrink pkt_type to three bits (PACKET_HOST and alike are only 6 values defined), but unfortunately the bluetooth code overloads pkt_type :( The conntrack-event-api (out-of-tree) uses nfcache, but Rusty just came up with a way how to do it without any skb fields, so it's safe to remove it. - remove all never-implemented 'nfcache' code - don't have ipvs code abuse 'nfcache' field. currently get's their own compile-conditional skb->ipvs_property field. IPVS maintainers can decide to move this bit elswhere, but nfcache needs to die. - remove skb->nfcache field to save 4 bytes - move skb->nfctinfo into three unused bits to save further 4 bytes Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29	[NETFILTER]: convert nfmark and conntrack mark to 32bit	Harald Welte
	As discussed at netconf'05, we convert nfmark and conntrack-mark to be 32bits even on 64bit architectures. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[FIB_TRIE]: Don't ignore negative results from fib_semantic_match	Patrick McHardy
	When a semantic match occurs either success, not found or an error (for matching unreachable routes/blackholes) is returned. fib_trie ignores the errors and looks for a different matching route. Treat results other than "no match" as success and end lookup. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[ROSE]: Fix typo in rose_route_frame() locking fix.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[ROSE]: Fix missing unlocks in rose_route_frame()	David S. Miller
	Noticed by Coverity checker. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[TCP]: Document non-trivial locking path in tcp_v{4,6}_get_port().	David S. Miller
	This trips up a lot of folks reading this code. Put an unlikely() around the port-exhaustion test for good measure. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[TCP]: Unconditionally clear TCP_NAGLE_PUSH in skb_entail().	David S. Miller
	Intention of this bit is to force pushing of the existing send queue when TCP_CORK or TCP_NODELAY state changes via setsockopt(). But it's easy to create a situation where the bit never clears. For example, if the send queue starts empty: 1) set TCP_NODELAY 2) clear TCP_NODELAY 3) set TCP_CORK 4) do small write() The current code will leave TCP_NAGLE_PUSH set after that sequence. Unconditionally clearing the bit when new data is added via skb_entail() solves the problem. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()	Thomas Graf
	qdisc_create_dflt() is missing to destroy the newly allocated default qdisc if the initialization fails resulting in leaks of all kinds. The only caller in mainline which may trigger this bug is sch_tbf.c in tbf_create_dflt_qdisc(). Note: qdisc_create_dflt() doesn't fulfill the official locking requirements of qdisc_destroy() but since the qdisc could never be seen by the outside world this doesn't matter and it can stay as-is until the locking of pkt_sched is cleaned up. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[SCTP]: Add SENTINEL to SCTP MIB stats	Vlad Yasevich
	Add SNMP_MIB_SENTINEL to the definition of the sctp_snmp_list so that the output routine in proc correctly terminates. This was causing some problems running on ia64 systems. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[AX25]: UID fixes	Ralf Baechle
	o Brown paperbag bug - ax25_findbyuid() was always returning a NULL pointer as the result. Breaks ROSE completly and AX.25 if UID policy set to deny. o While the list structure of AX.25's UID to callsign mapping table was properly protected by a spinlock, it's elements were not refcounted resulting in a race between removal and usage of an element. Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[NET]: Fix socket bitop damage	Ralf Baechle
	The socket flag cleanups that went into 2.6.12-rc1 are basically oring the flags of an old socket into the socket just being created. Unfortunately that one was just initialized by sock_init_data(), so already has SOCK_ZAPPED set. As the result zapped sockets are created and all incoming connection will fail due to this bug which again was carefully replicated to at least AX.25, NET/ROM or ROSE. In order to keep the abstraction alive I've introduced sock_copy_flags() to copy the socket flags from one sockets to another and used that instead of the bitwise copy thing. Anyway, the idea here has probably been to copy all flags, so sock_copy_flags() should be the right thing. With this the ham radio protocols are usable again, so I hope this will make it into 2.6.13. Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[NETFILTER]: Fix HW checksum handling in ip_queue/ip6_queue	Patrick McHardy
	The checksum needs to be filled in on output, after mangling a packet ip_summed needs to be reset. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[IPV4]: Fix negative timer loop with lots of ipv4 peers.	Dave Johnson
	From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> Found this bug while doing some scaling testing that created 500K inet peers. peer_check_expire() in net/ipv4/inetpeer.c isn't using inet_peer_gc_mintime correctly and will end up creating an expire timer with less than the minimum duration, and even zero/negative if enough active peers are present. If >65K peers, the timer will be less than inet_peer_gc_mintime, and with >70K peers, the timer duration will reach zero and go negative. The timer handler will continue to schedule another zero/negative timer in a loop until peers can be aged. This can continue for at least a few minutes or even longer if the peers remain active due to arriving packets while the loop is occurring. Bug is present in both 2.4 and 2.6. Same patch will apply to both just fine. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[RPC]: Kill bogus kmap in krb5	Herbert Xu
	While I was going through the crypto users recently, I noticed this bogus kmap in sunrpc. It's totally unnecessary since the crypto layer will do its own kmap before touching the data. Besides, the kmap is throwing the return value away. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23	[TCP]: Do TSO deferral even if tail SKB can go out now.	Dmitry Yusupov
	If the tail SKB fits into the window, it is still benefitical to defer until the goal percentage of the window is available. This give the application time to feed more data into the send queue and thus results in larger TSO frames going out. Patch from Dmitry Yusupov <dima@neterion.com>. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20	[NETFILTER]: Fix HW checksum handling in TCPMSS target	Patrick McHardy
	Most importantly, remove bogus BUG() in receive path. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20	[NETFILTER]: Fix HW checksum handling in ECN target	Patrick McHardy
	Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20	[NETFILTER]: Fix ECN target TCP marking	Patrick McHardy
	An incorrect check made it bail out before doing anything. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-18	[IPCOMP]: Fix false smp_processor_id warning	Herbert Xu
	This patch fixes a false-positive from debug_smp_processor_id(). The processor ID is only used to look up crypto_tfm objects. Any processor ID is acceptable here as long as it is one that is iterated on by for_each_cpu(). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-18	[IPV4]: Fix DST leak in icmp_push_reply()	Patrick McHardy
	Based upon a bug report and initial patch by Ollie Wild. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-18	[TOKENRING]: Use interrupt-safe locking with rif_lock.	Jay Vosburgh
	Change operations on rif_lock from spin_{un}lock_bh to spin_{un}lock_irq{save,restore} equivalents. Some of the rif_lock critical sections are called from interrupt context via tr_type_trans->tr_add_rif_info. The TR NIC drivers call tr_type_trans from their packet receive handlers. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17	[DECNET]: Fix RCU race condition in dn_neigh_construct().	Paul E. McKenney
	Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17	[IPV6]: Fix SKB leak in ip6_input_finish()	Patrick McHardy
	Changing it to how ip_input handles should fix it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17	[TCP]: Fix bug #5070: kernel BUG at net/ipv4/tcp_output.c:864	Herbert Xu
	1) We send out a normal sized packet with TSO on to start off. 2) ICMP is received indicating a smaller MTU. 3) We send the current sk_send_head which needs to be fragmented since it was created before the ICMP event. The first fragment is then sent out. At this point the remaining fragment is allocated by tcp_fragment. However, its size is padded to fit the L1 cache-line size therefore creating tail-room up to 124 bytes long. This fragment will also be sitting at sk_send_head. 4) tcp_sendmsg is called again and it stores data in the tail-room of of the fragment. 5) tcp_push_one is called by tcp_sendmsg which then calls tso_fragment since the packet as a whole exceeds the MTU. At this point we have a packet that has data in the head area being fed to tso_fragment which bombs out. My take on this is that we shouldn't ever call tcp_fragment on a TSO socket for a packet that is yet to be transmitted since this creates a packet on sk_send_head that cannot be extended. So here is a patch to change it so that tso_fragment is always used in this case. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17	[IPV6]: Fix raw socket hardware checksum failures	Patrick McHardy
	When packets hit raw sockets the csum update isn't done yet, do it manually. Packets can also reach rawv6_rcv on the output path through ip6_call_ra_chain, in this case skb->ip_summed is CHECKSUM_NONE and this codepath isn't executed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-16	[PATCH] NFS: Ensure ACL xdr code doesn't overflow.	Trond Myklebust
	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-11	[NETPOLL]: remove unused variable	Matt Mackall
	Remove unused variable Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11	[NETPOLL]: fix initialization/NAPI race	Matt Mackall
	This fixes a race during initialization with the NAPI softirq processing by using an RCU approach. This race was discovered when refill_skbs() was added to the setup code. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11	[NETPOLL]: pre-fill skb pool	Ingo Molnar
	we could do one thing (see the patch below): i think it would be useful to fill up the netlogging skb queue straight at initialization time. Especially if netpoll is used for dumping alone, the system might not be in a situation to fill up the queue at the point of crash, so better be a bit more prepared and keep the pipeline filled. [ I've modified this to be called earlier - mpm ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11	[NETPOLL]: add retry timeout	Matt Mackall
	Add limited retry logic to netpoll_send_skb Each time we attempt to send, decrement our per-device retry counter. On every successful send, we reset the counter. We delay 50us between attempts with up to 20000 retries for a total of 1 second. After we've exhausted our retries, subsequent failed attempts will try only once until reset by success. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11	[NETPOLL]: netpoll_send_skb simplify	Matt Mackall
	Minor netpoll_send_skb restructuring Restructure to avoid confusing goto and move some bits out of the retry loop. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11	[NETPOLL]: deadlock bugfix	Jeff Moyer
	This fixes an obvious deadlock in the netpoll code. netpoll_rx takes the npinfo->rx_lock. netpoll_rx is also the only caller of arp_reply (through __netpoll_rx). As such, it is not necessary to take this lock. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11	[NETPOLL]: rx_flags bugfix	Jeff Moyer
	Initialize npinfo->rx_flags. The way it stands now, this will have random garbage, and so will incur a locking penalty even when an rx_hook isn't registered and we are not active in the netpoll polling code. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-10	[TCP]: Adjust {p,f}ackets_out correctly in tcp_retransmit_skb()	Herbert Xu
	Well I've only found one potential cause for the assertion failure in tcp_mark_head_lost. First of all, this can only occur if cnt > 1 since tp->packets_out is never zero here. If it did hit zero we'd have much bigger problems. So cnt is equal to fackets_out - reordering. Normally fackets_out is less than packets_out. The only reason I've found that might cause fackets_out to exceed packets_out is if tcp_fragment is called from tcp_retransmit_skb with a TSO skb and the current MSS is greater than the MSS stored in the TSO skb. This might occur as the result of an expiring dst entry. In that case, packets_out may decrease (line 1380-1381 in tcp_output.c). However, fackets_out is unchanged which means that it may in fact exceed packets_out. Previously tcp_retrans_try_collapse was the only place where packets_out can go down and it takes care of this by decrementing fackets_out. So we should make sure that fackets_out is reduced by an appropriate amount here as well. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-10	[DECNET]: Use sk_stream_error function rather than DECnet's own	Steven Whitehouse
	Signed-off-by: Steven Whitehouse <steve@chygwyn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-09	[NET]: Fix memory leak in sys_{send,recv}msg() w/compat	Andrew Morton
	From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> sendmsg()/recvmsg() syscalls from o32/n32 apps to a 64bit kernel will cause a kernel memory leak if iov_len > UIO_FASTIOV for each syscall! This is because both sys_sendmsg() and verify_compat_iovec() kmalloc a new iovec structure. Only the one from sys_sendmsg() is free'ed. I wrote a simple test program to confirm this after identifying the problem: http://davej.org/programs/testsendmsg.c Note that the below fix will break solaris_sendmsg()/solaris_recvmsg() as it also calls verify_compat_iovec() but expects it to malloc internally. [ I fixed that. -DaveM ] Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-09	[SUNRPC]: Fix nsec --> usec conversion.	David S. Miller
	We need to divide, not multiply. While we're here, use NSEC_PER_USEC instead of a magic constant. Based upon a report from Josip Loncaric and a patch by Andrew Morton. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-08	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds

2005-08-08	[IPV4]: Debug cleanup	Heikki Orsila
	Here's a small patch to cleanup NETDEBUG() use in net/ipv4/ for Linux kernel 2.6.13-rc5. Also weird use of indentation is changed in some places. Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-08	[PATCH] don't try to do any NAT on untracked connections	Harald Welte
	With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no longer sufficient. The ip_conntrack_untracked.status \|= IPS_NAT_DONE_MASK effectively prevents iteration of the 'nat' table, but doesn't prevent nat_packet() to be executed. Since nr_manips is gone in 'rustynat', nat_packet() now implicitly thinks that it has to do NAT on the packet. This patch fixes that problem by explicitly checking for ip_conntrack_untracked in ip_nat_fn(). Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-06	[IPSEC]: Restrict socket policy loading to CAP_NET_ADMIN.	Herbert Xu
	The interface needs much redesigning if we wish to allow normal users to do this in some way. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-06	[Bluetooth] Add direction and timestamp to stack internal events	Marcel Holtmann
	This patch changes the direction to incoming and adds the timestamp to all stack internal events. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-06	[Bluetooth] Remove unused functions and cleanup symbol exports	Marcel Holtmann
	This patch removes the unused bt_dump() function and it also removes its BT_DMP macro. It also unexports the hci_dev_get(), hci_send_cmd() and hci_si_event() functions. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-06	[Bluetooth] Revert session reference counting fix	Marcel Holtmann
	The fix for the reference counting problem of the signal DLC introduced a race condition which leads to an oops. The reason for it is not fully understood by now and so revert this fix, because the reference counting problem is not crashing the RFCOMM layer and its appearance it rare. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-05	[IPV4]: Fix memory leak during fib_info hash expansion.	David S. Miller
	When we grow the tables, we forget to free the olds ones up. Noticed by Yan Zheng. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-04	[PATCH] tcp: fix TSO cwnd caching bug	Herbert Xu
	tcp_write_xmit caches the cwnd value indirectly in cwnd_quota. When tcp_transmit_skb reduces the cwnd because of tcp_enter_cwr, the cached value becomes invalid. This patch ensures that the cwnd value is always reread after each tcp_transmit_skb call. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-04	[PATCH] tcp: fix TSO sizing bugs	David S. Miller
	MSS changes can be lost since we preemptively initialize the tso_segs count for an SKB before we %100 commit to sending it out. So, by the time we send it out, the tso_size information can be stale due to PMTU events. This mucks up all of the logic in our send engine, and can even result in the BUG() triggering in tcp_tso_should_defer(). Another problem we have is that we're storing the tp->mss_cache, not the SACK block normalized MSS, as the tso_size. That's wrong too. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-30	[NET] Fix too aggressive backoff in dst garbage collection	Denis Lunev
	The bug is evident when it is seen once. dst gc timer was backed off, when gc queue is not empty. But this means that timer quickly backs off, if at least one destination remains in use. Normally, the bug is invisible, because adding new dst entry to queue cancels the backoff. But it shots deadly with destination cache overflow when new destinations are not released for long time f.e. after an interface goes down. The fix is to cancel backoff when something was released. Signed-off-by: Denis Lunev <den@sw.ru> Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30	[NET]: fix oops after tunnel module unload	Alexey Kuznetsov
	Tunnel modules used to obtain module refcount each time when some tunnel was created, which meaned that tunnel could be unloaded only after all the tunnels are deleted. Since killing old MOD_*_USE_COUNT macros this protection has gone. It is possible to return it back as module_get/put, but it looks more natural and practically useful to force destruction of all the child tunnels on module unload. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30	[NETFILTER] Inherit masq_index to slave connections	Harald Welte
	masq_index is used for cleanup in case the interface address changes (such as a dialup ppp link with dynamic addreses). Without this patch, slave connections are not evicted in such a case, since they don't inherit masq_index. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>