summaryrefslogtreecommitdiffstats
path: root/include/linux
AgeCommit message (Collapse)Author
2006-03-20SUNRPC: Run rpci->queue_timeout on the rpciod workqueue instead of genericTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20lockd: Don't expose the process pid to the NLM serverTrond Myklebust
Instead we use the nlm_lockowner->pid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Fix a busy inodes issue...Trond Myklebust
The nfs_open_context may live longer than the file descriptor that spawned it, so it needs to carry a reference to the vfsmount. If not, then generic_shutdown_super() may end up being called before reads and writes have been flushed out. Make a couple of functions static while we're at it... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-19[TG3]: 40-bit DMA workaround part 2Michael Chan
The 40-bit DMA workaround recently implemented for 5714, 5715, and 5780 needs to be expanded because there may be other tg3 devices behind the EPB Express to PCIX bridge in the 5780 class device. For example, some 4-port card or mother board designs have 5704 behind the 5714. All devices behind the EPB require the 40-bit DMA workaround. Thanks to Chris Elmquist again for reporting the problem and testing the patch. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-11[PATCH] remove __put_task_struct_cb export againChristoph Hellwig
The patch '[PATCH] RCU signal handling' [1] added an export for __put_task_struct_cb, a put_task_struct helper newly introduced in that patch. But the put_task_struct couldn't be used modular previously as __put_task_struct wasn't exported. There are not callers of it in modular code, and it shouldn't be exported because we don't want drivers to hold references to task_structs. This patch removes the export and folds __put_task_struct into __put_task_struct_cb as there's no other caller. [1] http://www2.kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e56d090310d7625ecb43a1eeebd479f04affb48b Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-11[PATCH] ext3: ext3_symlink should use GFP_NOFS allocations insideKirill Korotaev
This patch fixes illegal __GFP_FS allocation inside ext3 transaction in ext3_symlink(). Such allocation may re-enter ext3 code from try_to_free_pages. But JBD/ext3 code keeps a pointer to current journal handle in task_struct and, hence, is not reentrable. This bug led to "Assertion failure in journal_dirty_metadata()" messages. http://bugzilla.openvz.org/show_bug.cgi?id=115 Signed-off-by: Andrey Savochkin <saw@saw.sw.com.sg> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages.Christoph Lameter
The cache reaper currently tries to free all alien caches and all remote per cpu pages in each pass of cache_reap. For a machines with large number of nodes (such as Altix) this may lead to sporadic delays of around ~10ms. Interrupts are disabled while reclaiming creating unacceptable delays. This patch changes that behavior by adding a per cpu reap_node variable. Instead of attempting to free all caches, we free only one alien cache and the per cpu pages from one remote node. That reduces the time spend in cache_reap. However, doing so will lengthen the time it takes to completely drain all remote per cpu pagesets and all alien caches. The time needed will grow with the number of nodes in the system. All caches are drained when they overflow their respective capacity. So the drawback here is only that a bit of memory may be wasted for awhile longer. Details: 1. Rename drain_remote_pages to drain_node_pages to allow the specification of the node to drain of pcp pages. 2. Add additional functions init_reap_node, next_reap_node for NUMA that manage a per cpu reap_node counter. 3. Add a reap_alien function that reaps only from the current reap_node. For us this seems to be a critical issue. Holdoffs of an average of ~7ms cause some HPC benchmarks to slow down significantly. F.e. NAS parallel slows down dramatically. NAS parallel has a 12-16 seconds runtime w/o rotor compared to 5.8 secs with the rotor patches. It gets down to 5.05 secs with the additional interrupt holdoff reductions. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09[PATCH] mtd: 64 bit fixesAtsushi Nemoto
Fix some bugs in mtd/jffs2 on 64bit platform. The MEMGETBADBLOCK/MEMSETBADBLOCK ioctl are not listed in compat_ioctl.h. And some variables in jffs2 are declared as uint32_t but used to hold size_t values. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] fix file countingDipankar Sarma
I have benchmarked this on an x86_64 NUMA system and see no significant performance difference on kernbench. Tested on both x86_64 and powerpc. The way we do file struct accounting is not very suitable for batched freeing. For scalability reasons, file accounting was constructor/destructor based. This meant that nr_files was decremented only when the object was removed from the slab cache. This is susceptible to slab fragmentation. With RCU based file structure, consequent batched freeing and a test program like Serge's, we just speed this up and end up with a very fragmented slab - llm22:~ # cat /proc/sys/fs/file-nr 587730 0 758844 At the same time, I see only a 2000+ objects in filp cache. The following patch I fixes this problem. This patch changes the file counting by removing the filp_count_lock. Instead we use a separate percpu counter, nr_files, for now and all accesses to it are through get_nr_files() api. In the sysctl handler for nr_files, we populate files_stat.nr_files before returning to user. Counting files as an when they are created and destroyed (as opposed to inside slab) allows us to correctly count open files with RCU. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] rcu batch tuningDipankar Sarma
This patch adds new tunables for RCU queue and finished batches. There are two types of controls - number of completed RCU updates invoked in a batch (blimit) and monitoring for high rate of incoming RCUs on a cpu (qhimark, qlowmark). By default, the per-cpu batch limit is set to a small value. If the input RCU rate exceeds the high watermark, we do two things - force quiescent state on all cpus and set the batch limit of the CPU to INTMAX. Setting batch limit to INTMAX forces all finished RCUs to be processed in one shot. If we have more than INTMAX RCUs queued up, then we have bigger problems anyway. Once the incoming queued RCUs fall below the low watermark, the batch limit is set to the default. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] percpu_counter_sum()Andrew Morton
Implement percpu_counter_sum(). This is a more accurate but slower version of percpu_counter_read_positive(). We need this for Alex's speedup-ext3_statfs patch and for the nr_file accounting fix. Otherwise these things would be too inaccurate on large CPU counts. Cc: Ravikiran G Thirumalai <kiran@scalex86.org> Cc: Alex Tomas <alex@clusterfs.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08Mark the pipe file operations staticLinus Torvalds
They aren't used (nor even really usable) outside of pipe.c anyway Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-06[PATCH] Increase max kmalloc size for very large systemsJack Steiner
Systems with extemely large numbers of nodes or cpus need to kmalloc structures larger than is currently supported. This patch increases the maximum supported size for very large systems. This patch should have no effect on current systems. (akpm: why not just use alloc_pages() for sysfs_cpus?) Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-06[PATCH] memory-hotplug compile fixKAMEZAWA Hiroyuki
include/linux/memory_hotplug.h:53: warning: 'struct page' declared inside parameter list (akpm: I tossed in a couple more possibly-needed-sometime struct decls too) Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-06[PATCH] fix next_timer_interrupt() for hrtimerTony Lindgren
Also from Thomas Gleixner <tglx@linutronix.de> Function next_timer_interrupt() got broken with a recent patch 6ba1b91213e81aa92b5cf7539f7d2a94ff54947c as sys_nanosleep() was moved to hrtimer. This broke things as next_timer_interrupt() did not check hrtimer tree for next event. Function next_timer_interrupt() is needed with dyntick (CONFIG_NO_IDLE_HZ, VST) implementations, as the system can be in idle when next hrtimer event was supposed to happen. At least ARM and S390 currently use next_timer_interrupt(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-06[PATCH] i4l: add new PCI IDs for HFC-S PCIKarsten Keil
Add new PCI IDs for HFC-S PCI based ISDN TA 'Primux II S0' and 'Primux II S0' from Gerdes AG Signed-off-by: Martin Bachem <info@colognechip.com> Signed-off-by: Karsten Keil <kkeil@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-02[PATCH] reiserfs: fix unaligned bitmap usageJeff Mahoney
The bitmaps associated with generation numbers for directory entries are declared as an array of ints. On some platforms, this causes alignment exceptions. The following patch uses the standard bitmap declaration macros to declare the bitmaps, fixing the problem. Originally from Takashi Iwai. Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-28Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
2006-02-28[PATCH] Add mm->task_size and fix powerpc vdsoBenjamin Herrenschmidt
This patch adds mm->task_size to keep track of the task size of a given mm and uses that to fix the powerpc vdso so that it uses the mm task size to decide what pages to fault in instead of the current thread flags (which broke when ptracing). (akpm: I expect that mm_struct.task_size will become the way in which we finally sort out the confusion between 32-bit processes and 32-bit mm's. It may need tweaks, but at this stage this patch is powerpc-only.) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-28[IA64] sysctl option to silence unaligned trap warningsJes Sorensen
Allow sysadmin to disable all warnings about userland apps making unaligned accesses by using: # echo 1 > /proc/sys/kernel/ignore-unaligned-usertrap Rather than having to use prctl on a process by process basis. Default behaivour leaves the warnings enabled. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-27[NETFILTER]: Restore {ipt,ip6t,ebt}_LOG compatibilityPatrick McHardy
The nfnetlink_log infrastructure changes broke compatiblity of the LOG targets. They currently use whatever log backend was registered first, which means that if ipt_ULOG was loaded first, no messages will be printed to the ring buffer anymore. Restore compatiblity by using the old log functions by default and only use the nf_log backend if the user explicitly said so. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-24Merge master.kernel.org:/home/rmk/linux-2.6-serialLinus Torvalds
2006-02-24[PATCH] flags parameter for linkatUlrich Drepper
I'm currently at the POSIX meeting and one thing covered was the incompatibility of Linux's link() with the POSIX definition. The name. Linux does not follow symlinks, POSIX requires it does. Even if somebody thinks this is a good default behavior we cannot change this because it would break the ABI. But the fact remains that some application might want this behavior. We have one chance to help implementing this without breaking the behavior. For this we could use the new linkat interface which would need a new flags parameter. If the new parameter is AT_SYMLINK_FOLLOW the new behavior could be invoked. I do not want to introduce such a patch now. But we could add the parameter now, just don't use it. The patch below would do this. Can we get this late patch applied before the release more or less fixes the syscall API? Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-23[SERIAL] Trivial comment fix: include/linux/serial_reg.hMichal Janusz Miroslaw
Trivial comment fix for include/linux/serial_reg.h Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-02-22Merge master.kernel.org:/home/rmk/linux-2.6-mmcLinus Torvalds
2006-02-22Revert mount/umount uevent removalGreg Kroah-Hartman
This change reverts the 033b96fd30db52a710d97b06f87d16fc59fee0f1 commit from Kay Sievers that removed the mount/umount uevents from the kernel. Some older versions of HAL still depend on these events to detect when a new device has been mounted. These events are not correctly emitted, and are broken by design, and so, should not be relied upon by any future program. Instead, the /proc/mounts file should be polled to properly detect this kind of event. A feature-removal-schedule.txt entry has been added, noting when this interface will be removed from the kernel. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-02-22[MMC] Fix mmc_cmd_type() maskRussell King
It's MMC_CMD_MASK not MMC_CMD_TYPE. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-02-21Merge branch 'upstream-fixes' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
2006-02-20[PATCH] Fix compile for CONFIG_SYSVIPC=n or CONFIG_SYSCTL=nStephen Rothwell
The compat syscalls are added to sys_ni.c since they are not defined if the above CONFIG options are off. Also, nfs would not build with CONFIG_SYSCTL off. Noticed by Arthur Othieno. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20[PATCH] Fix undefined symbols for nommu architectureLuke Yang
Signed-off-by: Luke Yang <luke.adi@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20[PATCH] suspend-to-ram: allow video options to be set at runtimePavel Machek
Currently, acpi video options can only be set on kernel command line. That's little inflexible; I'd like userland s2ram application that just works, and modifying kernel command line according to whitelist is not fun. It is better to just allow s2ram application to set video options just before suspend (according to the whitelist). This implements sysctl to allow setting suspend video options without reboot. (akpm: Documentation updates for this new sysctl are pending..) Signed-off-by: Pavel Machek <pavel@suse.cz> Cc: "Brown, Len" <len.brown@intel.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20[PATCH] Terminate process that fails on a constrained allocationChristoph Lameter
Some allocations are restricted to a limited set of nodes (due to memory policies or cpuset constraints). If the page allocator is not able to find enough memory then that does not mean that overall system memory is low. In particular going postal and more or less randomly shooting at processes is not likely going to help the situation but may just lead to suicide (the whole system coming down). It is better to signal to the process that no memory exists given the constraints that the process (or the configuration of the process) has placed on the allocation behavior. The process may be killed but then the sysadmin or developer can investigate the situation. The solution is similar to what we do when running out of hugepages. This patch adds a check before we kill processes. At that point performance considerations do not matter much so we just scan the zonelist and reconstruct a list of nodes. If the list of nodes does not contain all online nodes then this is a constrained allocation and we should kill the current process. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20[PATCH] libata: fix qc->n_elem == 0 case handling in ata_qc_next_sgTejun Heo
This patch makes ata_for_each_sg() start with pad_sgent when qc->n_elem is zero. Previously, ata_for_each_sg() unconditionally started with qc->__sg, handling the first sg to fill_sg() routines even when the entry was invalid. And while at it, unwind ?: in ata_qc_next_sg() into if statement. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-02-17[PATCH] Provide an interface for getting the current tick lengthPaul Mackerras
This provides an interface for arch code to find out how many nanoseconds are going to be added on to xtime by the next call to do_timer. The value returned is a fixed-point number in 52.12 format in nanoseconds. The reason for this format is that it gives the full precision that the timekeeping code is using internally. The motivation for this is to fix a problem that has arisen on 32-bit powerpc in that the value returned by do_gettimeofday drifts apart from xtime if NTP is being used. PowerPC is now using a lockless do_gettimeofday based on reading the timebase register and performing some simple arithmetic. (This method of getting the time is also exported to userspace via the VDSO.) However, the factor and offset it uses were calculated based on the nominal tick length and weren't being adjusted when NTP varied the tick length. Note that 64-bit powerpc has had the lockless do_gettimeofday for a long time now. It also had an extremely hairy routine that got called from the 32-bit compat routine for adjtimex, which adjusted the factor and offset according to what it thought the timekeeping code was going to do. Not only was this only called if a 32-bit task did adjtimex (i.e. not if a 64-bit task did adjtimex), it was also duplicating computations from kernel/timer.c and it wasn't clear that it was (still) correct. The simple solution is to ask the timekeeping code how long the current jiffy will be on each timer interrupt, after calling do_timer. If this jiffy will be a different length from the last one, we then need to compute new values for the factor and offset used in the lockless do_gettimeofday. In this way we can keep xtime and do_gettimeofday in sync, even when NTP is varying the tick length. Note that when adjtimex varies the tick length, it almost always introduces the variation from the next tick on. The only case I could see where adjtimex would vary the length of the current tick is when an old-style adjtime adjustment is being cancelled. (It's not clear to me why the adjustment has to be cancelled immediately rather than from the next tick on.) Thus I don't see any real need for a hook in adjtimex; the rare case of an old-style adjustment being cancelled can be fixed up at the next tick. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: john stultz <johnstul@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17[PATCH] x86_64: Add boot option to disable randomized mappings and cleanupAndi Kleen
AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution installation it's easiest if it's a boot time option. Also I moved the variable to a more appropiate place and make it independent from sysctl And marked __read_mostly which it is. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-15Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
2006-02-15[PATCH] hrtimer: fix multiple macro argument expansionRoman Zippel
For two macros the arguments were expanded twice, change them to inline functions to avoid it. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-15[NETFILTER]: Don't invoke okfn in CONFIG_NETFILTER=n variant of nf_hook()Patrick McHardy
nf_hook() is supposed to call the netfilter hook and return control of the packet back to the caller in case it may pass, the okfn is only used for queueing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15[XFRM]: Fix SNAT-related crash in xfrm4_output_finishPatrick McHardy
When a packet matching an IPsec policy is SNATed so it doesn't match any policy anymore it looses its xfrm bundle, which makes xfrm4_output_finish crash because of a NULL pointer dereference. This patch directs these packets to the original output path instead. Since the packets have already passed the POST_ROUTING hook, but need to start at the beginning of the original output path which includes another POST_ROUTING invocation, a flag is added to the IPCB to indicate that the packet was rerouted and doesn't need to pass the POST_ROUTING hook again. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15[PATCH] fix zap_thread's ptrace related problemsOleg Nesterov
1. The tracee can go from ptrace_stop() to do_signal_stop() after __ptrace_unlink(p). 2. It is unsafe to __ptrace_unlink(p) while p->parent may wait for tasklist_lock in ptrace_detach(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Hellwig <hch@lst.de> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-15[NETFILTER]: Fix xfrm lookup after SNATPatrick McHardy
To find out if a packet needs to be handled by IPsec after SNAT, packets are currently rerouted in POST_ROUTING and a new xfrm lookup is done. This breaks SNAT of non-unicast packets to non-local addresses because the packet is routed as incoming packet and no neighbour entry is bound to the dst_entry. In general, it seems to be a bad idea to replace the dst_entry after the packet was already sent to the output routine because its state might not match what's expected. This patch changes the xfrm lookup in POST_ROUTING to re-use the original dst_entry without routing the packet again. This means no policy routing can be used for transport mode transforms (which keep the original route) when packets are SNATed to match the policy, but it looks like the best we can do for now. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-14[PATCH] sched: revert "filter affine wakeups"Chen, Kenneth W
Revert commit d7102e95b7b9c00277562c29aad421d2d521c5f6: [PATCH] sched: filter affine wakeups Apparently caused more than 10% performance regression for aim7 benchmark. The setup in use is 16-cpu HP rx8620, 64Gb of memory and 12 MSA1000s with 144 disks. Each disk is 72Gb with a single ext3 filesystem (courtesy of HP, who supplied benchmark results). The problem is, for aim7, the wake-up pattern is random, but it still needs load balancing action in the wake-up path to achieve best performance. With the above commit, lack of load balancing hurts that workload. However, for workloads like database transaction processing, the requirement is exactly opposite. In the wake up path, best performance is achieved with absolutely zero load balancing. We simply wake up the process on the CPU that it was previously run. Worst performance is obtained when we do load balancing at wake up. There isn't an easy way to auto detect the workload characteristics. Ingo's earlier patch that detects idle CPU and decide whether to load balance or not doesn't perform with aim7 either since all CPUs are busy (it causes even bigger perf. regression). Revert commit d7102e95b7b9c00277562c29aad421d2d521c5f6, which causes more than 10% performance regression with aim7. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14[PATCH] NLM: Fix the NLM_GRANTED callback checksTrond Myklebust
If 2 threads attached to the same process are blocking on different locks on different files (maybe even on different servers) but have the same lock arguments (i.e. same offset+length - actually quite common, since most processes try to lock the entire file) then the first GRANTED call that wakes one up will also wake the other. Currently when the NLM_GRANTED callback comes in, lockd walks the list of blocked locks in search of a match to the lock that the NLM server has granted. Although it checks the lock pid, start and end, it fails to check the filehandle and the server address. By checking the filehandle and server IP address, we ensure that this only happens if the locks truly are referencing the same file. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14[PATCH] jbd: revert checkpoint list changesMark Fasheh
This patch reverts commit f93ea411b73594f7d144855fd34278bcf34a9afc: [PATCH] jbd: split checkpoint lists This broke journal_flush() for OCFS2, which is its method of being sure that metadata is sent to disk for another node. And two related commits 8d3c7fce2d20ecc3264c8d8c91ae3beacdeaed1b and 43c3e6f5abdf6acac9b90c86bf03f995bf7d3d92 with the subjects: [PATCH] jbd: log_do_checkpoint fix [PATCH] jbd: remove_transaction fix These seem to be incremental bugfixes on the original patch and as such are no longer needed. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-11[PATCH] nvidiafb: Add support for Geforce4 MX 4000Antonino A. Daplas
Add support for Geforce4 MX 4000 (0x185) Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-11[PATCH] select: fix returned timevalAndrew Morton
With David Woodhouse <dwmw2@infradead.org> select() presently has a habit of increasing the value of the user's `timeout' argument on return. We were writing back a timeout larger than the original. We _deliberately_ round up, since we know we must wait at _least_ as long as the caller asks us to. The patch adds a couple of helper functions for magnitude comparison of timespecs and of timevals, and uses them to prevent the various poll and select functions from returning a timeout which is larger than the one which was passed in. The patch also fixes a bug in compat_sys_pselect7(): it was adding the new timeout value to the old one and was returning that. It should just return the new timeout value. (We have various handy timespec/timeval-to-from-nsec conversion functions in time.h. But this code open-codes it all). Cc: "David S. Miller" <davem@davemloft.net> Cc: Andi Kleen <ak@muc.de> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: george anzinger <george@mvista.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-11[PATCH] fstatat64 supportUlrich Drepper
The *at patches introduced fstatat and, due to inusfficient research, I used the newfstat functions generally as the guideline. The result is that on 32-bit platforms we don't have all the information needed to implement fstatat64. This patch modifies the code to pass up 64-bit information if __ARCH_WANT_STAT64 is defined. I renamed the syscall entry point to make this clear. Other archs will continue to use the existing code. On x86-64 the compat code is implemented using a new sys32_ function. this is what is done for the other stat syscalls as well. This patch might break some other archs (those which define __ARCH_WANT_STAT64 and which already wired up the syscall). Yet others might need changes to accomodate the compatibility mode. I really don't want to do that work because all this stat handling is a mess (more so in glibc, but the kernel is also affected). It should be done by the arch maintainers. I'll provide some stand-alone test shortly. Those who are eager could compile glibc and run 'make check' (no installation needed). The patch below has been tested on x86 and x86-64. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-10[PATCH] tty buffering stall fixPaul Fulghum
Prevent stalled processing of received data when a driver allocates tty buffer space but does not immediately follow the allocation with more data and a call to schedule receive tty processing. (example: hvc_console) This bug was introduced by the first locking patch for the new tty buffering. Signed-off-by: Paul Fulghum <paulkf@microgate.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-10[PATCH] x86: don't initialise cpu_possible_map to all onesAndrew Morton
Initialising cpu_possible_map to all-ones with CONFIG_HOTPLUG_CPU means that a) All for_each_cpu() loops will iterate across all NR_CPUS CPUs, rather than over possible ones. That can be quite expensive. b) Soon we'll be allocating per-cpu areas only for possible CPUs. So with CPU_MASK_ALL, we'll be wasting memory. I also switched voyager over to not use CPU_MASK_ALL in the non-CPU-hotplug case. Should be OK.. I note that parisc is also using CPU_MASK_ALL. Suggest that it stop doing that. Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Paul Jackson <pj@sgi.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Zwane Mwaikambo <zwane@linuxpower.ca> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-10[PATCH] kexec: fix in free initrd when overlapped with crashkernel regionHaren Myneni
It is possible that the reserved crashkernel region can be overlapped with initrd since the bootloader sets the initrd location. When the initrd region is freed, the second kernel memory will not be contiguous. The Kexec_load can cause an oops since there is no contiguous memory to write the second kernel or this memory could be used in the first kernel itself and may not be part of the dump. For example, on powerpc, the initrd is located at 36MB and the crashkernel starts at 32MB. The kexec_load caused panic since writing into non-allocated memory (after 36MB). We could see the similar issue even on other archs. One possibility is to move the initrd outside of crashkernel region. But, the initrd region will be freed anyway before the system is up. This patch fixes this issue and frees only regions that are not part of crashkernel memory in case overlaps. Signed-off-by: Haren Myneni <haren@us.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>