path: root/kernel
Age    Commit message    Author
2007-12-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-schedLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: futex: correctly return -EFAULT not -EINVAL lockdep: in_range() fix lockdep: fix debug_show_all_locks() sched: style cleanups futex: fix for futex_wait signal stack corruption
2007-12-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Update defconfig. [SPARC]: Add missing of_node_put [SPARC64]: check for possible NULL pointer dereference [SPARC]: Add missing "space" [SPARC64]: Add missing "space" [SPARC64]: Add missing pci_dev_put [SYSCTL_CHECK]: Fix typo in KERN_SPARC_SCONS_PWROFF entry string. [SPARC64]: Missing mdesc_release() in ldc_init().
2007-12-05Avoid potential NULL dereference in unregister_sysctl_tablePavel Emelyanov
register_sysctl_table() can sometimes return NULL, e.g. when kmalloc() returns NULL or when the sysctl check fails. I've also noticed that much (most?) of the code in the kernel doesn't check the return value of register_sysctl_table() and later simply calls unregister_sysctl_table() with a potentially NULL argument. This is unlikely on a common kernel configuration, but when we're dealing with modules and/or fault-injection support, there's a slight possibility of an OOPS.

Changing all the users to check the return code from registration does not look like a good solution: too much code does this, and a failure to register sysctl tables is not a good reason to abort module loading (in most cases).

So I think we can just have this check in unregister_sysctl_table() to avoid accidental OOPSes (actually, unregister_sysctl_table() did exactly this before start_unregistering() appeared).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
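The defensive check described above can be illustrated with a minimal userspace sketch. This is not the kernel's code: `struct table_header`, `register_table()` and `unregister_table()` are hypothetical stand-ins for the sysctl header type and its register/unregister functions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's sysctl table header. */
struct table_header {
    int registered;
};

/* Hypothetical registration helper: returns NULL on failure, e.g. when
 * allocation fails or a sanity check rejects the table. */
struct table_header *register_table(int simulate_failure)
{
    struct table_header *h;

    if (simulate_failure)
        return NULL;
    h = malloc(sizeof(*h));
    if (h)
        h->registered = 1;
    return h;
}

/* Tolerate NULL so that callers which ignored a failed registration do
 * not dereference a NULL pointer on the teardown path.
 * Returns 1 if a table was unregistered, 0 if there was nothing to do. */
int unregister_table(struct table_header *h)
{
    if (h == NULL)
        return 0;
    h->registered = 0;
    free(h);
    return 1;
}
```

With the check in place, a caller that never tested the registration result can still call `unregister_table()` unconditionally during cleanup.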
2007-12-05fix clone(CLONE_NEWPID)Eric W. Biederman
Currently we are complicating the code in copy_process, the clone ABI, and (if we fix the bugs) sys_setsid itself, with an unnecessary open-coded version of sys_setsid. So just simplify everything and don't special-case the session and pgrp of the initial process in a pid namespace. Having this special case actually presents to user space the classic linux startup conditions with session == pgrp == 0 for /sbin/init. We already handle sending signals to processes in a child pid namespace. We need to handle sending signals to processes in a parent pid namespace for cases like SIGCHLD and SIGIO. This makes nothing extra visible inside a pid namespace. So this extra special case appears to have no redeeming merits. Further, removing this special case increases the flexibility of how we can use pid namespaces, by not requiring the initial process in a pid namespace to be a daemon. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05futex: correctly return -EFAULT not -EINVALThomas Gleixner
return -EFAULT not -EINVAL. Found by review. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-12-05lockdep: in_range() fixOleg Nesterov
Torsten Kaiser wrote:

| static inline int in_range(const void *start, const void *addr, const void *end)
| {
|         return addr >= start && addr <= end;
| }
| This will return true, if addr is in the range of start (including)
| to end (including).
|
| But debug_check_no_locks_freed() seems does:
| const void *mem_to = mem_from + mem_len
| -> mem_to is the last byte of the freed range, that fits in_range
| lock_from = (void *)hlock->instance;
| -> first byte of the lock
| lock_to = (void *)(hlock->instance + 1);
| -> first byte of the next lock, not last byte of the lock that is being checked!
|
| The test is:
| if (!in_range(mem_from, lock_from, mem_to) &&
|                         !in_range(mem_from, lock_to, mem_to))
|         continue;
| So it tests, if the first byte of the lock is in the range that is freed ->OK
| And if the first byte of the *next* lock is in the range that is freed
| -> Not OK.

We can also simplify in_range checks, we need only 2 comparisons, not 4. If the lock is not in memory range, it should be either at the left of range or at the right.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
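The two-comparison idea ("either at the left of range or at the right") can be sketched as below. This is an illustration, not the patch itself: the function name `lock_overlaps_range()` and the half-open `[from, to)` convention are assumptions made for the example.

```c
#include <stddef.h>

/* Returns 1 if the half-open ranges [lock_from, lock_from + lock_len)
 * and [mem_from, mem_from + mem_len) share at least one byte, else 0.
 * The lock lies outside the freed range only if it ends at or before
 * the range starts (left of it) or starts at or after the range ends
 * (right of it), so two comparisons are enough. */
int lock_overlaps_range(const char *mem_from, size_t mem_len,
                        const char *lock_from, size_t lock_len)
{
    const char *mem_to = mem_from + mem_len;
    const char *lock_to = lock_from + lock_len;

    if (lock_to <= mem_from || lock_from >= mem_to)
        return 0;
    return 1;
}
```

Unlike a test that only asks whether either endpoint of the lock falls inside the freed range, this form also reports an overlap when the lock fully encloses the range.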
2007-12-05lockdep: fix debug_show_all_locks()Ingo Molnar
fix the oops that can be seen in: http://bugzilla.kernel.org/attachment.cgi?id=13828&action=view it is not safe to print the locks of running tasks. (even with this fix we have a small race - but this is a debug function after all.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-12-05sched: style cleanupsIngo Molnar
style cleanup of various changes that were done recently. no code changed:

   text    data     bss     dec     hex filename
  23680    2542      28   26250    668a sched.o.before
  23680    2542      28   26250    668a sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-12-05futex: fix for futex_wait signal stack corruptionSteven Rostedt
David Holmes found a bug in the -rt tree with respect to pthread_cond_timedwait. After trying his test program on the latest git from mainline, I found the bug was there too. The bug his test program showed was that if one were to do a "Ctrl-Z" on a process that was in pthread_cond_timedwait, and then did a "bg" on that process, it would return with "-ETIMEDOUT", but early. That is, the timer would go off early.

Looking into this, I found the source of the problem. And it is a rather nasty bug at that.

Here's the relevant code from kernel/futex.c: (not in order in the file)

    [...]
    asmlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
                              struct timespec __user *utime, u32 __user *uaddr2,
                              u32 val3)
    {
            struct timespec ts;
            ktime_t t, *tp = NULL;
            u32 val2 = 0;
            int cmd = op & FUTEX_CMD_MASK;

            if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
                    if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
                            return -EFAULT;
                    if (!timespec_valid(&ts))
                            return -EINVAL;

                    t = timespec_to_ktime(ts);
                    if (cmd == FUTEX_WAIT)
                            t = ktime_add(ktime_get(), t);
                    tp = &t;
            }
    [...]
            return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
    }

    [...]

    long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
                  u32 __user *uaddr2, u32 val2, u32 val3)
    {
            int ret;
            int cmd = op & FUTEX_CMD_MASK;
            struct rw_semaphore *fshared = NULL;

            if (!(op & FUTEX_PRIVATE_FLAG))
                    fshared = &current->mm->mmap_sem;

            switch (cmd) {
            case FUTEX_WAIT:
                    ret = futex_wait(uaddr, fshared, val, timeout);

    [...]

    static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
                          u32 val, ktime_t *abs_time)
    {
    [...]
                    struct restart_block *restart;
                    restart = &current_thread_info()->restart_block;
                    restart->fn = futex_wait_restart;
                    restart->arg0 = (unsigned long)uaddr;
                    restart->arg1 = (unsigned long)val;
                    restart->arg2 = (unsigned long)abs_time;
                    restart->arg3 = 0;
                    if (fshared)
                            restart->arg3 |= ARG3_SHARED;
                    return -ERESTART_RESTARTBLOCK;

    [...]

    static long futex_wait_restart(struct restart_block *restart)
    {
            u32 __user *uaddr = (u32 __user *)restart->arg0;
            u32 val = (u32)restart->arg1;
            ktime_t *abs_time = (ktime_t *)restart->arg2;
            struct rw_semaphore *fshared = NULL;

            restart->fn = do_no_restart_syscall;
            if (restart->arg3 & ARG3_SHARED)
                    fshared = &current->mm->mmap_sem;
            return (long)futex_wait(uaddr, fshared, val, abs_time);
    }

So when futex_wait is interrupted by a signal we break out of the hrtimer code and set up or return from signal. This code does not return back to userspace, so we set up a RESTARTBLOCK. The bug here is that we save the "abs_time", which is a pointer to the stack variable "ktime_t t" from sys_futex.

This returns and unwinds the stack before we get to call our signal. On return from the signal we go to futex_wait_restart, where we update all the parameters for futex_wait and call it. But here we have a problem where abs_time is no longer valid.

I verified this with print statements, and sure enough, what abs_time was set to ends up being garbage when we get to futex_wait_restart.

The solution I did to solve this (with input from Linus Torvalds) was to add unions to the restart_block to allow system calls to use the restart with specific parameters. This way the futex code now saves the time in a 64bit value in the restart block instead of storing it on the stack.

Note: I'm a bit nervous to add "linux/types.h" and use u32 and u64 in thread_info.h, when there's a #ifdef __KERNEL__ just below that. Not sure what that is there for. If this turns out to be a problem, I've tested this with using "unsigned int" for u32 and "unsigned long long" for u64 and it worked just the same. I'm using u32 and u64 just to be consistent with what the futex code uses.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
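The shape of the fix described above (a union in the restart block, with the timeout stored by value) might look roughly like the following userspace sketch. The struct layout, field names, and the helpers `save_for_restart()` and `wait_restart()` are hypothetical and chosen for illustration; the actual kernel structure is not shown in this message.

```c
#include <stdint.h>

/* Hypothetical, simplified restart block: a union lets each system call
 * keep its own typed parameters instead of squeezing everything into
 * generic unsigned long slots. */
struct restart_block_sketch {
    long (*fn)(struct restart_block_sketch *);
    union {
        struct {
            unsigned long arg0, arg1, arg2, arg3;
        } generic;
        struct {
            uint32_t *uaddr;
            uint32_t val;
            uint32_t flags;
            uint64_t time;      /* absolute timeout, stored by value */
        } futex;
    } u;
};

/* Restart handler: reads the timeout from the block itself, so it does
 * not depend on any stack frame that has since been unwound. For the
 * demonstration it just returns the timeout in whole seconds. */
long wait_restart(struct restart_block_sketch *rb)
{
    return (long)(rb->u.futex.time / 1000000000ULL);
}

/* Called from a frame that is about to return: copy the value that
 * abs_time points to, rather than saving the pointer. */
void save_for_restart(struct restart_block_sketch *rb, uint32_t *uaddr,
                      uint32_t val, const uint64_t *abs_time)
{
    rb->fn = wait_restart;
    rb->u.futex.uaddr = uaddr;
    rb->u.futex.val = val;
    rb->u.futex.flags = 0;
    rb->u.futex.time = *abs_time;
}
```

The design point is that the restart block outlives the system call's stack frame, so anything the restart handler needs has to live in the block by value.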
2007-12-05[SYSCTL_CHECK]: Fix typo in KERN_SPARC_SCONS_PWROFF entry string.David S. Miller
Based upon a report by Mikael Pettersson. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-04sched: default to more aggressive yield for SCHED_BATCH tasksIngo Molnar
do more aggressive yield for SCHED_BATCH tuned tasks: they are all about throughput anyway. This allows a gentler migration path for any apps that relied on stronger yield. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-12-04sched: fix crash in sys_sched_rr_get_interval()Ingo Molnar
Luiz Fernando N. Capitulino reported that sched_rr_get_interval() crashes for SCHED_OTHER tasks that are on an idle runqueue. The fix is to return a 0 timeslice for tasks that are on an idle runqueue. (and which are not running, obviously)

this also shrinks the code a bit:

   text    data     bss     dec     hex filename
  47903    3934     336   52173    cbcd sched.o.before
  47885    3934     336   52155    cbbb sched.o.after

Reported-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-schedLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: cpu accounting controller (V2)
2007-12-03uml: add !UML dependenciesAl Viro
The previous commit ("uml: keep UML Kconfig in sync with x86") is not enough, unfortunately. If we go that way, we need to add dependencies on !UML for several options. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-02sched: cpu accounting controller (V2)Srivatsa Vaddagiri
Commit cfb5285660aad4931b2ebbfa902ea48a37dfffa1 removed a useful feature for us, which provided a cpu accounting resource controller. This feature would be useful if someone wants to group tasks only for accounting purposes and doesn't really want to exercise any control over their cpu consumption.

The patch below reintroduces the feature. It is based on Paul Menage's original patch (Commit 62d0df64065e7c135d0002f069444fbdfc64768f), with these differences:

- Removed load average information. I felt it needs more thought (esp to deal with SMP and virtualized platforms) and can be added for 2.6.25 after more discussions.
- Convert group cpu usage to be nanosecond accurate (as rest of the cfs stats are) and invoke cpuacct_charge() from the respective scheduler classes
- Make accounting scalable on SMP systems by splitting the usage counter to be per-cpu
- Move the code from kernel/cpu_acct.c to kernel/sched.c (since the code is not big enough to warrant a new file and also this rightly needs to live inside the scheduler. Also things like accessing rq->lock while reading cpu usage becomes easier if the code lived in kernel/sched.c)

The patch also modifies the cpu controller not to provide the same accounting information.

Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>

Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran some simple tests like cpuspin (spin on the cpu), ran several tasks in the same group and timed them. Compared their time stamps with cpuacct.usage.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
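The idea of splitting a usage counter per CPU, mentioned in the list above, can be sketched in userspace as follows. All names here (`DEMO_NR_CPUS`, `struct demo_cpuacct`, `demo_charge()`, `demo_total_usage()`) are hypothetical; the real code uses the kernel's per-cpu facilities and locking, which this sketch does not attempt to model.

```c
#include <stdint.h>

#define DEMO_NR_CPUS 4   /* hypothetical fixed CPU count for the sketch */

/* One usage slot per CPU, in nanoseconds. Each CPU only ever writes its
 * own slot, so updates on different CPUs do not contend on one counter. */
struct demo_cpuacct {
    uint64_t usage[DEMO_NR_CPUS];
};

/* Charge delta_ns nanoseconds of run time to the slot of the given CPU. */
void demo_charge(struct demo_cpuacct *ca, int cpu, uint64_t delta_ns)
{
    ca->usage[cpu] += delta_ns;
}

/* Readers pay the cost instead: the group total is the sum of all slots. */
uint64_t demo_total_usage(const struct demo_cpuacct *ca)
{
    uint64_t total = 0;
    int cpu;

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        total += ca->usage[cpu];
    return total;
}
```

The trade-off is cheap, frequent writes on the charge path against a slightly more expensive, infrequent read when the total is requested.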
2007-11-29wait_task_stopped(): pass correct exit_code to wait_noreap_copyout()Scott James Remnant
In wait_task_stopped() exit_code already contains the right value for the si_status member of siginfo, and this is simply set in the non WNOWAIT case. If you call waitid() with a stopped or traced process, you'll get the signal in siginfo.si_status as expected -- however if you call waitid(WNOWAIT) at the same time, you'll get the signal << 8 | 0x7f. Pass it unchanged to wait_noreap_copyout(); we would only need to shift it and add 0x7f if we were returning it in the user status field and that isn't used for any function that permits WNOWAIT. Signed-off-by: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
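To make the "signal << 8 | 0x7f" encoding mentioned above concrete, here is a small sketch of the classic wait status word for a stopped child. The helper names are hypothetical; they mirror what the standard WIFSTOPPED/WSTOPSIG macros compute on Linux, and the signal number 19 is used purely as an example value.

```c
/* Build the classic wait status word for a stopped child: the signal
 * number goes in bits 8-15 and the low byte is 0x7f. */
int encode_stopped_status(int sig)
{
    return (sig << 8) | 0x7f;
}

/* Non-zero if the status word describes a stopped child. */
int is_stopped_status(int status)
{
    return (status & 0xff) == 0x7f;
}

/* Recover the signal number from a stopped-child status word. */
int stop_signal_of(int status)
{
    return (status >> 8) & 0xff;
}
```

A caller reading siginfo.si_status expects the bare signal number (19 in this example), not the encoded word (0x137f), which is why the value should not be shifted on the waitid() path.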
2007-11-29FRV: fix the extern declaration of kallsyms_num_symsDavid Howells
Fix the extern declaration of kallsyms_num_syms to indicate that the symbol does not reside in the small-data storage space, and so may not be accessed relative to the small data base register. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29Isolate the UTS namespace's domainname and hostname backPavel Emelyanov
Commit 7d69a1f4a72b18876c99c697692b78339d491568 ("remove CONFIG_UTS_NS and CONFIG_IPC_NS") by Cedric Le Goater accidentally removed the code that prevented the uts->hostname and uts->domainname values from being overwritten from another namespace. In other words, setting hostname/domainname via sysctl (echo xxx > /proc/sys/kernel/(host|domain)name) caused the new value to be set in the init UTS namespace only. Restore the isolation. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29wait_task_stopped(): don't use task_pid_nr_ns() locklessOleg Nesterov
wait_task_stopped(WNOWAIT) calls task_pid_nr_ns() without the tasklist/rcu lock, so we can read already freed memory. Use the cached pid_t value. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Looks-good-to: Roland McGrath <roland@redhat.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-28sched: clean up kernel/sched_stat.hIngo Molnar
clean up kernel/sched_stat.h. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-28sched: clean up overlong line in kernel/sched_debug.cIngo Molnar
clean up overlong line in kernel/sched_debug.c. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-28sched: clean up, move __sched_text_start/end to sched.hIngo Molnar
move __sched_text_start/end to sched.h. No code changed:

   text    data     bss     dec     hex filename
  26582    2310      28   28920    70f8 sched.o.before
  26582    2310      28   28920    70f8 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-28sched: clean up sd_alloc_ctl_cpu_table() definitionIngo Molnar
clean up sd_alloc_ctl_cpu_table() definition. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-28softlockup: fix false positives on CONFIG_NOHZThomas Gleixner
David Miller reported soft lockup false-positives that trigger on NOHZ due to CPUs idling for more than 10 seconds. The solution is to touch the softlockup watchdog when we return from idle. (by definition we are not 'locked up' when we were idle) http://bugzilla.kernel.org/show_bug.cgi?id=9409 Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
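The reasoning above can be modelled with a toy watchdog. This is only a sketch under stated assumptions: `struct demo_watchdog`, `demo_watchdog_touch()`, `demo_watchdog_expired()` and `demo_idle_exit()` are hypothetical names, times are plain seconds, and nothing here reflects the real softlockup implementation.

```c
/* Hypothetical watchdog state; all times are in seconds. */
struct demo_watchdog {
    unsigned long last_touch;   /* when the CPU last proved it was alive */
    unsigned long threshold;    /* silence longer than this is a "lockup" */
};

/* Record that the CPU made progress at time now. */
void demo_watchdog_touch(struct demo_watchdog *w, unsigned long now)
{
    w->last_touch = now;
}

/* Returns 1 if the watchdog would report a lockup at time now. */
int demo_watchdog_expired(const struct demo_watchdog *w, unsigned long now)
{
    return now - w->last_touch > w->threshold;
}

/* On leaving idle, touch the watchdog: the time spent idle with the
 * tick stopped must not count as time spent locked up. */
void demo_idle_exit(struct demo_watchdog *w, unsigned long now)
{
    demo_watchdog_touch(w, now);
}
```

Without the touch on idle exit, a CPU that slept for 30 seconds with a 10 second threshold would be reported as locked up the moment it woke.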
2007-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (41 commits) [XFRM]: Fix leak of expired xfrm_states [ATM]: [he] initialize lock and tasklet earlier [IPV4]: Remove bogus ifdef mess in arp_process [SKBUFF]: Free old skb properly in skb_morph [IPV4]: Fix memory leak in inet_hashtables.h when NUMA is on [IPSEC]: Temporarily remove locks around copying of non-atomic fields [TCP] MTUprobe: Cleanup send queue check (no need to loop) [TCP]: MTUprobe: receiver window & data available checks fixed [MAINTAINERS]: tlan list is subscribers-only [SUNRPC]: Remove SPIN_LOCK_UNLOCKED [SUNRPC]: Make xprtsock.c:xs_setup_{udp,tcp}() static [PFKEY]: Sending an SADB_GET responds with an SADB_GET [IRDA]: Compilation for CONFIG_INET=n case [IPVS]: Fix compiler warning about unused register_ip_vs_protocol [ARP]: Fix arp reply when sender ip 0 [IPV6] TCPMD5: Fix deleting key operation. [IPV6] TCPMD5: Check return value of tcp_alloc_md5sig_pool(). [IPV4] TCPMD5: Use memmove() instead of memcpy() because we have overlaps. [IPV4] TCPMD5: Omit redundant NULL check for kfree() argument. ieee80211: Stop net_ratelimit/IEEE80211_DEBUG_DROP log pollution ...
2007-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-schedLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: bump version of kernel/sched_debug.c sched: fix minimum granularity tunings sched: fix RLIMIT_CPU comment sched: fix kernel/acct.c comment sched: fix prev_stime calculation sched: don't forget to unlock uids_mutex on error paths
2007-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: x86: fix APIC related bootup crash on Athlon XP CPUs time: add ADJ_OFFSET_SS_READ x86: export the symbol empty_zero_page on the 32-bit x86 architecture x86: fix kprobes_64.c inlining borkage pci: use pci=bfsort for HP DL385 G2, DL585 G2 x86: correctly set UTS_MACHINE for "make ARCH=x86" lockdep: annotate do_debug() trap handler x86: turn off iommu merge by default x86: fix ACPI compile for LOCAL_APIC=n x86: printk kernel version in WARN_ON and other dump_stack users ACPI: Set max_cstate to 1 for early Opterons. x86: fix NMI watchdog & 'stopped time' problem
2007-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: virtio: fix net driver loop case where we fail to restart module: fix and elaborate comments virtio: fix module/device unloading lguest: Fix uninitialized members in example launcher
2007-11-26sched: bump version of kernel/sched_debug.cIngo Molnar
bump version of kernel/sched_debug.c and remove CFS version information from it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-26sched: fix minimum granularity tuningsZou Nan hai
increase the default minimum granularity some more - this gives us more performance in aim7 benchmarks. also correct some comments: we scale with ilog(ncpus) + 1. Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-26sched: fix kernel/acct.c commentIngo Molnar
fix kernel/acct.c comment. noticed by Lin Tan. Comment suggested by Olaf Kirch. also see: http://bugzilla.kernel.org/show_bug.cgi?id=8220 Reported-by: tammy000@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-26sched: don't forget to unlock uids_mutex on error pathsPavel Emelyanov
Commit 5cb350baf580017da38199625b7365b1763d7180 (Author: Dhaval Giani <dhaval@linux.vnet.ibm.com>, Date: Mon Oct 15 17:00:14 2007 +0200, "sched: group scheduling, sysfs tunables") introduced the uids_mutex and the helpers to lock/unlock it. Unfortunately, the error paths of alloc_uid() were not patched to unlock it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
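A common way to avoid this class of bug is to route every exit, including the error paths, through a single unlock label. The sketch below is a userspace illustration only: `struct demo_lock` is a hypothetical depth counter standing in for a real mutex, and `alloc_entry()` is not the kernel's alloc_uid().

```c
#include <stdlib.h>

/* Hypothetical lock that only tracks its depth, so the example can
 * check that every path releases it. */
struct demo_lock {
    int depth;
};

static void demo_lock_acquire(struct demo_lock *l) { l->depth++; }
static void demo_lock_release(struct demo_lock *l) { l->depth--; }

/* Allocate an entry under the lock. All exits, failures included,
 * funnel through the single unlock at the end. */
int *alloc_entry(struct demo_lock *l, int value, int simulate_failure)
{
    int *entry = NULL;

    demo_lock_acquire(l);

    if (simulate_failure)
        goto out_unlock;        /* error path: still releases the lock */

    entry = malloc(sizeof(*entry));
    if (!entry)
        goto out_unlock;        /* allocation failure: same exit */
    *entry = value;

out_unlock:
    demo_lock_release(l);
    return entry;
}
```

Whatever path is taken, the lock depth is back to zero when `alloc_entry()` returns, which is the property the missing unlocks violated.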
2007-11-26time: add ADJ_OFFSET_SS_READJohn Stultz
Michael Kerrisk reported that a long standing bug in the adjtimex() system call causes glibc's adjtime(3) function to deliver the wrong results if 'delta' is NULL. add the ADJ_OFFSET_SS_READ API detail, which will be used by glibc to fix this API compatibility bug. Also see: http://bugzilla.kernel.org/show_bug.cgi?id=6761 [ mingo@elte.hu: added patch description and made it backwards compatible ] NOTE: the new flag is defined 0xa001 so that it returns -EINVAL on older kernels - this way glibc can use it safely. Suggested by Ulrich Drepper. Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-11-20[S390] appldata: remove unused binary sysctls.Heiko Carstens
Remove binary sysctls that never worked due to missing strategy functions. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-11-20[S390] cmm: remove unused binary sysctls.Heiko Carstens
Remove binary sysctls that never worked due to missing strategy functions. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-11-19[IPVS]: Move remaining sysctl handlers over to CTL_UNNUMBEREDSimon Horman
Switch the remaining IPVS sysctl entries over to use CTL_UNNUMBERED; I strongly doubt that anyone is using the sys_sysctl interface to these variables. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19[IPVS]: Fix sysctl warnings about missing strategy in schedulersSimon Horman
sysctl table check failed: /net/ipv4/vs/lblc_expiration .3.5.21.19 Missing strategy [...] sysctl table check failed: /net/ipv4/vs/lblcr_expiration .3.5.21.20 Missing strategy Switch these entries over to use CTL_UNNUMBERED as clearly the sys_sysctl portion wasn't working. This is along the same lines as Christian Borntraeger's patch that fixes up entries with no strategy in net/ipv4/ipvs/ip_vs_ctl.c Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19[IPVS]: Fix sysctl warnings about missing strategyChristian Borntraeger
Running the latest git code I get the following messages during boot: sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy [...] sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy [...] sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy [...] sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy I removed the binary sysctl handler for those messages and also removed the definitions in ip_vs.h. The alternative would be to implement a proper strategy handler, but syscall sysctl is deprecated. There are other sysctl definitions that are commented out or work with the default sysctl_data strategy. I did not touch these. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19module: fix and elaborate commentsMatti Linnanvuori
Fix and elaborate comments. Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-17ntp: fix typo that makes sync_cmos_clock erraticDavid P. Reed
Fix a typo in ntp.c that has caused updating of the persistent (RTC) clock when synced to NTP to behave erratically. When debugging a freeze that arises on my AMD64 machines when I run the ntpd service, I added a number of printk's to monitor the sync_cmos_clock procedure. I discovered that it was not syncing to cmos RTC every 11 minutes as documented, but instead would keep trying every second for hours at a time. The reason turned out to be a typo in sync_cmos_clock, where it attempts to ensure that update_persistent_clock is called very close to 500 msec. after a 1 second boundary (required by the PC RTC's spec). That typo referred to "xtime" in one spot, rather than "now", which is derived from "xtime" but not equal to it. This makes the test erratic, creating a "coin-flip" that decides when update_persistent_clock is called - when it is called, which is rarely, it may be at any time during the one second period, rather than close to 500 msec, so the value written is needlessly incorrect, too. Signed-off-by: David P. Reed Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
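The timing test described above (only update when close to 500 msec after a second boundary) can be sketched as follows. The function name `near_half_second()` and its parameters are assumptions for illustration; the exact expression used in ntp.c is not quoted in this message.

```c
#include <stdlib.h>

#define NSEC_PER_SEC 1000000000L

/* Returns 1 when now_nsec (nanoseconds past the current second) lies
 * within half a tick of the 500 ms mark, 0 otherwise. tick_nsec is the
 * length of one timer tick in nanoseconds. */
int near_half_second(long now_nsec, long tick_nsec)
{
    return labs(now_nsec - NSEC_PER_SEC / 2) <= tick_nsec / 2;
}
```

The point of the fix is that the value tested must be the same timestamp that is subsequently written to the clock; testing one variable and writing another turns the check into the "coin-flip" the message describes.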
2007-11-17x86: ignore the sys_getcpu() tcache parameterIngo Molnar
don't use the vgetcpu tcache - it's causing problems with tasks migrating: they'll see the old cache up to a jiffy after the migration, further increasing the costs of the migration. In the worst case they see completely bogus information from the tcache, when a sys_getcpu() call "invalidated" the cache info by incrementing the jiffies _and_ the cpuid info in the cache and the following vdso_getcpu() call happens after vdso_jiffies have been incremented. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-schedLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: reorder SCHED_FEAT_ bits sched: make sched_nr_latency static sched: remove activate_idle_task() sched: fix __set_task_cpu() SMP race sched: fix SCHED_FIFO tasks & FAIR_GROUP_SCHED sched: fix accounting of interrupts during guest execution on s390
2007-11-15sched: reorder SCHED_FEAT_ bitsIngo Molnar
reorder SCHED_FEAT_ bits so that the used ones come first. Makes tuning instructions easier. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-15sched: make sched_nr_latency staticAdrian Bunk
sched_nr_latency can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-15sched: remove activate_idle_task()Dmitry Adamushko
cpu_down() code is ok wrt sched_idle_next() placing the 'idle' task not at the beginning of the queue. So get rid of activate_idle_task() and make use of activate_task() instead. It is the same as activate_task(), except for the update_rq_clock(rq) call that is redundant.

Code size goes down:

   text    data     bss     dec     hex filename
  47853    3934     336   52123    cb9b sched.o.before
  47828    3934     336   52098    cb82 sched.o.after

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-15sched: fix __set_task_cpu() SMP raceDmitry Adamushko
Grant Wilson has reported rare SCHED_FAIR_USER crashes on his quad-core system, crashes which can only be explained via runqueue corruption. there is a narrow SMP race in __set_task_cpu(): after ->cpu is set up to a new value, task_rq_lock(p, ...) can be successfully executed on another CPU. We must ensure that updates of per-task data have been completed by this moment. this bug has been hiding in the Linux scheduler for an eternity (we never had any explicit barrier for task->cpu in set_task_cpu() - so the bug was introduced in 2.5.1), but only became visible via set_task_cfs_rq() being accidentally put after the task->cpu update. It also probably needs a sufficiently out-of-order CPU to trigger. Reported-by: Grant Wilson <grant.wilson@zen.co.uk> Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
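The ordering requirement described above (finish the per-task updates, then publish the new CPU) has a rough userspace analogue in C11 release/acquire atomics. This is an analogy only: the kernel uses its own barrier primitives rather than `<stdatomic.h>`, and `struct demo_task`, `demo_set_task_cpu()` and `demo_read_rq_data()` are hypothetical names.

```c
#include <stdatomic.h>

/* Hypothetical task: rq_data stands for per-task scheduler data that
 * must already be valid for whichever CPU the cpu field names. */
struct demo_task {
    int rq_data;
    atomic_int cpu;
};

/* Writer: update the per-task data first, then publish the new CPU
 * with release semantics so the data store cannot be reordered after
 * the store to cpu. */
void demo_set_task_cpu(struct demo_task *p, int cpu)
{
    p->rq_data = cpu * 100;
    atomic_store_explicit(&p->cpu, cpu, memory_order_release);
}

/* Reader: an acquire load of cpu guarantees that the rq_data written
 * before the matching release store is visible afterwards. */
int demo_read_rq_data(struct demo_task *p, int *cpu_out)
{
    *cpu_out = atomic_load_explicit(&p->cpu, memory_order_acquire);
    return p->rq_data;
}
```

If the two stores in the writer were swapped, or the store to `cpu` were relaxed, another CPU could observe the new `cpu` value while still seeing stale per-task data, which is the window the message describes.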
2007-11-15sched: fix SCHED_FIFO tasks & FAIR_GROUP_SCHEDOleg Nesterov
Suppose that the SCHED_FIFO task does

    switch_uid(new_user);

Now, p->se.cfs_rq and p->se.parent both point into the old user_struct->tg because sched_move_task() doesn't call set_task_cfs_rq() for !fair_sched_class case.

Suppose that old user_struct/task_group is freed/reused, and the task does

    sched_setscheduler(SCHED_NORMAL);

__setscheduler() sets fair_sched_class, but doesn't update ->se.cfs_rq/parent which point to the freed memory.

This means that check_preempt_wakeup() doing

    while (!is_same_group(se, pse)) {
            se = parent_entity(se);
            pse = parent_entity(pse);
    }

may OOPS in a similar way if rq->curr or p did something like above.

Perhaps we need something like the patch below, note that __setscheduler() can't do set_task_cfs_rq().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-15sched: fix accounting of interrupts during guest execution on s390Christian Borntraeger
Currently the scheduler checks for PF_VCPU to decide if this timeslice has to be accounted as guest time. On s390, host interrupts are not disabled during guest execution. This causes these interrupts to be accounted as guest time if CONFIG_VIRT_CPU_ACCOUNTING is set. The solution is to check if an interrupt triggered account_system_time. As the tick is timer interrupt based, we have to subtract hardirq_offset. I tested the patch on s390 with CONFIG_VIRT_CPU_ACCOUNTING and on x86_64. Seems to work. CC: Avi Kivity <avi@qumranet.com> CC: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-15wait_task_stopped: Check p->exit_state instead of TASK_TRACEDRoland McGrath
The original meaning of the old test (p->state > TASK_STOPPED) was "not dead", since it was before TASK_TRACED existed and before the state/exit_state split. It was a wrong correction in commit 14bf01bb0599c89fc7f426d20353b76e12555308 to make this test for TASK_TRACED instead. It should have been changed when TASK_TRACED was introduced and again when exit_state was introduced. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Kees Cook <kees@ubuntu.com> Acked-by: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NET]: rt_check_expire() can take a long time, add a cond_resched() [ISDN] sc: Really, really fix warning [ISDN] sc: Fix sndpkt to have the correct number of arguments [TCP] FRTO: Clear frto_highmark only after process_frto that uses it [NET]: Remove notifier block from chain when register_netdevice_notifier fails [FS_ENET]: Fix module build. [TCP]: Make sure write_queue_from does not begin with NULL ptr [TCP]: Fix size calculation in sk_stream_alloc_pskb [S2IO]: Fixed memory leak when MSI-X vector allocation fails [BONDING]: Fix resource use after free [SYSCTL]: Fix warning for token-ring from sysctl checker [NET] random : secure_tcp_sequence_number should not assume CONFIG_KTIME_SCALAR [IWLWIFI]: Not correctly dealing with hotunplug. [TCP] FRTO: Plug potential LOST-bit leak [TCP] FRTO: Limit snd_cwnd if TCP was application limited [E1000]: Fix schedule while atomic when called from mii-tool. [NETX]: Fix build failure added by 2.6.24 statistics cleanup. [EP93xx_ETH]: Build fix after 2.6.24 NAPI changes. [PKT_SCHED]: Check subqueue status before calling hard_start_xmit