summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2008-07-22module: don't use stop_machine for waiting rmmodRusty Russell
rmmod has a little-used "-w" option, meaning that instead of failing if the module is in use, it should block until the module becomes unused. In this case, we don't need to use stop_machine: Max Krasnyansky indicated that would be useful for SystemTap which loads/unloads new modules frequently. Cc: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22Merge branch 'linus' into x86/x2apicIngo Molnar
2008-07-22genirq: remove last NO_IDLE_HZ leftoversThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-21sysdev: Pass the attribute to the low level sysdev show/store functionAndi Kleen
This allow to dynamically generate attributes and share show/store functions between attributes. Right now most attributes are generated by special macros and lots of duplicated code. With the attribute passed it's instead possible to attach some data to the attribute and then use that in shared low level functions to do different things. I need this for the dynamically generated bank attributes in the x86 machine check code, but it'll allow some further cleanups. I converted all users in tree to the new show/store prototype. It's a single huge patch to avoid unbisectable sections. Runtime tested: x86-32, x86-64 Compiled only: ia64, powerpc Not compile tested/only grep converted: sh, arm, avr32 Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-20dma-coherent: add documentation to new interfacesDmitry Baryshkov
Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-20sched: hrtick_enabled() should use cpu_active()Ingo Molnar
Peter pointed out that hrtick_enabled() should use cpu_active(). Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-20Merge branch 'sched/urgent' into sched/develIngo Molnar
2008-07-20sched, x86: clean up hrtick implementationPeter Zijlstra
random uvesafb failures were reported against Gentoo: http://bugs.gentoo.org/show_bug.cgi?id=222799 and Mihai Moldovan bisected it back to: > 8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit > commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f > Author: Peter Zijlstra <a.p.zijlstra@chello.nl> > Date: Fri Jan 25 21:08:29 2008 +0100 > > sched: high-res preemption tick Linus suspected it to be hrtick + vm86 interaction and observed: > Btw, Peter, Ingo: I think that commit is doing bad things. They aren't > _incorrect_ per se, but they are definitely bad. > > Why? > > Using random _TIF_WORK_MASK flags is really impolite for doing > "scheduling" work. There's a reason that arch/x86/kernel/entry_32.S > special-cases the _TIF_NEED_RESCHED flag: we don't want to exit out of > vm86 mode unnecessarily. > > See the "work_notifysig_v86" label, and how it does that > "save_v86_state()" thing etc etc. Right, I never liked having to fiddle with those TIF flags. Initially I needed it because the hrtimer base lock could not nest in the rq lock. That however is fixed these days. Currently the only reason left to fiddle with the TIF flags is remote wakeups. We cannot program a remote cpu's hrtimer. I've been thinking about using the new and improved IPI function call stuff to implement hrtimer_start_on(). However that does require that smp_call_function_single(.wait=0) works from interrupt context - /me looks at the latest series from Jens - Yes that does seem to be supported, good. Here's a stab at cleaning this stuff up ... Mihai reported test success as well. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Mihai Moldovan <ionic@ionic.de> Cc: Michal Januszewski <spock@gentoo.org> Cc: Antonino Daplas <adaplas@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18Merge branch 'linus' into x86/x2apicIngo Molnar
2008-07-18cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.cMike Travis
* Optimize various places where a pointer to the cpumask_of_cpu value will result in reducing stack pressure. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptrMike Travis
* This patch replaces the dangerous lvalue version of cpumask_of_cpu with new cpumask_of_cpu_ptr macros. These are patterned after the node_to_cpumask_ptr macros. In general terms, if there is a cpumask_of_cpu_map[] then a pointer to the cpumask_of_cpu_map[cpu] entry is used. The cpumask_of_cpu_map is provided when there is a large NR_CPUS count, reducing greatly the amount of code generated and stack space used for cpumask_of_cpu(). The pointer to the cpumask_t value is needed for calling set_cpus_allowed_ptr() to reduce the amount of stack space needed to pass the cpumask_t value. If there isn't a cpumask_of_cpu_map[], then a temporary variable is declared and filled in with value from cpumask_of_cpu(cpu) as well as a pointer variable pointing to this temporary variable. Afterwards, the pointer is used to reference the cpumask value. The compiler will optimize out the extra dereference through the pointer as well as the stack space used for the pointer, resulting in identical code. A good example of the orthogonal usages is in net/sunrpc/svc.c: case SVC_POOL_PERCPU: { unsigned int cpu = m->pool_to[pidx]; cpumask_of_cpu_ptr(cpumask, cpu); *oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, cpumask); return 1; } case SVC_POOL_PERNODE: { unsigned int node = m->pool_to[pidx]; node_to_cpumask_ptr(nodecpumask, node); *oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, nodecpumask); return 1; } Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18Merge branch 'linus' into cpus4096Ingo Molnar
Conflicts: drivers/acpi/processor_throttling.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18Generic dma-coherent: fix DMA_MEMORY_EXCLUSIVEDmitry Baryshkov
Don't rewrite successfull allocation return values in case the memory was marked with DMA_MEMORY_EXCLUSIVE. Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18Merge branch 'linus' into core/generic-dma-coherentIngo Molnar
Conflicts: kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18Merge branch 'linus' into timers/nohzIngo Molnar
2008-07-18genirq: enable polling for disabled screaming irqsEric W. Biederman
When we disable a screaming irq we never see it again. If the irq line is shared or if the driver half works this is a real pain. So periodically poll the handlers for screaming interrupts. I use a timer instead of the classic irq poll technique of working off the timer interrupt because when we use the local apic timers note_interrupt is never called (bug?). Further on a system with dynamic ticks the timer interrupt might not even fire unless there is a timer telling it it needs to. I forced this case on my test system with an e1000 nic and my ssh session remained responsive despite the interrupt handler only being called every 10th of a second. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18ftrace: only trace preempt off with preempt tracerSteven Rostedt
When PREEMPT_TRACER and IRQSOFF_TRACER are both configured and irqsoff tracer is running, the preempt_off sections might also be traced. Thanks to Andrew Morton for pointing out my mistake of spin_lock disabling interrupts while he was reviewing ftrace.txt. Seems that my example I used actually hit this bug. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18kthread: reduce stack pressure in create_kthread and kthreaddMike Travis
* Replace: set_cpus_allowed(..., CPU_MASK_ALL) with: set_cpus_allowed_ptr(..., CPU_MASK_ALL_PTR) to remove excessive stack requirements when NR_CPUS=4096. Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18softlockup: fix invalid proc_handler for softlockup_panicHiroshi Shimamoto
The type of softlockup_panic is int, but the proc_handler is proc_doulongvec_minmax(). This handler is for unsigned long. This handler should be proc_dointvec_minmax(). Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18nohz: prevent tick stop outside of the idle loopThomas Gleixner
Jack Ren and Eric Miao tracked down the following long standing problem in the NOHZ code: scheduler switch to idle task enable interrupts Window starts here ----> interrupt happens (does not set NEED_RESCHED) irq_exit() stops the tick ----> interrupt happens (does set NEED_RESCHED) return from schedule() cpu_idle(): preempt_disable(); Window ends here The interrupts can happen at any point inside the race window. The first interrupt stops the tick, the second one causes the scheduler to rerun and switch away from idle again and we end up with the tick disabled. The fact that it needs two interrupts where the first one does not set NEED_RESCHED and the second one does made the bug obscure and extremly hard to reproduce and analyse. Kudos to Jack and Eric. Solution: Limit the NOHZ functionality to the idle loop to make sure that we can not run into such a situation ever again. cpu_idle() { preempt_disable(); while(1) { tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we are in the idle loop while (!need_resched()) halt(); tick_nohz_restart_sched_tick(); <- disables NOHZ mode preempt_enable_no_resched(); schedule(); preempt_disable(); } } In hindsight we should have done this forever, but ... /me grabs a large brown paperbag. Debugged-by: Jack Ren <jack.ren@marvell.com>, Debugged-by: eric miao <eric.y.miao@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-18rcu classic: new algorithm for callbacks-processing(v2)Lai Jiangshan
This is v2, it's a little deference from v1 that I had send to lkml. use ACCESS_ONCE use rcu_batch_after/rcu_batch_before for batch # comparison. rcutorture test result: (hotplugs: do cpu-online/offline once per second) No CONFIG_NO_HZ: OK, 12hours No CONFIG_NO_HZ, hotplugs: OK, 12hours CONFIG_NO_HZ=y: OK, 24hours CONFIG_NO_HZ=y, hotplugs: Failed. (Failed also without my patch applied, exactly the same bug occurred, http://lkml.org/lkml/2008/7/3/24) v1's email thread: http://lkml.org/lkml/2008/6/2/539 v1's description: The code/algorithm of the implement of current callbacks-processing is very efficient and technical. But when I studied it and I found a disadvantage: In multi-CPU systems, when a new RCU callback is being queued(call_rcu[_bh]), this callback will be invoked after the grace period for the batch with batch number = rcp->cur+2 has completed very very likely in current implement. Actually, this callback can be invoked after the grace period for the batch with batch number = rcp->cur+1 has completed. The delay of invocation means that latency of synchronize_rcu() is extended. But more important thing is that the callbacks usually free memory, and these works are delayed too! it's necessary for reclaimer to free memory as soon as possible when left memory is few. A very simple way can solve this problem: a field(struct rcu_head::batch) is added to record the batch number for the RCU callback. And when a new RCU callback is being queued, we determine the batch number for this callback(head->batch = rcp->cur+1) and we move this callback to rdp->donelist if we find that head->batch <= rcp->completed when we process callbacks. This simple way reduces the wait time for invocation a lot. (about 2.5Grace Period -> 1.5Grace Period in average in multi-CPU systems) This is my algorithm. But I do not add any field for struct rcu_head in my implement. We just need to memorize the last 2 batches and their batch number, because these 2 batches include all entries that for whom the grace period hasn't completed. So we use a special linked-list rather than add a field. Please see the comment of struct rcu_data. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18rcu classic: simplify the next pending batchLai Jiangshan
use a batch number(rcp->pending) instead of a flag(rcp->next_pending) rcu_start_batch() need to change this flag, so mb()s is needed for memory-access safe. but(after this patch applied) rcu_start_batch() do not change this batch number(rcp->pending), rcp->pending is managed by __rcu_process_callbacks only, and troublesome mb()s are eliminated. And codes look simpler and clearer. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's ↵David Howells
not needed Fix inc_rt_tasks() to not declare variable 'rq' if it's not needed. It is declared if CONFIG_SMP or CONFIG_RT_GROUP_SCHED, but only used if CONFIG_SMP. This is a consequence of patch 1f11eb6a8bc92536d9e93ead48fa3ffbd1478571 plus patch 1100ac91b6af02d8639d518fad5b434b1bf44ed6. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18ftrace: fix 4d3702b6 (post-v2.6.26): WARNING: at kernel/lockdep.c:2731 ↵Steven Rostedt
check_flags (ftrace) On Wed, 16 Jul 2008, Vegard Nossum wrote: > When booting 4d3702b6, I got this huge thing: > > Testing tracer wakeup: <4>------------[ cut here ]------------ > WARNING: at kernel/lockdep.c:2731 check_flags+0x123/0x160() > Modules linked in: > Pid: 1, comm: swapper Not tainted 2.6.26-crashing-02127-g4d3702b6 #30 > [<c015c349>] warn_on_slowpath+0x59/0xb0 > [<c01276c6>] ? ftrace_call+0x5/0x8 > [<c012d800>] ? native_read_tsc+0x0/0x20 > [<c0158de2>] ? sub_preempt_count+0x12/0xf0 > [<c01814eb>] ? trace_hardirqs_off+0xb/0x10 > [<c0182fbc>] ? __lock_acquire+0x2cc/0x1120 > [<c01814eb>] ? trace_hardirqs_off+0xb/0x10 > [<c01276af>] ? mcount_call+0x5/0xa > [<c017ff53>] check_flags+0x123/0x160 > [<c0183e61>] lock_acquire+0x51/0xd0 > [<c01276c6>] ? ftrace_call+0x5/0x8 > [<c0613d4f>] _spin_lock_irqsave+0x5f/0xa0 > [<c01a8d45>] ? ftrace_record_ip+0xf5/0x220 > [<c02d5413>] ? debug_locks_off+0x3/0x50 > [<c01a8d45>] ftrace_record_ip+0xf5/0x220 > [<c01276af>] mcount_call+0x5/0xa > [<c02d5418>] ? debug_locks_off+0x8/0x50 > [<c017ff27>] check_flags+0xf7/0x160 > [<c0183e61>] lock_acquire+0x51/0xd0 > [<c01276c6>] ? ftrace_call+0x5/0x8 > [<c0613d4f>] _spin_lock_irqsave+0x5f/0xa0 > [<c01affcd>] ? wakeup_tracer_call+0x6d/0xf0 > [<c01625e2>] ? _local_bh_enable+0x62/0xb0 > [<c0158ddd>] ? sub_preempt_count+0xd/0xf0 > [<c01affcd>] wakeup_tracer_call+0x6d/0xf0 > [<c0162724>] ? __do_softirq+0xf4/0x110 > [<c01afff1>] ? wakeup_tracer_call+0x91/0xf0 > [<c01276c6>] ftrace_call+0x5/0x8 > [<c0162724>] ? __do_softirq+0xf4/0x110 > [<c0158de2>] ? sub_preempt_count+0x12/0xf0 > [<c01625e2>] _local_bh_enable+0x62/0xb0 > [<c0162724>] __do_softirq+0xf4/0x110 > [<c01627ed>] do_softirq+0xad/0xb0 > [<c0162a15>] irq_exit+0xa5/0xb0 > [<c013a506>] smp_apic_timer_interrupt+0x66/0xa0 > [<c02d3fac>] ? trace_hardirqs_off_thunk+0xc/0x10 > [<c0127449>] apic_timer_interrupt+0x2d/0x34 > [<c018007b>] ? find_usage_backwards+0xb/0xf0 > [<c0613a09>] ? _spin_unlock_irqrestore+0x69/0x80 > [<c014ef32>] tg_shares_up+0x132/0x1d0 > [<c014d2a2>] walk_tg_tree+0x62/0xa0 > [<c014ee00>] ? tg_shares_up+0x0/0x1d0 > [<c014a860>] ? tg_nop+0x0/0x10 > [<c015499d>] update_shares+0x5d/0x80 > [<c0154a2f>] try_to_wake_up+0x6f/0x280 > [<c01a8b90>] ? __ftrace_modify_code+0x0/0xc0 > [<c01a8b90>] ? __ftrace_modify_code+0x0/0xc0 > [<c0154c94>] wake_up_process+0x14/0x20 > [<c01725f6>] kthread_create+0x66/0xb0 > [<c0195400>] ? do_stop+0x0/0x200 > [<c0195320>] ? __stop_machine_run+0x30/0xb0 > [<c0195340>] __stop_machine_run+0x50/0xb0 > [<c0195400>] ? do_stop+0x0/0x200 > [<c01a8b90>] ? __ftrace_modify_code+0x0/0xc0 > [<c061242d>] ? mutex_unlock+0xd/0x10 > [<c01953cc>] stop_machine_run+0x2c/0x60 > [<c01a94d3>] unregister_ftrace_function+0x103/0x180 > [<c01b0517>] stop_wakeup_tracer+0x17/0x60 > [<c01b056f>] wakeup_tracer_ctrl_update+0xf/0x30 > [<c01ab8d5>] trace_selftest_startup_wakeup+0xb5/0x130 > [<c01ab950>] ? trace_wakeup_test_thread+0x0/0x70 > [<c01aadf5>] register_tracer+0x135/0x1b0 > [<c0877d02>] init_wakeup_tracer+0xd/0xf > [<c085d437>] kernel_init+0x1a9/0x2ce > [<c061397b>] ? _spin_unlock_irq+0x3b/0x60 > [<c02d3f9c>] ? trace_hardirqs_on_thunk+0xc/0x10 > [<c0877cf5>] ? init_wakeup_tracer+0x0/0xf > [<c0182646>] ? trace_hardirqs_on_caller+0x126/0x180 > [<c02d3f9c>] ? trace_hardirqs_on_thunk+0xc/0x10 > [<c01269c8>] ? restore_nocheck_notrace+0x0/0xe > [<c085d28e>] ? kernel_init+0x0/0x2ce > [<c085d28e>] ? kernel_init+0x0/0x2ce > [<c01275fb>] kernel_thread_helper+0x7/0x10 > ======================= > ---[ end trace a7919e7f17c0a725 ]--- > irq event stamp: 579530 > hardirqs last enabled at (579528): [<c01826ab>] trace_hardirqs_on+0xb/0x10 > hardirqs last disabled at (579529): [<c01814eb>] trace_hardirqs_off+0xb/0x10 > softirqs last enabled at (579530): [<c0162724>] __do_softirq+0xf4/0x110 > softirqs last disabled at (579517): [<c01627ed>] do_softirq+0xad/0xb0 > irq event stamp: 579530 > hardirqs last enabled at (579528): [<c01826ab>] trace_hardirqs_on+0xb/0x10 > hardirqs last disabled at (579529): [<c01814eb>] trace_hardirqs_off+0xb/0x10 > softirqs last enabled at (579530): [<c0162724>] __do_softirq+0xf4/0x110 > softirqs last disabled at (579517): [<c01627ed>] do_softirq+0xad/0xb0 > PASSED > > Incidentally, the kernel also hung while I was typing in this report. Things get weird between lockdep and ftrace because ftrace can be called within lockdep internal code (via the mcount pointer) and lockdep can be called with ftrace (via spin_locks). Signed-off-by: Steven Rostedt <srostedt@redhat.com> Tested-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18cpu hotplug: Make cpu_active_map synchronization dependency clearMax Krasnyansky
This goes on top of the cpu_active_map (take 2) patch. Currently we depend on the stop_machine to provide nescessesary synchronization for the cpu_active_map updates. As Dmitry Adamushko pointed this is fragile and is not much clearer than the previous scheme. In other words we do not want to depend on the internal stop machine operation here. So make the synchronization rules clear by doing synchronize_sched() after clearing out cpu active bit. Tested on quad-Core2 with: while true; do for i in 1 2 3; do echo 0 > /sys/devices/system/cpu/cpu$i/online done for i in 1 2 3; do echo 1 > /sys/devices/system/cpu/cpu$i/online done done and stress -c 200 No lockdep, preempt or other complaints. Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment ↵Max Krasnyansky
(take 2) This is based on Linus' idea of creating cpu_active_map that prevents scheduler load balancer from migrating tasks to the cpu that is going down. It allows us to simplify domain management code and avoid unecessary domain rebuilds during cpu hotplug event handling. Please ignore the cpusets part for now. It needs some more work in order to avoid crazy lock nesting. Although I did simplfy and unify domain reinitialization logic. We now simply call partition_sched_domains() in all the cases. This means that we're using exact same code paths as in cpusets case and hence the test below cover cpusets too. Cpuset changes to make rebuild_sched_domains() callable from various contexts are in the separate patch (right next after this one). This not only boots but also easily handles while true; do make clean; make -j 8; done and while true; do on-off-cpu 1; done at the same time. (on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing). Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing this on right now in gnome-terminal and things are moving just fine. Also this is running with most of the debug features enabled (lockdep, mutex, etc) no BUG_ONs or lockdep complaints so far. I believe I addressed all of the Dmitry's comments for original Linus' version. I changed both fair and rt balancer to mask out non-active cpus. And replaced cpu_is_offline() with !cpu_active() in the main scheduler code where it made sense (to me). Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Gregory Haskins <ghaskins@novell.com> Cc: dmitry.adamushko@gmail.com Cc: pj@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18sched: rework of "prioritize non-migratable tasks over migratable ones"Dmitry Adamushko
(1) handle in a generic way all cases when a newly woken-up task is not migratable (not just a corner case when "rt_se->nr_cpus_allowed == 1") (2) if current is to be preempted, then make sure "p" will be picked up by pick_next_task_rt(). i.e. move task's group at the head of its list as well. currently, it's not a case for the group-scheduling case as described here: http://www.ussg.iu.edu/hypermail/linux/kernel/0807.0/0134.html Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18sched: reduce stack size in isolated_cpu_setup()Mike Travis
* Remove 16k stack requirements in isolated_cpu_setup when NR_CPUS=4096. Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18Revert parts of "ftrace: do not trace scheduler functions"Ingo Molnar
the removal of -mno-spe in the !ftrace case was not intended. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-17Merge branch 'linus' into xen-64bitIngo Molnar
2008-07-17Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ftrace: do not trace library functions ftrace: do not trace scheduler functions ftrace: fix lockup with MAXSMP ftrace: fix merge buglet
2008-07-17x86, xen, power: fix up config dependencies on PMJeremy Fitzhardinge
Xen save/restore needs bits of code enabled by PM_SLEEP, and PM_SLEEP depends on PM. So make XEN_SAVE_RESTORE depend on PM and PM_SLEEP depend on XEN_SAVE_RESTORE. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-17ftrace: do not trace scheduler functionsIngo Molnar
do not trace scheduler functions - it's still a bit fragile and can lock up with: http://redhat.com/~mingo/misc/config-Thu_Jul_17_13_34_52_CEST_2008 Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16fix dangling zombie when new parent ignores childrenRoland McGrath
This fixes an arcane bug that we think was a regression introduced by commit b2b2cbc4b2a2f389442549399a993a8306420baf. When a parent ignores SIGCHLD (or uses SA_NOCLDWAIT), its children would self-reap but they don't if it's using ptrace on them. When the parent thread later exits and ceases to ptrace a child but leaves other live threads in the parent's thread group, any zombie children are left dangling. The fix makes them self-reap then, as they would have done earlier if ptrace had not been in use. Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-16do_wait: return security_task_wait() error code in place of -ECHILDRoland McGrath
This reverts the effect of commit f2cc3eb133baa2e9dc8efd40f417106b2ee520f3 "do_wait: fix security checks". That change reverted the effect of commit 73243284463a761e04d69d22c7516b2be7de096c. The rationale for the original commit still stands. The inconsistent treatment of children hidden by ptrace was an unintended omission in the original change and in no way invalidates its purpose. This makes do_wait return the error returned by security_task_wait() (usually -EACCES) in place of -ECHILD when there are some children the caller would be able to wait for if not for the permission failure. A permission error will give the user a clue to look for security policy problems, rather than for mysterious wait bugs. Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-16ptrace children revampRoland McGrath
ptrace no longer fiddles with the children/sibling links, and the old ptrace_children list is gone. Now ptrace, whether of one's own children or another's via PTRACE_ATTACH, just uses the new ptraced list instead. There should be no user-visible difference that matters. The only change is the order in which do_wait() sees multiple stopped children and stopped ptrace attachees. Since wait_task_stopped() was changed earlier so it no longer reorders the children list, we already know this won't cause any new problems. Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-16do_wait reorganizationRoland McGrath
This breaks out the guts of do_wait into three subfunctions. The control flow is less nonobvious without so much goto. do_wait_thread and ptrace_do_wait contain the main work of the outer loop. wait_consider_task contains the main work of the inner loop. Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-16Merge branch 'linux-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits) Revert "x86/PCI: ACPI based PCI gap calculation" PCI: remove unnecessary volatile in PCIe hotplug struct controller x86/PCI: ACPI based PCI gap calculation PCI: include linux/pm_wakeup.h for device_set_wakeup_capable PCI PM: Fix pci_prepare_to_sleep x86/PCI: Fix PCI config space for domains > 0 Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n PCI: Simplify PCI device PM code PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep PCI ACPI: Rework PCI handling of wake-up ACPI: Introduce new device wakeup flag 'prepared' ACPI: Introduce acpi_device_sleep_wake function PCI: rework pci_set_power_state function to call platform first PCI: Introduce platform_pci_power_manageable function ACPI: Introduce acpi_bus_power_manageable function PCI: make pci_name use dev_name PCI: handle pci_name() being const PCI: add stub for pci_set_consistent_dma_mask() PCI: remove unused arch pcibios_update_resource() functions PCI: fix pci_setup_device()'s sprinting into a const buffer ... Fixed up conflicts in various files (arch/x86/kernel/setup_64.c, arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c, drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86 and ACPI updates manually.
2008-07-16snapshot: Use pm_mutex for mutual exclusionRafael J. Wysocki
We can avoid taking the BKL in snapshot_ioctl() if pm_mutex is used to prevent the ioctls from being executed concurrently. In addition, although it is only possible to open /dev/snapshot once, the task which has done that may spawn a child that will inherit the open descriptor, so in theory they can call snapshot_write(), snapshot_read() and snapshot_release() concurrently. pm_mutex can also be used for mutual exclusion in such cases. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Len Brown <len.brown@intel.com>
2008-07-16snapshot: Push BKL down into ioctl handlersAlan Cox
Push BKL down into ioctl handlers - snapshot device. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-07-16Freezer: Introduce PF_FREEZER_NOSIGRafael J. Wysocki
The freezer currently attempts to distinguish kernel threads from user space tasks by checking if their mm pointer is unset and it does not send fake signals to kernel threads. However, there are kernel threads, mostly related to networking, that behave like user space tasks and may want to be sent a fake signal to be frozen. Introduce the new process flag PF_FREEZER_NOSIG that will be set by default for all kernel threads and make the freezer only send fake signals to the tasks having PF_FREEZER_NOSIG unset. Provide the set_freezable_with_signal() function to be called by the kernel threads that want to be sent a fake signal for freezing. This patch should not change the freezer's observable behavior. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Len Brown <len.brown@intel.com>
2008-07-16force offline the processor during hot-removalZhang Rui
The ACPI device node for the cpu has already been unregistered when acpi_processor_handle_eject is called. Thus we should offline the cpu and continue, rather than a failure here. http://bugzilla.kernel.org/show_bug.cgi?id=9772 Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-07-16Revert "suspend, xen: enable PM_SLEEP for CONFIG_XEN"Ingo Molnar
This reverts commit 6fbbec428c8e7bb617da2e8a589af2e97bcf3bc4. Rafael doesnt like it - it breaks various assumptions. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16suspend, xen: enable PM_SLEEP for CONFIG_XENJeremy Fitzhardinge
Xen save/restore requires PM_SLEEP to be set without requiring SUSPEND or HIBERNATION. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16Merge branch 'linus' into cpus4096Ingo Molnar
Conflicts: arch/x86/xen/smp.c kernel/sched_rt.c net/iucv/iucv.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15Merge branch 'linus' into core/softlockupIngo Molnar
Conflicts: kernel/softlockup.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15generic ipi function calls: wait on alloc failure fallbackJeremy Fitzhardinge
When a GFP_ATOMIC allocation fails, it falls back to allocating the data on the stack and converting it to a waiting call. Make sure we actually wait in this case. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-15Merge branch 'generic-ipi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits) generic-ipi: more merge fallout generic-ipi: merge fix x86, visws: use mach-default/entry_arch.h x86, visws: fix generic-ipi build generic-ipi: fixlet generic-ipi: fix s390 build bug generic-ipi: fix linux-next tree build failure fix: "smp_call_function: get rid of the unused nonatomic/retry argument" fix: "smp_call_function: get rid of the unused nonatomic/retry argument" fix "smp_call_function: get rid of the unused nonatomic/retry argument" on_each_cpu(): kill unused 'retry' parameter smp_call_function: get rid of the unused nonatomic/retry argument sh: convert to generic helpers for IPI function calls parisc: convert to generic helpers for IPI function calls mips: convert to generic helpers for IPI function calls m32r: convert to generic helpers for IPI function calls arm: convert to generic helpers for IPI function calls alpha: convert to generic helpers for IPI function calls ia64: convert to generic helpers for IPI function calls powerpc: convert to generic helpers for IPI function calls ... Fix trivial conflicts due to rcu updates in kernel/rcupdate.c manually
2008-07-15Merge branch 'generic-ipi' into generic-ipi-for-linusIngo Molnar
Conflicts: arch/powerpc/Kconfig arch/s390/kernel/time.c arch/x86/kernel/apic_32.c arch/x86/kernel/cpu/perfctr-watchdog.c arch/x86/kernel/i8259_64.c arch/x86/kernel/ldt.c arch/x86/kernel/nmi_64.c arch/x86/kernel/smpboot.c arch/x86/xen/smp.c include/asm-x86/hw_irq_32.h include/asm-x86/hw_irq_64.h include/asm-x86/mach-default/irq_vectors.h include/asm-x86/mach-voyager/irq_vectors.h include/asm-x86/smp.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15Merge branch 'core/rcu' into core/rcu-for-linusIngo Molnar