summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2009-09-11Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits) sched: Fix sched::sched_stat_wait tracepoint field sched: Disable NEW_FAIR_SLEEPERS for now sched: Keep kthreads at default priority sched: Re-tune the scheduler latency defaults to decrease worst-case latencies sched: Turn off child_runs_first sched: Ensure that a child can't gain time over it's parent after fork() sched: enable SD_WAKE_IDLE sched: Deal with low-load in wake_affine() sched: Remove short cut from select_task_rq_fair() sched: Turn on SD_BALANCE_NEWIDLE sched: Clean up topology.h sched: Fix dynamic power-balancing crash sched: Remove reciprocal for cpu_power sched: Try to deal with low capacity, fix update_sd_power_savings_stats() sched: Try to deal with low capacity sched: Scale down cpu_power due to RT tasks sched: Implement dynamic cpu_power sched: Add smt_gain sched: Update the cpu_power sum during load-balance sched: Add SD_PREFER_SIBLING ...
2009-09-11Merge branch 'perfcounters-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits) perf tools: Avoid unnecessary work in directory lookups perf stat: Clean up statistics calculations a bit more perf stat: More advanced variance computation perf stat: Use stddev_mean in stead of stddev perf stat: Remove the limit on repeat perf stat: Change noise calculation to use stddev x86, perf_counter, bts: Do not allow kernel BTS tracing for now x86, perf_counter, bts: Correct pointer-to-u64 casts x86, perf_counter, bts: Fail if BTS is not available perf_counter: Fix output-sharing error path perf trace: Fix read_string() perf trace: Print out in nanoseconds perf tools: Seek to the end of the header area perf trace: Fix parsing of perf.data perf trace: Sample timestamps as well perf_counter: Introduce new (non-)paranoia level to allow raw tracepoint access perf trace: Sample the CPU too perf tools: Work around strict aliasing related warnings perf tools: Clean up warnings list in the Makefile perf tools: Complete support for dynamic strings ...
2009-09-11Merge branch 'irq-threaded-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-threaded-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Do not mask oneshot edge type interrupts genirq: Support nested threaded irq handling genirq: Add buslock support genirq: Add oneshot support
2009-09-11Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: pci/intr_remapping: Allocate irq_iommu on node irq: Add irq_node() primitive irq: Make sure irq_desc for legacy irq get correct node setting genirq: Add prototype for handle_nested_irq() irq: Remove superfluous NULL pointer check in check_irq_resend() irq: Clean up by removing irqfixup MODULE_PARM_DESC() genirq: Fix comment describing suspend_device_irqs() genirq: Remove obsolete defines and typedefs
2009-09-11Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits) rcu: Move end of special early-boot RCU operation earlier rcu: Changes from reviews: avoid casts, fix/add warnings, improve comments rcu: Create rcutree plugins to handle hotplug CPU for multi-level trees rcu: Remove lockdep annotations from RCU's _notrace() API members rcu: Add #ifdef to suppress __rcu_offline_cpu() warning in !HOTPLUG_CPU builds rcu: Add CPU-offline processing for single-node configurations rcu: Add "notrace" to RCU function headers used by ftrace rcu: Remove CONFIG_PREEMPT_RCU rcu: Merge preemptable-RCU functionality into hierarchical RCU rcu: Simplify rcu_pending()/rcu_check_callbacks() API rcu: Use debugfs_remove_recursive() simplify code. rcu: Merge per-RCU-flavor initialization into pre-existing macro rcu: Fix online/offline indication for rcudata.csv trace file rcu: Consolidate sparse and lockdep declarations in include/linux/rcupdate.h rcu: Renamings to increase RCU clarity rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.h rcu: Expunge lingering references to CONFIG_CLASSIC_RCU, optimize on !SMP rcu: Delay rcu_barrier() wait until beginning of next CPU-hotunplug operation. rcu: Fix typo in rcu_irq_exit() comment header rcu: Make rcupreempt_trace.c look at offline CPUs ...
2009-09-11Merge branch 'core-printk-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: printk: Fix "printk: Enable the use of more than one CON_BOOT (early console)" printk: Restore previous console_loglevel when re-enabling logging printk: Ensure that "console enabled" messages are printed on the console printk: Enable the use of more than one CON_BOOT (early console)
2009-09-11Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits) locking, m68k/asm-offsets: Rename signal defines locking: Inline spinlock code for all locking variants on s390 locking: Simplify spinlock inlining locking: Allow arch-inlined spinlocks locking: Move spinlock function bodies to header file locking, m68k: Calculate thread_info offset with asm offset locking, m68k/asm-offsets: Rename pt_regs offset defines locking, sparc: Rename __spin_try_lock() and friends locking, powerpc: Rename __spin_try_lock() and friends lockdep: Remove recursion stattistics lockdep: Simplify lock_stat seqfile code lockdep: Simplify lockdep_chains seqfile code lockdep: Simplify lockdep seqfile code lockdep: Fix missing entries in /proc/lock_chains lockdep: Fix missing entry in /proc/lock_stat lockdep: Fix memory usage info of BFS lockdep: Reintroduce generation count to make BFS faster lockdep: Deal with many similar locks lockdep: Introduce lockdep_assert_held() lockdep: Fix style nits ...
2009-09-11Merge branch 'core-futexes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Detect mismatched requeue targets futex: Correct futex_wait_requeue_pi() commentary
2009-09-11Merge branch 'core-debug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: debug lockups: Improve lockup detection, fix generic arch fallback debug lockups: Improve lockup detection
2009-09-11Merge branch 'core-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: workqueues: Improve schedule_work() documentation
2009-09-11Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'writeback' of git://git.kernel.dk/linux-2.6-block: writeback: check for registered bdi in flusher add and inode dirty writeback: add name to backing_dev_info writeback: add some debug inode list counters to bdi stats writeback: get rid of pdflush completely writeback: switch to per-bdi threads for flushing data writeback: move dirty inodes from super_block to backing_dev_info writeback: get rid of generic_sync_sb_inodes() export
2009-09-11writeback: add name to backing_dev_infoJens Axboe
This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11Merge branch 'next' into for-linusJames Morris
2009-09-10sched: Fix sched::sched_stat_wait tracepoint fieldIngo Molnar
This weird perf trace output: cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns] Is caused by setting one component field of the delta to zero a bit too early. Move it to later. ( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug, it's just a reporting bug in essence. ) Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nikos Chantziaras <realnc@arcor.de> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <4AA93D34.8040500@arcor.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-10sched: Disable NEW_FAIR_SLEEPERS for nowIngo Molnar
Nikos Chantziaras and Jens Axboe reported that turning off NEW_FAIR_SLEEPERS improves desktop interactivity visibly. Nikos described his experiences the following way: " With this setting, I can do "nice -n 19 make -j20" and still have a very smooth desktop and watch a movie at the same time. Various other annoyances (like the "logout/shutdown/restart" dialog of KDE not appearing at all until the background fade-out effect has finished) are also gone. So this seems to be the single most important setting that vastly improves desktop behavior, at least here. " Jens described it the following way, referring to a 10-seconds xmodmap scheduling delay he was trying to debug: " Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then I get: Performance counter stats for 'xmodmap .xmodmap-carl': 9.009137 task-clock-msecs # 0.447 CPUs 18 context-switches # 0.002 M/sec 1 CPU-migrations # 0.000 M/sec 315 page-faults # 0.035 M/sec 0.020167093 seconds time elapsed Woot! " So disable it for now. In perf trace output i can see weird delta timestamps: cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns] That nsec field is not supposed to be that large. More digging is needed - but lets turn it off while the real bug is found. Reported-by: Nikos Chantziaras <realnc@arcor.de> Tested-by: Nikos Chantziaras <realnc@arcor.de> Reported-by: Jens Axboe <jens.axboe@oracle.com> Tested-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <4AA93D34.8040500@arcor.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-09sched: Keep kthreads at default priorityMike Galbraith
Removes kthread/workqueue priority boost, they increase worst-case desktop latencies. Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-09sched: Re-tune the scheduler latency defaults to decrease worst-case latenciesMike Galbraith
Reduce the latency target from 20 msecs to 5 msecs. Why? Larger latencies increase spread, which is good for scaling, but bad for worst case latency. We still have the ilog(nr_cpus) rule to scale up on bigger server boxes. Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-09sched: Turn off child_runs_firstMike Galbraith
Set child_runs_first default to off. It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs get preempted by child tasks, reducing parallelism. Note, this patch might make existing races in user applications more prominent than before - so breakages might be bisected to this commit. Child-runs-first is broken on SMP to begin with, and we already had it off briefly in v2.6.23 so most of the offenders ought to be fixed. Would be nice not to revert this commit but fix those apps finally ... Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> [ made the sysctl independent of CONFIG_SCHED_DEBUG, in case people want to work around broken apps. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-08sched: Ensure that a child can't gain time over it's parent after fork()Mike Galbraith
A fork/exec load is usually "pass the baton", so the child should never be placed behind the parent. With START_DEBIT we make room for the new task, but with child_runs_first, that room comes out of the _parent's_ hide. There's nothing to say that the parent wasn't ahead of min_vruntime at fork() time, which means that the "baton carrier", who is essentially the parent in drag, can gain time and increase scheduling latencies for waiters. With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first enabled, we essentially pass the sleeper fairness off to the child, which is fine, but if we don't base placement on the parent's updated vruntime, we can end up compounding latency woes if the child itself then does fork/exec. The debit incurred at fork doesn't hurt the parent who is then going to sleep and maybe exit, but the child who acquires the error harms all comers. This improves latencies of make -j<n> kernel build workloads. Reported-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07sched: Deal with low-load in wake_affine()Peter Zijlstra
wake_affine() would always fail under low-load situations where both prev and this were idle, because adding a single task will always be a significant imbalance, even if there's nothing around that could balance it. Deal with this by allowing imbalance when there's nothing you can do about it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07sched: Remove short cut from select_task_rq_fair()Peter Zijlstra
select_task_rq_fair() incorrectly skips the wake_affine() logic, remove this. When prev_cpu == this_cpu, the code jumps straight to the wake_idle() logic, this doesn't give the wake_affine() logic the chance to pin the task to this cpu. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07Security/SELinux: includecheck fix kernel/sysctl.cJaswinder Singh Rajput
fix the following 'make includecheck' warning: kernel/sysctl.c: linux/security.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-05Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf_counter/powerpc: Fix cache event codes for POWER7 perf_counter: Fix /0 bug in swcounters perf_counters: Increase paranoia level
2009-09-04sched: Fix dynamic power-balancing crashIngo Molnar
This crash: [ 1774.088275] divide error: 0000 [#1] SMP [ 1774.100355] CPU 13 [ 1774.102498] Modules linked in: [ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN [ 1774.114807] RIP: 0010:[<ffffffff81041c38>] [<ffffffff81041c38>] sched_balance_self+0x19b/0x2d4 Triggers because update_group_power() modifies the sd tree and does temporary calculations there - not considering that other CPUs could observe intermediate values, such as the zero initial value. Calculate it in a temporary variable instead. (we need no memory barrier as these are all statistical values anyway) Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20090904092742.GA11014@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Remove reciprocal for cpu_powerPeter Zijlstra
Its a source of fail, also, now that cpu_power is dynamical, its a waste of time. before: <idle>-0 [000] 132.877936: find_busiest_group: avg_load: 0 group_load: 8241 power: 1 after: bash-1689 [001] 137.862151: find_busiest_group: avg_load: 10636288 group_load: 10387 power: 1 [ v2: build fix from From: Andreas Herrmann ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.425896304@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Try to deal with low capacity, fix update_sd_power_savings_stats()Gautham R Shenoy
sgs.group_capacity can now be 0, if for some reason group->__cpu_power happens to be less than SCHED_LOAD_SCALE/2. In that case, we need the following fix to make it work for update_sd_power_savings_stats(). That's because both sum_nr_running and group_capacity are unsigned longs. Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Try to deal with low capacityPeter Zijlstra
When the capacity drops low, we want to migrate load away. Allow the load-balancer to remove all tasks when we hit rock bottom. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.342231003@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Scale down cpu_power due to RT tasksPeter Zijlstra
Keep an average on the amount of time spend on RT tasks and use that fraction to scale down the cpu_power for regular tasks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.287778431@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Implement dynamic cpu_powerPeter Zijlstra
Recompute the cpu_power for each cpu during load-balance. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.162033479@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Add smt_gainPeter Zijlstra
The idea is that multi-threading a core yields more work capacity than a single thread, provide a way to express a static gain for threads. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.073345955@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Update the cpu_power sum during load-balancePeter Zijlstra
In order to prepare for a more dynamic cpu_power, update the group sum while walking the sched domains during load-balance. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083825.985050292@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Add SD_PREFER_SIBLINGPeter Zijlstra
Do the placement thing using SD flags. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083825.897028974@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Restore __cpu_power to a straight sum of powerPeter Zijlstra
cpu_power is supposed to be a representation of the process capacity of the cpu, not a value to randomly tweak in order to affect placement. Remove the placement hacks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083825.810860576@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04Merge branches 'sched/domains' and 'sched/clock' into sched/coreIngo Molnar
Merge reason: both topics are ready now, and we want to merge dependent changes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04Merge branch 'linus' into core/rcuIngo Molnar
Merge reason: Avoid fuzz in init/main.c and update from rc6 to rc8. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-03perf_counter: Fix output-sharing error pathIngo Molnar
We forget to release the fd in the PERF_FLAG_FD_OUTPUT error path. Reorganize the error flow here to be a clean fall-through logic. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02perf_counter: Introduce new (non-)paranoia level to allow raw tracepoint accessIngo Molnar
I want to sample inherited tracepoint workloads as a normal user and the CAP_SYS_ADMIN check prevents me from doing that right now. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02Merge branch 'perfcounters/urgent' into perfcounters/coreIngo Molnar
Merge reason: We are going to modify a place modified by perfcounters/urgent. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02KEYS: Add a keyctl to install a process's session keyring on its parent [try #6]David Howells
Add a keyctl to install a process's session keyring onto its parent. This replaces the parent's session keyring. Because the COW credential code does not permit one process to change another process's credentials directly, the change is deferred until userspace next starts executing again. Normally this will be after a wait*() syscall. To support this, three new security hooks have been provided: cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in the blank security creds and key_session_to_parent() - which asks the LSM if the process may replace its parent's session keyring. The replacement may only happen if the process has the same ownership details as its parent, and the process has LINK permission on the session keyring, and the session keyring is owned by the process, and the LSM permits it. Note that this requires alteration to each architecture's notify_resume path. This has been done for all arches barring blackfin, m68k* and xtensa, all of which need assembly alteration to support TIF_NOTIFY_RESUME. This allows the replacement to be performed at the point the parent process resumes userspace execution. This allows the userspace AFS pioctl emulation to fully emulate newpag() and the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to alter the parent process's PAG membership. However, since kAFS doesn't use PAGs per se, but rather dumps the keys into the session keyring, the session keyring of the parent must be replaced if, for example, VIOCSETTOK is passed the newpag flag. This can be tested with the following program: #include <stdio.h> #include <stdlib.h> #include <keyutils.h> #define KEYCTL_SESSION_TO_PARENT 18 #define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0) int main(int argc, char **argv) { key_serial_t keyring, key; long ret; keyring = keyctl_join_session_keyring(argv[1]); OSERROR(keyring, "keyctl_join_session_keyring"); key = add_key("user", "a", "b", 1, keyring); OSERROR(key, "add_key"); ret = keyctl(KEYCTL_SESSION_TO_PARENT); OSERROR(ret, "KEYCTL_SESSION_TO_PARENT"); return 0; } Compiled and linked with -lkeyutils, you should see something like: [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 355907932 --alswrv 4043 -1 \_ keyring: _uid.4043 [dhowells@andromeda ~]$ /tmp/newpag [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: _ses 1055658746 --alswrv 4043 4043 \_ user: a [dhowells@andromeda ~]$ /tmp/newpag hello [dhowells@andromeda ~]$ keyctl show Session Keyring -3 --alswrv 4043 4043 keyring: hello 340417692 --alswrv 4043 4043 \_ user: a Where the test program creates a new session keyring, sticks a user key named 'a' into it and then installs it on its parent. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-02CRED: Add some configurable debugging [try #6]David Howells
Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking for credential management. The additional code keeps track of the number of pointers from task_structs to any given cred struct, and checks to see that this number never exceeds the usage count of the cred struct (which includes all references, not just those from task_structs). Furthermore, if SELinux is enabled, the code also checks that the security pointer in the cred struct is never seen to be invalid. This attempts to catch the bug whereby inode_has_perm() faults in an nfsd kernel thread on seeing cred->security be a NULL pointer (it appears that the credential struct has been previously released): http://www.kerneloops.org/oops.php?number=252883 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-02sched: Add wait, sleep and iowait accounting tracepointsPeter Zijlstra
Add 3 schedstat tracepoints to help account for wait-time, sleep-time and iowait-time. They can also be used as a perf-counter source to profile tasks on these clocks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@linux.intel.com> LKML-Reference: <new-submission> [ build fix for the !CONFIG_SCHEDSTATS case ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02sched: Provide iowait countersArjan van de Ven
For counting how long an application has been waiting for (disk) IO, there currently is only the HZ sample driven information available, while for all other counters in this class, a high resolution version is available via CONFIG_SCHEDSTATS. In order to make an improved bootchart tool possible, we also need a higher resolution version of the iowait time. This patch below adds this scheduler statistic to the kernel. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4A64B813.1080506@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02Merge commit 'v2.6.31-rc8' into sched/coreIngo Molnar
Merge reason: bump from rc5 to rc8, but also pick up TP_perf_assign() API, a patch will be queued that depends on it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-31locking: Allow arch-inlined spinlocksHeiko Carstens
This allows an architecture to specify per lock variant if the locking code should be kept out-of-line or inlined. If an architecure wants out-of-line locking code no change is needed. To force inlining of e.g. spin_lock() the line: #define __always_inline__spin_lock needs to be added to arch/<...>/include/asm/spinlock.h If CONFIG_DEBUG_SPINLOCK or CONFIG_GENERIC_LOCKBREAK are defined the per architecture defines are (partly) ignored and still out-of-line spinlock code will be generated. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Horst Hartmann <horsth@linux.vnet.ibm.com> Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <20090831124418.375299024@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-31locking: Move spinlock function bodies to header fileHeiko Carstens
Move spinlock function bodies to header file by creating a static inline version of each variant. Use the inline version on the out-of-line code. This shouldn't make any difference besides that the spinlock code can now be used to generate inlined spinlock code. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Horst Hartmann <horsth@linux.vnet.ibm.com> Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <20090831124417.859022429@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-31Merge commit 'v2.6.31-rc8' into core/lockingIngo Molnar
Merge reason: we were on -rc4, move to -rc8 before applying a new batch of locking infrastructure changes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29irq: Make sure irq_desc for legacy irq get correct node settingYinghai Lu
when there is no ram on node 0. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> LKML-Reference: <4A95C32D.5040605@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29sched: Rename init_cfs_rq => init_tg_cfs_rqAnirban Sinha
... so that it does not share a common name with a function within the same scope. Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com> LKML-Reference: <DDFD17CC94A9BD49A82147DDF7D545C501EA98A6@exchange.ZeugmaSystems.local> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29rcu: Changes from reviews: avoid casts, fix/add warnings, improve commentsPaul E. McKenney
Changes suggested by review comments from Josh Triplett and Mathieu Desnoyers. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <20090827220012.GA30525@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29rcu: Create rcutree plugins to handle hotplug CPU for multi-level treesPaul E. McKenney
When offlining CPUs from a multi-level tree, there is the possibility of offlining the last CPU from a given node when there are preempted RCU read-side critical sections that started life on one of the CPUs on that node. In this case, the corresponding tasks will be enqueued via the task_struct's rcu_node_entry list_head onto one of the rcu_node's blocked_tasks[] lists. These tasks need to be moved somewhere else so that they will prevent the current grace period from ending. That somewhere is the root rcu_node. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <20090827215816.GA30472@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>