summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2011-05-29dm kcopyd: alloc pages from the main page allocatorMikulas Patocka
This patch changes dm-kcopyd so that it allocates pages from the main page allocator with __GFP_NOWARN | __GFP_NORETRY flags (so that it can fail in case of memory pressure). If the allocation fails, dm-kcopyd allocates pages from its own reserve. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm kcopyd: add gfp parm to alloc_plMikulas Patocka
Introduce a parameter for gfp flags to alloc_pl() for use in following patches. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm kcopyd: remove superfluous page allocation spinlockMikulas Patocka
Remove the spinlock protecting the pages allocation. The spinlock is only taken on initialization or from single-threaded workqueue. Therefore, the spinlock is useless. The spinlock is taken in kcopyd_get_pages and kcopyd_put_pages. kcopyd_get_pages is only called from run_pages_job, which is only called from process_jobs called from do_work. kcopyd_put_pages is called from client_alloc_pages (which is initialization function) or from run_complete_job. run_complete_job is only called from process_jobs called from do_work. Another spinlock, kc->job_lock is taken each time someone pushes or pops some work for the worker thread. Once we take kc->job_lock, we guarantee that any written memory is visible to the other CPUs. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm kcopyd: preallocate sub jobs to avoid deadlockMikulas Patocka
There's a possible theoretical deadlock in dm-kcopyd because multiple allocations from the same mempool are required to finish a request. Avoid this by preallocating sub jobs. There is a mempool of 512 entries. Each request requires up to 9 entries from the mempool. If we have at least 57 concurrent requests running, the mempool may overflow and mempool allocations may start blocking until another entry is freed to the mempool. Because the same thread is used to free entries to the mempool and allocate entries from the mempool, this may result in a deadlock. This patch changes it so that one mempool entry contains all 9 "struct kcopyd_job" required to fulfill the whole request. The allocation is done only once in dm_kcopyd_copy and no further mempool allocations are done during request processing. If dm_kcopyd_copy is not run in the completion thread, this implementation is deadlock-free. MIN_JOBS needs reducing accordingly and we've chosen to reduce it further to 8. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm kcopyd: avoid pointless job splittingMikulas Patocka
Don't split SUB_JOB_SIZE jobs If the job size equals SUB_JOB_SIZE, there is no point in splitting it. Splitting it just unnecessarily wastes time, because the split job size is SUB_JOB_SIZE too. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm mpath: do not fail paths after integrity errorsMartin K. Petersen
Integrity errors need to be passed to the owner of the integrity metadata for processing. Consequently EILSEQ should be passed up the stack. Cc: stable@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm table: reject devices without request fnsMilan Broz
This patch adds a check that a block device has a request function defined before it is used. Otherwise, misconfiguration can cause an oops. Because we are allowing devices with zero size e.g. an offline multipath device as in commit 2cd54d9bedb79a97f014e86c0da393416b264eb3 ("dm: allow offline devices") there needs to be an additional check to ensure devices are initialised. Some block devices, like a loop device without a backing file, exist but have no request function. Reproducer is trivial: dm-mirror on unbound loop device (no backing file on loop devices) dmsetup create x --table "0 8 mirror core 2 8 sync 2 /dev/loop0 0 /dev/loop1 0" and mirror resync will immediatelly cause OOps. BUG: unable to handle kernel NULL pointer dereference at (null) ? generic_make_request+0x2bd/0x590 ? kmem_cache_alloc+0xad/0x190 submit_bio+0x53/0xe0 ? bio_add_page+0x3b/0x50 dispatch_io+0x1ca/0x210 [dm_mod] ? read_callback+0x0/0xd0 [dm_mirror] dm_io+0xbb/0x290 [dm_mod] do_mirror+0x1e0/0x748 [dm_mirror] Signed-off-by: Milan Broz <mbroz@redhat.com> Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm table: allow targets to support discards internallyMike Snitzer
Permit a target to support discards regardless of whether or not all its underlying devices do. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin: Blackfin: debug-mmrs: include RSI_PID[4567] MMRs Blackfin: bf51x: fix up RSI_PID# MMR defines Blackfin: bf52x/bf54x: fix up usb MMR defines Blackfin: debug-mmrs: fix typos with gptimers/mdma/ppi Blackfin: gptimers: add structure for hardware register layout Blackfin: wire up new sendmmsg syscall Blackfin: mach/bfin_serial_5xx.h: punt now-unused header Blackfin: bfin_serial.h: turn default port wrappers into stubs
2011-05-28scsi: fix scsi_proc new kernel-doc warningRandy Dunlap
Fix kernel-doc warnings in scsi_proc.c: Warning(drivers/scsi/scsi_proc.c:390): No description found for parameter 'dev' Warning(drivers/scsi/scsi_proc.c:390): No description found for parameter 'data' Warning(drivers/scsi/scsi_proc.c:390): Excess function parameter 's' description in 'always_match' Warning(drivers/scsi/scsi_proc.c:390): Excess function parameter 'p' description in 'always_match' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-28mm: fix page_lock_anon_vma leaving mutex lockedHugh Dickins
On one machine I've been getting hangs, a page fault's anon_vma_prepare() waiting in anon_vma_lock(), other processes waiting for that page's lock. This is a replay of last year's f18194275c39 "mm: fix hang on anon_vma->root->lock". The new page_lock_anon_vma() places too much faith in its refcount: when it has acquired the mutex_trylock(), it's possible that a racing task in anon_vma_alloc() has just reallocated the struct anon_vma, set refcount to 1, and is about to reset its anon_vma->root. Fix this by saving anon_vma->root, and relying on the usual page_mapped() check instead of a refcount check: if page is still mapped, the anon_vma is still ours; if page is not still mapped, we're no longer interested. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-28mm: fix kernel BUG at mm/rmap.c:1017!Hugh Dickins
I've hit the "address >= vma->vm_end" check in do_page_add_anon_rmap() just once. The stack showed khugepaged allocation trying to compact pages: the call to page_add_anon_rmap() coming from remove_migration_pte(). That path holds anon_vma lock, but does not hold mmap_sem: it can therefore race with a split_vma(), and in commit 5f70b962ccc2 "mmap: avoid unnecessary anon_vma lock" we just took away the anon_vma lock protection when adjusting vma->vm_end. I don't think that particular BUG_ON ever caught anything interesting, so better replace it by a comment, than reinstate the anon_vma locking. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-28tmpfs: fix race between truncate and writepageHugh Dickins
While running fsx on tmpfs with a memhog then swapoff, swapoff was hanging (interruptibly), repeatedly failing to locate the owner of a 0xff entry in the swap_map. Although shmem_writepage() does abandon when it sees incoming page index is beyond eof, there was still a window in which shmem_truncate_range() could come in between writepage's dropping lock and updating swap_map, find the half-completed swap_map entry, and in trying to free it, leave it in a state that swap_shmem_alloc() could not correct. Arguably a bug in __swap_duplicate()'s and swap_entry_free()'s handling of the different cases, but easiest to fix by moving swap_shmem_alloc() under cover of the lock. More interesting than the bug: it's been there since 2.6.33, why could I not see it with earlier kernels? The mmotm of two weeks ago seems to have some magic for generating races, this is just one of three I found. With yesterday's git I first saw this in mainline, bisected in search of that magic, but the easy reproducibility evaporated. Oh well, fix the bug. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-28Blackfin: debug-mmrs: include RSI_PID[4567] MMRsMike Frysinger
The documentation is a little iffy as to whether these are actual MMRs, but reading them on the hardware works, and the previous version of this logic (the SDH) had PID[4567]. So add it for RSI too. Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-05-28Blackfin: bf51x: fix up RSI_PID# MMR definesMike Frysinger
Looks like the copying of MMR defines from the SDH block missed updating the addresses of the RSI_PID# registers. So tweak them to reflect the actual hardware. Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-05-28Blackfin: bf52x/bf54x: fix up usb MMR definesMike Frysinger
The bf52x/bf54x have the incorrect addresses for USB_EP_NI7_RXINTERVAL and USB_EP_NI7_TXCOUNT, so adjust those. Further, the bf54x header puts the USB defines in the wrong place, so shuffle them back to the right grouping. Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-05-28Blackfin: debug-mmrs: fix typos with gptimers/mdma/ppiMike Frysinger
This code was mostly developed against a BF54x, so some BF537-specific issues were missed. The PPI block starts at PPI_CONTROL, not PPI_STATUS (which is the reverse of the EPPI block). The MDMA block starts at MDMA_NEXT_DESC_PTR, not MDMA_CONFIG. Seems the sim does not catch misreads here so that'll need to get fixed. The gptimer block is mostly 32bit regs, not 16bit. Use the gptimer struct to figure that out rather than hardcoding it locally. Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-05-28Blackfin: gptimers: add structure for hardware register layoutMike Frysinger
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-05-28Blackfin: wire up new sendmmsg syscallMike Frysinger
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-05-28Blackfin: mach/bfin_serial_5xx.h: punt now-unused headerMike Frysinger
Now that the serial code has been unified in bfin_serial.h, and the Blackfin UART driver pushed its resources to the boards files, we don't need these headers anymore. Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-05-28Blackfin: bfin_serial.h: turn default port wrappers into stubsMike Frysinger
Any consumer that needs to access the MMRs has to provide these helpers, so make the default into useless stubs. Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2011-05-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (36 commits) Cache xattr security drop check for write v2 fs: block_page_mkwrite should wait for writeback to finish mm: Wait for writeback when grabbing pages to begin a write configfs: remove unnecessary dentry_unhash on rmdir, dir rename fat: remove unnecessary dentry_unhash on rmdir, dir rename hpfs: remove unnecessary dentry_unhash on rmdir, dir rename minix: remove unnecessary dentry_unhash on rmdir, dir rename fuse: remove unnecessary dentry_unhash on rmdir, dir rename coda: remove unnecessary dentry_unhash on rmdir, dir rename afs: remove unnecessary dentry_unhash on rmdir, dir rename affs: remove unnecessary dentry_unhash on rmdir, dir rename 9p: remove unnecessary dentry_unhash on rmdir, dir rename ncpfs: fix rename over directory with dangling references ncpfs: document dentry_unhash usage ecryptfs: remove unnecessary dentry_unhash on rmdir, dir rename hostfs: remove unnecessary dentry_unhash on rmdir, dir rename hfsplus: remove unnecessary dentry_unhash on rmdir, dir rename hfs: remove unnecessary dentry_unhash on rmdir, dir rename omfs: remove unnecessary dentry_unhash on rmdir, dir rneame udf: remove unnecessary dentry_unhash from rmdir, dir rename ...
2011-05-28Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, asm: Clean up desc.h a bit x86, amd: Do not enable ARAT feature on AMD processors below family 0x12 x86: Move do_page_fault()'s error path under unlikely() x86, efi: Retain boot service code until after switching to virtual mode x86: Remove unnecessary check in detect_ht() x86: Reorder mm_context_t to remove x86_64 alignment padding and thus shrink mm_struct x86, UV: Clean up uv_tlb.c x86, UV: Add support for SGI UV2 hub chip x86, cpufeature: Update CPU feature RDRND to RDRAND
2011-05-28Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed sched: Fix ->min_vruntime calculation in dequeue_entity() sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW sched: More sched_domain iterations fixes
2011-05-28Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state rcu: Remove waitqueue usage for cpu, node, and boost kthreads rcu: Avoid acquiring rcu_node locks in timer functions atomic: Add atomic_or() Documentation: Add statistics about nested locks rcu: Decrease memory-barrier usage based on semi-formal proof rcu: Make rcu_enter_nohz() pay attention to nesting rcu: Don't do reschedule unless in irq rcu: Remove old memory barriers from rcu_process_callbacks() rcu: Add memory barriers rcu: Fix unpaired rcu_irq_enter() from locking selftests
2011-05-28Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits) perf: Fix SIGIO handling perf top: Don't stop if no kernel symtab is found perf top: Handle kptr_restrict perf top: Remove unused macro perf events: initialize fd array to -1 instead of 0 perf tools: Make sure kptr_restrict warnings fit 80 col terms perf tools: Fix build on older systems perf symbols: Handle /proc/sys/kernel/kptr_restrict perf: Remove duplicate headers ftrace: Add internal recursive checks tracing: Update btrfs's tracepoints to use u64 interface tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machine ftrace: Set ops->flag to enabled even on static function tracing tracing: Have event with function tracer check error return ftrace: Have ftrace_startup() return failure code jump_label: Check entries limit in __jump_label_update ftrace/recordmcount: Avoid STT_FUNC symbols as base on ARM scripts/tags.sh: Add magic for trace-events for etags too scripts/tags.sh: Fix ctags for DEFINE_EVENT() x86/ftrace: Fix compiler warning in ftrace.c ...
2011-05-28Merge branch 'for-usb-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci * 'for-usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci: Intel xhci: Limit number of active endpoints to 64. Intel xhci: Ignore spurious successful event. Intel xhci: Support EHCI/xHCI port switching. Intel xhci: Add PCI id for Panther Point xHCI host. xhci: STFU: Be quieter during URB submission and completion. xhci: STFU: Don't print event ring dequeue pointer. xhci: STFU: Remove function tracing. xhci: Don't submit commands when the host is dead. xhci: Clear stopped_td when Stop Endpoint command completes.
2011-05-28Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (33 commits) x86: poll waiting for I/OAT DMA channel status maintainers: add dma engine tree details dmaengine: add TODO items for future work on dma drivers dmaengine: Add API documentation for slave dma usage dmaengine/dw_dmac: Update maintainer-ship dmaengine: move link order dmaengine/dw_dmac: implement pause and resume in dwc_control dmaengine/dw_dmac: Replace spin_lock* with irqsave variants and enable submission from callback dmaengine/dw_dmac: Divide one sg to many desc, if sg len is greater than DWC_MAX_COUNT dmaengine/dw_dmac: set residue as total len in dwc_tx_status if status is !DMA_SUCCESS dmaengine/dw_dmac: don't call callback routine in case dmaengine_terminate_all() is called dmaengine: at_hdmac: pause: no need to wait for FIFO empty pch_dma: modify pci device table definition pch_dma: Support new device ML7223 IOH pch_dma: Support I2S for ML7213 IOH pch_dma: Fix DMA setting issue pch_dma: modify for checkpatch pch_dma: fix dma direction issue for ML7213 IOH video-in dmaengine: at_hdmac: use descriptor chaining help function dmaengine: at_hdmac: implement pause and resume in atc_control ... Fix up trivial conflict in drivers/dma/dw_dmac.c
2011-05-28Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: mfd: Fix build breakage caused by tps65910 gpio directory move mfd: Use mfd cell platform_data for db8500-prcmu cells platform bits
2011-05-28Merge branch 'spi/next' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds
* 'spi/next' of git://git.secretlab.ca/git/linux-2.6: spi/spi_bfin_sport: new driver for a SPI bus via the Blackfin SPORT peripheral spi/tle620x: add missing device_remove_file()
2011-05-28Merge branch 'gpio/next' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds
* 'gpio/next' of git://git.secretlab.ca/git/linux-2.6: gpio/pch_gpio: Support new device ML7223 gpio: make gpio_{request,free}_array gpio array parameter const GPIO: OMAP: move to drivers/gpio GPIO: OMAP: move register offset defines into <plat/gpio.h> gpio: Convert gpio_is_valid to return bool gpio: Move the s5pc100 GPIO to drivers/gpio gpio: Move the s5pv210 GPIO to drivers/gpio gpio: Move the exynos4 GPIO to drivers/gpio gpio: Move to Samsung common GPIO library to drivers/gpio gpio/nomadik: add function to read GPIO pull down status gpio/nomadik: show all pins in debug gpio: move Nomadik GPIO driver to drivers/gpio gpio: move U300 GPIO driver to drivers/gpio langwell_gpio: add runtime pm support gpio/pca953x: Add support for pca9574 and pca9575 devices gpio/cs5535: Show explicit dependency between gpio_cs5535 and mfd_cs5535
2011-05-28Merge branch 'setns'Linus Torvalds
* setns: ns: Wire up the setns system call Done as a merge to make it easier to fix up conflicts in arm due to addition of sendmmsg system call
2011-05-28ns: Wire up the setns system callEric W. Biederman
32bit and 64bit on x86 are tested and working. The rest I have looked at closely and I can't find any problems. setns is an easy system call to wire up. It just takes two ints so I don't expect any weird architecture porting problems. While doing this I have noticed that we have some architectures that are very slow to get new system calls. cris seems to be the slowest where the last system calls wired up were preadv and pwritev. avr32 is weird in that recvmmsg was wired up but never declared in unistd.h. frv is behind with perf_event_open being the last syscall wired up. On h8300 the last system call wired up was epoll_wait. On m32r the last system call wired up was fallocate. mn10300 has recvmmsg as the last system call wired up. The rest seem to at least have syncfs wired up which was new in the 2.6.39. v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com> v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com> v4: Moved wiring up of the system call to another patch v5: ported to v2.6.39-rc6 v6: rebased onto parisc-next and net-next to avoid syscall conflicts. v7: ported to Linus's latest post 2.6.39 tree. >  arch/blackfin/include/asm/unistd.h     |    3 ++- >  arch/blackfin/mach-common/entry.S      |    1 + Acked-by: Mike Frysinger <vapier@gentoo.org> Oh - ia64 wiring looks good. Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-28Cache xattr security drop check for write v2Andi Kleen
Some recent benchmarking on btrfs showed that a major scaling bottleneck on large systems on btrfs is currently the xattr lookup on every write. Why xattr lookup on every write I hear you ask? write wants to drop suid and security related xattrs that could set o capabilities for executables. To do that it currently looks up security.capability on EVERY write (even for non executables) to decide whether to drop it or not. In btrfs this causes an additional tree walk, hitting some per file system locks and quite bad scalability. In a simple read workload on a 8S system I saw over 90% CPU time in spinlocks related to that. Chris Mason tells me this is also a problem in ext4, where it hits the global mbcache lock. This patch adds a simple per inode to avoid this problem. We only do the lookup once per file and then if there is no xattr cache the decision. All xattr changes clear the flag. I also used the same flag to avoid the suid check, although that one is pretty cheap. A file system can also set this flag when it creates the inode, if it has a cheap way to do so. This is done for some common file systems in followon patches. With this patch a major part of the lock contention disappears for btrfs. Some testing on smaller systems didn't show significant performance changes, but at least it helps the larger systems and is generally more efficient. v2: Rename is_sgid. add file system helper. Cc: chris.mason@oracle.com Cc: josef@redhat.com Cc: viro@zeniv.linux.org.uk Cc: agruen@linbit.com Cc: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-28rcu: Start RCU kthreads in TASK_INTERRUPTIBLE statePaul E. McKenney
Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can result in softlockup warnings. Because some of RCU's kthreads can legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE state in order to avoid those warnings. Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-28rcu: Remove waitqueue usage for cpu, node, and boost kthreadsPeter Zijlstra
It is not necessary to use waitqueues for the RCU kthreads because we always know exactly which thread is to be awakened. In addition, wake_up() only issues an actual wakeup when there is a thread waiting on the queue, which was why there was an extra explicit wake_up_process() to get the RCU kthreads started. Eliminating the waitqueues (and wake_up()) in favor of wake_up_process() eliminates the need for the initial wake_up_process() and also shrinks the data structure size a bit. The wakeup logic is placed in a new rcu_wait() macro. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-28rcu: Avoid acquiring rcu_node locks in timer functionsPaul E. McKenney
This commit switches manipulations of the rcu_node ->wakemask field to atomic operations, which allows rcu_cpu_kthread_timer() to avoid acquiring the rcu_node lock. This should avoid the following lockdep splat reported by Valdis Kletnieks: [ 12.872150] usb 1-4: new high speed USB device number 3 using ehci_hcd [ 12.986667] usb 1-4: New USB device found, idVendor=413c, idProduct=2513 [ 12.986679] usb 1-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0 [ 12.987691] hub 1-4:1.0: USB hub found [ 12.987877] hub 1-4:1.0: 3 ports detected [ 12.996372] input: PS/2 Generic Mouse as /devices/platform/i8042/serio1/input/input10 [ 13.071471] udevadm used greatest stack depth: 3984 bytes left [ 13.172129] [ 13.172130] ======================================================= [ 13.172425] [ INFO: possible circular locking dependency detected ] [ 13.172650] 2.6.39-rc6-mmotm0506 #1 [ 13.172773] ------------------------------------------------------- [ 13.172997] blkid/267 is trying to acquire lock: [ 13.173009] (&p->pi_lock){-.-.-.}, at: [<ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa [ 13.173009] [ 13.173009] but task is already holding lock: [ 13.173009] (rcu_node_level_0){..-...}, at: [<ffffffff810901cc>] rcu_cpu_kthread_timer+0x27/0x58 [ 13.173009] [ 13.173009] which lock already depends on the new lock. [ 13.173009] [ 13.173009] [ 13.173009] the existing dependency chain (in reverse order) is: [ 13.173009] [ 13.173009] -> #2 (rcu_node_level_0){..-...}: [ 13.173009] [<ffffffff810679b9>] check_prevs_add+0x8b/0x104 [ 13.173009] [<ffffffff81067da1>] validate_chain+0x36f/0x3ab [ 13.173009] [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2 [ 13.173009] [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c [ 13.173009] [<ffffffff815697f1>] _raw_spin_lock+0x36/0x45 [ 13.173009] [<ffffffff81090794>] rcu_read_unlock_special+0x8c/0x1d5 [ 13.173009] [<ffffffff8109092c>] __rcu_read_unlock+0x4f/0xd7 [ 13.173009] [<ffffffff81027bd3>] rcu_read_unlock+0x21/0x23 [ 13.173009] [<ffffffff8102cc34>] cpuacct_charge+0x6c/0x75 [ 13.173009] [<ffffffff81030cc6>] update_curr+0x101/0x12e [ 13.173009] [<ffffffff810311d0>] check_preempt_wakeup+0xf7/0x23b [ 13.173009] [<ffffffff8102acb3>] check_preempt_curr+0x2b/0x68 [ 13.173009] [<ffffffff81031d40>] ttwu_do_wakeup+0x76/0x128 [ 13.173009] [<ffffffff81031e49>] ttwu_do_activate.constprop.63+0x57/0x5c [ 13.173009] [<ffffffff81031e96>] scheduler_ipi+0x48/0x5d [ 13.173009] [<ffffffff810177d5>] smp_reschedule_interrupt+0x16/0x18 [ 13.173009] [<ffffffff815710f3>] reschedule_interrupt+0x13/0x20 [ 13.173009] [<ffffffff810b66d1>] rcu_read_unlock+0x21/0x23 [ 13.173009] [<ffffffff810b739c>] find_get_page+0xa9/0xb9 [ 13.173009] [<ffffffff810b8b48>] filemap_fault+0x6a/0x34d [ 13.173009] [<ffffffff810d1a25>] __do_fault+0x54/0x3e6 [ 13.173009] [<ffffffff810d447a>] handle_pte_fault+0x12c/0x1ed [ 13.173009] [<ffffffff810d48f7>] handle_mm_fault+0x1cd/0x1e0 [ 13.173009] [<ffffffff8156cfee>] do_page_fault+0x42d/0x5de [ 13.173009] [<ffffffff8156a75f>] page_fault+0x1f/0x30 [ 13.173009] [ 13.173009] -> #1 (&rq->lock){-.-.-.}: [ 13.173009] [<ffffffff810679b9>] check_prevs_add+0x8b/0x104 [ 13.173009] [<ffffffff81067da1>] validate_chain+0x36f/0x3ab [ 13.173009] [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2 [ 13.173009] [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c [ 13.173009] [<ffffffff815697f1>] _raw_spin_lock+0x36/0x45 [ 13.173009] [<ffffffff81027e19>] __task_rq_lock+0x8b/0xd3 [ 13.173009] [<ffffffff81032f7f>] wake_up_new_task+0x41/0x108 [ 13.173009] [<ffffffff810376c3>] do_fork+0x265/0x33f [ 13.173009] [<ffffffff81007d02>] kernel_thread+0x6b/0x6d [ 13.173009] [<ffffffff8153a9dd>] rest_init+0x21/0xd2 [ 13.173009] [<ffffffff81b1db4f>] start_kernel+0x3bb/0x3c6 [ 13.173009] [<ffffffff81b1d29f>] x86_64_start_reservations+0xaf/0xb3 [ 13.173009] [<ffffffff81b1d393>] x86_64_start_kernel+0xf0/0xf7 [ 13.173009] [ 13.173009] -> #0 (&p->pi_lock){-.-.-.}: [ 13.173009] [<ffffffff81067788>] check_prev_add+0x68/0x20e [ 13.173009] [<ffffffff810679b9>] check_prevs_add+0x8b/0x104 [ 13.173009] [<ffffffff81067da1>] validate_chain+0x36f/0x3ab [ 13.173009] [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2 [ 13.173009] [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c [ 13.173009] [<ffffffff815698ea>] _raw_spin_lock_irqsave+0x44/0x57 [ 13.173009] [<ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa [ 13.173009] [<ffffffff81032f3c>] wake_up_process+0x10/0x12 [ 13.173009] [<ffffffff810901e9>] rcu_cpu_kthread_timer+0x44/0x58 [ 13.173009] [<ffffffff81045286>] call_timer_fn+0xac/0x1e9 [ 13.173009] [<ffffffff8104556d>] run_timer_softirq+0x1aa/0x1f2 [ 13.173009] [<ffffffff8103e487>] __do_softirq+0x109/0x26a [ 13.173009] [<ffffffff8157144c>] call_softirq+0x1c/0x30 [ 13.173009] [<ffffffff81003207>] do_softirq+0x44/0xf1 [ 13.173009] [<ffffffff8103e8b9>] irq_exit+0x58/0xc8 [ 13.173009] [<ffffffff81017f5a>] smp_apic_timer_interrupt+0x79/0x87 [ 13.173009] [<ffffffff81570fd3>] apic_timer_interrupt+0x13/0x20 [ 13.173009] [<ffffffff810bd51a>] get_page_from_freelist+0x2aa/0x310 [ 13.173009] [<ffffffff810bdf03>] __alloc_pages_nodemask+0x178/0x243 [ 13.173009] [<ffffffff8101fe2f>] pte_alloc_one+0x1e/0x3a [ 13.173009] [<ffffffff810d27fe>] __pte_alloc+0x22/0x14b [ 13.173009] [<ffffffff810d48a8>] handle_mm_fault+0x17e/0x1e0 [ 13.173009] [<ffffffff8156cfee>] do_page_fault+0x42d/0x5de [ 13.173009] [<ffffffff8156a75f>] page_fault+0x1f/0x30 [ 13.173009] [ 13.173009] other info that might help us debug this: [ 13.173009] [ 13.173009] Chain exists of: [ 13.173009] &p->pi_lock --> &rq->lock --> rcu_node_level_0 [ 13.173009] [ 13.173009] Possible unsafe locking scenario: [ 13.173009] [ 13.173009] CPU0 CPU1 [ 13.173009] ---- ---- [ 13.173009] lock(rcu_node_level_0); [ 13.173009] lock(&rq->lock); [ 13.173009] lock(rcu_node_level_0); [ 13.173009] lock(&p->pi_lock); [ 13.173009] [ 13.173009] *** DEADLOCK *** [ 13.173009] [ 13.173009] 3 locks held by blkid/267: [ 13.173009] #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8156cdb4>] do_page_fault+0x1f3/0x5de [ 13.173009] #1: (&yield_timer){+.-...}, at: [<ffffffff810451da>] call_timer_fn+0x0/0x1e9 [ 13.173009] #2: (rcu_node_level_0){..-...}, at: [<ffffffff810901cc>] rcu_cpu_kthread_timer+0x27/0x58 [ 13.173009] [ 13.173009] stack backtrace: [ 13.173009] Pid: 267, comm: blkid Not tainted 2.6.39-rc6-mmotm0506 #1 [ 13.173009] Call Trace: [ 13.173009] <IRQ> [<ffffffff8154a529>] print_circular_bug+0xc8/0xd9 [ 13.173009] [<ffffffff81067788>] check_prev_add+0x68/0x20e [ 13.173009] [<ffffffff8100c861>] ? save_stack_trace+0x28/0x46 [ 13.173009] [<ffffffff810679b9>] check_prevs_add+0x8b/0x104 [ 13.173009] [<ffffffff81067da1>] validate_chain+0x36f/0x3ab [ 13.173009] [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2 [ 13.173009] [<ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa [ 13.173009] [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c [ 13.173009] [<ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa [ 13.173009] [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82 [ 13.173009] [<ffffffff815698ea>] _raw_spin_lock_irqsave+0x44/0x57 [ 13.173009] [<ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa [ 13.173009] [<ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa [ 13.173009] [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82 [ 13.173009] [<ffffffff81032f3c>] wake_up_process+0x10/0x12 [ 13.173009] [<ffffffff810901e9>] rcu_cpu_kthread_timer+0x44/0x58 [ 13.173009] [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82 [ 13.173009] [<ffffffff81045286>] call_timer_fn+0xac/0x1e9 [ 13.173009] [<ffffffff810451da>] ? del_timer+0x75/0x75 [ 13.173009] [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82 [ 13.173009] [<ffffffff8104556d>] run_timer_softirq+0x1aa/0x1f2 [ 13.173009] [<ffffffff8103e487>] __do_softirq+0x109/0x26a [ 13.173009] [<ffffffff8106365f>] ? tick_dev_program_event+0x37/0xf6 [ 13.173009] [<ffffffff810a0e4a>] ? time_hardirqs_off+0x1b/0x2f [ 13.173009] [<ffffffff8157144c>] call_softirq+0x1c/0x30 [ 13.173009] [<ffffffff81003207>] do_softirq+0x44/0xf1 [ 13.173009] [<ffffffff8103e8b9>] irq_exit+0x58/0xc8 [ 13.173009] [<ffffffff81017f5a>] smp_apic_timer_interrupt+0x79/0x87 [ 13.173009] [<ffffffff81570fd3>] apic_timer_interrupt+0x13/0x20 [ 13.173009] <EOI> [<ffffffff810bd384>] ? get_page_from_freelist+0x114/0x310 [ 13.173009] [<ffffffff810bd51a>] ? get_page_from_freelist+0x2aa/0x310 [ 13.173009] [<ffffffff812220e7>] ? clear_page_c+0x7/0x10 [ 13.173009] [<ffffffff810bd1ef>] ? prep_new_page+0x14c/0x1cd [ 13.173009] [<ffffffff810bd51a>] get_page_from_freelist+0x2aa/0x310 [ 13.173009] [<ffffffff810bdf03>] __alloc_pages_nodemask+0x178/0x243 [ 13.173009] [<ffffffff810d46b9>] ? __pmd_alloc+0x87/0x99 [ 13.173009] [<ffffffff8101fe2f>] pte_alloc_one+0x1e/0x3a [ 13.173009] [<ffffffff810d46b9>] ? __pmd_alloc+0x87/0x99 [ 13.173009] [<ffffffff810d27fe>] __pte_alloc+0x22/0x14b [ 13.173009] [<ffffffff810d48a8>] handle_mm_fault+0x17e/0x1e0 [ 13.173009] [<ffffffff8156cfee>] do_page_fault+0x42d/0x5de [ 13.173009] [<ffffffff810d915f>] ? sys_brk+0x32/0x10c [ 13.173009] [<ffffffff810a0e4a>] ? time_hardirqs_off+0x1b/0x2f [ 13.173009] [<ffffffff81065c4f>] ? trace_hardirqs_off_caller+0x3f/0x9c [ 13.173009] [<ffffffff812235dd>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 13.173009] [<ffffffff8156a75f>] page_fault+0x1f/0x30 [ 14.010075] usb 5-1: new full speed USB device number 2 using uhci_hcd Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-28atomic: Add atomic_or()Paul E. McKenney
An atomic_or() function is needed by TREE_RCU to avoid deadlock, so add a generic version. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-28Merge branch 'rcu/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/urgent
2011-05-28perf: Fix SIGIO handlingPeter Zijlstra
Vince noticed that unless we mmap() a buffer, SIGIO gets lost. So explicitly push the wakeup (including signals) when requested. Reported-by: Vince Weaver <vweaver1@eecs.utk.edu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/n/tip-2euus3f3x3dyvdk52cjxw8zu@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-28Documentation: Add statistics about nested locksJuri Lelli
Explain what the trailing "/1" on some lock class names of lock_stat output means. Reviewed-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4DD4F6C1.5090701@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-28cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowedKOSAKI Motohiro
The rule is, we have to update tsk->rt.nr_cpus_allowed if we change tsk->cpus_allowed. Otherwise RT scheduler may confuse. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4DD4B3FA.5060901@jp.fujitsu.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-28sched: Fix ->min_vruntime calculation in dequeue_entity()Peter Zijlstra
Dima Zavin <dima@android.com> reported: "After pulling the thread off the run-queue during a cgroup change, the cfs_rq.min_vruntime gets recalculated. The dequeued thread's vruntime then gets normalized to this new value. This can then lead to the thread getting an unfair boost in the new group if the vruntime of the next task in the old run-queue was way further ahead." Reported-by: Dima Zavin <dima@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Recalls-having-tested-once-upon-a-time-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1305674470-23727-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-28sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSWPeter Zijlstra
Marc reported that e4a52bcb9 (sched: Remove rq->lock from the first half of ttwu()) broke his ARM-SMP machine. Now ARM is one of the few __ARCH_WANT_INTERRUPTS_ON_CTXSW users, so that exception in the ttwu() code was suspect. Yong found that the interrupt could hit after context_switch() changes current but before it clears p->on_cpu, if that interrupt were to attempt a wake-up of p we would indeed find ourselves spinning in IRQ context. Fix this by reverting to the old behaviour for this situation and perform a full remote wake-up. Cc: Frank Rowand <frank.rowand@am.sony.com> Cc: Yong Zhang <yong.zhang0@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Reported-by: Marc Zyngier <Marc.Zyngier@arm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-28sched: More sched_domain iterations fixesXiaotian Feng
sched_domain iterations needs to be protected by rcu_read_lock() now, this patch adds another two places which needs the rcu lock, which is spotted by following suspicious rcu_dereference_check() usage warnings. kernel/sched_rt.c:1244 invoked rcu_dereference_check() without protection! kernel/sched_stats.h:41 invoked rcu_dereference_check() without protection! Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1303469634-11678-1-git-send-email-dfeng@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-28mfd: Fix build breakage caused by tps65910 gpio directory moveLiam Girdwood
Signed-off-by: Liam Girdwood <lrg@ti.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-05-28mfd: Use mfd cell platform_data for db8500-prcmu cells platform bitsMattias Wallin
With the addition of a device platform mfd_cell pointer, MFD drivers can go back to passing platform data back to their sub drivers. This allows for an mfd_cell->mfd_data removal and thus keep the sub drivers MFD agnostic. This is mostly needed for non MFD aware sub drivers. Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-05-27Merge branch 'for_2.6.40/gpio-move' of ↵Grant Likely
git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap-pm into gpio/next
2011-05-28fs: block_page_mkwrite should wait for writeback to finishDarrick J. Wong
For filesystems such as nilfs2 and xfs that use block_page_mkwrite, modify that function to wait for pending writeback before allowing the page to become writable. This is needed to stabilize pages during writeback for those two filesystems. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-28mm: Wait for writeback when grabbing pages to begin a writeDarrick J. Wong
When grabbing a page for a buffered IO write, the mm should wait for writeback on the page to complete so that the page does not become writable during the IO operation. This change is needed to provide page stability during writes for all filesystems. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>