summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2012-05-25arch/tile: fix hardwall for tilegx and generalize for idn and ipiChris Metcalf
The hardwall drain code was not properly implemented for tilegx, just tilepro, so you couldn't reliably restart an application that made use of the udn. In addition, the code was only applicable to the udn (user dynamic network). On tilegx there is a second user network that is available (the "idn"), and there is support for having I/O shims deliver user-level interrupts to applications ("ipi") which functions in a very similar way to the inter-core permissions used for udn/idn. So this change also generalizes the code from supporting just the udn to supports udn/idn/ipi on tilegx. By default we now use /dev/hardwall/{udn,idn,ipi} with separate minor numbers for the three devices. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25arch/tile: support multiple huge page sizes dynamicallyChris Metcalf
This change adds support for a new "super" bit in the PTE, using the new arch_make_huge_pte() method. The Tilera hypervisor sees the bit set at a given level of the page table and gangs together 4, 16, or 64 consecutive pages from that level of the hierarchy to create a larger TLB entry. One extra "super" page size can be specified at each of the three levels of the page table hierarchy on tilegx, using the "hugepagesz" argument on the boot command line. A new hypervisor API is added to allow Linux to tell the hypervisor how many PTEs to gang together at each level of the page table. To allow pre-allocating huge pages larger than the buddy allocator can handle, this change modifies the Tilera bootmem support to put all of memory on tilegx platforms into bootmem. As part of this change I eliminate the vestigial CONFIG_HIGHPTE support, which never worked anyway, and eliminate the hv_page_size() API in favor of the standard vma_kernel_pagesize() API. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25mm: add new arch_make_huge_pte() method for tile supportChris Metcalf
The tile support for multiple-size huge pages requires tagging the hugetlb PTE with a "super" bit for PTEs that are multiples of the basic size of a pagetable span. To set that bit properly we need to tweak the PTe in make_huge_pte() based on the vma. This change provides the API for a subsequent tile-specific change to use. Reviewed-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25arch/tile: support kexec() for tilegxChris Metcalf
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25arch/tile: support <asm/cachectl.h> header for cacheflush() syscallChris Metcalf
We already had a syscall that did some dcache flushing, but it was not used in practice. Make it MIPS compatible instead so it can do both the DCACHE and ICACHE actions. We have code that wants to be able to use the ICACHE flush mode from userspace so this change enables that. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25arch/tile: Allow tilegx to build with either 16K or 64K page sizeChris Metcalf
This change introduces new flags for the hv_install_context() API that passes a page table pointer to the hypervisor. Clients can explicitly request 4K, 16K, or 64K small pages when they install a new context. In practice, the page size is fixed at kernel compile time and the same size is always requested every time a new page table is installed. The <hv/hypervisor.h> header changes so that it provides more abstract macros for managing "page" things like PFNs and page tables. For example there is now a HV_DEFAULT_PAGE_SIZE_SMALL instead of the old HV_PAGE_SIZE_SMALL. The various PFN routines have been eliminated and only PA- or PTFN-based ones remain (since PTFNs are always expressed in fixed 2KB "page" size). The page-table management macros are renamed with a leading underscore and take page-size arguments with the presumption that clients will use those macros in some single place to provide the "real" macros they will use themselves. I happened to notice the old hv_set_caching() API was totally broken (it assumed 4KB pages) so I changed it so it would nominally work correctly with other page sizes. Tag modules with the page size so you can't load a module built with a conflicting page size. (And add a test for SMP while we're at it.) Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25arch/tile: optimize get_user/put_user and friendsChris Metcalf
Use direct load/store for the get_user/put_user. Previously, we would call out to a helper routine that would do the appropriate thing and then return, handling the possible exception internally. Now we inline the load or store, along with a "we succeeded" indication in a register; if the load or store faults, we write a "we failed" indication into the same register and then return to the following instruction. This is more efficient and gives us more compact code, as well as being more in line with what other architectures do. The special futex assembly source file for TILE-Gx also disappears in this change; we just use the same inlining idiom there as well, putting the appropriate atomic operations directly into futex_atomic_op_inuser() (and thus into the FUTEX_WAIT function). The underlying atomic copy_from_user, copy_to_user functions were renamed using the (cryptic) x86 convention as copy_from_user_ll and copy_to_user_ll. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25arch/tile: support building big-endian kernelChris Metcalf
The toolchain supports big-endian mode now, so add support for building the kernel to run big-endian as well. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25arch/tile: allow building Linux with transparent huge pages enabledChris Metcalf
The change adds some infrastructure for managing tile pmd's more generally, using pte_pmd() and pmd_pte() methods to translate pmd values to and from ptes, since on TILEPro a pmd is really just a nested structure holding a pgd (aka pte). Several existing pmd methods are moved into this framework, and a whole raft of additional pmd accessors are defined that are used by the transparent hugepage framework. The tile PTE now has a "client2" bit. The bit is used to indicate a transparent huge page is in the process of being split into subpages. This change also fixes a generic bug where the return value of the generic pmdp_splitting_flush() was incorrect. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-25arch/tile: use interrupt critical sections lessChris Metcalf
In general we want to avoid ever touching memory while within an interrupt critical section, since the page fault path goes through a different path from the hypervisor when in an interrupt critical section, and we carefully decided with tilegx that we didn't need to support this path in the kernel. (On tilepro we did implement that path as part of supporting atomic instructions in software.) In practice we always need to touch the kernel stack, since that's where we store the interrupt state before releasing the critical section, but this change cleans up a few things. The IRQ_ENABLE macro is split up so that when we want to enable interrupts in a deferred way (e.g. for cpu_idle or for interrupt return) we can read the per-cpu enable mask before entering the critical section. The cache-migration code is changed to use interrupt masking instead of interrupt critical sections. And, the interrupt-entry code is changed so that we defer loading "tp" from per-cpu data until after we have released the interrupt critical section. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-20Linux 3.4v3.4Linus Torvalds
2012-05-19Merge tag 'parisc-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6 Pull PA-RISC fixes from James Bottomley: "This is a set of three bug fixes that gets parisc running again on systems with PA1.1 processors. Two fix regressions introduced in 2.6.39 and one fixes a prefetch bug that only affects PA7300LC processors. We also have another pending fix to do with the sectional arrangement of vmlinux.lds, but there's a query on it during testing on one particular system type, so I'll hold off sending it in for now." * tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6: [PARISC] fix panic on prefetch(NULL) on PA7300LC [PARISC] fix crash in flush_icache_page_asm on PA1.1 [PARISC] fix PA1.1 oops on boot
2012-05-19Merge branch 'x86/ld-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 linker bug workarounds from Peter Anvin. GNU ld-2.22.52.0.[12] (*) has an unfortunate bug where it incorrectly turns certain relocation entries absolute. Section-relative symbols that are part of otherwise empty sections are silently changed them to absolute. We rely on section-relative symbols staying section-relative, and actually have several sections in the linker script solely for this purpose. See for example http://sourceware.org/bugzilla/show_bug.cgi?id=14052 We could just black-list the buggy linker, but it appears that it got shipped in at least F17, and possibly other distros too, so it's sadly not some rare unusual case. This backports the workaround from the x86/trampoline branch, and as Peter says: "This is not a minimal fix, not at all, but it is a tested code base." * 'x86/ld-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, relocs: When printing an error, say relative or absolute x86, relocs: Workaround for binutils 2.22.52.0.1 section bug x86, realmode: 16-bit real-mode code support for relocs tool (*) That's a manly release numbering system. Stupid, sure. But manly.
2012-05-19Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "A few small, but important fixes. Most of them are marked for stable as well - Fix failure to release a semaphore on error path in mtip32xx. - Fix crashable condition in bio_get_nr_vecs(). - Don't mark end-of-disk buffers as mapped, limit it to i_size. - Fix for build problem with CONFIG_BLOCK=n on arm at least. - Fix for a buffer overlow on UUID partition printing. - Trivial removal of unused variables in dac960." * 'for-linus' of git://git.kernel.dk/linux-block: block: fix buffer overflow when printing partition UUIDs Fix blkdev.h build errors when BLOCK=n bio allocation failure due to bio_get_nr_vecs() block: don't mark buffers beyond end of disk as mapped mtip32xx: release the semaphore on an error path dac960: Remove unused variables from DAC960_CreateProcEntries()
2012-05-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull one more networking bug-fix from David Miller: "One last straggler. Eric Dumazet's pktgen unload oops fix was not entirely complete, but all the cases should be handled properly now.... fingers crossed." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: pktgen: fix module unload for good
2012-05-19memcg,thp: fix res_counter:96 regressionHugh Dickins
Occasionally, testing memcg's move_charge_at_immigrate on rc7 shows a flurry of hundreds of warnings at kernel/res_counter.c:96, where res_counter_uncharge_locked() does WARN_ON(counter->usage < val). The first trace of each flurry implicates __mem_cgroup_cancel_charge() of mc.precharge, and an audit of mc.precharge handling points to mem_cgroup_move_charge_pte_range()'s THP handling in commit 12724850e806 ("memcg: avoid THP split in task migration"). Checking !mc.precharge is good everywhere else, when a single page is to be charged; but here the "mc.precharge -= HPAGE_PMD_NR" likely to follow, is liable to result in underflow (a lot can change since the precharge was estimated). Simply check against HPAGE_PMD_NR: there's probably a better alternative, trying precharge for more, splitting if unsuccessful; but this one-liner is safer for now - no kernel/res_counter.c:96 warnings seen in 26 hours. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-18x86, relocs: When printing an error, say relative or absoluteH. Peter Anvin
When the relocs tool throws an error, let the error message say if it is an absolute or relative symbol. This should make it a lot more clear what action the programmer needs to take and should help us find the reason if additional symbol bugs show up. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org>
2012-05-18x86, relocs: Workaround for binutils 2.22.52.0.1 section bugH. Peter Anvin
GNU ld 2.22.52.0.1 has a bug that it blindly changes symbols from section-relative to absolute if they are in a section of zero length. This turns the symbols __init_begin and __init_end into absolute symbols. Let the relocs program know that those should be treated as relative symbols. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: <stable@vger.kernel.org> Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
2012-05-18x86, realmode: 16-bit real-mode code support for relocs toolH. Peter Anvin
A new option is added to the relocs tool called '--realmode'. This option causes the generation of 16-bit segment relocations and 32-bit linear relocations for the real-mode code. When the real-mode code is moved to the low-memory during kernel initialization, these relocation entries can be used to relocate the code properly. In the assembly code 16-bit segment relocations must be relative to the 'real_mode_seg' absolute symbol. Linear relocations must be relative to a symbol prefixed with 'pa_'. 16-bit segment relocation is used to load cs:ip in 16-bit code. Linear relocations are used in the 32-bit code for relocatable data references. They are declared in the linker script of the real-mode code. The relocs tool is moved to arch/x86/tools/relocs.c, and added new target archscripts that can be used to build scripts needed building an architecture. be compiled before building the arch/x86 tree. [ hpa: accelerating this because it detects invalid absolute relocations, a serious bug in binutils 2.22.52.0.x which currently produces bad kernels. ] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1336501366-28617-2-git-send-email-jarkko.sakkinen@intel.com Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org>
2012-05-18Merge tag 'dm-3.4-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull a dm fix from Alasdair G Kergon: "A fix to the thin provisioning userspace interface." * tag 'dm-3.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: dm thin: fix table output when pool target disables discard passdown internally
2012-05-19dm thin: fix table output when pool target disables discard passdown internallyMike Snitzer
When the thin pool target clears the discard_passdown parameter internally, it incorrectly changes the table line reported to userspace. This breaks dumb string comparisons on these table lines in generic userspace device-mapper library code and leads to tables being reloaded repeatedly when nothing is actually meant to be changing. This patch corrects this by no longer changing the table line when discard passdown was disabled. We can still tell when discard passdown is overridden by looking for the message "Discard unsupported by data device (sdX): Disabling discard passdown." This automatic detection is also moved from the 'load' to the 'resume' so that it is re-evaluated should the properties of underlying devices change. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-05-18Merge tag 'md-3.4-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull one more md bugfix from NeilBrown: "Fix bug in recent fix to RAID10. Without this patch, recovery will crash" * tag 'md-3.4-fixes' of git://neil.brown.name/md: md/raid10: fix transcription error in calc_sectors conversion.
2012-05-18Merge branch 'stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile Pull tile tree bugfix from Chris Metcalf: "This fixes a security vulnerability (and correctness bug) in tilegx" * 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: tilegx: enable SYSCALL_WRAPPERS support
2012-05-19md/raid10: fix transcription error in calc_sectors conversion.NeilBrown
The old code was sector_div(stride, fc); the new code was sector_dir(size, conf->near_copies); 'size' is right (the stride various wasn't really needed), but 'fc' means 'far_copies', and that is an important difference. Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-18Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge misc fixes from Andrew Morton. * emailed from Andrew Morton <akpm@linux-foundation.org>: (4 patches) frv: delete incorrect task prototypes causing compile fail slub: missing test for partial pages flush work in flush_all() fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01
2012-05-18proc: move fd symlink i_mode calculations into tid_fd_revalidate()Linus Torvalds
Instead of doing the i_mode calculations at proc_fd_instantiate() time, move them into tid_fd_revalidate(), which is where the other inode state (notably uid/gid information) is updated too. Otherwise we'll end up with stale i_mode information if an fd is re-used while the dentry still hangs around. Not that anything really *cares* (symlink permissions don't really matter), but Tetsuo Handa noticed that the owner read/write bits don't always match the state of the readability of the file descriptor, and we _used_ to get this right a long time ago in a galaxy far, far away. Besides, aside from fixing an ugly detail (that has apparently been this way since commit 61a28784028e: "proc: Remove the hard coded inode numbers" in 2006), this removes more lines of code than it adds. And it just makes sense to update i_mode in the same place we update i_uid/gid. Al Viro correctly points out that we could just do the inode fill in the inode iops ->getattr() function instead. However, that does require somewhat slightly more invasive changes, and adds yet *another* lookup of the file descriptor. We need to do the revalidate() for other reasons anyway, and have the file descriptor handy, so we might as well fill in the information at this point. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-18pktgen: fix module unload for goodEric Dumazet
commit c57b5468406 (pktgen: fix crash at module unload) did a very poor job with list primitives. 1) list_splice() arguments were in the wrong order 2) list_splice(list, head) has undefined behavior if head is not initialized. 3) We should use the list_splice_init() variant to clear pktgen_threads list. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18tilegx: enable SYSCALL_WRAPPERS supportChris Metcalf
Some discussion with the glibc mailing lists revealed that this was necessary for 64-bit platforms with MIPS-like sign-extension rules for 32-bit values. The original symptom was that passing (uid_t)-1 to setreuid() was failing in programs linked -pthread because of the "setxid" mechanism for passing setxid-type function arguments to the syscall code. SYSCALL_WRAPPERS handles ensuring that all syscall arguments end up with proper sign-extension and is thus the appropriate fix for this problem. On other platforms (s390, powerpc, sparc64, and mips) this was fixed in 2.6.28.6. The general issue is tracked as CVE-2009-0029. Cc: <stable@vger.kernel.org> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-18Merge tag 'linus-mce-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull a machine check recovery fix from Tony Luck. I really don't like how the MCE code does some of the things it does, but this does seem to be an improvement. * tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: x86/mce: Only restart instruction after machine check recovery if it is safe
2012-05-17frv: delete incorrect task prototypes causing compile failPaul Gortmaker
Commit 41101809a865 ("fork: Provide weak arch_release_[task_struct| thread_info] functions") in -tip highlights a problem in the frv arch, where it has needles prototypes for alloc_task_struct_node and free_task_struct. This now shows up as: kernel/fork.c:120:66: error: static declaration of 'alloc_task_struct_node' follows non-static declaration kernel/fork.c:127:51: error: static declaration of 'free_task_struct' follows non-static declaration since that commit turned them into real functions. Since arch/frv does does not define define __HAVE_ARCH_TASK_STRUCT_ALLOCATOR (i.e. it just uses the generic ones) it shouldn't list these at all. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17slub: missing test for partial pages flush work in flush_all()majianpeng
I found some kernel messages such as: SLUB raid5-md127: kmem_cache_destroy called for cache that still has objects. Pid: 6143, comm: mdadm Tainted: G O 3.4.0-rc6+ #75 Call Trace: kmem_cache_destroy+0x328/0x400 free_conf+0x2d/0xf0 [raid456] stop+0x41/0x60 [raid456] md_stop+0x1a/0x60 [md_mod] do_md_stop+0x74/0x470 [md_mod] md_ioctl+0xff/0x11f0 [md_mod] blkdev_ioctl+0xd8/0x7a0 block_ioctl+0x3b/0x40 do_vfs_ioctl+0x96/0x560 sys_ioctl+0x91/0xa0 system_call_fastpath+0x16/0x1b Then using kmemleak I found these messages: unreferenced object 0xffff8800b6db7380 (size 112): comm "mdadm", pid 5783, jiffies 4294810749 (age 90.589s) hex dump (first 32 bytes): 01 01 db b6 ad 4e ad de ff ff ff ff ff ff ff ff .....N.......... ff ff ff ff ff ff ff ff 98 40 4a 82 ff ff ff ff .........@J..... backtrace: kmemleak_alloc+0x21/0x50 kmem_cache_alloc+0xeb/0x1b0 kmem_cache_open+0x2f1/0x430 kmem_cache_create+0x158/0x320 setup_conf+0x649/0x770 [raid456] run+0x68b/0x840 [raid456] md_run+0x529/0x940 [md_mod] do_md_run+0x18/0xc0 [md_mod] md_ioctl+0xba8/0x11f0 [md_mod] blkdev_ioctl+0xd8/0x7a0 block_ioctl+0x3b/0x40 do_vfs_ioctl+0x96/0x560 sys_ioctl+0x91/0xa0 system_call_fastpath+0x16/0x1b This bug was introduced by commit a8364d5555b ("slub: only IPI CPUs that have per cpu obj to flush"), which did not include checks for per cpu partial pages being present on a cpu. Signed-off-by: majianpeng <majianpeng@gmail.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Tested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entriesCyrill Gorcunov
map_files/ entries are never supposed to be executed, still curious minds might try to run them, which leads to the following deadlock ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.0-rc4-24406-g841e6a6 #121 Not tainted ------------------------------------------------------- bash/1556 is trying to acquire lock: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1 but task is already holding lock: (&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sig->cred_guard_mutex){+.+.+.}: validate_chain+0x444/0x4f4 __lock_acquire+0x387/0x3f8 lock_acquire+0x12b/0x158 __mutex_lock_common+0x56/0x3a9 mutex_lock_killable_nested+0x40/0x45 lock_trace+0x24/0x59 proc_map_files_lookup+0x5a/0x165 __lookup_hash+0x52/0x73 do_lookup+0x276/0x2b1 walk_component+0x3d/0x114 do_last+0xfc/0x540 path_openat+0xd3/0x306 do_filp_open+0x3d/0x89 do_sys_open+0x74/0x106 sys_open+0x21/0x23 tracesys+0xdd/0xe2 -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}: check_prev_add+0x6a/0x1ef validate_chain+0x444/0x4f4 __lock_acquire+0x387/0x3f8 lock_acquire+0x12b/0x158 __mutex_lock_common+0x56/0x3a9 mutex_lock_nested+0x40/0x45 do_lookup+0x267/0x2b1 walk_component+0x3d/0x114 link_path_walk+0x1f9/0x48f path_openat+0xb6/0x306 do_filp_open+0x3d/0x89 open_exec+0x25/0xa0 do_execve_common+0xea/0x2f9 do_execve+0x43/0x45 sys_execve+0x43/0x5a stub_execve+0x6c/0xc0 This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex and when do_lookup happens we try to grab task->signal->cred_guard_mutex again in lock_trace. Fix it using plain ptrace_may_access() helper in proc_map_files_lookup() and in proc_map_files_readdir() instead of lock_trace(), the caller must be CAP_SYS_ADMIN granted anyway. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reported-by: Sasha Levin <levinsasha928@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Jones <davej@redhat.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01Rajkumar Kasirajan
The reset date of the ST Micro version of PL031 is 2000-01-01. The correct weekday for 2000-01-01 is saturday, but pl031 is initialized to sunday. This may lead to alarm malfunction, so configure the correct wday if RTC_DR indicates reset. Signed-off-by: Rajkumar Kasirajan <rajkumar.kasirajan@stericsson.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Mattias Wallin <mattias.wallin@stericsson.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-armLinus Torvalds
Pull ARM fixes from Russell King: "Small set of fixes again." * 'fixes' of git://git.linaro.org/people/rmk/linux-arm: ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn path ARM: 7418/1: LPAE: fix access flag setup in mem_type_table ARM: prevent VM_GROWSDOWN mmaps extending below FIRST_USER_ADDRESS ARM: 7417/1: vfp: ensure preemption is disabled when enabling VFP access
2012-05-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull two networking fixes from David S. Miller: 1) Thanks to Willy Tarreau and Eric Dumazet, we've unlocked a bug that's been present in do_tcp_sendpages() since that function was written in 2002. When we block to wait for memory we have to unconditionally try and push out pending TCP data, otherwise we can block for an unreasonably long amount of time. 2) Fix deadlock in e1000, fixes kernel bugzilla 43132 From Tushar Dave. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: e1000: Prevent reset task killing itself. tcp: do_tcp_sendpages() must try to push data out on oom conditions
2012-05-17ACPI / PCI / PM: Fix device PM regression related to D3hot/D3coldRafael J. Wysocki
Commit 1cc0c998fdf2 ("ACPI: Fix D3hot v D3cold confusion") introduced a bug in __acpi_bus_set_power() and changed the behavior of acpi_pci_set_power_state() in such a way that it generally doesn't work as expected if PCI_D3hot is passed to it as the second argument. First off, if ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) is passed to __acpi_bus_set_power() and the explicit_set flag is set for the D3cold state, the function will try to execute AML method called "_PS4", which doesn't exist. Fix this by adding a check to ensure that the name of the AML method to execute for transitions to ACPI_STATE_D3_COLD is correct in __acpi_bus_set_power(). Also make sure that the explicit_set flag for ACPI_STATE_D3_COLD will be set if _PS3 is present and modify acpi_power_transition() to avoid accessing power resources for ACPI_STATE_D3_COLD, because they don't exist. Second, if PCI_D3hot is passed to acpi_pci_set_power_state() as the target state, the function will request a transition to ACPI_STATE_D3_HOT instead of ACPI_STATE_D3. However, ACPI_STATE_D3_HOT is now only marked as supported if the _PR3 AML method is defined for the given device, which is rare. This causes problems to happen on systems where devices were successfully put into ACPI D3 by pci_set_power_state(PCI_D3hot) which doesn't work now. In particular, some unused graphics adapters are not turned off as a result. To fix this issue restore the old behavior of acpi_pci_set_power_state(), which is to request a transition to ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) if either PCI_D3hot or PCI_D3cold is passed to it as the argument. This approach is not ideal, because generally power should not be removed from devices if PCI_D3hot is the target power state, but since this behavior is relied on, we have no choice but to restore it at the moment and spend more time on designing a better solution in the future. References: https://bugzilla.kernel.org/show_bug.cgi?id=43228 Reported-by: rocko <rockorequin@hotmail.com> Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org> Reported-and-tested-by: Peter <lekensteyn@gmail.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17e1000: Prevent reset task killing itself.Tushar Dave
Killing reset task while adapter is resetting causes deadlock. Only kill reset task if adapter is not resetting. Ref bug #43132 on bugzilla.kernel.org CC: stable@vger.kernel.org Signed-off-by: Tushar Dave <tushar.n.dave@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17tcp: do_tcp_sendpages() must try to push data out on oom conditionsWilly Tarreau
Since recent changes on TCP splicing (starting with commits 2f533844 "tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp: tcp_sendpages() should call tcp_push() once"), I started seeing massive stalls when forwarding traffic between two sockets using splice() when pipe buffers were larger than socket buffers. Latest changes (net: netdev_alloc_skb() use build_skb()) made the problem even more apparent. The reason seems to be that if do_tcp_sendpages() fails on out of memory condition without being able to send at least one byte, tcp_push() is not called and the buffers cannot be flushed. After applying the attached patch, I cannot reproduce the stalls at all and the data rate it perfectly stable and steady under any condition which previously caused the problem to be permanent. The issue seems to have been there since before the kernel migrated to git, which makes me think that the stalls I occasionally experienced with tux during stress-tests years ago were probably related to the same issue. This issue was first encountered on 3.0.31 and 3.2.17, so please backport to -stable. Signed-off-by: Willy Tarreau <w@1wt.eu> Acked-by: Eric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org>
2012-05-17Merge branch '3.4-urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull two more target-core updates from Nicholas Bellinger: "The first patch addresses a SPC-2 reservations RELEASE bug in a special (iscsi specific) multi-ISID setup case that was allowing the same initiator to be able to incorrect release it's own reservation on a different SCSI path with enforce_pr_isid=1 operation. This bug was caught by Bernhard Kohl. The second patch is to address a bug with FILEIO backends where the incorrect number of blocks for READ_CAPACITY was being reported after an underlying device-mapper block_device size change. This patch uses now i_size_read() in fd_get_blocks() for FILEIO backends with an underlying block_device, instead of trying to determine this value at setup time during fd_create_virtdevice(). (hch CC'ed) Both are CC'ed to stable." * '3.4-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: Fix bug in handling of FILEIO + block_device resize ops target: Fix SPC-2 RELEASE bug for multi-session iSCSI client setups
2012-05-17target: Fix bug in handling of FILEIO + block_device resize opsNicholas Bellinger
This patch fixes a bug in the handling of FILEIO w/ underlying block_device resize operations where the original fd_dev->fd_dev_size was incorrectly being used in fd_get_blocks() for READ_CAPACITY response payloads. This patch avoids using fd_dev->fd_dev_size for FILEIO devices with an underlying block_device, and instead changes fd_get_blocks() to get the sector count directly from i_size_read() as recommended by hch. Reported-by: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-05-17Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull slave-dmaengine fixes fromVinod Koul: "fixes of cylic dma usages in slave dma drivers" * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: fix cyclic dma usage dmaengine: pl330: dont complete descriptor for cyclic dma
2012-05-17Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull last minute virtio fixes from Michael S. Tsirkin: "Here are a couple of last minute virtio fixes for 3.4. Hope it's not too late yes - I might have tried too hard to make sure the fix is well tested. Fixes are by Amit and myself. One fixes module removal and one suspend of a VM, the last one the handling of out of memory condition. They are thus very low risk as most people never hit these paths, but do fix very annoying problems for people that do use the feature. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_net: invoke softirqs after __napi_schedule virtio: balloon: let host know of updated balloon size before module removal virtio: console: tell host of open ports after resume from s3/s4
2012-05-17Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM: SoC fixes from Olof Johansson: "I will stop trying to predict when we're done with fixes for a release. Here's another small batch of three patches for arm-soc: - A fix for a boot time WARN_ON() due to irq domain conversion on PRIMA2 - Fix for a regression in Tegra SMP spinup code due to swapped register offsets - Fixed config dependency for mv_cesa crypto driver to avoid build breakage" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: PRIMA2: fix irq domain size and IRQ mask of internal interrupt controller crypto: mv_cesa requires on CRYPTO_HASH to build ARM: tegra: Fix flow controller accesses
2012-05-17Merge tag 'md-3.4-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull two md fixes from NeilBrown: "One fixes a bug in the new raid10 resize code so is relevant to 3.4 only. The other fixes a bug in the use of md by dm-raid, so is relevant to any kernel with dm-raid support" * tag 'md-3.4-fixes' of git://neil.brown.name/md: MD: Add del_timer_sync to mddev_suspend (fix nasty panic) md/raid10: set dev_sectors properly when resizing devices in array.
2012-05-17Merge branches 'perf-urgent-for-linus', 'x86-urgent-for-linus' and ↵Linus Torvalds
'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf, x86 and scheduler updates from Ingo Molnar. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing: Do not enable function event with enable perf stat: handle ENXIO error for perf_event_open perf: Turn off compiler warnings for flex and bison generated files perf stat: Fix case where guest/host monitoring is not supported by kernel perf build-id: Fix filename size calculation * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable x86: Fix section annotation of acpi_map_cpu2node() x86/microcode: Ensure that module is only loaded on supported Intel CPUs * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
2012-05-17ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn pathWill Deacon
Commit ff9a184c ("ARM: 7400/1: vfp: clear fpscr length and stride bits on entry to sig handler") flushes the VFP state prior to entering a signal handler so that a VFP operation inside the handler will trap and force a restore of ABI-compliant registers. Reflushing and disabling VFP on the sigreturn path is predicated on the saved thread state indicating that VFP was used by the handler -- however for SMP platforms this is only set on context-switch, making the check unreliable and causing VFP register corruption in userspace since the register values are not necessarily those restored from the sigframe. This patch unconditionally flushes the VFP state after a signal handler. Since we already perform the flush before the handler and the flushing itself happens lazily, the redundant flush when VFP is not used by the handler is essentially a nop. Reported-by: Jon Medhurst <tixy@linaro.org> Signed-off-by: Jon Medhurst <tixy@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-05-17ARM: 7418/1: LPAE: fix access flag setup in mem_type_tableVitaly Andrianov
A zero value for prot_sect in the memory types table implies that section mappings should never be created for the memory type in question. This is checked for in alloc_init_section(). With LPAE, we set a bit to mask access flag faults for kernel mappings. This breaks the aforementioned (!prot_sect) check in alloc_init_section(). This patch fixes this bug by first checking for a non-zero prot_sect before setting the PMD_SECT_AF flag. Signed-off-by: Vitaly Andrianov <vitalya@ti.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-05-17virtio_net: invoke softirqs after __napi_scheduleMichael S. Tsirkin
__napi_schedule might raise softirq but nothing causes do_softirq to trigger, so it does not in fact run. As a result, the error message "NOHZ: local_softirq_pending 08" sometimes occurs during boot of a KVM guest when the network service is started and we are oom: ... Bringing up loopback interface: [ OK ] Bringing up interface eth0: Determining IP information for eth0...NOHZ: local_softirq_pending 08 done. [ OK ] ... Further, receive queue processing might get delayed indefinitely until some interrupt triggers: virtio_net expected napi to be run immediately. One way to cause do_softirq to be executed is by invoking local_bh_enable(). As __napi_schedule is normally called from bh or irq context, this seems to make sense: disable bh before __napi_schedule and enable afterwards. In fact it's a very complicated way of calling do_softirq(), and works since this function is only used when we are not in interrupt context. It's not hot at all, in any ideal scenario. Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Tested-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
2012-05-17virtio: balloon: let host know of updated balloon size before module removalAmit Shah
When the balloon module is removed, we deflate the balloon, reclaiming all the pages that were given to the host. However, we don't update the config values for the new balloon size, resulting in the host showing outdated balloon values. The size update is done after each leak and fill operation, only the module removal case was left out. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-05-17virtio: console: tell host of open ports after resume from s3/s4Amit Shah
If a port was open before going into one of the sleep states, the port can continue normal operation after restore. However, the host has to be told that the guest side of the connection is open to restore pre-suspend state. This wasn't noticed so far due to a bug in qemu that was fixed recently (which marked the guest-side connection as always open). CC: stable@vger.kernel.org # Only for 3.3 Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>