summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2014-09-10pnfs/blocklayout: return layouts on setattrChristoph Hellwig
This speads up truncate-heavy workloads like fsx by multiple orders of magnitude. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: implement the return_range methodChristoph Hellwig
This allows removing extents from the extent tree especially on truncate operations, and thus fixing reads from truncated and re-extended that previously returned stale data. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: rewrite extent trackingChristoph Hellwig
Currently the block layout driver tracks extents in three separate data structures: - the two list of pnfs_block_extent structures returned by the server - the list of sectors that were in invalid state but have been written to - a list of pnfs_block_short_extent structures for LAYOUTCOMMIT All of these share the property that they are not only highly inefficient data structures, but also that operations on them are even more inefficient than nessecary. In addition there are various implementation defects like: - using an int to track sectors, causing corruption for large offsets - incorrect normalization of page or block granularity ranges - insufficient error handling - incorrect synchronization as extents can be modified while they are in use This patch replace all three data with a single unified rbtree structure tracking all extents, as well as their in-memory state, although we still need to instance for read-only and read-write extent due to the arcane client side COW feature in the block layouts spec. To fix the problem of extent possibly being modified while in use we make sure to return a copy of the extent for use in the write path - the extent can only be invalidated by a layout recall or return which has to wait until the I/O operations finished due to refcounts on the layout segment. The new extent tree work similar to the schemes used by block based filesystems like XFS or ext4. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: don't set pages uptodateChristoph Hellwig
The core nfs code handles setting pages uptodate on reads, no need to mess with the pageflags outselves. Also remove a debug function to dump page flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelistChristoph Hellwig
Use the new PNFS_READ_WHOLE_PAGE flag to offload read-modify-write handling to core nfs code, and remove a huge chunk of deadlock prone mess from the block layout writeback path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs: add return_range methodChristoph Hellwig
If a layout driver keeps per-inode state outside of the layout segments it needs to be notified of any layout returns or recalls on an inode, and not just about the freeing of layout segments. Add a method to acomplish this, which will allow the block layout driver to handle the case of truncated and re-expanded files properly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs: add flag to force read-modify-write in ->write_beginChristoph Hellwig
Like all block based filesystems, the pNFS block layout driver can't read or write at a byte granularity and thus has to perform read-modify-write cycles on writes smaller than this granularity. Add a flag so that the core NFS code always reads a whole page when starting a smaller write, so that we can do it in the place where the VFS expects it instead of doing in very deadlock prone way in the writeback handler. Note that in theory we could do less than page size reads here for disks that have a smaller sector size which are served by a server with a smaller pnfs block size. But so far that doesn't seem like a worthwhile optimization. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs: force a layout commit when encountering busy segments during recallChristoph Hellwig
Expedite layout recall processing by forcing a layout commit when we see busy segments. Without it the layout recall might have to wait until the VM decided to start writeback for the file, which can introduce long delays. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10NFS: Fix a compile warning when !(CONFIG_NFS_V3 || CONFIG_NFS_V4)Trond Myklebust
gcc reports: linux/fs/nfs/write.c: In function ‘nfs_page_find_head_request_locked.isra.17’: linux/fs/nfs/write.c:121:64: warning: ‘cinfo.mds’ may be used uninitialized in this function [-Wmaybe-uninitialized] list_for_each_entry_safe(freq, t, &cinfo.mds->list, wb_list) { ^ linux/fs/nfs/write.c:110:25: note: ‘cinfo.mds’ was declared here struct nfs_commit_info cinfo; Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com> Cc: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: correctly decrement extent lengthChristoph Hellwig
When we do non-page sized reads we can underflow the extent_length variable and read incorrect data. Fix the extent_length calculation and change to defensive <= checks for the extent length in the read and write path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: plug block queuesChristoph Hellwig
Make sure the block queue is plugged when performing pNFS blocklayout I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: improve GETDEVICEINFO error reportingChristoph Hellwig
Tell userspace what stage of GETDEVICEINFO failed so that there is a chance to debug it, especially with the userspace daemon clusterf***k in the block layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs/blocklayout: reject pnfs blocksize larger than page sizeChristoph Hellwig
The Linux VM subsystem can't support block sizes larger than page size for block based filesystems very well. While this can be hacked around to some extent for simple filesystems the read-modify-write cycles required for pnfs block invalid extents are extremly deadlock prone when operating on multiple pages. Reject this case early on instead of pretending to support it (badly). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs: allow splicing pre-encoded pages into the layoutcommit argsChristoph Hellwig
Currently there is no XDR buffer space allocated for the per-layout driver layoutcommit payload, which leads to server buffer overflows in the blocklayout driver even under simple workloads. As we can't do per-layout sizes for XDR operations we'll have to splice a previously encoded list of pages into the XDR stream, similar to how we handle ACL buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs: avoid using stale stateids after layoutreturnChristoph Hellwig
After we issued a layoutreturn operations the may free the layout stateid and will thus cause bad stateid error when the client uses it again. We currently try to avoid this case by chosing the open stateid if not lsegs are present for this inode. But various places can hold refererence on lsegs and thus cause the list not to be empty shortly after a layout return. Add an explicit flag to mark the current layout stateid invalid and force usage of the openstateid after we did a full file layoutreturn. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs: retry after a bad stateid error from layoutgetChristoph Hellwig
Currently we fall through to nfs4_async_handle_error when we get a bad stateid error back from layoutget. nfs4_async_handle_error with a NULL state argument will never retry the operations but return the error to higher layer, causing an avoiable fallback to MDS I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs: don't check sequence on new stateids in layoutgetChristoph Hellwig
When layoutget returns an entirely new layout stateid it should not check the generation counter as the new stateid will start with a new counter entirely unrelated to old one. The current behavior causes constant layoutget failures against a block server which allocates a new stateid after an recall that removed all outstanding layouts. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs: do not pass uninitialized lsegs to ->free_lsegChristoph Hellwig
Ensure the lsegs are initialized early so that we don't pass an unitialized one back to ->free_lseg during error processing. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10nfs: cap request size to fit a kmalloced page arrayChristoph Hellwig
pNFS servers may return arbitrarily large layouts. Trim back the I/O size to one that we can at least allocate the page array for. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10nfs/filelayout: set layoutcommit depending on write verifierPeng Tao
Following http://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751 Don't set layoutcommit for commit_through_mds case. For FILE_SYNC writes, don't set layoutcommit. For DATA_SYNC wirtes, set layout commit right after wirtes done. For UNSTABLE writes, set layout commit when commit done. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10nfs41: add a helper function to set layoutcommit after commitPeng Tao
Track lwb in nfs_commit_data so that we can use it to setup layoutcommit in commit_done callback. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10NFS: Clear up state owner lock usageAnna Schumaker
can_open_cached() reads values out of the state structure, meaning that we need the so_lock to have a correct return value. As a bonus, this helps clear up some potentially confusing code. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10rpc: xs_bind - do not bind when requesting a random ephemeral portChris Perl
When attempting to establish a local ephemeral endpoint for a TCP or UDP socket, do not explicitly call bind, instead let it happen implicilty when the socket is first used. The main motivating factor for this change is when TCP runs out of unique ephemeral ports (i.e. cannot find any ephemeral ports which are not a part of *any* TCP connection). In this situation if you explicitly call bind, then the call will fail with EADDRINUSE. However, if you allow the allocation of an ephemeral port to happen implicitly as part of connect (or other functions), then ephemeral ports can be reused, so long as the combination of (local_ip, local_port, remote_ip, remote_port) is unique for TCP sockets on the system. This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP sockets the same. This can allow mount.nfs(8) to continue to function successfully, even in the face of misbehaving applications which are creating a large number of TCP connections. Signed-off-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10pnfs: fix filelayout_retry_commit when idx > 0Weston Andros Adamson
filelayout_retry_commit was recently split out from alloc_ds_commits, but was done in such a way that the bucket pointer always starts at index 0 no matter what the @idx argument is set to. The intention of the @idx argument is to retry commits starting at bucket @idx. This is called when alloc_ds_commits fails for a bucket. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-08nfs: revert "nfs4: queue free_lock_state job submission to nfsiod"Jeff Layton
This reverts commit 49a4bda22e186c4d0eb07f4a36b5b1a378f9398d. Christoph reported an oops due to the above commit: generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1] SMP [ 2187.042899] Modules linked in: [ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151 [ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 2187.044287] Workqueue: nfsiod free_lock_state_work [ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000 [ 2187.044287] RIP: 0010:[<ffffffff81361ca6>] [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 [ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296 [ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000 [ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8 [ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000 [ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240 [ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000 [ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0 [ 2187.044287] Stack: [ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258 [ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0 [ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240 [ 2187.044287] Call Trace: [ 2187.044287] [<ffffffff810cc877>] process_one_work+0x1c7/0x490 [ 2187.044287] [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490 [ 2187.044287] [<ffffffff810cd569>] worker_thread+0x119/0x4f0 [ 2187.044287] [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10 [ 2187.044287] [<ffffffff810cd450>] ? init_pwq+0x190/0x190 [ 2187.044287] [<ffffffff810d3c6f>] kthread+0xdf/0x100 [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 [ 2187.044287] [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0 [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 [ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3 [ 2187.044287] RIP [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 [ 2187.044287] RSP <ffff88007a4efd58> [ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]--- The original reason for this patch was because the fl_release_private operation couldn't sleep. With commit ed9814d85810 (locks: defer freeing locks in locks_delete_lock until after i_lock has been dropped), this is no longer a problem so we can revert this patch. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-08nfs: fix kernel warning when removing proc entryCong Wang
I saw the following kernel warning: [ 1852.321222] ------------[ cut here ]------------ [ 1852.326527] WARNING: CPU: 0 PID: 118 at fs/proc/generic.c:521 remove_proc_entry+0x154/0x16b() [ 1852.335630] remove_proc_entry: removing non-empty directory 'fs/nfsfs', leaking at least 'volumes' [ 1852.344084] CPU: 0 PID: 118 Comm: kworker/u8:2 Not tainted 3.16.0+ #540 [ 1852.350036] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 1852.354992] Workqueue: netns cleanup_net [ 1852.358701] 0000000000000000 ffff880116f2fbd0 ffffffff819c03e9 ffff880116f2fc18 [ 1852.366474] ffff880116f2fc08 ffffffff810744ee ffffffff811e0e6e ffff8800d4e96238 [ 1852.373507] ffffffff81dbe665 ffff8800d46a5948 0000000000000005 ffff880116f2fc68 [ 1852.380224] Call Trace: [ 1852.381976] [<ffffffff819c03e9>] dump_stack+0x4d/0x66 [ 1852.385495] [<ffffffff810744ee>] warn_slowpath_common+0x7a/0x93 [ 1852.389869] [<ffffffff811e0e6e>] ? remove_proc_entry+0x154/0x16b [ 1852.393987] [<ffffffff8107457b>] warn_slowpath_fmt+0x4c/0x4e [ 1852.397999] [<ffffffff811e0e6e>] remove_proc_entry+0x154/0x16b [ 1852.402034] [<ffffffff8129c73d>] nfs_fs_proc_net_exit+0x53/0x56 [ 1852.406136] [<ffffffff812a103b>] nfs_net_exit+0x12/0x1d [ 1852.409774] [<ffffffff81785bc9>] ops_exit_list+0x44/0x55 [ 1852.413529] [<ffffffff81786389>] cleanup_net+0xee/0x182 [ 1852.417198] [<ffffffff81088c9e>] process_one_work+0x209/0x40d [ 1852.502320] [<ffffffff81088bf7>] ? process_one_work+0x162/0x40d [ 1852.587629] [<ffffffff810890c1>] worker_thread+0x1f0/0x2c7 [ 1852.673291] [<ffffffff81088ed1>] ? process_scheduled_works+0x2f/0x2f [ 1852.759470] [<ffffffff8108e079>] kthread+0xc9/0xd1 [ 1852.843099] [<ffffffff8109427f>] ? finish_task_switch+0x3a/0xce [ 1852.926518] [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61 [ 1853.008565] [<ffffffff819cbeac>] ret_from_fork+0x7c/0xb0 [ 1853.076477] [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61 [ 1853.140653] ---[ end trace 69c4c6617f78e32d ]--- It looks wrong that we add "/proc/net/nfsfs" in nfs_fs_proc_net_init() while remove "/proc/fs/nfsfs" in nfs_fs_proc_net_exit(). Fixes: commit 65b38851a17 (NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes) Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Dan Aloni <dan@kernelim.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> [Trond: replace uses of remove_proc_entry() with remove_proc_subtree() as suggested by Al Viro] Cc: stable@vger.kernel.org # 3.4.x : 65b38851a17: NFS: Fix /proc/fs/nfsfs/servers Cc: stable@vger.kernel.org # 3.4.x Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-07Linux 3.17-rc4v3.17-rc4Linus Torvalds
2014-09-07Documentation: new page link in SubmittingPatchesSudip Mukherjee
new link for - How to piss off a Linux kernel subsystem maintainer Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-07Documentation: NFS/RDMA: Document separate Kconfig symbolsPaul Bolle
The NFS/RDMA Kconfig symbol was split into separate options for client and server in commit 2e8c12e1b765 ("xprtrdma: add separate Kconfig options for NFSoRDMA client and server support"). Update the documentation to reflect this split. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "J. Bruce Fields" <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-07Documentation: misc-devices: Rename freefall.c from hpfall.c in lis2lv02dMasanari Iida
hpfall.c was renamed to freefall.c in 3.16, but this file still refer to hpfall.c instead of freefall.c Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-07Documentation: i2c: rename variable "register" to "reg"Jose Manuel Alarcon Roldan
The example code provided with the i2c device interface documentation won't compile since it uses the reserved word "register" to name a variable. The compiler fails with this error message: error: expected identifier or '(' before '=' token __u8 register = 0x20; /* Device register to access */ ^ Rename the variable "register" to simply "reg" in the example code. Another couple of typos has been fixed as well. [Change "! =" to "!=".] Signed-off-by: Jose Alarcon Roldan <jose.alarcon.roldan@gmail.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-07Documentation: seq_file: Document seq_open_private(), seq_release_private()Rob Jones
Despite the fact that these functions have been around for years, they are little used (only 15 uses in 13 files at the preseht time) even though many other files use work-arounds to achieve the same result. By documenting them, hopefully they will become more widely used. Signed-off-by: Rob Jones <rob.jones@codethink.co.uk> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-07Merge tag 'pm+acpi-3.17-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "These are regression fixes (ACPI sysfs, ACPI video, suspend test), ACPI cpuidle deadlock fix, missing runtime validation of ACPI _DSD output, a fix and a new CPU ID for the RAPL driver, new blacklist entry for the ACPI EC driver and a couple of trivial cleanups (intel_pstate and generic PM domains). Specifics: - Fix for recently broken test_suspend= command line argument (Rafael Wysocki). - Fixes for regressions related to the ACPI video driver caused by switching the default to native backlight handling in 3.16 from Hans de Goede. - Fix for a sysfs attribute of ACPI device objects that returns stale values sometimes due to the fact that they are cached instead of executing the appropriate method (_SUN) every time (broken in 3.14). From Yasuaki Ishimatsu. - Fix for a deadlock between cpuidle_lock and cpu_hotplug.lock in the ACPI processor driver from Jiri Kosina. - Runtime output validation for the ACPI _DSD device configuration object missing from the support for it that has been introduced recently. From Mika Westerberg. - Fix for an unuseful and misleading RAPL (Running Average Power Limit) domain detection message in the RAPL driver from Jacob Pan. - New Intel Haswell CPU ID for the RAPL driver from Jason Baron. - New Clevo W350etq blacklist entry for the ACPI EC driver from Lan Tianyu. - Cleanup for the intel_pstate driver and the core generic PM domains code from Gabriele Mazzotta and Geert Uytterhoeven" * tag 'pm+acpi-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock ACPI / scan: not cache _SUN value in struct acpi_device_pnp cpufreq: intel_pstate: Remove unneeded variable powercap / RAPL: change domain detection message powercap / RAPL: add support for CPU model 0x3f PM / domains: Make generic_pm_domain.name const PM / sleep: Fix test_suspend= command line option ACPI / EC: Add msi quirk for Clevo W350etq ACPI / video: Disable native_backlight on HP ENVY 15 Notebook PC ACPI / video: Add a disable_native_backlight quirk ACPI / video: Fix use_native_backlight selection logic ACPICA: ACPI 5.1: Add support for runtime validation of _DSD package.
2014-09-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull filesystem fixes from Al Viro: "Several bugfixes (all of them -stable fodder). Alexey's one deals with double mutex_lock() in UFS (apparently, nobody has tried to test "ufs: sb mutex merge + mutex_destroy" on something like file creation/removal on ufs). Mine deal with two kinds of umount bugs, in umount propagation and in handling of automounted submounts, both resulting in bogus transient EBUSY from umount" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs: fix deadlocks introduced by sb mutex merge fix EBUSY on umount() from MNT_SHRINKABLE get rid of propagate_umount() mistakenly treating slaves as busy.
2014-09-07Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fix from Ingo Molnar: "A boot hang fix for the offloaded callback RCU model (RCU_NOCB_CPU=y && (TREE_CPU=y || TREE_PREEMPT_RC)) in certain bootup scenarios" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Make nocb leader kthreads process pending callbacks after spawning
2014-09-07Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Three fixlets from the timer departement: - Update the timekeeper before updating vsyscall and pvclock. This fixes the kvm-clock regression reported by Chris and Paolo. - Use the proper irq work interface from NMI. This fixes the regression reported by Catalin and Dave. - Clarify the compat_nanosleep error handling mechanism to avoid future confusion" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Update timekeeper before updating vsyscall and pvclock compat: nanosleep: Clarify error handling nohz: Restore NMI safe local irq work for local nohz kick
2014-09-07ufs: fix deadlocks introduced by sb mutex mergeAlexey Khoroshilov
Commit 0244756edc4b ("ufs: sb mutex merge + mutex_destroy") introduces deadlocks in ufs_new_inode() and ufs_free_inode(). Most callers of that functions acqure the mutex by themselves and ufs_{new,free}_inode() do that via lock_ufs(), i.e we have an unavoidable double lock. The patch proposes to resolve the issue by making sure that ufs_{new,free}_inode() are not called with the mutex held. Found by Linux Driver Verification project (linuxtesting.org). Cc: stable@vger.kernel.org # 3.16 Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "A smattering of bug fixes across most architectures" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: powerpc/kvm/cma: Fix panic introduces by signed shift operation KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags KVM: s390/mm: Fix storage key corruption during swapping arm/arm64: KVM: Complete WFI/WFE instructions ARM/ARM64: KVM: Nuke Hyp-mode tlbs before enabling MMU KVM: s390/mm: try a cow on read only pages for key ops KVM: s390: Fix user triggerable bug in dead code
2014-09-06Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Kevin Hilman: "Another round of fixes from arm-soc land, which are mostly DT fixes for: - OMAP: handful of DT fixes devices on newly supported hardware - davinci: fix 2nd EDMA channel - ux500: extend previous pinctrl fix to another board - at91: clock registration fixes, compatibility string precision And one more fix for event cleanup in drivers/bus/arm-ccn" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: bus: arm-ccn: Move event cleanup routine ARM: at91/dt: rm9200: fix usb clock definition ARM: at91: rm9200: fix clock registration ARM: at91/dt: sam9g20: set at91sam9g20 pllb driver ARM: dts: dra7-evm: Add vtt regulator support ARM: dts: dra7-evm: Fix spi1 mux documentation ARM: dts: am43x-epos-evm: Disable QSPI to prevent conflict with GPMC-NAND ARM: OMAP2+: gpmc: Don't complain if wait pin is used without r/w monitoring ARM: dts: am43xx-epos-evm: Don't use read/write wait monitoring ARM: dts: am437x-gp-evm: Don't use read/write wait monitoring ARM: dts: am437x-gp-evm: Use BCH16 ECC scheme instead of BCH8 ARM: dts: am43x-epos-evm: Use BCH16 ECC scheme instead of BCH8 ARM: dts: am4372: fix USB regs size ARM: dts: am437x-gp: switch i2c0 to 100KHz ARM: dts: dra7-evm: Fix 8th NAND partition's name ARM: dts: dra7-evm: Fix i2c3 pinmux and frequency ARM: ux500: disable msp2 node on Snowball ARM: edma: Fix configuration parsing for SoCs with multiple eDMA3 CC ARM: dts: set 'ti,set-rate-parent' for dpll4_m5x2 clock
2014-09-06Merge tag 'xfs-for-linus-3.17-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "The fixes all address recently discovered data corruption issues. The original Direct IO issue was discovered by Chris Mason @ Facebook on a production workload which mixed buffered reads with direct reads and writes IO to the same file. The fix for that exposed other issues with page invalidation (exposed by millions of fsx operations) failing due to dirty buffers beyond EOF. Finally, the collapse_range code could also cause problems due to racing writeback changing the extent map while it was being shifted around. The commits for that problem are simple mitigation fixes that prevent the problem from occuring. A more robust fix for 3.18 that addresses the underlying problem is currently being worked on by Brian. Summary of fixes: - a direct IO read/buffered read data corruption - the associated fallout from the DIO data corruption fix - collapse range bugs that are potential data corruption issues" * tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: trim eofblocks before collapse range xfs: xfs_file_collapse_range is delalloc challenged xfs: don't log inode unless extent shift makes extent modifications xfs: use ranged writeback and invalidation for direct IO xfs: don't zero partial page cache pages during O_DIRECT writes xfs: don't zero partial page cache pages during O_DIRECT writes xfs: don't dirty buffers beyond EOF
2014-09-06Merge tag 'for-linus-20140905' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull mtd fixes from Brian Norris: "Two trivial MTD updates for 3.17-rc4: - a tiny comment tweak, to kill a bunch of DocBook warnings added during the merge window - a small fixup to the OTP routines' error handling" * tag 'for-linus-20140905' of git://git.infradead.org/linux-mtd: mtd: nand: fix DocBook warnings on nand_sdr_timings doc mtd: cfi_cmdset_0002: check return code for get_chip()
2014-09-06timekeeping: Update timekeeper before updating vsyscall and pvclockThomas Gleixner
The update_walltime() code works on the shadow timekeeper to make the seqcount protected region as short as possible. But that update to the shadow timekeeper does not update all timekeeper fields because it's sufficient to do that once before it becomes life. One of these fields is tkr.base_mono. That stays stale in the shadow timekeeper unless an operation happens which copies the real timekeeper to the shadow. The update function is called after the update calls to vsyscall and pvclock. While not correct, it did not cause any problems because none of the invoked update functions used base_mono. commit cbcf2dd3b3d4 (x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based) changed that in the kvm pvclock update function, so the stale mono_base value got used and caused kvm-clock to malfunction. Put the update where it belongs and fix the issue. Reported-by: Chris J Arges <chris.j.arges@canonical.com> Reported-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-09-06compat: nanosleep: Clarify error handlingThomas Gleixner
The error handling in compat_sys_nanosleep() is correct, but completely non obvious. Document it and restrict it to the -ERESTART_RESTARTBLOCK return value for clarity. Reported-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-09-05Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c bugfixes from Wolfram Sang: "I2C driver bugfixes for the 3.17 release. Details can be found in the commit messages, yet I think this is typical driver stuff" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Revert "i2c: rcar: remove spinlock" i2c: at91: add bound checking on SMBus block length bytes i2c: rk3x: fix bug that cause transfer fails in master receive mode i2c: at91: Fix a race condition during signal handling in at91_do_twi_xfer. i2c: mv64xxx: continue probe when clock-frequency is missing i2c: rcar: fix MNR interrupt handling
2014-09-05Merge tag 'at91-fixes' of git://github.com/at91linux/linux-at91 into fixesKevin Hilman
Merge "at91: fixes for 3.17 #1" from Nicols Ferre: First AT91 fixes batch for 3.17: - compatibility string precision - clock registration and USB DT fix for at91rm9200 * tag 'at91-fixes' of git://github.com/at91linux/linux-at91: ARM: at91/dt: rm9200: fix usb clock definition ARM: at91: rm9200: fix clock registration ARM: at91/dt: sam9g20: set at91sam9g20 pllb driver Signed-off-by: Kevin Hilman <khilman@linaro.org>
2014-09-05bus: arm-ccn: Move event cleanup routinePawel Moll
The function cleaning up an initialized event was called from the "event_del" handler, instead of being used as the "destroy" callback. In case of events group allocation this caused NULL pointer dereference (as events are added and deleted multiple times then). Fixed now. Signed-off-by: Pawel Moll <mail@pawelmoll.com> Signed-off-by: Kevin Hilman <khilman@linaro.org>
2014-09-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: "Wire up new syscalls getrandom and memfd_create" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: Wire up memfd_create m68k: Wire up getrandom
2014-09-05ARM: at91/dt: rm9200: fix usb clock definitionAlexandre Belloni
The atmel,clk-divisors property is taking 4 divisors, if less are provided, the clock registration will fail. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2014-09-05ARM: at91: rm9200: fix clock registrationAlexandre Belloni
Actually register clocks from device tree when using the common clock framework. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> [nicolas.ferre@atmel.com: add at91 to function name] Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2014-09-05ARM: at91/dt: sam9g20: set at91sam9g20 pllb driverGaël PORTAY
The at91sam9g20 SOC uses its own pllb implementation which is different from the one inherited from at91sam9260 SOC. Signed-off-by: Gaël PORTAY <gael.portay@gmail.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>