path: root/fs
Age | Commit message | Author
2011-11-22 | debugfs: bugfix: include <linux/io.h> in file.c | Alessandro Rubini
The regs32 machinery uses readl. I forgot the mandatory include and the code was not compiling on all archs. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Alessandro Rubini <rubini@gnudd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-22 | mount_subtree() pointless use-after-free | Al Viro
d'oh... we'd carefully pinned mnt->mnt_sb down, dropped mnt, and then attempted to grab s_umount on mnt->mnt_sb. The trouble is, *mnt might've been overwritten by now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-22 | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Revert pnfs ugliness from the generic NFS read code path
  SUNRPC: destroy freshly allocated transport in case of sockaddr init error
  NFS: Fix a regression in the referral code
  nfs: move nfs_file_operations declaration to bottom of file.c (try #2)
  nfs: when attempting to open a directory, fall back on normal lookup (try #5)
2011-11-22 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: remove free-space-cache.c WARN during log replay
  Btrfs: sectorsize align offsets in fiemap
  Btrfs: clear pages dirty for io and set them extent mapped
  Btrfs: wait on caching if we're loading the free space cache
  Btrfs: prefix resize related printks with btrfs:
  btrfs: fix stat blocks accounting
  Btrfs: avoid unnecessary bitmap search for cluster setup
  Btrfs: fix to search one more bitmap for cluster setup
  btrfs: mirror_num should be int, not u64
  btrfs: Fix up 32/64-bit compatibility for new ioctls
  Btrfs: fix barrier flushes
  Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush
2011-11-22 | ext3: NULL dereference in ext3_evict_inode() | Dan Carpenter
This is an fsfuzzer bug. ->s_journal is set at the end of ext3_load_journal() but we try to use it in the error handling from ext3_get_journal() while it's still NULL.
[ 337.039041] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
[ 337.040380] IP: [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
[ 337.041687] PGD 0
[ 337.043118] Oops: 0002 [#1] SMP
[ 337.044483] CPU 3
[ 337.044495] Modules linked in: ecb md4 cifs fuse kvm_intel kvm brcmsmac brcmutil crc8 cordic r8169 [last unloaded: scsi_wait_scan]
[ 337.047633]
[ 337.049259] Pid: 8308, comm: mount Not tainted 3.2.0-rc2-next-20111121+ #24 SAMSUNG ELECTRONICS CO., LTD. RV411/RV511/E3511/S3511 /RV411/RV511/E3511/S3511
[ 337.051064] RIP: 0010:[<ffffffff816e6539>] [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
[ 337.052879] RSP: 0018:ffff8800b1d11ae8 EFLAGS: 00010282
[ 337.054668] RAX: 0000000000000100 RBX: 0000000000000000 RCX: ffff8800b77c2000
[ 337.056400] RDX: ffff8800a97b5c00 RSI: 0000000000000000 RDI: 0000000000000024
[ 337.058099] RBP: ffff8800b1d11ae8 R08: 6000000000000000 R09: e018000000000000
[ 337.059841] R10: ff67366cc2607c03 R11: 00000000110688e6 R12: 0000000000000000
[ 337.061607] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800a78f06e8
[ 337.063385] FS: 00007f9d95652800(0000) GS:ffff8800b7180000(0000) knlGS:0000000000000000
[ 337.065110] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 337.066801] CR2: 0000000000000024 CR3: 00000000aef2c000 CR4: 00000000000006e0
[ 337.068581] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 337.070321] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 337.072105] Process mount (pid: 8308, threadinfo ffff8800b1d10000, task ffff8800b1d02be0)
[ 337.073800] Stack:
[ 337.075487] ffff8800b1d11b08 ffffffff811f48cf ffff88007ac9b158 0000000000000000
[ 337.077255] ffff8800b1d11b38 ffffffff8119405d ffff88007ac9b158 ffff88007ac9b250
[ 337.078851] ffffffff8181bda0 ffffffff8181bda0 ffff8800b1d11b68 ffffffff81131e31
[ 337.080284] Call Trace:
[ 337.081706] [<ffffffff811f48cf>] log_start_commit+0x1f/0x40
[ 337.083107] [<ffffffff8119405d>] ext3_evict_inode+0x1fd/0x2a0
[ 337.084490] [<ffffffff81131e31>] evict+0xa1/0x1a0
[ 337.085857] [<ffffffff81132031>] iput+0x101/0x210
[ 337.087220] [<ffffffff811339d1>] iget_failed+0x21/0x30
[ 337.088581] [<ffffffff811905fc>] ext3_iget+0x15c/0x450
[ 337.089936] [<ffffffff8118b0c1>] ? ext3_rsv_window_add+0x81/0x100
[ 337.091284] [<ffffffff816df9a4>] ext3_get_journal+0x15/0xde
[ 337.092641] [<ffffffff811a2e9b>] ext3_fill_super+0xf2b/0x1c30
[ 337.093991] [<ffffffff810ddf7d>] ? register_shrinker+0x4d/0x60
[ 337.095332] [<ffffffff8111c112>] mount_bdev+0x1a2/0x1e0
[ 337.096680] [<ffffffff811a1f70>] ? ext3_setup_super+0x210/0x210
[ 337.098026] [<ffffffff8119a770>] ext3_mount+0x10/0x20
[ 337.099362] [<ffffffff8111cbee>] mount_fs+0x3e/0x1b0
[ 337.100759] [<ffffffff810eda1b>] ? __alloc_percpu+0xb/0x10
[ 337.102330] [<ffffffff81135385>] vfs_kern_mount+0x65/0xc0
[ 337.103889] [<ffffffff8113611f>] do_kern_mount+0x4f/0x100
[ 337.105442] [<ffffffff811378fc>] do_mount+0x19c/0x890
[ 337.106989] [<ffffffff810e8456>] ? memdup_user+0x46/0x90
[ 337.108572] [<ffffffff810e84f3>] ? strndup_user+0x53/0x70
[ 337.110114] [<ffffffff811383fb>] sys_mount+0x8b/0xe0
[ 337.111617] [<ffffffff816ed93b>] system_call_fastpath+0x16/0x1b
[ 337.113133] Code: 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b 5d c3 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 <f0> 66 0f c1 07 0f b6 d4 38 c2 74 0c 0f 1f 00 f3 90 0f b6 07 38
[ 337.116588] RIP [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
[ 337.118260] RSP <ffff8800b1d11ae8>
[ 337.119998] CR2: 0000000000000024
[ 337.188701] ---[ end trace c36d790becac1615 ]---
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2011-11-22 | GFS2: Fix multi-block allocation | Steven Whitehouse
Clean up gfs2_alloc_blocks so that it takes the full extent length rather than just the number of non-inode blocks as an argument. That will only make a difference in the inode allocation case for now. Also, this fixes the extent length handling around gfs2_alloc_extent() so that multi block allocations will work again. The rd_last_alloc block is set to the final block in the allocated extent (as per the update to i_goal, but referenced to a different start point). This also removes the dinode argument to rgblk_search() which is no longer used. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22 | GFS2: decouple quota allocations from block allocations | Bob Peterson
This patch separates the code pertaining to allocations into two parts: quota-related information and block reservations. This patch also moves all the block reservation structure allocations to function gfs2_inplace_reserve to simplify the code, and moves the frees to function gfs2_inplace_release. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22 | UBIFS: Use kmemdup rather than duplicating its implementation | Thomas Meyer
The semantic patch that makes this change is available in scripts/coccinelle/api/memdup.cocci. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2011-11-22 | jbd: clear revoked flag on buffers before a new transaction started | Yongqiang Yang
Currently, we clear the revoked flag only when a block is reused. However, this can trigger a false journal error. Consider a situation when a block is used as a meta block and is deleted (revoked) in ordered mode, then the block is allocated as a data block to a file. At this moment, the user changes the file's journal mode from ordered to journaled and truncates the file. The block will be considered re-revoked by the journal because it has the revoked flag still pending from the last transaction, and an assertion triggers. We fix the problem by keeping the revoked status more up to date: we clear the revoked flag when switching revoke tables, to reflect that there are no revoked buffers in the current transaction any more. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2011-11-21 | freezer: implement and use kthread_freezable_should_stop() | Tejun Heo
Writeback and thinkpad_acpi have been using thaw_process() to prevent deadlock between the freezer and kthread_stop(); unfortunately, this is inherently racy - nothing prevents freezing from happening between thaw_process() and kthread_stop(). This patch implements kthread_freezable_should_stop() which enters refrigerator if necessary but is guaranteed to return if kthread_stop() is invoked. Both thaw_process() users are converted to use the new function. Note that this deadlock condition exists for many freezable kthreads. They need to be converted to use the new should_stop or freezable workqueue. Tested with synthetic test case. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br> Cc: Jens Axboe <axboe@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com>
2011-11-21 | freezer: unexport refrigerator() and update try_to_freeze() slightly | Tejun Heo
There is no reason to export two functions for entering the refrigerator. Calling refrigerator() instead of try_to_freeze() doesn't save anything noticeable or remove any race condition.
* Rename refrigerator() to __refrigerator() and make it return bool indicating whether it scheduled out for freezing.
* Update try_to_freeze() to return bool and relay the return value of __refrigerator() if freezing().
* Convert all refrigerator() users to try_to_freeze().
* Update documentation accordingly.
* While at it, add might_sleep() to try_to_freeze().
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@infradead.org>
2011-11-21 | Merge branch 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds
* 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix up a undefined error in ext4_free_blocks in debugging code
  ext4: add blk_finish_plug in error case of writepages.
  ext4: Remove kernel_lock annotations
  ext4: ignore journalled data options on remount if fs has no journal
2011-11-21 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: Allocate larger oid buffer in request msgs
  ceph: initialize root dentry
  ceph: fix iput race when queueing inode work
2011-11-21 | Btrfs: remove free-space-cache.c WARN during log replay | Chris Mason
The log replay code only partially loads block groups, since the block group caching code is able to detect and deal with extents the logging code has pinned down. While the logging code is pinning down block groups, there is a bogus WARN_ON we're hitting if the code wasn't able to find an extent in the cache. This commit removes the warning because it can happen any time there isn't a valid free space cache for that block group. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-21 | ext4: fix up a undefined error in ext4_free_blocks in debugging code | Yongqiang Yang
sbi is not defined, so let ext4_free_blocks use EXT4_SB(sb) instead when EXT4FS_DEBUG is defined. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
2011-11-21 | GFS2: split function rgblk_search | Bob Peterson
This patch splits function rgblk_search into a function that finds blocks to allocate (rgblk_search) and a function that assigns those blocks (gfs2_alloc_extent). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@rehat.com>
2011-11-21 | GFS2: Fix up "off by one" in the previous patch | Steven Whitehouse
The trace point should take extlen and not *ndata as the extent length. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-21 | GFS2: move toward a generic multi-block allocator | Bob Peterson
This patch is a revision of the one I previously posted. I tried to integrate all the suggestions Steve gave. The purpose of the patch is to change function gfs2_alloc_block (allocate either a dinode block or an extent of data blocks) to a more generic gfs2_alloc_blocks function that can allocate both a dinode _and_ an extent of data blocks in the same call. This will ultimately help us create a multi-block reservation scheme to reduce file fragmentation. This patch moves more toward a generic multi-block allocator that takes a pointer to the number of data blocks to allocate, plus whether or not to allocate a dinode. In theory, it could be called to allocate (1) a single dinode block, (2) a group of one or more data blocks, or (3) a dinode plus several data blocks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-21 | GFS2: O_(D)SYNC support for fallocate | Steven Whitehouse
Add sync of metadata after fallocate for O_SYNC files to ensure that we meet expectations for everything being on disk in this case. Unfortunately, the offset and len parameters are modified during the course of the fallocate function, so I've had to add a couple of new variables to call generic_write_sync() at the end. I know that potentially this will sync data as well within the range, but I think that is a fairly harmless side-effect overall, since we would not normally expect there to be any dirty data within the range in question. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Benjamin Marzinski <bmarzins@redhat.com>
2011-11-20 | VFS: Log the fact that we've given ELOOP rather than creating a loop | David Howells
To prevent an NFS server from being used to create a directory loop in an NFS superblock on the client, the following patch was committed: commit 1836750115f20b774e55c032a3893e8c5bdf41ed Author: Al Viro <viro@zeniv.linux.org.uk> Date: Tue Jul 12 21:42:24 2011 -0400 Subject: fix loop checks in d_materialise_unique() This causes ELOOP to be reported to anyone trying to access the dentry that would otherwise cause the kernel to complete the loop. However, no indication is given to the caller as to why an operation that ought to work doesn't. The fault is with the kernel, which doesn't want to try and solve the problem as it gets horrendously messy if there's another mountpoint somewhere in the trees being spliced that can't be moved[*]. [*] The real problem is that we don't handle the excision of a subtree that gets moved _out_ of what we can see. This can happen on the server where a directory is merely moved between two other dirs on the same filesystem, but where destination dir is not accessible by the client. So, given the choice to return ELOOP rather than trying to reconfigure the dentry tree, we should give the caller some indication of why they aren't being allowed to make what should be a legitimate request and log a message. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-20 | qnx4fs: Use kmemdup rather than duplicating its implementation | Thomas Meyer
The semantic patch that makes this change is available in scripts/coccinelle/api/memdup.cocci. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Anders Larsen <al@alarsen.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-11-20 | Btrfs: sectorsize align offsets in fiemap | Josef Bacik
We've been hitting BUG()'s in btrfs_cont_expand and btrfs_fallocate and anywhere else that calls btrfs_get_extent while running xfstests 13 in a loop. This is because fiemap is calling btrfs_get_extent with non-sectorsize aligned offsets, which will end up adding mappings that are not sectorsize aligned, which will cause problems in some cases for subsequent calls to btrfs_get_extent for similar areas that are sectorsize aligned. With this patch I ran xfstests 13 in a loop for a couple of hours and didn't hit the problem that I could previously hit in at most 20 minutes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
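The alignment described in this commit message amounts to power-of-two rounding of the requested range. A minimal sketch in Python, not the btrfs code: the helper names and the 4096-byte sector size are assumptions chosen for illustration.

```python
def align_down(value: int, alignment: int) -> int:
    """Round value down to a multiple of alignment (a power of two)."""
    return value & ~(alignment - 1)


def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def aligned_range(start: int, length: int, sectorsize: int = 4096):
    """Widen [start, start + length) so both ends sit on sector boundaries.

    Returns (aligned_start, aligned_length). The default sector size is an
    assumption for this example only.
    """
    aligned_start = align_down(start, sectorsize)
    aligned_end = align_up(start + length, sectorsize)
    return aligned_start, aligned_end - aligned_start
```

For example, a request for 100 bytes at offset 5000 becomes one full sector starting at 4096, so a lookup never starts or ends in the middle of a sector.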
2011-11-20 | Btrfs: clear pages dirty for io and set them extent mapped | Josef Bacik
When doing the io_ctl helpers to clean up the free space cache stuff I stopped using our normal prepare_pages stuff, which means I of course forgot to do things like set the pages extent mapped, which will cause us all sorts of wonderful problems. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-11-20 | Btrfs: wait on caching if we're loading the free space cache | Josef Bacik
We've been hitting panics when running xfstest 13 in a loop for long periods of time. And actually this problem has always existed so we've been hitting these things randomly for a while. Basically what happens is we get a thread coming into the allocator and reading the space cache off of disk and adding the entries to the free space cache as we go. Then we get another thread that comes in and tries to allocate from that block group. Since block_group->cached != BTRFS_CACHE_NO it goes ahead and tries to do the allocation. We do this because if we're doing the old slow way of caching we don't want to hold people up and wait for everything to finish. The problem with this is we could end up discarding the space cache at some arbitrary point in the future, which means we could very well end up allocating space that is either bad, or when the real caching happens it could end up thinking the space isn't in use when it really is and cause all sorts of other problems. The solution is to add a new flag to indicate we are loading the free space cache from disk, and always try to cache the block group if cache->cached != BTRFS_CACHE_FINISHED. That way if we are loading the space cache anybody else who tries to allocate from the block group will have to wait until it's finished to make sure it completes successfully. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-11-20 | Btrfs: prefix resize related printks with btrfs: | Arnd Hannemann
For the user it is confusing to find something like: [10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472 in kernel log, because it doesn't point directly to btrfs. This patch prefixes those messages with "btrfs:" like other btrfs related printks. Signed-off-by: Arnd Hannemann <arnd@arndnet.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-20 | btrfs: fix stat blocks accounting | David Sterba
Round inode bytes and delalloc bytes up to the real blocksize before converting to sector size. Otherwise, e.g., files smaller than 512 bytes are reported with zero blocks due to incorrect rounding. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
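The rounding problem is easy to reproduce with plain arithmetic. The sketch below is a userspace model rather than the btrfs implementation; the function names and the 4096-byte default blocksize are assumptions.

```python
SECTOR_SHIFT = 9  # stat reports block counts in 512-byte units


def round_up(nbytes: int, blocksize: int) -> int:
    """Round nbytes up to the next multiple of blocksize."""
    return (nbytes + blocksize - 1) // blocksize * blocksize


def stat_blocks_unrounded(inode_bytes: int, delalloc_bytes: int) -> int:
    """Shift the raw byte counts: anything under 512 bytes becomes 0."""
    return (inode_bytes + delalloc_bytes) >> SECTOR_SHIFT


def stat_blocks_rounded(inode_bytes: int, delalloc_bytes: int,
                        blocksize: int = 4096) -> int:
    """Round each count up to the blocksize first, then convert to sectors."""
    total = round_up(inode_bytes, blocksize) + round_up(delalloc_bytes, blocksize)
    return total >> SECTOR_SHIFT
```

With these definitions a 100-byte file yields 0 sectors without rounding and 8 sectors (one 4096-byte block) with it.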
2011-11-20 | Btrfs: avoid unnecessary bitmap search for cluster setup | Li Zefan
setup_cluster_no_bitmap() searches all the extents and bitmaps starting from offset. Therefore if it returns -ENOSPC, all the bitmaps starting from offset are in the bitmaps list, so it's sufficient to search from this list in setup_cluster_bitmap(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-20 | Btrfs: fix to search one more bitmap for cluster setup | Li Zefan
Suppose there are two bitmaps [0, 256], [256, 512] and one extent [100, 120] in the free space cache, and we want to set up a cluster with offset=100, bytes=50. In this case, there will be only one bitmap [256, 512] in the temporary bitmaps list, and then setup_cluster_bitmap() won't search bitmap [0, 256]. The cause is, the list is constructed in setup_cluster_no_bitmap(), and only bitmaps with bitmap_entry->offset >= offset will be added into the list, and the very bitmap that covers offset has bitmap_entry->offset <= offset. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
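The example in this commit message can be replayed with a small model. This is an illustrative sketch, not the actual patch: bitmaps are represented as hypothetical (start, end) tuples, and both function names are made up.

```python
from typing import List, Tuple

Bitmap = Tuple[int, int]  # (start offset, end offset), a simplified stand-in


def bitmaps_in_list(bitmaps: List[Bitmap], offset: int) -> List[Bitmap]:
    """Model of the temporary list: only bitmaps starting at or after offset."""
    return [b for b in bitmaps if b[0] >= offset]


def bitmaps_to_search(bitmaps: List[Bitmap], offset: int) -> List[Bitmap]:
    """Also include the one bitmap that starts before offset but covers it."""
    covering = [b for b in bitmaps if b[0] < offset < b[1]]
    return covering + bitmaps_in_list(bitmaps, offset)
```

With bitmaps [(0, 256), (256, 512)] and offset 100, the list-only search sees just (256, 512), while the extended search also visits (0, 256), the bitmap that actually covers the offset.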
2011-11-20 | btrfs: mirror_num should be int, not u64 | Jan Schmidt
My previous patch introduced some u64 for failed_mirror variables, this one makes it consistent again. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-20 | btrfs: Fix up 32/64-bit compatibility for new ioctls | Jeff Mahoney
This patch casts to unsigned long before casting to a pointer and fixes the following warnings:
fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-20 | Btrfs: fix barrier flushes | Chris Mason
When btrfs is writing the super blocks, it sends barrier flushes to make sure writeback caching drives get all the metadata on disk in the right order. But, we have two bugs in the way these are sent down. When doing full commits (not via the tree log), we are sending the barrier down before the last super when it should be going down before the first. In multi-device setups, we should be waiting for the barriers to complete on all devices before writing any of the supers. Both of these bugs can cause corruptions on power failures. We fix it with some new code to send down empty barriers to all devices before writing the first super. Alexandre Oliva found the multi-device bug. Arne Jansen did the async barrier loop. Signed-off-by: Chris Mason <chris.mason@oracle.com> Reported-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
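The ordering requirement can be stated as a simple invariant: every device's flush must have completed before the first super block is written. The sketch below only records the order of operations; the operation names and the function are hypothetical and do not correspond to btrfs internals.

```python
from typing import List, Tuple


def commit_super_ops(devices: List[str]) -> List[Tuple[str, str]]:
    """Return the ordered operations for one commit across all devices."""
    ops = []
    for dev in devices:
        ops.append(("submit_flush", dev))   # send an empty barrier to each device
    for dev in devices:
        ops.append(("wait_flush", dev))     # wait until every flush has completed
    for dev in devices:
        ops.append(("write_super", dev))    # only now write the super blocks
    return ops
```

Submitting all flushes before waiting on any of them lets the devices flush in parallel, while the wait phase guarantees that no super is written ahead of the metadata it describes.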
2011-11-19 | minixfs: kill manual hweight(), simplify | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-19 | fs/minix: Verify bitmap block counts before mounting | Josh Boyer
Newer versions of MINIX can create filesystems that allocate an extra bitmap block. Mounting of this succeeds, but doing a statfs call will result in an oops in count_free because of a negative number being used for the bh index. Avoid this by verifying the number of allocated blocks at mount time, erroring out if there are not enough and make statfs ignore the extras if there are too many. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=18792 Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
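The mount-time check described here boils down to comparing the number of bitmap blocks on disk with the number the bit count actually requires. A hedged sketch follows; the function names are invented for this example and the error handling is simplified compared with a real mount path.

```python
def bitmap_blocks_needed(bits: int, blocksize: int) -> int:
    """Blocks required to hold `bits` bits, with blocksize given in bytes."""
    bits_per_block = blocksize * 8
    return (bits + bits_per_block - 1) // bits_per_block


def bitmap_blocks_to_scan(allocated: int, bits: int, blocksize: int) -> int:
    """Validate the on-disk block count and return how many blocks to scan.

    Too few blocks is an error (refuse to mount); extra blocks are ignored,
    so a later free-count pass never indexes past the needed range.
    """
    needed = bitmap_blocks_needed(bits, blocksize)
    if allocated < needed:
        raise ValueError("file system does not have enough bitmap blocks")
    return needed
```

For a 1024-byte block, 8193 bits need two blocks; a file system that allocated three is accepted and only the first two are counted.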
2011-11-19 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  new helper: mount_subtree()
  switch create_mnt_ns() to saner calling conventions, fix double mntput() in nfs
  btrfs: fix double mntput() in mount_subvol()
2011-11-19 | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs | Linus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  MAINTAINERS: update XFS maintainer entry
  xfs: use doalloc flag in xfs_qm_dqattach_one()
2011-11-18 | debugfs: print_regs32: make regs array a const pointer | Alessandro Rubini
Signed-off-by: Alessandro Rubini <rubini@gnudd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-18 | pstore: gracefully handle NULL pstore_info functions | Kees Cook
If a pstore backend doesn't want to support various portions of the pstore interface, it can just leave those functions NULL instead of creating no-op stubs. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
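The pattern is the usual "optional callback" check: the caller tests for a missing function before invoking it. The sketch below models it in Python with None standing in for NULL; the class, its attribute names and the helper are hypothetical and are not the pstore_info layout.

```python
def call_if_present(callback, *args, default=0):
    """Invoke an optional backend callback; a missing one counts as a no-op."""
    if callback is None:
        return default
    return callback(*args)


class ExampleBackend:
    """Hypothetical backend that implements read but leaves open/close unset."""
    open = None
    close = None

    @staticmethod
    def read():
        return b"record"
```

A caller can then write call_if_present(backend.open) without caring whether the backend supplied that function, which is what lets backends drop their no-op stubs.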
2011-11-18 | NLS: improve UTF8 -> UTF16 string conversion routine | Alan Stern
The utf8s_to_utf16s conversion routine needs to be improved. Unlike its utf16s_to_utf8s sibling, it doesn't accept arguments specifying the maximum length of the output buffer or the endianness of its 16-bit output. This patch (as1501) adds the two missing arguments, and adjusts the only two places in the kernel where the function is called. A follow-on patch will add a third caller that does utilize the new capabilities. The two conversion routines are still annoyingly inconsistent in the way they handle invalid byte combinations. But that's a subject for a different patch. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
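To illustrate why the two added arguments matter, here is a userspace model of a UTF-8 to UTF-16 conversion that takes an output limit and a byte order. It is a sketch only: the function name, parameter names and return type are assumptions and do not mirror the kernel signature, and invalid input simply raises UnicodeDecodeError instead of modelling the kernel's handling.

```python
def utf8_to_utf16_units(data: bytes, byteorder: str, max_units: int) -> bytes:
    """Convert UTF-8 bytes to UTF-16, emitting at most max_units 16-bit units.

    byteorder is "little" or "big". A character that needs a surrogate pair
    is emitted whole or not at all, so the output is never cut mid-pair.
    """
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    out = bytearray()
    for ch in data.decode("utf-8"):
        encoded = ch.encode(codec)  # 2 bytes, or 4 for a surrogate pair
        if (len(out) + len(encoded)) // 2 > max_units:
            break
        out += encoded
    return bytes(out)
```

Without the length limit a caller has to size the output buffer for the worst case, and without the byte-order argument it has to swap the units itself when the consumer expects a fixed endianness.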
2011-11-18 | debugfs: add tools to printk 32-bit registers | Alessandro Rubini
Some debugfs files I deal with are mostly blocks of registers, i.e. lines of the form "<name> = 0x<value>". Some files are only registers, some include register blocks among other material. This patch introduces data structures and functions to deal with both cases. I expect more users of this over time. Signed-off-by: Alessandro Rubini <rubini@gnudd.com> Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com> Cc: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
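The "<name> = 0x<value>" output can be modelled as a table of (name, offset) pairs walked over a register window. The sketch below is a userspace illustration, not the debugfs API: the function name, the use of a bytes object as the register window, little-endian reads and the zero-padded 8-digit width are all assumptions.

```python
import struct
from typing import Iterable, Tuple


def format_regs32(regs: Iterable[Tuple[str, int]], window: bytes,
                  prefix: str = "") -> str:
    """Render each 32-bit register as '<name> = 0x<value>', one per line."""
    lines = []
    for name, offset in regs:
        # Read one little-endian 32-bit value at the register's byte offset.
        (value,) = struct.unpack_from("<I", window, offset)
        lines.append(f"{prefix}{name} = 0x{value:08x}")
    return "\n".join(lines)
```

Keeping the register table separate from the formatting routine is what allows one helper to serve both a file made only of registers and a register block embedded in a larger file.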
2011-11-18 | dlm: convert rsb list to rb_tree | Bob Peterson
Change the linked lists to rb_tree's in the rsb hash table to speed up searches. Slow rsb searches were having a large impact on gfs2 performance due to the large number of dlm locks gfs2 uses. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
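The gain comes from replacing a linear scan with an ordered lookup. Python's standard library has no red-black tree, so the sketch below uses a sorted list with bisect as a stand-in for the ordered structure; it shows the lookup cost difference only, and the class and function names are made up. Insertion into a Python list is still O(n), unlike a real rb_tree.

```python
import bisect
from typing import List


def linear_contains(names: List[str], name: str) -> bool:
    """Linked-list style search: compare against every entry in turn."""
    return any(candidate == name for candidate in names)


class SortedIndex:
    """Ordered collection with O(log n) lookup via binary search."""

    def __init__(self) -> None:
        self._names: List[str] = []

    def insert(self, name: str) -> None:
        i = bisect.bisect_left(self._names, name)
        if i == len(self._names) or self._names[i] != name:
            self._names.insert(i, name)  # keep the list sorted, no duplicates

    def contains(self, name: str) -> bool:
        i = bisect.bisect_left(self._names, name)
        return i < len(self._names) and self._names[i] == name
```

With many thousands of entries per hash bucket, the number of comparisons per lookup drops from the bucket length to roughly its base-2 logarithm.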
2011-11-18 | Merge branch 'for-linus' of git://git.kernel.dk/linux-block | Linus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-block:
  block: add missed trace_block_plug
  paride: fix potential information leak in pg_read()
  bio: change some signed vars to unsigned
  block: avoid unnecessary plug list flush
  cciss: auto engage SCSI mid layer at driver load time
  loop: cleanup set_status interface
  include/linux/bio.h: use a static inline function for bio_integrity_clone()
  loop: prevent information leak after failed read
  block: Always check length of all iov entries in blk_rq_map_user_iov()
  The Windows driver .inf disables ASPM on all cciss devices. Do the same.
  backing-dev: ensure wakeup_timer is deleted
  block: Revert "[SCSI] genhd: add a new attribute "alias" in gendisk"
2011-11-18 | GFS2: remove vestigial al_alloced | Bob Peterson
This patch removes the vestigial variable al_alloced from the gfs2_alloc structure. This is another baby step toward multi-block reservations. My next planned step is to decouple the quota variables from the gfs2_alloc structure so we can use a different method for allocations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-17 | pstore: pass reason to backend write callback | Kees Cook
This allows a backend to filter on the dmesg reason as well as the pstore reason. When ramoops is switched to pstore, this is needed since it has no interest in storing non-crash dmesg details. Drop pstore_write() as it has no users, and handling the "reason" here has no obviously correct value. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-11-17 | pstore: pass allocated memory region back to caller | Kees Cook
The buf_lock cannot be held while populating the inodes, so make the backend pass forward an allocated and filled buffer instead. This solves the following backtrace. The effect is that "buf" is only ever used to notify the backends that something was written to it, and shouldn't be used in the read path. To replace the buf_lock during the read path, isolate the open/read/close loop with a separate mutex to maintain serialized access to the backend. Note that it is up to the pstore backend to cope if the (*write)() path is called in the middle of the read path.
[ 59.691019] BUG: sleeping function called from invalid context at .../mm/slub.c:847
[ 59.691019] in_atomic(): 0, irqs_disabled(): 1, pid: 1819, name: mount
[ 59.691019] Pid: 1819, comm: mount Not tainted 3.0.8 #1
[ 59.691019] Call Trace:
[ 59.691019] [<810252d5>] __might_sleep+0xc3/0xca
[ 59.691019] [<810a26e6>] kmem_cache_alloc+0x32/0xf3
[ 59.691019] [<810b53ac>] ? __d_lookup_rcu+0x6f/0xf4
[ 59.691019] [<810b68b1>] alloc_inode+0x2a/0x64
[ 59.691019] [<810b6903>] new_inode+0x18/0x43
[ 59.691019] [<81142447>] pstore_get_inode.isra.1+0x11/0x98
[ 59.691019] [<81142623>] pstore_mkfile+0xae/0x26f
[ 59.691019] [<810a2a66>] ? kmem_cache_free+0x19/0xb1
[ 59.691019] [<8116c821>] ? ida_get_new_above+0x140/0x158
[ 59.691019] [<811708ea>] ? __init_rwsem+0x1e/0x2c
[ 59.691019] [<810b67e8>] ? inode_init_always+0x111/0x1b0
[ 59.691019] [<8102127e>] ? should_resched+0xd/0x27
[ 59.691019] [<8137977f>] ? _cond_resched+0xd/0x21
[ 59.691019] [<81142abf>] pstore_get_records+0x52/0xa7
[ 59.691019] [<8114254b>] pstore_fill_super+0x7d/0x91
[ 59.691019] [<810a7ff5>] mount_single+0x46/0x82
[ 59.691019] [<8114231a>] pstore_mount+0x15/0x17
[ 59.691019] [<811424ce>] ? pstore_get_inode.isra.1+0x98/0x98
[ 59.691019] [<810a8199>] mount_fs+0x5a/0x12d
[ 59.691019] [<810b9174>] ? alloc_vfsmnt+0xa4/0x14a
[ 59.691019] [<810b9474>] vfs_kern_mount+0x4f/0x7d
[ 59.691019] [<810b9d7e>] do_kern_mount+0x34/0xb2
[ 59.691019] [<810bb15f>] do_mount+0x5fc/0x64a
[ 59.691019] [<810912fb>] ? strndup_user+0x2e/0x3f
[ 59.691019] [<810bb3cb>] sys_mount+0x66/0x99
[ 59.691019] [<8137b537>] sysenter_do_call+0x12/0x26
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-11-17 | ocfs2: Use filemap_write_and_wait() instead of write_inode_now() | Jan Kara
Since ocfs2 has no ->write_inode method, there's no point in calling write_inode_now() from ocfs2_cleanup_delete_inode(). Use filemap_write_and_wait() instead. This helps us to clean up inode writing interfaces... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-11-17 | ocfs2: honor O_(D)SYNC flag in fallocate | Mark Fasheh
We need to sync the transaction which updates i_size if the file is marked as needing sync semantics. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-11-17 | ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2 | Xiaowei.Hu
With indexed_dir enabled, ocfs2 maintains a list of dirblocks having space. The credit calculation in ocfs2_link_credits() did not correctly account for adding an entry that exactly fills a dirblock, which triggers removing that dirblock from the list by changing the pointer in the previous block in the list. The credit calculation did not account for that previous block.
To expose, do:
  mkfs.ocfs2 -b 512 -M local /dev/sdX
  mount /dev/sdX /ocfs2
  mkdir /ocfs2/linkdir
  touch /ocfs2/linkdir/file1
  for i in `seq 1 29` ; do link /ocfs2/linkdir/file1 /ocfs2/linkdir/linklinklinklinklinklink$i; done
  rm -f /ocfs2/linkdir/linklinklinklinklinklink10
  sleep 8
  link /ocfs2/linkdir/file1 /ocfs2/linkdir/linklinklinklinklinklinkaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Note: The link names have been crafted for a 512 byte blocksize. Reproducing with a larger blocksize will require longer (or more) links. The sleep is important. We want jbd2 to commit the transaction so that the missing block does not piggy back on account of the previous transaction.
Signed-off-by: XiaoweiHu <xiaowei.hu at oracle.com> Reviewed-by: WengangWang <wen.gang.wang at oracle.com> Reviewed-by: Sunil.Mushran <sunil.mushran at oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-11-17 | ocfs2: send correct UUID to cleancache initialization | Dan Magenheimer
ocfs2: Fix cleancache initialization call to correctly pass uuid As reported by Steven Whitehouse in https://lkml.org/lkml/2011/5/27/221 the ocfs2 volume UUID is incorrectly passed to cleancache. As a result, shared-ephemeral tmem pools will not actually be created; instead they will be private (unshared) which misses out on a major benefit of tmem. Reported-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-11-17 | ocfs2: Commit transactions in error cases -v2 | Wengang Wang
Three error paths were found in which the journal transaction is neither committed nor aborted. Handle these cases by committing the transaction. Otherwise a journal handle is left behind, and the next ocfs2_start_trans() in the same process context gets the wrong credits. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-11-17 | ocfs2: make direntry invalid when deleting it | Wengang Wang
When we delete a direntry from a directory, if it is the first entry in a block we invalidate it by setting its inode to 0; otherwise, we merge the deleted entry into the prior, contiguous direntry. We don't truncate directories. There is a problem in the latter case, since the inode is not set to 0. The problem shows up when the caller passes a file position as a parameter to ocfs2_dir_foreach_blk(). If the position happens to point to a stale direntry (not the first in its block, and deleted between calls to ocfs2_dir_foreach_blk()), we cannot recognize its staleness, so we wrongly treat it as a live one. The fix is to set the inode to 0 in both cases, indicating that the direntry is stale. This won't introduce additional IOs. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
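The stale-entry problem can be shown with a toy directory block. This is a simplified model under stated assumptions, not the ocfs2 on-disk format: a block is a dict from byte offset to an entry with only inode and rec_len fields, all entries are 16 bytes, and the function names are hypothetical.

```python
def make_block():
    """Three 16-byte entries at offsets 0, 16 and 32 (sizes are made up)."""
    return {off: {"inode": ino, "rec_len": 16}
            for off, ino in [(0, 11), (16, 12), (32, 13)]}


def delete_entry(block, target, zero_merged_inode):
    """Delete the entry at offset `target`, which must be on the entry chain.

    The first entry in the block is invalidated by zeroing its inode. Any
    other entry is merged into its predecessor by growing that rec_len;
    zero_merged_inode selects whether its inode is cleared as well.
    """
    prev = None
    pos = 0
    while pos != target:          # walk the rec_len chain to find the predecessor
        prev = pos
        pos += block[pos]["rec_len"]
    if prev is None:
        block[target]["inode"] = 0
    else:
        block[prev]["rec_len"] += block[target]["rec_len"]
        if zero_merged_inode:
            block[target]["inode"] = 0


def looks_live(block, pos):
    """What a reader resuming at a saved file position would conclude."""
    return block[pos]["inode"] != 0
```

With zero_merged_inode=False, a reader that resumes at offset 16 after the deletion still sees a non-zero inode and treats the entry as live; with True, the same reader sees inode 0 and skips it, at no extra I/O cost because the block is being rewritten anyway.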