summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2011-10-19Btrfs: use the transactions block_rsv for the csum rootJosef Bacik
The alloc warnings everybody has been seeing is because we have been reserving space for csums, but we weren't actually using that space. So make get_block_rsv() return the trans->block_rsv if we're modifying the csum root. Also set the trans->block_rsv to NULL so that if we modify the csum root when running delayed ref's that comes out of the global reserve like it's supposed to. With this patch I'm not seeing those alloc warnings anymore. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: handle enospc accounting for free space inodesJosef Bacik
Since free space inodes now use normal checksumming we need to make sure to account for their metadata use. So reserve metadata space, and then if we fail to write out the metadata we can just release it, otherwise it will be freed up when the io completes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: put the block group cache after we commit the superJosef Bacik
In moving some enospc stuff around I noticed that when we unmount we are often evicting the free space cache inodes before we do our last commit. This isn't bad, but it makes us constantly have to re-read the inodes back. So instead don't evict the cache until after we do our last commit, this will make things a little less crappy and makes a future enospc change work properly. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: set truncate block rsv's sizeJosef Bacik
While debugging a different issue I noticed that we were always reserving space when we tried to use our truncate block rsv's. This is because they didn't have a ->size value, so use_block_rsv just assumes there is nothing reserved and it does a reserve_metadata_bytes. This is because btrfs_check_block_rsv() doesn't actually add to the size of the block rsv. That seems to be the right thing to do so set ->size to the minimum truncate size we need, since we will always only refill to that size anyway, and this way everything works out correctly. Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: don't increase the block_rsv's size when emergency allocating spaceJosef Bacik
If we have to emergency reserve space we need to not increase the block_rsv size, otherwise we'll leak space. Take for instance delalloc, say we reserve 4k, and we use that 4k, and then we have to emergency allocate another 4k, we bump the size up to 8k, however we've only accounted for 4k in reservations in all of our supporting logic, so we'll go to free the 4k and end up having a size of 4k, which will cause us to later not free as much space. I saw this doing testing where I wasn't reserving enough space for something but was still leaking space, very frustrating. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: fix space leak when we fail to make an allocationJosef Bacik
When changing back to using a spin_lock to protect the extent counters I decided that since we would only be dropping our original extent, it was ok to just drop the extent and return. However since somebody else could have come in and done a reservation, we need to do the normal song and dance to clear the reservation out properly. So calculate how much space we need to free, and then subtract what we just attempted to reserve. If it's more then we know we need to drop those bytes from the delalloc block rsv. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: fix call to btrfs_search_slot in free space cacheJosef Bacik
We are setting ins_len to 1 even tho we are just modifying an item that should be there already. This may cause the search stuff to split nodes on the way down needelessly. Set this to 0 since we aren't inserting anything. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: allow callers to specify if flushing can occur for btrfs_block_rsv_checkJosef Bacik
If you run xfstest 224 it you will get lots of messages about not being able to delete inodes and that they will be cleaned up next mount. This is because btrfs_block_rsv_check was not calling reserve_metadata_bytes with the ability to flush, so if there was not enough space, it simply failed. But in truncate and evict case we could easily flush space to try and get enough space to do our work, so make btrfs_block_rsv_check take a flush argument to pass down to reserve_metadata_bytes. Now xfstests 224 runs fine without all those complaints. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: reduce the amount of space needed for truncatesJosef Bacik
With btrfs_truncate_inode_items we always return if we have to go to another leaf, which makes us do our reservation again. This means we will only ever modify one leaf at a time, so we only need 1 items worth of slack space. Also, since we are deleting we will not be creating nodes as we go down, if anything we'll be free'ing them as we merge them together, so make a different calculation for truncate which will only have the worst case useage of COW'ing the entire path down to the leaf. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: only reserve space in fallocate if we have to do a preallocateJosef Bacik
Lukas found a problem where if he tries to fallocate over the same region twice and the first fallocate took up all the space we would fail with ENOSPC. This is because we reserve the total space we want to use for fallocate, regardless of wether or not we will have to actually preallocate. So instead move the check into the loop where we actually have to do the preallocate. Thanks, Tested-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: kill btrfs_truncate_reserve_metadataJosef Bacik
Since we've optimized the truncate path, we no longer require this function. Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: optimize how we account for space in truncateJosef Bacik
Currently we're starting and stopping a transaction for no real reason, so kill that and just reserve enough space as if we can truncate all in one transaction. Also use btrfs_block_rsv_check() for our reserve to minimize the amount of space we may have to allocate for our slack space. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: don't try to commit in btrfs_block_rsv_checkJosef Bacik
We will try and reserve metadata bytes in btrfs_block_rsv_check and if we cannot because we have a transaction open it will return EAGAIN, so we do not need to try and commit the transaction again. Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: kill unused parts of block_rsvJosef Bacik
The priority and refill_used flags are not used anymore, and neither is the usage counter, so just remove them from btrfs_block_rsv. Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: ratelimit the generation printk for the free space cacheJosef Bacik
A user reported getting spammed when moving to 3.0 by this message. Since we switched to the normal checksumming infrastructure all old free space caches will be wrong and need to be regenerated so people are likely to see this message a lot, so ratelimit it so it doesn't fill up their logs and freak them out. Thanks, Reported-by: Andrew Lutomirski <luto@mit.edu> Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: fix how we reserve space for deleting inodesJosef Bacik
I converted btrfs_truncate to do sane reservations for truncate, but didn't convert btrfs_evict_inode. Basically we need to save the orphan_rsv for deleting the orphan item, and do normal reservations for our truncate. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: kill the durable block rsv stuffJosef Bacik
This is confusing code and isn't used by anything anymore, so delete it. Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: kill the orphan space calculation for snapshotsJosef Bacik
This patch kills off the calculation for the amount of space needed for the orphan operations during a snapshot. The thing is we only do snapshots on commit, so any space that is in the block_rsv->freed[] isn't going to be in the new snapshot anyway, so there isn't any reason to require that space to be reserved for the snapshot to occur. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: calculate checksum space correctlyJosef Bacik
We have not been reserving enough space for checksums. We were just reserving bytes for the checksum items themselves, we were not taking into account having to cow the tree and such. This patch adds a csum_bytes counter to the inode for keeping track of the number of bytes outstanding we have for checksums. Then we calculate how many leaves would be required for the checksums we are given and use that to reserve space. This adds a significant amount of bytes to our reservations, but we will handle this later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: skip looking for delalloc if we don't have ->fill_delallocJosef Bacik
We always look for delalloc bytes in our io_tree so we can fill in delalloc. This is fine in most cases, but if we're writing out the btree_inode this is just a superfluous tree search on the io_tree, and if we have a lot of metadata dirty this could be an expensive check. So instead check to see if our io_tree has a ->fill_delalloc op, and if not don't even bother doing the lookup. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: use bytes_may_use for all ENOSPC reservationsJosef Bacik
We have been using bytes_reserved for metadata reservations, which is wrong since we use that to keep track of outstanding reservations from the allocator. This resulted in us doing a lot of silly things to make sure we don't allocate a bunch of metadata chunks since we never had a real view of how much space was actually in use by metadata. This passes Arne's enospc test and xfstests as well as my own enospc tests. Hopefully this will get us moving in the right direction. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: fix how we mount subvol=<whatever>Josef Bacik
We've only been able to mount with subvol=<whatever> where whatever was a subvol within whatever root we had as the default. This allows us to mount -o subvol=path/to/subvol/you/want relative from the normal fs_tree root. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: use d_obtain_alias when mounting subvol/subvolidJosef Bacik
Currently what we do is just wrong. We either 1) Alloc a new "root" dentry with sb->s_root as it's parent which is just wrong as we could walk into this subvol later on via another path and hilarity could ensue. Also we don't check the return value of d_splice_alias which isn't good either. or 2) Do a d_find_alias() which we could have lost our dentry from cache at this point and found nothing. So use d_obtain_alias(). In the case that we already have the inode/dentry in cache we will get the correct dentry. If not we will get a disconnected dentry tree so if we walk into it later on everything will be connected up properly. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: kill reserved_bytes in inodeJosef Bacik
reserved_bytes is not used for anything in the inode, remove it. Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19Btrfs: move stuff around in btrfs_inode to get better packingJosef Bacik
Moving things around to give us better packing in the btrfs_inode. This reduces the size of our inode by 8 bytes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-19nfsd4: implement new 4.1 open reclaim typesJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-19nfsd4: remove unneeded CLAIM_DELEGATE_CUR workaroundJ. Bruce Fields
0c12eaffdf09466f36a9ffe970dda8f4aeb6efc0 "nfsd: don't break lease on CLAIM_DELEGATE_CUR" was a temporary workaround for a problem fixed properly in the vfs layer by 778fc546f749c588aa2f6cd50215d2715c374252 "locks: fix tracking of inprogress lease breaks", so we can revert that change (but keeping some minor cleanup from that commit). Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-19jffs2: add compr=lzo and compr=zlib optionsAndres Salomon
..to allow forcing of either compression scheme. This will override compiled-in defaults. jffs2_compress is reworked a bit, as the lzo/zlib override shares lots of code w/ the PRIORITY mode. v2: update show_options accordingly. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
2011-10-19jffs2: implement mount option parsing and compression overridingAndres Salomon
Currently jffs2 has compile-time constants (and .config options) controlling whether or not the various compression/decompression drivers are built in and enabled. This is fine for embedded systems, but it clashes with distribution kernels. Distro kernels tend to turn on everything; this causes OpenFirmware to fall over, as it understands ZLIB-compressed inodes. Booting a kernel that has LZO compression enabled, writing to the boot partition, and then rebooting causes OFW to fail to read the kernel from the filesystem. This is because LZO compression has priority when writing new data to jffs2, if LZO is enabled. This patch adds mount option parsing, and a single supported option ("compr=none"). This adds the flexibility of being able to specify which compressor overrides on a per-superblock basis. For now, we can simply disable compression; additional flexibility coming soon. v2: kill some printks, and implement show_options as suggested by Artem Bityutskiy. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
2011-10-19block: make gendisk hold a reference to its queueTejun Heo
The following command sequence triggers an oops. # mount /dev/sdb1 /mnt # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete # umount /mnt general protection fault: 0000 [#1] PREEMPT SMP CPU 2 Modules linked in: Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs RIP: 0010:[<ffffffff810d0879>] [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60 ... Call Trace: [<ffffffff810d2845>] lock_acquire+0x95/0x140 [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50 [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70 [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0 [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0 [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0 [<ffffffff811c40df>] blkdev_put+0x5f/0x190 [<ffffffff8118f18d>] kill_block_super+0x4d/0x80 [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70 [<ffffffff8119003a>] deactivate_super+0x4a/0x70 [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130 [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0 [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b This is because bdev holds on to disk but disk doesn't pin the associated queue. If a SCSI device is removed while the device is still open, the sdev puts the base reference to the queue on release. When the bdev is finally released, the associated queue is already gone along with the bdi and bdev_inode_switch_bdi() ends up dereferencing already freed bdi. Even if it were not for this bug, disk not holding onto the associated queue is very unusual and error-prone. Fix it by making add_disk() take an extra reference to its queue and put it on disk_release() and ensuring that disk and its fops owner are put in that order after all accesses to the disk and queue are complete. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-18NFSv4: Translate NFS4ERR_BADNAME into ENOENT when applied to a lookupTrond Myklebust
Both LOOKUP and OPEN operations may return NFS4ERR_BADNAME if we send a an invalid name as a filename argument. As far as the application is concerned, it just has to know that the file doesn't exist, and so ENOENT would be the appropriate reply. We should only return EINVAL if the filename is being used to _create_ a new object on the remote filesystem. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18NFS: Remove the unused "lookupfh()" version of nfs4_proc_lookup()Trond Myklebust
...and also remove the associated nfs_v4_clientops entry. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18xfs: add AIL pushing tracepointsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-18xfs: put in missed fix for merge problemAlex Elder
I intended to do this as part of fixing part of the conflict with the merge with Linus' tree, but evidently it didn't get included in the commit. Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-10-18NFS: Use the inode->i_version to cache NFSv4 change attribute informationTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18nfs/super.c: local functions should be staticH Hartley Sweeten
commit ae50c0b5 "pnfs: client stats" added additional information to the output of /proc/self/mountstats. The new functions introduced are only used in this file and should be marked static. If CONFIG_NFS_V4_1 is not defined, empty stub functions are used. If CONFIG_NFS_V4 is not defined these stub functions are not used at all. Adding static for the functions results in compile warnings: fs/nfs/super.c:743: warning: 'show_sessions' defined but not used fs/nfs/super.c:756: warning: 'show_pnfs' defined but not used Fix this by adding a #ifdef CONFIG_NFS_V4 guard around the two show_ functions. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18pnfsblock: fix writeback deadlockPeng Tao
We should check if the sector is already initialized before trying to grab the page from page cache. Otherwise when two pages of the same block are written back by two threads each calling from writepage_locked, it can cause deadlock like bellow. [ 1080.972099] INFO: task kswapd0:25 blocked for more than 120 seconds. [ 1080.972377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1080.972812] kswapd0 D ffff88000c4926c0 0 25 2 0x00000000 [ 1080.972816] ffff88000df276b0 0000000000000046 ffff88000df27640 ffffffff81013ba7 [ 1080.972821] ffff88000c492310 ffff88000df27fd8 ffff88000df27fd8 00000000001d3440 [ 1080.972824] ffff88000c378000 ffff88000c492310 ffff8800175d3d40 ffff880017fc75a8 [ 1080.972828] Call Trace: [ 1080.972860] [<ffffffff81013ba7>] ? read_tsc+0x9/0x19 [ 1080.972877] [<ffffffff810e0b23>] ? lock_page+0x2b/0x2b [ 1080.972899] [<ffffffff81475a1d>] io_schedule+0x63/0x7e [ 1080.972902] [<ffffffff810e0b31>] sleep_on_page+0xe/0x12 [ 1080.972905] [<ffffffff81475fe8>] __wait_on_bit_lock+0x46/0x8f [ 1080.972916] [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72 [ 1080.972919] [<ffffffff810e0af6>] __lock_page+0x66/0x68 [ 1080.972928] [<ffffffff81072705>] ? autoremove_wake_function+0x3d/0x3d [ 1080.972932] [<ffffffff810e0b1f>] lock_page+0x27/0x2b [ 1080.972934] [<ffffffff810e0bcf>] find_lock_page+0x34/0x57 [ 1080.972937] [<ffffffff810e1738>] find_or_create_page+0x34/0x8a [ 1080.972947] [<ffffffffa034245b>] bl_write_pagelist+0x205/0x6da [blocklayoutdriver] [ 1080.972951] [<ffffffffa034145d>] ? bl_free_lseg+0x38/0x38 [blocklayoutdriver] [ 1080.972995] [<ffffffffa02e27b9>] ? nfs_write_rpcsetup+0x118/0x123 [nfs] [ 1080.973033] [<ffffffffa030246b>] pnfs_generic_pg_writepages+0x10b/0x1f4 [nfs] [ 1080.973089] [<ffffffffa02deaae>] nfs_pageio_doio+0x1a/0x43 [nfs] [ 1080.973098] [<ffffffffa02df035>] nfs_pageio_complete+0x16/0x2d [nfs] [ 1080.973108] [<ffffffffa02e2d8f>] nfs_writepage_locked+0xa0/0xbf [nfs] [ 1080.973119] [<ffffffffa02e36a1>] nfs_writepage+0x16/0x2b [nfs] [ 1080.973122] [<ffffffff810e8762>] ? clear_page_dirty_for_io+0x87/0x9a [ 1080.973133] [<ffffffff810efc5b>] shrink_page_list+0x39b/0x6c8 [ 1080.973139] [<ffffffff810f03bb>] shrink_inactive_list+0x22c/0x39e [ 1080.973144] [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72 [ 1080.973148] [<ffffffff810f0c33>] shrink_zone+0x445/0x588 [ 1080.973152] [<ffffffff810f1a11>] balance_pgdat+0x2c2/0x56b [ 1080.973170] [<ffffffff81254208>] ? __bitmap_weight+0x34/0x80 [ 1080.973175] [<ffffffff810f1f78>] kswapd+0x2be/0x2fa [ 1080.973179] [<ffffffff810726c8>] ? __init_waitqueue_head+0x4b/0x4b [ 1080.973183] [<ffffffff810f1cba>] ? balance_pgdat+0x56b/0x56b [ 1080.973187] [<ffffffff81071f69>] kthread+0xa8/0xb0 [ 1080.973200] [<ffffffff814806b4>] kernel_thread_helper+0x4/0x10 [ 1080.973205] [<ffffffff81071ec1>] ? __init_kthread_worker+0x5a/0x5a [ 1080.973210] [<ffffffff814806b0>] ? gs_change+0x13/0x13 [ 1080.973213] no locks held by kswapd0/25. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18pnfsblock: fix NULL pointer dereferencePeng Tao
bl_add_page_to_bio returns error pointer. bio should be reset to NULL in failure cases as the out path always calls bl_submit_bio. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18pnfs: recoalesce when ld read pagelist failsPeng Tao
For pnfs pagelist read failure, we need to pg_recoalesce and resend IO to mds. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18pnfs: recoalesce when ld write pagelist failsPeng Tao
For pnfs pagelist write failure, we need to pg_recoalesce and resend IO to mds. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18pnfs: make _set_lo_fail genericPeng Tao
file layout and block layout both use it to set mark layout io failure bit. So make it generic. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18pnfsblock: add missing rpc_put_mount and path_putPeng Tao
Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18SUNRPC/NFS: make rpc pipe upcall genericPeng Tao
The same function is used by idmap, gss and blocklayout code. Make it generic. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18pnfsblock: fix size of upcall messageJim Rees
Make the status field explicitly 32 bits. "...it's unlikely that the kernel and userspace would differ on the size of an int here, but it might be a good idea to go ahead and make that explicitly 32 bits in case we end up dealing with more exotic arches at some point in the future." Suggested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18pnfsblock: fix return code confusionJim Rees
Always return PTR_ERR, not NULL, from nfs4_blk_get_deviceinfo and nfs4_blk_decode_device. Check for IS_ERR, not NULL, in bl_set_layoutdriver when calling nfs4_blk_get_deviceinfo. Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18nfs: don't try to migrate pages with active requestsJeff Layton
nfs_find_and_lock_request will take a reference to the nfs_page and will then put it if the req is already locked. It's possible though that the reference will be the last one. That put then can kick off a whole series of reference puts: nfs_page nfs_open_context dentry inode If the inode ends up being deleted, then the VFS will call truncate_inode_pages. That function will try to take the page lock, but it was already locked when migrate_page was called. The code deadlocks. Fix this by simply refusing the migration request if PagePrivate is already set, indicating that the page is already associated with an active read or write request. We've had a customer test a backported version of this patch and the preliminary results seem good. Cc: stable@kernel.org Cc: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Harshula Jayasuriya <harshula@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18nfs: fix bug about IPv6 address scope checkingMi Jinlong
The result from ipv6_addr_scope() always not be a single SCOPE, so we can't use equal to compare the result with IPV6_ADDR_SCOPE_LINKLOCAL at nfs_sockaddr_match_ipaddr6. This patch fixs the problem, and lets checking address before scope_id. Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18nfs: don't redirty inode when ncommit == 0 in nfs_commit_unstable_pagesJeff Layton
commit 420e3646 allowed the kernel to reduce the number of unnecessary commit calls by skipping the commit when there are a large number of outstanding pages. However, the current test in nfs_commit_unstable_pages does not handle the edge condition properly. When ncommit == 0, then that means that the kernel doesn't need to do anything more for the inode. The current test though in the WB_SYNC_NONE case will return true, and the inode will end up being marked dirty. Once that happens the inode will never be clean until there's a WB_SYNC_ALL flush. Fix this by immediately returning from nfs_commit_unstable_pages when ncommit == 0. Mike noticed this problem initially in RHEL5 (2.6.18-based kernel) which has a backported version of 420e3646. The inode cache there was growing very large. The inode cache was unable to be shrunk since the inodes were all marked dirty. Calling sync() would essentially "fix" the problem -- the WB_SYNC_ALL flush would result in the inodes all being marked clean. What I'm not clear on is how big a problem this is in mainline kernels as the writeback code there is very different. Either way, it seems incorrect to re-mark the inode dirty in this case. Reported-by: Mike McLean <mikem@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: stable@kernel.org [2.6.34+] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18Revert "NFS: Ensure that writeback_single_inode() calls write_inode() when ↵Trond Myklebust
syncing" This reverts commit b80c3cb628f0ebc241b02e38dd028969fb8026a2. The reverted commit was rendered obsolete by a VFS fix: commit 5547e8aac6f71505d621a612de2fca0dd988b439 (writeback: Update dirty flags in two steps). We now no longer need to worry about writeback_single_inode() missing our marking the inode for COMMIT in 'do_writepages()' call. Reverting this patch, fixes a performance regression in which the inode would continuously get queued to the dirty list, causing the writeback code to unnecessarily try to send a COMMIT. Signed-off-by: Trond Myklebust <Trond.Myklebust> Tested-by: Simon Kirby <sim@hostway.ca> Cc: stable@kernel.org [2.6.35+]
2011-10-18CIFS: fix automount for DFS sharesGerlando Falauto
Automounting directories are now invalidated by .d_revalidate() so to be d_instantiate()d again with the right DCACHE_NEED_AUTOMOUNT flag Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com> Signed-off-by: Steve French <smfrench@gmail.com>