path: root/fs
Age | Commit message | Author
2014-01-28btrfs: fix static checker warningsJeff Mahoney
This patch fixes the following warnings:

    fs/btrfs/extent-tree.c:6201:12: sparse: symbol 'get_raid_name' was not declared. Should it be static?
    fs/btrfs/extent-tree.c:8430:9: error: format not a string literal and no format arguments [-Werror=format-security]
        get_raid_name(index));

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
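As an illustration of the second warning, here is a minimal user-space sketch. It is not the kernel code: `demo_raid_name()` and `demo_format_raid_name()` are hypothetical helpers, and the name table is made up for the example. The point is only that a string returned at run time should be passed as an argument to a literal format, not used as the format itself.

```c
#include <stdio.h>

/* Hypothetical stand-in for a helper that returns a name for an index. */
static const char *demo_raid_name(int index)
{
    /* Illustrative table only; not the kernel's list of names. */
    static const char *const names[] = { "raid0", "raid1", "single" };

    if (index < 0 || index >= (int)(sizeof(names) / sizeof(names[0])))
        return "unknown";
    return names[index];
}

/*
 * Writes the name into buf and returns the snprintf() result.
 * Writing snprintf(buf, len, demo_raid_name(index)) instead would be
 * flagged by -Wformat-security, because the format is not a literal.
 */
static int demo_format_raid_name(char *buf, size_t len, int index)
{
    return snprintf(buf, len, "%s", demo_raid_name(index));
}
```

Marking the helper `static`, as done here, is also what the sparse warning asks for when a symbol is used in only one file.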
2014-01-28Btrfs: fix very slow inode eviction and fs unmountFilipe David Borba Manana
Inode eviction can be very slow, because during eviction we tell the VFS to truncate all of the inode's pages. This results in calls to btrfs_invalidatepage(), which in turn calls lock_extent_bits() and clear_extent_bit(). These calls result in too many merges and splits of extent_state structures, which consume a lot of time and CPU when the inode has many pages. In some scenarios I have experienced umount times higher than 15 minutes, even when there's no pending IO (after a btrfs fs sync).

A quick way to reproduce this issue:

    $ mkfs.btrfs -f /dev/sdb3
    $ mount /dev/sdb3 /mnt/btrfs
    $ cd /mnt/btrfs
    $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
        --file-test-mode=seqwr --num-threads=128 \
        --file-block-size=16384 --max-time=60 --max-requests=0 run
    $ time btrfs fi sync .
    FSSync '.'
    real 0m25.457s
    user 0m0.000s
    sys 0m0.092s
    $ cd ..
    $ time umount /mnt/btrfs
    real 1m38.234s
    user 0m0.000s
    sys 1m25.760s

The same test on ext4 runs much faster:

    $ mkfs.ext4 /dev/sdb3
    $ mount /dev/sdb3 /mnt/ext4
    $ cd /mnt/ext4
    $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
        --file-test-mode=seqwr --num-threads=128 \
        --file-block-size=16384 --max-time=60 --max-requests=0 run
    $ sync
    $ cd ..
    $ time umount /mnt/ext4
    real 0m3.626s
    user 0m0.004s
    sys 0m3.012s

After this patch, the unmount (inode evictions) is much faster:

    $ mkfs.btrfs -f /dev/sdb3
    $ mount /dev/sdb3 /mnt/btrfs
    $ cd /mnt/btrfs
    $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
        --file-test-mode=seqwr --num-threads=128 \
        --file-block-size=16384 --max-time=60 --max-requests=0 run
    $ time btrfs fi sync .
    FSSync '.'
    real 0m26.774s
    user 0m0.000s
    sys 0m0.084s
    $ cd ..
    $ time umount /mnt/btrfs
    real 0m1.811s
    user 0m0.000s
    sys 0m1.564s

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28Btrfs: improve forever loop when doing balance relocationWang Shilong
We hit a forever loop when doing balance relocation. The reason is that we first reserve 4M (node size is 16k), and within the transaction we try to add an extra reservation for snapshot roots; this returns -EAGAIN if another thread is already flushing space to satisfy a reservation. We then retry again and again as the filesystem becomes nearly full. If the above -EAGAIN case happens, we now try to refill the reservation outside of the transaction. This returns earlier in the ENOSPC case; however, this doesn't really hurt, because it makes no sense to do balance relocation with the filesystem nearly full. Miao Xie helped a lot to track this issue, thanks. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28Btrfs: fix ordered extent check in btrfs_punch_holeFilipe David Borba Manana
If the ordered extent's last byte was 1 less than our region's start byte, we would unnecessarily wait for the completion of that ordered extent, even though it doesn't intersect our target range. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
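The boundary case described above can be sketched with a generic inclusive-range overlap test. This is not the btrfs code; `demo_ranges_intersect()` is a hypothetical helper that only shows why an extent ending one byte before the region starts does not intersect it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Both ranges are inclusive: [a_start, a_end] and [b_start, b_end].
 * They intersect only if each one starts no later than the other ends.
 */
static bool demo_ranges_intersect(uint64_t a_start, uint64_t a_end,
                                  uint64_t b_start, uint64_t b_end)
{
    return a_start <= b_end && b_start <= a_end;
}
```

With an extent covering bytes 0 to 4095 and a target region starting at byte 4096, the test is false, so there is nothing to wait for.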
2014-01-28Btrfs: fix the reserved space leak caused by the race between nonlock dio and buffered ioMiao Xie
When we ran sysbench on the fs with compression, the following WARN_ONs were triggered:

    fs/btrfs/inode.c:7829 WARN_ON(BTRFS_I(inode)->outstanding_extents);
    fs/btrfs/inode.c:7830 WARN_ON(BTRFS_I(inode)->reserved_extents);
    fs/btrfs/inode.c:7832 WARN_ON(BTRFS_I(inode)->csum_bytes);

Steps to reproduce:

    # mkfs.btrfs -f <dev>
    # mount -o compress <dev> <mnt>
    # cd <mnt>
    # sysbench --test=fileio --num-threads=8 --file-total-size=8G \
    > --file-block-size=32K --file-io-mode=rndwr --file-fsync-freq=0 \
    > --file-fsync-end=no --max-requests=300000 --file-extra-flags=direct \
    > --file-test-mode=sync prepare
    # cd -
    # umount <mnt>
    # mount -o compress <dev> <mnt>
    # cd <mnt>
    # sysbench --test=fileio --num-threads=8 --file-total-size=8G \
    > --file-block-size=32K --file-io-mode=rndwr --file-fsync-freq=0 \
    > --file-fsync-end=no --max-requests=300000 --file-extra-flags=direct \
    > --file-test-mode=sync run
    # cd -
    # umount <mnt>

The cause of this problem is:

    Task0                               Task1
    btrfs_direct_IO
      unlock(&inode->i_mutex)
                                        lock(&inode->i_mutex)
                                        reserve_space()
                                        prepare_pages()
                                        lock_extent()
                                        clear_extent()
                                        unlock_extent()
      lock_extent()
      test_extent(uptodate)
        return false
                                        copy_data()
                                        set_delalloc_extent()
      extent need compress
        go back to buffered write
      clear_extent(DELALLOC | DIRTY)
      unlock_extent()

Task 0 and task 1 wrote to the same place, and task 0 cleared the delalloc flag that task 1 had set. As a result, the dirty pages in that extent could not be flushed to disk, so the reserved space for that extent was never released.

This patch fixes the above bug by unlocking the extent after the delalloc flag has been set.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28Btrfs: cleanup unnecessary parameter and variant of prepare_pages()Miao Xie
- The caller already has the inode object, so it needn't pass the file object. As a result, we needn't define an inode pointer variable.
- The position should be aligned to the page size, not the sector size, so we also needn't pass the root object into prepare_pages().

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: replace BUG in can_modify_featureDavid Sterba
We don't need to crash hard here, it's just reading a sysfs file. The values considered in switch are from a fixed set, the default case should not happen at all. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
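A rough user-space sketch of the pattern, under stated assumptions: `demo_can_modify()`, the `demo_feature_set` enum, and the mask values are all hypothetical and do not reproduce the kernel function. The sketch only shows a `default:` branch that warns and returns a harmless value instead of aborting.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical feature sets, named for this example only. */
enum demo_feature_set { DEMO_COMPAT, DEMO_COMPAT_RO, DEMO_INCOMPAT };

/* Returns 1 if the bit may be changed, 0 otherwise (including unknown sets). */
static int demo_can_modify(int set, uint64_t bit)
{
    uint64_t changeable;

    switch (set) {
    case DEMO_COMPAT:
        changeable = 0x0;
        break;
    case DEMO_COMPAT_RO:
        changeable = 0x0;
        break;
    case DEMO_INCOMPAT:
        changeable = 0x40; /* illustrative mask, not a real flag value */
        break;
    default:
        /* Should not happen; warn and refuse rather than crash. */
        fprintf(stderr, "unknown feature set %d\n", set);
        return 0;
    }

    return (changeable & bit) != 0;
}
```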
2014-01-28btrfs: reserve no transaction units in btrfs_feature_attr_storeDavid Sterba
This fixes code added in patch "btrfs: add ability to change features via sysfs": modifications to the superblock don't need to reserve metadata blocks when starting a transaction. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28Btrfs: make btrfs_debug match pr_debug handling related to DEBUGFrank Holton
The kernel macro pr_debug is defined as an empty statement when DEBUG is not defined. Make btrfs_debug match pr_debug to avoid spamming the kernel log with debug messages. Signed-off-by: Frank Holton <fholton@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
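The idea can be sketched in user space as a macro that prints only when `DEBUG` is defined at compile time. The names `demo_debug()` and `demo_emit()` are hypothetical, and the kernel's real macros are more involved; this is only an analogue of the behavior the message describes.

```c
#include <stdio.h>

#ifdef DEBUG
/* With DEBUG defined, forward to fprintf() and yield its return value. */
#define demo_debug(...) fprintf(stderr, __VA_ARGS__)
#else
/* Without DEBUG, the call expands to a constant and prints nothing. */
#define demo_debug(...) (0)
#endif

/* Returns the number of characters printed, or 0 when DEBUG is off. */
static int demo_emit(int value)
{
    return demo_debug("value=%d\n", value);
}
```

Building the same file with `-DDEBUG` would turn the message back on without touching any call site.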
2014-01-28btrfs: cleanup: removed unused 'btrfs_get_inode_ref_index'Sergei Trofimovich
Found by uselex.rb:

    > btrfs_get_inode_ref_index: [R]: exported from: fs/btrfs/inode-item.o fs/btrfs/btrfs.o fs/btrfs/built-in.o

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: expand btrfs_find_item() to include find_orphan_item functionalityKelley Nielsen
This is the third step in bootstrapping the btrfs_find_item interface. The function find_orphan_item(), in orphan.c, is similar to the two functions already replaced by the new interface. It uses two parameters, which are already present in the interface, and is nearly identical to the function brought in in the previous patch. Replace the two calls to find_orphan_item() with calls to btrfs_find_item(), with the defined objectid and type that was used internally by find_orphan_item(), a null path, and a null key. Add a test for a null path to btrfs_find_item, and if it passes, allocate and free the path. Finally, remove find_orphan_item(). Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: expand btrfs_find_item() to include find_root_ref functionalityKelley Nielsen
This patch is the second step in bootstrapping the btrfs_find_item interface. btrfs_find_root_ref() is similar to the former __inode_info(); it accepts four of its parameters, and duplicates the first half of its functionality. Replace the one former call to btrfs_find_root_ref() with a call to btrfs_find_item(), along with the defined key type that was used internally by btrfs_find_root_ref(), and a null found key. In btrfs_find_item(), add a test for the null key at the place where the functionality of btrfs_find_root_ref() ends; btrfs_find_item() then returns if the test passes. Finally, remove btrfs_find_root_ref(). Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com> Suggested-by: Zach Brown <zab@redhat.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: bootstrap generic btrfs_find_item interfaceKelley Nielsen
There are many btrfs functions that manually search the tree for an item. They all reimplement the same mechanism and differ in the conditions that they use to find the item. __inode_info() is one such example. Zach Brown proposed creating a new interface to take the place of these functions. This patch is the first step to creating the interface. A new function, btrfs_find_item, has been added to ctree.c and prototyped in ctree.h. It is identical to __inode_info, except that the order of the parameters has been rearranged to more closely match those of similar functions elsewhere in the code (now, root and path come first, then the objectid, offset and type, and the key to be filled in last). __inode_info's callers have been set to call this new function instead, and __inode_info itself has been removed. Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com> Suggested-by: Zach Brown <zab@redhat.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: fix unused variables in qgroup.cValentina Giusti
Use otherwise unused local variables slot in update_qgroup_limit_item and in update_qgroup_info_item, and remove unused variable ins from btrfs_qgroup_account_ref. Signed-off-by: Valentina Giusti <valentina.giusti@microon.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: replace path->slots[0] with otherwise unused variable 'slot'Valentina Giusti
Signed-off-by: Valentina Giusti <valentina.giusti@microon.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: remove unused variable from scrub_fixup_nodatasumValentina Giusti
Signed-off-by: Valentina Giusti <valentina.giusti@microon.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: remove unused variable from setup_cluster_no_bitmapValentina Giusti
The variable window_start in setup_cluster_no_bitmap is not used since commit 1bb91902dc90e25449893e693ad45605cb08fbe5 (Btrfs: revamp clustered allocation logic) Signed-off-by: Valentina Giusti <valentina.giusti@microon.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: remove unused variables from extent_io.cValentina Giusti
Remove unused variables: * tree from end_bio_extent_writepage, * item from extent_fiemap. Signed-off-by: Valentina Giusti <valentina.giusti@microon.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: remove unused variable from find_free_extentValentina Giusti
The variable found_uncached_bg in find_free_extent is not used since commit 285ff5af6ce358e73f53b55c9efadd4335f4c2ff (Btrfs: remove the ideal caching code) Signed-off-by: Valentina Giusti <valentina.giusti@microon.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: remove unused variables from disk-io.cValentina Giusti
Remove unused variables: * tree from csum_dirty_buffer, * tree from btree_readpage_end_io_hook, * tree from btree_writepages, * bytenr from btrfs_create_tree, * fs_info from end_workqueue_fn. Signed-off-by: Valentina Giusti <valentina.giusti@microon.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: remove unused variable from btrfs_new_inodeValentina Giusti
Variable owner in btrfs_new_inode is unused since commit d82a6f1d7e8b61ed5996334d0db66651bb43641d (Btrfs: kill BTRFS_I(inode)->block_group) Signed-off-by: Valentina Giusti <valentina.giusti@microon.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: publish fs label in sysfsJeff Mahoney
This adds a writeable attribute which describes the label. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: publish device membership in sysfsJeff Mahoney
Now that we have the infrastructure for per-super attributes, we can publish device membership in /sys/fs/btrfs/<fsid>/devices. The information is published as symlinks to the block devices. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: publish allocation data in sysfsJeff Mahoney
While trying to debug ENOSPC issues, it's helpful to understand what the kernel's view of the available space is. We export this information via ioctl, but sysfs files are more easily used. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: add ioctl to export size of global metadata reservationJeff Mahoney
btrfs filesystem df output will show the size of the metadata space and how much of it is used, and the user assumes that the difference is all usable space. Since that's not actually the case due to the global metadata reservation, we should provide the full picture to the user. This patch adds an ioctl that exports the size of the global metadata reservation so that btrfs filesystem df can report it. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: use feature attribute names to print better error messagesJeff Mahoney
Now that we have the feature name strings available in the kernel via the sysfs attributes, we can use them for printing better failure messages from the ioctl path. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: add ability to change features via sysfsJeff Mahoney
This patch adds the ability to change (set/clear) features while the file system is mounted. A bitmask is added for each feature set to indicate which bits may be set and which may be cleared. A message indicating which bit has been set or cleared is issued when it has been changed, and also when permission or support for a particular bit has been denied. Since the attributes can now be writable, we need to introduce another struct attribute to hold the different permissions. If neither set nor clear is supported, the file will have 0444 permissions. If either set or clear is supported, the file will have 0644 permissions and the store handler will filter out the write based on the bitmask. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
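The permission rule and the store-time filter can be sketched as two small functions. Both `demo_attr_mode()` and `demo_store_allowed()` are hypothetical names, and the masks are passed in as plain arguments; the actual sysfs code is structured differently.

```c
#include <stdint.h>

/*
 * File mode for a feature attribute: writable (0644) if the bit can be
 * either set or cleared at runtime, read-only (0444) otherwise.
 */
static unsigned int demo_attr_mode(uint64_t bit, uint64_t safe_set,
                                   uint64_t safe_clear)
{
    return ((safe_set | safe_clear) & bit) ? 0644 : 0444;
}

/*
 * Store-time filter: a write that enables the bit needs it in safe_set,
 * a write that disables the bit needs it in safe_clear.
 */
static int demo_store_allowed(uint64_t bit, int enable, uint64_t safe_set,
                              uint64_t safe_clear)
{
    uint64_t mask = enable ? safe_set : safe_clear;

    return (mask & bit) != 0;
}
```

A file can therefore be 0644 and still reject a particular write, for example when a bit may be set but not cleared.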
2014-01-28btrfs: publish unknown feature bits in sysfsJeff Mahoney
With the compat and compat-ro bits, it's possible for file systems to exist that have features that aren't supported by the kernel's file system implementation yet still be mountable. This patch publishes read-only info on those features using a prefix:number format, where the number is the bit number rather than the shifted value. e.g. "compat:12" Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
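To make the "bit number rather than the shifted value" distinction concrete, here is a small sketch. `demo_bit_number()` and `demo_unknown_feature_name()` are hypothetical helpers written for this example, not functions from the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit position of a flag value; assumes exactly one bit is set. */
static unsigned int demo_bit_number(uint64_t flag)
{
    unsigned int n = 0;

    while (flag > 1) {
        flag >>= 1;
        n++;
    }
    return n;
}

/* Formats a name such as "compat:12" into buf; returns snprintf()'s result. */
static int demo_unknown_feature_name(char *buf, size_t len,
                                     const char *prefix, uint64_t flag)
{
    return snprintf(buf, len, "%s:%u", prefix, demo_bit_number(flag));
}
```

For the flag value `1ULL << 12` (4096 when shifted), the published name uses 12, not 4096.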
2014-01-28btrfs: publish per-super features in sysfsJeff Mahoney
This patch publishes information on which features are enabled in the file system on a per-super basis. At this point, it only publishes information on features supported by the file system implementation. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: publish per-super attributes in sysfsJeff Mahoney
This patch adds per-super attributes to sysfs. It doesn't publish any attributes yet, but does the proper lifetime handling as well as the basic infrastructure to add new attributes. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: publish supported featured in sysfsJeff Mahoney
This patch adds the ability to publish supported features to sysfs under /sys/fs/btrfs/features. The files are module-wide and export which features the kernel supports. The content, for now, is just "0\n". Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28btrfs: add ioctls to query/change feature bits onlineJeff Mahoney
There are some feature bits that require no offline setup and can be enabled online. I've only reviewed extended irefs, but there will probably be more.

We introduce three new ioctls:
- BTRFS_IOC_GET_SUPPORTED_FEATURES: query the kernel for supported features.
- BTRFS_IOC_GET_FEATURES: query the kernel for enabled features on a per-fs basis, as well as querying for which features are changeable while mounted.
- BTRFS_IOC_SET_FEATURES: change features on a per-fs basis.

We introduce two new masks per feature set (_SAFE_SET and _SAFE_CLEAR) that allow us to define which features are safe to change at runtime.

The failure modes for BTRFS_IOC_SET_FEATURES are as follows:
- Enabling a completely unsupported feature: warns and returns -ENOTSUPP
- Enabling a feature that can only be done offline: warns and returns -EPERM

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <clm@fb.com>
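The two failure modes can be sketched as a pure function over bitmasks. This is an assumption-laden illustration: `demo_check_feature_enable()` is a hypothetical name, the masks are plain arguments, and `DEMO_ENOTSUPP` is defined locally because `ENOTSUPP` is a kernel-internal error code (524) that user-space `<errno.h>` does not provide.

```c
#include <errno.h>
#include <stdint.h>

/* Kernel-internal ENOTSUPP value, redefined here for the sketch. */
#define DEMO_ENOTSUPP 524

/*
 * requested: bits the caller wants to enable
 * supported: bits this implementation knows about
 * safe_set:  bits that may be enabled while mounted
 */
static int demo_check_feature_enable(uint64_t requested, uint64_t supported,
                                     uint64_t safe_set)
{
    if (requested & ~supported)
        return -DEMO_ENOTSUPP; /* completely unsupported feature */
    if (requested & ~safe_set)
        return -EPERM;         /* supported, but only changeable offline */
    return 0;
}
```

The unsupported check comes first in this sketch so that an unknown bit is reported as such even if it is also outside the safe mask.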
2014-01-28Btrfs: skip merge part for delayed data refsLiu Bo
When we have data deduplication on, we'll hang on the merge part because it needs to verify every queued delayed data ref related to this disk offset, but we may have millions of refs. And in the case of delayed data refs, we don't usually have too many data refs to merge. So it's safe to shut it down for data refs. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28Btrfs: introduce a head ref rbtreeLiu Bo
The way we process delayed refs is 1) get a bunch of head refs, 2) pick up one head ref, 3) go one node back for any delayed ref updates. The head ref is also linked in the same rbtree as the delayed ref, so in stage 1) we have to walk one by one over not only head refs but also delayed refs. When we have a great number of delayed refs pending, this costs a lot of time. Here we introduce a head-ref-specific rbtree that only has head refs, so the trouble goes away. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28Btrfs: fix check-integrity to look at the referenced data properlyJosef Bacik
We were looking at file_extent_num_bytes unconditionally when looking at referenced data bytes, but this isn't correct for compression. Fix this by checking the compression of the file extent we are looking at and setting num_bytes to disk_num_bytes in the case of compression, so that we are marking the proper bytes as referenced. This fixes check_int_data freaking out when running btrfs/004. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
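The selection rule reads roughly as follows in isolation. `struct demo_file_extent` and `demo_referenced_bytes()` are hypothetical and carry only the three fields the message mentions; the real on-disk item and its accessors are not reproduced here.

```c
#include <stdint.h>

/* Simplified stand-in for a file extent item. */
struct demo_file_extent {
    uint64_t num_bytes;      /* logical length in the file */
    uint64_t disk_num_bytes; /* length of the extent on disk */
    int compressed;          /* nonzero if the extent is compressed */
};

/* Bytes to mark as referenced: the on-disk size when compressed. */
static uint64_t demo_referenced_bytes(const struct demo_file_extent *fe)
{
    return fe->compressed ? fe->disk_num_bytes : fe->num_bytes;
}
```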
2014-01-28Btrfs: incompatible format change to remove hole extentsJosef Bacik
Btrfs has always had these filler extent data items for holes in inodes. This has made some things very easy, like logging hole punches and sending hole punches. However, for large holey files these extent data items are pure overhead. So add an incompatible feature to no longer add hole extents, to reduce the amount of metadata used by these sorts of files. This requires a few changes for logging and send, since they will need to detect holes and log/send the holes if there are any. I've tested this thoroughly with xfstests and it doesn't cause any issues with and without the incompat format set. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATINGJeff Layton
If the setting of NFS_INO_INVALIDATING gets reordered to before the clearing of NFS_INO_INVALID_DATA, then another task may hit a race window where both appear to be clear, even though the inode's pages are still in need of invalidation. Fix this by adding the appropriate memory barriers. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-28Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds
Pull ceph updates from Sage Weil:

"This is a big batch. From Ilya we have:

 - rbd support for more than ~250 mapped devices (now uses same scheme that SCSI does for device major/minor numbering)
 - crush updates for new mapping behaviors (will be needed for coming erasure coding support, among other things)
 - preliminary support for tiered storage pools

There is also a big series fixing a pile of cephfs bugs with clustered MDSs from Yan Zheng, ACL support for cephfs from Guangliang Zhao, ceph fscache improvements from Li Wang, improved behavior when we get ENOSPC from Josh Durgin, some readv/writev improvements from Majianpeng, and the usual mix of small cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (76 commits)
  ceph: cast PAGE_SIZE to size_t in ceph_sync_write()
  ceph: fix dout() compile warnings in ceph_filemap_fault()
  libceph: support CEPH_FEATURE_OSD_CACHEPOOL feature
  libceph: follow redirect replies from osds
  libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}
  libceph: follow {read,write}_tier fields on osd request submission
  libceph: add ceph_pg_pool_by_id()
  libceph: CEPH_OSD_FLAG_* enum update
  libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()
  libceph: introduce and start using oid abstraction
  libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN
  libceph: move ceph_file_layout helpers to ceph_fs.h
  libceph: start using oloc abstraction
  libceph: dout() is missing a newline
  libceph: add ceph_kv{malloc,free}() and switch to them
  libceph: support CEPH_FEATURE_EXPORT_PEER
  ceph: add imported caps when handling cap export message
  ceph: add open export target session helper
  ceph: remove exported caps when handling cap import message
  ceph: handle session flush message
  ...
2014-01-28Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds
Pull exofs and ore fixes from Boaz Harrosh:

"The main fix here, the first patch, is also destined for -stable. The rest is small trivia and cosmetics. The ORE patches fix very reproducible bugs that affect both exofs and pnfs-objects"

[ ORE is "object raid engine", used by exofs and pnfs - Linus ]

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: Print less in r4w
  exofs: Allow corrupted directory entry to be empty file
  exofs: Allow O_DIRECT open
  ore: Don't crash on NULL bio in _clear_bio
  ore: Fix wrong math in allocation of per device BIO
2014-01-28ceph: cast PAGE_SIZE to size_t in ceph_sync_write()Ilya Dryomov
Use min_t(size_t, ...) instead of plain min(), which does strict type checking, to avoid compile warning on i386. Cc: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
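A simplified user-space analogue of the idea: cast both operands to one type before comparing, so that mixed `size_t` and `unsigned long` arguments compare cleanly on every architecture. `DEMO_MIN_T` and `demo_chunk_len()` are hypothetical names; unlike the kernel macro, this version evaluates its arguments more than once, so it should only be given side-effect-free expressions.

```c
#include <stddef.h>

/* Compare a and b after casting both to the named type. */
#define DEMO_MIN_T(type, a, b) \
    ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Length of the next chunk: never more than one page. */
static size_t demo_chunk_len(size_t remaining, unsigned long page_size)
{
    return DEMO_MIN_T(size_t, remaining, page_size);
}
```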
2014-01-28ceph: fix dout() compile warnings in ceph_filemap_fault()Ilya Dryomov
PAGE_CACHE_SIZE is unsigned long on all architectures, however size_t is either unsigned int or unsigned long. Rather than change format strings, cast PAGE_CACHE_SIZE to size_t to be in line with dout()s in ceph_page_mkwrite(). Cc: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-28Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds
Pull ext4 update from Ted Ts'o:

"Bug fixes and cleanups for ext4. We also enable the punch hole functionality for bigalloc file systems"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: delete "set but not used" variables
  ext4: don't pass freed handle to ext4_walk_page_buffers
  ext4: avoid clearing beyond i_blocks when truncating an inline data file
  ext4: ext4_inode_is_fast_symlink should use EXT4_CLUSTER_SIZE
  ext4: fix a typo in extents.c
  ext4: use %pd printk specificer
  ext4: standardize error handling in ext4_da_write_inline_data_begin()
  ext4: retry allocation when inline->extent conversion failed
  ext4: enable punch hole for bigalloc
2014-01-28Merge tag 'nfs-for-3.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust:

"Highlights include:

 - stable fix for an infinite loop in RPC state machine
 - stable fix for a use after free situation in the NFSv4 trunking discovery
 - stable fix for error handling in the NFSv4 trunking discovery
 - stable fix for the page write update code
 - stable fix for the NFSv4.1 mount time security negotiation
 - stable fix for the NFSv4 open code.
 - O_DIRECT locking fixes
 - fix an Oops in the pnfs file commit code
 - RPC layer needs finer grained handling of connection errors
 - more RPC GSS upcall fixes"

* tag 'nfs-for-3.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (30 commits)
  pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done
  pnfs: fix BUG in filelayout_recover_commit_reqs
  nfs4: fix discover_server_trunking use after free
  NFSv4.1: Handle errors correctly in nfs41_walk_client_list
  nfs: always make sure page is up-to-date before extending a write to cover the entire page
  nfs: page cache invalidation for dio
  nfs: take i_mutex during direct I/O reads
  nfs: merge nfs_direct_write into nfs_file_direct_write
  nfs: merge nfs_direct_read into nfs_file_direct_read
  nfs: increment i_dio_count for reads, too
  nfs: defer inode_dio_done call until size update is done
  nfs: fix size updates for aio writes
  nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME
  NFSv4.1: Fix a race in nfs4_write_inode
  NFSv4.1: Don't trust attributes if a pNFS LAYOUTCOMMIT is outstanding
  point to the right include file in a comment (left over from a9004abc3)
  NFS: dprintk() should not print negative fileids and inode numbers
  nfs: fix dead code of ipv6_addr_scope
  sunrpc: Fix infinite loop in RPC state machine
  SUNRPC: Add tracepoint for socket errors
  ...
2014-01-28Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs updates from Al Viro:

"Assorted stuff; the biggest pile here is Christoph's ACL series. Plus assorted cleanups and fixes all over the place... There will be another pile later this week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
  __dentry_path() fixes
  vfs: Remove second variable named error in __dentry_path
  vfs: Is mounted should be testing mnt_ns for NULL or error.
  Fix race when checking i_size on direct i/o read
  hfsplus: remove can_set_xattr
  nfsd: use get_acl and ->set_acl
  fs: remove generic_acl
  nfs: use generic posix ACL infrastructure for v3 Posix ACLs
  gfs2: use generic posix ACL infrastructure
  jfs: use generic posix ACL infrastructure
  xfs: use generic posix ACL infrastructure
  reiserfs: use generic posix ACL infrastructure
  ocfs2: use generic posix ACL infrastructure
  jffs2: use generic posix ACL infrastructure
  hfsplus: use generic posix ACL infrastructure
  f2fs: use generic posix ACL infrastructure
  ext2/3/4: use generic posix ACL infrastructure
  btrfs: use generic posix ACL infrastructure
  fs: make posix_acl_create more useful
  fs: make posix_acl_chmod more useful
  ...
2014-01-28NFS: Fix races in nfs_revalidate_mappingTrond Myklebust
Commit d529ef83c355f97027ff85298a9709fe06216a66 (NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping) introduces a potential race, since it doesn't test the value of nfsi->cache_validity and set the bitlock in nfsi->flags atomically. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Jeff Layton <jlayton@redhat.com>
2014-01-27compat: fix sys_fanotify_markHeiko Carstens
Commit 91c2e0bcae72 ("unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE") added a new unified compat fanotify_mark syscall to be used by all architectures. Unfortunately the unified version merges the split mask parameter in a wrong way: the lower and higher word got swapped. This was discovered with glibc's tst-fanotify test case. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reported-by: Andreas Krebbel <krebbel@linux.vnet.ibm.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Acked-by: "David S. Miller" <davem@davemloft.net> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: <stable@vger.kernel.org> [3.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
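The class of bug can be shown with a two-argument merge helper. `demo_merge_mask()` is a hypothetical function, not the kernel's compat entry point; which register carries which half on a given architecture is outside the scope of this sketch.

```c
#include <stdint.h>

/* Rebuild a 64-bit mask from its upper and lower 32-bit words. */
static uint64_t demo_merge_mask(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}
```

Passing the two words in the wrong order yields a different mask whenever they differ, which is why a test that sets bits in only one half can expose the swap.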
2014-01-27ocfs2: do not log ENOENT in unlink()Xiaowei.Hu
Suppress log message like this: (open_delete,8328,0):ocfs2_unlink:951 ERROR: status = -2 Orabug:17445485 Signed-off-by: Xiaowei Hu <xiaowei.hu@oracle.com> Cc: Joe Jin <joe.jin@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27nfsd: consider CLAIM_FH when handing out delegationMing Chen
CLAIM_FH was added by NFSv4.1. It is the same as CLAIM_NULL except that it uses only the current FH to identify the file to be opened. The NFS client uses CLAIM_FH if the FH is available when opening a file. Currently, we cannot get any delegation if we stat a file before opening it, because the server delegation code does not recognize CLAIM_FH. We tested this patch and found that a delegation can now be handed out when the claim is CLAIM_FH. See http://marc.info/?l=linux-nfs&m=136369847801388&w=2 and http://www.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues#New_open_claim_types Signed-off-by: Ming Chen <mchen@cs.stonybrook.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-27libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()Ilya Dryomov
Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename it to ceph_oloc_oid_to_pg() to make its purpose more clear. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-27sunrpc: turn warn_gssd() log message into a dprintk()Jeff Layton
The original printk() made sense when the GSSAPI codepaths were called only when sec=krb5* was explicitly requested. Now however, in many cases the nfs client will try to acquire GSSAPI credentials by default, even when it's not requested. Since we don't have a great mechanism to distinguish between the two cases, just turn the pr_warn into a dprintk instead. With this change we can also get rid of the ratelimiting. We do need to keep the EXPORT_SYMBOL(gssd_running) in place since auth_gss.ko needs it and sunrpc.ko provides it. We can however, eliminate the gssd_running call in the nfs code since that's a bit of a layering violation. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>