path: root/fs
Age  Commit message  Author
2013-04-29  fat: introduce a helper fat_get_blknr_offset()  Namjae Jeon
Introduce helper function to get the block number and offset for a given i_pos value. Use it in __fat_write_inode() now and later on in nfs.c Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ravishankar N <ravi.n1@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
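As a rough user-space illustration of what such a helper computes, the sketch below splits an i_pos value into a block number and an entry index within that block. It assumes that i_pos counts directory entries and that the number of entries per block is a power of two; `struct fat_geom` and `get_blknr_offset()` are hypothetical names for this example, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock fields this sketch needs. */
struct fat_geom {
	unsigned dir_per_block;      /* directory entries per block (power of two) */
	unsigned dir_per_block_bits; /* log2(dir_per_block) */
};

/* Split i_pos into the block holding the entry and the entry index in that block. */
void get_blknr_offset(const struct fat_geom *g, int64_t i_pos,
		      uint64_t *blknr, unsigned *offset)
{
	assert(i_pos >= 0);
	assert(g->dir_per_block == 1u << g->dir_per_block_bits);

	*blknr = (uint64_t)i_pos >> g->dir_per_block_bits;
	*offset = (unsigned)((uint64_t)i_pos & (g->dir_per_block - 1));
}
```

With 512-byte blocks and 32-byte directory entries there are 16 entries per block, so an i_pos of 37 would land in block 2 at entry 5 under these assumptions.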
2013-04-29  fat: move fat_i_pos_read to fat.h  Namjae Jeon
Move fat_i_pos_read to fat.h so that it can be called from nfs.c in the subsequent patches to encode the file handle. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ravishankar N <ravi.n1@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  fat: introduce 2 new values for the -o nfs mount option  Namjae Jeon
This patchset eliminates the client side ESTALE errors when a FAT partition exported over NFS has its dentries evicted from the cache. The idea is to find the on-disk location ('i_pos') of the dirent of the inode that has been evicted and use it to rebuild the inode. This patch: Provide two possible values 'stale_rw' and 'nostale_ro' for the -o nfs mount option. The first one allows all file operations but does not reduce ESTALE errors on memory constrained systems. The second one eliminates ESTALE errors but mounts the filesystem as read-only. Not specifying a value defaults to 'stale_rw'. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ravishankar N <ravi.n1@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  fs/buffer.c: remove unnecessary init operation after allocating buffer_head.  majianpeng
bh allocation uses kmem_cache_zalloc() so we needn't call 'init_buffer(bh, NULL, NULL)' and perform other set-zero-operations. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  fs/proc/kcore.c: use register_hotmemory_notifier()  Andrew Morton
Saves an ifdef, no code size changes Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  mm: allow arch code to control the user page table ceiling  Hugh Dickins
On architectures where a pgd entry may be shared between user and kernel (e.g. ARM+LPAE), freeing page tables needs a ceiling other than 0. This patch introduces a generic USER_PGTABLES_CEILING that arch code can override. It is the responsibility of the arch code setting the ceiling to ensure the complete freeing of the page tables (usually in pgd_free()). [catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes] Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: <stable@vger.kernel.org> [3.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  mm, vmalloc: move get_vmalloc_info() to vmalloc.c  Joonsoo Kim
get_vmalloc_info() currently lives in fs/proc/mmu.c. There is no reason for this code to be there: its implementation needs vmlist_lock and iterates over vmlist, which may be an internal data structure of vmalloc. It is preferable that vmlist_lock and vmlist are used only in vmalloc.c, for maintainability. So move the code to vmalloc.c. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  mm: make snapshotting pages for stable writes a per-bio operation  Darrick J. Wong
Walking a bio's page mappings has proved problematic, so create a new bio flag to indicate that a bio's data needs to be snapshotted in order to guarantee stable pages during writeback. Next, for the one user (ext3/jbd) of snapshotting, hook all the places where writes can be initiated without PG_writeback set, and set BIO_SNAP_STABLE there. We must also flag journal "metadata" bios for stable writeout, since file data can be written through the journal. Finally, the MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get rid of it. [akpm@linux-foundation.org: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration] [akpm@linux-foundation.org: teeny cleanup] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  fs: don't compile in drop_caches.c when CONFIG_SYSCTL=n  Josh Triplett
drop_caches.c provides code only invokable via sysctl, so don't compile it in when CONFIG_SYSCTL=n. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  direct-io: submit bio after boundary buffer is added to it  Jan Kara
Currently, dio_send_cur_page() submits bio before current page and cached sdio->cur_page is added to the bio if sdio->boundary is set. This is actually wrong because sdio->boundary means the current buffer is the last one before metadata needs to be read. So we should rather submit the bio after the current page is added to it. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  direct-io: fix boundary block handling  Jan Kara
When we read/write a file sequentially, we will read/write not only the data blocks but also the indirect blocks that may not be physically adjacent to the data blocks. So filesystems set the BH_Boundary flag to submit the previous I/O before reading/writing an indirect block. However the generic direct IO code mishandles buffer_boundary(), setting sdio->boundary before each submit_page_section() call which results in sending only one page bios as underlying code thinks this page is the last in the contiguous extent. So fix the problem by setting sdio->boundary only if the current page is really the last one in the mapped extent. With this patch and "direct-io: submit bio after boundary buffer is added to it" I've measured about 10% throughput improvement of direct IO reads on ext3 with SATA harddrive (from 90 MB/s to 100 MB/s). With ramdisk, the improvement was about 3-fold (from 350 MB/s to 1.2 GB/s). For other filesystems (such as ext4), the improvements won't be as visible because the frequency of BH_Boundary flag being set is much smaller. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
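The effect described above can be seen in a toy model. The sketch below is not the direct-io code; it only counts how many bios a run of contiguous pages would produce if a bio is submitted right after each page that carries the boundary flag. `count_bios()` and its parameters are hypothetical names for this example.

```c
#include <assert.h>

/*
 * Toy model: pages of one mapped extent are added to a bio one by one,
 * and the bio is submitted after any page flagged as a boundary.
 * flag_every_page != 0 models the old behaviour (flag set before every
 * page section); 0 models the fix (flag only on the last page).
 */
unsigned count_bios(unsigned pages, int flag_every_page)
{
	unsigned bios = 0, in_bio = 0, i;

	assert(pages > 0);
	for (i = 0; i < pages; i++) {
		int boundary = flag_every_page || i == pages - 1;

		in_bio++;          /* add the page to the current bio first */
		if (boundary) {    /* then submit, as the companion patch does */
			bios++;
			in_bio = 0;
		}
	}
	if (in_bio)                /* flush anything left over */
		bios++;
	return bios;
}
```

In this model an 8-page extent yields eight single-page bios with the flag set on every page and a single 8-page bio once the flag is set only on the last page, which matches the "only one page bios" symptom in the commit text.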
2013-04-29  fs/read_write.c: fix generic_file_llseek() comment  Ming Lei
Commit ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") has removed i_mutex from generic_file_llseek, so update the comment accordingly. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  ocfs2/dlm: remove redundant null pointer check  Sachin Kamat
kfree on a NULL pointer is a no-op. Remove the redundant null pointer check. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
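The same rule holds for free() in user space, which makes for a convenient analogue. The sketch below uses free() rather than kfree(), and `struct ctx` and `ctx_release()` are hypothetical names for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object with two optional heap allocations. */
struct ctx {
	char *name;
	int *table;
};

/* No NULL checks are needed: free(NULL) is defined to do nothing. */
void ctx_release(struct ctx *c)
{
	assert(c != NULL);
	free(c->name);
	free(c->table);
	c->name = NULL;
	c->table = NULL;
}
```

A guard such as `if (p) free(p);` only adds a branch; removing it does not change behaviour.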
2013-04-29  ocfs2: fix NULL dereference for moving extents  Dan Carpenter
We can't dereference "bg" before it has been assigned. GCC should have warned about this but "bg" was initialized to NULL. I've fixed that as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  ocfs2: fix error handling in ocfs2_ioctl_move_extents()  Dan Carpenter
Smatch complains that if we hit an error (for example if the file is immutable) then "range" has uninitialized stack data and we copy it to the user. I've re-written the error handling to avoid this problem and make it a little cleaner as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
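One way to avoid this class of bug is to copy data back to the caller only on paths where the local structure is known to be filled in. The sketch below shows that pattern in user space. It is not the ocfs2 patch: memcpy() stands in for copy_from_user()/copy_to_user(), and `struct range`, `move_range()` and the `immutable` flag are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical ioctl argument. */
struct range {
	unsigned long start;
	unsigned long len;
	unsigned long moved;
};

/* Returns 0 on success or a negative errno; writes to *user_out only on success. */
int move_range(const struct range *user_in, struct range *user_out, int immutable)
{
	struct range r;

	assert(user_in != NULL && user_out != NULL);
	if (immutable)
		return -EPERM;            /* r is still uninitialized: copy nothing out */

	memcpy(&r, user_in, sizeof(r));   /* stands in for copy_from_user() */
	r.moved = r.len;                  /* pretend the whole range was moved */
	memcpy(user_out, &r, sizeof(r));  /* stands in for copy_to_user() */
	return 0;
}
```

On the error path the caller's buffer is left untouched, so no stack contents can leak through it.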
2013-04-29  ocfs2: fix error return code in ocfs2_info_handle_freefrag()  Wei Yongjun
Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  ocfs2: delay inode update transactions after verifying the input flags  Jeff Liu
There is no need to start the inode update transactions before/while verifying the input flags. As a refinement, this patch delays the transactions until the pre-check has passed. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  fs/fscache/stats.c: fix memory leak  Anurup m
There is a kernel memory leak observed when the proc file /proc/fs/fscache/stats is read. The reason is that in fscache_stats_open, single_open is called and the respective release function is not called during release. Hence fix with correct release function - single_release(). Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101 Signed-off-by: Anurup m <anurup.m@huawei.com> Cc: shyju pv <shyju.pv@huawei.com> Cc: Sanil kumar <sanil.kumar@huawei.com> Cc: Nataraj m <nataraj.m@huawei.com> Cc: Li Zefan <lizefan@huawei.com> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29  Merge branch 'nfs-for-next' of git://linux-nfs.org/~trondmy/nfs-2.6 into for-3.10  J. Bruce Fields
Note conflict: Chuck's patches modified (and made static) gss_mech_get_by_OID, which is still needed by gss-proxy patches. The conflict resolution is a bit minimal; we may want some more cleanup.
2013-04-29  proc: Split kcore bits from linux/procfs.h into linux/kcore.h  David Howells
Split kcore bits from linux/procfs.h into linux/kcore.h. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> cc: linux-mips@linux-mips.org cc: sparclinux@vger.kernel.org cc: x86@kernel.org cc: linux-mm@kvack.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29  Include missing linux/magic.h inclusions  David Howells
Include missing linux/magic.h inclusions where the source file is currently expecting to get magic numbers through linux/proc_fs.h. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-efi@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29  Include missing linux/slab.h inclusions  David Howells
Include missing linux/slab.h inclusions where the source file is currently expecting to get kmalloc() and co. through linux/proc_fs.h. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: linux-s390@vger.kernel.org cc: sparclinux@vger.kernel.org cc: linux-efi@vger.kernel.org cc: linux-mtd@lists.infradead.org cc: devel@driverdev.osuosl.org cc: x86@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29  proc: Delete create_proc_read_entry()  David Howells
Delete create_proc_read_entry() as it no longer has any users. Also delete read_proc_t, write_proc_t, the read_proc member of the proc_dir_entry struct and the support functions that use them. This saves a pointer for every PDE allocated. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29  fanotify: don't wank with FASYNC on ->release()  Al Viro
... it's done already by __fput() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29  hppfs: get rid of ->fsync()  Al Viro
it has grown by accident - directories there do *not* use page cache, so there's nothing to write. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29  hppfs: fix the leaks on close()  Al Viro
we need to close the underlying procfs file and free ->private_data Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29  new helper: read_code()  Al Viro
switch binfmts that use ->read() to that (and to kernel_read() in several cases in binfmt_flat - sure, it's nommu, but still, doing ->read() into kmalloc'ed buffer...) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29  fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()  Gu Zheng
blkdev_aio_read() tests 'size' to see whether it is equal to or greater than the target count requested (iocb->ki_left). If so, there is no need to call iov_shorten() to reduce the number of segments and the iovec's length. So the condition should be changed to 'if (size < iocb->ki_left)' instead. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
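The criterion can be tried out in user space with struct iovec from <sys/uio.h>. The sketch below is an independent illustration, not the kernel source: `iov_shorten_sketch()` trims an iovec array to a byte budget, and the hypothetical wrapper `clamp_request()` applies the 'size < requested' test from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Trim the array so it describes at most `to` bytes; returns the segments still in use. */
unsigned long iov_shorten_sketch(struct iovec *iov, unsigned long nr_segs, size_t to)
{
	unsigned long seg = 0;
	size_t len = 0;

	assert(iov != NULL);
	while (seg < nr_segs) {
		seg++;
		if (len + iov->iov_len >= to) {
			iov->iov_len = to - len;  /* cut the last segment short */
			break;
		}
		len += iov->iov_len;
		iov++;
	}
	return seg;
}

/* Shorten only when fewer bytes are available than were requested. */
unsigned long clamp_request(struct iovec *iov, unsigned long nr_segs,
			    size_t available, size_t requested)
{
	if (available < requested)
		return iov_shorten_sketch(iov, nr_segs, available);
	return nr_segs;
}
```

With three 8-byte segments and only 12 bytes available, the request is cut to two segments and the second one is trimmed to 4 bytes. When the available size covers the request, the array is left alone.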
2013-04-29  Btrfs: cleanup unused function  Liu Bo
btrfs_abort_devices() is no longer used. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-04-29  Merge tag 'driver-core-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core  Linus Torvalds
Pull driver core update from Greg Kroah-Hartman: "Here's the merge request for the driver core tree for 3.10-rc1 It's pretty small, just a number of driver core and sysfs updates and fixes, all of which have been in linux-next for a while now. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed conflict in kernel/rtmutex-tester.c, the locking tree had a better fix for the same sysfs file mode problem.
* tag 'driver-core-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  PM / Runtime: Idle devices asynchronously after probe|release
  driver core: handle user namespaces properly with the uid/gid devtmpfs change
  driver core: devtmpfs: fix compile failure with CONFIG_UIDGID_STRICT_TYPE_CHECKS
  devtmpfs: add base.h include
  driver core: add uid and gid to devtmpfs
  sysfs: check if one entry has been removed before freeing
  sysfs: fix crash_notes_size build warning
  sysfs: fix use after free in case of concurrent read/write and readdir
  rtmutex-tester: fix mode of sysfs files
  Documentation: Add ABI entry for crash_notes and crash_notes_size
  sysfs: Add crash_notes_size to export percpu note size
  driver core: platform_device.h: fix checkpatch errors and warnings
  driver core: platform.c: fix checkpatch errors and warnings
  driver core: warn that platform_driver_probe can not use deferred probing
  sysfs: use atomic_inc_unless_negative in sysfs_get_active
  base: core: WARN() about bogus permissions on device attributes
  device: separate all subsys mutexes
2013-04-29  NFSv4: Warn once about servers that incorrectly apply open mode to setattr  Trond Myklebust
Debugging aid to help identify servers that incorrectly apply open mode checks to setattr requests that are not changing the file size. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-29  NFSv4: Servers should only check SETATTR stateid open mode on size change  Trond Myklebust
The NFSv4 and NFSv4.1 specs are both clear that the server should only check stateid open mode if a SETATTR specifies the size attribute. If the open mode is not one that allows writing, then it returns NFS4ERR_OPENMODE. In the case where the SETATTR is not changing the size, the client will still pass it the delegation stateid to ensure that the server does not recall that delegation. In that case, the server should _ignore_ the delegation open mode, and simply apply standard permission checks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-29  gfs2: Convert print_symbol to %pSR  Joe Perches
Use the new vsprintf extension to avoid any possible message interleaving. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-04-29  jbd: use kmem_cache_zalloc for allocating journal head  Zheng Liu
This commit uses kmem_cache_zalloc instead of kmem_cache_alloc/memset when a new journal head is allocated. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>
2013-04-29  f2fs: check truncation of mapping after lock_page  Jaegeuk Kim
We call lock_page when we need to update a page after readpage. Between grab and lock page, the page can be truncated by other thread. So, we should check the page after lock_page whether it was truncated or not. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-29  f2fs: enhance alloc_nid and build_free_nids flows  Jaegeuk Kim
In order to avoid build_free_nid lock contention, let's change the order of function calls as follows. At first, check whether there are enough free nids.
- If available, just get a free nid with spin_lock without any overhead.
- Otherwise, conduct build_free_nids: scan nat pages, journal nat entries, and nat cache entries.
We should take care not to serve free nids intermediately made by build_free_nids. We can get stable free nids only after build_free_nids is done. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-29  f2fs: add a tracepoint on f2fs_new_inode  Jaegeuk Kim
This can help when debugging the free nid allocation flows. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-29  romfs: fix nommu map length to keep inside filesystem  Greg Ungerer
Checks introduced in commit 4991e7251 ("romfs: do not use mtd->get_unmapped_area directly") re-introduce problems fixed in the earlier commit 2b4b2482e ("romfs: fix romfs_get_unmapped_area() argument check"). If a flat binary app is located at the end of a romfs, its page aligned length may be outside of the romfs filesystem. The flat binary loader, via nommu do_mmap_pgoff(), page aligns the length it is mmaping. So simple offset+size checks will fail - returning EINVAL. We can truncate the length to keep it inside the romfs filesystem, and that also keeps the call to mtd_get_unmapped_area() happy. Are there any side effects to truncating the size here though? Signed-off-by: Greg Ungerer <gerg@uclinux.org>
2013-04-27  xfs: implement extended feature masks  Dave Chinner
The version 5 superblock has extended feature masks for compatible, incompatible and read-only compatible feature sets. Implement the masking and mount-time checking for these feature masks. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27  xfs: add CRC checks to the superblock  Dave Chinner
With the addition of CRCs, there is such a wide and varied change to the on disk format that it makes sense to bump the superblock version number rather than try to use feature bits for all the new functionality. This commit introduces all the new superblock fields needed for all the new functionality: feature masks similar to ext4, separate project quota inodes, a LSN field for recovery and the CRC field. This commit does not bump the superblock version number, however. That will be done as a separate commit at the end of the series after all the new functionality is present so we switch it all on in one commit. This means that we can slowly introduce the changes without them being active and hence maintain bisectability of the tree. This patch is based on a patch originally written by myself back from SGI days, which was subsequently modified by Christoph Hellwig. There is relatively little of that patch remaining, but the history of the patch still should be acknowledged here. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27  xfs: buffer type overruns blf_flags field  Dave Chinner
The buffer type passed to log recovery in the buffer log item overruns the blf_flags field. I had assumed that flags field was a 32 bit value, and it turns out it is an unsigned short. Therefore having 19 flags doesn't really work. Convert the buffer type field to a numeric value, and use the top 5 bits of the flags field for it. We currently have 17 types of buffers, so using 5 bits gives us plenty of room for expansion in future.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
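The packing scheme can be sketched as follows: a numeric type is stored in the top 5 bits of a 16-bit flags word and the low 11 bits stay available for ordinary flags. The macro and function names here are hypothetical and chosen for this example; only the 16-bit width and the 5-bit type field come from the commit text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; layout: bits 15..11 = numeric type, bits 10..0 = flags. */
#define TYPE_BITS  5
#define TYPE_SHIFT (16 - TYPE_BITS)
#define TYPE_MASK  ((uint16_t)(((1u << TYPE_BITS) - 1) << TYPE_SHIFT))

/* Store `type` in the top bits, leaving the ordinary flag bits untouched. */
uint16_t set_buf_type(uint16_t flags, unsigned type)
{
	assert(type < (1u << TYPE_BITS));
	return (uint16_t)((flags & ~TYPE_MASK) | (type << TYPE_SHIFT));
}

/* Extract the numeric type from the top bits. */
unsigned get_buf_type(uint16_t flags)
{
	return (unsigned)(flags & TYPE_MASK) >> TYPE_SHIFT;
}
```

Five bits encode 32 distinct values, so 17 buffer types fit with room to spare, whereas one bit per type would need 17 bits on top of the existing flags.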
2013-04-27  xfs: add buffer types to directory and attribute buffers  Dave Chinner
Add buffer types to the buffer log items so that log recovery can validate the buffers and calculate CRCs correctly after the buffers are recovered. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27  xfs: add CRC protection to remote attributes  Dave Chinner
There are two ways of doing this - the first is to add a CRC to the remote attribute entry in the attribute block. The second is to treat them similar to the remote symlink, where each fragment has its own header and identifies fragment location in the attribute. The problem with the CRC in the remote attr entry is that we cannot identify the owner of the metadata from the metadata blocks themselves, or where the blocks fit into the remote attribute. The down side to this approach is that we never know when the attribute has been read from disk or not and so we have to verify it every time it is read, and we must calculate it during the create transaction and log it. We do not log CRCs for any other metadata, and so this creates a unique set of coherency problems that, in general, are best avoided. Adding an identifying header to each allocated block allows us to identify each fragment and where in the attribute it is located. It enables us to rebuild the remote attribute from just the raw blocks containing the attribute. It also allows us to do per-block CRC verification at IO time rather than during the transaction context that creates it or every time it is read into a user buffer. Hence it avoids all the problems that an external, logged CRC has, and provides all the benefits of self identifying metadata. The only complexity is that we have to add a header per fragment, and we don't know how many fragments will be needed prior to allocations. If we take the symlink example, the header is 56 bytes and hence for a 4k block size filesystem, in the worst case 16 headers requires 1 extra block for the 64k attribute data. For 512 byte filesystems the worst case is an extra block for every 9 fragments (i.e. 16 extra blocks in the worst case). This will be very rare and so it's not really a major concern.
Because allocation is done in two steps - the first finds a hole large enough in the attribute file, the second does the allocation - we only need to find a hole big enough for a worst case allocation. We only need to allocate enough extra blocks for number of headers required by the fragments, and we can calculate that as we go.... Hence it really only makes sense to use the same model as for symlinks - it doesn't add that much complexity, does not require an attribute tree format change, and does not require logging calculated CRC values. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
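The worst-case figures quoted above (1 extra block at 4k, 16 extra blocks at 512 bytes) can be checked with a short calculation. The sketch assumes the worst case of one 56-byte header in every block, so each block carries blocksize minus 56 bytes of attribute data. The function names are hypothetical.

```c
#include <assert.h>

/* Blocks needed for attr_len bytes when every block loses hdr_len bytes to a header. */
unsigned blocks_with_headers(unsigned attr_len, unsigned blksize, unsigned hdr_len)
{
	unsigned payload;

	assert(blksize > hdr_len);
	payload = blksize - hdr_len;
	return (attr_len + payload - 1) / payload;  /* round up */
}

/* Extra blocks compared with storing the same data with no headers. */
unsigned extra_blocks(unsigned attr_len, unsigned blksize, unsigned hdr_len)
{
	unsigned plain = (attr_len + blksize - 1) / blksize;

	return blocks_with_headers(attr_len, blksize, hdr_len) - plain;
}
```

For a 64k attribute, a 4096-byte block holds 4040 payload bytes, giving 17 blocks instead of 16. A 512-byte block holds 456 payload bytes, giving 144 blocks instead of 128.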
2013-04-27  xfs: split remote attribute code out  Dave Chinner
Adding CRC support to remote attributes adds a significant amount of remote attribute specific code. Split the existing remote attribute code out into its own file so that all the relevant remote attribute code is in a single, easy to find place. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27  xfs: add CRCs to attr leaf blocks  Dave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27  xfs: add CRCs to dir2/da node blocks  Dave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27  xfs: shortform directory offsets change for dir3 format  Dave Chinner
Because the header size for the CRC enabled directory blocks is larger, the offset of the first entry into a directory block is different to the dir2 format. The shortform directory stores the dirent's offset so that it doesn't change when moving from shortform to block form and back again, and hence it needs to take into account the different header sizes to maintain the correct offsets. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27  xfs: add CRC checking to dir2 leaf blocks  Dave Chinner
This addition follows the same pattern as the dir2 block CRCs. Seeing as both LEAF1 and LEAFN types need to be changed at the same time, this is a pretty large amount of change. Leaf block headers need to be abstracted away from the on-disk structures (struct xfs_dir3_icleaf_hdr), as do the base leaf entry locations. This header abstraction allows the in-core header and leaf entry location to be passed around instead of the leaf block itself. This saves a lot of converting individual variables from on-disk format to host format where they are used, so there's a good chance that the compiler will be able to produce much more optimal code as it's not having to byteswap variables all over the place. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27  xfs: add CRC checking to dir2 data blocks  Dave Chinner
This addition follows the same pattern as the dir2 block CRCs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27  xfs: add CRC checking to dir2 free blocks  Dave Chinner
This addition follows the same pattern as the dir2 block CRCs, but with a few differences. The main difference is that the free block header is different between the v2 and v3 formats, so an "in-core" free block header has been added and _todisk/_from_disk functions used to abstract the differences in structure format from the code. This is similar to the on-disk superblock versus the in-core superblock setup. The in-core structure is populated when the buffer is read from disk; all the in-memory checks and modifications are done on the in-core version of the structure, which is written back to the buffer before the buffer is logged. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>