path: root/fs
Age | Commit message | Author
2011-04-11 | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs | Linus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: use proper interfaces for on-stack plugging
  xfs: fix xfs_debug warnings
  xfs: fix variable set but not used warnings
  xfs: convert log tail checking to a warning
  xfs: catch bad block numbers freeing extents.
  xfs: push the AIL from memory reclaim and periodic sync
  xfs: clean up code layout in xfs_trans_ail.c
  xfs: convert the xfsaild threads to a workqueue
  xfs: introduce background inode reclaim work
  xfs: convert ENOSPC inode flushing to use new syncd workqueue
  xfs: introduce a xfssyncd workqueue
  xfs: fix extent format buffer allocation size
  xfs: fix unreferenced var error in xfs_buf.c
Also, applied patch from Tony Luck that fixes ia64:
  xfs_destroy_workqueues() should not be tagged with__exit
in the branch before merging.
2011-04-11 | xfs_destroy_workqueues() should not be tagged with__exit | Luck, Tony
ia64 throws away .exit sections for the built-in CONFIG case, so routines that are used in other circumstances should not be tagged as __exit.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-11 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix data corruption regression by reverting commit 6de9843dab3f
  ext4: Allow indirect-block file to grow the file size to max file size
  ext4: allow an active handle to be started when freezing
  ext4: sync the directory inode in ext4_sync_parent()
  ext4: init timer earlier to avoid a kernel panic in __save_error_info
  jbd2: fix potential memory leak on transaction commit
  ext4: fix a double free in ext4_register_li_request
  ext4: fix credits computing for indirect mapped files
  ext4: remove unnecessary [cm]time update of quota file
  jbd2: move bdget out of critical section
2011-04-11 | Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux | Linus Torvalds
* 'for-2.6.39' of git://linux-nfs.org/~bfields/linux:
  nfsd4: fix oops on lock failure
  nfsd: fix auth_domain reference leak on nlm operations
2011-04-10 | ext4: fix data corruption regression by reverting commit 6de9843dab3f | Theodore Ts'o
Revert commit 6de9843dab3f2a1d4d66d80aa9e5782f80977d20, since it caused a data corruption regression with BitTorrent downloads. Thanks to Damien for discovering and bisecting to find the problem commit.
https://bugzilla.kernel.org/show_bug.cgi?id=32972
Reported-by: Damien Grassart <damien@grassart.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-10 | ext4: Allow indirect-block file to grow the file size to max file size | Kazuya Mio
We can create a 4402345721856-byte file with indirect block mapping. However, if we grow an indirect-block file to that size with ftruncate(), we see an ext4 warning. The following patch fixes this problem.
How to reproduce:
    # dd if=/dev/zero of=/mnt/mp1/hoge bs=1 count=0 seek=4402345721856
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 0.000221428 s, 0.0 kB/s
    # tail -n 1 /var/log/messages
    Nov 25 15:10:27 test kernel: EXT4-fs warning (device sda8): ext4_block_to_path:345: block 1074791436 > max in inode 12
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
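The size quoted in this message can be checked by hand. The sketch below is not ext4 code; it assumes a 4 KiB block size, 12 direct pointers, and 4-byte block pointers, and the function name is made up for illustration. Under those assumptions the block count comes out to 1074791436, the same number that appears in the warning.

```c
#include <stdint.h>

/*
 * Largest file size, in bytes, reachable through a classic
 * direct / indirect / double-indirect / triple-indirect block map.
 * Assumes 12 direct pointers and 4-byte block pointers.
 */
uint64_t sketch_indirect_map_max_blocks(uint64_t block_size)
{
    uint64_t ptrs = block_size / 4;   /* pointers held by one indirect block */

    return 12                         /* direct blocks */
         + ptrs                       /* single indirect */
         + ptrs * ptrs                /* double indirect */
         + ptrs * ptrs * ptrs;        /* triple indirect */
}

uint64_t sketch_indirect_map_max_bytes(uint64_t block_size)
{
    return sketch_indirect_map_max_blocks(block_size) * block_size;
}
```

With `block_size` set to 4096 the two helpers give 1074791436 blocks and 4402345721856 bytes.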
2011-04-10 | ext4: allow an active handle to be started when freezing | Yongqiang Yang
ext4_journal_start_sb() should not prevent an active handle from being started due to s_frozen. Otherwise, a deadlock can easily occur; one such situation is shown below.
    ================================================
         freeze         |       truncate
    ================================================
                        |  ext4_ext_truncate()
        freeze_super()  |   starts a handle
        sets s_frozen   |
                        |  ext4_ext_truncate()
                        |  holds i_data_sem
      ext4_freeze()     |
      waits for updates |
                        |  ext4_free_blocks()
                        |  calls dquot_free_block()
                        |
                        |  dquot_free_blocks()
                        |  calls ext4_dirty_inode()
                        |
                        |  ext4_dirty_inode()
                        |  tries to start an active
                        |  handle
                        |
                        |  block due to s_frozen
    ================================================
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Amir Goldstein <amir73il@users.sf.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
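The rule this message describes can be reduced to one decision: a task that already holds a handle must not wait for the thaw. The following is a user-space sketch only; the struct and function names are hypothetical and do not correspond to the jbd2 or ext4 API.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_handle {
    int refcount;
};

struct sketch_task {
    struct sketch_handle *active;   /* handle this task already holds, if any */
};

/* Returns true if starting a handle now would have to wait for the thaw. */
bool sketch_start_would_block(const struct sketch_task *task, bool fs_frozen)
{
    if (task->active != NULL)
        return false;   /* nested start: the freezer is waiting on us, so never wait */
    return fs_frozen;   /* brand-new handle: wait until the filesystem is thawed */
}
```

In the diagram above, the truncate side already holds a handle when ext4_dirty_inode() runs, so under this rule it proceeds and the freeze side eventually sees the update finish.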
2011-04-10 | ext4: sync the directory inode in ext4_sync_parent() | Curt Wohlgemuth
ext4 has taken the stance that, in the absence of a journal, when an fsync/fdatasync of an inode is done, the parent directory should be sync'ed if this inode entry is new. ext4_sync_parent(), which implements this, does indeed sync the dirent pages for parent directories, but it does not sync the directory *inode*. This patch fixes this.
Also now return error status from ext4_sync_parent().
I tested this using a power fail test, which panics a machine running a file server getting requests from a client. Without this patch, on about every other test run, the server is missing many, many files that had been synced. With this patch, on > 6 runs, I see zero files being lost.
Google-Bug-Id: 4179519
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-10 | nfsd4: fix oops on lock failure | J. Bruce Fields
Lock stateid's can have access_bmap 0 if they were only partially initialized (due to a failed lock request); handle that case in free_generic_stateid.
    ------------[ cut here ]------------
    kernel BUG at fs/nfsd/nfs4state.c:380!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/kernel/mm/ksm/run
    Modules linked in: nfs fscache md4 nls_utf8 cifs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss sunrpc ipv6 ppdev parport_pc parport pcnet32 mii pcspkr microcode i2c_piix4 BusLogic floppy [last unloaded: mperf]
    Pid: 1468, comm: nfsd Not tainted 2.6.38+ #120 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
    EIP: 0060:[<e24f180d>] EFLAGS: 00010297 CPU: 0
    EIP is at nfs4_access_to_omode+0x1c/0x29 [nfsd]
    EAX: ffffffff EBX: dd758120 ECX: 00000000 EDX: 00000004
    ESI: dd758120 EDI: ddfe657c EBP: dd54dde0 ESP: dd54dde0
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    Process nfsd (pid: 1468, ti=dd54c000 task=ddc92580 task.ti=dd54c000)
    Stack:
     dd54ddf0 e24f19ca 00000000 ddfe6560 dd54de08 e24f1a5d dd758130 deee3a20
     ddfe6560 31270000 dd54df1c e24f52fd 0000000f dd758090 e2505dd0 0be304cf
     dbb51d68 0000000e ddfe657c ddcd8020 dd758130 dd758128 dd7580d8 dd54de68
    Call Trace:
     [<e24f19ca>] free_generic_stateid+0x1c/0x3e [nfsd]
     [<e24f1a5d>] release_lockowner+0x71/0x8a [nfsd]
     [<e24f52fd>] nfsd4_lock+0x617/0x66c [nfsd]
     [<e24e57b6>] ? nfsd_setuser+0x199/0x1bb [nfsd]
     [<e24e056c>] ? nfsd_setuser_and_check_port+0x65/0x81 [nfsd]
     [<c07a0052>] ? _cond_resched+0x8/0x1c
     [<c04ca61f>] ? slab_pre_alloc_hook.clone.33+0x23/0x27
     [<c04cac01>] ? kmem_cache_alloc+0x1a/0xd2
     [<c04835a0>] ? __call_rcu+0xd7/0xdd
     [<e24e0dfb>] ? fh_verify+0x401/0x452 [nfsd]
     [<e24f0b61>] ? nfsd4_encode_operation+0x52/0x117 [nfsd]
     [<e24ea0d7>] ? nfsd4_putfh+0x33/0x3b [nfsd]
     [<e24f4ce6>] ? nfsd4_delegreturn+0xd4/0xd4 [nfsd]
     [<e24ea2c9>] nfsd4_proc_compound+0x1ea/0x33e [nfsd]
     [<e24de6ee>] nfsd_dispatch+0xd1/0x1a5 [nfsd]
     [<e1d6e1c7>] svc_process_common+0x282/0x46f [sunrpc]
     [<e1d6e578>] svc_process+0xdc/0xfa [sunrpc]
     [<e24de0fa>] nfsd+0xd6/0x115 [nfsd]
     [<e24de024>] ? nfsd_shutdown+0x24/0x24 [nfsd]
     [<c0454322>] kthread+0x62/0x67
     [<c04542c0>] ? kthread_worker_fn+0x114/0x114
     [<c07a6ebe>] kernel_thread_helper+0x6/0x10
    Code: eb 05 b8 00 00 27 4f 8d 65 f4 5b 5e 5f 5d c3 83 e0 03 55 83 f8 02 89 e5 74 17 83 f8 03 74 05 48 75 09 eb 09 b8 02 00 00 00 eb 0b <0f> 0b 31 c0 eb 05 b8 01 00 00 00 5d c3 55 89 e5 57 56 89 d6 8d
    EIP: [<e24f180d>] nfs4_access_to_omode+0x1c/0x29 [nfsd] SS:ESP 0068:dd54dde0
    ---[ end trace 2b0bf6c6557cb284 ]---
The trace route is:
    -> nfsd4_lock()
       -> if (lock->lk_is_new) {
          -> alloc_init_lock_stateid()
             3739: stp->st_access_bmap = 0;
       ->if (status && lock->lk_is_new && lock_sop)
          -> release_lockowner()
             -> free_generic_stateid()
                -> nfs4_access_bmap_to_omode()
                   -> nfs4_access_to_omode()
                      380: BUG();
*****
This problem was introduced by 0997b173609b9229ece28941c118a2a9b278796e.
Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Tested-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
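A user-space sketch of the guard described at the top of this message: an access bitmap of 0 has no valid open mode, so the release path skips the mapping instead of reaching the BUG(). All names are hypothetical, and the bitmap encoding (bit i set means access value i) is an assumption made for the illustration, not a statement about the nfsd sources.

```c
#include <stdbool.h>

enum sketch_omode { SK_RDONLY, SK_WRONLY, SK_RDWR, SK_INVALID };

/* Map an access value (1 = read, 2 = write, 3 = both) to an open mode. */
enum sketch_omode sketch_access_to_omode(unsigned access)
{
    switch (access & 3u) {
    case 1u: return SK_RDONLY;
    case 2u: return SK_WRONLY;
    case 3u: return SK_RDWR;
    }
    return SK_INVALID;   /* the kernel function hits BUG() at this point */
}

struct sketch_stateid {
    unsigned long access_bmap;   /* 0 when the stateid was only partially set up */
};

/* Returns true if there was file access to release; *mode_out is then valid. */
bool sketch_release_access(const struct sketch_stateid *stp,
                           enum sketch_omode *mode_out)
{
    unsigned access = 0;

    if (stp->access_bmap == 0)
        return false;    /* nothing was ever acquired, so skip the mapping */

    for (unsigned i = 1; i < 4; i++)
        if (stp->access_bmap & (1ul << i))
            access |= i;

    *mode_out = sketch_access_to_omode(access);
    return true;
}
```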
2011-04-08 | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 | Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC
  NFS: Fix a signed vs. unsigned secinfo bug
  Revert "net/sunrpc: Use static const char arrays"
2011-04-08 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 | Linus Torvalds
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  quota: Don't write quota info in dquot_commit()
  ext3: Fix writepage credits computation for ordered mode
2011-04-08 | xfs: use proper interfaces for on-stack plugging | Christoph Hellwig
Add proper blk_start_plug/blk_finish_plug pairs for the two places where we issue buffer I/O, and remove the blk_flush_plug in xfs_buf_lock and xfs_buf_iowait, given that context switches already flush the per-process plugging lists.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-08 | xfs: fix xfs_debug warnings | Christoph Hellwig
For a CONFIG_XFS_DEBUG=n build gcc complains about statements with no effect in xfs_debug:
    fs/xfs/quota/xfs_qm_syscalls.c: In function 'xfs_qm_scall_trunc_qfiles':
    fs/xfs/quota/xfs_qm_syscalls.c:291:3: warning: statement with no effect
The reason for that is that the various new xfs message functions have a return value which is never used, and in the non-debug build the xfs_debug macro evaluates to a plain 0, which produces the above warnings. This can be fixed by turning xfs_debug into an inline function instead of a macro, but in addition to that I've also changed all the message helpers to return void as we never use their return values.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
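The difference between the two forms can be shown outside the kernel. This is a stand-alone sketch with made-up names (`SKETCH_DEBUG`, `sketch_debug`), not the XFS helpers: a macro that expands to a plain 0 leaves a bare expression statement at every call site, whereas an empty inline function returning void leaves nothing for the compiler to warn about.

```c
#include <stdarg.h>
#include <stdio.h>

#ifndef SKETCH_DEBUG
#define SKETCH_DEBUG 0          /* non-debug build by default */
#endif

int sketch_debug_lines;         /* number of messages actually printed */

/*
 * The problematic form would be:
 *     #define sketch_debug(fmt, ...) (0)
 * which turns each call into the statement "(0);".
 */
static inline void sketch_debug(const char *fmt, ...)
{
#if SKETCH_DEBUG
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    sketch_debug_lines++;
#else
    (void)fmt;                  /* compiled out: no output, no warning */
#endif
}
```

Call sites stay the same in both builds, for example `sketch_debug("truncating %d files\n", n);`.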
2011-04-08 | xfs: fix variable set but not used warnings | Christoph Hellwig
GCC 4.6 now warns about variables set but not used. Fix the trivially fixable warnings of this sort.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-08 | xfs: convert log tail checking to a warning | Dave Chinner
On the Power platform, the log tail debug checks fire excessively causing the system to panic early in testing. The debug checks are known to be racy, though on x86_64 there is no evidence that they trigger at all.
We want to keep the checks active on debug systems to alert us to problems with log space accounting, but we need to reduce the impact of a racy check on testing on the Power platform.
As a result, convert the ASSERT conditions to warnings, and allow them to fire only once per filesystem mount. This will prevent false positives from interfering with testing, whilst still providing us with the indication that there may be a problem with log space accounting should that occur.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
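The "warn once per mount" behaviour can be sketched with a per-mount flag. The code below is illustrative user-space C; the struct, field, and function names are assumptions and are not taken from the XFS log code.

```c
#include <stdbool.h>
#include <stdio.h>

struct sketch_mount {
    const char *name;
    bool tail_warned;       /* set after the first warning on this mount */
};

/* Returns true if this call emitted a warning. */
bool sketch_check_log_tail(struct sketch_mount *mp, bool tail_ok)
{
    if (tail_ok || mp->tail_warned)
        return false;

    mp->tail_warned = true;
    fprintf(stderr, "%s: log tail check failed (may be a racy false positive)\n",
            mp->name);
    return true;
}
```

Because the flag lives in the per-mount structure, a second filesystem still gets its own first warning.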
2011-04-08 | xfs: catch bad block numbers freeing extents. | Dave Chinner
A fuzzed filesystem crashed a kernel when freeing an extent with a block number beyond the end of the filesystem. Convert all the debug asserts in xfs_free_extent() to active checks so that we catch bad extents and return that the filesystem is corrupted rather than crashing.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
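As an illustration of turning asserts into active checks, the sketch below validates an extent against a simplified geometry and returns an error instead of crashing. The geometry struct, the function name, and the error value are stand-ins chosen for this example; they are not the XFS definitions.

```c
#include <stdint.h>

#define SKETCH_EFSCORRUPTED 117     /* stand-in "filesystem corrupted" code */

struct sketch_geom {
    uint32_t agcount;               /* number of allocation groups */
    uint32_t agblocks;              /* blocks per allocation group */
};

/* Returns 0 if the extent lies inside the filesystem, a negative error if not. */
int sketch_check_free_extent(const struct sketch_geom *g,
                             uint32_t agno, uint32_t agbno, uint32_t len)
{
    if (agno >= g->agcount)
        return -SKETCH_EFSCORRUPTED;    /* allocation group does not exist */
    if (agbno >= g->agblocks)
        return -SKETCH_EFSCORRUPTED;    /* start block is past the end of the AG */
    if ((uint64_t)agbno + len > g->agblocks)
        return -SKETCH_EFSCORRUPTED;    /* extent runs off the end of the AG */
    return 0;
}
```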
2011-04-08 | xfs: push the AIL from memory reclaim and periodic sync | Dave Chinner
When we are short on memory, we want to expedite the cleaning of dirty objects. Hence when we run short on memory, we need to kick the AIL flushing into action to clean as many dirty objects as quickly as possible. To implement this, sample the lsn of the log item at the head of the AIL and use that as the push target for the AIL flush.
Further, we keep items in the AIL that are dirty that are not tracked any other way, so we can get objects sitting in the AIL that don't get written back until the AIL is pushed. Hence to get the filesystem to the idle state, we might need to push the AIL to flush out any remaining dirty objects sitting in the AIL. This requires the same push mechanism as the reclaim push.
This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to match the new xfs_ail_max_lsn() function introduced in this patch. Similarly for xfs_trans_ail_push -> xfs_ail_push.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-04-08 | xfs: clean up code layout in xfs_trans_ail.c | Dave Chinner
This patch rearranges the location of functions in xfs_trans_ail.c to remove the need for forward declarations of those functions in preparation for adding new functions without the need for forward declarations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-04-08 | xfs: convert the xfsaild threads to a workqueue | Dave Chinner
Similar to the xfssyncd, the per-filesystem xfsaild threads can be converted to a global workqueue and run periodically by delayed works. This makes sense for the AIL pushing because it uses variable timeouts depending on the work that needs to be done.
By removing the xfsaild, we simplify the AIL pushing code and remove the need to spread the code to implement the threading and pushing across multiple files.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-04-08 | xfs: introduce background inode reclaim work | Dave Chinner
Background inode reclaim needs to run more frequently than the XFS syncd work is run, as 30s is too long between optimal reclaim runs. Add a new periodic work item to the xfs syncd workqueue to run a fast, non-blocking inode reclaim scan.
Background inode reclaim is kicked by the act of marking inodes for reclaim. When an AG is first marked as having reclaimable inodes, the background reclaim work is kicked. It will continue to run periodically until it detects that there are no more reclaimable inodes. It will be kicked again when the first inode is queued for reclaim.
To ensure shrinker based inode reclaim throttles to the inode cleaning and reclaim rate but still reclaims inodes efficiently, make it kick the background inode reclaim so that when we are low on memory we are trying to reclaim inodes as efficiently as possible. This kick should not be necessary, but it will protect against failures to kick the background reclaim when inodes are first dirtied.
To provide the rate throttling, make the shrinker pass do synchronous inode reclaim so that it blocks on inodes under IO. This means that the shrinker will reclaim inodes rather than just skipping over them, but it does not adversely affect the rate of reclaim because most dirty inodes are already under IO due to the background reclaim work the shrinker kicked.
These two modifications solve one of the two OOM killer invocations Chris Mason reported recently when running a stress testing script. The particular workload trigger for the OOM killer invocation is where there are more threads than CPUs all unlinking files in an extremely memory constrained environment. Unlike other solutions, this one does not have an impact on performance when memory is not constrained or the number of concurrent threads operating is <= the number of CPUs.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
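The kick rule in the second paragraph (start the periodic work on the transition from "no reclaimable inodes" to "one reclaimable inode", stop when the count returns to zero) can be sketched with a counter. This is a simplified single-threaded illustration with hypothetical names; the real code works per AG with locking and a delayed work item.

```c
#include <stdbool.h>

struct sketch_ag {
    unsigned reclaimable;   /* inodes currently marked for reclaim */
};

/* Mark one inode reclaimable; returns true if background work must be kicked. */
bool sketch_mark_reclaimable(struct sketch_ag *ag)
{
    return ag->reclaimable++ == 0;      /* only the 0 -> 1 transition kicks */
}

/* Reclaim one inode; returns true if the periodic work should keep running. */
bool sketch_reclaim_one(struct sketch_ag *ag)
{
    if (ag->reclaimable > 0)
        ag->reclaimable--;
    return ag->reclaimable > 0;         /* stop rescheduling once nothing is left */
}
```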
2011-04-08 | xfs: convert ENOSPC inode flushing to use new syncd workqueue | Dave Chinner
One of the problems with the current inode flush at ENOSPC is that we queue a flush per ENOSPC event, regardless of how many are already queued. This can result in hundreds of queued flushes, most of which simply burn CPU scanning and do no real work. This simply slows down allocation at ENOSPC.
We really only need one active flush at a time, and we can easily implement that via the new xfs_syncd_wq. All we need to do is queue a flush if one is not already active, then block waiting for the currently active flush to complete. The result is that we only ever have a single ENOSPC inode flush active at a time and this greatly reduces the overhead of ENOSPC processing.
On my 2p test machine, this results in tests exercising ENOSPC conditions running significantly faster - 042 halves execution time, 083 drops from 60s to 5s, etc - while not introducing test regressions.
This allows us to remove the old xfssyncd threads and infrastructure as they are no longer used.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
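The "queue a flush only if one is not already active" idea can be sketched as a pending flag. The example is single-threaded and uses made-up names; a real implementation would rely on the workqueue's own pending tracking or an atomic test-and-set, and callers would then wait for the flush to finish.

```c
#include <stdbool.h>

struct sketch_flush {
    bool pending;       /* a flush is queued or running */
    unsigned runs;      /* how many flushes actually executed */
};

/* Returns true if this call queued a new flush, false if one was already pending. */
bool sketch_queue_flush(struct sketch_flush *f)
{
    if (f->pending)
        return false;   /* piggyback on the flush that is already queued */
    f->pending = true;
    return true;
}

/* Worker side: run the queued flush, if any, and clear the pending state. */
void sketch_run_flush(struct sketch_flush *f)
{
    if (!f->pending)
        return;
    f->runs++;
    f->pending = false;
}
```

A burst of ENOSPC events therefore costs one flush rather than one flush per event.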
2011-04-08 | xfs: introduce a xfssyncd workqueue | Dave Chinner
All of the work xfssyncd does is background functionality. There is no need for a thread per filesystem to do this work - it can all be managed by a global workqueue now that workqueues manage concurrency effectively.
Introduce a new global xfssyncd workqueue, and convert the periodic work to use this new functionality. To do this, use a delayed work construct to schedule the next running of the periodic sync work for the filesystem. When the sync work is complete, queue a new delayed work for the next running of the sync work.
For laptop mode, we wait on completion for the sync works, so ensure that the sync work queuing interface can flush and wait for work to complete to enable the work queue infrastructure to replace the current sequence number and wakeup that is used.
Because the sync work does non-trivial amounts of work, mark the new work queue as CPU intensive.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
2011-04-08 | xfs: fix extent format buffer allocation size | Dave Chinner
When formatting an inode item, we have to allocate a separate buffer to hold extents when there are delayed allocation extents on the inode and it is in extent format. The allocation size is derived from the in-core data fork representation, which accounts for delayed allocation extents, while the on-disk representation does not contain any delalloc extents.
As a result of this mismatch, the allocated buffer can be far larger than needed to hold the real extent list which, due to the fact the inode is in extent format, is limited to the size of the literal area of the inode. However, we can have thousands of delalloc extents, resulting in an allocation size orders of magnitude larger than is needed to hold all the real extents.
Fix this by limiting the size of the buffer being allocated to the size of the literal area of the inodes in the filesystem (i.e. the maximum size an inode fork can grow to).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
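The fix amounts to clamping the allocation size. A small sketch follows, assuming a 16-byte extent record and a caller-supplied literal area size; both the constant and the function name are illustrative rather than taken from the XFS sources.

```c
#include <stddef.h>

#define SKETCH_EXTENT_REC_SIZE 16u   /* assumed size of one extent record */

/*
 * Size of the buffer needed to format the extent list: the in-core count
 * may include thousands of delalloc extents, but the on-disk list can never
 * exceed the inode's literal area, so that is the upper bound.
 */
size_t sketch_extent_buf_size(size_t incore_extents, size_t literal_area_size)
{
    size_t want = incore_extents * SKETCH_EXTENT_REC_SIZE;

    return want < literal_area_size ? want : literal_area_size;
}
```

For example, with a hypothetical 336-byte literal area, 5000 in-core extents would ask for 80000 bytes unclamped but only 336 bytes with the clamp.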
2011-04-07 | NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC | Bryan Schumaker
When attempting an initial mount, we should only attempt other authflavors if AUTH_UNIX receives a NFS4ERR_WRONGSEC error. This allows other errors to be passed back to userspace programs.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
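The decision described here is a single predicate on the status of the first attempt. In the sketch below, the numeric value 10016 is the NFS4ERR_WRONGSEC code from the NFSv4 protocol; the function name and the convention of passing status as a negative number are assumptions for illustration.

```c
#include <stdbool.h>

#define SKETCH_NFS4ERR_ACCESS   13
#define SKETCH_NFS4ERR_WRONGSEC 10016

/*
 * status: 0 on success, or a negative NFSv4 error from the AUTH_UNIX attempt.
 * Only a "wrong security flavor" reply justifies probing other flavors;
 * every other error is passed back to the caller unchanged.
 */
bool sketch_should_try_other_flavors(int status)
{
    return status == -SKETCH_NFS4ERR_WRONGSEC;
}
```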
2011-04-07 | Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6 | Linus Torvalds
* 'for-linus' of git://git.infradead.org/ubifs-2.6:
  UBI: do not select KALLSYMS_ALL
  UBI: do not compare array with NULL
  UBI: check if we are in RO mode in the erase routine
  UBIFS: fix debugging failure in dbg_check_space_info
  UBIFS: fix error path in dbg_debugfs_init_fs
  UBIFS: unify error path dbg_debugfs_init_fs
  UBIFS: do not select KALLSYMS_ALL
  UBIFS: fix assertion warnings
  UBIFS: fix oops on error path in read_pnode
  UBIFS: do not read flash unnecessarily
2011-04-07 | Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 | Linus Torvalds
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
  Fix common misspellings
2011-04-06 | NFS: Fix a signed vs. unsigned secinfo bug | Bryan Schumaker
rpc_authflavor_t is cast from an unsigned int, but the initial code tried to use it as a signed int. I fix this by passing an rpc_authflavor_t pointer around, and returning signed integers from functions.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
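The pattern in this fix (status in a signed return value, result through a pointer) is shown below as a user-space sketch. The typedef and function are hypothetical stand-ins; the point is only that a negative errno stored in an unsigned type can never test as less than zero.

```c
#include <errno.h>

typedef unsigned int sketch_flavor_t;   /* unsigned, like the type in the message */

/*
 * Pick the first offered flavor. The status is returned as a signed int and
 * the flavor travels through the out-parameter, so an error code is never
 * squeezed into the unsigned flavor type.
 */
int sketch_pick_flavor(const sketch_flavor_t *offered, int count,
                       sketch_flavor_t *out)
{
    if (count <= 0)
        return -EPERM;      /* nothing usable was offered */
    *out = offered[0];
    return 0;
}
```

Callers can then write `if (sketch_pick_flavor(list, n, &flavor) < 0)` and the comparison behaves as intended.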
2011-04-05 | ext4: init timer earlier to avoid a kernel panic in __save_error_info | Tao Ma
During mount, when we fail to open the journal inode or root inode, __save_error_info will call mod_timer. But s_err_report isn't initialized yet, and the kernel oopses. The detailed information can be found at https://bugzilla.kernel.org/show_bug.cgi?id=32082.
The best way is to check whether the timer s_err_report is initialized or not. But it seems that in include/linux/timer.h, we can't find a good function to check the status of this timer, so this patch just moves the initialization of s_err_report earlier so that we can avoid the kernel panic. The corresponding del_timer is also added in the error path.
Reported-by: Sami Liedes <sliedes@cc.hut.fi>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-05 | jbd2: fix potential memory leak on transaction commit | Zhang Huan
There is a potential memory leak of the journal head in function jbd2_journal_commit_transaction. The problem is that JBD2 will not reclaim the journal head of the commit record if an error occurs or the journal is aborted.
I use the following script to reproduce this issue, on a RHEL6 system. I found it very easy to reproduce with async commit enabled.
    mount /dev/sdb /mnt -o journal_checksum,journal_async_commit
    touch /mnt/xxx
    echo offline > /sys/block/sdb/device/state
    sync
    umount /mnt
    rmmod ext4
    rmmod jbd2
Removal of the jbd2 module makes slab complain that "cache `jbd2_journal_head': can't free all objects".
Signed-off-by: Zhang Huan <zhhuan@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-05 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block:
  ide: always ensure that blk_delay_queue() is called if we have pending IO
  block: fix request sorting at unplug
  dm: improve block integrity support
  fs: export empty_aops
  ide: ide_requeue_and_plug() reinstate "always plug" behaviour
  blk-throttle: don't call xchg on bool
  ufs: remove unessecary blk_flush_plug
  block: make the flush insertion use the tail of the dispatch list
  block: get rid of elv_insert() interface
  block: dump request state on seeing a corrupted request completion
2011-04-05 | inotify: fix double free/corruption of stuct user | Eric Paris
On an error path in inotify_init1 a normal user can trigger a double free of struct user. This is a regression introduced by a2ae4cc9a16e ("inotify: stop kernel memory leak on file creation failure").
We fix this by making sure that if a group exists the user reference is dropped when the group is cleaned up. We should not explicitly drop the reference on error and also drop the reference when the group is cleaned up.
The new lifetime rules are that an inotify group lives from inotify_new_group to the last fsnotify_put_group. Since the struct user and inotify_devs are directly tied to this lifetime they are only changed/updated in those two locations. We get rid of all special casing of struct user or user->inotify_devs.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: stable@kernel.org (2.6.37 and up)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
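The lifetime rule can be sketched with a plain reference count: the group takes the user reference when it is created and drops it in exactly one place, when the group is released. The types and functions below are hypothetical user-space stand-ins, not the fsnotify API.

```c
#include <stdlib.h>

struct sketch_user {
    int refs;
};

struct sketch_group {
    struct sketch_user *user;
};

/* Creating the group takes one reference on the user. */
struct sketch_group *sketch_new_group(struct sketch_user *u)
{
    struct sketch_group *g = malloc(sizeof(*g));

    if (!g)
        return NULL;
    u->refs++;
    g->user = u;
    return g;
}

/* The only place where the group's user reference is dropped. */
void sketch_put_group(struct sketch_group *g)
{
    g->user->refs--;
    free(g);
}

/* A late failure after the group exists: release the group and nothing else. */
int sketch_init_fails_late(struct sketch_user *u)
{
    struct sketch_group *g = sketch_new_group(u);

    if (!g)
        return -1;
    sketch_put_group(g);    /* no additional explicit drop of the user reference */
    return -1;
}
```

If the error path also decremented `u->refs` itself, the count would fall one below where it started, which is the double drop the message describes.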
2011-04-05 | fs: export empty_aops | Jens Axboe
With the ->sync_page() hook gone, we have a few users that add their own static address_space_operations without any functions defined.
fs/inode.c already has an empty_aops that it uses for init purposes. Let's export that and use it in the places where an otherwise empty aops was defined.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-05 | ufs: remove unessecary blk_flush_plug | Christoph Hellwig
We already flush the per-process plugging list when context switching, so a blk_flush_plug call just before a yield() is not needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-05 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: don't warn in btrfs_add_orphan
  Btrfs: fix free space cache when there are pinned extents and clusters V2
  Btrfs: Fix uninitialized root flags for subvolumes
  btrfs: clear __GFP_FS flag in the space cache inode
  Btrfs: fix memory leak in start_transaction()
  Btrfs: fix memory leak in btrfs_ioctl_start_sync()
  Btrfs: fix subvol_sem leak in btrfs_rename()
  Btrfs: Fix oops for defrag with compression turned on
  Btrfs: fix /proc/mounts info.
  Btrfs: fix compiler warning in file.c
2011-04-05 | UBIFS: fix debugging failure in dbg_check_space_info | Artem Bityutskiy
This patch fixes a debugging failure which looks like this:
    UBIFS error (pid 32313): dbg_check_space_info: free space changed from 6019344 to 6022654
The reason for this failure is described in the comment this patch adds to the code. But in short - 'c->freeable_cnt' may be different before and after re-mounting, and this is normal. So the debugging code should make sure that free space calculations do not depend on 'c->freeable_cnt'.
A similar issue has been reported here: http://lists.infradead.org/pipermail/linux-mtd/2011-April/034647.html
This patch should fix it.
For the -stable guys: this patch is only relevant for kernels 2.6.30 onwards.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org [2.6.30+]
2011-04-05 | UBIFS: fix error path in dbg_debugfs_init_fs | Artem Bityutskiy
The debug interface is substandard and on error returns either NULL or an error code packed in the pointer. So using "IS_ERR" for the pointers returned by debugfs functions is incorrect. Instead, we should use IS_ERR_OR_NULL.
This patch is an improved version of the original patch from Phil Carmody.
Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
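Why IS_ERR alone misses the NULL case can be seen from a user-space re-creation of the error-pointer convention. The helpers below carry a sketch_ prefix to make clear they are illustrative re-implementations, not the kernel's err.h; they assume the usual convention that the top 4095 address values encode negative errno codes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Pack a negative errno into a pointer value. */
void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True only for pointers in the reserved error range; NULL is NOT an error here. */
bool sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* True for both failure styles: NULL and a packed error code. */
bool sketch_is_err_or_null(const void *p)
{
    return p == NULL || sketch_is_err(p);
}
```

An interface that may return either NULL or a packed error therefore needs the combined check; `sketch_is_err(NULL)` is false.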
2011-04-05 | UBIFS: unify error path dbg_debugfs_init_fs | Artem Bityutskiy
This is just a small clean-up patch which simplifies and unifies the error path in dbg_debugfs_init_fs(). We have a common error path for all failure cases in this function except the very first case. This patch makes the first failure case use the same error path as the other cases by using the 'fname' and 'dent' variables.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
2011-04-05 | UBIFS: do not select KALLSYMS_ALL | Artem Bityutskiy
All UBIFS needs is to make sure we get stacktraces when UBIFS debugging is enabled. It is enough to select KALLSYMS for this, KALLSYMS_ALL is not necessary.
Moreover, Randy Dunlap reported that UBIFS causes the following Kconfig dependency warning:
    warning: (UBIFS_FS_DEBUG && LOCKDEP && LATENCYTOP) selects KALLSYMS_ALL which has unmet direct dependencies (DEBUG_KERNEL && KALLSYMS)
The reason is that KALLSYMS_ALL requires DEBUG_KERNEL and KALLSYMS, so ideally, to select KALLSYMS_ALL we'd need to select DEBUG_KERNEL and KALLSYMS first. This seems to be too much to select. The easiest way to go is to forget about KALLSYMS_ALL and just select KALLSYMS when UBIFS debugging is enabled - that should be enough for stackdumps.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
2011-04-05 | UBIFS: fix assertion warnings | Artem Bityutskiy
This patch fixes UBIFS assertion warnings like:
    UBIFS assert failed in ubifs_leb_unmap at 135 (pid 29365)
    Pid: 29365, comm: integck Tainted: G I 2.6.37-ubi-2.6+ #34
    Call Trace:
     [<ffffffffa047c663>] ubifs_lpt_init+0x95e/0x9ee [ubifs]
     [<ffffffffa04623a7>] ubifs_remount_fs+0x2c7/0x762 [ubifs]
     [<ffffffff810f066e>] do_remount_sb+0xb6/0x101
     [<ffffffff81106ff4>] ? do_mount+0x191/0x78e
     [<ffffffff811070bb>] do_mount+0x258/0x78e
     [<ffffffff810da1e8>] ? alloc_pages_current+0xa2/0xc5
     [<ffffffff81107674>] sys_mount+0x83/0xbd
     [<ffffffff81009a12>] system_call_fastpath+0x16/0x1b
They happen when we re-mount from R/O mode to R/W mode. While re-mounting, we write to the media, but we still have the c->ro_mount flag set. The fix is very simple - just clear the flag before starting re-mounting R/W.
These warnings are caused by the following commit: 2ef13294d29bcfb306e0d360f1b97f37b647b0c0
For -stable guys: this bug was introduced in 2.6.38, this is material for 2.6.38-stable.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org [2.6.38]
2011-04-05 | UBIFS: fix oops on error path in read_pnode | Artem Bityutskiy
Thanks to coverity which spotted that UBIFS will oops if 'kmalloc()' in 'read_pnode()' fails and we dereference a NULL 'pnode' pointer when we 'goto out'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
2011-04-05 | UBIFS: do not read flash unnecessarily | Artem Bityutskiy
This fix makes the 'dbg_check_old_index()' function return immediately if debugging is disabled, instead of executing incorrect 'goto out' which causes UBIFS to:
  1. Allocate memory
  2. Read the flash
On every commit. OK, we do not commit that often, but it is still silly to do unneeded I/O anyway.
Credits to coverity for spotting this silly issue.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
2011-04-05 | Btrfs: don't warn in btrfs_add_orphan | Josef Bacik
When I moved the orphan adding to btrfs_truncate I missed the fact that during orphan cleanup we just add the orphan items to the orphan list without going through btrfs_orphan_add, which results in lots of warnings on mount if you have any orphan items that need to be truncated. Just remove this warning since it's ok, this will allow all of the normal space accounting to take place. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-05 | Btrfs: fix free space cache when there are pinned extents and clusters V2 | Josef Bacik
I noticed a huge problem with the free space cache that was presenting as an early ENOSPC. Turns out when writing the free space cache out I forgot to take into account pinned extents and more importantly clusters. This would result in us leaking free space every time we unmounted the filesystem and remounted it.
I fix this by making sure to check and see if the current block group has a cluster and writing out any entries that are in the cluster to the cache, as well as writing any pinned extents we currently have to the cache since those will be available for us to use the next time the fs mounts.
This patch also adds a check to the end of load_free_space_cache to make sure we got the right amount of free space cache, and if not make sure to clear the cache and re-cache the old fashioned way.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-05 | Btrfs: Fix uninitialized root flags for subvolumes | Li Zefan
root_item->flags and root_item->byte_limit are not initialized when a subvolume is created. This bug is not revealed until we added readonly snapshot support - now you mount a btrfs filesystem and you may find the subvolumes in it are readonly.
To work around this problem, we steal a bit from root_item->inode_item->flags, and use it to indicate if those fields have been properly initialized. When we read a tree root from disk, we check if the bit is set, and if not we'll set the flag and initialize the two fields of the root item.
Reported-by: Andreas Philipp <philipp.andreas@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Andreas Philipp <philipp.andreas@gmail.com>
cc: stable@kernel.org
Signed-off-by: Chris Mason <chris.mason@oracle.com>
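The workaround in the second paragraph is a "valid" marker bit checked at read time. A minimal sketch follows; the struct layout, the bit position, and all names are hypothetical simplifications rather than the Btrfs on-disk format.

```c
#include <stdint.h>

#define SKETCH_ROOT_ITEM_INIT (1ULL << 31)   /* hypothetical marker bit */

struct sketch_root_item {
    uint64_t inode_flags;   /* flags of the embedded inode item */
    uint64_t flags;         /* may hold garbage on old subvolumes */
    uint64_t byte_limit;    /* may hold garbage on old subvolumes */
};

/* Called after reading a root item from disk. */
void sketch_fixup_root_item(struct sketch_root_item *ri)
{
    if (ri->inode_flags & SKETCH_ROOT_ITEM_INIT)
        return;                         /* fields were written by fixed code */

    ri->inode_flags |= SKETCH_ROOT_ITEM_INIT;
    ri->flags = 0;                      /* discard the uninitialized values */
    ri->byte_limit = 0;
}
```

An item whose marker bit is already set keeps its `flags` value, so a deliberately read-only snapshot stays read-only.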
2011-04-05 | btrfs: clear __GFP_FS flag in the space cache inode | Miao Xie
The object id of the space cache inode's key is allocated from the relative root, just like the regular file. So we can't identify the space cache inode by checking the object id of the inode's key, and we have to clear the __GFP_FS flag at the time we look up the space cache inode.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-05 | Btrfs: fix memory leak in start_transaction() | Yoshinori Sano
Free btrfs_trans_handle when join_transaction() fails in start_transaction()
Signed-off-by: Yoshinori Sano <yoshinori.sano@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-05 | Btrfs: fix memory leak in btrfs_ioctl_start_sync() | Tsutomu Itoh
Call btrfs_end_transaction() if btrfs_commit_transaction_async() fails.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-05 | Btrfs: fix subvol_sem leak in btrfs_rename() | Johann Lombardi
btrfs_rename() does not release the subvol_sem if the transaction failed to start.
Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-05 | Btrfs: Fix oops for defrag with compression turned on | Li Zefan
When we defrag a file whose size fits into an inline extent, with compression enabled, the compress type is set to fs_info->compress_type, which is 0 if the btrfs filesystem is mounted without the compress option. This leads to an oops.
Reported-by: Daniel Blueman <daniel.blueman@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-04-05 | Btrfs: fix /proc/mounts info. | Tsutomu Itoh
Some mount options are not displayed by /proc/mounts. This patch displays options such as compress_type in /proc/mounts.
Ex.
[before]
    $ mount | grep sdc2
    /dev/sdc2 on /test12 type btrfs (rw,space_cache,compress=lzo)
    $ cat /proc/mounts | grep sdc2
    /dev/sdc2 /test12 btrfs rw,relatime,compress 0 0
[after]
    $ mount | grep sdc2
    /dev/sdc2 on /test12 type btrfs (rw,space_cache,compress=lzo)
    $ cat /proc/mounts | grep sdc2
    /dev/sdc2 /test12 btrfs rw,relatime,compress=lzo,space_cache 0 0
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>