path: root/fs
Age | Commit message | Author
2012-03-06 | nfsd41: refactor nfs4_open_deleg_none_ext logic out of nfs4_open_delegation | Benny Halevy
When a 4.1 client asks for a delegation and the server returns none, op_delegate_type is set to NFS4_OPEN_DELEGATE_NONE_EXT and op_why_no_deleg is set to either WND4_CONTENTION or WND4_RESOURCE. Or, if the client sent a NFS4_SHARE_WANT_CANCEL (which it is not supposed to ever do until our server supports delegation signaling), op_why_no_deleg is set to WND4_CANCELLED. Note that for WND4_CONTENTION and WND4_RESOURCE, the xdr layer is hard-coded at this time to encode boolean FALSE for ond_server_will_push_deleg / ond_server_will_signal_avail. Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-06 | nfsd4: fix recovery-entry leak nfsd startup failure | J. Bruce Fields
Another leak on error Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-06 | nfsd4: fix recovery-dir leak on nfsd startup failure | Jeff Layton
The current code never calls nfsd4_shutdown_recdir if nfs4_state_start returns an error. Also, it's better to go ahead and consolidate these functions since one is just a trivial wrapper around the other. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-06 | nfsd4: purge stable client records with insufficient state | J. Bruce Fields
To escape having your stable storage record purged at the end of the grace period, it's not sufficient to simply have performed a setclientid_confirm; you also need to meet the same requirements as someone creating a new record: either you should have done an open or open reclaim (in the 4.0 case) or a reclaim_complete (in the 4.1 case). Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-06 | nfsd4: don't set cl_firststate on first reclaim in 4.1 case | J. Bruce Fields
We set cl_firststate when we first decide that a client will be permitted to reclaim state on next boot. This happens:
- for new 4.0 clients, when they confirm their first open
- for returning 4.0 clients, when they reclaim their first open
- for 4.1+ clients, when they perform reclaim_complete
We also use cl_firststate to decide whether a reclaim_complete has already been performed, in the 4.1+ case. We were setting it on 4.1 open reclaims, which caused spurious COMPLETE_ALREADY errors on RECLAIM_COMPLETE from an nfs4.1 client with anything to reclaim. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
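The cl_firststate rules just listed, together with the purge rule in the commit above them, boil down to one predicate: has this client earned a stable-storage record yet? The sketch below states that predicate in C under simplifying assumptions; earns_stable_record() and its boolean inputs are hypothetical and do not reflect how nfsd actually tracks this state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: true once a client has done enough to create
 * or keep its stable-storage record. Records of clients for which this
 * is still false at the end of the grace period get purged; a bare
 * SETCLIENTID_CONFIRM never makes it true. */
static bool earns_stable_record(unsigned minorversion,
				bool confirmed_first_open,
				bool reclaimed_open,
				bool did_reclaim_complete)
{
	if (minorversion == 0)
		/* 4.0: a new client's confirmed open, or a returning
		 * client's open reclaim. */
		return confirmed_first_open || reclaimed_open;

	/* 4.1+: only RECLAIM_COMPLETE counts. Treating an open reclaim
	 * as sufficient would make a later RECLAIM_COMPLETE look like a
	 * repeat (the spurious COMPLETE_ALREADY described above). */
	return did_reclaim_complete;
}
```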
2012-02-17 | nfsd41: implement NFS4_SHARE_WANT_NO_DELEG, NFS4_OPEN_DELEGATE_NONE_EXT, why_no_deleg | Benny Halevy
Respect client request for not getting a delegation in NFSv4.1. Appropriately return delegation "type" NFS4_OPEN_DELEGATE_NONE_EXT and WND4_NOT_WANTED reason. [nfsd41: add missing break when encoding op_why_no_deleg] Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
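The two delegation-refusal commits above pick a why_no_deleg reason to return alongside NFS4_OPEN_DELEGATE_NONE_EXT. Here is a sketch of that decision as a plain C function. It assumes the why_no_delegation4 values from RFC 5661's OPEN XDR (only the ones used are listed). pick_why_no_deleg() and its boolean inputs are hypothetical simplifications, not nfsd's nfs4_open_delegation() logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of why_no_delegation4 from RFC 5661, reproduced for
 * illustration; verify the values against the spec before reuse. */
enum why_no_delegation4 {
	WND4_NOT_WANTED = 0,
	WND4_CONTENTION = 1,
	WND4_RESOURCE = 2,
	WND4_CANCELLED = 7,
};

/* Hypothetical helper: explicit client requests win; otherwise a
 * conflicting open is reported as contention and anything else as a
 * resource shortage. */
static enum why_no_delegation4
pick_why_no_deleg(bool client_wants_none, bool client_cancelled,
		  bool conflicting_open)
{
	if (client_wants_none)
		return WND4_NOT_WANTED;
	if (client_cancelled)
		return WND4_CANCELLED;
	if (conflicting_open)
		return WND4_CONTENTION;
	return WND4_RESOURCE;
}
```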
2012-02-17 | NFSD: Clean up the test_stateid function | Bryan Schumaker
When I initially wrote it, I didn't understand how lists worked so I wrote something that didn't use them. I think making a list of stateids to test is a more straightforward implementation, especially compared to decoding stateids while simultaneously encoding a reply to the client. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-17 | lockd: fix arg parsing for grace_period and timeout. | NeilBrown
If you try to set grace_period or timeout via a module parameter to lockd, and do this on a big-endian machine where sizeof(int) != sizeof(unsigned long), it won't work. The number given will be effectively shifted right by the difference in those two sizes. So cast kp->arg properly to get the correct result. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
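The fix amounts to storing the parsed number through a pointer of the parameter's declared type instead of through an int pointer. The sketch below generates such a setter with a macro. struct param_stub, DEFINE_PARAM_SETTER and the 1..240 bounds are hypothetical stand-ins, not lockd's actual definitions. The backing variable starts as all-ones so that an int-sized store would be visible even on a little-endian 64-bit host, where it would leave the upper half untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kernel_param: arg points at the
 * variable backing the module parameter. */
struct param_stub {
	void *arg;
};

/* Generate a setter that parses val and stores it through a pointer of
 * the parameter's declared type, so the whole variable is written no
 * matter how sizeof(type) compares to sizeof(int). */
#define DEFINE_PARAM_SETTER(name, type, min, max)			\
static int param_set_##name(const char *val, struct param_stub *kp)	\
{									\
	char *endp;							\
	unsigned long num = strtoul(val, &endp, 0);			\
	if (endp == val || *endp || num < (min) || num > (max))		\
		return -EINVAL;						\
	*((type *) kp->arg) = (type) num;				\
	return 0;							\
}

/* Arbitrary bounds for illustration only. */
DEFINE_PARAM_SETTER(grace_period, unsigned long, 1, 240)

/* Demo backing variable: all-ones, so a partial store would show. */
static unsigned long demo_grace_period = ~0UL;
static struct param_stub demo_kp = { &demo_grace_period };
```

Rejected input leaves the previous value in place, which matches the usual module-parameter convention of returning -EINVAL without side effects.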
2012-02-17 | nfsd41: split out share_access want and signal flags while decoding | Benny Halevy
Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-17 | nfsd41: share_access_to_flags should consider only nfs4.x share_access flags | Benny Halevy
Currently, it will not correctly ignore any nfsv4.1 signal flags if the client sends them. Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
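The two share_access commits above separate three independent fields that share one 32-bit OPEN argument: the access mode, the delegation "want", and the 4.1 signal flags. Here is a sketch of that split. It assumes the OPEN4_SHARE_ACCESS_* values from RFC 5661, renamed SHARE_ACCESS_* here. struct share_access_parts and split_share_access() are hypothetical. Masking each field on its own is what keeps a 4.1 signal bit from being mistaken for an access mode.

```c
#include <assert.h>
#include <stdint.h>

/* OPEN4_SHARE_ACCESS_* values as given in RFC 5661, reproduced for
 * illustration; check the RFC before relying on them. */
#define SHARE_ACCESS_READ			0x00000001u
#define SHARE_ACCESS_WRITE			0x00000002u
#define SHARE_ACCESS_BOTH			0x00000003u
#define SHARE_ACCESS_WANT_DELEG_MASK		0x0000ff00u
#define SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL	0x00010000u
#define SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED	0x00020000u

/* Hypothetical decoded form of a share_access word. */
struct share_access_parts {
	uint32_t access;	/* READ, WRITE or BOTH */
	uint32_t want;		/* delegation preference, e.g. NO_DELEG */
	uint32_t signal;	/* 4.1 signalling flags */
};

/* Split the raw word so later code only ever sees the bits it owns. */
static struct share_access_parts split_share_access(uint32_t raw)
{
	struct share_access_parts p;

	p.access = raw & SHARE_ACCESS_BOTH;
	p.want = raw & SHARE_ACCESS_WANT_DELEG_MASK;
	p.signal = raw & (SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL |
			  SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED);
	return p;
}
```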
2012-02-15 | nfsd41: use current stateid by value | Tigran Mkrtchyan
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 | nfsd41: consume current stateid on DELEGRETURN and OPENDOWNGRADE | Tigran Mkrtchyan
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 | nfsd41: handle current stateid in SETATTR and FREE_STATEID | Tigran Mkrtchyan
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 | nfsd41: mark LOOKUP, LOOKUPP and CREATE to invalidate current stateid | Tigran Mkrtchyan
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 | nfsd41: save and restore current stateid with current fh | Tigran Mkrtchyan
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 | nfsd41: mark PUTFH, PUTPUBFH and PUTROOTFH to clear current stateid | Tigran Mkrtchyan
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 | nfsd41: consume current stateid on read and write | Tigran Mkrtchyan
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 | nfsd41: handle current stateid on lock and locku | Tigran Mkrtchyan
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 | nfsd41: handle current stateid in open and close | Tigran Mkrtchyan
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-15 | nfsd4: initialize current stateid at compile time | Tigran Mkrtchyan
Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-14 | nfsd4: check for uninitialized slot | J. Bruce Fields
This fixes an oops when a buggy client tries to use an initial seqid of 0 on a new slot, which we may misinterpret as a replay. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-14 | nfsd4: rearrange struct nfsd4_slot | J. Bruce Fields
Combine two booleans into a single flag field, move the smaller fields to the end. (In practice this doesn't make the struct any smaller. But we'll be adding another flag here soon.) Remove some debugging code that doesn't look useful, while we're in the neighborhood. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-13 | nfsd4: fix sessions slotid wraparound logic | J. Bruce Fields
From RFC 5661 2.10.6.1: "If the previous sequence ID was 0xFFFFFFFF, then the next request for the slot MUST have the sequence ID set to zero." While we're there, delete some redundant comments. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
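The three slot commits above (the uninitialized-slot check, the flag field, and the wraparound fix) combine into a single sequence-ID check. This is a minimal sketch with hypothetical names (struct slot, SLOT_INUSE, SLOT_INITIALIZED, slot_seqid_verdict()), loosely modeled on the flag layout described above, not nfsd's code. Unsigned 32-bit arithmetic gives the RFC 5661 2.10.6.1 wraparound for free. The INITIALIZED flag stops a fresh slot's seqid 0 from being treated as a replay of a reply that was never cached.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slot state with booleans folded into one flag field. */
#define SLOT_INUSE		(1u << 0)	/* request in progress */
#define SLOT_INITIALIZED	(1u << 1)	/* a reply has been cached */

struct slot {
	uint32_t seqid;		/* last sequence ID seen on this slot */
	unsigned flags;		/* SLOT_* bits */
};

enum seq_verdict { SEQ_NEW, SEQ_REPLAY, SEQ_BUSY, SEQ_MISORDERED };

static enum seq_verdict slot_seqid_verdict(const struct slot *s,
					   uint32_t seqid)
{
	/* Slot busy: a retransmit of the in-flight request, or garbage. */
	if (s->flags & SLOT_INUSE)
		return seqid == s->seqid ? SEQ_BUSY : SEQ_MISORDERED;

	/* uint32_t arithmetic wraps: after 0xFFFFFFFF comes 0. */
	if (seqid == (uint32_t)(s->seqid + 1))
		return SEQ_NEW;

	/* Only a slot with a cached reply can answer a replay. */
	if (seqid == s->seqid && (s->flags & SLOT_INITIALIZED))
		return SEQ_REPLAY;

	return SEQ_MISORDERED;
}
```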
2012-02-03 | nfsd: fix default iosize calculation on 32bit | J. Bruce Fields
The rpc buffers will be allocated out of low memory, so we should really only be taking that into account. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-03 | nfsd: cleanup setting of default max_block_size | J. Bruce Fields
Move calculation of the default into a helper function. Get rid of an unused variable "err" while we're there. Thanks to Mi Jinlong for catching an arithmetic error in a previous version. Cc: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
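The two default-size commits above derive the default block size from low memory only. Here is a sketch of such a helper. It assumes a hypothetical default_max_blksize() that targets about 1/4096 of low memory, clamped between 8 KiB and 1 MiB; those numbers are assumptions for this sketch, not necessarily nfsd's constants. On a 32-bit box with 4 GiB of RAM but only 896 MiB of low memory, counting all RAM would pick 1 MiB, whereas counting low memory alone picks 128 KiB.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative limits: 1 MiB ceiling, 8 KiB floor. */
#define MAX_BLKSIZE (1024u * 1024u)
#define MIN_BLKSIZE (8u * 1024u)

/* Hypothetical helper: size the default from low memory only, since
 * RPC buffers are allocated from low memory on 32-bit machines. */
static uint32_t default_max_blksize(uint64_t total_ram, uint64_t high_mem)
{
	assert(high_mem <= total_ram);

	uint64_t target = (total_ram - high_mem) >> 12;	/* lowmem / 4096 */
	uint32_t ret = MAX_BLKSIZE;

	/* Halve until the size fits the target, never below the floor. */
	while (ret > target && ret >= 2 * MIN_BLKSIZE)
		ret /= 2;
	return ret;
}
```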
2012-02-03 | nfsd: remove some unneeded checks | Dan Carpenter
We check for zero length strings in the caller now, so these aren't needed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-01-31 | Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream | Linus Torvalds
There are a few important bug fixes for LogFS.
* tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream:
  Logfs: Allow NULL block_isbad() methods
  logfs: Grow inode in delete path
  logfs: Free areas before calling generic_shutdown_super()
  logfs: remove useless BUG_ON
  MAINTAINERS: Add Prasad Joshi in LogFS maintiners
  logfs: Propagate page parameter to __logfs_write_inode
  logfs: set superblock shutdown flag after generic sb shutdown
  logfs: take write mutex lock during fsync and sync
  logfs: Prevent memory corruption
  logfs: update page reference count for pined pages
Fix up conflict in fs/logfs/dev_mtd.c due to semantic change in what "mtd->block_isbad" means in commit f2933e86ad93: "Logfs: Allow NULL block_isbad() methods" clashing with the abstraction changes in the commits 7086c19d0742: "mtd: introduce mtd_block_isbad interface" and d58b27ed58a3: "logfs: do not use 'mtd->block_isbad' directly". This resolution takes the semantics from commit f2933e86ad93, and just makes mtd_block_isbad() return zero (false) if the 'block_isbad' function is NULL. But that also means that now "mtd_can_have_bb()" always returns 0. Now, "mtd_block_markbad()" will obviously return an error if the low-level driver doesn't support bad blocks, so this is somewhat non-symmetric, but it actually makes sense if a NULL "block_isbad" function is considered to mean "I assume that all my blocks are always good".
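The merge resolution's rule, that a missing block_isbad method means "every block is good", is a NULL-check wrapper around an optional callback. This sketch uses hypothetical types and names (struct flash_dev, dev_block_isbad(), demo_isbad()). The real wrapper is mtd_block_isbad() operating on struct mtd_info, whose exact definition is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down driver descriptor: only the optional
 * bad-block hook matters for this example. */
struct flash_dev {
	int (*block_isbad)(struct flash_dev *dev, long long ofs);
};

/* A driver without a block_isbad method is taken to mean
 * "all my blocks are good", so report 0 (not bad). */
static int dev_block_isbad(struct flash_dev *dev, long long ofs)
{
	if (dev->block_isbad == NULL)
		return 0;
	return dev->block_isbad(dev, ofs);
}

/* Example hook for the demo: every block at or past 1 MiB is bad. */
static int demo_isbad(struct flash_dev *dev, long long ofs)
{
	(void)dev;
	return ofs >= (1LL << 20);
}
```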
2012-01-28 | Merge tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core | Linus Torvalds
Here are some patches for the 3.3-rc1 tree. It contains the removal of the sysdev code, now that all users of it are gone, as well as some sysfs bugfixes that have been reported by users. There are also some documentation updates here.
* tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  sysfs: Complain bitterly about attempts to remove files from nonexistent directories.
  stable: update documentation to ask for kernel version
  base/core.c:fix typo in comment in function device_add
  Documentation: devres: add allocation functions to list of supported calls
  Documentation update for the driver model core
  kernel-doc: fix new warnings in driver-core
  kernel-doc: fix new warnings in debugfs
  kernel-doc: fix new warnings in device.h
  driver core: remove drivers/base/sys.c and include/linux/sysdev.h
2012-01-28 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix reservations in btrfs_page_mkwrite
  Btrfs: advance window_start if we're using a bitmap
  btrfs: mask out gfp flags in releasepage
  Btrfs: fix enospc error caused by wrong checks of the chunk
  Btrfs: do not defrag a file partially
  Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c
  Btrfs: use cluster->window_start when allocating from a cluster bitmap
  Btrfs: Check for NULL page in extent_range_uptodate
  btrfs: Fix busyloops in transaction waiting code
  Btrfs: make sure a bitmap has enough bytes
  Btrfs: fix uninit warning in backref.c
2012-01-28 | Logfs: Allow NULL block_isbad() methods | Joern Engel
Not all mtd drivers define block_isbad(). Let's assume no bad blocks instead of refusing to mount. Signed-off-by: Joern Engel <joern@logfs.org>
2012-01-28 | logfs: Grow inode in delete path | Joern Engel
Can be necessary if an inode gets deleted (through -ENOSPC) before being written. Might be better to move this into logfs_write_rec(), but for now go with the stupid&safe patch. Signed-off-by: Joern Engel <joern@logfs.org>
2012-01-28 | logfs: Free areas before calling generic_shutdown_super() | Joern Engel
Or hit an assertion in map_invalidatepage() instead. Signed-off-by: Joern Engel <joern@logfs.org>
2012-01-28 | logfs: remove useless BUG_ON | Joern Engel
It prevents write sizes >4k. Signed-off-by: Joern Engel <joern@logfs.org>
2012-01-28 | logfs: Propagate page parameter to __logfs_write_inode | Prasad Joshi
During GC, LogFS has to rewrite each valid block to a separate segment. The rewrite operation reads data from an old segment and writes it to a newly allocated segment. Since every write operation changes the data block pointers maintained in the inode, the inode should also be rewritten. In the GC path, to avoid an AB-BA deadlock, LogFS marks a page with PG_pre_locked in addition to locking the page (PG_locked). The page lock is ignored iff the page is pre-locked. LogFS uses a special file called the segment file. The segment file maintains an 8-byte entry for every segment. It keeps track of erase count, level etc. for every segment. Bad things happen when a segment belonging to the segment file is GCed:
------------[ cut here ]------------
kernel BUG at /home/prasad/logfs/readwrite.c:297!
invalid opcode: 0000 [#1] SMP
Modules linked in: logfs joydev usbhid hid psmouse e1000 i2c_piix4 serio_raw [last unloaded: logfs]
Pid: 20161, comm: mount Not tainted 3.1.0-rc3+ #3 innotek GmbH VirtualBox
EIP: 0060:[<f809132a>] EFLAGS: 00010292 CPU: 0
EIP is at logfs_lock_write_page+0x6a/0x70 [logfs]
EAX: 00000027 EBX: f73f5b20 ECX: c16007c8 EDX: 00000094
ESI: 00000000 EDI: e59be6e4 EBP: c7337b28 ESP: c7337b18
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process mount (pid: 20161, ti=c7336000 task=eb323f70 task.ti=c7336000)
Stack:
 f8099a3d c7337b24 f73f5b20 00001002 c7337b50 f8091f6d f8099a4d f80994e4
 00000003 00000000 c7337b68 00000000 c67e4400 00001000 c7337b80 f80935e5
 00000000 00000000 00000000 00000000 e1fcf000 0000000f e59be618 c70bf900
Call Trace:
 [<f8091f6d>] logfs_get_write_page.clone.16+0xdd/0x100 [logfs]
 [<f80935e5>] logfs_mod_segment_entry+0x55/0x110 [logfs]
 [<f809460d>] logfs_get_segment_entry+0x1d/0x20 [logfs]
 [<f8091060>] ? logfs_cleanup_journal+0x50/0x50 [logfs]
 [<f809521b>] ostore_get_erase_count+0x1b/0x40 [logfs]
 [<f80965b8>] logfs_open_area+0xc8/0x150 [logfs]
 [<c141a7ec>] ? kmemleak_alloc+0x2c/0x60
 [<f809668e>] __logfs_segment_write.clone.16+0x4e/0x1b0 [logfs]
 [<c10dd563>] ? mempool_kmalloc+0x13/0x20
 [<c10dd563>] ? mempool_kmalloc+0x13/0x20
 [<f809696f>] logfs_segment_write+0x17f/0x1d0 [logfs]
 [<f8092e8c>] logfs_write_i0+0x11c/0x180 [logfs]
 [<f8092f35>] logfs_write_direct+0x45/0x90 [logfs]
 [<f80934cd>] __logfs_write_buf+0xbd/0xf0 [logfs]
 [<c102900e>] ? kmap_atomic_prot+0x4e/0xe0
 [<f809424b>] logfs_write_buf+0x3b/0x60 [logfs]
 [<f80947a9>] __logfs_write_inode+0xa9/0x110 [logfs]
 [<f8094cb0>] logfs_rewrite_block+0xc0/0x110 [logfs]
 [<f8095300>] ? get_mapping_page+0x10/0x60 [logfs]
 [<f8095aa0>] ? logfs_load_object_aliases+0x2e0/0x2f0 [logfs]
 [<f808e57d>] logfs_gc_segment+0x2ad/0x310 [logfs]
 [<f808e62a>] __logfs_gc_once+0x4a/0x80 [logfs]
 [<f808ed43>] logfs_gc_pass+0x683/0x6a0 [logfs]
 [<f8097a89>] logfs_mount+0x5a9/0x680 [logfs]
 [<c1126b21>] mount_fs+0x21/0xd0
 [<c10f6f6f>] ? __alloc_percpu+0xf/0x20
 [<c113da41>] ? alloc_vfsmnt+0xb1/0x130
 [<c113db4b>] vfs_kern_mount+0x4b/0xa0
 [<c113e06e>] do_kern_mount+0x3e/0xe0
 [<c113f60d>] do_mount+0x34d/0x670
 [<c10f2749>] ? strndup_user+0x49/0x70
 [<c113fcab>] sys_mount+0x6b/0xa0
 [<c142d87c>] syscall_call+0x7/0xb
Code: f8 e8 8b 93 39 c9 8b 45 f8 3e 0f ba 28 00 19 d2 85 d2 74 ca eb d0 0f 0b 8d 45 fc 89 44 24 04 c7 04 24 3d 9a 09 f8 e8 09 92 39 c9 <0f> 0b 8d 74 26 00 55 89 e5 3e 8d 74 26 00 8b 10 80 e6 01 74 09
EIP: [<f809132a>] logfs_lock_write_page+0x6a/0x70 [logfs] SS:ESP 0068:c7337b18
---[ end trace 96e67d5b3aa3d6ca ]---
The patch passes the locked page to __logfs_write_inode, which calls logfs_get_wblocks() to pre-lock the page. This ensures any further attempts to lock the page are ignored (especially from get_erase_count). Acked-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-01-28 | logfs: set superblock shutdown flag after generic sb shutdown | Prasad Joshi
While unmounting the file system, LogFS calls generic_shutdown_super. This function performs the file-system-independent superblock shutdown. However, it might result in a call to file-system-specific inode eviction. LogFS marks the FS as shutting down by setting bit LOGFS_SB_FLAG_SHUTDOWN in super->s_flags. Since inode eviction might call truncate on the inode, the following BUG is observed when the file system is unmounted:
------------[ cut here ]------------
kernel BUG at /home/prasad/logfs/segment.c:362!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU 3
Modules linked in: logfs binfmt_misc ppdev virtio_blk parport_pc lp parport psmouse floppy virtio_pci serio_raw virtio_ring virtio
Pid: 1933, comm: umount Not tainted 3.0.0+ #4 Bochs Bochs
RIP: 0010:[<ffffffffa008c841>] [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs]
RSP: 0018:ffff880062d7b9e8 EFLAGS: 00010202
RAX: 000000000000000e RBX: ffff88006eca9000 RCX: 0000000000000000
RDX: ffff88006fd87c40 RSI: ffffea00014ff468 RDI: ffff88007b68e000
RBP: ffff880062d7ba48 R08: 8000000020451430 R09: 0000000000000000
R10: dead000000100100 R11: 0000000000000000 R12: ffff88006fd87c40
R13: ffffea00014ff468 R14: ffff88005ad0a460 R15: 0000000000000000
FS: 00007f25d50ea760(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000d05e48 CR3: 0000000062c72000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount (pid: 1933, threadinfo ffff880062d7a000, task ffff880070b44500)
Stack:
 ffff880062d7ba38 ffff88005ad0a508 0000000000001000 0000000000000000
 8000000020451430 ffffea00014ff468 ffff880062d7ba48 ffff88005ad0a460
 ffff880062d7bad8 ffffea00014ff468 ffff88006fd87c40 0000000000000000
Call Trace:
 [<ffffffffa0088fee>] logfs_write_i0+0x12e/0x190 [logfs]
 [<ffffffffa0089360>] __logfs_write_rec+0x140/0x220 [logfs]
 [<ffffffffa0089312>] __logfs_write_rec+0xf2/0x220 [logfs]
 [<ffffffffa00894a4>] logfs_write_rec+0x64/0xd0 [logfs]
 [<ffffffffa0089616>] __logfs_write_buf+0x106/0x110 [logfs]
 [<ffffffffa008a19e>] logfs_write_buf+0x4e/0x80 [logfs]
 [<ffffffffa008a6b8>] __logfs_write_inode+0x98/0x110 [logfs]
 [<ffffffffa008a7c4>] logfs_truncate+0x54/0x290 [logfs]
 [<ffffffffa008abfc>] logfs_evict_inode+0xdc/0x190 [logfs]
 [<ffffffff8115eef5>] evict+0x85/0x170
 [<ffffffff8115f126>] iput+0xe6/0x1b0
 [<ffffffff8115b4a8>] shrink_dcache_for_umount_subtree+0x218/0x280
 [<ffffffff8115ce91>] shrink_dcache_for_umount+0x51/0x90
 [<ffffffff8114796c>] generic_shutdown_super+0x2c/0x100
 [<ffffffffa008cc47>] logfs_kill_sb+0x57/0xf0 [logfs]
 [<ffffffff81147de5>] deactivate_locked_super+0x45/0x70
 [<ffffffff811487ea>] deactivate_super+0x4a/0x70
 [<ffffffff81163934>] mntput_no_expire+0xa4/0xf0
 [<ffffffff8116469f>] sys_umount+0x6f/0x380
 [<ffffffff814dd46b>] system_call_fastpath+0x16/0x1b
Code: 55 c8 49 8d b6 a8 00 00 00 45 89 f9 45 89 e8 4c 89 e1 4c 89 55 b8 c7 04 24 00 00 00 00 e8 68 fc ff ff 4c 8b 55 b8 e9 3c ff ff ff <0f> 0b 0f 0b c7 45 c0 00 00 00 00 e9 44 fe ff ff 66 66 66 66 66
RIP [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs]
 RSP <ffff880062d7b9e8>
---[ end trace fe6b040cea952290 ]---
Therefore, move the super->s_flags setting after the fs-independent work has been finished. Reviewed-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-01-28 | logfs: take write mutex lock during fsync and sync | Prasad Joshi
LogFS uses super->s_write_mutex while writing data to disk. Taking the same mutex in the sync and fsync code paths fixes the following BUG:
------------[ cut here ]------------
kernel BUG at /home/prasad/logfs/dev_bdev.c:134!
Pid: 2387, comm: flush-253:16 Not tainted 3.0.0+ #4 Bochs Bochs
RIP: 0010:[<ffffffffa007deed>] [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs]
Call Trace:
 [<ffffffffa007c381>] logfs_open_area+0x91/0x150 [logfs]
 [<ffffffff8128dcb2>] ? find_level.clone.9+0x62/0x100
 [<ffffffffa007c49c>] __logfs_segment_write.clone.20+0x5c/0x190 [logfs]
 [<ffffffff810ef005>] ? mempool_kmalloc+0x15/0x20
 [<ffffffff810ef383>] ? mempool_alloc+0x53/0x130
 [<ffffffffa007c7a4>] logfs_segment_write+0x1d4/0x230 [logfs]
 [<ffffffffa0078f8e>] logfs_write_i0+0x12e/0x190 [logfs]
 [<ffffffffa0079300>] __logfs_write_rec+0x140/0x220 [logfs]
 [<ffffffffa0079444>] logfs_write_rec+0x64/0xd0 [logfs]
 [<ffffffffa00795b6>] __logfs_write_buf+0x106/0x110 [logfs]
 [<ffffffffa007a13e>] logfs_write_buf+0x4e/0x80 [logfs]
 [<ffffffffa0073e33>] __logfs_writepage+0x23/0x80 [logfs]
 [<ffffffffa007410c>] logfs_writepage+0xdc/0x110 [logfs]
 [<ffffffff810f5ba7>] __writepage+0x17/0x40
 [<ffffffff810f6208>] write_cache_pages+0x208/0x4f0
 [<ffffffff810f5b90>] ? set_page_dirty+0x70/0x70
 [<ffffffff810f653a>] generic_writepages+0x4a/0x70
 [<ffffffff810f75d1>] do_writepages+0x21/0x40
 [<ffffffff8116b9d1>] writeback_single_inode+0x101/0x250
 [<ffffffff8116bdbd>] writeback_sb_inodes+0xed/0x1c0
 [<ffffffff8116c5fb>] writeback_inodes_wb+0x7b/0x1e0
 [<ffffffff8116cc23>] wb_writeback+0x4c3/0x530
 [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffff8116cd6b>] wb_do_writeback+0xdb/0x290
 [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffff814d6208>] ? _raw_spin_unlock_irqrestore+0x18/0x40
 [<ffffffff8105aa5a>] ? del_timer+0x8a/0x120
 [<ffffffff8116cfac>] bdi_writeback_thread+0x8c/0x2e0
 [<ffffffff8116cf20>] ? wb_do_writeback+0x290/0x290
 [<ffffffff8106d2e6>] kthread+0x96/0xa0
 [<ffffffff814de514>] kernel_thread_helper+0x4/0x10
 [<ffffffff8106d250>] ? kthread_worker_fn+0x190/0x190
 [<ffffffff814de510>] ? gs_change+0xb/0xb
RIP [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs]
---[ end trace 0211ad60a57657c4 ]---
Reviewed-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-01-28 | logfs: Prevent memory corruption | Joern Engel
This is a bad one. I wonder whether we were so far protected by no_free_segments(sb) usually being smaller than LOGFS_NO_AREAS. Found by Dan Carpenter <dan.carpenter@oracle.com> using smatch. Signed-off-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-01-28 | logfs: update page reference count for pined pages | Prasad Joshi
LogFS sets the PG_private flag to indicate a pinned page. We assumed that marking a page as private is enough to ensure its existence, but it is in fact necessary to hold a reference count on the page. The change resolves the following BUG:
BUG: Bad page state in process flush-253:16 pfn:6a6d0
page flags: 0x100000000000808(uptodate|private)
Suggested-and-Acked-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
2012-01-27 | Btrfs: fix reservations in btrfs_page_mkwrite | Chris Mason
Josef fixed btrfs_page_mkwrite to properly release reserved extents if there was an error. But if we fail to get a reservation and we fail to dirty the inode (for ENOSPC reasons), we'll end up trying to release a reservation we never had. This makes sure we only release if we were able to reserve. Signed-off-by: Chris Mason <chris.mason@oracle.com>
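This fix follows a common cleanup pattern: remember whether a reservation was actually taken, and on error undo only that. Here is a sketch with hypothetical accounting (struct space, reserve(), release(), mkwrite_like()). The later_step_fails flag stands in for any failure after the reservation, such as failing to dirty the inode; none of this is btrfs's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical space accounting. */
struct space {
	long free;
	long reserved;
};

static int reserve(struct space *s, long bytes)
{
	if (s->free < bytes)
		return -ENOSPC;
	s->free -= bytes;
	s->reserved += bytes;
	return 0;
}

static void release(struct space *s, long bytes)
{
	s->reserved -= bytes;
	s->free += bytes;
}

/* Reserve, run a later step that may fail, and on any error give back
 * only a reservation we actually hold. On success the reservation is
 * kept for the later write. */
static int mkwrite_like(struct space *s, long bytes, bool later_step_fails)
{
	bool reserved = false;
	int ret = reserve(s, bytes);

	if (ret == 0) {
		reserved = true;
		if (later_step_fails)
			ret = -EIO;
	}
	if (ret && reserved)
		release(s, bytes);
	return ret;
}

/* Demo state for the checks below. */
static struct space demo = { .free = 100, .reserved = 0 };
```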
2012-01-26 | Btrfs: advance window_start if we're using a bitmap | Josef Bacik
If we span a long area in a bitmap, we could end up taking a lot of time searching for the next free area if we're searching from the original window_start, so advance window_start to make sure we don't do any superfluous searching. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-26 | btrfs: mask out gfp flags in releasepage | David Sterba
btree_releasepage is a callback and can be passed unknown gfp flags, which may then end up in kmem_cache_alloc called from alloc_extent_state; the slab allocator will BUG_ON when the HIGHMEM or DMA32 flag is set. This may happen when btrfs is mounted from a loop device, which masks out the __GFP_IO flag. The check in try_release_extent_state
    if ((mask & GFP_NOFS) == GFP_NOFS)
        mask = GFP_NOFS;
will not work and passes unfiltered flags further, resulting in a crash at mm/slab.c:2963
 [<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8
 [<000000000024c810>] kmem_cache_alloc+0x204/0x294
 [<00000000001fd3c2>] mempool_alloc+0x52/0x170
 [<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs]
 [<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs]
 [<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs]
 [<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs]
 [<0000000000210d84>] shrink_page_list+0x6a0/0x724
 [<0000000000211394>] shrink_inactive_list+0x230/0x578
 [<0000000000211bb8>] shrink_list+0x6c/0x120
 [<0000000000211e4e>] shrink_zone+0x1e2/0x228
 [<0000000000211f24>] shrink_zones+0x90/0x254
 [<0000000000213410>] do_try_to_free_pages+0xac/0x420
 [<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0
 [<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8
 [<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8
Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
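The failure mode is plain bit arithmetic. The normalization only fires when every GFP_NOFS bit is present, so a caller that dropped __GFP_IO slips through with its zone bits intact. This sketch uses made-up flag values (F_*); F_NOFS = F_WAIT | F_IO mirrors how GFP_NOFS was composed in kernels of this era. safe_filter() illustrates the "mask out" idea from the subject line and is not the literal upstream diff.

```c
#include <assert.h>

/* Illustrative flag bits; the kernel's real __GFP_* values differ by
 * version and are not reproduced here. */
#define F_WAIT		0x01u
#define F_IO		0x02u
#define F_FS		0x04u
#define F_HIGHMEM	0x08u
#define F_DMA32		0x10u

#define F_NOFS		(F_WAIT | F_IO)	/* may sleep and do I/O, no FS recursion */

/* The check quoted above: it normalises the mask only when every NOFS
 * bit is present, and otherwise passes it through unchanged. */
static unsigned old_filter(unsigned mask)
{
	if ((mask & F_NOFS) == F_NOFS)
		mask = F_NOFS;
	return mask;
}

/* Also clear the zone modifiers the slab allocator rejects, whatever
 * else the caller did or did not set. */
static unsigned safe_filter(unsigned mask)
{
	return old_filter(mask) & ~(F_HIGHMEM | F_DMA32);
}
```

With the loop-device mask (F_IO cleared), old_filter() leaves F_HIGHMEM set, while safe_filter() strips it.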
2012-01-26 | Btrfs: fix enospc error caused by wrong checks of the chunk | Miao Xie
When we ran a sysbench test for inline files, the enospc error happened easily even though there was lots of free disk space which could be allocated for new chunks.
Reproduce steps:
 # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition>
 # mount <test partition> /mnt
 # ulimit -n 102400
 # cd /mnt
 # sysbench --num-threads=1 --test=fileio --file-num=81920 \
 > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
 > --file-test-mode=seqwr prepare
 # sysbench --num-threads=1 --test=fileio --file-num=81920 \
 > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
 > --file-test-mode=seqwr run
 <soon after, BUG_ON() was triggered by the enospc error>
The cause of this bug is as follows. We can now reserve more space than is free in the chunks if there is enough free disk space that can be used for new chunks. In that case, the space allocator should force allocation of a new chunk if there is no free space in the free space cache. But there are two wrong checks which break this operation. One is
    if (ret == -ENOSPC && num_bytes > min_alloc_size)
in btrfs_reserve_extent(). It is wrong: we should try to allocate a new chunk even if we fail to allocate free space by the minimum allocatable size. The other is
    if (space_info->force_alloc)
        force = space_info->force_alloc;
in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE if someone sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen. Fix these two wrong checks. In particular, we fix the second one by changing the values of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, making CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has higher priority. And if the value passed in by the caller is greater than ->force_alloc, use the passed value. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
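The second fix can be sketched as "the stronger request wins". The enum names come from the commit message. The numeric ordering (NO_FORCE < LIMITED < FORCE) and both helper functions are illustrative assumptions. old_effective_force() reproduces the quoted "if (space_info->force_alloc) force = space_info->force_alloc;" rule for comparison.

```c
#include <assert.h>

/* Ordering after the fix: a larger value is a stronger request.
 * Numeric values are illustrative. */
enum chunk_alloc {
	CHUNK_ALLOC_NO_FORCE = 0,
	CHUNK_ALLOC_LIMITED = 1,
	CHUNK_ALLOC_FORCE = 2,
};

/* New rule: take the stronger of the caller's request and the one
 * stored on the space info, so a stored LIMITED cannot downgrade a
 * caller's FORCE. */
static enum chunk_alloc effective_force(enum chunk_alloc passed,
					enum chunk_alloc stored)
{
	return passed > stored ? passed : stored;
}

/* Old rule quoted above: any nonzero stored value overrode the
 * caller's request. */
static enum chunk_alloc old_effective_force(enum chunk_alloc passed,
					    enum chunk_alloc stored)
{
	return stored ? stored : passed;
}
```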
2012-01-26 | Btrfs: do not defrag a file partially | Liu Bo
xfstests 218 complains that btrfs defrags a file partially:
 After: 1
 Write backwards sync, but contiguous - should defrag to 1 extent
 Before: 10
-After: 1
+After: 2
To fix this, we need to set the max_to_defrag count properly. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-26 | Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c | Stefan Behrens
There were 4 warnings on the 32-bit build; they are fixed herewith. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-26 | Btrfs: use cluster->window_start when allocating from a cluster bitmap | Josef Bacik
We specifically set window_start in the cluster struct to indicate where the cluster starts in a bitmap, but we've been using min_start to indicate where we're searching from. This is usually the start of the block group, which means we're constantly searching from the start of any bitmap we find, completely negating all the trouble we go to in order to set up a cluster. So start using window_start to make sure we actually use the area we found. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
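The window_start commits above are about where a bitmap search begins. Here is a generic sketch of a free-run search that takes an explicit start offset and a minimum run length. It assumes a hypothetical one-bit-per-unit bitmap and a find_free_run() helper; btrfs's free-space cache code is structured differently. Passing the cluster's window_start as start skips space already examined, and a large want mirrors requiring a bitmap entry to hold enough bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bitmap: bit i set means unit i is free. */
static bool bit_is_set(const unsigned char *map, size_t i)
{
	return map[i / 8] & (1u << (i % 8));
}

/* Return the first index >= start at which at least `want` consecutive
 * free units begin, or (size_t)-1 if there is none. */
static size_t find_free_run(const unsigned char *map, size_t nbits,
			    size_t start, size_t want)
{
	size_t run = 0;

	assert(want > 0);
	for (size_t i = start; i < nbits; i++) {
		if (bit_is_set(map, i)) {
			if (++run == want)
				return i + 1 - want;
		} else {
			run = 0;
		}
	}
	return (size_t)-1;
}
```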
2012-01-26 | Btrfs: Check for NULL page in extent_range_uptodate | Mitch Harder
A user has encountered a NULL pointer kernel oops in btrfs when encountering media errors. The problem has been identified as an unhandled NULL pointer returned from find_get_page(). This modification simply checks for a NULL page, and returns with an error if found (the extent_range_uptodate() function returns 1 on errors). After testing this patch, the user reported that the error with the NULL pointer oops was solved. However, there is still a remaining problem with a thread becoming stuck in wait_on_page_locked(page) in the read_extent_buffer_pages(...) function in extent_io.c:
    for (i = start_i; i < num_pages; i++) {
        page = extent_buffer_page(eb, i);
        wait_on_page_locked(page);
        if (!PageUptodate(page))
            ret = -EIO;
    }
This patch leaves the issue with the locked page yet to be resolved. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-26 | btrfs: Fix busyloops in transaction waiting code | Jan Kara
wait_log_commit() and wait_for_writer() were using slightly different conditions for deciding whether they should call schedule() and whether they should continue in the wait loop. Thus it could happen that we busylooped when the first condition was not true while the second one was. That is burning CPU cycles needlessly and is deadly on UP machines... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-26 | Btrfs: make sure a bitmap has enough bytes | Josef Bacik
We have only been checking for min_bytes available in bitmap entries, but we won't successfully set up a bitmap cluster unless it has at least 'bytes' in the bitmap. In the common case min_bytes is 4k and we want something like 2MB, so if there are a bunch of bitmap entries with less than 2MB in them, we'll search all of them anyway, which is suboptimal. Fix this check. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-26 | Btrfs: fix uninit warning in backref.c | Jan Schmidt
Added initialization with the declaration of ret. It isn't set later on the switch-default branch (which should never be taken). Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-25 | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs | Linus Torvalds
Quoth Ben Myers: "Please pull in the following bugfix for xfs. We forgot to drop a lock on error in xfs_readlink. It hasn't been through -next yet, but there is no -next tree tomorrow. The fix is clear so I'm sending this request today."
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()