summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2010-09-17nfs: standardize the rename args containerJeff Layton
Each NFS version has its own version of the rename args container. Standardize them on a common one that's identical to the one NFSv4 uses. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17ceph: select CRYPTOSage Weil
We select CRYPTO_AES, but not CRYPTO. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-17ceph: check mapping to determine if FILE_CACHE cap is usedSage Weil
See if the i_data mapping has any pages to determine if the FILE_CACHE capability is currently in use, instead of assuming it is any time the rdcache_gen value is set (i.e., issued -> used). This allows the MDS RECALL_STATE process work for inodes that have cached pages. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-17ceph: only send one flushsnap per cap_snap per mds sessionSage Weil
Sending multiple flushsnap messages is problematic because we ignore the response if the tid doesn't match, and the server may only respond to each one once. It's also a waste. So, skip cap_snaps that are already on the flushing list, unless the caller tells us to resend (because we are reconnecting). Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-17NFS: Add an 'open_context' element to struct nfs_rpc_opsTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17NFS: Clean up nfs4_proc_create()Trond Myklebust
Remove all remaining references to the struct nameidata from the low level NFS layers. Again pass down a partially initialised struct nfs_open_context when we want to do atomic open+create. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17NFSv4: Further cleanups for nfs4_open_revalidate()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17NFSv4: Clean up nfs4_open_revalidateTrond Myklebust
Remove references to 'struct nameidata' from the low-level open_revalidate code, and replace them with a struct nfs_open_context which will be correctly initialised upon success. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17NFSv4: Further minor cleanups for nfs4_atomic_open()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17NFSv4: Clean up nfs4_atomic_openTrond Myklebust
Start moving the 'struct nameidata' dependent code out of the lower level NFS code in preparation for the removal of open intents. Instead of the struct nameidata, we pass down a partially initialised struct nfs_open_context that will be fully initialised by the atomic open upon success. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17NFS: Allow NFSROOT debugging messages to be enabled dynamicallyChuck Lever
As a convenience, introduce a kernel command line option to enable NFSROOT debugging messages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17NFS: Clean up nfsroot.cChuck Lever
Clean up: now that mount option parsing for nfsroot is handled in fs/nfs/super.c, remove code in fs/nfs/nfsroot.c that is no longer used. This includes code that constructs the legacy nfs_mount_data structure, and code that does a MNT call to the server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17NFS: Use super.c for NFSROOT mount option parsingChuck Lever
Replace duplicate code in NFSROOT for mounting an NFS server on '/' with logic that uses the existing mainline text-based logic in the NFS client. Add documenting comments where appropriate. Note that this means NFSROOT mounts now use the same default settings as v2/v3 mounts done via mount(2) from user space. vers=3,tcp,rsize=<negotiated default>,wsize=<negotiated default> As before, however, no version/protocol negotiation with the server is done. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17NFS: Clean up NFSROOT command line parsingChuck Lever
Clean up: To reduce confusion, rename nfs_root_name as nfs_root_parms, as this buffer contains more than just the name of the remote server. Introduce documenting comments around nfs_root_setup(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17NFS: Remove \t from mount debugging messageChuck Lever
During boot, a random character is displayed instead of a tab. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17UBIFS: introduce new flag for RO due to errorsArtem Bityutskiy
The R/O state may have various reasons: 1. The UBI volume is R/O 2. The FS is mounted R/O 3. The FS switched to R/O mode because of an error However, in UBIFS we have only one variable which represents cases 1 and 3 - 'c->ro_media'. Indeed, we set this to 1 if we switch to R/O mode due to an error, and then we test it in many places to make sure that we stop writing as soon as the error happens. But this is very unclean. One consequence of this, for example, is that in 'ubifs_remount_fs()' we use 'c->ro_media' to check whether we are in R/O mode because on an error, and we print a message in this case. However, if we are in R/O mode because the media is R/O, our message is bogus. This patch introduces new flag - 'c->ro_error' which is set when we switch to R/O mode because of an error. It also changes all "if (c->ro_media)" checks to "if (c->ro_error)" checks, because this is what the checks actually mean. We do not need to check for 'c->ro_media' because if the UBI volume is in R/O mode, we do not allow R/W mounting, and now writes can happen. This is guaranteed by VFS. But it is good to double-check this, so this patch also adds many "ubifs_assert(!c->ro_media)" checks. In the 'ubifs_remount_fs()' function this patch makes a bit more changes - it fixes the error messages as well. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2010-09-17GFS2: gfs2_logd should be using interruptible waitsSteven Whitehouse
Looks like this crept in, in a recent update. Reported-by: Krzysztof Urbaniak <urban@bash.org.pl> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-09-16ceph: fix cap_snap and realm splitSage Weil
The cap_snap creation/queueing relies on both the current i_head_snapc _and_ the i_snap_realm pointers being correct, so that the new cap_snap can properly reference the old context and the new i_head_snapc can be updated to reference the new snaprealm's context. To fix this, we: - move inodes completely to the new (split) realm so that i_snap_realm is correct, and - generate the new snapc's _before_ queueing the cap_snaps in ceph_update_snap_trace(). Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix potential double put of TCP session reference
2010-09-16block: remove BLKDEV_IFL_WAITChristoph Hellwig
All the blkdev_issue_* helpers can only sanely be used for synchronous caller. To issue cache flushes or barriers asynchronously the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_* and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-15ocfs2: Initialize the bktcnt variable properly, and call it bucket_countJoel Becker
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-15ocfs2: Silence unused warning.Joel Becker
When CONFIG_OCFS2_DEBUG_MASKLOG is undefined, we don't use the dentry variable in ocfs2_sync_file(). Let's just move all access to the dentry inside the logging call. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-15genhd, efi: add efi partition metadata to hd_structsWill Drewry
This change extends the partition_meta_info structure to support EFI GPT-specific metadata and ensures that data is copied in on partition scanning. Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-15block, partition: add partition_meta_info to hd_structWill Drewry
I'm reposting this patch series as v4 since there have been no additional comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches are against Linus's tree: 2bfc96a127bc1cc94d26bfaa40159966064f9c8c (2.6.36-rc3). Would this patchset be suitable for inclusion in an mm branch? This changes adds a partition_meta_info struct which itself contains a union of structures that provide partition table specific metadata. This change leaves the union empty. The subsequent patch includes an implementation for CONFIG_EFI_PARTITION-based metadata. Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-14Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies statfs() gives ESTALE error NFS: Fix a typo in nfs_sockaddr_match_ipaddr6 sunrpc: increase MAX_HASHTABLE_BITS to 14 gss:spkm3 miss returning error to caller when import security context gss:krb5 miss returning error to caller when import security context Remove incorrect do_vfs_lock message SUNRPC: cleanup state-machine ordering SUNRPC: Fix a race in rpc_info_open SUNRPC: Fix race corrupting rpc upcall Fix null dereference in call_allocate
2010-09-14aio: check for multiplication overflow in do_io_submitJeff Moyer
Tavis Ormandy pointed out that do_io_submit does not do proper bounds checking on the passed-in iocb array:        if (unlikely(nr < 0))                return -EINVAL;        if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp)))))                return -EFAULT;                      ^^^^^^^^^^^^^^^^^^ The attached patch checks for overflow, and if it is detected, the number of iocbs submitted is scaled down to a number that will fit in the long.  This is an ok thing to do, as sys_io_submit is documented as returning the number of iocbs submitted, so callers should handle a return value of less than the 'nr' argument passed in. Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-14cifs: fix potential double put of TCP session referenceJeff Layton
cifs_get_smb_ses must be called on a server pointer on which it holds an active reference. It first does a search for an existing SMB session. If it finds one, it'll put the server reference and then try to ensure that the negprot is done, etc. If it encounters an error at that point then it'll return an error. There's a potential problem here though. When cifs_get_smb_ses returns an error, the caller will also put the TCP server reference leading to a double-put. Fix this by having cifs_get_smb_ses only put the server reference if it found an existing session that it could use and isn't returning an error. Cc: stable@kernel.org Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-09-14ceph: stop sending FLUSHSNAPs when we hit a dirty capsnapSage Weil
Stop sending FLUSHSNAP messages when we hit a capsnap that has dirty_pages or is still writing. We'll send the newer capsnaps only after the older ones complete. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-14ceph: correctly set 'follows' in flushsnap messagesSage Weil
The 'follows' should match the seq for the snap context for the given snap cap, which is the context under which we have been dirtying and writing data and metadata. The snapshot that _contains_ those updates thus _follows_ that context's seq #. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: prevent possible memory corruption in cifs_demultiplex_thread cifs: eliminate some more premature cifsd exits cifs: prevent cifsd from exiting prematurely [CIFS] ntlmv2/ntlmssp remove-unused-function CalcNTLMv2_partial_mac_key cifs: eliminate redundant xdev check in cifs_rename Revert "[CIFS] Fix ntlmv2 auth with ntlmssp" Revert "missing changes during ntlmv2/ntlmssp auth and sign" Revert "Eliminate sparse warning - bad constant expression" Revert "[CIFS] Eliminate unused variable warning"
2010-09-13ceph: fix dn offset during readdir_prepopulateSage Weil
When adding the readdir results to the cache, ceph_set_dentry_offset was clobbered our just-set offset. This can cause the readdir result offsets to get out of sync with the server. Add an argument to the helper so that it does not. This bug was introduced by 1cd3935bedccf592d44343890251452a6dd74fc4. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-13fs/9p: Don't use dotl version of mknod for dotu inode operationsAneesh Kumar K.V
We should not use dotlversion for the dotu inode operations Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-09-13fs/9p: Use the correct dentry operationsAneesh Kumar K.V
We should use the cached dentry operation only if caching mode is enabled Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-09-139p: Check for NULL fid in v9fs_dir_release()jvrao
NULL fid should be handled in cases where we endup calling v9fs_dir_release() before even we instantiate the fid in filp. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-09-13fs/9p: Fix error handling in v9fs_get_sbAneesh Kumar K.V
This was introduced by 7cadb63d58a932041afa3f957d5cbb6ce69dcee5 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-09-13fs/9p, net/9p: memory leak fixesLatchesar Ionkov
Four memory leak fixes in the 9P code. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-09-12SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependenciesTrond Myklebust
The NFSv4 client's callback server calls svc_gss_principal(), which is defined in the auth_rpcgss.ko The NFSv4 server has the same dependency, and in addition calls svcauth_gss_flavor(), gss_mech_get_by_pseudoflavor(), gss_pseudoflavor_to_service() and gss_mech_put() from the same module. The module auth_rpcgss itself has no dependencies aside from sunrpc, so we only need to select RPCSEC_GSS. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-12statfs() gives ESTALE errorMenyhart Zoltan
Hi, An NFS client executes a statfs("file", &buff) call. "file" exists / existed, the client has read / written it, but it has already closed it. user_path(pathname, &path) looks up "file" successfully in the directory-cache and restarts the aging timer of the directory-entry. Even if "file" has already been removed from the server, because the lookupcache=positive option I use, keeps the entries valid for a while. nfs_statfs() returns ESTALE if "file" has already been removed from the server. If the user application repeats the statfs("file", &buff) call, we are stuck: "file" remains young forever in the directory-cache. Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2010-09-12NFS: Fix a typo in nfs_sockaddr_match_ipaddr6Trond Myklebust
Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2010-09-12Remove incorrect do_vfs_lock messageFabio Olive Leite
The do_vfs_lock function on fs/nfs/file.c is only called if NLM is not being used, via the -onolock mount option. Therefore it cannot really be "out of sync with lock manager" when the local locking function called returns an error, as there will be no corresponding call to the NLM. For details, simply check the if/else on do_setlk and do_unlk on fs/nfs/file.c. Signed-Off-By: Fabio Olive Leite <fleite@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-11ceph: fix file offset wrapping at 4GB on 32-bit archsSage Weil
Cast the value before shifting so that we don't run out of bits with a 32-bit unsigned long. This fixes wrapping of high file offsets into the low 4GB of a file on disk, and the subsequent data corruption for large files. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-11ceph: fix reconnect encoding for old serversSage Weil
Fix the reconnect encoding to encode the cap record when the MDS does not have the FLOCK capability (i.e., pre v0.22). Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-11ceph: fix pagelist kunmap tailYehuda Sadeh
A wrong parameter was passed to the kunmap. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-11ceph: fix null pointer deref on anon root dentry releaseSage Weil
When we release a root dentry, particularly after a splice, the parent (actually our) inode was evaluating to NULL and was getting dereferenced by ceph_snap(). This is reproduced by something as simple as mount -t ceph monhost:/a/b mnt mount -t ceph monhost:/a mnt2 ls mnt2 A splice_dentry() would kill the old 'b' inode's root dentry, and we'd crash while releasing it. Fix by checking for both the ROOT and NULL cases explicitly. We only need to invalidate the parent dir when we have a correct parent to invalidate. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-10Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: log IO completion workqueue is a high priority queue xfs: prevent reading uninitialized stack memory
2010-09-10Ocfs2: Handle empty list in lockres_seq_start() for dlmdebug.cTristan Ye
This patch tries to handle the case in which list 'dlm->tracking_list' is empty, to avoid accessing an invalid pointer. It fixes the following oops: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1287 Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10Ocfs2: Re-access the journal after ocfs2_insert_extent() in dxdir codes.Tristan Ye
In ocfs2_dx_dir_rebalance(), we need to rejournal_acess the blocks after calling ocfs2_insert_extent() since growing an extent tree may trigger ocfs2_extend_trans(), which makes previous journal_access meaningless. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10ocfs2: Fix lockdep warning in reflink.Tao Ma
This patch change mutex_lock to a new subclass and add a new inode lock subclass for the target inode which caused this lockdep warning. ============================================= [ INFO: possible recursive locking detected ] 2.6.35+ #5 --------------------------------------------- reflink/11086 is trying to acquire lock: (Meta){+++++.}, at: [<ffffffffa06f9d65>] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] but task is already holding lock: (Meta){+++++.}, at: [<ffffffffa06f9aa0>] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2] other info that might help us debug this: 6 locks held by reflink/11086: #0: (&sb->s_type->i_mutex_key#15/1){+.+.+.}, at: [<ffffffff820e09ec>] lookup_create+0x26/0x97 #1: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa06f99a0>] ocfs2_reflink_ioctl+0x4d3/0x1229 [ocfs2] #2: (Meta){+++++.}, at: [<ffffffffa06f9aa0>] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2] #3: (&oi->ip_xattr_sem){+.+.+.}, at: [<ffffffffa06f9b58>] ocfs2_reflink_ioctl+0x68b/0x1229 [ocfs2] #4: (&oi->ip_alloc_sem){+.+.+.}, at: [<ffffffffa06f9b67>] ocfs2_reflink_ioctl+0x69a/0x1229 [ocfs2] #5: (&sb->s_type->i_mutex_key#15/2){+.+...}, at: [<ffffffffa06f9d4f>] ocfs2_reflink_ioctl+0x882/0x1229 [ocfs2] stack backtrace: Pid: 11086, comm: reflink Not tainted 2.6.35+ #5 Call Trace: [<ffffffff82063dd9>] validate_chain+0x56e/0xd68 [<ffffffff82062275>] ? mark_held_locks+0x49/0x69 [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06c9ade>] __ocfs2_cluster_lock+0x975/0xa0d [ocfs2] [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06e107b>] ? ocfs2_wait_for_recovery+0x15/0x8a [ocfs2] [<ffffffffa06cb6ea>] ocfs2_inode_lock_full_nested+0x1ac/0xdc5 [ocfs2] [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffff820623a0>] ? trace_hardirqs_on_caller+0x10b/0x12f [<ffffffff82060193>] ? debug_mutex_free_waiter+0x4f/0x53 [<ffffffffa06f9d65>] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06ce24a>] ? ocfs2_file_lock_res_init+0x66/0x78 [ocfs2] [<ffffffff820bb2d2>] ? might_fault+0x40/0x8d [<ffffffffa06df9f6>] ocfs2_ioctl+0x61a/0x656 [ocfs2] [<ffffffff820ee5d3>] ? mntput_no_expire+0x1d/0xb0 [<ffffffff820e07b3>] ? path_put+0x2c/0x31 [<ffffffff820e53ac>] vfs_ioctl+0x2a/0x9d [<ffffffff820e5903>] do_vfs_ioctl+0x45d/0x4ae [<ffffffff8233a7f6>] ? _raw_spin_unlock+0x26/0x2a [<ffffffff8200299c>] ? sysret_check+0x27/0x62 [<ffffffff820e59ab>] sys_ioctl+0x57/0x7a [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock.Tao Ma
As the name shows, we shouldn't have any lock in ocfs2_xattr_get_nolock. so lift ip_xattr_sem to the caller. This should be safe for us since the only 2 callers are: 1. ocfs2_xattr_get which will lock the resources. 2. ocfs2_mknod which don't need this locking. And this also resolves the following lockdep warning. ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35+ #5 ------------------------------------------------------- reflink/30027 is trying to acquire lock: (&oi->ip_alloc_sem){+.+.+.}, at: [<ffffffffa0673b67>] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2] but task is already holding lock: (&oi->ip_xattr_sem){++++..}, at: [<ffffffffa0673b58>] ocfs2_reflink_ioctl+0x68b/0x1226 [ocfs2] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&oi->ip_xattr_sem){++++..}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffff82339650>] down_read+0x34/0x47 [<ffffffffa0691cb8>] ocfs2_xattr_get_nolock+0xa0/0x4e6 [ocfs2] [<ffffffffa069d64f>] ocfs2_get_acl_nolock+0x5c/0x132 [ocfs2] [<ffffffffa069d9c7>] ocfs2_init_acl+0x60/0x243 [ocfs2] [<ffffffffa066499d>] ocfs2_mknod+0xae8/0xfea [ocfs2] [<ffffffffa0665041>] ocfs2_create+0x9d/0x105 [ocfs2] [<ffffffff820e1c83>] vfs_create+0x9b/0xf4 [<ffffffff820e20bb>] do_last+0x2fd/0x5be [<ffffffff820e31c0>] do_filp_open+0x1fb/0x572 [<ffffffff820d6cf6>] do_sys_open+0x5a/0xe7 [<ffffffff820d6dac>] sys_open+0x1b/0x1d [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #2 (jbd2_handle){+.+...}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffffa0604ff8>] start_this_handle+0x4a3/0x4bc [jbd2] [<ffffffffa06051d6>] jbd2__journal_start+0xba/0xee [jbd2] [<ffffffffa0605218>] jbd2_journal_start+0xe/0x10 [jbd2] [<ffffffffa065ca34>] ocfs2_start_trans+0xb7/0x19b [ocfs2] [<ffffffffa06645f3>] ocfs2_mknod+0x73e/0xfea [ocfs2] [<ffffffffa0665041>] ocfs2_create+0x9d/0x105 [ocfs2] [<ffffffff820e1c83>] vfs_create+0x9b/0xf4 [<ffffffff820e20bb>] do_last+0x2fd/0x5be [<ffffffff820e31c0>] do_filp_open+0x1fb/0x572 [<ffffffff820d6cf6>] do_sys_open+0x5a/0xe7 [<ffffffff820d6dac>] sys_open+0x1b/0x1d [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #1 (&journal->j_trans_barrier){.+.+..}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82064fa9>] lock_release_non_nested+0x1e5/0x24b [<ffffffff82065999>] lock_release+0x158/0x17a [<ffffffff823389f6>] __mutex_unlock_slowpath+0xbf/0x11b [<ffffffff82338a5b>] mutex_unlock+0x9/0xb [<ffffffffa0679673>] ocfs2_free_ac_resource+0x31/0x67 [ocfs2] [<ffffffffa067c6bc>] ocfs2_free_alloc_context+0x11/0x1d [ocfs2] [<ffffffffa0633de0>] ocfs2_write_begin_nolock+0x141e/0x159b [ocfs2] [<ffffffffa0635523>] ocfs2_write_begin+0x11e/0x1e7 [ocfs2] [<ffffffff820a1297>] generic_file_buffered_write+0x10c/0x210 [<ffffffffa0653624>] ocfs2_file_aio_write+0x4cc/0x6d3 [ocfs2] [<ffffffff820d822d>] do_sync_write+0xc2/0x106 [<ffffffff820d897b>] vfs_write+0xae/0x131 [<ffffffff820d8e55>] sys_write+0x47/0x6f [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #0 (&oi->ip_alloc_sem){+.+.+.}: [<ffffffff82063f92>] validate_chain+0x727/0xd68 [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffff82339694>] down_write+0x31/0x52 [<ffffffffa0673b67>] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2] [<ffffffffa06599f6>] ocfs2_ioctl+0x61a/0x656 [ocfs2] [<ffffffff820e53ac>] vfs_ioctl+0x2a/0x9d [<ffffffff820e5903>] do_vfs_ioctl+0x45d/0x4ae [<ffffffff820e59ab>] sys_ioctl+0x57/0x7a [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10Track negative entries v3Goldwyn Rodrigues
Track negative dentries by recording the generation number of the parent directory in d_fsdata. The generation number for the parent directory is recorded in the inode_info, which increments every time the lock on the directory is dropped. If the generation number of the parent directory and the negative dentry matches, there is no need to perform the revalidate, else a revalidate is forced. This improves performance in situations where nodes look for the same non-existent file multiple times. Thanks Mark for explaining the DLM sequence. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>