summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2010-09-10ocfs2: Cache system inodes of other slots.Tao Ma
Durring orphan scan, if we are slot 0, and we are replaying orphan_dir:0001, the general process is that for every file in this dir: 1. we will iget orphan_dir:0001, since there is no inode for it. we will have to create an inode and read it from the disk. 2. do the normal work, such as delete_inode and remove it from the dir if it is allowed. 3. call iput orphan_dir:0001 when we are done. In this case, since we have no dcache for this inode, i_count will reach 0, and VFS will have to call clear_inode and in ocfs2_clear_inode we will checkpoint the inode which will let ocfs2_cmt and journald begin to work. 4. We loop back to 1 for the next file. So you see, actually for every deleted file, we have to read the orphan dir from the disk and checkpoint the journal. It is very time consuming and cause a lot of journal checkpoint I/O. A better solution is that we can have another reference for these inodes in ocfs2_super. So if there is no other race among nodes(which will let dlmglue to checkpoint the inode), for step 3, clear_inode won't be called and for step 1, we may only need to read the inode for the 1st time. This is a big win for us. So this patch will try to cache system inodes of other slots so that we will have one more reference for these inodes and avoid the extra inode read and journal checkpoint. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10libfs: Fix shift bug in generic_check_addressable()Joel Becker
generic_check_addressable() erroneously shifts pages down by a block factor when it should be shifting up. To prevent overflow, we shift blocks down to pages. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10OCFS2: Allow huge (> 16 TiB) volumes to mountPatrick J. LoPresti
The OCFS2 developers have already done all of the hard work to allow volumes larger than 16 TiB. But there is still a "sanity check" in fs/ocfs2/super.c that prevents the mounting of such volumes, even when the cluster size and journal options would allow it. This patch replaces that sanity check with a more sophisticated one to mount a huge volume provided that (a) it is addressable by the raw word/address size of the system (borrowing a test from ext4); (b) the volume is using JBD2; and (c) the JBD2_FEATURE_INCOMPAT_64BIT flag is set on the journal. I factored out the sanity check into its own function. I also moved it from ocfs2_initialize_super() down to ocfs2_check_volume(); any earlier, and the journal will not have been initialized yet. This patch is one of a pair, and it depends on the other ("JBD2: Allow feature checks before journal recovery"). I have tested this patch on small volumes, huge volumes, and huge volumes without 64-bit block support in the journal. All of them appear to work or to fail gracefully, as appropriate. Signed-off-by: Patrick LoPresti <lopresti@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10JBD2: Allow feature checks before journal recoveryPatrick J. LoPresti
Before we start accessing a huge (> 16 TiB) OCFS2 volume, we need to confirm that its journal supports 64-bit offsets. In particular, we need to check the journal's feature bits before recovering the journal. This is not possible with JBD2 at present, because the journal superblock (where the feature bits reside) is not loaded from disk until the journal is recovered. This patch loads the journal superblock in jbd2_journal_check_used_features() if it has not already been loaded, allowing us to check the feature bits before journal recovery. Signed-off-by: Patrick LoPresti <lopresti@gmail.com> Cc: linux-ext4@vger.kernel.org Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10ext3/ext4: Factor out disk addressability checkPatrick J. LoPresti
As part of adding support for OCFS2 to mount huge volumes, we need to check that the sector_t and page cache of the system are capable of addressing the entire volume. An identical check already appears in ext3 and ext4. This patch moves the addressability check into its own function in fs/libfs.c and modifies ext3 and ext4 to invoke it. [Edited to -EINVAL instead of BUG_ON() for bad blocksize_bits -- Joel] Signed-off-by: Patrick LoPresti <lopresti@gmail.com> Cc: linux-ext4@vger.kernel.org Acked-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10Merge branch 'cow_readahead' of git://oss.oracle.com/git/tma/linux-2.6 into ↵Joel Becker
merge-2
2010-09-10ocfs2: Remove obsolete comments before ocfs2_start_trans.Tao Ma
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10ocfs2: Remove unused old_id in ocfs2_commit_cache.Tao Ma
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10ocfs2: Remove ocfs2_sync_inode()Jan Kara
ocfs2_sync_inode() is used only from ocfs2_sync_file(). But all data has already been written before calling ocfs2_sync_file() and ocfs2 doesn't use inode's private_list for tracking metadata buffers thus sync_mapping_buffers() is superfluous as well. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10Reorganize data elements to reduce struct sizesGoldwyn Rodrigues
Thanks for the comments. I have incorportated them all. CONFIG_OCFS2_FS_STATS is enabled and CONFIG_DEBUG_LOCK_ALLOC is disabled. Statistics now look like - ocfs2_write_ctxt: 2144 - 2136 = 8 ocfs2_inode_info: 1960 - 1848 = 112 ocfs2_journal: 168 - 160 = 8 ocfs2_lock_res: 336 - 304 = 32 ocfs2_refcount_tree: 512 - 472 = 40 Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10ocfs2: Remove obscure error handling in direct_write.Tao Ma
In ocfs2, actually we don't allow any direct write pass i_size, see the function ocfs2_prepare_inode_for_write. So we don't need the bogus simple_setsize. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10ocfs2: Add some trace log for orphan scan.Tao Ma
Now orphan scan worker has no trace log, so it is very hard to tell whether it is finished or blocked. So add 2 mlog trace log so that we can tell whether the current orphan scan worker is blocked or not. It does help when I analyzed a orphan scan bug. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10Ocfs2: Add new OCFS2_IOC_INFO ioctl for ocfs2 v8.Tristan Ye
The reason why we need this ioctl is to offer the none-privileged end-user a possibility to get filesys info gathering. We use OCFS2_IOC_INFO to manipulate the new ioctl, userspace passes a structure to kernel containing an array of request pointers and request count, such as, * From userspace: struct ocfs2_info_blocksize oib = { .ib_req = { .ir_magic = OCFS2_INFO_MAGIC, .ir_code = OCFS2_INFO_BLOCKSIZE, ... } ... } struct ocfs2_info_clustersize oic = { ... } uint64_t reqs[2] = {(unsigned long)&oib, (unsigned long)&oic}; struct ocfs2_info info = { .oi_requests = reqs, .oi_count = 2, } ret = ioctl(fd, OCFS2_IOC_INFO, &info); * In kernel: Get the request pointers from *info*, then handle each request one bye one. Idea here is to make the spearated request small enough to guarantee a better backward&forward compatibility since a small piece of request would be less likely to be broken if filesys on raw disk get changed. Currently, the following 7 requests are supported per the requirement from userspace tool o2info, and I believe it will grow over time:-) OCFS2_INFO_CLUSTERSIZE OCFS2_INFO_BLOCKSIZE OCFS2_INFO_MAXSLOTS OCFS2_INFO_LABEL OCFS2_INFO_UUID OCFS2_INFO_FS_FEATURES OCFS2_INFO_JOURNAL_SIZE This ioctl is only specific to OCFS2. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10xfs: log IO completion workqueue is a high priority queueDave Chinner
The workqueue implementation in 2.6.36-rcX has changed, resulting in the workqueues no longer having dedicated threads for work processing. This has caused severe livelocks under heavy parallel create workloads because the log IO completions have been getting held up behind metadata IO completions. Hence log commits would stall, memory allocation would stall because pages could not be cleaned, and lock contention on the AIL during inode IO completion processing was being seen to slow everything down even further. By making the log Io completion workqueue a high priority workqueue, they are queued ahead of all data/metadata IO completions and processed before the data/metadata completions. Hence the log never gets stalled, and operations needed to clean memory can continue as quickly as possible. This avoids the livelock conditions and allos the system to keep running under heavy load as per normal. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-09-10execve: make responsive to SIGKILL with large argumentsRoland McGrath
An execve with a very large total of argument/environment strings can take a really long time in the execve system call. It runs uninterruptibly to count and copy all the strings. This change makes it abort the exec quickly if sent a SIGKILL. Note that this is the conservative change, to interrupt only for SIGKILL, by using fatal_signal_pending(). It would be perfectly correct semantics to let any signal interrupt the string-copying in execve, i.e. use signal_pending() instead of fatal_signal_pending(). We'll save that change for later, since it could have user-visible consequences, such as having a timer set too quickly make it so that an execve can never complete, though it always happened to work before. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10execve: improve interactivity with large argumentsRoland McGrath
This adds a preemption point during the copying of the argument and environment strings for execve, in copy_strings(). There is already a preemption point in the count() loop, so this doesn't add any new points in the abstract sense. When the total argument+environment strings are very large, the time spent copying them can be much more than a normal user time slice. So this change improves the interactivity of the rest of the system when one process is doing an execve with very large arguments. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10setup_arg_pages: diagnose excessive argument sizeRoland McGrath
The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not check the size of the argument/environment area on the stack. When it is unworkably large, shift_arg_pages() hits its BUG_ON. This is exploitable with a very large RLIMIT_STACK limit, to create a crash pretty easily. Check that the initial stack is not too large to make it possible to map in any executable. We're not checking that the actual executable (or intepreter, for binfmt_elf) will fit. So those mappings might clobber part of the initial stack mapping. But that is just userland lossage that userland made happen, not a kernel problem. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: Range check cpu in blk_cpu_to_group scatterlist: prevent invalid free when alloc fails writeback: Fix lost wake-up shutting down writeback thread writeback: do not lose wakeup events when forking bdi threads cciss: fix reporting of max queue depth since init block: switch s390 tape_block and mg_disk to elevator_change() block: add function call to switch the IO scheduler from a driver fs/bio-integrity.c: return -ENOMEM on kmalloc failure bio-integrity.c: remove dependency on __GFP_NOFAIL BLOCK: fix bio.bi_rw handling block: put dev->kobj in blk_register_queue fail path cciss: handle allocation failure cfq-iosched: Documentation help for new tunables cfq-iosched: blktrace print per slice sector stats cfq-iosched: Implement tunable group_idle cfq-iosched: Do group share accounting in IOPS when slice_idle=0 cfq-iosched: Do not idle if slice_idle=0 cciss: disable doorbell reset on reset_devices blkio: Fix return code for mkdir calls
2010-09-10xfs: prevent reading uninitialized stack memoryDan Rosenberg
The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12 bytes of uninitialized stack memory, because the fsxattr struct declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero) the 12-byte fsx_pad member before copying it back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-09-10block: remove the BH_Eopnotsupp flagChristoph Hellwig
This flag was only set for barrier buffers, which we don't submit anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10fat: do not send discards as barriersChristoph Hellwig
fat already uses synchronous discards, no need to add I/O barriers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10ext4: do not send discards as barriersChristoph Hellwig
ext4 already uses synchronous discards, no need to add I/O barriers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10jbd2: replace barriers with explicit flush / FUA usageChristoph Hellwig
Switch to the WRITE_FLUSH_FUA flag for journal commits and remove the EOPNOTSUPP detection for barriers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrierJan Kara
Currently JBD2 relies blkdev_issue_flush() draining the queue when ASYNC_COMMIT feature is set. This property is going away so make JBD2 wait for buffers it needs on its own before submitting the cache flush. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10jbd: replace barriers with explicit flush / FUA usageChristoph Hellwig
Switch to the WRITE_FLUSH_FUA flag for journal commits and remove the EOPNOTSUPP detection for barriers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10nilfs2: replace barriers with explicit flush / FUA usageChristoph Hellwig
Switch to the WRITE_FLUSH_FUA flag for log writes, remove the EOPNOTSUPP detection for barriers and stop setting the barrier flag for discards. tj: nilfs is now fixed to wait for discard completion. Updated this patch accordingly and dropped warning about it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10reiserfs: replace barriers with explicit flush / FUA usageChristoph Hellwig
Switch to the WRITE_FLUSH_FUA flag for log writes and remove the EOPNOTSUPP detection for barriers. Note that reiserfs had a fairly different code path for barriers before as it wa the only filesystem actually making use of them. The new code always uses the old non-barrier codepath and just sets the WRITE_FLUSH_FUA explicitly for the journal commits. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10gfs2: replace barriers with explicit flush / FUA usageChristoph Hellwig
Switch to the WRITE_FLUSH_FUA flag for log writes, remove the EOPNOTSUPP detection for barriers and stop setting the barrier flag for discards. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10btrfs: replace barriers with explicit flush / FUA usageChristoph Hellwig
Switch to the WRITE_FLUSH_FUA flag for log writes, remove the EOPNOTSUPP detection for barriers and stop setting the barrier flag for discards. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10xfs: replace barriers with explicit flush / FUA usageChristoph Hellwig
Switch to the WRITE_FLUSH_FUA flag for log writes and remove the EOPNOTSUPP detection for barriers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10block: pass gfp_mask and flags to sb_issue_discardChristoph Hellwig
We'll need to get rid of the BLKDEV_IFL_BARRIER flag, and to facilitate that and to make the interface less confusing pass all flags explicitly. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-09minix: fix regression in minix_mkdir()Jorge Boncompte [DTI2]
Commit 9eed1fb721c ("minix: replace inode uid,gid,mode init with helper") broke directory creation on minix filesystems. Fix it by passing the needed mode flag to inode init helper. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> [2.6.35.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09vfs: take O_NONBLOCK out of the O_* uniqueness testJames Bottomley
O_NONBLOCK on parisc has a dual value: #define O_NONBLOCK 000200004 /* HPUX has separate NDELAY & NONBLOCK */ It is caught by the O_* bits uniqueness check and leads to a parisc compile error. The fix would be to take O_NONBLOCK out. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: Jamie Lokier <jamie@shareable.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09binfmt_misc: fix binfmt_misc priorityJan Sembera
Commit 74641f584da ("alpha: binfmt_aout fix") (May 2009) introduced a regression - binfmt_misc is now consulted after binfmt_elf, which will unfortunately break ia32el. ia32 ELF binaries on ia64 used to be matched using binfmt_misc and executed using wrapper. As 32bit binaries are now matched by binfmt_elf before bindmt_misc kicks in, the wrapper is ignored. The fix increases precedence of binfmt_misc to the original state. Signed-off-by: Jan Sembera <jsembera@suse.cz> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Richard Henderson <rth@twiddle.net Cc: <stable@kernel.org> [2.6.everything.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09proc: export uncached bit properly in /proc/kpageflagsTakashi Iwai
Fix the left-over old ifdef for PG_uncached in /proc/kpageflags. Now it's used by x86, too. Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09O_DIRECT: fix the splitting up of contiguous I/OJeff Moyer
commit c2c6ca4 (direct-io: do not merge logically non-contiguous requests) introduced a bug whereby all O_DIRECT I/Os were submitted a page at a time to the block layer. The problem is that the code expected dio->block_in_file to correspond to the current page in the dio. In fact, it corresponds to the previous page submitted via submit_page_section. This was purely an oversight, as the dio->cur_page_fs_offset field was introduced for just this purpose. This patch simply uses the correct variable when calculating whether there is a mismatch between contiguous logical blocks and contiguous physical blocks (as described in the comments). I also switched the if conditional following this check to an else if, to ensure that we never call dio_bio_submit twice for the same dio (in theory, this should not happen, anyway). I've tested this by running blktrace and verifying that a 64KB I/O was submitted as a single I/O. I also ran the patched kernel through xfstests' aio tests using xfs, ext4 (with 1k and 4k block sizes) and btrfs and verified that there were no regressions as compared to an unpatched kernel. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Josef Bacik <jbacik@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: <stable@kernel.org> [2.6.35.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09mm: Move vma_stack_continue into mm.hStefan Bader
So it can be used by all that need to check for that. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09Merge branch 'fixes' of git://oss.oracle.com/git/tma/linux-2.6Linus Torvalds
* 'fixes' of git://oss.oracle.com/git/tma/linux-2.6: ocfs2: Fix orphan add in ocfs2_create_inode_in_orphan ocfs2: split out ocfs2_prepare_orphan_dir() into locking and prep functions ocfs2: allow return of new inode block location before allocation of the inode ocfs2: use ocfs2_alloc_dinode_update_counts() instead of open coding ocfs2: split out inode alloc code from ocfs2_mknod_locked Ocfs2: Fix a regression bug from mainline commit(6b933c8e6f1a2f3118082c455eef25f9b1ac7b45). ocfs2: Fix deadlock when allocating page ocfs2: properly set and use inode group alloc hint ocfs2: Use the right group in nfs sync check. ocfs2: Flush drive's caches on fdatasync ocfs2: make __ocfs2_page_mkwrite handle file end properly. ocfs2: Fix incorrect checksum validation error ocfs2: Fix metaecc error messages
2010-09-08cifs: prevent possible memory corruption in cifs_demultiplex_threadJeff Layton
cifs_demultiplex_thread sets the addr.sockAddr.sin_port without any regard for the socket family. While it may be that the error in question here never occurs on an IPv6 socket, it's probably best to be safe and set the port properly if it ever does. Break the port setting code out of cifs_fill_sockaddr and into a new function, and call that from cifs_demultiplex_thread. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-09-08cifs: eliminate some more premature cifsd exitsJeff Layton
If the tcpStatus is still CifsNew, the main cifs_demultiplex_loop can break out prematurely in some cases. This is wrong as we will almost always have other structures with pointers to the TCP_Server_Info. If the main loop breaks under any other condition other than tcpStatus == CifsExiting, then it'll face a use-after-free situation. I don't see any reason to treat a CifsNew tcpStatus differently than CifsGood. I believe we'll still want to attempt to reconnect in either case. What should happen in those situations is that the MIDs get marked as MID_RETRY_NEEDED. This will make CIFSSMBNegotiate return -EAGAIN, and then the caller can retry the whole thing on a newly reconnected socket. If that fails again in the same way, the caller of cifs_get_smb_ses should tear down the TCP_Server_Info struct. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-09-08cifs: prevent cifsd from exiting prematurelyJeff Layton
When cifs_demultiplex_thread exits, it does a number of cleanup tasks including freeing the TCP_Server_Info struct. Much of the existing code in cifs assumes that when there is a cisfSesInfo struct, that it holds a reference to a valid TCP_Server_Info struct. We can never allow cifsd to exit when a cifsSesInfo struct is still holding a reference to the server. The server pointers will then point to freed memory. This patch eliminates a couple of questionable conditions where it does this. The idea here is to make an -EINTR return from kernel_recvmsg behave the same way as -ERESTARTSYS or -EAGAIN. If the task was signalled from cifs_put_tcp_session, then tcpStatus will be CifsExiting, and the kernel_recvmsg call will return quickly. There's also another condition where this can occur too -- if the tcpStatus is still in CifsNew, then it will also exit if the server closes the socket prematurely. I think we'll probably also need to fix that situation, but that requires a bit more consideration. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-09-08[CIFS] ntlmv2/ntlmssp remove-unused-function CalcNTLMv2_partial_mac_keySteve French
This function is not used, so remove the definition and declaration. Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-09-08cifs: eliminate redundant xdev check in cifs_renameJeff Layton
The VFS always checks that the source and target of a rename are on the same vfsmount, and hence have the same superblock. So, this check is redundant. Remove it and simplify the error handling. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-09-08Revert "[CIFS] Fix ntlmv2 auth with ntlmssp"Steve French
This reverts commit 9fbc590860e75785bdaf8b83e48fabfe4d4f7d58. The change to kernel crypto and fixes to ntlvm2 and ntlmssp series, introduced a regression. Deferring this patch series to 2.6.37 after Shirish fixes it. Signed-off-by: Steve French <sfrench@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> CC: Shirish Pargaonkar <shirishp@us.ibm.com>
2010-09-08Revert "missing changes during ntlmv2/ntlmssp auth and sign"Steve French
This reverts commit 3ec6bbcdb4e85403f2c5958876ca9492afdf4031. The change to kernel crypto and fixes to ntlvm2 and ntlmssp series, introduced a regression. Deferring this patch series to 2.6.37 after Shirish fixes it. Signed-off-by: Steve French <sfrench@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> CC: Shirish Pargaonkar <shirishp@us.ibm.com>
2010-09-08Revert "Eliminate sparse warning - bad constant expression"Steve French
This reverts commit 2d20ca835867d93ead6ce61780d883a4b128106d. The change to kernel crypto and fixes to ntlvm2 and ntlmssp series, introduced a regression. Deferring this patch series to 2.6.37 after Shirish fixes it. Signed-off-by: Steve French <sfrench@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> CC: Shirish Pargaonkar <shirishp@us.ibm.com>
2010-09-08Revert "[CIFS] Eliminate unused variable warning"Steve French
The change to kernel crypto and fixes to ntlvm2 and ntlmssp series, introduced a regression. Deferring this patch series to 2.6.37 after Shirish fixes it. This reverts commit c89e5198b26a869ce2842bad8519264f3394dee9. Signed-off-by: Steve French <sfrench@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> CC: Shirish Pargaonkar <shirishp@us.ibm.com>
2010-09-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix lock annotations fuse: flush background queue on connection close
2010-09-08ocfs2: Fix orphan add in ocfs2_create_inode_in_orphanMark Fasheh
ocfs2_create_inode_in_orphan() is used by reflink to create the newly reflinked inode simultaneously in the orphan dir. This allows us to easily handle partially-reflinked files during recovery cleanup. We have a problem though - the orphan dir stringifies inode # to determine a unique name under which the orphan entry dirent can be created. Since ocfs2_create_inode_in_orphan() needs the space allocated in the orphan dir before it can allocate the inode, we currently call into the orphan code: /* * We give the orphan dir the root blkno to fake an orphan name, * and allocate enough space for our insertion. */ status = ocfs2_prepare_orphan_dir(osb, &orphan_dir, osb->root_blkno, orphan_name, &orphan_insert); Using osb->root_blkno might work fine on unindexed directories, but the orphan dir can have an index. When it has that index, the above code fails to allocate the proper index entry. Later, when we try to remove the file from the orphan dir (using the actual inode #), the reflink operation will fail. To fix this, I created a function ocfs2_alloc_orphaned_file() which uses the newly split out orphan and inode alloc code to figure out what the inode block number will be (once allocated) and then prepare the orphan dir from that data. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08ocfs2: split out ocfs2_prepare_orphan_dir() into locking and prep functionsMark Fasheh
We do this because ocfs2_create_inode_in_orphan() wants to order locking of the orphan dir with respect to locking of the inode allocator *before* making any changes to the directory. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>