asmadeus/linux.git - The linux kernel

Age	Commit message (Collapse)	Author
2012-09-26	cifs: change DOS/NT/POSIX mapping of ERRnoresource	Jeff Layton
	ERRnoresource is an ERRSRV level (aka server-side) error and means "No resources currently available for request". Currently that maps to POSIX -ENOBUFS. No NT errors map to it currently. NT_STATUS_INSUFFICIENT_RESOURCES and NT_STATUS_INSUFF_SERVER_RESOURCES are also similar in meaning. Currently the client maps those to ERRnomem, which maps to -ENOMEM in POSIX. All of these mappings seem to be quite wrong to me and are confusing for users. All of the above errors indicate problems on the server, not the client. Reporting -ENOMEM or -ENOBUFS implies that the client is running out of resources. This patch changes those mappings. The NT_* errors are changed to map to the SRV level ERRnoresource. That error is in turn changed to return -EREMOTEIO which is the only POSIX error I could find that conveys that something went wrong on the server. While we're at it, change the SMB2 equivalent error to return the same. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Suresh Jayaraman <sjayaraman@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-26	ext4: reimplement uninit extent optimization for move_extent_per_page()	Dmitry Monakhov
	Uninitialized extent may became initialized(parallel writeback task) at any moment after we drop i_data_sem, so we have to recheck extent's state after we hold page's lock and i_data_sem. If we about to change page's mapping we must hold page's lock in order to serialize other users. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-26	ext4: clean up online defrag bugs in move_extent_per_page()	Dmitry Monakhov
	Non-full list of bugs: 1) uninitialized extent optimization does not hold page's lock, and simply replace brunches after that writeback code goes crazy because block mapping changed under it's feets kernel BUG at fs/ext4/inode.c:1434! ( 288'th xfstress) 2) uninitialized extent may became initialized right after we drop i_data_sem, so extent state must be rechecked 3) Locked pages goes uptodate via following sequence: ->readpage(page); lock_page(page); use_that_page(page) But after readpage() one may invalidate it because it is uptodate and unlocked (reclaimer does that) As result kernel bug at include/linux/buffer_head.c:133! 4) We call write_begin() with already opened stansaction which result in following deadlock: ->move_extent_per_page() ->ext4_journal_start()-> hold journal transaction ->write_begin() ->ext4_da_write_begin() ->ext4_nonda_switch() ->writeback_inodes_sb_if_idle() --> will wait for journal_stop() 5) try_to_release_page() may fail and it does fail if one of page's bh was pinned by journal 6) If we about to change page's mapping we MUST hold it's lock during entire remapping procedure, this is true for both pages(original and donor one) Fixes: - Avoid (1) and (2) simply by temproraly drop uninitialized extent handling optimization, this will be reimplemented later. - Fix (3) by manually forcing page to uptodate state w/o dropping it's lock - Fix (4) by rearranging existing locking: from: journal_start(); ->write_begin to: write_begin(); journal_extend() - Fix (5) simply by checking retvalue - Fix (6) by locking both (original and donor one) pages during extent swap with help of mext_page_double_lock() Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-26	NFSv4.1: decode_getdeviceinfo should check xdr_read_pages() return value	Trond Myklebust
	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-26	ext4: online defrag is not supported for journaled files	Dmitry Monakhov
	Proper block swap for inodes with full journaling enabled is truly non obvious task. In order to be on a safe side let's explicitly disable it for now. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-09-26	ext4: move_extent code cleanup	Dmitry Monakhov
	- Remove usless checks, because it is too late to check that inode != NULL at the moment it was referenced several times. - Double lock routines looks very ugly and locking ordering relays on order of i_ino, but other kernel code rely on order of pointers. Let's make them simple and clean. - check that inodes belongs to the same SB as soon as possible. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-09-26	fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared	Fengguang Wu
	blkdev_mmap() isn't used outside of fs/block_dev.c, mark it as static. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-09-26	blockdev: turn a rw semaphore into a percpu rw semaphore	Mikulas Patocka
	This avoids cache line bouncing when many processes lock the semaphore for read. New percpu lock implementation The lock consists of an array of percpu unsigned integers, a boolean variable and a mutex. When we take the lock for read, we enter rcu read section, check for a "locked" variable. If it is false, we increase a percpu counter on the current cpu and exit the rcu section. If "locked" is true, we exit the rcu section, take the mutex and drop it (this waits until a writer finished) and retry. Unlocking for read just decreases percpu variable. Note that we can unlock on a difference cpu than where we locked, in this case the counter underflows. The sum of all percpu counters represents the number of processes that hold the lock for read. When we need to lock for write, we take the mutex, set "locked" variable to true and synchronize rcu. Since RCU has been synchronized, no processes can create new read locks. We wait until the sum of percpu counters is zero - when it is, there are no readers in the critical section. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-09-26	Fix a crash when block device is read and block size is changed at the same time	Mikulas Patocka
	The kernel may crash when block size is changed and I/O is issued simultaneously. Because some subsystems (udev or lvm) may read any block device anytime, the bug actually puts any code that changes a block device size in jeopardy. The crash can be reproduced if you place "msleep(1000)" to blkdev_get_blocks just before "bh->b_size = max_blocks << inode->i_blkbits;". Then, run "dd if=/dev/ram0 of=/dev/null bs=4k count=1 iflag=direct" While it is waiting in msleep, run "blockdev --setbsz 2048 /dev/ram0" You get a BUG. The direct and non-direct I/O is written with the assumption that block size does not change. It doesn't seem practical to fix these crashes one-by-one there may be many crash possibilities when block size changes at a certain place and it is impossible to find them all and verify the code. This patch introduces a new rw-lock bd_block_size_semaphore. The lock is taken for read during I/O. It is taken for write when changing block size. Consequently, block size can't be changed while I/O is being submitted. For asynchronous I/O, the patch only prevents block size change while the I/O is being submitted. The block size can change when the I/O is in progress or when the I/O is being finished. This is acceptable because there are no accesses to block size when asynchronous I/O is being finished. The patch prevents block size changing while the device is mapped with mmap. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-09-26	ext4: don't call update_backups() multiple times for the same bg	Tao Ma
	When performing an online resize, we add a bunch of groups at one time in ext4_flex_group_add, so in most cases a lot of group descriptors will be in the same group block. But in the end of this function, update_backups will be called for every group descriptor and the same block will be copied and journalled again and again. It is really a waste. Fix things so we only update a particular bg descriptor block once and skip subsequent updates of the same block. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-25	ext4: fix double unlock buffer mess during fs-resize	Dmitry Monakhov
	bh_submit_read() is responsible for unlock bh on endio. In addition, we need to use bh_uptodate_or_lock() to avoid races. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-25	compat_ioctl: Avoid using undefined RS-485 IOCTLs	Jaeden Amero
	Wrap the use of TIOCSRS485 and TIOCGRS485 in #ifdef so that we avoid adding undefined IOCTLs to the ioctl pointer list as compatible ioctls. This change was motivated by a build error on a MIPS build. tree: git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-next head: ac57e7f38ea6fe7358cd0b7a2f2d21aef5ab70cd commit: 84c3b84860440a9e3a3666c14112f41311b8f623 [10/16] compat_ioctl: Add RS-485 IOCTLs to the list config: mips-fuloong2e_defconfig All related error/warning messages: fs/compat_ioctl.c:869:1: error: 'TIOCSRS485' undeclared here (not in a function) fs/compat_ioctl.c:870:1: error: 'TIOCGRS485' undeclared here (not in a function) vim +869 fs/compat_ioctl.c 863 COMPATIBLE_IOCTL(TIOCSPGRP) 864 COMPATIBLE_IOCTL(TIOCGPGRP) 865 COMPATIBLE_IOCTL(TIOCGPTN) 866 COMPATIBLE_IOCTL(TIOCSPTLCK) 867 COMPATIBLE_IOCTL(TIOCSERGETLSR) 868 COMPATIBLE_IOCTL(TIOCSIG) > 869 COMPATIBLE_IOCTL(TIOCSRS485) 870 COMPATIBLE_IOCTL(TIOCGRS485) 871 #ifdef TCGETS2 872 COMPATIBLE_IOCTL(TCGETS2) Reported-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jaeden Amero <jaeden.amero@ni.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-25	nfsd4: fix bind_conn_to_session xdr comment	J. Bruce Fields
	Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-25	NFS4: avoid underflow when converting error to pointer.	NeilBrown
	In nfs4_create_sec_client, 'flavor' can hold a negative error code (returned from nfs4_negotiate_security), even though it is an 'enum' and hence unsigned. The code is careful to cast it to an (int) before testing if it is negative, however it doesn't cast to an (int) before calling ERR_PTR. On a machine where "void*" is larger than "int", this results in the unsigned equivalent of -1 (e.g. 0xffffffff) being converted to a pointer. Subsequent code determines that this is not negative, and so dereferences it with predictable results. So: cast 'flavor' to a (signed) int before passing to ERR_PTR. cc: Benny Halevy <bhalevy@tonian.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-25	NFS: fix the return value check by using IS_ERR	Wei Yongjun
	In case of error, the function rpcauth_create() returns ERR_PTR() and never returns NULL pointer. The NULL test in the return value check should be replaced with IS_ERR(). dpatch engine is used to auto generated this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-24	cifs: remove support for deprecated "forcedirectio" and "strictcache" mount ↵	Jeff Layton
	options ...and make the default cache=strict as promised for 3.7. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	cifs: remove support for CIFS_IOC_CHECKUMOUNT ioctl	Jeff Layton
	...as promised for 3.7. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	CIFS: Fix possible memory leaks in SMB2 code	Pavel Shilovsky
	and add missed increments of failed async read and write requests. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	CIFS: Fix endian conversion of IndexNumber	Pavel Shilovsky
	by making it __le64 rather than __u64 in FILE_AL_INFO structure. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	Trivial endian fixes	Steve French
	Some trivial endian fixes for the SMB2 code. One warning remains which I asked Pavel to look at. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	MARK SMB2 support EXPERIMENTAL	Steve French
	Now that the merge of the remaining pieces needed for SMB2 (SMB2.1 dialect) are in, and most test cases pass, we can consider SMB2.1 EXPERIMENTAL rather than "BROKEN." Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	Update cifs version number	Steve French
	With SMB2 support, update from version 1.79 to 2.0 to make it easier for users to recognize which version has SMB2 support. Signed-off-by: Steve French <sfrench@us.ibm.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2012-09-24	cifs: add FL_CLOSE to fl_flags mask in cifs_read_flock	Jeff Layton
	FL_CLOSE is quite common when you close a file on which you hold a lock. The spurious "Unknown lock flags" message in cFYI is confusing in this case. Reported-by: Alexander Bokovoy <abokovoy@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-09-24	cifs: Mangle string used for unc in /proc/mounts	Sachin Prabhu
	The string for "unc=" in /proc/mounts needs to be escaped. The current behaviour can create problems in cases when mounting a share starting with a number. example: >mount -t cifs -o username=test,password=x vm140-31:/17000-test /mnt >mount -o remount,password=x /mnt mount error: could not resolve address for vm140-31x00-test: Unknown error The sub-string "\170" which is part of the unc for the mount above in /proc/mounts is interpreted as character'x' in the case above. Escaping the string fixes the problem. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-09-24	cifs: cleanups for cifs_mkdir_qinfo	Jeff Layton
	Rename inode pointers for better clarity. Move the d_instantiate call to the end of the function to prevent other tasks from seeing it before we've finished constructing it. Since we should have exclusive access to the inode at this point, remove the spinlock around i_nlink update. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-09-24	CIFS: Fix fast lease break after open problem	Pavel Shilovsky
	Now we walk though cifsFileInfo's list for every incoming lease break and look for an equivalent there. That approach misses lease breaks that come just after an open response - we don't have time to populate new cifsFileInfo structure to the list. Fix this by adding new list of pending opens and look for a lease there if we didn't find it in the list of cifsFileInfo structures. Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-09-24	CIFS: Add SMB2.1 lease break support	Pavel Shilovsky
	Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-09-24	CIFS: Fix cache coherency for read oplock case	Pavel Shilovsky
	When we have a file opened with read oplock and we are writing a data to this file, we need to store the data in the cache and then send to the server to ensure that the next read operation will get a coherent data. Also mark it as CONFIG_CIFS_SMB2 because it's more suitable for SMB2 code but can fix some CIFS problems too (when server delays sending an oplock break after a write request). We can drop this ifdefs dependence in future. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-09-24	CIFS: Request SMB2.1 leases	Pavel Shilovsky
	if server supports them and we need oplocks. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-09-24	CIFS: Check for mandatory brlocks on read/write	Pavel Shilovsky
	Currently CIFS code accept read/write ops on mandatory locked area when two processes use the same file descriptor - it's wrong. Fix this by serializing io and brlock operations on the inode. Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-09-24	CIFS: Turn lock mutex into rw semaphore	Pavel Shilovsky
	and allow several processes to walk through the lock list and read can_cache_brlcks value if they are not going to modify them. Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-09-24	CIFS: Use brlock cache for SMB2	Pavel Shilovsky
	Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-09-24	CIFS: Add brlock support for SMB2	Pavel Shilovsky
	Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru>
2012-09-24	CIFS: Handle SMB2 lock flags	Pavel Shilovsky
	Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
2012-09-24	CIFS: Move brlock code to ops struct	Pavel Shilovsky
	Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
2012-09-24	CIFS: Remove spinlock dependence in brlock processing	Pavel Shilovsky
	Now we need to lock/unlock a spinlock while processing brlock ops on the inode. Move brlocks of a fid to a separate list and attach all such lists to the inode. This let us not hold a spinlock. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
2012-09-24	CIFS: Add NTLMSSP sec type to defaults	Pavel Shilovsky
	to let us negotiate SMB2 without specifying sec type explicitly. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
2012-09-24	cifs: remove kmap lock and rsize limit	Jeff Layton
	Now that we aren't abusing the kmap address space, there's no need for this lock or to impose a limit on the rsize. Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-09-24	cifs: replace kvec array in readdata with a single kvec	Jeff Layton
	The array is no longer needed. We just need a single kvec to hold the header for signature checking. Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-09-24	cifs: convert async read code to use pages array without kmapping	Jeff Layton
	Replace the "marshal_iov" function with a "read_into_pages" function. That function will copy the read data off the socket and into the pages array, kmapping and reading pages one at a time. Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-09-24	cifs: turn the pages list in cifs_readdata into an array	Jeff Layton
	We'll need an array to put into a smb_rqst, so convert this into an array instead of (ab)using the lru list_head. Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-09-24	cifs: allocate kvec array for cifs_readdata as a separate allocation	Jeff Layton
	Eventually, we're going to want to append a list of pages to cifs_readdata instead of a list of kvecs. To prepare for that, turn the kvec array allocation into a separate one and just keep a pointer to it in the readdata. Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-09-24	cifs: add deprecation warning to sockopt=TCP_NODELAY option	Jeff Layton
	Now that we're using TCP_CORK on the socket, there's no value in continuting to support this option. Schedule it for removal in 3.9. Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-09-24	cifs: remove the kmap size limit from wsize	Jeff Layton
	Now that we're not kmapping so much at once, there's no need to cap the wsize at the amount that can be simultaneously kmapped. Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	cifs: convert async write code to pass in data via rq_pages array	Jeff Layton
	Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	cifs: change cifs_call_async to use smb_rqst structs	Jeff Layton
	For now, none of the callers populate rq_pages. That will be done for writes in a later patch. While we're at it, change the prototype of setup_async_request not to need a return pointer argument. Just return the pointer to the mid_q_entry or an ERR_PTR. Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	cifs: teach signing routines how to deal with arrays of pages in a smb_rqst	Jeff Layton
	Use the smb_send_rqst helper function to kmap each page in the array and update the hash for that chunk. Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	cifs: teach smb_send_rqst how to handle arrays of pages	Jeff Layton
	Add code that allows smb_send_rqst to send an array of pages after the initial kvec array has been sent. For now, we simply kmap the page array and send it using the standard smb_send_kvec function. Eventually, we may want to convert this code to use kernel_sendpage under the hood and avoid the kmap altogether for the page data. Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	cifs: cork the socket before a send and uncork it afterward	Jeff Layton
	We want to send SMBs as "atomically" as possible. Prior to sending any data on the socket, cork it to make sure that no non-full frames go out. Afterward, uncork it to make sure all of the data gets pushed out to the wire. Note that this more or less renders the socket=TCP_NODELAY mount option obsolete. When TCP_CORK and TCP_NODELAY are used on the same socket, TCP_NODELAY is essentially ignored. Acked-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24	cifs: convert send code to use smb_rqst structs	Jeff Layton
	Again, just a change in the arguments and some function renaming here. In later patches, we'll change this code to deal with page arrays. In this patch, we add a new smb_send_rqst wrapper and have smb_sendv call that. Then we move most of the existing smb_sendv code into a new function -- smb_send_kvec. This seems a little redundant, but later we'll flesh this out to deal with arrays of pages. Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>