path: root/fs
Age  Commit message  Author
2012-09-26  procfs: Convert /proc/pid/fdinfo/ handling routines to seq-file v2  Cyrill Gorcunov
This patch converts the /proc/pid/fdinfo/ handling routines to seq-file, which is needed to extend seq operations and plug in auxiliary fdinfo providers from subsystems like eventfd/eventpoll/fsnotify. Note that proc_fd_link no longer calls proc_fd_info(), simply because the guts of proc_fd_info() got merged into ->show() of that seq_file. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  procfs: Move /proc/pid/fd[info] handling code to fd.[ch]  Cyrill Gorcunov
This patch prepares the ground for further extension of /proc/pid/fd[info] handling code by moving fdinfo handling code into fs/proc/fd.c. I think such a move makes both fs/proc/base.c and fs/proc/fd.c easier to read. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> CC: Al Viro <viro@ZenIV.linux.org.uk> CC: Alexey Dobriyan <adobriyan@gmail.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: James Bottomley <jbottomley@parallels.com> CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> CC: Alexey Dobriyan <adobriyan@gmail.com> CC: Matthew Helsley <matt.helsley@gmail.com> CC: "J. Bruce Fields" <bfields@fieldses.org> CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  new helper: daemonize_descriptors()  Al Viro
descriptor-related parts of daemonize, done right. As a result we simplify the locking rules for ->files - we hold task_lock in *all* cases when we modify ->files. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  do_coredump(): make sure that descriptor table isn't shared  Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  new helper: iterate_fd()  Al Viro
iterates through the opened files in given descriptor table, calling a supplied function; we stop once non-zero is returned. Callback gets struct file *, descriptor number and const void * argument passed to iterator. It is called with files->file_lock held, so it is not allowed to block. tty_io, netprio_cgroup and selinux flush_unauthorized_files() converted to its use. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
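The callback contract described above (walk the open descriptors, stop at the first non-zero return) can be sketched in user space. This is only a toy model: `toy_file`, `toy_fdtable`, `toy_iterate_fd()` and the `start` parameter are hypothetical names for illustration, and no locking is modelled, whereas the kernel helper runs the callback with files->file_lock held.

```c
#include <stddef.h>

#define TOY_MAX_FDS 8

/* Toy stand-ins; the kernel's struct file and descriptor table are far richer. */
struct toy_file { int inode; };

struct toy_fdtable {
    struct toy_file *fd[TOY_MAX_FDS];   /* NULL means the slot is not open */
};

typedef int (*toy_fd_cb)(const void *arg, struct toy_file *file, unsigned fd);

/*
 * Walk every open slot starting at 'start' and call 'cb' on it.
 * Stop at the first non-zero return and hand that value back;
 * return 0 if the callback never returned non-zero.
 */
static int toy_iterate_fd(struct toy_fdtable *t, unsigned start,
                          toy_fd_cb cb, const void *arg)
{
    for (unsigned n = start; n < TOY_MAX_FDS; n++) {
        if (!t->fd[n])
            continue;
        int res = cb(arg, t->fd[n], n);
        if (res)
            return res;
    }
    return 0;
}

/* Example callback: report fd + 1 for the first file with a given inode. */
static int toy_match_inode(const void *arg, struct toy_file *file, unsigned fd)
{
    return file->inode == *(const int *)arg ? (int)fd + 1 : 0;
}
```

Returning fd + 1 rather than fd lets the caller tell "found at descriptor 0" apart from "not found", since 0 is reserved for the latter.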
2012-09-26  make expand_files() and alloc_fd() static  Al Viro
no callers outside of fs/file.c left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  take __{set,clear}_{open_fd,close_on_exec}() into fs/file.c  Al Viro
nobody uses those outside anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  new helper: replace_fd()  Al Viro
analog of dup2(), except that it takes struct file * as source. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  take purely descriptor-related stuff from fcntl.c to file.c  Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  take close-on-exec logics to fs/file.c, clean it up a bit  Al Viro
... and add cond_resched() there, while we are at it. We can get large latencies as is... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  ext4: remove unused function ext4_ext_check_cache  Lukas Czerner
Remove unused function ext4_ext_check_cache() and merge the code back into ext4_ext_in_cache(). Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-26  take descriptor-related part of close() to file.c  Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  take fget() and friends to fs/file.c  Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  expose a low-level variant of fd_install() for binder  Al Viro
Similar situation to that of __alloc_fd(); do not use unless you really have to. You should not touch any descriptor table other than your own; it's a sure sign of a really bad API design. As with __alloc_fd(), you *must* use a first-class reference to struct files_struct; something obtained by get_files_struct(some task) (let alone direct task->files) will not do. It must be either current->files, or obtained by get_files_struct(current) by the owner of that sucker and given to you. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  move put_unused_fd() and fd_install() to fs/file.c  Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  trim free_fdtable_rcu()  Al Viro
embedded case isn't hit anymore Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  don't bother with call_rcu() in put_files_struct()  Al Viro
At that point nobody can see us anyway; everything that looks at files_fdtable(files) is separated from the guts of put_files_struct(files) - either since files is current->files or because we fetched it under task_lock() and hadn't dropped that yet, or because we'd bumped files->count while holding task_lock()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  move files_struct-related bits from kernel/exit.c to fs/file.c  Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  new helper: __alloc_fd()  Al Viro
Essentially, alloc_fd() in a files_struct we own a reference to. Most of the time wanting to use it is a sign of lousy API design (such as android/binder). It's *not* a general-purpose interface; better that than open-coding its guts, but again, playing with other process' descriptor table is a sign of bad design. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  take rlimit check to callers of expand_files()  Al Viro
... except for one in android, where the check is different and already done in caller. No need to recalculate rlimit many times in alloc_fd() either. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  fanotify: sanitize failure exits in copy_event_to_user()  Al Viro
* do copy_to_user() before prepare_for_access_response(); that kills the need for remove_access_response().
* don't do fd_install() until we are past the last possible failure exit. Don't use sys_close() on the cleanup side - just put_unused_fd() and fput(). Less racy that way...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  pipe(2) - race-free error recovery  Al Viro
don't mess with sys_close() if copy_to_user() fails; just postpone fd_install() until we know it hasn't. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
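The pattern in this commit and the fanotify one before it (reserve a descriptor number, do everything that can fail, and only then publish the file) can be modelled with a toy table. All names here are hypothetical user-space stand-ins, and `copy_ok` simply simulates whether the copy to user space succeeded; this is a sketch of the ordering, not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NFDS 4

struct toy_table {
    bool  reserved[TOY_NFDS];   /* slot number has been handed out   */
    void *file[TOY_NFDS];       /* non-NULL once visible to lookups  */
};

/* Reserve the lowest free slot, or return -1 if the table is full. */
static int toy_get_unused_fd(struct toy_table *t)
{
    for (int fd = 0; fd < TOY_NFDS; fd++) {
        if (!t->reserved[fd]) {
            t->reserved[fd] = true;
            return fd;
        }
    }
    return -1;
}

/* Give back a reserved slot that was never published. */
static void toy_put_unused_fd(struct toy_table *t, int fd)
{
    t->reserved[fd] = false;
}

/* Publish the file; from here on other users of the table can see it. */
static void toy_fd_install(struct toy_table *t, int fd, void *file)
{
    t->file[fd] = file;
}

/*
 * Hand a new descriptor to the caller. 'copy_ok' models whether the
 * copy of the descriptor number to user space succeeded.
 */
static int toy_open_and_report(struct toy_table *t, void *file, bool copy_ok)
{
    int fd = toy_get_unused_fd(t);
    if (fd < 0)
        return -1;
    if (!copy_ok) {                 /* last possible failure point */
        toy_put_unused_fd(t, fd);   /* nothing was ever published  */
        return -1;
    }
    toy_fd_install(t, fd, file);    /* cannot fail */
    return fd;
}
```

Because the slot is never published on the failure path, nothing else sharing the table can look up, close or replace it before cleanup runs.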
2012-09-26  autofs4: don't open-code fd_install()  Al Viro
The only difference between autofs_dev_ioctl_fd_install() and fd_install() is __set_close_on_exec() done by the former. Just use get_unused_fd_flags(O_CLOEXEC) to allocate the descriptor and be done with that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
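A user-space analogue of the same idea is to request close-on-exec when the descriptor is allocated rather than marking it afterwards. The sketch below uses only the standard open(2) and fcntl(2) calls; the helper names `open_cloexec()` and `is_cloexec()` are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for O_CLOEXEC under strict -std modes */
#include <fcntl.h>
#include <unistd.h>

/*
 * Open 'path' read-only with close-on-exec set at allocation time,
 * instead of opening first and marking the descriptor later.
 * Returns the descriptor, or -1 on failure.
 */
static int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Return 1 if close-on-exec is set on fd, 0 if not, -1 on error. */
static int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Setting the flag at allocation time leaves no window in which the descriptor exists without it.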
2012-09-26  make get_unused_fd_flags() a function  Al Viro
... and get_unused_fd() a macro around it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26  Merge remote branch 'origin' into for-next  Al Viro
2012-09-26  ext4: use kmem_cache_zalloc instead of kmem_cache_alloc/memset  Wei Yongjun
Use kmem_cache_zalloc() instead of kmem_cache_alloc() and memset(). spatch with a semantic match was used to find this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
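As a rough user-space analogy (malloc()/calloc() stand in for the slab allocator calls, and `toy_extent` is a made-up structure), the transformation replaces an allocate-then-clear pair with a single zeroing allocation:

```c
#include <stdlib.h>
#include <string.h>

struct toy_extent { unsigned start; unsigned len; };

/* Two-step form: allocate, then clear by hand. */
static struct toy_extent *toy_alloc_then_clear(void)
{
    struct toy_extent *e = malloc(sizeof(*e));
    if (e)
        memset(e, 0, sizeof(*e));
    return e;
}

/* One-step form: the allocator hands back zeroed memory. */
static struct toy_extent *toy_zalloc(void)
{
    return calloc(1, sizeof(struct toy_extent));
}
```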
2012-09-26  xfs: Make inode32 a remountable option  Carlos Maiolino
As inode64 is the default option now, and was also made remountable previously, inode32 can also be remounted on-the-fly when it is needed. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-09-26  xfs: add inode64->inode32 transition into xfs_set_inode32()  Carlos Maiolino
To make inode32 a remountable option, xfs_set_inode32() should be able to make a transition from the inode64 option, disabling inode allocation on higher AGs. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-09-26  xfs: Fix mp->m_maxagi update during inode64 remount  Carlos Maiolino
With the changes made on xfs_set_inode64(), to make it behave as xfs_set_inode32() (now leaving to the caller the responsibility to update mp->m_maxagi), we use the return value of xfs_set_inode64() to update mp->m_maxagi during remount. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-09-26  xfs: reduce code duplication handling inode32/64 options  Carlos Maiolino
Add xfs_set_inode32() to be used to enable inode32 allocation mode. This will reduce the amount of duplicated code needed to mount/remount a filesystem with the inode32 option. This patch also changes xfs_set_inode64() to return the maximum AG number in which inodes can be allocated instead of setting mp->m_maxagi by itself, so that the behaviour is the same as xfs_set_inode32(). This simplifies code that calls these functions and needs to know the maximum AG that inodes can be allocated in. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-09-26  xfs: make inode64 as the default allocation mode  Carlos Maiolino
Since 64-bit inodes can be accessed while using inode32, and these can also be used on 32-bit kernels, there is no reason to still keep inode32 as the default mount option. If the filesystem cannot handle 64-bit inode numbers (i.e. CONFIG_LBDAF is not enabled and BITS_PER_LONG == 32), XFS_MOUNT_SMALL_INUMS will still be set by default, so inode64 is not an unconditional default value. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-09-26  xfs: Fix m_agirotor reset during AG selection  Carlos Maiolino
xfs_ialloc_next_ag() currently resets m_agirotor when it is equal to m_maxagi: if (++mp->m_agirotor == mp->m_maxagi) mp->m_agirotor = 0; But if for some reason mp->m_maxagi changes to a value lower than the current m_agirotor, this condition will never be true, causing m_agirotor to exceed the maximum allowed value (m_maxagi). This matters mainly during lookups of xfs_perag structs in the radix tree, since the agno value used for the lookup is based on m_agirotor. An out-of-range m_agirotor may cause a lookup failure, which in turn returns NULL. As an example, the value of m_maxagi is decreased during the inode64->inode32 remount process, which is where I found this problem. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
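The failure mode is easy to reproduce outside the kernel. In the sketch below, `next_ag_eq()` mirrors the equality test quoted in the message, while `next_ag_ge()` shows one plausible defensive variant using `>=`. Both function names are invented here, and the `>=` form is an assumption about how such a fix could look, not a quote from the patch.

```c
/* Equality form quoted above: wraps only when the rotor hits the limit exactly. */
static unsigned next_ag_eq(unsigned *rotor, unsigned maxagi)
{
    unsigned agno = *rotor;
    if (++*rotor == maxagi)
        *rotor = 0;
    return agno;
}

/* Defensive form: wraps whenever the rotor reaches or passes the limit. */
static unsigned next_ag_ge(unsigned *rotor, unsigned maxagi)
{
    unsigned agno = *rotor;
    if (++*rotor >= maxagi)
        *rotor = 0;
    return agno;
}
```

With the rotor at 6 and the limit lowered to 4, the equality form moves to 7 and stays out of range, while the `>=` form resets to 0 on the next call.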
2012-09-26  Make inode64 a remountable option  Carlos Maiolino
There is no reason why a user must unmount and remount an XFS filesystem to enable the 'inode64' option. So, this patch makes it a remountable option. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-09-26  cifs: change DOS/NT/POSIX mapping of ERRnoresource  Jeff Layton
ERRnoresource is an ERRSRV level (aka server-side) error and means "No resources currently available for request". Currently that maps to POSIX -ENOBUFS. No NT errors map to it currently. NT_STATUS_INSUFFICIENT_RESOURCES and NT_STATUS_INSUFF_SERVER_RESOURCES are also similar in meaning. Currently the client maps those to ERRnomem, which maps to -ENOMEM in POSIX. All of these mappings seem to be quite wrong to me and are confusing for users. All of the above errors indicate problems on the server, not the client. Reporting -ENOMEM or -ENOBUFS implies that the client is running out of resources. This patch changes those mappings. The NT_* errors are changed to map to the SRV level ERRnoresource. That error is in turn changed to return -EREMOTEIO which is the only POSIX error I could find that conveys that something went wrong on the server. While we're at it, change the SMB2 equivalent error to return the same. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Suresh Jayaraman <sjayaraman@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>
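The two-stage mapping after this change (NT status to a server-level DOS-style error, then to a POSIX errno) can be sketched as below. The enum names and values are hypothetical placeholders rather than the real protocol constants, and the -EIO fallback is an assumption made only so the sketch is complete.

```c
#include <errno.h>

/* Hypothetical stand-ins for the protocol-level codes named above. */
enum toy_nt_status {
    TOY_NT_INSUFFICIENT_RESOURCES,
    TOY_NT_INSUFF_SERVER_RESOURCES,
    TOY_NT_OTHER
};

enum toy_dos_error {
    TOY_ERRnoresource,
    TOY_ERRunknown
};

/* Stage 1: NT status to a server-level DOS-style error. */
static enum toy_dos_error toy_nt_to_dos(enum toy_nt_status s)
{
    switch (s) {
    case TOY_NT_INSUFFICIENT_RESOURCES:
    case TOY_NT_INSUFF_SERVER_RESOURCES:
        return TOY_ERRnoresource;   /* previously mapped to the no-memory error */
    default:
        return TOY_ERRunknown;
    }
}

/* Stage 2: DOS-style error to a negative POSIX errno. */
static int toy_dos_to_posix(enum toy_dos_error e)
{
    switch (e) {
    case TOY_ERRnoresource:
        return -EREMOTEIO;          /* server-side trouble, not local exhaustion */
    default:
        return -EIO;                /* assumed fallback for this sketch */
    }
}
```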
2012-09-26  ext4: reimplement uninit extent optimization for move_extent_per_page()  Dmitry Monakhov
An uninitialized extent may become initialized (by a parallel writeback task) at any moment after we drop i_data_sem, so we have to recheck the extent's state after we hold the page's lock and i_data_sem. If we are about to change a page's mapping we must hold the page's lock in order to serialize other users. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-26  ext4: clean up online defrag bugs in move_extent_per_page()  Dmitry Monakhov
Non-full list of bugs:
1) The uninitialized extent optimization does not hold the page's lock and simply replaces branches; after that, writeback code goes crazy because the block mapping changed under its feet: kernel BUG at fs/ext4/inode.c:1434! (288'th xfstress)
2) An uninitialized extent may become initialized right after we drop i_data_sem, so the extent state must be rechecked.
3) Locked pages go uptodate via the following sequence: ->readpage(page); lock_page(page); use_that_page(page). But after readpage() one may invalidate the page because it is uptodate and unlocked (the reclaimer does that). As a result: kernel bug at include/linux/buffer_head.c:133!
4) We call write_begin() with an already opened transaction, which results in the following deadlock: ->move_extent_per_page() ->ext4_journal_start()-> hold journal transaction ->write_begin() ->ext4_da_write_begin() ->ext4_nonda_switch() ->writeback_inodes_sb_if_idle() --> will wait for journal_stop()
5) try_to_release_page() may fail, and it does fail if one of the page's bh was pinned by the journal.
6) If we are about to change a page's mapping we MUST hold its lock during the entire remapping procedure; this is true for both pages (the original and the donor one).
Fixes:
- Avoid (1) and (2) simply by temporarily dropping the uninitialized extent handling optimization; this will be reimplemented later.
- Fix (3) by manually forcing the page to uptodate state w/o dropping its lock.
- Fix (4) by rearranging the existing locking: from: journal_start(); ->write_begin to: write_begin(); journal_extend()
- Fix (5) simply by checking the return value.
- Fix (6) by locking both pages (the original and the donor one) during the extent swap with the help of mext_page_double_lock().
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-26  NFSv4.1: decode_getdeviceinfo should check xdr_read_pages() return value  Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-26  ext4: online defrag is not supported for journaled files  Dmitry Monakhov
Proper block swap for inodes with full journaling enabled is a truly non-obvious task. In order to be on the safe side let's explicitly disable it for now. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-09-26  ext4: move_extent code cleanup  Dmitry Monakhov
- Remove useless checks, because it is too late to check that inode != NULL once it has already been referenced several times.
- The double lock routines look very ugly and their lock ordering relies on the order of i_ino, but other kernel code relies on the order of pointers. Let's make them simple and clean.
- Check that the inodes belong to the same SB as soon as possible.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
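Ordering two locks by object address, as the second point suggests, might look like the following user-space sketch. pthread mutexes stand in for the kernel's inode locks, and `toy_inode`, `toy_double_lock()` and `toy_double_unlock()` are hypothetical names; this is not the ext4 routine itself.

```c
#include <pthread.h>
#include <stdint.h>

struct toy_inode {
    pthread_mutex_t lock;
    unsigned long   ino;
};

/*
 * Lock two inodes, always taking the one at the lower address first,
 * so that two callers passing (a, b) and (b, a) cannot deadlock.
 * If both arguments name the same inode it is locked only once.
 */
static void toy_double_lock(struct toy_inode *a, struct toy_inode *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        struct toy_inode *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    if (a != b)
        pthread_mutex_lock(&b->lock);
}

/* Release both locks; the unlock order does not matter for correctness. */
static void toy_double_unlock(struct toy_inode *a, struct toy_inode *b)
{
    pthread_mutex_unlock(&a->lock);
    if (a != b)
        pthread_mutex_unlock(&b->lock);
}
```

The ordering key itself is arbitrary; what matters is that every path that takes both locks uses the same key, which is the consistency argument the message makes.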
2012-09-26  fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared  Fengguang Wu
blkdev_mmap() isn't used outside of fs/block_dev.c, mark it as static. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-09-26  blockdev: turn a rw semaphore into a percpu rw semaphore  Mikulas Patocka
This avoids cache line bouncing when many processes lock the semaphore for read.
New percpu lock implementation:
The lock consists of an array of percpu unsigned integers, a boolean variable and a mutex.
When we take the lock for read, we enter an rcu read section and check the "locked" variable. If it is false, we increase a percpu counter on the current cpu and exit the rcu section. If "locked" is true, we exit the rcu section, take the mutex and drop it (this waits until a writer has finished) and retry.
Unlocking for read just decreases the percpu variable. Note that we can unlock on a different cpu than the one where we locked; in this case the counter underflows. The sum of all percpu counters represents the number of processes that hold the lock for read.
When we need to lock for write, we take the mutex, set the "locked" variable to true and synchronize rcu. Since RCU has been synchronized, no processes can create new read locks. We wait until the sum of percpu counters is zero - when it is, there are no readers in the critical section.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
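The structure of this lock can be approximated in user space. Note the substitution: there is no RCU here, so the sketch has readers increment their counter first and then re-check "locked" using sequentially consistent C11 atomics, which is a different technique from the check-then-increment under RCU that the message describes. Slots are chosen by the caller in place of real CPUs, signed counters are used so that a cross-slot unlock shows up as a negative value, and all `toy_` names are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NSLOTS 4   /* stands in for the number of CPUs */

struct toy_percpu_rwsem {
    atomic_int      readers[TOY_NSLOTS]; /* per-slot counts; may go negative */
    atomic_bool     locked;              /* true while a writer is active    */
    pthread_mutex_t mtx;                 /* serializes writers               */
};

/* Sum of all slots: the number of readers currently inside. */
static int toy_reader_sum(struct toy_percpu_rwsem *s)
{
    int sum = 0;
    for (int i = 0; i < TOY_NSLOTS; i++)
        sum += atomic_load(&s->readers[i]);
    return sum;
}

static void toy_down_read(struct toy_percpu_rwsem *s, int slot)
{
    for (;;) {
        atomic_fetch_add(&s->readers[slot], 1);
        if (!atomic_load(&s->locked))
            return;                       /* fast path: no writer */
        /* A writer is active: back out, wait for it, then retry. */
        atomic_fetch_sub(&s->readers[slot], 1);
        pthread_mutex_lock(&s->mtx);
        pthread_mutex_unlock(&s->mtx);
    }
}

/* May be called with a different slot than the matching down_read. */
static void toy_up_read(struct toy_percpu_rwsem *s, int slot)
{
    atomic_fetch_sub(&s->readers[slot], 1);
}

static void toy_down_write(struct toy_percpu_rwsem *s)
{
    pthread_mutex_lock(&s->mtx);
    atomic_store(&s->locked, true);
    while (toy_reader_sum(s) != 0)        /* wait for readers to drain */
        sched_yield();
}

static void toy_up_write(struct toy_percpu_rwsem *s)
{
    atomic_store(&s->locked, false);
    pthread_mutex_unlock(&s->mtx);
}
```

Readers on the fast path touch only their own slot and read one shared flag, which is what avoids the cache line bouncing; the cost is shifted to writers, who must scan every slot.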
2012-09-26  Fix a crash when block device is read and block size is changed at the same time  Mikulas Patocka
The kernel may crash when block size is changed and I/O is issued simultaneously. Because some subsystems (udev or lvm) may read any block device anytime, the bug actually puts any code that changes a block device's block size in jeopardy.
The crash can be reproduced if you place "msleep(1000)" in blkdev_get_blocks just before "bh->b_size = max_blocks << inode->i_blkbits;". Then run "dd if=/dev/ram0 of=/dev/null bs=4k count=1 iflag=direct". While it is waiting in msleep, run "blockdev --setbsz 2048 /dev/ram0". You get a BUG.
The direct and non-direct I/O code is written with the assumption that block size does not change. It doesn't seem practical to fix these crashes one by one: there may be many crash possibilities when block size changes at a certain place, and it is impossible to find them all and verify the code.
This patch introduces a new rw-lock bd_block_size_semaphore. The lock is taken for read during I/O. It is taken for write when changing block size. Consequently, block size can't be changed while I/O is being submitted.
For asynchronous I/O, the patch only prevents block size change while the I/O is being submitted. The block size can change when the I/O is in progress or when the I/O is being finished. This is acceptable because there are no accesses to block size when asynchronous I/O is being finished.
The patch prevents block size changing while the device is mapped with mmap.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-09-26  ext4: don't call update_backups() multiple times for the same bg  Tao Ma
When performing an online resize, we add a bunch of groups at one time in ext4_flex_group_add, so in most cases a lot of group descriptors will be in the same group block. But at the end of this function, update_backups will be called for every group descriptor and the same block will be copied and journalled again and again. It is really a waste. Fix things so we only update a particular bg descriptor block once and skip subsequent updates of the same block. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
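The "update each descriptor block only once" idea reduces to remembering the last block handled while walking consecutive groups. The function below is a hypothetical counting model, not the ext4 code: `per_block` stands for the number of group descriptors that fit in one descriptor block, and the count returned is the number of backup updates that would be issued.

```c
/*
 * Count how many descriptor-block backups are needed when groups
 * first .. first+count-1 are added, given 'per_block' descriptors
 * in each descriptor block. Consecutive groups that share a block
 * trigger only one update.
 */
static unsigned toy_backup_updates(unsigned first, unsigned count,
                                   unsigned per_block)
{
    unsigned updates = 0;
    unsigned last_block = 0;
    int have_last = 0;

    for (unsigned g = first; g < first + count; g++) {
        unsigned block = g / per_block;
        if (have_last && block == last_block)
            continue;               /* this block was already backed up */
        last_block = block;
        have_last = 1;
        updates++;                  /* a real implementation would copy here */
    }
    return updates;
}
```

For example, adding groups 30 through 39 with 32 descriptors per block touches two descriptor blocks, so two updates are counted instead of ten.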
2012-09-25  ext4: fix double unlock buffer mess during fs-resize  Dmitry Monakhov
bh_submit_read() is responsible for unlocking bh on endio. In addition, we need to use bh_uptodate_or_lock() to avoid races. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-25  compat_ioctl: Avoid using undefined RS-485 IOCTLs  Jaeden Amero
Wrap the use of TIOCSRS485 and TIOCGRS485 in #ifdef so that we avoid adding undefined IOCTLs to the ioctl pointer list as compatible ioctls. This change was motivated by a build error on a MIPS build.
tree: git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-next
head: ac57e7f38ea6fe7358cd0b7a2f2d21aef5ab70cd
commit: 84c3b84860440a9e3a3666c14112f41311b8f623 [10/16] compat_ioctl: Add RS-485 IOCTLs to the list
config: mips-fuloong2e_defconfig
All related error/warning messages:
fs/compat_ioctl.c:869:1: error: 'TIOCSRS485' undeclared here (not in a function)
fs/compat_ioctl.c:870:1: error: 'TIOCGRS485' undeclared here (not in a function)
vim +869 fs/compat_ioctl.c
863 COMPATIBLE_IOCTL(TIOCSPGRP)
864 COMPATIBLE_IOCTL(TIOCGPGRP)
865 COMPATIBLE_IOCTL(TIOCGPTN)
866 COMPATIBLE_IOCTL(TIOCSPTLCK)
867 COMPATIBLE_IOCTL(TIOCSERGETLSR)
868 COMPATIBLE_IOCTL(TIOCSIG)
> 869 COMPATIBLE_IOCTL(TIOCSRS485)
870 COMPATIBLE_IOCTL(TIOCGRS485)
871 #ifdef TCGETS2
872 COMPATIBLE_IOCTL(TCGETS2)
Reported-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jaeden Amero <jaeden.amero@ni.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
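The guard technique is plain preprocessor work: each table entry is emitted only if the platform's headers define the constant. A user-space sketch follows, in which the array and helper names are invented and the trailing sentinel is an addition of this sketch so the array is never empty.

```c
#include <stddef.h>
#include <sys/ioctl.h>

/* Build the list from whatever this platform's headers define. */
static const unsigned long toy_compat_ioctls[] = {
#ifdef TIOCGPGRP
    TIOCGPGRP,
#endif
#ifdef TIOCSRS485
    TIOCSRS485,
#endif
#ifdef TIOCGRS485
    TIOCGRS485,
#endif
    0   /* sentinel so the array is never empty */
};

/* Number of real entries, excluding the trailing sentinel. */
static size_t toy_compat_ioctl_count(void)
{
    return sizeof(toy_compat_ioctls) / sizeof(toy_compat_ioctls[0]) - 1;
}
```

On an architecture whose headers lack the RS-485 constants the table simply comes out shorter instead of breaking the build.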
2012-09-25  nfsd4: fix bind_conn_to_session xdr comment  J. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-25  NFS4: avoid underflow when converting error to pointer.  NeilBrown
In nfs4_create_sec_client, 'flavor' can hold a negative error code (returned from nfs4_negotiate_security), even though it is an 'enum' and hence unsigned. The code is careful to cast it to an (int) before testing if it is negative, however it doesn't cast to an (int) before calling ERR_PTR. On a machine where "void*" is larger than "int", this results in the unsigned equivalent of -1 (e.g. 0xffffffff) being converted to a pointer. Subsequent code determines that this is not negative, and so dereferences it with predictable results. So: cast 'flavor' to a (signed) int before passing to ERR_PTR. cc: Benny Halevy <bhalevy@tonian.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
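The effect of the missing cast can be shown with user-space imitations of the kernel helpers. `toy_err_ptr()` and `toy_is_err()` are simplified stand-ins written for this sketch, an `unsigned int` parameter models the unsigned enum, and the difference between the two wrappers only appears where pointers are wider than int.

```c
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* User-space imitations of the kernel helpers, for illustration only. */
static void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True if the pointer value falls in the top TOY_MAX_ERRNO addresses. */
static int toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Wrong: the unsigned value is zero-extended to pointer width. */
static void *toy_wrap_unsigned(unsigned int flavor)
{
    return toy_err_ptr(flavor);
}

/* Right: go through a signed int first so the sign is extended. */
static void *toy_wrap_signed(unsigned int flavor)
{
    return toy_err_ptr((int)flavor);
}
```

On a typical 64-bit build, a flavor holding -5 becomes 0x00000000fffffffb through the unsigned path, which is not recognized as an error pointer, and 0xfffffffffffffffb through the signed path, which is.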
2012-09-25  NFS: fix the return value check by using IS_ERR  Wei Yongjun
In case of error, the function rpcauth_create() returns ERR_PTR() and never returns a NULL pointer. The NULL test in the return value check should be replaced with IS_ERR(). The dpatch engine was used to auto-generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-24  cifs: remove support for deprecated "forcedirectio" and "strictcache" mount options  Jeff Layton
...and make the default cache=strict as promised for 3.7. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24  cifs: remove support for CIFS_IOC_CHECKUMOUNT ioctl  Jeff Layton
...as promised for 3.7. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>