summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2014-12-12btrfs: sink blocksize parameter to tree_block_processedDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-12-12btrfs: sink blocksize parameter to btrfs_find_create_tree_blockDavid Sterba
Finally it's clear that the requested blocksize is always equal to nodesize, with one exception, the superblock. Superblock has fixed size regardless of the metadata block size, but uses the same helpers to initialize sys array/chunk tree and to work with the chunk items. So it pretends to be an extent_buffer for a moment, btrfs_read_sys_array is full of special cases, we're adding one more. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-12-12btrfs: sink blocksize parameter to btrfs_init_new_bufferDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-12-12btrfs: sink blocksize parameter to reada_tree_block_flaggedDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.cz>
2014-12-12btrfs: remove blocksize from reada_extentDavid Sterba
Replace with global nodesize instead. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-12-12btrfs: sink blocksize parameter to readahead_tree_blockDavid Sterba
All callers pass nodesize. Signed-off-by: David Sterba <dsterba@suse.cz>
2014-12-12fuse: use file_inode() in fuse_file_fallocate()Miklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-12fuse: introduce fuse_simple_request() helperMiklos Szeredi
The following pattern is repeated many times: req = fuse_get_req_nopages(fc); /* Initialize req->(in|out).args */ fuse_request_send(fc, req); err = req->out.h.error; fuse_put_request(req); Create a new replacement helper: /* Initialize args */ err = fuse_simple_request(fc, &args); In addition to reducing the code size, this will ease moving from the complex arg-based to a simpler page-based I/O on the fuse device. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-12fuse: reduce max out argsMiklos Szeredi
The third out-arg is never actually used. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-12fuse: hold inode instead of path after releaseMiklos Szeredi
path_put() in release could trigger a DESTROY request in fuseblk. The possible deadlock was worked around by doing the path_put() with schedule_work(). This complexity isn't needed if we just hold the inode instead of the path. Since we now flush all requests before destroying the super block we can be sure that all held inodes will be dropped. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-12fuse: flush requests on umountMiklos Szeredi
Use fuse_abort_conn() instead of fuse_conn_kill() in fuse_put_super(). This flushes and aborts requests still on any queues. But since we've already reset fc->connected, those requests would not be useful anyway and would be flushed when the fuse device is closed. Next patches will rely on requests being flushed before the superblock is destroyed. Use fuse_abort_conn() in cuse_process_init_reply() too, since it makes no difference there, and we can get rid of fuse_conn_kill(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-12fuse: don't wake up reserved req in fuse_conn_kill()Miklos Szeredi
Waking up reserved_req_waitq from fuse_conn_kill() doesn't make sense since we aren't chaging ff->reserved_req here, which is what this waitqueue signals. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-12-11Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds
Pull MIPS updates from Ralf Baechle: "This is an unusually large pull request for MIPS - in parts because lots of patches missed the 3.18 deadline but primarily because some folks opened the flood gates. - Retire the MIPS-specific phys_t with the generic phys_addr_t. - Improvments for the backtrace code used by oprofile. - Better backtraces on SMP systems. - Cleanups for the Octeon platform code. - Cleanups and fixes for the Loongson platform code. - Cleanups and fixes to the firmware library. - Switch ATH79 platform to use the firmware library. - Grand overhault to the SEAD3 and Malta interrupt code. - Move the GIC interrupt code to drivers/irqchip - Lots of GIC cleanups and updates to the GIC code to use modern IRQ infrastructures and features of the kernel. - OF documentation updates for the GIC bindings - Move GIC clocksource driver to drivers/clocksource - Merge GIC clocksource driver with clockevent driver. - Further updates to bring the GIC clocksource driver up to date. - R3000 TLB code cleanups - Improvments to the Loongson 3 platform code. - Convert pr_warning to pr_warn. - Merge a bunch of small lantiq and ralink fixes that have been staged/lingering inside the openwrt tree for a while. - Update archhelp for IP22/IP32 - Fix a number of issues for Loongson 1B. - New clocksource and clockevent driver for Loongson 1B. - Further work on clk handling for Loongson 1B. - Platform work for Broadcom BMIPS. - Error handling cleanups for TurboChannel. - Fixes and optimization to the microMIPS support. - Option to disable the FTLB. - Dump more relevant information on machine check exception - Change binfmt to allow arch to examine PT_*PROC headers - Support for new style FPU register model in O32 - VDSO randomization. - BCM47xx cleanups - BCM47xx reimplement the way the kernel accesses NVRAM information. - Random cleanups - Add support for ATH25 platforms - Remove pointless locking code in some PCI platforms. - Some improvments to EVA support - Minor Alchemy cleanup" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (185 commits) MIPS: Add MFHC0 and MTHC0 instructions to uasm. MIPS: Cosmetic cleanups of page table headers. MIPS: Add CP0 macros for extended EntryLo registers MIPS: Remove now unused definition of phys_t. MIPS: Replace use of phys_t with phys_addr_t. MIPS: Replace MIPS-specific 64BIT_PHYS_ADDR with generic PHYS_ADDR_T_64BIT PCMCIA: Alchemy Don't select 64BIT_PHYS_ADDR in Kconfig. MIPS: lib: memset: Clean up some MIPS{EL,EB} ifdefery MIPS: iomap: Use __mem_{read,write}{b,w,l} for MMIO MIPS: <asm/types.h> fix indentation. MAINTAINERS: Add entry for BMIPS multiplatform kernel MIPS: Enable VDSO randomization MIPS: Remove a temporary hack for debugging cache flushes in SMTC configuration MIPS: Remove declaration of obsolete arch_init_clk_ops() MIPS: atomic.h: Reformat to fit in 79 columns MIPS: Apply `.insn' to fixup labels throughout MIPS: Fix microMIPS LL/SC immediate offsets MIPS: Kconfig: Only allow 32-bit microMIPS builds MIPS: signal.c: Fix an invalid cast in ISA mode bit handling MIPS: mm: Only build one microassembler that is suitable ...
2014-12-11userns: Add a knob to disable setgroups on a per user namespace basisEric W. Biederman
- Expose the knob to user space through a proc file /proc/<pid>/setgroups A value of "deny" means the setgroups system call is disabled in the current processes user namespace and can not be enabled in the future in this user namespace. A value of "allow" means the segtoups system call is enabled. - Descendant user namespaces inherit the value of setgroups from their parents. - A proc file is used (instead of a sysctl) as sysctls currently do not allow checking the permissions at open time. - Writing to the proc file is restricted to before the gid_map for the user namespace is set. This ensures that disabling setgroups at a user namespace level will never remove the ability to call setgroups from a process that already has that ability. A process may opt in to the setgroups disable for itself by creating, entering and configuring a user namespace or by calling setns on an existing user namespace with setgroups disabled. Processes without privileges already can not call setgroups so this is a noop. Prodcess with privilege become processes without privilege when entering a user namespace and as with any other path to dropping privilege they would not have the ability to call setgroups. So this remains within the bounds of what is possible without a knob to disable setgroups permanently in a user namespace. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-12-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) New offloading infrastructure and example 'rocker' driver for offloading of switching and routing to hardware. This work was done by a large group of dedicated individuals, not limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend, Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu 2) Start making the networking operate on IOV iterators instead of modifying iov objects in-situ during transfers. Thanks to Al Viro and Herbert Xu. 3) A set of new netlink interfaces for the TIPC stack, from Richard Alpe. 4) Remove unnecessary looping during ipv6 routing lookups, from Martin KaFai Lau. 5) Add PAUSE frame generation support to gianfar driver, from Matei Pavaluca. 6) Allow for larger reordering levels in TCP, which are easily achievable in the real world right now, from Eric Dumazet. 7) Add a variable of napi_schedule that doesn't need to disable cpu interrupts, from Eric Dumazet. 8) Use a doubly linked list to optimize neigh_parms_release(), from Nicolas Dichtel. 9) Various enhancements to the kernel BPF verifier, and allow eBPF programs to actually be attached to sockets. From Alexei Starovoitov. 10) Support TSO/LSO in sunvnet driver, from David L Stevens. 11) Allow controlling ECN usage via routing metrics, from Florian Westphal. 12) Remote checksum offload, from Tom Herbert. 13) Add split-header receive, BQL, and xmit_more support to amd-xgbe driver, from Thomas Lendacky. 14) Add MPLS support to openvswitch, from Simon Horman. 15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen Klassert. 16) Do gro flushes on a per-device basis using a timer, from Eric Dumazet. This tries to resolve the conflicting goals between the desired handling of bulk vs. RPC-like traffic. 17) Allow userspace to ask for the CPU upon what a packet was received/steered, via SO_INCOMING_CPU. From Eric Dumazet. 18) Limit GSO packets to half the current congestion window, from Eric Dumazet. 19) Add a generic helper so that all drivers set their RSS keys in a consistent way, from Eric Dumazet. 20) Add xmit_more support to enic driver, from Govindarajulu Varadarajan. 21) Add VLAN packet scheduler action, from Jiri Pirko. 22) Support configurable RSS hash functions via ethtool, from Eyal Perry. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits) Fix race condition between vxlan_sock_add and vxlan_sock_release net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header net/mlx4: Add support for A0 steering net/mlx4: Refactor QUERY_PORT net/mlx4_core: Add explicit error message when rule doesn't meet configuration net/mlx4: Add A0 hybrid steering net/mlx4: Add mlx4_bitmap zone allocator net/mlx4: Add a check if there are too many reserved QPs net/mlx4: Change QP allocation scheme net/mlx4_core: Use tasklet for user-space CQ completion events net/mlx4_core: Mask out host side virtualization features for guests net/mlx4_en: Set csum level for encapsulated packets be2net: Export tunnel offloads only when a VxLAN tunnel is created gianfar: Fix dma check map error when DMA_API_DEBUG is enabled cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call net: fec: only enable mdio interrupt before phy device link up net: fec: clear all interrupt events to support i.MX6SX net: fec: reset fep link status in suspend function net: sock: fix access via invalid file descriptor net: introduce helper macro for_each_cmsghdr ...
2014-12-11pstore-ram: Allow optional mapping with pgprot_noncachedTony Lindgren
On some ARMs the memory can be mapped pgprot_noncached() and still be working for atomic operations. As pointed out by Colin Cross <ccross@android.com>, in some cases you do want to use pgprot_noncached() if the SoC supports it to see a debug printk just before a write hanging the system. On ARMs, the atomic operations on strongly ordered memory are implementation defined. So let's provide an optional kernel parameter for configuring pgprot_noncached(), and use pgprot_writecombine() by default. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rob Herring <robherring2@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Olof Johansson <olof@lixom.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: stable@vger.kernel.org Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-12-11pstore-ram: Fix hangs by using write-combine mappingsRob Herring
Currently trying to use pstore on at least ARMs can hang as we're mapping the peristent RAM with pgprot_noncached(). On ARMs, pgprot_noncached() will actually make the memory strongly ordered, and as the atomic operations pstore uses are implementation defined for strongly ordered memory, they may not work. So basically atomic operations have undefined behavior on ARM for device or strongly ordered memory types. Let's fix the issue by using write-combine variants for mappings. This corresponds to normal, non-cacheable memory on ARM. For many other architectures, this change does not change the mapping type as by default we have: #define pgprot_writecombine pgprot_noncached The reason why pgprot_noncached() was originaly used for pstore is because Colin Cross <ccross@android.com> had observed lost debug prints right before a device hanging write operation on some systems. For the platforms supporting pgprot_noncached(), we can add a an optional configuration option to support that. But let's get pstore working first before adding new features. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Anton Vorontsov <cbouatmailru@gmail.com> Cc: Colin Cross <ccross@android.com> Cc: Olof Johansson <olof@lixom.net> Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rob Herring <rob.herring@calxeda.com> [tony@atomide.com: updated description] Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-12-11coda_venus_readdir(): use file_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-11fs/namei.c: fold link_path_walk() call into path_init()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-11path_init(): don't bother with LOOKUP_PARENT in argumentAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-11fs/namei.c: new helper (path_cleanup())Al Viro
All callers of path_init() proceed to do the identical cleanup when they are done with nameidata. Don't open-code it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-11path_init(): store the "base" pointer to file in nameidata itselfAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-10Merge branch 'akpm' (patchbomb from Andrew)Linus Torvalds
Merge first patchbomb from Andrew Morton: - a few minor cifs fixes - dma-debug upadtes - ocfs2 - slab - about half of MM - procfs - kernel/exit.c - panic.c tweaks - printk upates - lib/ updates - checkpatch updates - fs/binfmt updates - the drivers/rtc tree - nilfs - kmod fixes - more kernel/exit.c - various other misc tweaks and fixes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) exit: pidns: fix/update the comments in zap_pid_ns_processes() exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting exit: exit_notify: re-use "dead" list to autoreap current exit: reparent: call forget_original_parent() under tasklist_lock exit: reparent: avoid find_new_reaper() if no children exit: reparent: introduce find_alive_thread() exit: reparent: introduce find_child_reaper() exit: reparent: document the ->has_child_subreaper checks exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting exit: proc: don't try to flush /proc/tgid/task/tgid exit: release_task: fix the comment about group leader accounting exit: wait: drop tasklist_lock before psig->c* accounting exit: wait: don't use zombie->real_parent exit: wait: cleanup the ptrace_reparented() checks usermodehelper: kill the kmod_thread_locker logic usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper() fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races ...
2014-12-10make default ->i_fop have ->open() fail with ENXIOAl Viro
As it is, default ->i_fop has NULL ->open() (along with all other methods). The only case where it matters is reopening (via procfs symlink) a file that didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned to something sane (default would fail on read/write/ioctl/etc.). Unfortunately, such case exists - alloc_file() users, especially anon_get_file() ones. There we have tons of opened files of very different kinds sharing the same inode. As the result, attempt to reopen those via procfs succeeds and you get a descriptor you can't do anything with. Moreover, in case of sockets we set ->i_fop that will only be used on such reopen attempts - and put a failing ->open() into it to make sure those do not succeed. It would be simpler to put such ->open() into default ->i_fop and leave it unchanged both for anon inode (as we do anyway) and for socket ones. Result: * everything going through do_dentry_open() works as it used to * sock_no_open() kludge is gone * attempts to reopen anon-inode files fail as they really ought to * ditto for aio_private_file() * ditto for perfmon - this one actually tried to imitate sock_no_open() trick, but failed to set ->i_fop, so in the current tree reopens succeed and yield completely useless descriptor. Intent clearly had been to fail with -ENXIO on such reopens; now it actually does. * everything else that used alloc_file() keeps working - it has ->i_fop set for its inodes anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-10make nameidata completely opaque outside of fs/namei.cAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-10Merge branch 'nsfs' into for-nextAl Viro
2014-12-10kill proc_ns completelyAl Viro
procfs inodes need only the ns_ops part; nsfs inodes don't need it at all Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-10take the targets of /proc/*/ns/* symlinks to separate fsAl Viro
New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now. It's not mountable (not even registered, so it's not in /proc/filesystems, etc.). Files on it *are* bindable - we explicitly permit that in do_loopback(). This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well. get_proc_ns() is a macro now (it's simply returning ->i_private; would have been an inline, if not for header ordering headache). proc_ns_inode() is an ex-parrot. The interface used in procfs is ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops). Dentries and inodes are never hashed; a non-counting reference to dentry is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path() if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details of that mechanism. As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt; it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets from ns_get_path(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-10exit: proc: don't try to flush /proc/tgid/task/tgidOleg Nesterov
proc_flush_task_mnt() always tries to flush task/pid, but this is pointless if we reap the leader. d_invalidate() is recursive, and if nothing else the next d_hash_and_lookup(tgid) should fail anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmpRasmus Villemoes
Relying on the sign (after casting to int) of the difference of two quantities for comparison is usually wrong. For example, should a-b turn out to be 2^31, the return value of cmp(a,b) is -2^31; but that would also be the return value from cmp(b, a). So a compares less than b and b compares less than a. One can also easily find three values a,b,c such that a compares less than b, b compares less than c, but a does not compare less than c. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() racesRyusuke Konishi
Same story as in commit 41080b5a2401 ("nfsd race fixes: ext2") (similar ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead of insert_inode_locked() and a bug of a check for dead inodes needs to be fixed. If nilfs_iget() is called from nfsd after nilfs_new_inode() calls insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode. If nilfs_iget() is called before nilfs_new_inode() calls insert_inode_locked4(), it will create an in-core inode and read its data from the on-disk inode. But, nilfs_iget() will find i_nlink equals zero and fail at nilfs_read_inode_common(), which will lead it to call iget_failed() and cleanly fail. However, this sanity check doesn't work as expected for reused on-disk inodes because they leave a non-zero value in i_mode field and it hinders the test of i_nlink. This patch also fixes the issue by removing the test on i_mode that nilfs2 doesn't need. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10nilfs2: deletion of an unnecessary check before the function call "iput"Markus Elfring
The iput() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10nilfs2: avoid duplicate segment construction for fsync()Andreas Rohner
This patch removes filemap_write_and_wait_range() from nilfs_sync_file(), because it triggers a data segment construction by calling nilfs_writepages() with WB_SYNC_ALL. A data segment construction does not remove the inode from the i_dirty list and it does not clear the NILFS_I_DIRTY flag. Therefore nilfs_inode_dirty() still returns true, which leads to an unnecessary duplicate segment construction in nilfs_sync_file(). A call to filemap_write_and_wait_range() is not needed, because NILFS2 does not rely on the generic writeback mechanisms. Instead it implements its own mechanism to collect all dirty pages and write them into segments. It is more efficient to initiate the segment construction directly in nilfs_sync_file() without the detour over filemap_write_and_wait_range(). Additionally the lock of i_mutex is not needed, because all code blocks that are protected by i_mutex are also protected by a NILFS transaction: Function i_mutex nilfs_transaction ------------------------------------------------------ nilfs_ioctl_setflags: yes yes nilfs_fiemap: yes no nilfs_write_begin: yes yes nilfs_write_end: yes yes nilfs_lookup: yes no nilfs_create: yes yes nilfs_link: yes yes nilfs_mknod: yes yes nilfs_symlink: yes yes nilfs_mkdir: yes yes nilfs_unlink: yes yes nilfs_rmdir: yes yes nilfs_rename: yes yes nilfs_setattr: yes yes For nilfs_lookup() i_mutex is held for the parent directory, to protect it from modification. The segment construction does not modify directory inodes, so no lock is needed. nilfs_fiemap() reads the block layout on the disk, by using nilfs_bmap_lookup_contig(). This is already protected by bmap->b_sem. Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10ncpfs: return proper error from NCP_IOC_SETROOT ioctlJan Kara
If some error happens in NCP_IOC_SETROOT ioctl, the appropriate error return value is then (in most cases) just overwritten before we return. This can result in reporting success to userspace although error happened. This bug was introduced by commit 2e54eb96e2c8 ("BKL: Remove BKL from ncpfs"). Propagate the errors correctly. Coverity id: 1226925. Fixes: 2e54eb96e2c80 ("BKL: Remove BKL from ncpfs") Signed-off-by: Jan Kara <jack@suse.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10fs/binfmt_elf.c: fix internal inconsistency relating to vma dump sizeJungseung Lee
vma_dump_size() has been used several times on actual dumper and it is supposed to return the same value for the same vma. But vma_dump_size() could return different values for same vma. The known problem case is concurrent shared memory removal. If a vma is used for a shared memory and that shared memory is removed between writing program header and dumping vma memory, this will result in a dump file which is internally consistent. To fix the problem, we set baseline to get dump size and store the size into vma_filesz and always use the same vma dump size which is stored in vma_filsz. The consistnecy with reality is not actually guranteed, but it's tolerable since that is fully consistent with base line. Signed-off-by: Jungseung Lee <js07.lee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10fs/binfmt_misc.c: use GFP_KERNEL instead of GFP_USERAndrew Morton
GFP_USER means "honour cpuset nodes-allowed beancounting". These are regular old kernel objects and there seems no reason to give them this treatment. Acked-by: Mike Frysinger <vapier@gentoo.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10binfmt_misc: clean up code style a bitMike Frysinger
Clean up various coding style issues that checkpatch complains about. No functional changes here. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10binfmt_misc: add comments & debug logsMike Frysinger
When trying to develop a custom format handler, the errors returned all effectively get bucketed as EINVAL with no kernel messages. The other errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a bad handler is rejected, the developer has to walk the dense code and try to guess where it went wrong. Needing to dive into kernel code is itself a fairly high barrier for a lot of people. To improve this situation, let's deploy extensive pr_debug markers at logical parse points, and add comments to the dense parsing logic. It let's you see exactly where the parsing aborts, the string the kernel received (useful when dealing with shell code), how it translated the buffers to binary data, and how it will apply the mask at runtime. Some example output: $ echo ':qemu-foo:M::\x7fELF\xAD\xAD\x01\x00:\xff\xff\xff\xff\xff\x00\xff\x00:/usr/bin/qemu-foo:POC' > register $ dmesg binfmt_misc: register: received 92 bytes binfmt_misc: register: delim: 0x3a {:} binfmt_misc: register: name: {qemu-foo} binfmt_misc: register: type: M (magic) binfmt_misc: register: offset: 0x0 binfmt_misc: register: magic[raw]: 5c 78 37 66 45 4c 46 5c 78 41 44 5c 78 41 44 5c \x7fELF\xAD\xAD\ binfmt_misc: register: magic[raw]: 78 30 31 5c 78 30 30 00 x01\x00. binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 66 66 5c 78 66 66 5c 78 66 66 \xff\xff\xff\xff binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 30 30 5c 78 66 66 5c 78 30 30 \xff\x00\xff\x00 binfmt_misc: register: mask[raw]: 00 . binfmt_misc: register: magic/mask length: 8 binfmt_misc: register: magic[decoded]: 7f 45 4c 46 ad ad 01 00 .ELF.... binfmt_misc: register: mask[decoded]: ff ff ff ff ff 00 ff 00 ........ binfmt_misc: register: magic[masked]: 7f 45 4c 46 ad 00 01 00 .ELF.... binfmt_misc: register: interpreter: {/usr/bin/qemu-foo} binfmt_misc: register: flag: P (preserve argv0) binfmt_misc: register: flag: O (open binary) binfmt_misc: register: flag: C (preserve creds) The [raw] lines show us exactly what was received from userspace. The lines after that show us how the kernel has decoded things. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10fs/file.c: replace get_unused_fd() with get_unused_fd_flags(0)Yann Droneaud
This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. In a further patch, get_unused_fd() will be removed so that new code start using get_unused_fd_flags(), with the hope O_CLOEXEC could be used, either by default or choosen by userspace. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10binfmt_misc: replace get_unused_fd() with get_unused_fd_flags(0)Yann Droneaud
This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. In a further patch, get_unused_fd() will be removed so that new code start using get_unused_fd_flags(), with the hope O_CLOEXEC could be used, either by default or choosen by userspace. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10proc: task_state: ptrace_parent() doesn't need pid_alive() checkOleg Nesterov
p->ptrace != 0 means that release_task(p) was not called, so pid_alive() buys nothing and we can remove this check. Other callers already use it directly without additional checks. Note: with or without this patch ptrace_parent() can return the pointer to the freed task, this will be explained/fixed later. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>, Cc: Sterling Alexander <stalexan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roland McGrath <roland@hack.frob.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10proc: task_state: move the main seq_printf() outside of rcu_read_lock()Oleg Nesterov
task_state() does seq_printf() under rcu_read_lock(), but this is only needed for task_tgid_nr_ns() and task_numa_group_id(). We can calculate tgid/ngid and drop rcu lock. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>, Cc: Sterling Alexander <stalexan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roland McGrath <roland@hack.frob.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10proc: task_state: deuglify the max_fds calculationOleg Nesterov
1. The usage of fdt looks very ugly, it can't be NULL if ->files is not NULL. We can use "unsigned int max_fds" instead. 2. This also allows to move seq_printf(max_fds) outside of task_lock() and join it with the previous seq_printf(). See also the next patch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>, Cc: Sterling Alexander <stalexan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roland McGrath <roland@hack.frob.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10proc: task_state: read cred->group_info outside of task_lock()Oleg Nesterov
task_state() reads cred->group_info under task_lock() because a long ago it was task_struct->group_info and it was actually protected by task->alloc_lock. Today this task_unlock() after rcu_read_unlock() just adds the confusion, move task_unlock() up. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>, Cc: Sterling Alexander <stalexan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roland McGrath <roland@hack.frob.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10fs/proc.c: use rb_entry_safe() instead of rb_entry()Nicolas Dichtel
Better to use existing macro that rewriting them. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10procfs: fix error handling of proc_register()Debabrata Banerjee
proc_register() error paths are leaking inodes and directory refcounts. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10fs/proc: use a rb tree for the directory entriesNicolas Dichtel
When a lot of netdevices are created, one of the bottleneck is the creation of proc entries. This serie aims to accelerate this part. The current implementation for the directories in /proc is using a single linked list. This is slow when handling directories with large numbers of entries (eg netdevice-related entries when lots of tunnels are opened). This patch replaces this linked list by a red-black tree. Here are some numbers: dummy30000.batch contains 30 000 times 'link add type dummy'. Before the patch: $ time ip -b dummy30000.batch real 2m31.950s user 0m0.440s sys 2m21.440s $ time rmmod dummy real 1m35.764s user 0m0.000s sys 1m24.088s After the patch: $ time ip -b dummy30000.batch real 2m0.874s user 0m0.448s sys 1m49.720s $ time rmmod dummy real 1m13.988s user 0m0.000s sys 1m1.008s The idea of improving this part was suggested by Thierry Herbelot. [akpm@linux-foundation.org: initialise proc_root.subdir at compile time] Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Thierry Herbelot <thierry.herbelot@6wind.com>. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10mm: fix huge zero page accounting in smaps reportKirill A. Shutemov
As a small zero page, huge zero page should not be accounted in smaps report as normal page. For small pages we rely on vm_normal_page() to filter out zero page, but vm_normal_page() is not designed to handle pmds. We only get here due hackish cast pmd to pte in smaps_pte_range() -- pte and pmd format is not necessary compatible on each and every architecture. Let's add separate codepath to handle pmds. follow_trans_huge_pmd() will detect huge zero page for us. We would need pmd_dirty() helper to do this properly. The patch adds it to THP-enabled architectures which don't yet have one. [akpm@linux-foundation.org: use do_div to fix 32-bit build] Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengwei Yin <yfw.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10fs/char_dev.c: remove pointless assignment from __register_chrdev_region()Jan Kara
At one place we assign major number we found to ret. That assignment is then never used and actually doesn't make any sense given how the code is currently structured (the assignment comes from pre-git times). Just remove it. Coverity id: 1226852. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10ocfs2: remove unneeded NULL checkDan Carpenter
In commit 1faf289454b9 ("ocfs2_dlm: disallow a domain join if node maps mismatch") we introduced a new earlier NULL check so this one is not needed. Also static checkers complain because we dereference it first and then check for NULL. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>