summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2012-07-31nfs: prevent page allocator recursions with swap over NFS.Mel Gorman
GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO, just not of any filesystem data. The problem is that previously NOFS was correct because that avoids recursion into the NFS code. With swap-over-NFS, it is no longer correct as swap IO can lead to this recursion. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31nfs: enable swap on NFSMel Gorman
Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol ->connect() method. PF_MEMALLOC should allow the allocation of struct socket and related objects and the early (re)setting of SOCK_MEMALLOC should allow us to receive the packets required for the TCP connection buildup. [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases] [dfeng@redhat.com: Fix handling of multiple swap files] [a.p.zijlstra@chello.nl: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31nfs: disable data cache revalidation for swapfilesMel Gorman
The VM does not like PG_private set on PG_swapcache pages. As suggested by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS data cache revalidation on swap files. as it does not make sense to have other clients change the file while it is being used as swap. This avoids setting PG_private on swap pages, since there ought to be no further races with invalidate_inode_pages2() to deal with. Since we cannot set PG_private we cannot use page->private which is already used by PG_swapcache pages to store the nfs_page. Thus augment the new nfs_page_find_request logic. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31nfs: teach the NFS client how to treat PG_swapcache pagesMel Gorman
Replace all relevant occurences of page->index and page->mapping in the NFS client with the new page_file_index() and page_file_mapping() functions. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31vmscan: remove obsolete shrink_control commentMinchan Kim
09f363c7 ("vmscan: fix shrinker callback bug in fs/super.c") fixed a shrinker callback which was returning -1 when nr_to_scan is zero, which caused excessive slab scanning. But 635697c6 ("vmscan: fix initial shrinker size handling") fixed the problem, again so we can freely return -1 although nr_to_scan is zero. So let's revert 09f363c7 because the comment added in 09f363c7 made an unnecessary rule. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31hugetlb: use mmu_gather instead of a temporary linked list for accumulating ↵Aneesh Kumar K.V
pages Use a mmu_gather instead of a temporary linked list for accumulating pages when we unmap a hugepage range Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threadsWanpeng Li
Since per-BDI flusher threads were introduced in 2.6, the pdflush mechanism is not used any more. But the old interface exported through /proc/sys/vm/nr_pdflush_threads still exists and is obviously useless. For back-compatibility, printk warning information and return 2 to notify the users that the interface is removed. Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30Merge tag 'writeback-proportions' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux Pull writeback updates from Wu Fengguang: "Use time based periods to age the writeback proportions, which can adapt equally well to fast/slow devices." Fix up trivial conflict in comment in fs/sync.c * tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Fix some comment errors block: Convert BDI proportion calculations to flexible proportions lib: Fix possible deadlock in flexible proportion code lib: Proportions with flexible period
2012-07-30Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Features include: - More preparatory patches for modularising NFSv2/v3/v4. Split out the various NFSv2/v3/v4-specific code into separate files - More preparation for the NFSv4 migration code - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters - pNFS fast failover when the data servers are down - Various cleanups and debugging patches" * tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits) nfs: fix fl_type tests in NFSv4 code NFS: fix pnfs regression with directio writes NFS: fix pnfs regression with directio reads sunrpc: clnt: Add missing braces nfs: fix stub return type warnings NFS: exit_nfs_v4() shouldn't be an __exit function SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors NFS: Split out NFS v4 client functions NFS: Split out the NFS v4 filesystem types NFS: Create a single nfs_clone_super() function NFS: Split out NFS v4 server creating code NFS: Initialize the NFS v4 client from init_nfs_v4() NFS: Move the v4 getroot code to nfs4getroot.c NFS: Split out NFS v4 file operations NFS: Initialize v4 sysctls from nfs_init_v4() NFS: Create an init_nfs_v4() function NFS: Split out NFS v4 inode operations NFS: Split out NFS v3 inode operations NFS: Split out NFS v2 inode operations NFS: Clean up nfs4_proc_setclientid() and friends ...
2012-07-30Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge Andrew's first set of patches: "Non-MM patches: - lots of misc bits - tree-wide have_clk() cleanups - quite a lot of printk tweaks. I draw your attention to "printk: convert the format for KERN_<LEVEL> to a 2 byte pattern" which looks a bit scary. But afaict it's solid. - backlight updates - lib/ feature work (notably the addition and use of memweight()) - checkpatch updates - rtc updates - nilfs updates - fatfs updates (partial, still waiting for acks) - kdump, proc, fork, IPC, sysctl, taskstats, pps, etc - new fault-injection feature work" * Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits) drivers/misc/lkdtm.c: fix missing allocation failure check lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table() fault-injection: add tool to run command with failslab or fail_page_alloc fault-injection: add selftests for cpu and memory hotplug powerpc: pSeries reconfig notifier error injection module memory: memory notifier error injection module PM: PM notifier error injection module cpu: rewrite cpu-notifier-error-inject module fault-injection: notifier error injection c/r: fcntl: add F_GETOWNER_UIDS option resource: make sure requested range is included in the root range include/linux/aio.h: cpp->C conversions fs: cachefiles: add support for large files in filesystem caching pps: return PTR_ERR on error in device_create taskstats: check nla_reserve() return sysctl: suppress kmemleak messages ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION ipc: compat: use signed size_t types for msgsnd and msgrcv ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC ipc: add COMPAT_SHMLBA support ...
2012-07-30c/r: fcntl: add F_GETOWNER_UIDS optionCyrill Gorcunov
When we restore file descriptors we would like them to look exactly as they were at dumping time. With help of fcntl it's almost possible, the missing snippet is file owners UIDs. To be able to read their values the F_GETOWNER_UIDS is introduced. This option is valid iif CONFIG_CHECKPOINT_RESTORE is turned on, otherwise returning -EINVAL. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fs: cachefiles: add support for large files in filesystem cachingJustin Lecher
Support the caching of large files. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=31182 Signed-off-by: Justin Lecher <jlec@gentoo.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com> Tested-by: Suresh Jayaraman <sjayaraman@suse.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30proc: do not allow negative offsets on /proc/<pid>/environDjalal Harouni
__mem_open() which is called by both /proc/<pid>/environ and /proc/<pid>/mem ->open() handlers will allow the use of negative offsets. /proc/<pid>/mem has negative offsets but not /proc/<pid>/environ. Clean this by moving the 'force FMODE_UNSIGNED_OFFSET flag' to mem_open() to allow negative offsets only on /proc/<pid>/mem. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30proc: environ_read() make sure offset points to environment address rangeDjalal Harouni
Currently the following offset and environment address range check in environ_read() of /proc/<pid>/environ is buggy: int this_len = mm->env_end - (mm->env_start + src); if (this_len <= 0) break; Large or negative offsets on /proc/<pid>/environ converted to 'unsigned long' may pass this check since '(mm->env_start + src)' can overflow and 'this_len' will be positive. This can turn /proc/<pid>/environ to act like /proc/<pid>/mem since (mm->env_start + src) will point and read from another VMA. There are two fixes here plus some code cleaning: 1) Fix the overflow by checking if the offset that was converted to unsigned long will always point to the [mm->env_start, mm->env_end] address range. 2) Remove the truncation that was made to the result of the check, storing the result in 'int this_len' will alter its value and we can not depend on it. For kernels that have commit b409e578d ("proc: clean up /proc/<pid>/environ handling") which adds the appropriate ptrace check and saves the 'mm' at ->open() time, this is not a security issue. This patch is taken from the grsecurity patch since it was just made available. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30coredump: fix wrong comments on core limits of pipe coredump caseJovi Zhang
In commit 898b374af6f7 ("exec: replace call_usermodehelper_pipe with use of umh init function and resolve limit"), the core limits recursive check value was changed from 0 to 1, but the corresponding comments were not updated. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fat: refactor shortname parsingSteven J. Magnani
Nearly identical shortname parsing is performed in fat_search_long() and __fat_readdir(). Extract this code into a function that may be called by both. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fat: accessors for msdos_dir_entry 'start' fieldsSteven J. Magnani
Simplify code by providing accessor functions for the directory entry start cluster fields. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30hfsplus: use -ENOMEM when kzalloc() failsNamjae Jeon
Use -ENOMEM return value instead of -EINVAL when kzalloc() fails. Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30nilfs2: add omitted comments for different structures in driver implementationVyacheslav Dubeyko
Add omitted comments for different structures in driver implementation. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30nilfs2: fix deadlock issue between chcp and thaw ioctlsRyusuke Konishi
An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command: chcp D ffff88013870f3d0 0 1325 1324 0x00000004 ... Call Trace: nilfs_transaction_begin+0x11c/0x1a0 [nilfs2] wake_up_bit+0x20/0x20 copy_from_user+0x18/0x30 [nilfs2] nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2] nilfs_ioctl+0x252/0x61a [nilfs2] do_page_fault+0x311/0x34c get_unmapped_area+0x132/0x14e do_vfs_ioctl+0x44b/0x490 __set_task_blocked+0x5a/0x61 vm_mmap_pgoff+0x76/0x87 __set_current_blocked+0x30/0x4a sys_ioctl+0x4b/0x6f system_call_fastpath+0x16/0x1b thaw D ffff88013870d890 0 1352 1351 0x00000004 ... Call Trace: rwsem_down_failed_common+0xdb/0x10f call_rwsem_down_write_failed+0x13/0x20 down_write+0x25/0x27 thaw_super+0x13/0x9e do_vfs_ioctl+0x1f5/0x490 vm_mmap_pgoff+0x76/0x87 sys_ioctl+0x4b/0x6f filp_close+0x64/0x6c system_call_fastpath+0x16/0x1b where the thaw ioctl deadlocked at thaw_super() when called while chcp was waiting at nilfs_transaction_begin() called from nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible. This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in read mode and then waits for unfreezing in nilfs_transaction_begin(), whereas thaw_super() locks sb->s_umount in write mode. The locking of sb->s_umount here was intended to make snapshot mounts and the downgrade of snapshots to checkpoints exclusive. This fixes the deadlock issue by replacing the sb->s_umount usage in nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30nilfs2: fix timing issue between rmcp and chcp ioctlsRyusuke Konishi
The checkpoint deletion ioctl (rmcp ioctl) has potential for breaking snapshot because it is not fully exclusive with checkpoint mode change ioctl (chcp ioctl). The rmcp ioctl first tests if the specified checkpoint is a snapshot or not within nilfs_cpfile_delete_checkpoint function, and then calls nilfs_cpfile_delete_checkpoints function to actually invalidate the checkpoint only if it's not a snapshot. However, the checkpoint can be changed into a snapshot by the chcp ioctl between these two operations. In that case, calling nilfs_cpfile_delete_checkpoints() wrongly invalidates the snapshot, which leads to snapshot list corruption and snapshot count mismatch. This fixes the issue by changing nilfs_cpfile_delete_checkpoints() so that it reconfirms the target checkpoints are snapshot or not. This second check is exclusive with the chcp operation since it is protected by an existing semaphore. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30nilfs2: remove references to long gone super operationsFernando Luis Vazquez Cao
->delete_inode(), ->write_super_lockfs(), ->unlockfs() are gone so remove references to them in the NTFS code. Noticed while cleaning up the fsfreeze mess. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30nilfs2: add omitted comment for ns_mount_state field of the_nilfs structureVyacheslav Dubeyko
Add omitted comment for ns_mount_state field of the_nilfs structure. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30minixfs: fix block limit checkVladimir Serbinenko
On minix2 and minix3 usually max_size is 7fffffff and the check in question prohibits creation of last block spanning right before 7fffffff, due to downward rounding during the division. Fix it by using multiplication instead. [akpm@linux-foundation.org: fix up code layout, use local `sb'] Signed-off-by: Vladimir Serbinenko <phcoder@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30ext4: use memweight()Akinobu Mita
Convert ext4_count_free() to use memweight() instead of table lookup based counting clear bits implementation. This change only affects the code segments enabled by EXT4FS_DEBUG. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. This also includes the following change. - Remove unnecessary map == NULL check in ext4_count_free() which always takes non-null pointer as the memory area. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30ext3: use memweight()Akinobu Mita
Convert ext3_count_free() to use memweight() instead of table lookup based counting clear bits implementation. This change only affects the code segments enabled by EXT3FS_DEBUG. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. This also includes the following changes. - Remove unnecessary map == NULL check in ext3_count_free() which always takes non-null pointer as the memory area. - Fix printk format warning that only reveals with EXT3FS_DEBUG. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30ext2: use memweight()Akinobu Mita
Convert ext2_count_free() to use memweight() instead of table lookup based counting clear bits implementation. This change only affects the code segments enabled by EXT2FS_DEBUG. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. This also includes the following changes. - Remove unnecessary map == NULL check in ext2_count_free() which always takes non-null pointer as the memory area. - Fix printk format warning that only reveals with EXT2FS_DEBUG. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30ocfs2: use memweight()Akinobu Mita
Use memweight to count the total number of bits set in memory area. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30affs: use memweight()Akinobu Mita
Use memweight() to count the total number of bits set in memory area. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30qnx4fs: use memweight()Akinobu Mita
Use memweight() to count the total number of bits clear in memory area. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Anders Larsen <al@alarsen.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30btrfs: use printk_get_level and printk_skip_level, add __printf, fix falloutJoe Perches
Use the generic printk_get_level() to search a message for a kern_level. Add __printf to verify format and arguments. Fix a few messages that had mismatches in format and arguments. Add #ifdef CONFIG_PRINTK blocks to shrink the object size a bit when not using printk. [akpm@linux-foundation.org: whitespace tweak] Signed-off-by: Joe Perches <joe@perches.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30coredump: warn about unsafe suid_dumpable / core_pattern comboKees Cook
When suid_dumpable=2, detect unsafe core_pattern settings and warn when they are seen. Signed-off-by: Kees Cook <keescook@chromium.org> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fs: make dumpable=2 require fully qualified pathKees Cook
When the suid_dumpable sysctl is set to "2", and there is no core dump pipe defined in the core_pattern sysctl, a local user can cause core files to be written to root-writable directories, potentially with user-controlled content. This means an admin can unknowningly reintroduce a variation of CVE-2006-2451, allowing local users to gain root privileges. $ cat /proc/sys/fs/suid_dumpable 2 $ cat /proc/sys/kernel/core_pattern core $ ulimit -c unlimited $ cd / $ ls -l core ls: cannot access core: No such file or directory $ touch core touch: cannot touch `core': Permission denied $ OHAI="evil-string-here" ping localhost >/dev/null 2>&1 & $ pid=$! $ sleep 1 $ kill -SEGV $pid $ ls -l core -rw------- 1 root kees 458752 Jun 21 11:35 core $ sudo strings core | grep evil OHAI=evil-string-here While cron has been fixed to abort reading a file when there is any parse error, there are still other sensitive directories that will read any file present and skip unparsable lines. Instead of introducing a suid_dumpable=3 mode and breaking all users of mode 2, this only disables the unsafe portion of mode 2 (writing to disk via relative path). Most users of mode 2 (e.g. Chrome OS) already use a core dump pipe handler, so this change will not break them. For the situations where a pipe handler is not defined but mode 2 is still active, crash dumps will only be written to fully qualified paths. If a relative path is defined (e.g. the default "core" pattern), dump attempts will trigger a printk yelling about the lack of a fully qualified path. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30fs/xattr.c:getxattr(): improve handling of allocation failuresSasha Levin
This allocation can be as large as 64k. - Add __GFP_NOWARN so the falied kmalloc() is silent - Fall back to vmalloc() if the kmalloc() failed Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30ntfs: remove references to long gone super operations and unimplemented methodsFernando Luis Vazquez Cao
->delete_inode(), ->write_super_lockfs(), ->unlockfs() are gone so remove refereces to them in the NTFS code. Remove unnecessary comments about unimplemented methods while at it (suggested by Christoph Hellwig). Noticed while cleaning up the fsfreeze mess. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30nfs: fix fl_type tests in NFSv4 codeJeff Layton
fl_type is not a bitmap. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: fix pnfs regression with directio writesFred Isaman
Commit 57208fa7e51 "NFS: Create an write_pageio_init() function" did not modify the calls in direct.c, preventing direct io from using pnfs. This reintroduces that capability. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: fix pnfs regression with directio readsFred Isaman
Commit 1abb50886af "NFS: Create an read_pageio_init() function" did not modify the call in direct.c, preventing direct io from using pnfs. This reintroduces that capability. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30nfs: fix stub return type warningsRandy Dunlap
Fix numerous repeated warnings by making the stub function void instead of non-void: fs/nfs/nfs4_fs.h: In function 'nfs4_unregister_sysctl': fs/nfs/nfs4_fs.h:385:1: warning: no return statement in function returning non-void Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30Merge tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs update from Ben Myers: "Numerous cleanups and several bug fixes. Here are some highlights: - Discontiguous directory buffer support - Inode allocator refactoring - Removal of the IO lock in inode reclaim - Implementation of .update_time - Fix for handling of EOF in xfs_vm_writepage - Fix for races in xfsaild, and idle mode is re-enabled - Fix for a crash in xfs_buf completion handlers on unmount." Fix up trivial conflicts in fs/xfs/{xfs_buf.c,xfs_log.c,xfs_log_priv.h} due to duplicate patches that had already been merged for 3.5. * tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs: (44 commits) xfs: wait for the write the superblock on unmount xfs: re-enable xfsaild idle mode and fix associated races xfs: remove iolock lock classes xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes xfs: do not take the iolock in xfs_inactive xfs: remove xfs_inactive_attrs xfs: clean up xfs_inactive xfs: do not read the AGI buffer in xfs_dialloc until nessecary xfs: refactor xfs_ialloc_ag_select xfs: add a short cut to xfs_dialloc for the non-NULL agbp case xfs: remove the alloc_done argument to xfs_dialloc xfs: split xfs_dialloc xfs: remove xfs_ialloc_find_free Prefix IO_XX flags with XFS_IO_XX to avoid namespace colision. xfs: remove xfs_inotobp xfs: merge xfs_itobp into xfs_imap_to_bp xfs: handle EOF correctly in xfs_vm_writepage xfs: implement ->update_time xfs: fix comment typo of struct xfs_da_blkinfo. xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks ...
2012-07-29xfs: wait for the write the superblock on unmountMark Tinguely
v2: Add the xfs_buf_lock to xfs_quiesce_attr(). Add explaination why xfs_buf_lock() is used to wait for write. xfs_wait_buftarg() does not wait for the completion of the write of the uncached superblock. This write can race with the shutdown of the log and causes a panic if the write does not win the race. During the log write, xfsaild_push() will lock the buffer and set the XBF_ASYNC flag. Because the XBF_FLAG is set, complete() is not performed on the buffer's iowait entry, we cannot call xfs_buf_iowait() to wait for the write to complete. The buffer's lock is held until the write is complete, so we can block on a xfs_buf_lock() request to be notified that the write is complete. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29xfs: re-enable xfsaild idle mode and fix associated racesBrian Foster
xfsaild idle mode logic currently leads to a couple hangs: 1.) If xfsaild is rescheduled in during an incremental scan (i.e., tout != 0) and the target has been updated since the previous run, we can hit the new target and go into idle mode with a still populated ail. 2.) A wake up is only issued when the target is pushed forward. The wake up can race with xfsaild if it is currently in the process of entering idle mode, causing future wake up events to be lost. These hangs have been reproduced and verified as fixed by running xfstests 273 in a loop on a slightly modified upstream kernel. The kernel is modified to re-enable idle mode as previously implemented (when count == 0) and with a revert of commit 670ce93f, which includes performance improvements that make this harder to reproduce. The solution, the algorithm for which has been outlined by Dave Chinner, is to modify xfsaild to enter idle mode only when the ail is empty and the push target has not been moved forward since the last push. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29xfs: remove iolock lock classesChristoph Hellwig
Content-Disposition: inline; filename=xfs-remove-iolock-classes Now that we never take the iolock during inode reclaim we don't need to play games with lock classes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29xfs: avoid the iolock in xfs_free_eofblocks for evicted inodesChristoph Hellwig
Same rational as the last patch - these inodes are not reachable, so don't bother with locking. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29xfs: do not take the iolock in xfs_inactiveChristoph Hellwig
An inode that enters xfs_inactive has been removed from all global lists but the inode hash, and can't be recycled in xfs_iget before it has been marked reclaimable. Thus taking the iolock in here is not nessecary at all, and given the amount of lockdep false positives it has triggered already I'd rather remove the locking. The only change outside of xfs_inactive is relaxing an assert in xfs_itruncate_extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29xfs: remove xfs_inactive_attrsChristoph Hellwig
Remove this helper as the code flow is a lot more obvious when it gets merged into its only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29xfs: clean up xfs_inactiveChristoph Hellwig
The code to reserve log space and join the inode to the transaction is common for all cases, so don't duplicate it. Also remove the trivial xfs_inactive_symlink_local helper which can simply be opencode now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29xfs: do not read the AGI buffer in xfs_dialloc until nessecaryChristoph Hellwig
Refactor the AG selection loop in xfs_dialloc to operate on the in-memory perag data as much as possible. We only read the AGI buffer once we have selected an AG to allocate inodes now instead of for every AG considered. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29xfs: refactor xfs_ialloc_ag_selectChristoph Hellwig
Loop over the in-core perag structures and prefer using pagi_freecount over going out to the AGI buffer where possible. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-29xfs: add a short cut to xfs_dialloc for the non-NULL agbp caseChristoph Hellwig
In this case we already have selected an AG and know it has free space beause the buffer lock never got released. Jump directly into xfs_dialloc_ag and short cut the AG selection loop. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>