Age | Commit message (Collapse) | Author |
|
For architecture dependent compat syscalls in common code an architecture
must define something like __ARCH_WANT_<WHATEVER> if it wants to use the
code.
This however is not true for compat_sys_getdents64 for which architectures
must define __ARCH_OMIT_COMPAT_SYS_GETDENTS64 if they do not want the code.
This leads to the situation where all architectures, except mips, get the
compat code but only x86_64, arm64 and the generic syscall architectures
actually use it.
So invert the logic, so that architectures actively must do something to
get the compat code.
This way a couple of architectures get rid of otherwise dead code.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Add ELF_HWCAP2 to the set of auxv entries that is passed to
a 32-bit ELF program running in 32-bit compat mode under a
64-bit kernel.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
This patch fixes a long standing issue in mapping the journal
extents. Most journals will consist of only a single extent,
and although the cache took account of that by merging extents,
it did not actually map large extents, but instead was doing a
block by block mapping. Since the journal was only being mapped
on mount, this was not normally noticeable.
With the updated code, it is now possible to use the same extent
mapping system during journal recovery (which will be added in a
later patch). This will allow checking of the integrity of the
journal before any reply of the journal content is attempted. For
this reason the code is moving to bmap.c, since it will be used
more widely in due course.
An exercise left for the reader is to compare the new function
gfs2_map_journal_extents() with gfs2_write_alloc_required()
Additionally, should there be a failure, the error reporting is
also updated to show more detail about what went wrong.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
We know "fatal" is zero here. The code can be simplified a bit by
assigning directly.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
We already know "ret" is zero so there is no need to do:
if (!ret)
ret = err;
We can just assign ret directly instead.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Mark function as static in ext3/xattr_security.c because it is not used
outside this file.
This eliminates the following warning in ext3/xattr_security.c:
fs/ext3/xattr_security.c:46:5: warning: no previous prototype for ‘ext3_initxattrs’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Mark function as static in ext3/dir.c because it is not used outside
this file.
This also eliminates the following warning in ext3/dir.c:
fs/ext3/dir.c:278:8: warning: no previous prototype for ‘ext3_dir_llseek’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Mark function as static in ext2/xattr_security.c because it is not
used outside this file.
This also elimiantes the following warning in ext2/xattr_security.c:
fs/ext2/xattr_security.c:45:5: warning: no previous prototype for ‘ext2_initxattrs’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
init_inodecache is only called by __init init_ext3_fs.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
init_inodecache is only called by __init init_ext2_fs.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
init_inodecache is only called by __init init_udf_fs.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Both affs and isofs check for blocksize integrity during
parse_options.Do the same thing for udf.
Valid values : 512, 1024, 2048 or 4096 bytes.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
We want the fixes in here.
|
|
The clean-up in commit 36281caa839f ended up removing a NULL pointer check
that is needed in order to prevent an Oops in
nfs_async_inode_return_delegation().
Reported-by: "Yan, Zheng" <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/5313E9F6.2020405@intel.com
Fixes: 36281caa839f (NFSv4: Further clean-ups of delegation stateid validation)
Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This patch fixes performance regression of dbench reported by
Alex <hbx7d@yandex.com>.
This issue was revealed by Phoronix tests results:
http://www.phoronix.com/scan.php?page=article&item=linux_314_ssdfs&num=2
It turns out that we need to assign WRITE_SYNC to the node writes, if
fsync is triggered.
The performance numbers are like below, which is measured by Alex.
1. 355MB/s ext4
2. 225MB/s f2fs : WRITE for node writes
3. 525MB/s f2fs : WRITE_SYNC for node writes
Reported-And-Tested-by: Alex <hbx7d@yandex.com>.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull sysfs fix from Greg KH:
"Here is a single sysfs fix for 3.14-rc5. It fixes a reported problem
with the namespace code in sysfs"
* tag 'driver-core-3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: fix namespace refcnt leak
|
|
nfs4_release_lockowner needs to set the rpc_message reply to point to
the nfs4_sequence_res in order to avoid another Oopsable situation
in nfs41_assign_slot.
Fixes: fbd4bfd1d9d21 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER)
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The rfc1002 length actually includes a type byte, which we aren't
masking off. In most cases, it's not a problem since the
RFC1002_SESSION_MESSAGE type is 0, but when doing a RFC1002 session
establishment, the type is non-zero and that throws off the returned
length.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:
* Update RCU documentation. These were posted to LKML at
https://lkml.org/lkml/2014/2/17/555.
* Miscellaneous fixes. These were posted to LKML at
https://lkml.org/lkml/2014/2/17/530. Note that two of these
are RCU changes to other maintainer's trees: add1f0995454
(fs) and 8857563b819b (notifer), both of which substitute
rcu_access_pointer() for rcu_dereference_raw().
* Real-time latency fixes. These were posted to LKML at
https://lkml.org/lkml/2014/2/17/544.
* Torture-test changes, including refactoring of rcutorture
and introduction of a vestigial locktorture. These were posted
to LKML at https://lkml.org/lkml/2014/2/17/599.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We should de-account dirty counters for page when redirty in ->writepage().
Wu Fengguang described in 'commit 971767caf632190f77a40b4011c19948232eed75':
"writeback: fix dirtied pages accounting on redirty
De-account the accumulative dirty counters on page redirty.
Page redirties (very common in ext4) will introduce mismatch between
counters (a) and (b)
a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied
b) NR_WRITTEN, BDI_WRITTEN
This will introduce systematic errors in balanced_rate and result in
dirty page position errors (ie. the dirty pages are no longer balanced
around the global/bdi setpoints)."
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull filesystem fixes from Jan Kara:
"Notification, writeback, udf, quota fixes
The notification patches are (with one exception) a fallout of my
fsnotify rework which went into -rc1 (I've extented LTP to cover these
cornercases to avoid similar breakage in future).
The UDF patch is a nasty data corruption Al has recently reported,
the revert of the writeback patch is due to possibility of violating
sync(2) guarantees, and a quota bug can lead to corruption of quota
files in ocfs2"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: Allocate overflow events with proper type
fanotify: Handle overflow in case of permission events
fsnotify: Fix detection whether overflow event is queued
Revert "writeback: do not sync data dirtied after sync start"
quota: Fix race between dqput() and dquot_scan_active()
udf: Fix data corruption on file type conversion
inotify: Fix reporting of cookies for inotify events
|
|
Use kzalloc and __vmalloc __GFP_ZERO for clean sd_quota_bitmap allocation.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This patch use existing macro F2FS_INODE/NEXT_FREE_BLKADDR to clean up some
codes.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
If there are multi segments in one section, we will read those SSA blocks which
have contiguous address one by one in f2fs_gc. It may lost performance, let's
read ahead SSA blocks by merge multi read request.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch adds an sysfs entry to control dir_level used by the large directory.
The description of this entry is:
dir_level This parameter controls the directory level to
support large directory. If a directory has a
number of files, it can reduce the file lookup
latency by increasing this dir_level value.
Otherwise, it needs to decrease this value to
reduce the space overhead. The default value is 0.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch introduces an i_dir_level field to support large directory.
Previously, f2fs maintains multi-level hash tables to find a dentry quickly
from a bunch of chiild dentries in a directory, and the hash tables consist of
the following tree structure as below.
In Documentation/filesystems/f2fs.txt,
----------------------
A : bucket
B : block
N : MAX_DIR_HASH_DEPTH
----------------------
level #0 | A(2B)
|
level #1 | A(2B) - A(2B)
|
level #2 | A(2B) - A(2B) - A(2B) - A(2B)
. | . . . .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
. | . . . .
level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
But, if we can guess that a directory will handle a number of child files,
we don't need to traverse the tree from level #0 to #N all the time.
Since the lower level tables contain relatively small number of dentries,
the miss ratio of the target dentry is likely to be high.
In order to avoid that, we can configure the hash tables sparsely from level #0
like this.
level #0 | A(2B) - A(2B) - A(2B) - A(2B)
level #1 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
. | . . . .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
. | . . . .
level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
With this structure, we can skip the ineffective tree searches in lower level
hash tables.
This patch adds just a facility for this by introducing i_dir_level in
f2fs_inode.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
It turns out that a bit operation like find_next_bit is not always fast enough
for f2fs_find_entry.
Instead, it is pretty much simple and fast to traverse each dentries.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
The change to add the IO lock to protect the directory extent map
during readdir operations has cause lockdep to have a heart attack
as it now sees a different locking order on inodes w.r.t. the
mmap_sem because readdir has a different ordering to write().
Add a new lockdep class for directory inodes to avoid this false
positive.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The struct xfs_da_args used to pass directory/attribute operation
information to the lower layers is 128 bytes in size and is
allocated on the stack. Dynamically allocate them to reduce the
stack footprint of directory operations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Log forces can occur deep in the call chain when we have relatively
little stack free. Log forces can also happen at close to the call
chain leaves (e.g. xfs_buf_lock()) and hence we can trigger IO from
places where we really don't want to add more stack overhead.
This stack overhead occurs because log forces do foreground CIL
pushes (xlog_cil_push_foreground()) rather than waking the
background push wq and waiting for the for the push to complete.
This foreground push was done to avoid confusing the CFQ Io
scheduler when fsync()s were issued, as it has trouble dealing with
dependent IOs being issued from different process contexts.
Avoiding blowing the stack is much more critical than performance
optimisations for CFQ, especially as we've been recommending against
the use of CFQ for XFS since 3.2 kernels were release because of
it's problems with multi-threaded IO workloads.
Hence convert xlog_cil_push_foreground() to move the push work
to the CIL workqueue. We already do the waiting for the push to
complete in xlog_cil_force_lsn(), so there's nothing else we need to
modify to make this work.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Modify all read & write verifiers to differentiate
between CRC errors and other inconsistencies.
This sets the appropriate error number on bp->b_error,
and then calls xfs_verifier_error() if something went
wrong. That function will issue the appropriate message
to the user.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xfs_error_report used to just print the hex address of the caller;
%pF will give us something more human-readable.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
We want to distinguish between corruption, CRC errors,
etc. In addition, the full stack trace on verifier errors
seems less than helpful; it looks more like an oops than
corruption.
Create a new function to specifically alert the user to
verifier errors, which can differentiate between
EFSCORRUPTED and CRC mismatches. It doesn't dump stack
unless the xfs error level is turned up high.
Define a new error message (EFSBADCRC) to clearly identify
CRC errors. (Defined to EBADMSG, bad message)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Many/most callers of xfs_update_cksum() pass bp->b_addr and
BBTOB(bp->b_length) as the first 2 args. Add a helper
which can just accept the bp and the crc offset, and work
it out on its own, for brevity.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Many/most callers of xfs_verify_cksum() pass bp->b_addr and
BBTOB(bp->b_length) as the first 2 args. Add a helper
which can just accept the bp and the crc offset, and work
it out on its own, for brevity.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Some calls to crc functions used useful #defines,
others used awkward offsetof() constructs.
Switch them all to #define to make things a bit cleaner.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Most write verifiers don't update CRCs after the verifier
has failed and the buffer has been marked in error. These
two didn't, but should.
Add returns to the verifier failure block, since the buffer
won't be written anyway.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
As mount() and kill_sb() is not a one-to-one match, we shoudn't get
ns refcnt unconditionally in sysfs_mount(), and instead we should
get the refcnt only when kernfs_mount() allocated a new superblock.
v2:
- Changed the name of the new argument, suggested by Tejun.
- Made the argument optional, suggested by Tejun.
v3:
- Make the new argument as second-to-last arg, suggested by Tejun.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
fs/kernfs/mount.c | 8 +++++++-
fs/sysfs/mount.c | 5 +++--
include/linux/kernfs.h | 9 +++++----
3 files changed, 15 insertions(+), 7 deletions(-)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
By reordering some of the assignments in gfs2_log_flush() it
is possible to remove one of the "if" statements as it can be
merged with one higher up the function.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Commit 7053aee26a35 "fsnotify: do not share events between notification
groups" used overflow event statically allocated in a group with the
size of the generic notification event. This causes problems because
some code looks at type specific parts of event structure and gets
confused by a random data it sees there and causes crashes.
Fix the problem by allocating overflow event with type corresponding to
the group type so code cannot get confused.
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
If the event queue overflows when we are handling permission event, we
will never get response from userspace. So we must avoid waiting for it.
Change fsnotify_add_notify_event() to return whether overflow has
happened so that we can detect it in fanotify_handle_event() and act
accordingly.
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Currently we didn't initialize event's list head when we removed it from
the event list. Thus a detection whether overflow event is already
queued wasn't working. Fix it by always initializing the list head when
deleting event from a list.
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
This patch moves the dereference of "buffer" after the check for NULL.
The only place which passes a NULL parameter is gfs2_set_acl().
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Now we have a master transaction into which other transactions
are merged, the accounting can be done using this master
transaction. We no longer require the superblock fields which
were being used for this function.
In addition, this allows for a clean up in calc_reserved()
making it rather easier understand. Also, by reducing the
number of variables used to track the buffers being added
and removed from the journal, a number of error checks are
now no longer required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Over time, we hope to be able to improve the concurrency available
in the log code. This is one small step towards that, by moving
the buffer lists from the super block, and into the transaction
structure, so that each transaction builds its own buffer lists.
At transaction commit time, the buffer lists are merged into
the currently accumulating transaction. That transaction then
is passed into the before and after commit functions at journal
flush time. Thus there should be no change in overall behaviour
yet.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
The stat_show is just to show the current status of f2fs.
So, we can remove all the there-in locks.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch introduces a radix tree for the list of free_nids, which enhances
the performance on free nid management.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Introduce help macro on_build_free_nids() which just uses build_lock
to judge whether the building free nid is going, so that we can remove
the on_build_free_nids field from f2fs_sb_info.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: remove an unnecessary white line removal]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
The nat cache entry maintains a status whether it is checkpointed or not.
So, if a new cache entry is loaded from the last checkpoint,
nat_entry->checkpointed should be true.
If the cache entry is modified as being dirty, nat_entry->checkpoint should
be false.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
At the end of the recovery procedure, write_checkpoint is called and updates
the cp count which is managed by f2fs stat.
But, previously build_stat() is called after the recovery procedure, which
results in:
BUG: unable to handle kernel NULL pointer dereference at 000000000000012c
IP: [<ffffffffa03b1030>] write_checkpoint+0x720/0xbc0 [f2fs]
Call Trace:
[<ffffffff810a6b44>] ? mark_held_locks+0x74/0x140
[<ffffffff8109a3e0>] ? __init_waitqueue_head+0x60/0x60
[<ffffffffa03bf036>] recover_fsync_data+0x656/0xf20 [f2fs]
[<ffffffff812ee3eb>] ? security_d_instantiate+0x1b/0x30
[<ffffffffa03aeb4d>] f2fs_fill_super+0x94d/0xa00 [f2fs]
[<ffffffff811a9825>] mount_bdev+0x1a5/0x1f0
[<ffffffff8114915e>] ? __get_free_pages+0xe/0x40
[<ffffffffa03ae200>] ? f2fs_remount+0x130/0x130 [f2fs]
[<ffffffffa03aa575>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffff811aa713>] mount_fs+0x43/0x1b0
[<ffffffff811c7124>] vfs_kern_mount+0x74/0x160
[<ffffffff811c5cb1>] ? __get_fs_type+0x51/0x60
[<ffffffff811c9727>] do_mount+0x237/0xb50
[<ffffffff811c936a>] ? copy_mount_options+0x3a/0x170
So, this patche changes the order of recovery_fsync_data() and
f2fs_build_stats().
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|