Age | Commit message (Collapse) | Author |
|
When xfs_growfs_data_private() is updating backup superblocks,
it bails out on the first error encountered, whether reading or
writing:
* If we get an error writing out the alternate superblocks,
* just issue a warning and continue. The real work is
* already done and committed.
This can cause a problem later during repair, because repair
looks at all superblocks, and picks the most prevalent one
as correct. If we bail out early in the backup superblock
loop, we can end up with more "bad" matching superblocks than
good, and a post-growfs repair may revert the filesystem to
the old geometry.
With the combination of superblock verifiers and old bugs,
we're more likely to encounter read errors due to verification.
And perhaps even worse, we don't even properly write any of the
newly-added superblocks in the new AGs.
Even with this change, growfs will still say:
xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
data blocks changed from 319815680 to 335216640
which might be confusing to the user, but it at least communicates
that something has gone wrong, and dmesg will probably highlight
the need for an xfs_repair.
And this is still best-effort; if verifiers fail on more than
half the backup supers, they may still "win" - but that's probably
best left to repair to more gracefully handle by doing its own
strict verification as part of the backup super "voting."
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
If we get EWRONGFS due to probing of non-xfs filesystems,
there's no need to issue the scary corruption error and backtrace.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
__xfs_printk adds its own "\n". Having it in the original string
leads to unintentional blank lines from these messages.
Most format strings have no newline, but a few do, leading to
i.e.:
[ 7347.119911] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05
[ 7347.119911]
[ 7347.119919] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05
[ 7347.119919]
Fix them all.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Recent analysis of a deadlocked XFS filesystem from a kernel
crash dump indicated that the filesystem was stuck waiting for log
space. The short story of the hang on the RHEL6 kernel is this:
- the tail of the log is pinned by an inode
- the inode has been pushed by the xfsaild
- the inode has been flushed to it's backing buffer and is
currently flush locked and hence waiting for backing
buffer IO to complete and remove it from the AIL
- the backing buffer is marked for write - it is on the
delayed write queue
- the inode buffer has been modified directly and logged
recently due to unlinked inode list modification
- the backing buffer is pinned in memory as it is in the
active CIL context.
- the xfsbufd won't start buffer writeback because it is
pinned
- xfssyncd won't force the log because it sees the log as
needing to be covered and hence wants to issue a dummy
transaction to move the log covering state machine along.
Hence there is no trigger to force the CIL to the log and hence
unpin the inode buffer and therefore complete the inode IO, remove
it from the AIL and hence move the tail of the log along, allowing
transactions to start again.
Mainline kernels also have the same deadlock, though the signature
is slightly different - the inode buffer never reaches the delayed
write lists because xfs_buf_item_push() sees that it is pinned and
hence never adds it to the delayed write list that the xfsaild
flushes.
There are two possible solutions here. The first is to simply force
the log before trying to cover the log and so ensure that the CIL is
emptied before we try to reserve space for the dummy transaction in
the xfs_log_worker(). While this might work most of the time, it is
still racy and is no guarantee that we don't get stuck in
xfs_trans_reserve waiting for log space to come free. Hence it's not
the best way to solve the problem.
The second solution is to modify xfs_log_need_covered() to be aware
of the CIL. We only should be attempting to cover the log if there
is no current activity in the log - covering the log is the process
of ensuring that the head and tail in the log on disk are identical
(i.e. the log is clean and at idle). Hence, by definition, if there
are items in the CIL then the log is not at idle and so we don't
need to attempt to cover it.
When we don't need to cover the log because it is active or idle, we
issue a log force from xfs_log_worker() - if the log is idle, then
this does nothing. However, if the log is active due to there being
items in the CIL, it will force the items in the CIL to the log and
unpin them.
In the case of the above deadlock scenario, instead of
xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to
cover the log, it will instead force the log, thereby unpinning the
inode buffer, allowing IO to be issued and complete and hence
removing the inode that was pinning the tail of the log from the
AIL. At that point, everything will start moving along again. i.e.
the xfs_log_worker turns back into a watchdog that can alleviate
deadlocks based around pinned items that prevent the tail of the log
from being moved...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
The xfs_inactive() return value is meaningless. Turn xfs_inactive()
into a void function and clean up the error handling appropriately.
Kill the VN_INACTIVE_[NO]CACHE directives as they are not relevant
to Linux.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Push the inode free work performed during xfs_inactive() down into
a new xfs_inactive_ifree() helper. This clears xfs_inactive() from
all inode locking and transaction management more directly
associated with freeing the inode xattrs, extents and the inode
itself.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Create the new xfs_inactive_truncate() function to handle the
truncate portion of xfs_inactive(). Push the locking and
transaction management into the new function.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Push down the transaction management for remote symlinks from
xfs_inactive() down to xfs_inactive_symlink_rmt(). The latter is
cleaned up to avoid transaction management intended for the
calling context (i.e., trans duplication, reservation, item
attachment).
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Add the inode type directory type support to XFS_IOC_FSGEOM
so that xfs_repair/xfs_info knows if the superblock v4 filesystem
enabled the feature.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
XFS never calls mark_inode_bad or iget_failed, so it will never see a
bad inode. Remove all checks for is_bad_inode because they are
unnecessary.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
At xfs_iext_realloc_direct(), the new_size is changed by adding
if_bytes if originally the extent records are stored at the inline
extent buffer, and we have to switch from it to a direct extent
list for those new allocated extents, this is wrong. e.g,
Create a file with three extents which was showing as following,
xfs_io -f -c "truncate 100m" /xfs/testme
for i in $(seq 0 5 10); do
offset=$(($i * $((1 << 20))))
xfs_io -c "pwrite $offset 1m" /xfs/testme
done
Inline
------
irec: if_bytes bytes_diff new_size
1st 0 16 16
2nd 16 16 32
Switching
--------- rnew_size
3rd 32 16 48 + 32 = 80 roundup=128
In this case, the desired value of new_size should be 48, and then
it will be roundup to 64 and be assigned to rnew_size.
However, this issue has been covered by resetting the if_bytes to
the new_size which is calculated at the begnning of xfs_iext_add()
before leaving out this function, and in turn make the rnew_size
correctly again. Hence, this can not be detected via xfstestes.
This patch fix above problem and revise the new_size comments at
xfs_iext_realloc_direct() to make it more readable. Also, fix the
comments while switching from the inline extent buffer to a direct
extent list to reflect this change.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Get rid of function variable count from xfs_iomap_write_allocate() as
it is unused.
Additionally, checkpatch warn me of the following for this change:
WARNING: extern prototypes should be avoided in .h files
+extern int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t,
So this patch also remove all extern function prototypes at xfs_iomap.h
to suppress it to make this code style in consistent manner in this file.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
This fixes a build failure caused by calling the free() function which
does not exist in the Linux kernel.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Free the memory in error path of xlog_recover_add_to_trans().
Normally this memory is freed in recovery pass2, but is leaked
in the error path.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
The determination of whether a directory entry contains a dtype
field originally was dependent on the filesystem having CRCs
enabled. This meant that the format for dtype beign enabled could be
determined by checking the directory block magic number rather than
doing a feature bit check. This was useful in that it meant that we
didn't need to pass a struct xfs_mount around to functions that
were already supplied with a directory block header.
Unfortunately, the introduction of dtype fields into the v4
structure via a feature bit meant this "use the directory block
magic number" method of discriminating the dirent entry sizes is
broken. Hence we need to convert the places that use magic number
checks to use feature bit checks so that they work correctly and not
by chance.
The current code works on v4 filesystems only because the dirent
size roundup covers the extra byte needed by the dtype field in the
places where this problem occurs.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Michael Semon reported that xfs/299 generated this lockdep warning:
=============================================
[ INFO: possible recursive locking detected ]
3.12.0-rc2+ #2 Not tainted
---------------------------------------------
touch/21072 is trying to acquire lock:
(&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
but task is already holding lock:
(&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&xfs_dquot_other_class);
lock(&xfs_dquot_other_class);
*** DEADLOCK ***
May be due to missing lock nesting notation
7 locks held by touch/21072:
#0: (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e
#1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40
#2: (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35
#3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1
#4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f
#5: (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
#6: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
The lockdep annotation for dquot lock nesting only understands
locking for user and "other" dquots, not user, group and quota
dquots. Fix the annotations to match the locking heirarchy we now
have.
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Commit f5ea1100 cleans up the disk to host conversions for
node directory entries, but because a variable is reused in
xfs_node_toosmall() the next node is not correctly found.
If the original node is small enough (<= 3/8 of the node size),
this change may incorrectly cause a node collapse when it should
not. That will cause an assert in xfstest generic/319:
Assertion failed: first <= last && last < BBTOB(bp->b_length),
file: /root/newest/xfs/fs/xfs/xfs_trans_buf.c, line: 569
Keep the original node header to get the correct forward node.
(When a node is considered for a merge with a sibling, it overwrites the
sibling pointers of the original incore nodehdr with the sibling's
pointers. This leads to loop considering the original node as a merge
candidate with itself in the second pass, and so it incorrectly
determines a merge should occur.)
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
[v3: added Dave Chinner's (slightly modified) suggestion to the commit header,
cleaned up whitespace. -bpm]
|
|
After a fair number of xfstests runs, xfs/182 started to fail
regularly with a corrupted directory - a directory read verifier was
failing after recovery because it found a block with a XARM magic
number (remote attribute block) rather than a directory data block.
The first time I saw this repeated failure I did /something/ and the
problem went away, so I was never able to find the underlying
problem. Test xfs/182 failed again today, and I found the root
cause before I did /something else/ that made it go away.
Tracing indicated that the block in question was being correctly
logged, the log was being flushed by sync, but the buffer was not
being written back before the shutdown occurred. Tracing also
indicated that log recovery was also reading the block, but then
never writing it before log recovery invalidated the cache,
indicating that it was not modified by log recovery.
More detailed analysis of the corpse indicated that the filesystem
had a uuid of "a4131074-1872-4cac-9323-2229adbcb886" but the XARM
block had a uuid of "8f32f043-c3c9-e7f8-f947-4e7f989c05d3", which
indicated it was a block from an older filesystem. The reason that
log recovery didn't replay it was that the LSN in the XARM block was
larger than the LSN of the transaction being replayed, and so the
block was not overwritten by log recovery.
Hence, log recovery cant blindly trust the magic number and LSN in
the block - it must verify that it belongs to the filesystem being
recovered before using the LSN. i.e. if the UUIDs don't match, we
need to unconditionally recovery the change held in the log.
This patch was first tested on a block device that was repeatedly
causing xfs/182 to fail with the same failure on the same block with
the same directory read corruption signature (i.e. XARM block). It
did not fail, and hasn't failed since.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
It uses a kernel internal structure in it's definition rather than
the user visible structure that is passed to the ioctl.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
When we free an inode, we do so via RCU. As an RCU lookup can occur
at any time before we free an inode, and that lookup takes the inode
flags lock, we cannot safely assert that the flags lock is not held
just before marking it dead and running call_rcu() to free the
inode.
We check on allocation of a new inode structre that the lock is not
held, so we still have protection against locks being leaked and
hence not correctly initialised when allocated out of the slab.
Hence just remove the assert...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Regression introduced by commit 46f9d2e ("xfs: aborted buf items can
be in the AIL") which fails to lock the AIL before removing the
item. Spinlock debugging throws a warning about this.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Pull CIFS fixes from Steve French:
"Two minor cifs fixes and a minor documentation cleanup for cifs.txt"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update cifs.txt and remove some outdated infos
cifs: Avoid calling unlock_page() twice in cifs_readpage() when using fscache
cifs: Do not take a reference to the page in cifs_readpage_worker()
|
|
Pull ubifs fix from Artem Bityutskiy:
"Just one patch which fixes the power-cut recovery testing mode.
I'll start using a single UBI/UBIFS tree instead of 2 trees from now
on. So in the future you'll get 1 small pull request instead of 2
tiny ones"
* tag 'upstream-3.12-rc1' of git://git.infradead.org/linux-ubifs:
UBIFS: remove invalid warn msg with tst_recovery enabled
|
|
Sedat points out that I transposed some letters in "LRU" and wrote "RLU"
instead in one of the new comments explaining the flow. Let's just fix
it.
Reported-by: Sedat Dilek <sedat.dilek@jpberlin.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull writeback fix from Wu Fengguang:
"A trivial writeback fix"
* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Do not sort b_io list only because of block device inode
|
|
The LRU list changes interacted badly with our nr_dentry_unused
accounting, and even worse with the new DCACHE_LRU_LIST bit logic.
This introduces helper functions to make sure everything follows the
proper dcache d_lru list rules: the dentry cache is complicated by the
fact that some of the hotpaths don't even want to look at the LRU list
at all, and the fact that we use the same list entry in the dentry for
both the LRU list and for our temporary shrinking lists when removing
things from the LRU.
The helper functions temporarily have some extra sanity checking for the
flag bits that have to match the current LRU state of the dentry. We'll
remove that before the final 3.12 release, but considering how easy it
is to get wrong, this first cleanup version has some very particular
sanity checking.
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When reading a single page with cifs_readpage(), we make a call to
fscache_read_or_alloc_page() which once done, asynchronously calls
the completion function cifs_readpage_from_fscache_complete(). This
completion function unlocks the page once it has been populated from
cache. The module then attempts to unlock the page a second time in
cifs_readpage() which leads to warning messages.
In case of a successful call to fscache_read_or_alloc_page() we should skip
the second unlock_page() since this will be called by the
cifs_readpage_from_fscache_complete() once the page has been populated by
fscache.
With the modifications to cifs_readpage_worker(), we will need to re-grab the
page lock in cifs_write_begin().
The problem was first noticed when testing new fscache patches for cifs.
https://bugzilla.redhat.com/show_bug.cgi?id=1005737
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
|
We do not need to take a reference to the pagecache in
cifs_readpage_worker() since the calling function will have already
taken one before passing the pointer to the page as an argument to the
function.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
|
Pull aio changes from Ben LaHaise:
"First off, sorry for this pull request being late in the merge window.
Al had raised a couple of concerns about 2 items in the series below.
I addressed the first issue (the race introduced by Gu's use of
mm_populate()), but he has not provided any further details on how he
wants to rework the anon_inode.c changes (which were sent out months
ago but have yet to be commented on).
The bulk of the changes have been sitting in the -next tree for a few
months, with all the issues raised being addressed"
* git://git.kvack.org/~bcrl/aio-next: (22 commits)
aio: rcu_read_lock protection for new rcu_dereference calls
aio: fix race in ring buffer page lookup introduced by page migration support
aio: fix rcu sparse warnings introduced by ioctx table lookup patch
aio: remove unnecessary debugging from aio_free_ring()
aio: table lookup: verify ctx pointer
staging/lustre: kiocb->ki_left is removed
aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3"
aio: be defensive to ensure request batching is non-zero instead of BUG_ON()
aio: convert the ioctx list to table lookup v3
aio: double aio_max_nr in calculations
aio: Kill ki_dtor
aio: Kill ki_users
aio: Kill unneeded kiocb members
aio: Kill aio_rw_vect_retry()
aio: Don't use ctx->tail unnecessarily
aio: io_cancel() no longer returns the io_event
aio: percpu ioctx refcount
aio: percpu reqs_available
aio: reqs_active -> reqs_available
aio: fix build when migration is disabled
...
|
|
Pull xfs update #2 from Ben Myers:
"Here we have defrag support for v5 superblock, a number of bugfixes
and a cleanup or two.
- defrag support for CRC filesystems
- fix endian worning in xlog_recover_get_buf_lsn
- fixes for sparse warnings
- fix for assert in xfs_dir3_leaf_hdr_from_disk
- fix for log recovery of remote symlinks
- fix for log recovery of btree root splits
- fixes formemory allocation failures with ACLs
- fix for assert in xfs_buf_item_relse
- fix for assert in xfs_inode_buf_verify
- fix an assignment in an assert that should be a test in
xfs_bmbt_change_owner
- remove dead code in xlog_recover_inode_pass2"
* tag 'xfs-for-linus-v3.12-rc1-2' of git://oss.sgi.com/xfs/xfs:
xfs: remove dead code from xlog_recover_inode_pass2
xfs: = vs == typo in ASSERT()
xfs: don't assert fail on bad inode numbers
xfs: aborted buf items can be in the AIL.
xfs: factor all the kmalloc-or-vmalloc fallback allocations
xfs: fix memory allocation failures with ACLs
xfs: ensure we copy buffer type in da btree root splits
xfs: set remote symlink buffer type for recovery
xfs: recovery of swap extents operations for CRC filesystems
xfs: swap extents operations for CRC filesystems
xfs: check magic numbers in dir3 leaf verifier first
xfs: fix some minor sparse warnings
xfs: fix endian warning in xlog_recover_get_buf_lsn()
|
|
Merge more patches from Andrew Morton:
"The rest of MM. Plus one misc cleanup"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
mm/Kconfig: add MMU dependency for MIGRATION.
kernel: replace strict_strto*() with kstrto*()
mm, thp: count thp_fault_fallback anytime thp fault fails
thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page()
thp: do_huge_pmd_anonymous_page() cleanup
thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()
mm: cleanup add_to_page_cache_locked()
thp: account anon transparent huge pages into NR_ANON_PAGES
truncate: drop 'oldsize' truncate_pagecache() parameter
mm: make lru_add_drain_all() selective
memcg: document cgroup dirty/writeback memory statistics
memcg: add per cgroup writeback pages accounting
memcg: check for proper lock held in mem_cgroup_update_page_stat
memcg: remove MEMCG_NR_FILE_MAPPED
memcg: reduce function dereference
memcg: avoid overflow caused by PAGE_ALIGN
memcg: rename RESOURCE_MAX to RES_COUNTER_MAX
memcg: correct RESOURCE_MAX to ULLONG_MAX
mm: memcg: do not trap chargers with full callstack on OOM
mm: memcg: rework and document OOM waiting and wakeup
...
|
|
We use NR_ANON_PAGES as base for reporting AnonPages to user. There's
not much sense in not accounting transparent huge pages there, but add
them on printing to user.
Let's account transparent huge pages in NR_ANON_PAGES in the first place.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Ning Qu <quning@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
truncate_pagecache() doesn't care about old size since commit
cedabed49b39 ("vfs: Fix vmtruncate() regression"). Let's drop it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 4 from Al Viro:
"list_lru pile, mostly"
This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.
Additionally, a few fixes.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
super: fix for destroy lrus
list_lru: dynamically adjust node arrays
shrinker: Kill old ->shrink API.
shrinker: convert remaining shrinkers to count/scan API
staging/lustre/libcfs: cleanup linux-mem.h
staging/lustre/ptlrpc: convert to new shrinker API
staging/lustre/obdclass: convert lu_object shrinker to count/scan API
staging/lustre/ldlm: convert to shrinkers to count/scan API
hugepage: convert huge zero page shrinker to new shrinker API
i915: bail out earlier when shrinker cannot acquire mutex
drivers: convert shrinkers to new count/scan API
fs: convert fs shrinkers to new scan/count API
xfs: fix dquot isolation hang
xfs-convert-dquot-cache-lru-to-list_lru-fix
xfs: convert dquot cache lru to list_lru
xfs: rework buffer dispose list tracking
xfs-convert-buftarg-lru-to-generic-code-fix
xfs: convert buftarg LRU to generic code
fs: convert inode and dentry shrinking to be node aware
vmscan: per-node deferred work
...
|
|
Pull NFS client bugfixes (part 2) from Trond Myklebust:
"Bugfixes:
- Fix a few credential reference leaks resulting from the
SP4_MACH_CRED NFSv4.1 state protection code.
- Fix the SUNRPC bloatometer footprint: convert a 256K hashtable into
the intended 64 byte structure.
- Fix a long standing XDR issue with FREE_STATEID
- Fix a potential WARN_ON spamming issue
- Fix a missing dprintk() kuid conversion
New features:
- Enable the NFSv4.1 state protection support for the WRITE and
COMMIT operations"
* tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: No, I did not intend to create a 256KiB hashtable
sunrpc: Add missing kuids conversion for printing
NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCE
NFSv4.1: sp4_mach_cred: no need to ref count creds
NFSv4.1: fix SECINFO* use of put_rpccred
NFSv4.1: sp4_mach_cred: ask for WRITE and COMMIT
NFSv4.1 fix decode_free_stateid
|
|
This avoids the spinlocks and refcounts in the d_path() sequence too
(used by /proc and various other entities). See commit 8b19e34188a3 for
the equivalent getcwd() system call path.
And unlike getcwd(), d_path() doesn't copy the result to user space, so
I don't need to fear _that_ particular bug happening again.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It's a pathname. It should use the pathname allocators and
deallocators, and PATH_MAX instead of PAGE_SIZE. Never mind that the
two are commonly the same.
With this, the allocations scale up nicely too, and I can do getcwd()
system calls at a rate of about 300M/s, with no lock contention
anywhere.
Of course, nobody sane does that, especially since getcwd() is
traditionally a very slow operation in Unix. But this was also the
simplest way to benchmark the prepend_path() improvements by Waiman, and
once I saw the profiles I couldn't leave it well enough alone.
But apart from being an performance improvement (from using per-cpu slab
allocators instead of the raw page allocator), it's actually a valid and
real cleanup.
Signed-off-by: Linus "OCD" Torvalds <torvalds@linux-foundation.org>
|
|
Oops. That wasn't very smart. We don't actually need the RCU lock any
more by the time we copy the cwd string to user space, but I had
stupidly surrounded the whole thing with it.
Introduced by commit 8b19e34188a3 ("vfs: make getcwd() get the root and
pwd path under rcu")
Is-a-big-hairy-idiot: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This allows us to skip all the crazy spinlocks and reference count
updates, and instead use the fs sequence read-lock to get an atomic
snapshot of the root and cwd information.
We might want to make the rule that "prepend_path()" is always called
with the RCU lock held, but the RCU lock nests fine and this is the
minimal fix.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Let's not pollute the include files with inline functions that are only
used in a single place. Especially not if we decide we might want to
change the semantics of said function to make it more efficient..
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
"This is against 3.11-rc7, but was pulled and tested against your tree
as of yesterday. We do have two small incrementals queued up, but I
wanted to get this bunch out the door before I hop on an airplane.
This is a fairly large batch of fixes, performance improvements, and
cleanups from the usual Btrfs suspects.
We've included Stefan Behren's work to index subvolume UUIDs, which is
targeted at speeding up send/receive with many subvolumes or snapshots
in place. It closes a long standing performance issue that was built
in to the disk format.
Mark Fasheh's offline dedup work is also here. In this case offline
means the FS is mounted and active, but the dedup work is not done
inline during file IO. This is a building block where utilities are
able to ask the FS to dedup a series of extents. The kernel takes
care of verifying the data involved really is the same. Today this
involves reading both extents, but we'll continue to evolve the
patches"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits)
Btrfs: optimize key searches in btrfs_search_slot
Btrfs: don't use an async starter for most of our workers
Btrfs: only update disk_i_size as we remove extents
Btrfs: fix deadlock in uuid scan kthread
Btrfs: stop refusing the relocation of chunk 0
Btrfs: fix memory leak of uuid_root in free_fs_info
btrfs: reuse kbasename helper
btrfs: return btrfs error code for dev excl ops err
Btrfs: allow partial ordered extent completion
Btrfs: convert all bug_ons in free-space-cache.c
Btrfs: add support for asserts
Btrfs: adjust the fs_devices->missing count on unmount
Btrf: cleanup: don't check for root_refs == 0 twice
Btrfs: fix for patch "cleanup: don't check the same thing twice"
Btrfs: get rid of one BUG() in write_all_supers()
Btrfs: allocate prelim_ref with a slab allocater
Btrfs: pass gfp_t to __add_prelim_ref() to avoid always using GFP_ATOMIC
Btrfs: fix race conditions in BTRFS_IOC_FS_INFO ioctl
Btrfs: fix race between removing a dev and writing sbs
Btrfs: remove ourselves from the cluster list under lock
...
|
|
This patch modifies read_seqbegin_or_lock() and need_seqretry() to use
newly introduced read_seqlock_excl() and read_sequnlock_excl()
primitives so that they won't change the sequence number even if they
fall back to take the lock. This is OK as no change to the protected
data structure is being made.
It will prevent one fallback to lock taking from cascading into a series
of lock taking reducing performance because of the sequence number
change. It will also allow other sequence readers to go forward while
an exclusive reader lock is taken.
This patch also updates some of the inaccurate comments in the code.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Additional code in the error handler of xlog_recover_inode_pass2()
results in the following error:
static checker warning: "fs/xfs/xfs_log_recover.c:2999
xlog_recover_inode_pass2()
info: ignoring unreachable code."
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
There is a '=' vs '==' typo so the ASSERT()s are always true.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Pull CIFS fixes from Steve French:
"CIFS update including case insensitive file name matching improvements
for UTF-8 to Unicode, various small cifs fixes, SMB2/SMB3 leasing
improvements, support for following SMB2 symlinks, SMB3 packet signing
improvements"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (25 commits)
CIFS: Respect epoch value from create lease context v2
CIFS: Add create lease v2 context for SMB3
CIFS: Move parsing lease buffer to ops struct
CIFS: Move creating lease buffer to ops struct
CIFS: Store lease state itself rather than a mapped oplock value
CIFS: Replace clientCanCache* bools with an integer
[CIFS] quiet sparse compile warning
cifs: Start using per session key for smb2/3 for signature generation
cifs: Add a variable specific to NTLMSSP for key exchange.
cifs: Process post session setup code in respective dialect functions.
CIFS: convert to use le32_add_cpu()
CIFS: Fix missing lease break
CIFS: Fix a memory leak when a lease break comes
cifs: add winucase_convert.pl to Documentation/ directory
cifs: convert case-insensitive dentry ops to use new case conversion routines
cifs: add new case-insensitive conversion routines that are based on wchar_t's
[CIFS] Add Scott to list of cifs contributors
cifs: Move and expand MAX_SERVER_SIZE definition
cifs: Expand max share name length to 256
cifs: Move string length definitions to uapi
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs fixes from Tyler Hicks:
"Two small fixes to the code that initializes the per-file crypto
contexts"
* tag 'ecryptfs-3.12-rc1-crypt-ctx' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
ecryptfs: avoid ctx initialization race
ecryptfs: remove check for if an array is NULL
|
|
Merge first patch-bomb from Andrew Morton:
- Some pidns/fork/exec tweaks
- OCFS2 updates
- Most of MM - there remain quite a few memcg parts which depend on
pending core cgroups changes. Which might have been already merged -
I'll check tomorrow...
- Various misc stuff all over the place
- A few block bits which I never got around to sending to Jens -
relatively minor things.
- MAINTAINERS maintenance
- A small number of lib/ updates
- checkpatch updates
- epoll
- firmware/dmi-scan
- Some kprobes work for S390
- drivers/rtc updates
- hfsplus feature work
- vmcore feature work
- rbtree upgrades
- AOE updates
- pktcdvd cleanups
- PPS
- memstick
- w1
- New "inittmpfs" feature, which does the obvious
- More IPC work from Davidlohr.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (303 commits)
lz4: fix compression/decompression signedness mismatch
ipc: drop ipc_lock_check
ipc, shm: drop shm_lock_check
ipc: drop ipc_lock_by_ptr
ipc, shm: guard against non-existant vma in shmdt(2)
ipc: document general ipc locking scheme
ipc,msg: drop msg_unlock
ipc: rename ids->rw_mutex
ipc,shm: shorten critical region for shmat
ipc,shm: cleanup do_shmat pasta
ipc,shm: shorten critical region for shmctl
ipc,shm: make shmctl_nolock lockless
ipc,shm: introduce shmctl_nolock
ipc: drop ipcctl_pre_down
ipc,shm: shorten critical region in shmctl_down
ipc,shm: introduce lockless functions to obtain the ipc object
initmpfs: use initramfs if rootfstype= or root= specified
initmpfs: make rootfs use tmpfs when CONFIG_TMPFS enabled
initmpfs: move rootfs code from fs/ramfs/ to init/
initmpfs: move bdi setup from init_rootfs to init_ramfs
...
|
|
When the rootfs code was a wrapper around ramfs, having them in the same
file made sense. Now that it can wrap another filesystem type, move it in
with the init code instead.
This also allows a subsequent patch to access rootfstype= command line
arg.
Signed-off-by: Rob Landley <rob@landley.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Even though ramfs hasn't got a backing device, commit e0bf68ddec4f ("mm:
bdi init hooks") added one anyway, and put the initialization in
init_rootfs() since that's the first user, leaving it out of init_ramfs()
to avoid duplication.
But initmpfs uses init_tmpfs() instead, so move the init into the
filesystem's init function, add a "once" guard to prevent duplicate
initialization, and call the filesystem init from rootfs init.
This goes part of the way to allowing ramfs to be built as a module.
[akpm@linux-foundation.org; using bit 1 was odd]
Signed-off-by: Rob Landley <rob@landley.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Mounting MS_NOUSER prevents --bind mounts from rootfs. Prevent new rootfs
mounts with a different mechanism that doesn't affect bind mounts.
Signed-off-by: Rob Landley <rob@landley.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|