Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull a core sparse warning fix from Ingo Molnar
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm/memblock: Use NULL instead of 0 for pointers
|
|
are nested for them
Currently, cgroup hierarchy support is a mess. cpu related subsystems
behave correctly - configuration, accounting and control on a parent
properly cover its children. blkio and freezer completely ignore
hierarchy and treat all cgroups as if they're directly under the root
cgroup. Others show yet different behaviors.
These differing interpretations of cgroup hierarchy make using cgroup
confusing and it impossible to co-mount controllers into the same
hierarchy and obtain sane behavior.
Eventually, we want full hierarchy support from all subsystems and
probably a unified hierarchy. Users using separate hierarchies
expecting completely different behaviors depending on the mounted
subsystem is deterimental to making any progress on this front.
This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
for controllers which are lacking in hierarchy support. The goal of
this patch is two-fold.
* Move users away from using hierarchy on currently non-hierarchical
subsystems, so that implementing proper hierarchy support on those
doesn't surprise them.
* Keep track of which controllers are broken how and nudge the
subsystems to implement proper hierarchy support.
For now, start with a single warning message. We can whine louder
later on.
v2: Fixed a typo spotted by Michal. Warning message updated.
v3: Updated memcg part so that it doesn't generate warning in the
cases where .use_hierarchy=false doesn't make the behavior
different from root.use_hierarchy=true. Fixed a typo spotted by
Glauber.
v4: Check ->broken_hierarchy after cgroup creation is complete so that
->create() can affect the result per Michal. Dropped unnecessary
memcg root handling per Michal.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
|
|
DEADLOCK will be report while running a kernel with NUMA and LOCKDEP enabled,
the process of this fake report is:
kmem_cache_free() //free obj in cachep
-> cache_free_alien() //acquire cachep's l3 alien lock
-> __drain_alien_cache()
-> free_block()
-> slab_destroy()
-> kmem_cache_free() //free slab in cachep->slabp_cache
-> cache_free_alien() //acquire cachep->slabp_cache's l3 alien lock
Since the cachep and cachep->slabp_cache's l3 alien are in the same lock class,
fake report generated.
This should not happen since we already have init_lock_keys() which will
reassign the lock class for both l3 list and l3 alien.
However, init_lock_keys() was invoked at a wrong position which is before we
invoke enable_cpucache() on each cache.
Since until set slab_state to be FULL, we won't invoke enable_cpucache()
on caches to build their l3 alien while creating them, so although we invoked
init_lock_keys(), the l3 alien lock class won't change since we don't have
them until invoked enable_cpucache() later.
This patch will invoke init_lock_keys() after we done enable_cpucache()
instead of before to avoid the fake DEADLOCK report.
Michael traced the problem back to a commit in release 3.0.0:
commit 30765b92ada267c5395fc788623cb15233276f5c
Author: Peter Zijlstra <peterz@infradead.org>
Date: Thu Jul 28 23:22:56 2011 +0200
slab, lockdep: Annotate the locks before using them
Fernando found we hit the regular OFF_SLAB 'recursion' before we
annotate the locks, cure this.
The relevant portion of the stack-trace:
> [ 0.000000] [<c085e24f>] rt_spin_lock+0x50/0x56
> [ 0.000000] [<c04fb406>] __cache_free+0x43/0xc3
> [ 0.000000] [<c04fb23f>] kmem_cache_free+0x6c/0xdc
> [ 0.000000] [<c04fb2fe>] slab_destroy+0x4f/0x53
> [ 0.000000] [<c04fb396>] free_block+0x94/0xc1
> [ 0.000000] [<c04fc551>] do_tune_cpucache+0x10b/0x2bb
> [ 0.000000] [<c04fc8dc>] enable_cpucache+0x7b/0xa7
> [ 0.000000] [<c0bd9d3c>] kmem_cache_init_late+0x1f/0x61
> [ 0.000000] [<c0bba687>] start_kernel+0x24c/0x363
> [ 0.000000] [<c0bba0ba>] i386_start_kernel+0xa9/0xaf
Reported-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1311888176.2617.379.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The commit moved init_lock_keys() before we build up the alien, so we
failed to reclass it.
Cc: <stable@vger.kernel.org> # 3.0+
Acked-by: Christoph Lameter <cl@linux.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Tony Luck reported the following problem on IA-64:
Worked fine yesterday on next-20120905, crashes today. First sign of
trouble was an unaligned access, then a NULL dereference. SL*B related
bits of my config:
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLABINFO=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
And he console log.
PID hash table entries: 4096 (order: 1, 32768 bytes)
Dentry cache hash table entries: 262144 (order: 7, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 6, 1048576 bytes)
Memory: 2047920k/2086064k available (13992k code, 38144k reserved,
6012k data, 880k init)
kernel unaligned access to 0xca2ffc55fb373e95, ip=0xa0000001001be550
swapper[0]: error during unaligned kernel access
-1 [1]
Modules linked in:
Pid: 0, CPU 0, comm: swapper
psr : 00001010084a2018 ifs : 800000000000060f ip :
[<a0000001001be550>] Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
ip is at new_slab+0x90/0x680
unat: 0000000000000000 pfs : 000000000000060f rsc : 0000000000000003
rnat: 9666960159966a59 bsps: a0000001001441c0 pr : 9666960159965a59
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001001be500 b6 : a00000010112cb20 b7 : a0000001011660a0
f6 : 0fff7f0f0f0f0e54f0000 f7 : 0ffe8c5c1000000000000
f8 : 1000d8000000000000000 f9 : 100068800000000000000
f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
r1 : a00000010155eef0 r2 : 0000000000000000 r3 : fffffffffffc1638
r8 : e0000040600081b8 r9 : ca2ffc55fb373e95 r10 : 0000000000000000
r11 : e000004040001646 r12 : a000000101287e20 r13 : a000000101280000
r14 : 0000000000004000 r15 : 0000000000000078 r16 : ca2ffc55fb373e75
r17 : e000004040040000 r18 : fffffffffffc1646 r19 : e000004040001646
r20 : fffffffffffc15f8 r21 : 000000000000004d r22 : a00000010132fa68
r23 : 00000000000000ed r24 : 0000000000000000 r25 : 0000000000000000
r26 : 0000000000000001 r27 : a0000001012b8500 r28 : a00000010135f4a0
r29 : 0000000000000000 r30 : 0000000000000000 r31 : 0000000000000001
Unable to handle kernel NULL pointer dereference (address
0000000000000018)
swapper[0]: Oops 11003706212352 [2]
Modules linked in:
Pid: 0, CPU 0, comm: swapper
psr : 0000121008022018 ifs : 800000000000cc18 ip :
[<a0000001004dc8f1>] Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
ip is at __copy_user+0x891/0x960
unat: 0000000000000000 pfs : 0000000000000813 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 9666960159961765
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a00000010004b550 b6 : a00000010004b740 b7 : a00000010000c750
f6 : 000000000000000000000 f7 : 1003e9e3779b97f4a7c16
f8 : 1003e0a00000010001550 f9 : 100068800000000000000
f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
r1 : a00000010155eef0 r2 : a0000001012870b0 r3 : a0000001012870b8
r8 : 0000000000000298 r9 : 0000000000000013 r10 : 0000000000000000
r11 : 9666960159961a65 r12 : a000000101287010 r13 : a000000101280000
r14 : a000000101287068 r15 : a000000101287080 r16 : 0000000000000298
r17 : 0000000000000010 r18 : 0000000000000018 r19 : a000000101287310
r20 : 0000000000000290 r21 : 0000000000000000 r22 : 0000000000000000
r23 : a000000101386f58 r24 : 0000000000000000 r25 : 000000007fffffff
r26 : a000000101287078 r27 : a0000001013c69b0 r28 : 0000000000000000
r29 : 0000000000000014 r30 : 0000000000000000 r31 : 0000000000000813
Sedat Dilek and Hugh Dickins reported similar problems as well.
Earlier patches in the common set moved the zeroing of the kmem_cache
structure into common code. See "Move allocation of kmem_cache into
common code".
The allocation for the two special structures is still done from SLUB
specific code but no zeroing is done since the cache creation functions
used to zero. This now needs to be updated so that the structures are
zeroed during allocation in kmem_cache_init(). Otherwise random pointer
values may be followed.
Reported-by: Tony Luck <tony.luck@intel.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reported-by: Hugh Dickins <hughd@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Trivially triggerable, found by trinity:
kernel BUG at mm/mempolicy.c:2546!
Process trinity-child2 (pid: 23988, threadinfo ffff88010197e000, task ffff88007821a670)
Call Trace:
show_numa_map+0xd5/0x450
show_pid_numa_map+0x13/0x20
traverse+0xf2/0x230
seq_read+0x34b/0x3e0
vfs_read+0xac/0x180
sys_pread64+0xa2/0xc0
system_call_fastpath+0x1a/0x1f
RIP: mpol_to_str+0x156/0x360
Cc: stable@vger.kernel.org
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit 96d17b7be0a9849d381442030886211dbb2a7061 which
caused the following errors at boot:
[ 1.114885] kobject (ffff88001a802578): tried to init an initialized object, something is seriously wrong.
[ 1.114885] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6
[ 1.114885] Call Trace:
[ 1.114885] [<ffffffff81273f37>] kobject_init+0x87/0xa0
[ 1.115555] [<ffffffff8127426a>] kobject_init_and_add+0x2a/0x90
[ 1.115555] [<ffffffff8127c870>] ? sprintf+0x40/0x50
[ 1.115555] [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
[ 1.115555] [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
[ 1.115555] [<ffffffff81cf24cd>] ? md_init+0x144/0x144
[ 1.115555] [<ffffffff81cf25b6>] local_init+0xa4/0x11b
[ 1.115555] [<ffffffff81cf24e1>] dm_init+0x14/0x45
[ 1.115836] [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
[ 1.116834] [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
[ 1.117835] [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
[ 1.117835] [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
[ 1.118401] [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
[ 1.119832] [<ffffffff8171aff0>] ? gs_change+0xb/0xb
[ 1.120325] ------------[ cut here ]------------
[ 1.120835] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xc1/0xf0()
[ 1.121437] sysfs: cannot create duplicate filename '/kernel/slab/:t-0000016'
[ 1.121831] Modules linked in:
[ 1.122138] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6
[ 1.122831] Call Trace:
[ 1.123074] [<ffffffff81195ce1>] ? sysfs_add_one+0xc1/0xf0
[ 1.123833] [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
[ 1.124405] [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
[ 1.124832] [<ffffffff81195ce1>] sysfs_add_one+0xc1/0xf0
[ 1.125337] [<ffffffff81195eb3>] create_dir+0x73/0xd0
[ 1.125832] [<ffffffff81196221>] sysfs_create_dir+0x81/0xe0
[ 1.126363] [<ffffffff81273d3d>] kobject_add_internal+0x9d/0x210
[ 1.126832] [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
[ 1.127406] [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
[ 1.127832] [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
[ 1.128384] [<ffffffff81cf24cd>] ? md_init+0x144/0x144
[ 1.128833] [<ffffffff81cf25b6>] local_init+0xa4/0x11b
[ 1.129831] [<ffffffff81cf24e1>] dm_init+0x14/0x45
[ 1.130305] [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
[ 1.130831] [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
[ 1.131351] [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
[ 1.131830] [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
[ 1.132392] [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
[ 1.132830] [<ffffffff8171aff0>] ? gs_change+0xb/0xb
[ 1.133315] ---[ end trace 2703540871c8fab7 ]---
[ 1.133830] ------------[ cut here ]------------
[ 1.134274] WARNING: at lib/kobject.c:196 kobject_add_internal+0x1f5/0x210()
[ 1.134829] kobject_add_internal failed for :t-0000016 with -EEXIST, don't try to register things with the same name in the same directory.
[ 1.135829] Modules linked in:
[ 1.136135] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6
[ 1.136828] Call Trace:
[ 1.137071] [<ffffffff81273e95>] ? kobject_add_internal+0x1f5/0x210
[ 1.137830] [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
[ 1.138402] [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
[ 1.138830] [<ffffffff811955a3>] ? release_sysfs_dirent+0x73/0xf0
[ 1.139419] [<ffffffff81273e95>] kobject_add_internal+0x1f5/0x210
[ 1.139830] [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
[ 1.140429] [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
[ 1.140830] [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
[ 1.141829] [<ffffffff81cf24cd>] ? md_init+0x144/0x144
[ 1.142307] [<ffffffff81cf25b6>] local_init+0xa4/0x11b
[ 1.142829] [<ffffffff81cf24e1>] dm_init+0x14/0x45
[ 1.143307] [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
[ 1.143829] [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
[ 1.144352] [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
[ 1.144829] [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
[ 1.145405] [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
[ 1.145828] [<ffffffff8171aff0>] ? gs_change+0xb/0xb
[ 1.146313] ---[ end trace 2703540871c8fab8 ]---
Conflicts:
mm/slub.c
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Get rid of the refcount stuff in the allocators and do that part of
kmem_cache management in the common code.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Do the initial settings of the fields in common code. This will allow us
to push more processing into common code later and improve readability.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Shift the allocations to common code. That way the allocation and
freeing of the kmem_cache structures is handled by common code.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Simplify locking by moving the slab_add_sysfs after all locks have been
dropped. Eases the upcoming move to provide sysfs support for all
allocators.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
The slab aliasing logic causes some strange contortions in slub. So add
a call to deal with aliases to slab_common.c but disable it for other
slab allocators by providng stubs that fail to create aliases.
Full general support for aliases will require additional cleanup passes
and more standardization of fields in kmem_cache.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Duping of the slabname has to be done by each slab. Moving this code to
slab_common avoids duplicate implementations.
With this patch we have common string handling for all slab allocators.
Strings passed to kmem_cache_create() are copied internally. Subsystems
can create temporary strings to create slab caches.
Slabs allocated in early states of bootstrap will never be freed (and
those can never be freed since they are essential to slab allocator
operations). During bootstrap we therefore do not have to worry about
duping names.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
What is done there can be done in __kmem_cache_shutdown.
This affects RCU handling somewhat. On rcu free all slab allocators do
not refer to other management structures than the kmem_cache structure.
Therefore these other structures can be freed before the rcu deferred
free to the page allocator occurs.
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
The freeing action is basically the same in all slab allocators.
Move to the common kmem_cache_destroy() function.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Make all allocators use the "kmem_cache" slabname for the "kmem_cache"
structure.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
kmem_cache_destroy does basically the same in all allocators.
Extract common code which is easy since we already have common mutex
handling.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Move the code to append the new kmem_cache to the list of slab caches to
the kmem_cache_create code in the shared code.
This is possible now since the acquisition of the mutex was moved into
kmem_cache_create().
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Instead of using s == NULL use an errorcode. This allows much more
detailed diagnostics as to what went wrong. As we add more functionality
from the slab allocators to the common kmem_cache_create() function we will
also add more error conditions.
Print the error code during the panic as well as in a warning if the module
can handle failure. The API for kmem_cache_create() currently does not allow
the returning of an error code. Return NULL but log the cause of the problem
in the syslog.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Do not use kmalloc() but kmem_cache_alloc() for the allocation
of the kmem_cache structures in slub.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Add additional debugging to check that the objects is actually from the cache
the caller claims. Doing so currently trips up some other debugging code. It
takes a lot to infer from that what was happening.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
[ penberg@kernel.org: Use pr_err() ]
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
This type cleanup also fixes the following sparse warning:
mm/memblock.c:249:49: warning: Using plain integer as NULL pointer
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: patches@linaro.org
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Without this patch we can get (many) kmem trace events
with call site at krealloc().
This happens because krealloc is calling __krealloc,
which performs the allocation through kmalloc_track_caller.
Since neither krealloc nor __krealloc are marked inline explicitly,
the caller can be traced as being krealloc, which clearly is not
the intended behavior.
This patch allows to get the real caller of krealloc, by creating
an always inlined function __do_krealloc, thus tracing the
call site accurately.
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
cache_grow() can reenable irqs so the cpu (and node) can change, so ensure
that we take list_lock on the correct nodelist.
This fixes an issue with commit 072bb0aa5e06 ("mm: sl[au]b: add
knowledge of PFMEMALLOC reserve pages") where list_lock for the wrong
node was taken after growing the cache.
Reported-and-tested-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It marks pages as reserved, as the long description says.
Signed-off-by: Javi Merino <javi.merino@arm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Pull block-related fixes from Jens Axboe:
- Improvements to the buffered and direct write IO plugging from
Fengguang.
- Abstract out the mapping of a bio in a request, and use that to
provide a blk_bio_map_sg() helper. Useful for mapping just a bio
instead of a full request.
- Regression fix from Hugh, fixing up a patch that went into the
previous release cycle (and marked stable, too) attempting to prevent
a loop in __getblk_slow().
- Updates to discard requests, fixing up the sizing and how we align
them. Also a change to disallow merging of discard requests, since
that doesn't really work properly yet.
- A few drbd fixes.
- Documentation updates.
* 'for-linus' of git://git.kernel.dk/linux-block:
block: replace __getblk_slow misfix by grow_dev_page fix
drbd: Write all pages of the bitmap after an online resize
drbd: Finish requests that completed while IO was frozen
drbd: fix drbd wire compatibility for empty flushes
Documentation: update tunable options in block/cfq-iosched.txt
Documentation: update tunable options in block/cfq-iosched.txt
Documentation: update missing index files in block/00-INDEX
block: move down direct IO plugging
block: remove plugging at buffered write time
block: disable discard request merge temporarily
bio: Fix potential memory leak in bio_find_or_create_slab()
block: Don't use static to define "void *p" in show_partition_start()
block: Add blk_bio_map_sg() helper
block: Introduce __blk_segment_map_sg() helper
fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices
block: split discard into aligned requests
block: reorganize rounding of max_discard_sectors
|
|
Fix checkpatch warnings:
WARNING: consider using kstrto* in preference to simple_strtoul
for the below sys entry parsers:
/sys/block/<block device>/bdi/read_ahead_kb
/sys/block/<block device>/bdi/max_ratio
/sys/block/<block device>/bdi/min_ratio
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
|
|
Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.
$ size vmlinux.o
text data bss dec hex filename
4658782 880729 5195032 10734543 a3cbcf vmlinux.o
$ size vmlinux.o
text data bss dec hex filename
4658957 880729 5195032 10734718 a3cc7e vmlinux.o
v7:
- checkpatch warnings fixed
- Implement the changes requested by Hugh Dickins:
- make simple_xattrs_init and simple_xattrs_free inline
- get rid of locking and list reinitialization in simple_xattrs_free,
they're not needed
v6:
- no changes
v5:
- no changes
v4:
- move simple_xattrs_free() to fs/xattr.c
v3:
- in kmem_xattrs_free(), reinitialize the list
- use simple_xattr_* prefix
- introduce simple_xattr_add() to prevent direct list usage
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This tree contains misc fixlets: a perf script python binding fix, a
uprobes fix and a syscall tracing fix."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Add missing files to build the python binding
uprobes: Fix mmap_region()'s mm->mm_rb corruption if uprobe_mmap() fails
tracing/syscalls: Fix perf syscall tracing when syscall_nr == -1
|
|
Jim Schutt reported a problem that pointed at compaction contending
heavily on locks. The workload is straight-forward and in his own words;
The systems in question have 24 SAS drives spread across 3 HBAs,
running 24 Ceph OSD instances, one per drive. FWIW these servers
are dual-socket Intel 5675 Xeons w/48 GB memory. I've got ~160
Ceph Linux clients doing dd simultaneously to a Ceph file system
backed by 12 of these servers.
Early in the test everything looks fine
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
31 15 0 287216 576 38606628 0 0 2 1158 2 14 1 3 95 0 0
27 15 0 225288 576 38583384 0 0 18 2222016 203357 134876 11 56 17 15 0
28 17 0 219256 576 38544736 0 0 11 2305932 203141 146296 11 49 23 17 0
6 18 0 215596 576 38552872 0 0 7 2363207 215264 166502 12 45 22 20 0
22 18 0 226984 576 38596404 0 0 3 2445741 223114 179527 12 43 23 22 0
and then it goes to pot
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
163 8 0 464308 576 36791368 0 0 11 22210 866 536 3 13 79 4 0
207 14 0 917752 576 36181928 0 0 712 1345376 134598 47367 7 90 1 2 0
123 12 0 685516 576 36296148 0 0 429 1386615 158494 60077 8 84 5 3 0
123 12 0 598572 576 36333728 0 0 1107 1233281 147542 62351 7 84 5 4 0
622 7 0 660768 576 36118264 0 0 557 1345548 151394 59353 7 85 4 3 0
223 11 0 283960 576 36463868 0 0 46 1107160 121846 33006 6 93 1 1 0
Note that system CPU usage is very high blocks being written out has
dropped by 42%. He analysed this with perf and found
perf record -g -a sleep 10
perf report --sort symbol --call-graph fractal,5
34.63% [k] _raw_spin_lock_irqsave
|
|--97.30%-- isolate_freepages
| compaction_alloc
| unmap_and_move
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_slowpath
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| do_page_fault
| page_fault
| |
| |--87.39%-- skb_copy_datagram_iovec
| | tcp_recvmsg
| | inet_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call
| | __recv
| | |
| | --100.00%-- (nil)
| |
| --12.61%-- memcpy
--2.70%-- [...]
There was other data but primarily it is all showing that compaction is
contended heavily on the zone->lock and zone->lru_lock.
commit [b2eef8c0: mm: compaction: minimise the time IRQs are disabled
while isolating pages for migration] noted that it was possible for
migration to hold the lru_lock for an excessive amount of time. Very
broadly speaking this patch expands the concept.
This patch introduces compact_checklock_irqsave() to check if a lock
is contended or the process needs to be scheduled. If either condition
is true then async compaction is aborted and the caller is informed.
The page allocator will fail a THP allocation if compaction failed due
to contention. This patch also introduces compact_trylock_irqsave()
which will acquire the lock only if it is not contended and the process
does not need to schedule.
Reported-by: Jim Schutt <jaschut@sandia.gov>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 7db8889ab05b ("mm: have order > 0 compaction start off where it
left") introduced a caching mechanism to reduce the amount work the free
page scanner does in compaction. However, it has a problem. Consider
two process simultaneously scanning free pages
C
Process A M S F
|---------------------------------------|
Process B M FS
C is zone->compact_cached_free_pfn
S is cc->start_pfree_pfn
M is cc->migrate_pfn
F is cc->free_pfn
In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn accordingly.
Simultaneously, Process B finishes isolating in a block and updates
compact_cached_free_pfn again to the location of its free scanner.
Process A moves to "end_of_zone - one_pageblock" and runs this check
if (cc->order > 0 && (!cc->wrapped ||
zone->compact_cached_free_pfn >
cc->start_free_pfn))
pfn = min(pfn, zone->compact_cached_free_pfn);
compact_cached_free_pfn is above where it started so the free scanner
skips almost the entire space it should have scanned. When there are
multiple processes compacting it can end in a situation where the entire
zone is not being scanned at all. Further, it is possible for two
processes to ping-pong update to compact_cached_free_pfn which is just
random.
Overall, the end result wrecks allocation success rates.
There is not an obvious way around this problem without introducing new
locking and state so this patch takes a different approach.
First, it gets rid of the skip logic because it's not clear that it
matters if two free scanners happen to be in the same block but with
racing updates it's too easy for it to skip over blocks it should not.
Second, it updates compact_cached_free_pfn in a more limited set of
circumstances.
If a scanner has wrapped, it updates compact_cached_free_pfn to the end
of the zone. When a wrapped scanner isolates a page, it updates
compact_cached_free_pfn to point to the highest pageblock it
can isolate pages from.
If a scanner has not wrapped when it has finished isolated pages it
checks if compact_cached_free_pfn is pointing to the end of the
zone. If so, the value is updated to point to the highest
pageblock that pages were isolated from. This value will not
be updated again until a free page scanner wraps and resets
compact_cached_free_pfn.
This is not optimal and it can still race but the compact_cached_free_pfn
will be pointing to or very near a pageblock with free pages.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit cfd19c5a9ecf ("mm: only set page->pfmemalloc when
ALLOC_NO_WATERMARKS was used") tried to narrow down page->pfmemalloc
setting, but it missed some places the pfmemalloc should be set.
So, in __slab_alloc, the unalignment pfmemalloc and ALLOC_NO_WATERMARKS
cause incorrect deactivate_slab() on our core2 server:
64.73% fio [kernel.kallsyms] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|---0.34%-- deactivate_slab
| __slab_alloc
| kmem_cache_alloc
| |
That causes our fio sync write performance to have a 40% regression.
Move the checking in get_page_from_freelist() which resolves this issue.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Sage Weil <sage@inktank.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit aff622495c9a ("vmscan: only defer compaction for failed order and
higher") fixed bad deferring policy but made mistake about checking
compact_order_failed in __compact_pgdat(). So it can't update
compact_order_failed with the new order. This ends up preventing
correct operation of policy deferral. This patch fixes it.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Occasionally an isolated BUG_ON(mm->nr_ptes) gets reported, indicating
that not all the page tables allocated could be found and freed when
exit_mmap() tore down the user address space.
There's usually nothing we can say about it, beyond that it's probably a
sign of some bad memory or memory corruption; though it might still
indicate a bug in vma or page table management (and did recently reveal a
race in THP, fixed a few months ago).
But one overdue change we can make is from BUG_ON to WARN_ON.
It's fairly likely that the system will crash shortly afterwards in some
other way (for example, the BUG_ON(page_mapped(page)) in
__delete_from_page_cache(), once an inode mapped into the lost page tables
gets evicted); but might tell us more before that.
Change the BUG_ON(page_mapped) to WARN_ON too? Later perhaps: I'm less
eager, since that one has several times led to fixes.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Initalizers for deferrable delayed_work are confused.
* __DEFERRED_WORK_INITIALIZER()
* DECLARE_DEFERRED_WORK()
* INIT_DELAYED_WORK_DEFERRABLE()
Rename them to
* __DEFERRABLE_WORK_INITIALIZER()
* DECLARE_DEFERRABLE_WORK()
* INIT_DEFERRABLE_WORK()
This patch doesn't cause any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
This patch fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=843640
If mmap_region()->uprobe_mmap() fails, unmap_and_free_vma path
does unmap_region() but does not remove the soon-to-be-freed vma
from rb tree. Actually there are more problems but this is how
William noticed this bug.
Perhaps we could do do_munmap() + return in this case, but in
fact it is simply wrong to abort if uprobe_mmap() fails. Until
at least we move the !UPROBE_COPY_INSN code from
install_breakpoint() to uprobe_register().
For example, uprobe_mmap()->install_breakpoint() can fail if the
probed insn is not supported (remember, uprobe_register()
succeeds if nobody mmaps inode/offset), mmap() should not fail
in this case.
dup_mmap()->uprobe_mmap() is wrong too by the same reason,
fork() can race with uprobe_register() and fail for no reason if
it wins the race and does install_breakpoint() first.
And, if nothing else, both mmap_region() and dup_mmap() return
success if uprobe_mmap() fails. Change them to ignore the error
code from uprobe_mmap().
Reported-and-tested-by: William Cohen <wcohen@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # v3.5
Cc: Anton Arapov <anton@redhat.com>
Cc: William Cohen <wcohen@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120819171042.GB26957@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
page_get_cache() isn't called from anything, so remove it.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
kmem_cache_create() does cache integrity checks when CONFIG_DEBUG_VM is
defined. These checks interspersed with the regular code path has lead
to compile time warnings when compiled without CONFIG_DEBUG_VM defined.
Restructuring the code to move the integrity checks in to a new function
would eliminate the current compile warning problem and also will allow
for future changes to the debug only code to evolve without introducing
new warnings in the regular path.
This restructuring work is based on the discussion in the following
thread:
https://lkml.org/lkml/2012/7/13/424
[akpm@linux-foundation.org: fix build, cleanup]
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
This reverts commit 455ce9eb1cfa083da0def023094190aeb133855a. Andrew
sent a better version.
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
In current implementation, after unfreezing, we doesn't touch oldpage,
so it remain 'NOT NULL'. When we call this_cpu_cmpxchg()
with this old oldpage, this_cpu_cmpxchg() is mostly be failed.
We can change value of oldpage to NULL after unfreezing,
because unfreeze_partial() ensure that all the cpu partial slabs is removed
from cpu partial list. In this time, we could expect that
this_cpu_cmpxchg is mostly succeed.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Only applies to scenarios where debugging is on:
Validation of slabs can currently occur while debugging
information is updated from the fast paths of the allocator.
This results in various races where we get false reports about
slab metadata not being in order.
This patch makes the fast paths take the node lock so that
serialization with slab validation will occur. Causes additional
slowdown in debug scenarios.
Reported-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Eliminate an ifdef and a label by moving all the CONFIG_DEBUG_VM checking
inside the locked region.
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
page_get_cache() does not need to call compound_head(), as its unique
caller virt_to_slab() already makes sure to return a head page.
Additionally, removing the compound_head() call makes page_get_cache()
consistent with page_get_slab().
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
When freeing objects, the slub allocator will most of the time free
empty pages by calling __free_pages(). But high-order kmalloc will be
diposed by means of put_page() instead. It makes no sense to call
put_page() in kernel pages that are provided by the object allocators,
so we shouldn't be doing this ourselves. Aside from the consistency
change, we don't change the flow too much. put_page()'s would call its
dtor function, which is __free_pages. We also already do all of the
Compound page tests ourselves, and the Mlock test we lose don't really
matter.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
CC: David Rientjes <rientjes@google.com>
CC: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
* stable/for-linus-3.6:
mm/frontswap: fix uninit'ed variable warning
|
|
Fixes uninitialized variable warning on 'type' in frontswap_shrink().
type is set before use by __frontswap_unuse_pages() called by
__frontswap_shrink() called by frontswap_shrink() before use by
try_to_unuse().
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Move unplugging for direct I/O from around ->direct_IO() down to
do_blockdev_direct_IO(). This implicitly adds plugging for direct
writes.
CC: Li Shaohua <shli@fusionio.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Buffered write(2) is not directly tied to IO, so it's not suitable to
handle plug in generic_file_aio_write().
Note that plugging for O_SYNC writes is also removed. The user may pass
arbitrary @size arguments, which may be much larger than the preferable
I/O size, or may cross extent/device boundaries. Let the lower layers
handle the plugging. The plugging code here actually turns them into
no-ops.
CC: Li Shaohua <shli@fusionio.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Finally we can kill the 'sync_supers' kernel thread along with the
'->write_super()' superblock operation because all the users are gone.
Now every file-system is supposed to self-manage own superblock and
its dirty state.
The nice thing about killing this thread is that it improves power management.
Indeed, 'sync_supers' is a source of monotonic system wake-ups - it woke up
every 5 seconds no matter what - even if there were no dirty superblocks and
even if there were no file-systems using this service (e.g., btrfs and
journalled ext4 do not need it). So it was wasting power most of the time. And
because the thread was in the core of the kernel, all systems had to have it.
So I am quite happy to make it go away.
Interestingly, this thread is a left-over from the pdflush kernel thread which
was a self-forking kernel thread responsible for all the write-back in old
Linux kernels. It was turned into per-block device BDI threads, and
'sync_supers' was a left-over. Thus, R.I.P, pdflush as well.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Borislav Petkov reports that the new warning added in commit
88fdf75d1bb5 ("mm: warn if pg_data_t isn't initialized with zero")
triggers for him, and it is the node_start_pfn field that has already
been initialized once.
The call trace looks like this:
x86_64_start_kernel ->
x86_64_start_reservations ->
start_kernel ->
setup_arch ->
paging_init ->
zone_sizes_init ->
free_area_init_nodes ->
free_area_init_node
and (with the warning replaced by debug output), Borislav sees
On node 0 totalpages: 4193848
DMA zone: 64 pages used for memmap
DMA zone: 6 pages reserved
DMA zone: 3890 pages, LIFO batch:0
DMA32 zone: 16320 pages used for memmap
DMA32 zone: 798464 pages, LIFO batch:31
Normal zone: 52736 pages used for memmap
Normal zone: 3322368 pages, LIFO batch:31
free_area_init_node: pgdat->node_start_pfn: 4423680 <----
On node 1 totalpages: 4194304
Normal zone: 65536 pages used for memmap
Normal zone: 4128768 pages, LIFO batch:31
free_area_init_node: pgdat->node_start_pfn: 8617984 <----
On node 2 totalpages: 4194304
Normal zone: 65536 pages used for memmap
Normal zone: 4128768 pages, LIFO batch:31
free_area_init_node: pgdat->node_start_pfn: 12812288 <----
On node 3 totalpages: 4194304
Normal zone: 65536 pages used for memmap
Normal zone: 4128768 pages, LIFO batch:31
so remove the bogus warning for now to avoid annoying people. Minchan
Kim is looking at it.
Reported-by: Borislav Petkov <bp@amd64.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs pile from Al Viro:
"The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
deadlock reproduced by xfstests 068), symlink and hardlink restriction
patches, plus assorted cleanups and fixes.
Note that another fsfreeze deadlock (emergency thaw one) is *not*
dealt with - the series by Fernando conflicts a lot with Jan's, breaks
userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
for massive vfsmount leak; this is going to be handled next cycle.
There probably will be another pull request, but that stuff won't be
in it."
Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
delousing target_core_file a bit
Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
fs: Remove old freezing mechanism
ext2: Implement freezing
btrfs: Convert to new freezing mechanism
nilfs2: Convert to new freezing mechanism
ntfs: Convert to new freezing mechanism
fuse: Convert to new freezing mechanism
gfs2: Convert to new freezing mechanism
ocfs2: Convert to new freezing mechanism
xfs: Convert to new freezing code
ext4: Convert to new freezing mechanism
fs: Protect write paths by sb_start_write - sb_end_write
fs: Skip atime update on frozen filesystem
fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
fs: Improve filesystem freezing handling
switch the protection of percpu_counter list to spinlock
nfsd: Push mnt_want_write() outside of i_mutex
btrfs: Push mnt_want_write() outside of i_mutex
fat: Push mnt_want_write() outside of i_mutex
...
|