summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2013-04-29memcg: simplify mem_cgroup_iterMichal Hocko
The current implementation of mem_cgroup_iter has to consider both css and memcg to find out whether no group has been found (css==NULL - aka the loop is completed) and that no memcg is associated with the found node (!memcg - aka css_tryget failed because the group is no longer alive). This leads to awkward tweaks like tests for css && !memcg to skip the current node. It will be much easier if we got rid off css variable altogether and only rely on memcg. In order to do that the iteration part has to skip dead nodes. This sounds natural to me and as a nice side effect we will get a simple invariant that memcg is always alive when non-NULL and all nodes have been visited otherwise. We could get rid of the surrounding while loop but keep it in for now to make review easier. It will go away in the following patch. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Ying Han <yinghan@google.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29memcg: relax memcg iter cachingMichal Hocko
Now that the per-node-zone-priority iterator caches memory cgroups rather than their css ids we have to be careful and remove them from the iterator when they are on the way out otherwise they might live for unbounded amount of time even though their group is already gone (until the global/targeted reclaim triggers the zone under priority to find out the group is dead and let it to find the final rest). We can fix this issue by relaxing rules for the last_visited memcg. Instead of taking a reference to the css before it is stored into iter->last_visited we can just store its pointer and track the number of removed groups from each memcg's subhierarchy. This number would be stored into iterator everytime when a memcg is cached. If the iter count doesn't match the curent walker root's one we will start from the root again. The group counter is incremented upwards the hierarchy every time a group is removed. The iter_lock can be dropped because racing iterators cannot leak the reference anymore as the reference count is not elevated for last_visited when it is cached. Locking rules got a bit complicated by this change though. The iterator primarily relies on rcu read lock which makes sure that once we see a valid last_visited pointer then it will be valid for the whole RCU walk. smp_rmb makes sure that dead_count is read before last_visited and last_dead_count while smp_wmb makes sure that last_visited is updated before last_dead_count so the up-to-date last_dead_count cannot point to an outdated last_visited. css_tryget then makes sure that the last_visited is still alive in case the iteration races with the cached group removal (css is invalidated before mem_cgroup_css_offline increments dead_count). In short: mem_cgroup_iter rcu_read_lock() dead_count = atomic_read(parent->dead_count) smp_rmb() if (dead_count != iter->last_dead_count) last_visited POSSIBLY INVALID -> last_visited = NULL if (!css_tryget(iter->last_visited)) last_visited DEAD -> last_visited = NULL next = find_next(last_visited) css_tryget(next) css_put(last_visited) // css would be invalidated and parent->dead_count // incremented if this was the last reference iter->last_visited = next smp_wmb() iter->last_dead_count = dead_count rcu_read_unlock() cgroup_rmdir cgroup_destroy_locked atomic_add(CSS_DEACT_BIAS, &css->refcnt) // subsequent css_tryget fail mem_cgroup_css_offline mem_cgroup_invalidate_reclaim_iterators while(parent = parent_mem_cgroup) atomic_inc(parent->dead_count) css_put(css) // last reference held by cgroup core Spotted by Ying Han. Original idea from Johannes Weiner. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ying Han <yinghan@google.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29memcg: rework mem_cgroup_iter to use cgroup iteratorsMichal Hocko
mem_cgroup_iter curently relies on css->id when walking down a group hierarchy tree. This is really awkward because the tree walk depends on the groups creation ordering. The only guarantee is that a parent node is visited before its children. Example: 1) mkdir -p a a/d a/b/c 2) mkdir -a a/b/c a/d Will create the same trees but the tree walks will be different: 1) a, d, b, c 2) a, b, c, d Commit 574bd9f7c7c1 ("cgroup: implement generic child / descendant walk macros") has introduced generic cgroup tree walkers which provide either pre-order or post-order tree walk. This patch converts css->id based iteration to pre-order tree walk to keep the semantic with the original iterator where parent is always visited before its subtree. cgroup_for_each_descendant_pre suggests using post_create and pre_destroy for proper synchronization with groups addidition resp. removal. This implementation doesn't use those because a new memory cgroup is initialized sufficiently for iteration in mem_cgroup_css_alloc already and css reference counting enforces that the group is alive for both the last seen cgroup and the found one resp. it signals that the group is dead and it should be skipped. If the reclaim cookie is used we need to store the last visited group into the iterator so we have to be careful that it doesn't disappear in the mean time. Elevated reference count on the css keeps it alive even though the group have been removed (parked waiting for the last dput so that it can be freed). Per node-zone-prio iter_lock has been introduced to ensure that css_tryget and iter->last_visited is set atomically. Otherwise two racing walkers could both take a references and only one release it leading to a css leak (which pins cgroup dentry). Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Ying Han <yinghan@google.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29memcg: keep prev's css alive for the whole mem_cgroup_iterMichal Hocko
The patchset tries to make mem_cgroup_iter saner in the way how it walks hierarchies. css->id based traversal is far from being ideal as it is not deterministic because it depends on the creation ordering. Additional to that css_id is considered a burden for cgroup maintainers because it is quite some code and memcg is the last user of it. After this series only the swap accounting uses css_id but that one will follow up later. Diffstat (if we exclude removed/added comments) looks quite promising. We got rid of some code: $ git diff mmotm... | grep -v "^[+-][[:space:]]*[/ ]\*" | diffstat b/include/linux/cgroup.h | 3 --- kernel/cgroup.c | 33 --------------------------------- mm/memcontrol.c | 4 +++- 3 files changed, 3 insertions(+), 37 deletions(-) The first patch is just preparatory and it changes when we release css of the previously returned memcg. Nothing controlversial. The second patch is the core of the patchset and it replaces css_get_next based on css_id by the generic cgroup pre-order. This brings some chalanges for the last visited group caching during the reclaim (mem_cgroup_per_zone::reclaim_iter). We have to use memcg pointers directly now which means that we have to keep a reference to those groups' css to keep them alive. I also folded iter_lock introduced by https://lkml.org/lkml/2013/1/3/295 in the previous version into this patch. Johannes felt the race I was describing should be mostly harmless and I haven't been able to trigger it so the lock doesn't deserve its own patch. It is still needed temporarily, though, because the reference counting on iter->last_visited depends on it. It will go away with the next patch. The next patch fixups an unbounded cgroup removal holdoff caused by the elevated css refcount. The issue has been observed by Ying Han. Johannes wasn't impressed by the previous version of the fix (https://lkml.org/lkml/2013/2/8/379) which cleaned up pending references during mem_cgroup_css_offline when a group is removed. He has suggested a different way when the iterator checks whether a cached memcg is still valid or no. More on that in the patch but the basic idea is that every memcg tracks the number removed subgroups and iterator records this number when a group is cached. These numbers are checked before iter->last_visited is about to be used and the iteration is restarted if it is invalid. The fourth and fifth patches are an attempt for simplification of the mem_cgroup_iter. css juggling is removed and the iteration logic is moved to a helper so that the reference counting and iteration are separated. The last patch just removes css_get_next as there is no user for it any longer. My testing looked as follows: A (use_hierarchy=1, limit_in_bytes=150M) /|\ 1 2 3 Children groups were created so that the number is never higher than 3 and their limits were random between 50-100M. Each group hosts a kernel build (starting with tar -xf so the tree is not shared and make -jNUM_CPUs/3) and terminated after random time - up to 5 minutes) and then it is removed. This should exercise both leaf and hierarchical reclaim as well as races with cgroup removals and debugging messages I added on top proved that. 100 groups were created during the test. This patch: css reference counting keeps the cgroup alive even though it has been already removed. mem_cgroup_iter relies on this fact and takes a reference to the returned group. The reference is then released on the next iteration or mem_cgroup_iter_break. mem_cgroup_iter currently releases the reference right after it gets the last css_id. This is correct because neither prev's memcg nor cgroup are accessed after then. This will change in the next patch so we need to hold the group alive a bit longer so let's move the css_put at the end of the function. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Ying Han <yinghan@google.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/x86: use free_highmem_page() to free highmem pages into buddy systemJiang Liu
Use helper function free_highmem_page() to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Cong Wang <amwang@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Attilio Rao <attilio.rao@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/um: use free_highmem_page() to free highmem pages into buddy systemJiang Liu
Use helper function free_highmem_page() to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/SPARC: use free_highmem_page() to free highmem pages into buddy systemJiang Liu
Use helper function free_highmem_page() to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/PPC: use free_highmem_page() to free highmem pages into buddy systemJiang Liu
Use helper function free_highmem_page() to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Alexander Graf <agraf@suse.de> Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/MIPS: use free_highmem_page() to free highmem pages into buddy systemJiang Liu
Use helper function free_highmem_page() to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Cong Wang <amwang@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/microblaze: use free_highmem_page() to free highmem pages into buddy systemJiang Liu
Use helper function free_highmem_page() to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/metag: use free_highmem_page() to free highmem pages into buddy systemJiang Liu
Use helper function free_highmem_page() to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: James Hogan <james.hogan@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/FRV: use free_highmem_page() to free highmem pages into buddy systemJiang Liu
Use helper function free_highmem_page() to free highmem pages into the buddy system. Also fix a bug that totalhigh_pages should be increased when freeing a highmem page into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/ARM: use free_highmem_page() to free highmem pages into buddy systemJiang Liu
Use helper function free_highmem_page() to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: introduce free_highmem_page() helper to free highmem pages into buddy systemJiang Liu
The original goal of this patchset is to fix the bug reported by https://bugzilla.kernel.org/show_bug.cgi?id=53501 Now it has also been expanded to reduce common code used by memory initializion. This is the second part, which applies to the previous part at: http://marc.info/?l=linux-mm&m=136289696323825&w=2 It introduces a helper function free_highmem_page() to free highmem pages into the buddy system when initializing mm subsystem. Introduction of free_highmem_page() is one step forward to clean up accesses and modificaitons of totalhigh_pages, totalram_pages and zone->managed_pages etc. I hope we could remove all references to totalhigh_pages from the arch/ subdirectory. We have only tested these patchset on x86 platforms, and have done basic compliation tests using cross-compilers from ftp.kernel.org. That means some code may not pass compilation on some architectures. So any help to test this patchset are welcomed! There are several other parts still under development: Part3: refine code to manage totalram_pages, totalhigh_pages and zone->managed_pages Part4: introduce helper functions to simplify mem_init() and remove the global variable num_physpages. This patch: Introduce helper function free_highmem_page(), which will be used by architectures with HIGHMEM enabled to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Attilio Rao <attilio.rao@citrix.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Cong Wang <amwang@redhat.com> Cc: David Daney <david.daney@cavium.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm,kexec: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Eric Biederman <ebiederm@xmission.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/metag: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: James Hogan <james.hogan@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/arc: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/xtensa: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/x86: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/unicore32: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/um: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/SH: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/score: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/s390: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/ppc: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Anatolij Gustschin <agust@denx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/parisc: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/openrisc: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Also include <asm/sections.h> to avoid local declarations. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Jonas Bonn <jonas@southpole.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/mn10300: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/MIPS: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/microblaze: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/m68k: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/m32r: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Also include <asm/sections.h> to avoid local declarations. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/IA64: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/h8300: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/FRV: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/cris: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Also include <asm/sections.h> to avoid local declaration. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Mikael Starvik <starvik@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/c6x: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/blackfin: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/avr32: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/ARM: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/alpha: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Also include <asm/sections.h> to avoid local declarations. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: introduce common help functions to deal with reserved/managed pagesJiang Liu
The original goal of this patchset is to fix the bug reported by https://bugzilla.kernel.org/show_bug.cgi?id=53501 Now it has also been expanded to reduce common code used by memory initializion. This is the first part, which applies to v3.9-rc1. It introduces following common helper functions to simplify free_initmem() and free_initrd_mem() on different architectures: adjust_managed_page_count(): will be used to adjust totalram_pages, totalhigh_pages, zone->managed_pages when reserving/unresering a page. __free_reserved_page(): free a reserved page into the buddy system without adjusting page statistics info free_reserved_page(): free a reserved page into the buddy system and adjust page statistics info mark_page_reserved(): mark a page as reserved and adjust page statistics info free_reserved_area(): free a continous ranges of pages by calling free_reserved_page() free_initmem_default(): default method to free __init pages. We have only tested these patchset on x86 platforms, and have done basic compliation tests using cross-compilers from ftp.kernel.org. That means some code may not pass compilation on some architectures. So any help to test this patchset are welcomed! There are several other parts still under development: Part2: introduce free_highmem_page() to simplify freeing highmem pages Part3: refine code to manage totalram_pages, totalhigh_pages and zone->managed_pages Part4: introduce helper functions to simplify mem_init() and remove the global variable num_physpages. This patch: Code to deal with reserved/managed pages are duplicated by many architectures, so introduce common help functions to reduce duplicated code. These common help functions will also be used to concentrate code to modify totalram_pages and zone->managed_pages, which makes the code much more clear. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Anatolij Gustschin <agust@denx.de> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/vmscan.c: minor cleanup for kswapdHillf Danton
Local variable total_scanned is no longer used. Signed-off-by: Hillf Danton <dhillf@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29direct-io: submit bio after boundary buffer is added to itJan Kara
Currently, dio_send_cur_page() submits bio before current page and cached sdio->cur_page is added to the bio if sdio->boundary is set. This is actually wrong because sdio->boundary means the current buffer is the last one before metadata needs to be read. So we should rather submit the bio after the current page is added to it. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29direct-io: fix boundary block handlingJan Kara
When we read/write a file sequentially, we will read/write not only the data blocks but also the indirect blocks that may not be physically adjacent to the data blocks. So filesystems set the BH_Boundary flag to submit the previous I/O before reading/writing an indirect block. However the generic direct IO code mishandles buffer_boundary(), setting sdio->boundary before each submit_page_section() call which results in sending only one page bios as underlying code thinks this page is the last in the contiguous extent. So fix the problem by setting sdio->boundary only if the current page is really the last one in the mapped extent. With this patch and "direct-io: submit bio after boundary buffer is added to it" I've measured about 10% throughput improvement of direct IO reads on ext3 with SATA harddrive (from 90 MB/s to 100 MB/s). With ramdisk, the improvement was about 3-fold (from 350 MB/s to 1.2 GB/s). For other filesystems (such as ext4), the improvements won't be as visible because the frequency of BH_Boundary flag being set is much smaller. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: walk_memory_range(): fix typo in commentToshi Kani
Fix a typo "end_pft" in the comment of walk_memory_range(). Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29memblock: add assertion for zero allocation alignmentVineet Gupta
This came to light when calling memblock allocator from arc port (for copying flattended DT). If a "0" alignment is passed, the allocator round_up() call incorrectly rounds up the size to 0. round_up(num, alignto) => ((num - 1) | (alignto -1)) + 1 While the obvious allocation failure causes kernel to panic, it is better to warn the caller to fix the code. Tejun suggested that instead of BUG_ON(!align) - which might be ineffective due to pending console init and such, it is better to WARN_ON, and continue the boot with a reasonable default align. Caller passing @size need not be handled similarly as the subsequent panic will indicate that anyhow. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29rmap: recompute pgoff for unmapping huge pageHillf Danton
We have to recompute pgoff if the given page is huge, since result based on HPAGE_SIZE is not approapriate for scanning the vma interval tree, as shown by commit 36e4f20af833 ("hugetlb: do not use vma_hugecache_offset() for vma_prio_tree_foreach"). Signed-off-by: Hillf Danton <dhillf@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29vm: adjust ifdef for TINY_RCUPaul E. McKenney
There is an ifdef in page_cache_get_speculative() that checks for !SMP and TREE_RCU, which has been an impossible combination since the advent of TINY_RCU. The ifdef enables a fastpath that is valid when preemption is disabled by rcu_read_lock() in UP systems, which is the case when TINY_RCU is enabled. This commit therefore adjusts the ifdef to generate the fastpath when TINY_RCU is enabled. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/shmem.c: remove an ifdefAndrew Morton
Create a CONFIG_MMU=y stub for ramfs_nommu_expand_for_mapping() in the usual fashion. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Wolfram Sang <wsa@the-dreams.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>