summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2011-05-25readahead: readahead page allocations are OK to failWu Fengguang
Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations. readahead page allocations are completely optional. They are OK to fail and in particular shall not trigger OOM on themselves. Reported-by: Dave Young <hidave.darkstar@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: break out page allocation warning codeDave Hansen
This originally started as a simple patch to give vmalloc() some more verbose output on failure on top of the plain page allocator messages. Johannes suggested that it might be nicer to lead with the vmalloc() info _before_ the page allocator messages. But, I do think there's a lot of value in what __alloc_pages_slowpath() does with its filtering and so forth. This patch creates a new function which other allocators can call instead of relying on the internal page allocator warnings. It also gives this function private rate-limiting which separates it from other printk_ratelimit() users. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: convert mm->cpu_vm_cpumask into cpumask_var_tKOSAKI Motohiro
cpumask_t is very big struct and cpu_vm_mask is placed wrong position. It might lead to reduce cache hit ratio. This patch has two change. 1) Move the place of cpumask into last of mm_struct. Because usually cpumask is accessed only front bits when the system has cpu-hotplug capability 2) Convert cpu_vm_mask into cpumask_var_t. It may help to reduce memory footprint if cpumask_size() will use nr_cpumask_bits properly in future. In addition, this patch change the name of cpu_vm_mask with cpu_vm_mask_var. It may help to detect out of tree cpu_vm_mask users. This patch has no functional change. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Howells <dhowells@redhat.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: uninline large generic tlb.h functionsPeter Zijlstra
Some of these functions have grown beyond inline sanity, move them out-of-line. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Requested-by: Andrew Morton <akpm@linux-foundation.org> Requested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: convert anon_vma->lock to a mutexPeter Zijlstra
Straightforward conversion of anon_vma->lock to a mutex. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: revert page_lock_anon_vma() lock annotationPeter Zijlstra
Its beyond ugly and gets in the way. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Namhyung Kim <namhyung@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: Convert i_mmap_lock to a mutexPeter Zijlstra
Straightforward conversion of i_mmap_lock to a mutex. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: Remove i_mmap_lock lockbreakPeter Zijlstra
Hugh says: "The only significant loser, I think, would be page reclaim (when concurrent with truncation): could spin for a long time waiting for the i_mmap_mutex it expects would soon be dropped? " Counter points: - cpu contention makes the spin stop (need_resched()) - zap pages should be freeing pages at a higher rate than reclaim ever can I think the simplification of the truncate code is definitely worth it. Effectively reverts: 2aa15890f3c ("mm: prevent concurrent unmap_mapping_range() on the same inode") and takes out the code that caused its problem. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25lockdep, mutex: provide mutex_lock_nest_lockPeter Zijlstra
In order to convert i_mmap_lock to a mutex we need a mutex equivalent to spin_lock_nest_lock(), thus provide the mutex_lock_nest_lock() annotation. As with spin_lock_nest_lock(), mutex_lock_nest_lock() allows annotation of the locking pattern where an outer lock serializes the acquisition order of nested locks. That is, if every time you lock multiple locks A, say A1 and A2 you first acquire N, the order of acquiring A1 and A2 is irrelevant. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: extended batches for generic mmu_gatherPeter Zijlstra
Instead of using a single batch (the small on-stack, or an allocated page), try and extend the batch every time it runs out and only flush once either the extend fails or we're done. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Requested-by: Nick Piggin <npiggin@kernel.dk> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm, powerpc: move the RCU page-table freeing into generic codePeter Zijlstra
In case other architectures require RCU freed page-tables to implement gup_fast() and software filled hashes and similar things, provide the means to do so by moving the logic into generic code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Requested-by: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: mmu_gather reworkPeter Zijlstra
Rework the existing mmu_gather infrastructure. The direct purpose of these patches was to allow preemptible mmu_gather, but even without that I think these patches provide an improvement to the status quo. The first 9 patches rework the mmu_gather infrastructure. For review purpose I've split them into generic and per-arch patches with the last of those a generic cleanup. The next patch provides generic RCU page-table freeing, and the followup is a patch converting s390 to use this. I've also got 4 patches from DaveM lined up (not included in this series) that uses this to implement gup_fast() for sparc64. Then there is one patch that extends the generic mmu_gather batching. After that follow the mm preemptibility patches, these make part of the mm a lot more preemptible. It converts i_mmap_lock and anon_vma->lock to mutexes which together with the mmu_gather rework makes mmu_gather preemptible as well. Making i_mmap_lock a mutex also enables a clean-up of the truncate code. This also allows for preemptible mmu_notifiers, something that XPMEM I think wants. Furthermore, it removes the new and universially detested unmap_mutex. This patch: Remove the first obstacle towards a fully preemptible mmu_gather. The current scheme assumes mmu_gather is always done with preemption disabled and uses per-cpu storage for the page batches. Change this to try and allocate a page for batching and in case of failure, use a small on-stack array to make some progress. Preemptible mmu_gather is desired in general and usable once i_mmap_lock becomes a mutex. Doing it before the mutex conversion saves us from having to rework the code by moving the mmu_gather bits inside the pte_lock. Also avoid flushing the tlb batches from under the pte lock, this is useful even without the i_mmap_lock conversion as it significantly reduces pte lock hold times. [akpm@linux-foundation.org: fix comment tpyo] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: make expand_downwards() symmetrical with expand_upwards()Michal Hocko
Currently we have expand_upwards exported while expand_downwards is accessible only via expand_stack or expand_stack_downwards. check_stack_guard_page is a nice example of the asymmetry. It uses expand_stack for VM_GROWSDOWN while expand_upwards is called for VM_GROWSUP case. Let's clean this up by exporting both functions and make those names consistent. Let's use expand_{upwards,downwards} because expanding doesn't always involve stack manipulation (an example is ia64_do_page_fault which uses expand_upwards for registers backing store expansion). expand_downwards has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because get_arg_page calls the downwards version in the early process initialization phase for growsup configuration. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Hugh Dickins <hughd@google.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25include/linux/gfp.h: convert BUG_ON() into VM_BUG_ON()Dave Hansen
VM_BUG_ON() if effectively a BUG_ON() undef #ifdef CONFIG_DEBUG_VM. That is exactly what we have here now, and two different folks have suggested doing it this way. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25include/linux/gfp.h: work around apparent sparse confusionDave Hansen
Running sparse on page_alloc.c today, it errors out: include/linux/gfp.h:254:17: error: bad constant expression include/linux/gfp.h:254:17: error: cannot size expression which is a line in gfp_zone(): BUILD_BUG_ON((GFP_ZONE_BAD >> bit) & 1); That's really unfortunate, because it ends up hiding all of the other legitimate sparse messages like this: mm/page_alloc.c:5315:59: warning: incorrect type in argument 1 (different base types) mm/page_alloc.c:5315:59: expected unsigned long [unsigned] [usertype] size mm/page_alloc.c:5315:59: got restricted gfp_t [usertype] <noident> ... Having sparse be able to catch these very oopsable bugs is a lot more important than keeping a BUILD_BUG_ON(). Kill the BUILD_BUG_ON(). Compiles on x86_64 with and without CONFIG_DEBUG_VM=y. defconfig boots fine for me. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25oom: replace PF_OOM_ORIGIN with toggling oom_score_adjDavid Rientjes
There's a kernel-wide shortage of per-process flags, so it's always helpful to trim one when possible without incurring a significant penalty. It's even more important when you're planning on adding a per- process flag yourself, which I plan to do shortly for transparent hugepages. PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a tendency to allocate large amounts of memory and should be preferred for killing over other tasks. We'd rather immediately kill the task making the errant syscall rather than penalizing an innocent task. This patch removes PF_OOM_ORIGIN since its behavior is equivalent to setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX. The process's old oom_score_adj is stored and then set to OOM_SCORE_ADJ_MAX during the time it used to have PF_OOM_ORIGIN. The old value is then reinstated when the process should no longer be considered a high priority for oom killing. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Izik Eidus <ieidus@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm, mem-hotplug: update pcp->stat_threshold when memory hotplug occurKOSAKI Motohiro
Currently, cpu hotplug updates pcp->stat_threshold, but memory hotplug doesn't. There is no reason for this. [akpm@linux-foundation.org: fix CONFIG_SMP=n build] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm, mem-hotplug: recalculate lowmem_reserve when memory hotplug occursKOSAKI Motohiro
Currently, memory hotplug calls setup_per_zone_wmarks() and calculate_zone_inactive_ratio(), but doesn't call setup_per_zone_lowmem_reserve(). It means the number of reserved pages aren't updated even if memory hot plug occur. This patch fixes it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25x86,mm: make pagefault killableKOSAKI Motohiro
When an oom killing occurs, almost all processes are getting stuck at the following two points. 1) __alloc_pages_nodemask 2) __lock_page_or_retry 1) is not very problematic because TIF_MEMDIE leads to an allocation failure and getting out from page allocator. 2) is more problematic. In an OOM situation, zones typically don't have page cache at all and memory starvation might lead to greatly reduced IO performance. When a fork bomb occurs, TIF_MEMDIE tasks don't die quickly, meaning that a fork bomb may create new process quickly rather than the oom-killer killing it. Then, the system may become livelocked. This patch makes the pagefault interruptible by SIGKILL. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: introduce wait_on_page_locked_killable()KOSAKI Motohiro
commit 2687a356 ("Add lock_page_killable") introduced killable lock_page(). Similarly this patch introdues killable wait_on_page_locked(). Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25mm: per-node vmstat: show proper vmstatsKOSAKI Motohiro
commit 2ac390370a ("writeback: add /sys/devices/system/node/<node>/vmstat") added vmstat entry. But strangely it only show nr_written and nr_dirtied. # cat /sys/devices/system/node/node20/vmstat nr_written 0 nr_dirtied 0 Of course, It's not adequate. With this patch, the vmstat show all vm stastics as /proc/vmstat. # cat /sys/devices/system/node/node0/vmstat nr_free_pages 899224 nr_inactive_anon 201 nr_active_anon 17380 nr_inactive_file 31572 nr_active_file 28277 nr_unevictable 0 nr_mlock 0 nr_anon_pages 17321 nr_mapped 8640 nr_file_pages 60107 nr_dirty 33 nr_writeback 0 nr_slab_reclaimable 6850 nr_slab_unreclaimable 7604 nr_page_table_pages 3105 nr_kernel_stack 175 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 260 nr_dirtied 1050 nr_written 938 numa_hit 962872 numa_miss 0 numa_foreign 0 numa_interleave 8617 numa_local 962872 numa_other 0 nr_anon_transparent_hugepages 0 [akpm@linux-foundation.org: no externs in .c files] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Rubin <mrubin@google.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25arch, mm: filter disallowed nodes from arch specific show_mem functionsDavid Rientjes
Architectures that implement their own show_mem() function did not pass the filter argument to show_free_areas() to appropriately avoid emitting the state of nodes that are disallowed in the current context. This patch now passes the filter argument to show_free_areas() so those nodes are now avoided. This patch also removes the show_free_areas() wrapper around __show_free_areas() and converts existing callers to pass an empty filter. ia64 emits additional information for each node, so skip_free_areas_zone() must be made global to filter disallowed nodes and it is converted to use a nid argument rather than a zone for this use case. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Helge Deller <deller@gmx.de> Cc: James Bottomley <jejb@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-259p: Small cleanup in <net/9p/9p.h>Sasha Levin
There are two small cleanups in this patch: - p9_errstr2errno was declared twice - remove one declaration. - A uint8_t type was mixed in, change it to u8 to match with the rest of the type names and remove dependency. Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-05-259p: typo fixes and minor cleanupsRob Landley
Typo fixes and minor cleanups for v9fs Signed-off-by: Rob Landley <rob@landley.net> Reviewed-by: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-05-25Merge branch 'next' into for-linusVinod Koul
2011-05-25dmaengine/dw_dmac: Update maintainer-shipViresh Kumar
Nobody is currently maintaining dw_dmac. We are using dw_dmac for SPEAr13xx and are currently maintaining it. After discussing with Vinod, sending this patch to update maintainer-ship of dw_dmac. Signed-off-by: Viresh Kumar <viresh.kumar@st.com> Acked-by: Havard Skinnemoen <hskinnemoen@gmail.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2011-05-25[media] Add support for M-5MOLS 8 Mega Pixel camera ISPHeungJun, Kim
Add I2C/V4L2 subdev driver for M-5MOLS integrated image signal processor with 8 Mega Pixel sensor. Signed-off-by: HeungJun, Kim <riverful.kim@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-05-24mmc: quirks: Add/remove quirks conditional support.Andrei Warkentin
Conditional add/remove quirks for MMC and SD. Signed-off-by: Andrei Warkentin <andreiw@motorola.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: core: add support for eMMC Dual Data RatePhilip Rakity
eMMC voltage change not required for 1.8V. 3.3V and 1.8V vcc are capable of doing DDR. vccq of 1.8v is not required. Signed-off-by: Philip Rakity <prakity@marvell.com> Reviewed-by: Arindam Nath <arindam.nath@amd.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sdhi: allow powering down controller with no card insertedGuennadi Liakhovetski
Supply a link to TMIO private data for platforms to implement their own card detection. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: tmio: runtime suspend the controller, where possibleGuennadi Liakhovetski
The TMIO MMC controller cannot be powered off to save power, when no card is plugged in, because then it will not be able to detect a new card-insertion event. On some implementations, however, it is possible to switch to using another source to detect card insertion. This patch adds support for such implementations. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sdio: optimized SDIO IRQ handling for single irqStefan Nilsson XK
If there is only 1 function interrupt registered it is possible to improve performance by directly calling the irq handler and avoiding the overhead of reading the CCCR registers. Signed-off-by: Per Forlin <per.forlin@linaro.org> Acked-by: Ulf Hansson <ulf.hansson@stericsson.com> Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sdhci: add support for retuning mode 1Arindam Nath
Host Controller v3.00 can support retuning modes 1,2 or 3 depending on the bits 46-47 of the Capabilities register. Also, the timer count for retuning is indicated by bits 40-43 of the same register. We initialize timer_list for retuning the first time we execute tuning procedure. This condition is indicated by SDHCI_NEEDS_RETUNING not being set. Since retuning mode 1 sets a limit of 4MB on the maximum data length, we set max_blk_count appropriately. Once the tuning timer expires, we set SDHCI_NEEDS_RETUNING flag, and if the flag is set, we execute tuning procedure before sending the next command. We need to restore mmc_request structure after executing retuning procedure since host->mrq is used inside the procedure to send CMD19. We also disable and re-enable this flag during suspend and resume respectively, as per the spec v3.00. Tested by Zhangfei Gao with a Toshiba uhs card and general hs card, on mmp2 in SDMA mode. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Reviewed-by: Philip Rakity <prakity@marvell.com> Tested-by: Philip Rakity <prakity@marvell.com> Acked-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sdhci: add support for programmable clock modeArindam Nath
Host Controller v3.00 supports programmable clock mode as an optional feature. The support for this mode is indicated by non-zero value in bits 48-55 of the Capabilities register. If supported, the actual value of Clock Multiplier is one more than the value provided in the bit fields. We only set Clock Generator Select (bit 5) and SDCLK Frequency Select (bits 8-15) of the Clock Control register in case Preset Value Enable is not set, otherwise these fields are automatically set by the Host Controller based on the UHS mode selected. Also, since the maximum and minimum clock frequency in this mode can be (Base Clock * Clock Mul) and (Base Clock * Clock Mul)/1024 respectively, f_max and f_min have been recalculated to reflect this change. Tested by Zhangfei Gao with a Toshiba uhs card and general hs card, on mmp2 in SDMA mode. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Reviewed-by: Philip Rakity <prakity@marvell.com> Tested-by: Philip Rakity <prakity@marvell.com> Acked-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sdhci: enable preset value after uhs initializationArindam Nath
According to the Host Controller spec v3.00, setting Preset Value Enable in the Host Control2 register lets SDCLK Frequency Select, Clock Generator Select and Driver Strength Select to be set automatically by the Host Controller based on the UHS-I mode set. This patch enables this feature. Since Preset Value Enable makes sense only for UHS-I cards, we enable this feature after successfull UHS-I initialization. We also reset Preset Value Enable next time before initialization. Tested by Zhangfei Gao with a Toshiba uhs card and general hs card, on mmp2 in SDMA mode. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Reviewed-by: Philip Rakity <prakity@marvell.com> Tested-by: Philip Rakity <prakity@marvell.com> Acked-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sd: add support for tuning during uhs initializationArindam Nath
Host Controller needs tuning during initialization to operate SDR50 and SDR104 UHS-I cards. Whether SDR50 mode actually needs tuning is indicated by bit 45 of the Host Controller Capabilities register. A new command CMD19 has been defined in the Physical Layer spec v3.01 to request the card to send tuning pattern. We enable Buffer Read Ready interrupt at the very begining of tuning procedure, because that is the only interrupt generated by the Host Controller during tuning. We program the block size to 64 in the Block Size register. We make sure that DMA Enable and Multi Block Select in the Transfer Mode register are set to 0 before actually sending CMD19. The tuning block is sent by the card to the Host Controller using DAT lines, so we set Data Present Select (bit 5) in the Command register. The Host Controller is responsible for doing the verfication of tuning block sent by the card at the hardware level. After sending CMD19, we wait for Buffer Read Ready interrupt. In case we don't receive an interrupt after the specified timeout value, we fall back on fixed sampling clock by setting Execute Tuning (bit 6) and Sampling Clock Select (bit 7) of Host Control2 register to 0. Before exiting the tuning procedure, we disable Buffer Read Ready interrupt and re-enable other interrupts. Tested by Zhangfei Gao with a Toshiba uhs card and general hs card, on mmp2 in SDMA mode. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Reviewed-by: Philip Rakity <prakity@marvell.com> Tested-by: Philip Rakity <prakity@marvell.com> Acked-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sd: report correct speed and capacity of uhs cardsArindam Nath
Since only UHS-I cards respond with S18A set in response to ACMD41, we set the card as ultra-high-speed after successfull initialization. We need to decide whether a card is SDXC based on the C_SIZE field of CSDv2.0 register. According to Physical Layer spec v3.01, the minimum value of C_SIZE for SDXC card is 00FFFFh. Tested by Zhangfei Gao with a Toshiba uhs card and general hs card, on mmp2 in SDMA mode. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Reviewed-by: Philip Rakity <prakity@marvell.com> Tested-by: Philip Rakity <prakity@marvell.com> Acked-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sd: set current limit for uhs cardsArindam Nath
We decide on the current limit to be set for the card based on the Capability of Host Controller to provide current at 1.8V signalling, and the maximum current limit of the card as indicated by CMD6 mode 0. We then set the current limit for the card using CMD6 mode 1. As per the Physical Layer Spec v3.01, the current limit switch is only applicable for SDR50, SDR104, and DDR50 bus speed modes. For other UHS-I modes, we set the default current limit of 200mA. Tested by Zhangfei Gao with a Toshiba uhs card and general hs card, on mmp2 in SDMA mode. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Reviewed-by: Philip Rakity <prakity@marvell.com> Tested-by: Philip Rakity <prakity@marvell.com> Acked-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sd: add support for uhs bus speed mode selectionArindam Nath
This patch adds support for setting UHS-I bus speed mode during UHS-I initialization procedure. Since both the host and card can support more than one bus speed, we select the highest speed based on both of their capabilities. First we set the bus speed mode for the card using CMD6 mode 1, and then we program the host controller to support the required speed mode. We also set High Speed Enable in case one of the UHS-I modes is selected. We take care to reset SD clock before setting UHS mode in the Host Control2 register, and then re-enable it as per the Host Controller spec v3.00. We then set the clock frequency for the UHS-I mode selected. Tested by Zhangfei Gao with a Toshiba uhs card and general hs card, on mmp2 in SDMA mode. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Reviewed-by: Philip Rakity <prakity@marvell.com> Tested-by: Philip Rakity <prakity@marvell.com> Acked-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sd: add support for driver type selectionArindam Nath
This patch adds support for setting driver strength during UHS-I initialization procedure. Since UHS-I cards set S18A (bit 24) in response to ACMD41, we use this as a base for UHS-I initialization. We modify the parameter list of mmc_sd_get_cid() so that we can save the ROCR from ACMD41 to check whether bit 24 is set. We decide whether the Host Controller supports A, C, or D driver type depending on the Capabilities register. Driver type B is suported by default. We then set the appropriate driver type for the card using CMD6 mode 1. As per Host Controller spec v3.00, we set driver type for the host only if Preset Value Enable in the Host Control2 register is not set. SDHCI_HOST_CONTROL has been renamed to SDHCI_HOST_CONTROL1 to conform to the spec. Tested by Zhangfei Gao with a Toshiba uhs card and general hs card, on mmp2 in SDMA mode. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Reviewed-by: Philip Rakity <prakity@marvell.com> Tested-by: Philip Rakity <prakity@marvell.com> Acked-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-25mtd: kill CONFIG_MTD_PARTITIONSJamie Iles
Now that none of the drivers use CONFIG_MTD_PARTITIONS we can remove it from Kconfig and the last remaining uses. Signed-off-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-05-25mtd: remove add_mtd_partitions, add_mtd_device and friendsJamie Iles
These symbols are replaced with mtd_device_register() (and removal with mtd_device_unregister()) for public registration. Signed-off-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-05-25mtd: physmap: convert to mtd_device_register()Jamie Iles
Convert to mtd_device_register() and remove the CONFIG_MTD_PARTITIONS preprocessor conditionals as partitioning is always available. Signed-off-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-05-25mtd: provide of_mtd_parse_partitions for !CONFIG_MTD_OF_PARTSJamie Iles
If we don't have OpenFirmware enabled then provide a stub of_mtd_parse_partitions that returns no partitions so drivers don't need ifdeffery inside. Signed-off-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-05-25mtd: introduce mtd_device_(un)register()Jamie Iles
To prepare for the removal of add_mtd_device and add_mtd_partitions(), introduce mtd_device_register(). This will create partitions if they are supplied or register the whole device if there are no partitions. Once all drivers are converted to use mtd_device_register(), add_mtd_device() and add_mtd_partitions() will be made internal only. v2: move kerneldoc to implementation file and fixup some kerneldoc warnings. Artem: tweak comments: remove junk tabs, use dots consistently. Signed-off-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-05-24mmc: sd: query function modes for uhs cardsArindam Nath
SD cards which conform to Physical Layer Spec v3.01 can support additional Bus Speed Modes, Driver Strength, and Current Limit other than the default values. We use CMD6 mode 0 to read these additional card functions. The values read here will be used during UHS-I initialization steps. Tested by Zhangfei Gao with a Toshiba uhs card and general hs card, on mmp2 in SDMA mode. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Reviewed-by: Philip Rakity <prakity@marvell.com> Tested-by: Philip Rakity <prakity@marvell.com> Acked-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sd: add support for signal voltage switch procedureArindam Nath
Host Controller v3.00 adds another Capabilities register. Apart from other things, this new register indicates whether the Host Controller supports SDR50, SDR104, and DDR50 UHS-I modes. The spec doesn't mention about explicit support for SDR12 and SDR25 UHS-I modes, so the Host Controller v3.00 should support them by default. Also if the controller supports SDR104 mode, it will also support SDR50 mode as well. So depending on the host support, we set the corresponding MMC_CAP_* flags. One more new register. Host Control2 is added in v3.00, which is used during Signal Voltage Switch procedure described below. Since as per v3.00 spec, UHS-I supported hosts should set S18R to 1, we set S18R (bit 24) of OCR before sending ACMD41. We also need to set XPC (bit 28) of OCR in case the host can supply >150mA. This support is indicated by the Maximum Current Capabilities register of the Host Controller. If the response of ACMD41 has both CCS and S18A set, we start the signal voltage switch procedure, which if successfull, will switch the card from 3.3V signalling to 1.8V signalling. Signal voltage switch procedure adds support for a new command CMD11 in the Physical Layer Spec v3.01. As part of this procedure, we need to set 1.8V Signalling Enable (bit 3) of Host Control2 register, which if remains set after 5ms, means the switch to 1.8V signalling is successfull. Otherwise, we clear bit 24 of OCR and retry the initialization sequence. When we remove the card, and insert the same or another card, we need to make sure that we start with 3.3V signalling voltage. So we call mmc_set_signal_voltage() with MMC_SIGNAL_VOLTAGE_330 set so that we are back to 3.3V signalling voltage before we actually start initializing the card. Tested by Zhangfei Gao with a Toshiba uhs card and general hs card, on mmp2 in SDMA mode. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Reviewed-by: Philip Rakity <prakity@marvell.com> Tested-by: Philip Rakity <prakity@marvell.com> Acked-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: core: Add mmc CMD+ACMD passthrough ioctlJohn Calixto
Allows appropriately-privileged applications to send CMD (normal) and ACMD (application-specific; preceded with CMD55) commands to cards/devices on the mmc bus. This is primarily useful for enabling the security functionality built in to every SD card. It can also be used as a generic passthrough (e.g. to enable virtual machines to control mmc bus devices directly). However, this use case has not been tested rigorously. Generic passthrough testing was only conducted for a few non-security opcodes to prove the feasibility of the passthrough. Since any opcode can be sent using this passthrough, it is very possible to render the card/device unusable. Applications that use this ioctl must have CAP_SYS_RAWIO. Security commands tested on TI PCIxx12 (SDHCI), Sigma Designs SMP8652 SoC, TI OMAP3621/OMAP3630 SoC, Samsung S5PC110 SoC, Qualcomm MSM7200A SoC. Signed-off-by: John Calixto <john.calixto@modsystems.com> Reviewed-by: Andrei Warkentin <andreiw@motorola.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: sdhci: Fix read-only detection with JMicron 388 chipTakashi Iwai
On HP laptops with JMicron 388 chip, the write-locked SD card isn't detected correctly as read-only in many cases. This is because the PRESENT_STATE register becomes unsable just after plugging, and it returns the WRITE_PROTECT bit wrongly at the first read. This patch fixes the read-only detection by adding a new sdhci quirk indicating to check the register more intensively with a relatively long delay. The patch is tested with 2.6.39-rc4 kernel. Cc: Aries Lee <arieslee@jmicron.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-05-24mmc: quirks: Fix erase/trim for certain SanDisk cards.Andrei Warkentin
CMD38 argument is passed through EXT_CSD[113]. Signed-off-by: Andrei Warkentin <andreiw@motorola.com> Signed-off-by: Chris Ball <cjb@laptop.org>