summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2012-12-12cpuset: use N_MEMORY instead N_HIGH_MEMORYLai Jiangshan
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12mm: node_states: introduce N_MEMORYLai Jiangshan
We have N_NORMAL_MEMORY for standing for the nodes that have normal memory with zone_type <= ZONE_NORMAL. And we have N_HIGH_MEMORY for standing for the nodes that have normal or high memory. But we don't have any word to stand for the nodes that have *any* memory. And we have N_CPU but without N_MEMORY. Current code reuse the N_HIGH_MEMORY for this purpose because any node which has memory must have high memory or normal memory currently. A) But this reusing is bad for *readability*. Because the name N_HIGH_MEMORY just stands for high or normal: A.example 1) mem_cgroup_nr_lru_pages(): for_each_node_state(nid, N_HIGH_MEMORY) The user will be confused(why this function just counts for high or normal memory node? does it counts for ZONE_MOVABLE's lru pages?) until someone else tell them N_HIGH_MEMORY is reused to stand for nodes that have any memory. A.cont) If we introduce N_MEMORY, we can reduce this confusing AND make the code more clearly: A.example 2) mm/page_cgroup.c use N_HIGH_MEMORY twice: One is in page_cgroup_init(void): for_each_node_state(nid, N_HIGH_MEMORY) { It means if the node have memory, we will allocate page_cgroup map for the node. We should use N_MEMORY instead here to gaim more clearly. The second using is in alloc_page_cgroup(): if (node_state(nid, N_HIGH_MEMORY)) addr = vzalloc_node(size, nid); It means if the node has high or normal memory that can be allocated from kernel. We should keep N_HIGH_MEMORY here, and it will be better if the "any memory" semantic of N_HIGH_MEMORY is removed. B) This reusing is out-dated if we introduce MOVABLE-dedicated node. The MOVABLE-dedicated node should not appear in node_stats[N_HIGH_MEMORY] nor node_stats[N_NORMAL_MEMORY], because MOVABLE-dedicated node has no high or normal memory. In x86_64, N_HIGH_MEMORY=N_NORMAL_MEMORY, if a MOVABLE-dedicated node is in node_stats[N_HIGH_MEMORY], it is also means it is in node_stats[N_NORMAL_MEMORY], it causes SLUB wrong. The slub uses for_each_node_state(nid, N_NORMAL_MEMORY) and creates kmem_cache_node for MOVABLE-dedicated node and cause problem. In one word, we need a N_MEMORY. We just intrude it as an alias to N_HIGH_MEMORY and fix all im-proper usages of N_HIGH_MEMORY in late patches. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12mm: use migrate_prep() instead of migrate_prep_local()Marek Szyprowski
__alloc_contig_migrate_range() should use all possible ways to get all the pages migrated from the given memory range, so pruning per-cpu lru lists for all CPUs is required, regadless the cost of such operation. Otherwise some pages which got stuck at per-cpu lru list might get missed by migration procedure causing the contiguous allocation to fail. Reported-by: SeongHwan Yoon <sunghwan.yun@samsung.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12mm: compaction: Fix compiler warningThierry Reding
compact_capture_page() is only used if compaction is enabled so it should be moved into the corresponding #ifdef. Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: avoid race on multiple parallel page faults to the same pageKirill A. Shutemov
pmd value is stable only with mm->page_table_lock taken. After taking the lock we need to check that nobody modified the pmd before changing it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: David Rientjes <rientjes@google.com> Reviewed-by: Bob Liu <lliubbo@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: introduce sysfs knob to disable huge zero pageKirill A. Shutemov
By default kernel tries to use huge zero page on read page fault. It's possible to disable huge zero page by writing 0 or enable it back by writing 1: echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp, vmstat: implement HZP_ALLOC and HZP_ALLOC_FAILED eventsKirill A. Shutemov
hzp_alloc is incremented every time a huge zero page is successfully allocated. It includes allocations which where dropped due race with other allocation. Note, it doesn't count every map of the huge zero page, only its allocation. hzp_alloc_failed is incremented if kernel fails to allocate huge zero page and falls back to using small pages. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: implement refcounting for huge zero pageKirill A. Shutemov
H. Peter Anvin doesn't like huge zero page which sticks in memory forever after the first allocation. Here's implementation of lockless refcounting for huge zero page. We have two basic primitives: {get,put}_huge_zero_page(). They manipulate reference counter. If counter is 0, get_huge_zero_page() allocates a new huge page and takes two references: one for caller and one for shrinker. We free the page only in shrinker callback if counter is 1 (only shrinker has the reference). put_huge_zero_page() only decrements counter. Counter is never zero in put_huge_zero_page() since shrinker holds on reference. Freeing huge zero page in shrinker callback helps to avoid frequent allocate-free. Refcounting has cost. On 4 socket machine I observe ~1% slowdown on parallel (40 processes) read page faulting comparing to lazy huge page allocation. I think it's pretty reasonable for synthetic benchmark. [lliubbo@gmail.com: fix mismerge] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Bob Liu <lliubbo@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: lazy huge zero page allocationKirill A. Shutemov
Instead of allocating huge zero page on hugepage_init() we can postpone it until first huge zero page map. It saves memory if THP is not in use. cmpxchg() is used to avoid race on huge_zero_pfn initialization. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: setup huge zero page on non-write page faultKirill A. Shutemov
All code paths seems covered. Now we can map huge zero page on read page fault. We setup it in do_huge_pmd_anonymous_page() if area around fault address is suitable for THP and we've got read page fault. If we fail to setup huge zero page (ENOMEM) we fallback to handle_pte_fault() as we normally do in THP. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: implement splitting pmd for huge zero pageKirill A. Shutemov
We can't split huge zero page itself (and it's bug if we try), but we can split the pmd which points to it. On splitting the pmd we create a table with all ptes set to normal zero page. [akpm@linux-foundation.org: fix build error] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: change split_huge_page_pmd() interfaceKirill A. Shutemov
Pass vma instead of mm and add address parameter. In most cases we already have vma on the stack. We provides split_huge_page_pmd_mm() for few cases when we have mm, but not vma. This change is preparation to huge zero pmd splitting implementation. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: change_huge_pmd(): make sure we don't try to make a page writableKirill A. Shutemov
mprotect core never tries to make page writable using change_huge_pmd(). Let's add an assert that the assumption is true. It's important to be sure we will not make huge zero page writable. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: do_huge_pmd_wp_page(): handle huge zero pageKirill A. Shutemov
On write access to huge zero page we alloc a new huge page and clear it. If ENOMEM, graceful fallback: we create a new pmd table and set pte around fault address to newly allocated normal (4k) page. All other ptes in the pmd set to normal zero page. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: copy_huge_pmd(): copy huge zero pageKirill A. Shutemov
It's easy to copy huge zero page. Just set destination pmd to huge zero page. It's safe to copy huge zero page since we have none yet :-p [rientjes@google.com: fix comment] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: zap_huge_pmd(): zap huge zero pmdKirill A. Shutemov
We don't have a mapped page to zap in huge zero page case. Let's just clear pmd and remove it from tlb. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12thp: huge zero page: basic preparationKirill A. Shutemov
During testing I noticed big (up to 2.5 times) memory consumption overhead on some workloads (e.g. ft.A from NPB) if THP is enabled. The main reason for that big difference is lacking zero page in THP case. We have to allocate a real page on read page fault. A program to demonstrate the issue: #include <assert.h> #include <stdlib.h> #include <unistd.h> #define MB 1024*1024 int main(int argc, char **argv) { char *p; int i; posix_memalign((void **)&p, 2 * MB, 200 * MB); for (i = 0; i < 200 * MB; i+= 4096) assert(p[i] == 0); pause(); return 0; } With thp-never RSS is about 400k, but with thp-always it's 200M. After the patcheset thp-always RSS is 400k too. Design overview. Huge zero page (hzp) is a non-movable huge page (2M on x86-64) filled with zeros. The way how we allocate it changes in the patchset: - [01/10] simplest way: hzp allocated on boot time in hugepage_init(); - [09/10] lazy allocation on first use; - [10/10] lockless refcounting + shrinker-reclaimable hzp; We setup it in do_huge_pmd_anonymous_page() if area around fault address is suitable for THP and we've got read page fault. If we fail to setup hzp (ENOMEM) we fallback to handle_pte_fault() as we normally do in THP. On wp fault to hzp we allocate real memory for the huge page and clear it. If ENOMEM, graceful fallback: we create a new pmd table and set pte around fault address to newly allocated normal (4k) page. All other ptes in the pmd set to normal zero page. We cannot split hzp (and it's bug if we try), but we can split the pmd which points to it. On splitting the pmd we create a table with all ptes set to normal zero page. === By hpa's request I've tried alternative approach for hzp implementation (see Virtual huge zero page patchset): pmd table with all entries set to zero page. This way should be more cache friendly, but it increases TLB pressure. The problem with virtual huge zero page: it requires per-arch enabling. We need a way to mark that pmd table has all ptes set to zero page. Some numbers to compare two implementations (on 4s Westmere-EX): Mirobenchmark1 ============== test: posix_memalign((void **)&p, 2 * MB, 8 * GB); for (i = 0; i < 100; i++) { assert(memcmp(p, p + 4*GB, 4*GB) == 0); asm volatile ("": : :"memory"); } hzp: Performance counter stats for './test_memcmp' (5 runs): 32356.272845 task-clock # 0.998 CPUs utilized ( +- 0.13% ) 40 context-switches # 0.001 K/sec ( +- 0.94% ) 0 CPU-migrations # 0.000 K/sec 4,218 page-faults # 0.130 K/sec ( +- 0.00% ) 76,712,481,765 cycles # 2.371 GHz ( +- 0.13% ) [83.31%] 36,279,577,636 stalled-cycles-frontend # 47.29% frontend cycles idle ( +- 0.28% ) [83.35%] 1,684,049,110 stalled-cycles-backend # 2.20% backend cycles idle ( +- 2.96% ) [66.67%] 134,355,715,816 instructions # 1.75 insns per cycle # 0.27 stalled cycles per insn ( +- 0.10% ) [83.35%] 13,526,169,702 branches # 418.039 M/sec ( +- 0.10% ) [83.31%] 1,058,230 branch-misses # 0.01% of all branches ( +- 0.91% ) [83.36%] 32.413866442 seconds time elapsed ( +- 0.13% ) vhzp: Performance counter stats for './test_memcmp' (5 runs): 30327.183829 task-clock # 0.998 CPUs utilized ( +- 0.13% ) 38 context-switches # 0.001 K/sec ( +- 1.53% ) 0 CPU-migrations # 0.000 K/sec 4,218 page-faults # 0.139 K/sec ( +- 0.01% ) 71,964,773,660 cycles # 2.373 GHz ( +- 0.13% ) [83.35%] 31,191,284,231 stalled-cycles-frontend # 43.34% frontend cycles idle ( +- 0.40% ) [83.32%] 773,484,474 stalled-cycles-backend # 1.07% backend cycles idle ( +- 6.61% ) [66.67%] 134,982,215,437 instructions # 1.88 insns per cycle # 0.23 stalled cycles per insn ( +- 0.11% ) [83.32%] 13,509,150,683 branches # 445.447 M/sec ( +- 0.11% ) [83.34%] 1,017,667 branch-misses # 0.01% of all branches ( +- 1.07% ) [83.32%] 30.381324695 seconds time elapsed ( +- 0.13% ) Mirobenchmark2 ============== test: posix_memalign((void **)&p, 2 * MB, 8 * GB); for (i = 0; i < 1000; i++) { char *_p = p; while (_p < p+4*GB) { assert(*_p == *(_p+4*GB)); _p += 4096; asm volatile ("": : :"memory"); } } hzp: Performance counter stats for 'taskset -c 0 ./test_memcmp2' (5 runs): 3505.727639 task-clock # 0.998 CPUs utilized ( +- 0.26% ) 9 context-switches # 0.003 K/sec ( +- 4.97% ) 4,384 page-faults # 0.001 M/sec ( +- 0.00% ) 8,318,482,466 cycles # 2.373 GHz ( +- 0.26% ) [33.31%] 5,134,318,786 stalled-cycles-frontend # 61.72% frontend cycles idle ( +- 0.42% ) [33.32%] 2,193,266,208 stalled-cycles-backend # 26.37% backend cycles idle ( +- 5.51% ) [33.33%] 9,494,670,537 instructions # 1.14 insns per cycle # 0.54 stalled cycles per insn ( +- 0.13% ) [41.68%] 2,108,522,738 branches # 601.451 M/sec ( +- 0.09% ) [41.68%] 158,746 branch-misses # 0.01% of all branches ( +- 1.60% ) [41.71%] 3,168,102,115 L1-dcache-loads # 903.693 M/sec ( +- 0.11% ) [41.70%] 1,048,710,998 L1-dcache-misses # 33.10% of all L1-dcache hits ( +- 0.11% ) [41.72%] 1,047,699,685 LLC-load # 298.854 M/sec ( +- 0.03% ) [33.38%] 2,287 LLC-misses # 0.00% of all LL-cache hits ( +- 8.27% ) [33.37%] 3,166,187,367 dTLB-loads # 903.147 M/sec ( +- 0.02% ) [33.35%] 4,266,538 dTLB-misses # 0.13% of all dTLB cache hits ( +- 0.03% ) [33.33%] 3.513339813 seconds time elapsed ( +- 0.26% ) vhzp: Performance counter stats for 'taskset -c 0 ./test_memcmp2' (5 runs): 27313.891128 task-clock # 0.998 CPUs utilized ( +- 0.24% ) 62 context-switches # 0.002 K/sec ( +- 0.61% ) 4,384 page-faults # 0.160 K/sec ( +- 0.01% ) 64,747,374,606 cycles # 2.370 GHz ( +- 0.24% ) [33.33%] 61,341,580,278 stalled-cycles-frontend # 94.74% frontend cycles idle ( +- 0.26% ) [33.33%] 56,702,237,511 stalled-cycles-backend # 87.57% backend cycles idle ( +- 0.07% ) [33.33%] 10,033,724,846 instructions # 0.15 insns per cycle # 6.11 stalled cycles per insn ( +- 0.09% ) [41.65%] 2,190,424,932 branches # 80.195 M/sec ( +- 0.12% ) [41.66%] 1,028,630 branch-misses # 0.05% of all branches ( +- 1.50% ) [41.66%] 3,302,006,540 L1-dcache-loads # 120.891 M/sec ( +- 0.11% ) [41.68%] 271,374,358 L1-dcache-misses # 8.22% of all L1-dcache hits ( +- 0.04% ) [41.66%] 20,385,476 LLC-load # 0.746 M/sec ( +- 1.64% ) [33.34%] 76,754 LLC-misses # 0.38% of all LL-cache hits ( +- 2.35% ) [33.34%] 3,309,927,290 dTLB-loads # 121.181 M/sec ( +- 0.03% ) [33.34%] 2,098,967,427 dTLB-misses # 63.41% of all dTLB cache hits ( +- 0.03% ) [33.34%] 27.364448741 seconds time elapsed ( +- 0.24% ) === I personally prefer implementation present in this patchset. It doesn't touch arch-specific code. This patch: Huge zero page (hzp) is a non-movable huge page (2M on x86-64) filled with zeros. For now let's allocate the page on hugepage_init(). We'll switch to lazy allocation later. We are not going to map the huge zero page until we can handle it properly on all code paths. is_huge_zero_{pfn,pmd}() functions will be used by following patches to check whether the pfn/pmd is huge zero page. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12bootmem: remove alloc_arch_preferred_bootmem()Joonsoo Kim
The name of this function is not suitable, and removing the function and open-coding it into each call sites makes the code more understandable. Additionally, we shouldn't do an allocation from bootmem when slab_is_available(), so directly return kmalloc()'s return value. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12bootmem: remove not implemented function call, bootmem_arch_preferred_node()Joonsoo Kim
There is no implementation of bootmem_arch_preferred_node() and a call to this function will cause a compilation error. So remove it. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull big execve/kernel_thread/fork unification series from Al Viro: "All architectures are converted to new model. Quite a bit of that stuff is actually shared with architecture trees; in such cases it's literally shared branch pulled by both, not a cherry-pick. A lot of ugliness and black magic is gone (-3KLoC total in this one): - kernel_thread()/kernel_execve()/sys_execve() redesign. We don't do syscalls from kernel anymore for either kernel_thread() or kernel_execve(): kernel_thread() is essentially clone(2) with callback run before we return to userland, the callbacks either never return or do successful do_execve() before returning. kernel_execve() is a wrapper for do_execve() - it doesn't need to do transition to user mode anymore. As a result kernel_thread() and kernel_execve() are arch-independent now - they live in kernel/fork.c and fs/exec.c resp. sys_execve() is also in fs/exec.c and it's completely architecture-independent. - daemonize() is gone, along with its parts in fs/*.c - struct pt_regs * is no longer passed to do_fork/copy_process/ copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump. - sys_fork()/sys_vfork()/sys_clone() unified; some architectures still need wrappers (ones with callee-saved registers not saved in pt_regs on syscall entry), but the main part of those suckers is in kernel/fork.c now." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits) do_coredump(): get rid of pt_regs argument print_fatal_signal(): get rid of pt_regs argument ptrace_signal(): get rid of unused arguments get rid of ptrace_signal_deliver() arguments new helper: signal_pt_regs() unify default ptrace_signal_deliver flagday: kill pt_regs argument of do_fork() death to idle_regs() don't pass regs to copy_process() flagday: don't pass regs to copy_thread() bfin: switch to generic vfork, get rid of pointless wrappers xtensa: switch to generic clone() openrisc: switch to use of generic fork and clone unicore32: switch to generic clone(2) score: switch to generic fork/vfork/clone c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone() take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h mn10300: switch to generic fork/vfork/clone h8300: switch to generic fork/vfork/clone tile: switch to generic clone() ... Conflicts: arch/microblaze/include/asm/Kbuild
2012-12-12Merge tag 'boards' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Pull ARM SoC board updates from Olof Johansson: "This branch contains a set of various board updates for ARM platforms. A few shmobile platforms that are stale have been removed, some defconfig updates for various boards selecting new features such as pinctrl subsystem support, and various updates enabling peripherals, etc." Fix up conflicts mostly as per Olof. * tag 'boards' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (58 commits) ARM: S3C64XX: Add dummy supplies for Glenfarclas LDOs ARM: S3C64XX: Add registration of WM2200 Bells device on Cragganmore ARM: kirkwood: Add Plat'Home OpenBlocks A6 support ARM: Dove: update defconfig ARM: Kirkwood: update defconfig for new boards arm: orion5x: add DT related options in defconfig arm: orion5x: convert 'LaCie Ethernet Disk mini v2' to Device Tree arm: orion5x: basic Device Tree support arm: orion5x: mechanical defconfig update ARM: kirkwood: Add support for the MPL CEC4 arm: kirkwood: add support for ZyXEL NSA310 ARM: Kirkwood: new board USI Topkick ARM: kirkwood: use gpio-fan DT binding on lsxl ARM: Kirkwood: add Netspace boards to defconfig ARM: kirkwood: DT board setup for Network Space Mini v2 ARM: kirkwood: DT board setup for Network Space Lite v2 ARM: kirkwood: DT board setup for Network Space v2 and parents leds: leds-ns2: add device tree binding ARM: Kirkwood: Enable the second I2C bus ARM: mmp: select pinctrl driver ...
2012-12-12Merge tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Pull ARM SoC updates from Olof Johansson: "This contains the bulk of new SoC development for this merge window. Two new platforms have been added, the sunxi platforms (Allwinner A1x SoCs) by Maxime Ripard, and a generic Broadcom platform for a new series of ARMv7 platforms from them, where the hope is that we can keep the platform code generic enough to have them all share one mach directory. The new Broadcom platform is contributed by Christian Daudt. Highbank has grown support for Calxeda's next generation of hardware, ECX-2000. clps711x has seen a lot of cleanup from Alexander Shiyan, and he's also taken on maintainership of the platform. Beyond this there has been a bunch of work from a number of people on converting more platforms to IRQ domains, pinctrl conversion, cleanup and general feature enablement across most of the active platforms." Fix up trivial conflicts as per Olof. * tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (174 commits) mfd: vexpress-sysreg: Remove LEDs code irqchip: irq-sunxi: Add terminating entry for sunxi_irq_dt_ids clocksource: sunxi_timer: Add terminating entry for sunxi_timer_dt_ids irq: versatile: delete dangling variable ARM: sunxi: add missing include for mdelay() ARM: EXYNOS: Avoid early use of of_machine_is_compatible() ARM: dts: add node for PL330 MDMA1 controller for exynos4 ARM: EXYNOS: Add support for secondary CPU bring-up on Exynos4412 ARM: EXYNOS: add UART3 to DEBUG_LL ports ARM: S3C24XX: Add clkdev entry for camif-upll clock ARM: SAMSUNG: Add s3c24xx/s3c64xx CAMIF GPIO setup helpers ARM: sunxi: Add missing sun4i.dtsi file pinctrl: samsung: Do not initialise statics to 0 ARM i.MX6: remove gate_mask from pllv3 ARM i.MX6: Fix ethernet PLL clocks ARM i.MX6: rename PLLs according to datasheet ARM i.MX6: Add pwm support ARM i.MX51: Add pwm support ARM i.MX53: Add pwm support ARM: mx5: Replace clk_register_clkdev with clock DT lookup ...
2012-12-12Merge tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Pull ARM SoC cleanups on various subarchitectures from Olof Johansson: "Cleanup patches for various ARM platforms and some of their associated drivers. There's also a branch in here that enables Freescale i.MX to be part of the multiplatform support -- the first "big" SoC that is moved over (more multiplatform work comes in a separate branch later during the merge window)." Conflicts fixed as per Olof, including a silent semantic one in arch/arm/mach-omap2/board-generic.c (omap_prcm_restart() was renamed to omap3xxx_restart(), and a new user of the old name was added). * tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (189 commits) ARM: omap: fix typo on timer cleanup ARM: EXYNOS: Remove unused regs-mem.h file ARM: EXYNOS: Remove unused non-dt support for dwmci controller ARM: Kirkwood: Use hw_pci.ops instead of hw_pci.scan ARM: OMAP3: cm-t3517: use GPTIMER for system clock ARM: OMAP2+: timer: remove CONFIG_OMAP_32K_TIMER ARM: SAMSUNG: use devm_ functions for ADC driver ARM: EXYNOS: no duplicate mask/unmask in eint0_15 ARM: S3C24XX: SPI clock channel setup is fixed for S3C2443 ARM: EXYNOS: Remove i2c0 resource information and setting of device names ARM: Kirkwood: checkpatch cleanups ARM: Kirkwood: Fix sparse warnings. ARM: Kirkwood: Remove unused includes ARM: kirkwood: cleanup lsxl board includes ARM: integrator: use BUG_ON where possible ARM: integrator: push down SC dependencies ARM: integrator: delete static UART1 mapping ARM: integrator: delete SC mapping on the CP ARM: integrator: remove static CP syscon mapping ARM: integrator: remove static AP syscon mapping ...
2012-12-12Merge tag 'headers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Pull ARM SoC Header cleanups from Olof Johansson: "This is a collection of header file cleanups, mostly for OMAP and AT91, that keeps moving the platforms in the direction of multiplatform by removing the need for mach-dependent header files used in drivers and other places." Fix up mostly trivial conflicts as per Olof. * tag 'headers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (106 commits) ARM: OMAP2+: Move iommu/iovmm headers to platform_data ARM: OMAP2+: Make some definitions local ARM: OMAP2+: Move iommu2 to drivers/iommu/omap-iommu2.c ARM: OMAP2+: Move plat/iovmm.h to include/linux/omap-iommu.h ARM: OMAP2+: Move iopgtable header to drivers/iommu/ ARM: OMAP: Merge iommu2.h into iommu.h atmel: move ATMEL_MAX_UART to platform_data/atmel.h ARM: OMAP: Remove omap_init_consistent_dma_size() arm: at91: move at91rm9200 rtc header in drivers/rtc arm: at91: move reset controller header to arm/arm/mach-at91 arm: at91: move pit define to the driver arm: at91: move at91_shdwc.h to arch/arm/mach-at91 arm: at91: move board header to arch/arm/mach-at91 arn: at91: move at91_tc.h to arch/arm/mach-at91 arm: at91 move at91_aic.h to arch/arm/mach-at91 arm: at91 move board.h to arch/arm/mach-at91 arm: at91: move platfarm_data to include/linux/platform_data/atmel.h arm: at91: drop machine defconfig ARM: OMAP: Remove NEED_MACH_GPIO_H ARM: OMAP: Remove unnecessary mach and plat includes ...
2012-12-12Merge tag 'fixes-non-critical' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC Non-critical bug fixes from Olof Johansson: "Simple bug fixes that were not considered important enough for inclusion into 3.7, especially those that arrived late during the merge window. There's also a MAINTAINERS update for the Renesas platforms in here, marking Simon Horman as a maintainer and changing the git url to his tree." * tag 'fixes-non-critical' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: Update ARM/SHMOBILE section of MAINTAINERS ARM: Fix Kconfig symbols typo for LEDS ARM: pxa: add dummy SA1100 rtc clock in pxa25x ARM: pxa: fix pxa25x gpio wakeup setting ARM: OMAP4: PM: fix errata handling when CONFIG_PM=n ARM: cns3xxx: drop unnecessary symbol selection ARM: vexpress: fix ll debug code when building multiplatform ARM: OMAP4: retrigger localtimers after re-enabling gic ARM: OMAP4460: Workaround for ROM bug because of CA9 r2pX GIC control register change. ARM: OMAP4: PM: add errata support ARM: davinci: fix return value check by using IS_ERR in tnetv107x_devices_init() ARM: davinci: uncompress.h: bail out if uart not initialized ARM: davinci: serial.h: fix uart number in the comment ARM: davinci: dm644x evm: move pointer dereference below NULL check ARM: vexpress: Make the debug UART detection more specific
2012-12-12Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-armLinus Torvalds
Pull ARM updates from Russell King: "Here's the updates for ARM for this merge window, which cover quite a variety of areas. There's a bunch of patch series from Will tackling various bugs like the PROT_NONE handling, ASID allocation, cluster boot protocol and ASID TLB tagging updates. We move to a build-time sorted exception table rather than doing the sorting at run-time, add support for the secure computing filter, and some updates to the perf code. We also have sorted out the placement of some headers, fixed some build warnings, fixed some hotplug problems with the per-cpu TWD code." * 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (73 commits) ARM: 7594/1: Add .smp entry for REALVIEW_EB ARM: 7599/1: head: Remove boot-time HYP mode check for v5 and below ARM: 7598/1: net: bpf_jit_32: fix sp-relative load/stores offsets. ARM: 7595/1: syscall: rework ordering in syscall_trace_exit ARM: 7596/1: mmci: replace readsl/writesl with ioread32_rep/iowrite32_rep ARM: 7597/1: net: bpf_jit_32: fix kzalloc gfp/size mismatch. ARM: 7593/1: nommu: do not enable DCACHE_WORD_ACCESS when !CONFIG_MMU ARM: 7592/1: nommu: prevent generation of kernel unaligned memory accesses ARM: 7591/1: nommu: Enable the strict alignment (CR_A) bit only if ARCH < v6 ARM: 7590/1: /proc/interrupts: limit the display of IPIs to online CPUs only ARM: 7587/1: implement optimized percpu variable access ARM: 7589/1: integrator: pass the lm resource to amba ARM: 7588/1: amba: create a resource parent registrator ARM: 7582/2: rename kvm_seq to vmalloc_seq so to avoid confusion with KVM ARM: 7585/1: kernel: fix nr_cpu_ids check in DT logical map init ARM: 7584/1: perf: fix link error when CONFIG_HW_PERF_EVENTS is not selected ARM: gic: use a private mapping for CPU target interfaces ARM: kernel: add logical mappings look-up ARM: kernel: add cpu logical map DT init in setup_arch ARM: kernel: add device tree init map function ...
2012-12-12Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "This includes a set of misc. cifs fixes (most importantly some byte range lock related write fixes from Pavel, and some ACL and idmap related fixes from Jeff) but also includes the SMB2.02 dialect enablement, and a key fix for SMB3 mounts. Default authentication upgraded to ntlmv2 for cifs (it was already ntlmv2 for smb2)" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (43 commits) CIFS: Fix write after setting a read lock for read oplock files cifs: parse the device name into UNC and prepath cifs: fix up handling of prefixpath= option cifs: clean up handling of unc= option cifs: fix SID binary to string conversion fix "disabling echoes and oplocks" on SMB2 mounts Do not send SMB2 signatures for SMB3 frames cifs: deal with id_to_sid embedded sid reply corner case cifs: fix hardcoded default security descriptor length cifs: extra sanity checking for cifs.idmap keys cifs: avoid extra allocation for small cifs.idmap keys cifs: simplify id_to_sid and sid_to_id mapping code CIFS: Fix possible data coherency problem after oplock break to None CIFS: Do not permit write to a range mandatory locked with a read lock cifs: rename cifs_readdir_lookup to cifs_prime_dcache and make it void return cifs: Add CONFIG_CIFS_DEBUG and rename use of CIFS_DEBUG cifs: Make CIFS_DEBUG possible to undefine SMB3 mounts fail with access denied to some servers cifs: Remove unused cEVENT macro cifs: always zero out smb_vol before parsing options ...
2012-12-12Merge tag 'for-linus-v3.8-rc1' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs update from Ben Myers: "There is plenty going on, including the cleanup of xfssyncd, metadata verifiers, CRC infrastructure for the log, tracking of inodes with speculative allocation, a cleanup of xfs_fs_subr.c, fixes for XFS_IOC_ZERO_RANGE, and important fix related to log replay (only update the last_sync_lsn when a transaction completes), a fix for deadlock on AGF buffers, documentation and comment updates, and a few more cleanups and fixes. Details: - remove the xfssyncd mess - only update the last_sync_lsn when a transaction completes - zero allocation_args on the kernel stack - fix AGF/alloc workqueue deadlock - silence uninitialised f.file warning - Update inode alloc comments - Update mount options documentation - report projid32bit feature in geometry call - speculative preallocation inode tracking - fix attr tree double split corruption - fix broken error handling in xfs_vm_writepage - drop buffer io reference when a bad bio is built - add more attribute tree trace points - growfs infrastructure changes for 3.8 - fs/xfs/xfs_fs_subr.c die die die - add CRC infrastructure - add CRC checks to the log - Remove description of nodelaylog mount option from xfs.txt - inode allocation should use unmapped buffers - byte range granularity for XFS_IOC_ZERO_RANGE - fix direct IO nested transaction deadlock - fix stray dquot unlock when reclaiming dquots - fix sparse reported log CRC endian issue" Fix up trivial conflict in fs/xfs/xfs_fsops.c due to the same patch having been applied twice (commits eaef854335ce and 1375cb65e87b: "xfs: growfs: don't read garbage for new secondary superblocks") with later updates to the affected code in the XFS tree. * tag 'for-linus-v3.8-rc1' of git://oss.sgi.com/xfs/xfs: (78 commits) xfs: fix sparse reported log CRC endian issue xfs: fix stray dquot unlock when reclaiming dquots xfs: fix direct IO nested transaction deadlock. xfs: byte range granularity for XFS_IOC_ZERO_RANGE xfs: inode allocation should use unmapped buffers. xfs: Remove the description of nodelaylog mount option from xfs.txt xfs: add CRC checks to the log xfs: add CRC infrastructure xfs: convert buffer verifiers to an ops structure. xfs: connect up write verifiers to new buffers xfs: add pre-write metadata buffer verifier callbacks xfs: add buffer pre-write callback xfs: Add verifiers to dir2 data readahead. xfs: add xfs_da_node verification xfs: factor and verify attr leaf reads xfs: factor dir2 leaf read xfs: factor out dir2 data block reading xfs: factor dir2 free block reading xfs: verify dir2 block format buffers xfs: factor dir2 block read operations ...
2012-12-12Merge tag 'dlm-3.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set fixes some conditions in which value blocks are invalidated, and includes two trivial cleanups." * tag 'dlm-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: fix lvb invalidation conditions fs/dlm: remove CONFIG_EXPERIMENTAL dlm: remove unused variable in *dlm_lowcomms_get_buffer()
2012-12-12Merge branch 'for-3.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: "A lot of activities on cgroup side. The big changes are focused on making cgroup hierarchy handling saner. - cgroup_rmdir() had peculiar semantics - it allowed cgroup destruction to be vetoed by individual controllers and tried to drain refcnt synchronously. The vetoing never worked properly and caused good deal of contortions in cgroup. memcg was the last reamining user. Michal Hocko removed the usage and cgroup_rmdir() path has been simplified significantly. This was done in a separate branch so that the memcg people can base further memcg changes on top. - The above allowed cleaning up cgroup lifecycle management and implementation of generic cgroup iterators which are used to improve hierarchy support. - cgroup_freezer updated to allow migration in and out of a frozen cgroup and handle hierarchy. If a cgroup is frozen, all descendant cgroups are frozen. - netcls_cgroup and netprio_cgroup updated to handle hierarchy properly. - Various fixes and cleanups. - Two merge commits. One to pull in memcg and rmdir cleanups (needed to build iterators). The other pulled in cgroup/for-3.7-fixes for device_cgroup fixes so that further device_cgroup patches can be stacked on top." Fixed up a trivial conflict in mm/memcontrol.c as per Tejun (due to commit bea8c150a7 ("memcg: fix hotplugged memory zone oops") in master touching code close to commit 2ef37d3fe4 ("memcg: Simplify mem_cgroup_force_empty_list error handling") in for-3.8) * 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (65 commits) cgroup: update Documentation/cgroups/00-INDEX cgroup_rm_file: don't delete the uncreated files cgroup: remove subsystem files when remounting cgroup cgroup: use cgroup_addrm_files() in cgroup_clear_directory() cgroup: warn about broken hierarchies only after css_online cgroup: list_del_init() on removed events cgroup: fix lockdep warning for event_control cgroup: move list add after list head initilization netprio_cgroup: allow nesting and inherit config on cgroup creation netprio_cgroup: implement netprio[_set]_prio() helpers netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx netprio_cgroup: reimplement priomap expansion netprio_cgroup: shorten variable names in extend_netdev_table() netprio_cgroup: simplify write_priomap() netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking cgroup: remove obsolete guarantee from cgroup_task_migrate. cgroup: add cgroup->id cgroup, cpuset: remove cgroup_subsys->post_clone() cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/ cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free() ...
2012-12-12Merge branch 'for-3.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu changes from Tejun Heo: "Nothing exciting here either. Joonsoo's is almost cosmetic. Cyrill's patch fixes "percpu_alloc" early kernel param handling so that the kernel doesn't crash when the parameter is specified w/o any argument." * 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: mm, percpu: Make sure percpu_alloc early parameter has an argument percpu: make pcpu_free_chunk() use pcpu_mem_free() instead of kfree()
2012-12-12Merge branch 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue changes from Tejun Heo: "Nothing exciting. Just two trivial changes." * 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: add WARN_ON_ONCE() on CPU number to wq_worker_waking_up() workqueue: trivial fix for return statement in work_busy()
2012-12-12Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management update from Zhang Rui: "Highlights: - Introduction of thermal policy support, together with three new thermal governors, including step_wise, user_space, fire_share. - Introduction of ST-Ericsson db8500_thermal driver and ST-Ericsson db8500_cpufreq_cooling driver. - Thermal Kconfig file and Makefile refactor. - Fixes for generic thermal layer, generic cpucooling, rcar thermal driver and Exynos thermal driver." * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (36 commits) Thermal: Fix DEFAULT_THERMAL_GOVERNOR Thermal: fix a NULL pointer dereference when generic thermal layer is built as a module thermal: rcar: add rcar_zone_to_priv() macro thermal: rcar: fixup the unit of temperature thermal: cpu cooling: allow module builds thermal: cpu cooling: use const parameter while registering Thermal: Add ST-Ericsson DB8500 thermal properties and platform data. Thermal: Add ST-Ericsson DB8500 thermal driver. drivers/thermal/Makefile refactor Exynos: Add missing dependency Refactor drivers/thermal/Kconfig thermal: cpu_cooling: Make 'notify_device' static Thermal: Remove the cooling_cpufreq_list. Thermal: fix bug of counting cpu frequencies. Thermal: add indent for code alignment. thermal: rcar_thermal: remove explicitly used devm_kfree/iounap() thermal: user_space: Add missing static storage class specifiers thermal: fair_share: Add missing static storage class specifiers thermal: step_wise: Add missing static storage class specifiers Thermal: Fix oops and unlocking in thermal_sys.c ...
2012-12-12Merge tag 'regmap-3.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "Quite a few enhancements this time around, helpers and diagnostics for the most part which is good to see: - Addition of table based lookups for the register access checks from Davide Ciminaghi, making life easier for drivers with big blocks of similar registers. - Allow drivers to get the irqdomain for regmap irq_chips, allowing the domain to be used with other APIs. - Debug improvements for paged register maps. - Performance improvments for some of the diagnostic infrastructure, very helpful for devices with large register maps." * tag 'regmap-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: debugfs: Cache offsets of valid regions for dump regmap: debugfs: Factor out initial seek regmap: debugfs: Avoid overflows for very small reads regmap: Cache register and value sizes for debugfs regmap: introduce tables for readable/writeable/volatile/precious checks regmap: core: Report registers in hex when we can't cache regmap: Fix printing of size_t variable regmap: make lock/unlock functions customizable regmap: silence GCC warning regmap: Split raw writes that cross window boundaries regmap: Make return code checks consistent regmap: Factor range lookup out of page selection regmap: Provide debugfs read of register ranges regmap: Factor out debugfs register read regmap: Allow ranges to be named regmap: When we sanity check during range adds say what errors we find regmap: Rename n_ranges to num_ranges regmap: irq: Allow users to retrieve the irq_domain
2012-12-12Merge tag 'please-pull-einj-fix-for-acpi5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull ACPI5 error injection fix from Tony Luck: "Trivial fix for error injection code using ACPI5 version of EINJ" * tag 'please-pull-einj-fix-for-acpi5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: ACPI, APEI, EINJ: Add missed ACPI5 support for error trigger table
2012-12-12Merge tag 'please-pull-pstore_mevent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull pstore fixes from Tony Luck: "Patch series to allow EFI variable backend to pstore to hold multiple records." * tag 'please-pull-pstore_mevent' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: efi_pstore: Add a format check for an existing variable name at erasing time efi_pstore: Add a format check for an existing variable name at reading time efi_pstore: Add a sequence counter to a variable name efi_pstore: Add ctime to argument of erase callback efi_pstore: Remove a logic erasing entries from a write callback to hold multiple logs efi_pstore: Add a logic erasing entries to an erase callback efi_pstore: Check remaining space with QueryVariableInfo() before writing data
2012-12-12Merge tag 'please-pull-misc-3.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull ia64 fix from Tony Luck: "Miscellaneous ia64 fix for 3.8. Just need to avoid a pending namespace collision from other work being merged." * tag 'please-pull-misc-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: [IA64] Resolve name space collision for cache_show()
2012-12-12Merge tag 'arm64-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 Pull ARM64 updates from Catalin Marinas: - Generic execve, kernel_thread, fork/vfork/clone. - Preparatory patches for KVM support (initialising EL2 mode for later installing KVM support, hypervisor stub). - Signal handling corner case fix (alternative signal stack set up for a SEGV handler, which is raised in response to RLIMIT_STACK being reached). - Sub-nanosecond timer error fix. * tag 'arm64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: (30 commits) arm64: Update the MAINTAINERS entry arm64: compat for clock_adjtime(2) is miswired arm64: move FP-SIMD save/restore code to a macro arm64: hyp: initialize vttbr_el2 to zero arm64: add hypervisor stub arm64: record boot mode when entering the kernel arm64: move vector entry macro to assembler.h arm64: add AArch32 execution modes to ptrace.h arm64: expand register mapping between AArch32 and AArch64 arm64: generic timer: use virtual counter instead of physical at EL0 arm64: vdso: defer shifting of nanosecond component of timespec arm64: vdso: rework __do_get_tspec register allocation and return shift arm64: vdso: check sequence counter even for coarse realtime operations arm64: vdso: fix clocksource mask when extracting bottom 56 bits ARM64: Remove incorrect Kconfig symbol HAVE_SPARSE_IRQ Documentation: Fixes a word in Documentation/arm64/memory.txt arm64: Make !dirty ptes read-only arm64: Convert empty flush_cache_{mm,page} functions to static inline arm64: signal: let the compiler inline compat_get_sigframe arm64: signal: return struct rt_sigframe from get_sigframe ... Conflicts: arch/arm64/include/asm/unistd32.h
2012-12-12Merge branch 'omap-serial' of git://git.linaro.org/people/rmk/linux-armLinus Torvalds
Pull ARM OMAP serial updates from Russell King: "This series is a major reworking of the OMAP serial driver code fixing various bugs in the hardware-assisted flow control, extending up into serial_core for a couple of issues. These fixes have been done as a set of progressive changes and transformations in the hope that no new bugs will be introduced by this series. The problems are many-fold, from the driver not being informed about updated settings, to the driver not knowing what the intentions of the upper layers are. The first four patches tackle the serial_core layer, allowing it to provide the necessary information to drivers, and the remaining patches allow the OMAP serial driver to take advantage of this. This brings hardware assisted RTS/CTS and XON/OFF flow control into a useful state. These patches have been in linux-next for most of the last cycle; indeed they predate the previous merge window. They've also been posted to the OMAP people." * 'omap-serial' of git://git.linaro.org/people/rmk/linux-arm: (21 commits) SERIAL: omap: fix hardware assisted flow control SERIAL: omap: simplify (2) SERIAL: omap: move xon/xoff setting earlier SERIAL: omap: always set TCR SERIAL: omap: simplify SERIAL: omap: don't read back LCR/MCR/EFR SERIAL: omap: serial_omap_configure_xonxoff() contents into set_termios SERIAL: omap: configure xon/xoff before setting modem control lines SERIAL: omap: remove OMAP_UART_SYSC_RESET and OMAP_UART_FIFO_CLR SERIAL: omap: move driver private definitions and structures to driver SERIAL: omap: remove 'irq_pending' bitfield SERIAL: omap: fix MCR TCRTLR bit handling SERIAL: omap: fix set_mctrl() breakage SERIAL: omap: no need to re-read EFR SERIAL: omap: remove setting of EFR SCD bit SERIAL: omap: allow hardware assisted IXANY mode to be disabled SERIAL: omap: allow hardware assisted rts/cts modes to be disabled SERIAL: core: add throttle/unthrottle callbacks for hardware assisted flow control SERIAL: core: add hardware assisted h/w flow control support SERIAL: core: add hardware assisted s/w flow control support ... Conflicts: drivers/tty/serial/omap-serial.c
2012-12-12Thermal: Fix DEFAULT_THERMAL_GOVERNORZhang Rui
Fix DEFAULT_THERMAL_GOVERNOR to be consistant with the default governor selected in kernel config file. Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2012-12-12Thermal: fix a NULL pointer dereference when generic thermal layer is built ↵Zhang Rui
as a module [ 12.761956] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 12.762016] IP: [<ffffffffa0005277>] handle_thermal_trip+0x47/0x130 [thermal_sys] [ 12.762060] PGD 1fec74067 PUD 1fee5b067 PMD 0 [ 12.762127] Oops: 0000 [#1] SMP [ 12.762177] Modules linked in: hid_generic crc32c_intel usbhid hid firewire_ohci(+) e1000e(+) firewire_core crc_itu_t xhci_hcd(+) thermal(+) fan thermal_sys hwmon [ 12.762423] CPU 1 [ 12.762443] Pid: 187, comm: modprobe Tainted: G A 3.7.0-thermal-module+ #25 /DH77DF [ 12.762496] RIP: 0010:[<ffffffffa0005277>] [<ffffffffa0005277>] handle_thermal_trip+0x47/0x130 [thermal_sys] [ 12.762682] RSP: 0018:ffff8801fe7ddc18 EFLAGS: 00010282 [ 12.762704] RAX: 0000000000000000 RBX: ffff8801ff3e9c00 RCX: ffff8801fdc39800 [ 12.762728] RDX: ffff8801fe7ddc24 RSI: 0000000000000001 RDI: ffff8801ff3e9c00 [ 12.762764] RBP: ffff8801fe7ddc48 R08: 0000000004000000 R09: ffffffffa001f568 [ 12.762797] R10: ffffffff81363083 R11: 0000000000000001 R12: 0000000000000001 [ 12.762832] R13: 0000000000000000 R14: 0000000000000001 R15: ffff8801fde73e68 [ 12.762866] FS: 00007f5548516700(0000) GS:ffff88021f240000(0000) knlGS:0000000000000000 [ 12.762912] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 12.762946] CR2: 0000000000000018 CR3: 00000001fefe2000 CR4: 00000000001407e0 [ 12.762979] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 12.763014] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 12.763048] Process modprobe (pid: 187, threadinfo ffff8801fe7dc000, task ffff8801fe5bdb40) [ 12.763095] Stack: [ 12.763122] 0000000000019640 00000000fdc39800 ffff8801fe7ddc48 ffff8801ff3e9c00 [ 12.763225] 0000000000000002 0000000000000000 ffff8801fe7ddc78 ffffffffa00053e7 [ 12.763338] ffff8801ff3e9c00 0000000000006c98 ffffffffa0007480 ffff8801ff3e9c00 [ 12.763440] Call Trace: [ 12.763470] [<ffffffffa00053e7>] thermal_zone_device_update+0x77/0xa0 [thermal_sys] [ 12.763515] [<ffffffffa0006d38>] thermal_zone_device_register+0x788/0xa88 [thermal_sys] [ 12.763562] [<ffffffffa001f394>] acpi_thermal_add+0x360/0x4c8 [thermal] [ 12.763598] [<ffffffff8133902a>] acpi_device_probe+0x50/0x190 [ 12.763632] [<ffffffff811bd793>] ? sysfs_create_link+0x13/0x20 [ 12.763666] [<ffffffff813cc41b>] driver_probe_device+0x7b/0x240 [ 12.763699] [<ffffffff813cc68b>] __driver_attach+0xab/0xb0 [ 12.763732] [<ffffffff813cc5e0>] ? driver_probe_device+0x240/0x240 [ 12.763766] [<ffffffff813ca836>] bus_for_each_dev+0x56/0x90 [ 12.763799] [<ffffffff813cbf4e>] driver_attach+0x1e/0x20 [ 12.763831] [<ffffffff813cbac0>] bus_add_driver+0x190/0x290 [ 12.763864] [<ffffffffa0022000>] ? 0xffffffffa0021fff [ 12.763896] [<ffffffff813ccbea>] driver_register+0x7a/0x160 [ 12.763928] [<ffffffffa0022000>] ? 0xffffffffa0021fff [ 12.763960] [<ffffffff813399fb>] acpi_bus_register_driver+0x43/0x45 [ 12.763995] [<ffffffffa002203a>] acpi_thermal_init+0x3a/0x42 [thermal] [ 12.764029] [<ffffffff8100207f>] do_one_initcall+0x3f/0x170 [ 12.764063] [<ffffffff810b1a5f>] sys_init_module+0x8f/0x200 [ 12.764097] [<ffffffff815ff259>] system_call_fastpath+0x16/0x1b [ 12.764129] Code: 48 8b 87 c8 02 00 00 41 89 f4 48 8d 55 dc ff 50 28 44 8b 6d dc 41 8d 45 fe 83 f8 01 76 5e 48 8b 83 d8 02 00 00 44 89 e6 48 89 df <ff> 50 18 4c 8d a3 10 03 00 00 4c 89 e7 e8 87 f1 5e e1 8b 83 bc [ 12.765164] RIP [<ffffffffa0005277>] handle_thermal_trip+0x47/0x130 [thermal_sys] [ 12.765223] RSP <ffff8801fe7ddc18> [ 12.765252] CR2: 0000000000000018 [ 12.765284] ---[ end trace 7723294cdfb00d2a ]--- This is because thermal_zone_device_update() is invoked before any thermal governors being registered. Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2012-12-11Merge branch 'x86-timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer update from Ingo Molnar: "This tree includes HPET fixes and also implements a calibration-free, TSC match driven APIC timer interrupt mode: 'TSC deadline mode' supported in SandyBridge and later CPUs." * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: hpet: Fix inverted return value check in arch_setup_hpet_msi() x86: hpet: Fix masking of MSI interrupts x86: apic: Use tsc deadline for oneshot when available
2012-12-11Merge branch 'x86-nuke386-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull "Nuke 386-DX/SX support" from Ingo Molnar: "This tree removes ancient-386-CPUs support and thus zaps quite a bit of complexity: 24 files changed, 56 insertions(+), 425 deletions(-) ... which complexity has plagued us with extra work whenever we wanted to change SMP primitives, for years. Unfortunately there's a nostalgic cost: your old original 386 DX33 system from early 1991 won't be able to boot modern Linux kernels anymore. Sniff." I'm not sentimental. Good riddance. * 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, 386 removal: Document Nx586 as a 386 and thus unsupported x86, cleanups: Simplify sync_core() in the case of no CPUID x86, 386 removal: Remove CONFIG_X86_POPAD_OK x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK x86, 386 removal: Remove CONFIG_INVLPG x86, 386 removal: Remove CONFIG_BSWAP x86, 386 removal: Remove CONFIG_XADD x86, 386 removal: Remove CONFIG_CMPXCHG x86, 386 removal: Remove CONFIG_M386 from Kconfig
2012-12-11Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 topology discovery improvements from Ingo Molnar: "These changes improve topology discovery on AMD CPUs. Right now this feeds information displayed in /sys/devices/system/cpu/cpuX/cache/indexY/* - but in the future we could use this to set up a better scheduling topology." * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cacheinfo: Base cache sharing info on CPUID 0x8000001d on AMD x86, cacheinfo: Make use of CPUID 0x8000001d for cache information on AMD x86, cacheinfo: Determine number of cache leafs using CPUID 0x8000001d on AMD x86: Add cpu_has_topoext
2012-12-11Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Small cleanups." * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix the error of using "const" in gen-insn-attr-x86.awk x86, apic: Cleanup cfg->domain setup for legacy interrupts x86: Remove dead hlt_use_halt code
2012-12-11Merge branch 'x86-bsp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 BSP hotplug changes from Ingo Molnar: "This tree enables CPU#0 (the boot processor) to be onlined/offlined on x86, just like any other CPU. Enabled on Intel CPUs for now. Allowing this required the identification and fixing of latent CPU#0 assumptions (such as CPU#0 initializations, etc.) in the x86 architecture code, plus the identification of barriers to BSP-offlining, such as active PIC interrupts which can only be serviced on the BSP. It's behind a default-off option, and there's a debug option that allows the automatic testing of this feature. The motivation of this feature is to allow and prepare for true CPU-hotplug hardware support: recent changes to MCE support enable us to detect a deteriorating but not yet hard-failing L1/L2 cache on a CPU that could be soft-unplugged - or a failing L3 cache on a multi-socket system. Note that true hardware hot-plug is not yet fully enabled by this, because that requires a special platform wakeup sequence to be sent to the freshly powered up CPU#0. Future patches for this are planned, once such a platform exists. Chicken and egg" * 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, topology: Debug CPU0 hotplug x86/i387.c: Initialize thread xstate only on CPU0 only once x86, hotplug: Handle retrigger irq by the first available CPU x86, hotplug: The first online processor saves the MTRR state x86, hotplug: During CPU0 online, enable x2apic, set_numa_node. x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI x86-32, hotplug: Add start_cpu0() entry point to head_32.S x86-64, hotplug: Add start_cpu0() entry point to head_64.S kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback x86, hotplug, suspend: Online CPU0 for suspend or hibernate x86, hotplug: Support functions for CPU0 online/offline x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it x86, Kconfig: Add config switch for CPU0 hotplug doc: Add x86 CPU0 online/offline feature
2012-12-11Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot changes from Ingo Molnar: "Two small changes: a cleanup and allow CONFIG_X86_MPPARSE to be turned off on SFI as well." * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch/x86/Kconfig: Allow turning off CONFIG_X86_MPPARSE when either ACPI or SFI is present x86/boot/doc: Fix grammar and typo in boot.txt
2012-12-11Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm changes from Ingo Molnar: "Two fixlets and a cleanup." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_32: Return actual stack when requesting sp from regs x86: Don't clobber top of pt_regs in nested NMI x86/asm: Clean up copy_page_*() comments and code
2012-12-11Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core timer changes from Ingo Molnar: "It contains continued generic-NOHZ work by Frederic and smaller cleanups." * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Kill xtime_lock, replacing it with jiffies_lock clocksource: arm_generic: use this_cpu_ptr per-cpu helper clocksource: arm_generic: use integer math helpers time/jiffies: Make clocksource_jiffies static clocksource: clean up parse_pmtmr() tick: Correct the comments for tick_sched_timer() tick: Conditionally build nohz specific code in tick handler tick: Consolidate tick handling for high and low res handlers tick: Consolidate timekeeping handling code
2012-12-11Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The biggest change affects group scheduling: we now track the runnable average on a per-task entity basis, allowing a smoother, exponential decay average based load/weight estimation instead of the previous binary on-the-runqueue/off-the-runqueue load weight method. This will inevitably disturb workloads that were in some sort of borderline balancing state or unstable equilibrium, so an eye has to be kept on regressions. For that reason the new load average is only limited to group scheduling (shares distribution) at the moment (which was also hurting the most from the prior, crude weight calculation and whose scheduling quality wins most from this change) - but we plan to extend this to regular SMP balancing as well in the future, which will simplify and speed up things a bit. Other changes involve ongoing preparatory work to extend NOHZ to the scheduler as well, eventually allowing completely irq-free user-space execution." * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled" cputime: Comment cputime's adjusting code cputime: Consolidate cputime adjustment code cputime: Rename thread_group_times to thread_group_cputime_adjusted cputime: Move thread_group_cputime() to sched code vtime: Warn if irqs aren't disabled on system time accounting APIs vtime: No need to disable irqs on vtime_account() vtime: Consolidate a bit the ctx switch code vtime: Explicitly account pending user time on process tick vtime: Remove the underscore prefix invasion sched/autogroup: Fix crash on reboot when autogroup is disabled cputime: Separate irqtime accounting from generic vtime cputime: Specialize irq vtime hooks kvm: Directly account vtime to system on guest switch vtime: Make vtime_account_system() irqsafe vtime: Gather vtime declarations to their own header file sched: Describe CFS load-balancer sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking sched: Make __update_entity_runnable_avg() fast sched: Update_cfs_shares at period edge ...