path: root/arch/s390/mm/vmem.c
Age | Commit message | Author
2012-05-24 | s390/headers: replace __s390x__ with CONFIG_64BIT where possible | Heiko Carstens
Replace __s390x__ with CONFIG_64BIT in all places that are not exported to userspace or guarded with #ifdef __KERNEL__. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 | [S390] kdump backend code | Michael Holzheu
This patch provides the architecture specific part of the s390 kdump support. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24 | [S390] kvm guest address space mapping | Martin Schwidefsky
Add code that allows KVM to control the virtual memory layout that is seen by a guest. The guest address space uses a second page table that shares the last level pte-tables with the process page table. If a page is unmapped from the process page table it is automatically unmapped from the guest page table as well. The guest address space mapping starts out empty; KVM can map any individual 1MB segment from the process virtual memory to any 1MB aligned location in the guest virtual memory. If a target segment in the process virtual memory does not exist or is unmapped while a guest mapping exists, the desired target address is stored as an invalid segment table entry in the guest page table. The population of the guest page table is fault driven. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
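The 1MB granularity described in this commit message can be illustrated with a small user-space sketch. The function name `segment_request_ok` and the macros are hypothetical and are not taken from the kernel sources; the sketch only assumes that source address, target address and length all have to be multiples of the 1MB segment size.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_SIZE (UINT64_C(1) << 20)   /* one 1MB segment */
#define SEGMENT_MASK (SEGMENT_SIZE - 1)

/*
 * Hypothetical check: true if a mapping request only touches whole,
 * 1MB aligned segments in both the process and the guest address space.
 */
static bool segment_request_ok(uint64_t from, uint64_t to, uint64_t len)
{
	if (len == 0)
		return false;
	/* Any low bit set in one of the three values breaks the alignment. */
	return ((from | to | len) & SEGMENT_MASK) == 0;
}
```

OR-ing the three values before masking keeps the check to a single comparison; whether the kernel performs the check this way is not stated in the commit message.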
2011-05-23 | [S390] refactor page table functions for better pgste support | Martin Schwidefsky
Rework the architecture page table functions to access the bits in the page table extension array (pgste). There are a number of changes:
1) Fix missing pgste update if the attach_count for the mm is <= 1.
2) For every operation that affects the invalid bit in the pte or the rcp byte in the pgste the pcl lock needs to be acquired. The function pgste_get_lock gets the pcl lock and returns the current pgste value for a pte pointer. The function pgste_set_unlock stores the pgste and releases the lock. Between these two calls the bits in the pgste can be shuffled.
3) Define two software bits in the pte, _PAGE_SWR and _PAGE_SWC, to avoid calling SetPageDirty and SetPageReferenced from pgtable.h. If the host reference backup bit or the host change backup bit has been set the dirty/referenced state is transferred to the pte. The common code will pick up the state from the pte.
4) Add ptep_modify_prot_start and ptep_modify_prot_commit for mprotect.
5) Remove pgd_populate_kernel, pud_populate_kernel, pmd_populate_kernel, pgd_clear_kernel, pud_clear_kernel, pmd_clear_kernel and ptep_invalidate.
6) Rename kvm_s390_test_and_clear_page_dirty to ptep_test_and_clear_user_dirty and add ptep_test_and_clear_user_young.
7) Define mm_exclusive() and mm_has_pgste() helpers to improve readability.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-08-09 | mm: provide init_mm mm_context initializer | Heiko Carstens
Provide an INIT_MM_CONTEXT initializer macro which can be used to statically initialize mm_struct:mm_context of init_mm. This way we can get rid of code which will do the initialization at run time (on s390). In addition the current code can be found at a place where it is not expected. So let's have a common initializer which architectures can use if needed. This is based on a patch from Suzuki Poulose. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Suzuki Poulose <suzuki@in.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
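The idea of replacing run-time setup with a static initializer macro can be sketched in plain C. Everything below is a toy model: `toy_mm`, `toy_context`, `toy_list` and `TOY_INIT_MM_CONTEXT` are hypothetical names, and the real layout of mm_struct and of the s390 mm_context is not reproduced here.

```c
/* Toy doubly linked list head; an empty list points to itself. */
struct toy_list {
	struct toy_list *next;
	struct toy_list *prev;
};

/* Toy architecture context with two list heads that must start out empty. */
struct toy_context {
	struct toy_list crst_list;
	struct toy_list pgtable_list;
};

struct toy_mm {
	int users;
	struct toy_context context;
};

/*
 * Hypothetical initializer macro: expands to designated initializers so
 * that the context is fully set up at compile time, with no init call.
 */
#define TOY_INIT_MM_CONTEXT(name)					\
	.context.crst_list = { &name.context.crst_list,		\
			       &name.context.crst_list },		\
	.context.pgtable_list = { &name.context.pgtable_list,		\
				  &name.context.pgtable_list },

static struct toy_mm toy_init_mm = {
	.users = 1,
	TOY_INIT_MM_CONTEXT(toy_init_mm)
};
```

Because the addresses of the list heads are constant expressions, the self-referencing pointers can be resolved by the compiler and linker, which is what makes a run-time initialization step unnecessary in this model.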
2010-04-09 | [S390] s390: disable change bit override | Christian Borntraeger
commit 6a985c6194017de2c062916ad1cd00dee0302c40 ([S390] s390: use change recording override for kernel mapping) deactivated the change bit recording for the kernel mapping to improve the performance. This works most of the time, but there are cases (e.g. kernel runs in home space, futex atomic compare xcmg) where we modify user memory with the kernel mapping instead of the user mapping. Instead of fixing these cases, this patch just deactivates change bit override to avoid future problems with other kernel code that might use the kernel mapping for user memory. CC: stable@kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-03-30 | include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h | Tejun Heo
percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h, making everything defined by the two files universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch a large number of source files, the following script is used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that only the necessary includes are there, i.e. if only gfp is used, gfp.h; if slab is used, slab.h.
* When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surroundings. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered: alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly because the file doesn't have a fitting include block), it prints out an error message indicating which .h file needs to be added to the file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files.
2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition, while adding it to the implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files.
3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed, e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically editing them, as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq).
   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as a bisection point.
Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable on most builds of the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2009-12-07 | [S390] s390: use change recording override for kernel mapping | Christian Borntraeger
We don't need the dirty bit if a write access is done via the kernel mapping. In that case SetPageDirty and friends are used anyway, so there is no need to do that a second time. We can use the change-recording override function for the kernel mapping, if available. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11 | [S390] fix recursive locking on page_table_lock | Martin Schwidefsky
Suzuki Poulose reported the following recursive locking bug on s390:
Here is the stack trace (see Appendix I for more info):
[<0000000000406ed6>] _spin_lock+0x52/0x94
[<0000000000103bde>] crst_table_free+0x14e/0x1a4
[<00000000001ba684>] __pmd_alloc+0x114/0x1ec
[<00000000001be8d0>] handle_mm_fault+0x2cc/0xb80
[<0000000000407d62>] do_dat_exception+0x2b6/0x3a0
[<0000000000114f8c>] sysc_return+0x0/0x8
[<00000200001642b2>] 0x200001642b2
The page_table_lock is already acquired in __pmd_alloc (mm/memory.c) and it tries to populate the pud/pgd with a newly allocated pmd. If another thread populates it before we get a chance, we free the pmd using pmd_free(). On s390x, pmd_free() (and even pud_free()) is #defined to crst_table_free(), which acquires the page_table_lock to protect the crst_table index updates. Hence this ends up in a recursive locking of the page_table_lock.
The solution suggested by Dave Hansen is to use a new spin lock in the mmu context to protect the access to the crst_list and the pgtable_list.
Reported-by: Suzuki Poulose <suzuki@in.ibm.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-06-10 | [S390] vmemmap: fix off-by-one bug. | Heiko Carstens
If a memory range is supposed to be added to the 1:1 mapping and it ends just below the maximum supported physical address it won't succeed. This is because a test doesn't consider that the end address is 1 smaller than start + size. Fix the comparison. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
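The off-by-one described in this commit message can be shown with a minimal user-space sketch. The names `range_fits_buggy`, `range_fits` and the limit `TOY_MAX_PHYS_ADDR` are hypothetical and are not taken from vmem.c; the sketch assumes the limit is the highest valid address (inclusive) and ignores arithmetic overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: highest supported physical address (inclusive). */
#define TOY_MAX_PHYS_ADDR UINT64_C(0x7fffffff)

/* Buggy variant: treats start + size as the last address of the range. */
static bool range_fits_buggy(uint64_t start, uint64_t size)
{
	return start + size <= TOY_MAX_PHYS_ADDR;
}

/* Fixed variant: the last address of the range is start + size - 1. */
static bool range_fits(uint64_t start, uint64_t size)
{
	if (size == 0)
		return false;
	return start + size - 1 <= TOY_MAX_PHYS_ADDR;
}
```

A 1MB range starting at 0x7ff00000 ends exactly at 0x7fffffff: the buggy check rejects it because 0x7ff00000 + 0x100000 exceeds the limit, while the fixed check accepts it.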
2008-05-30 | [S390] Fix section mismatch warnings. | Heiko Carstens
This fixes the last remaining section mismatch warnings in s390 architecture code. It reveals also a real bug introduced by... me with git commit 2069e978d5a6e7b45d58027e3de7f879b8c5e488 ("[S390] sparsemem vmemmap: initialize memmap.") Calling the generic vmemmap_alloc_block() function to get initialized memory is a nice idea, however that function is __meminit annotated and therefore the function might be gone if we try to call it later. This can happen if a DCSS segment gets added. So basically revert the patch and clear the memmap explicitly to fix the original bug. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-05-15 | [S390] sparsemem vmemmap: initialize memmap. | Heiko Carstens
Let's just use the generic vmemmap_alloc_block() function which always returns initialized memory. Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 | [S390] Convert to SPARSEMEM & SPARSEMEM_VMEMMAP | Heiko Carstens
Convert s390 to SPARSEMEM and SPARSEMEM_VMEMMAP. We do a select of SPARSEMEM_VMEMMAP since it is configurable. This is because SPARSEMEM without SPARSEMEM_VMEMMAP gives us a hell of broken include dependencies that I don't want to fix. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 | [S390] System z large page support. | Gerald Schaefer
This adds hugetlbfs support on System z, using both hardware large page support if available and software large page emulation on older hardware. Shared (large) page tables are implemented in software emulation mode, by using page->index of the first tail page from a compound large page to store page table information. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 | [S390] vmemmap: use clear_table to initialise page tables. | Heiko Carstens
Always use clear_table to initialise page tables. The overlapping memcpy is just a leftover of a previous version that wasn't fully converted to clear_table. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-02-09 | [S390] Add four level page tables for CONFIG_64BIT=y. | Martin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-02-09 | [S390] 1K/2K page table pages. | Martin Schwidefsky
This patch implements 1K/2K page table pages for s390. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-02-05 | [S390] Remove BUILD_BUG_ON() in vmem code. | Heiko Carstens
Remove BUILD_BUG_ON() in vmem code since it causes build failures if the size of struct page increases. Instead calculate at compile time the address of the highest physical address that can be added to the 1:1 mapping. This is supposed to fix a build failure with the page owner tracking leak detector patches as reported by akpm. page-owner-tracking-leak-detector-broken-on-s390.patch can be removed from -mm again when this is merged. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
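One way such a compile-time calculation could look is sketched below. This is an assumption about the general technique, not the code from the patch: `toy_page`, `TOY_VMEM_MAP_SIZE`, `TOY_MAX_PFN`, `TOY_MAX_ADDR` and the 32MB value are all hypothetical. The point is that the limit is derived from `sizeof`, so a larger struct page lowers the limit instead of breaking the build.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* 4KB pages */

/* Stand-in for struct page; its size may differ between configurations. */
struct toy_page {
	unsigned long flags;
	void *lru_next;
	void *lru_prev;
	unsigned long private_data;
};

/* Hypothetical size of the virtual area reserved for the struct page array. */
#define TOY_VMEM_MAP_SIZE (UINT64_C(32) << 20)

/* Number of struct pages that fit into the area, known at compile time. */
#define TOY_MAX_PFN (TOY_VMEM_MAP_SIZE / sizeof(struct toy_page))

/* First physical address that can no longer be described by the array. */
#define TOY_MAX_ADDR ((uint64_t)TOY_MAX_PFN << TOY_PAGE_SHIFT)

static uint64_t toy_max_addr(void)
{
	return TOY_MAX_ADDR;
}
```

With a fixed assertion on the struct size, any growth of the struct stops compilation; with the derived constant, the addressable range simply shrinks.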
2008-02-05 | [S390] Fix couple of section mismatches. | Heiko Carstens
Fix a couple of section mismatches. And since we touch the code anyway, change the IPL code to use C99 initializers. Cc: Michael Holzheu <holzheu@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-01-26 | [S390] vmemmap: allocate struct pages before 1:1 mapping | Christian Borntraeger
We have seen an oops in an OOM situation, where show_mem tried to access the struct page of a dcss segment. The vmemmap code has already created the 1:1 mapping but failed allocating the struct pages. In the OOM case, show_mem now walks the memory. It uses pfn_valid to detect if it may access the struct page. In the case described above, the mapping was established and pfn_valid returned true. As the struct pages were not allocated, the kernel oopsed. We have to ensure that we have created the struct pages, before we add a mapping pointing to the pages. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
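The ordering requirement from this commit message can be modelled in a few lines of user-space C. All names (`toy_range`, `toy_alloc_memmap`, `toy_add_range`, `toy_pfn_valid`) are hypothetical, and the allocator failure is simulated with a flag rather than a real out-of-memory condition.

```c
#include <stdbool.h>

struct toy_range {
	bool memmap_ready;   /* struct pages for the range are allocated */
	bool mapped;         /* range is present in the 1:1 mapping */
};

/* Hypothetical allocator that can be forced to fail, as in the OOM case. */
static int toy_alloc_memmap(struct toy_range *r, bool simulate_oom)
{
	if (simulate_oom)
		return -1;
	r->memmap_ready = true;
	return 0;
}

/* Add a range: struct pages first, the mapping only after that succeeded. */
static int toy_add_range(struct toy_range *r, bool simulate_oom)
{
	int rc = toy_alloc_memmap(r, simulate_oom);

	if (rc)
		return rc;
	r->mapped = true;
	return 0;
}

/* Stand-in for pfn_valid(): reports true as soon as the range is mapped. */
static bool toy_pfn_valid(const struct toy_range *r)
{
	return r->mapped;
}
```

In this model a failed allocation leaves the range unmapped, so a walker that trusts `toy_pfn_valid()` never touches struct pages that do not exist.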
2008-01-26 | [S390] Get rid of HOLES_IN_ZONE requirement. | Heiko Carstens
Align everything to MAX_ORDER so we can get rid of the extra checks. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-01-26 | [S390] Change vmalloc defintions | Christian Borntraeger
Currently the vmalloc area starts at a dynamic address depending on the memory size. There was also an 8MB security hole after the physical memory to catch out-of-bounds accesses. We can simplify the code by putting the vmalloc area explicitly at the top of the kernel mapping and setting the vmalloc size to a fixed value of 128MB/128GB for 31bit/64bit systems. Part of the vmalloc area will be used for the vmem_map. This leaves an area of 96MB/1GB for normal vmalloc allocations. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
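The sizes quoted in this commit message imply how much of the fixed area is left for the vmem_map. The sketch below only redoes that arithmetic; the struct and function names are hypothetical, and it assumes the vmem_map takes exactly the part of the area that is not left for normal vmalloc allocations.

```c
#include <stdint.h>

#define TOY_MB (UINT64_C(1) << 20)
#define TOY_GB (UINT64_C(1) << 30)

struct toy_layout {
	uint64_t area_size;      /* fixed area at the top of the kernel mapping */
	uint64_t vmalloc_size;   /* part left for normal vmalloc allocations */
};

/* Values as quoted in the commit message for 31bit and 64bit systems. */
static const struct toy_layout toy_layout_31bit = { 128 * TOY_MB, 96 * TOY_MB };
static const struct toy_layout toy_layout_64bit = { 128 * TOY_GB, 1 * TOY_GB };

/* Space remaining for the vmem_map under the assumption stated above. */
static uint64_t toy_vmem_map_size(const struct toy_layout *l)
{
	return l->area_size - l->vmalloc_size;
}
```

Under that assumption the vmem_map gets 32MB on 31bit and 127GB on 64bit systems.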
2007-10-22 | [S390] 4level-fixup cleanup | Martin Schwidefsky
Get independent from asm-generic/4level-fixup.h Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-10-22 | [S390] Cleanup page table definitions. | Martin Schwidefsky
- De-confuse the defines for the address-space-control-elements and the segment/region table entries.
- Create out of line functions for page table allocation / freeing.
- Simplify get_shadow_xxx functions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-07-27 | [S390] Get rid of new section mismatch warnings. | Heiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-02-05 | [S390] noexec protection | Gerald Schaefer
This provides a noexec protection on s390 hardware. Our hardware does not have any bits left in the pte for a hw noexec bit, so this is a different approach using shadow page tables and a special addressing mode that allows separate address spaces for code and data.
As a special feature of our "secondary-space" addressing mode, separate page tables can be specified for the translation of data addresses (storage operands) and instruction addresses. The shadow page table is used for the instruction addresses and the standard page table for the data addresses. The shadow page table is linked to the standard page table by a pointer in page->lru.next of the struct page corresponding to the page that contains the standard page table (since page->private is not really private with the pte_lock and the page table pages are not in the LRU list).
Depending on the software bits of a pte, it is either inserted into both page tables or just into the standard (data) page table. Pages of a vma that does not have the VM_EXEC bit set get mapped only in the data address space. Any attempt to execute code on such a page will cause a page translation exception. The standard reaction to this is a SIGSEGV with two exceptions: the two system call opcodes 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) are allowed. They are stored by the kernel to the signal stack frame. Unfortunately, the signal return mechanism cannot be modified to use an SA_RESTORER because the exception unwinding code depends on the system call opcode stored behind the signal stack frame.
This feature requires that user space is executed in secondary-space mode and the kernel in home-space mode, which means that the addressing modes need to be switched and that the noexec protection only works for user space. After switching the addressing modes, we cannot use the mvcp/mvcs instructions anymore to copy between kernel and user space. A new mvcos instruction has been added to the z9 EC/BC hardware which allows copying between arbitrary address spaces, but on older hardware the page tables need to be walked manually.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-01-11 | [PATCH] Fix sparsemem on Cell | Dave Hansen
Fix an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug event, or happening at boot-time. It has been compile tested on ppc64, ia64, s390, i386 and x86_64. Acked-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Acked-by: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 | [S390] Virtual memmap for s390. | Heiko Carstens
Virtual memmap support for s390. Inspired by the ia64 implementation. Unlike ia64 we need a mechanism which allows us to dynamically attach shared memory regions. These memory regions are accessed via the dcss device driver. dcss implements the 'direct_access' operation, which requires struct pages for every single shared page. Therefore this implementation provides an interface to attach/detach shared memory:
    int add_shared_memory(unsigned long start, unsigned long size);
    int remove_shared_memory(unsigned long start, unsigned long size);
The purpose of the add_shared_memory function is to add the given memory range to the 1:1 mapping and to make sure that the corresponding range in the vmemmap is backed with physical pages. It also initialises the new struct pages. remove_shared_memory in turn only invalidates the page table entries in the 1:1 mapping. The page tables and the memory used for struct pages in the vmemmap are currently not freed. They will be reused when the next segment is attached. Given that the maximum size of a shared memory region is 2GB and in addition all regions must reside below 2GB this is not too much of a restriction, but there is room for improvement.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>