summaryrefslogtreecommitdiffstats
path: root/mm/nommu.c
AgeCommit message (Collapse)Author
2008-06-06nommu: fix kobjsize() for SLOB and SLUBPaul Mundt
kobjsize() has been abusing page->index as a method for sorting out compound order, which blows up both for page cache pages, and SLOB's reuse of the index in struct slob_page. Presently we are not able to accurately size arbitrary pointers that don't come from kmalloc(), so the best we can do is sort out the compound order from the head page if it's a compound page, or default to 0-order if it's impossible to ksize() the object. Obviously this leaves quite a bit to be desired in terms of object sizing accuracy, but the behaviour is unchanged over the existing implementation, while fixing the page->index oopses originally reported here: http://marc.info/?l=linux-mm&m=121127773325245&w=2 Accuracy could also be improved by having SLUB and SLOB both set PG_slab on ksizeable pages, rather than just handling the __GFP_COMP cases irregardless of the PG_slab setting, as made possibly with Pekka's patches: http://marc.info/?l=linux-kernel&m=121139439900534&w=2 http://marc.info/?l=linux-kernel&m=121139440000537&w=2 http://marc.info/?l=linux-kernel&m=121139440000540&w=2 This is primarily a bugfix for nommu systems for 2.6.26, with the aim being to gradually kill off kobjsize() and its particular brand of object abuse entirely. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24mm: fix atomic_t overflow in vmAlan Cox
The atomic_t type is 32bit but a 64bit system can have more than 2^32 pages of virtual address space available. Without this we overflow on ludicrously large mappings Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29procfs task exe symlinkMatt Helsley
The kernel implements readlink of /proc/pid/exe by getting the file from the first executable VMA. Then the path to the file is reconstructed and reported as the result. Because of the VMA walk the code is slightly different on nommu systems. This patch avoids separate /proc/pid/exe code on nommu systems. Instead of walking the VMAs to find the first executable file-backed VMA we store a reference to the exec'd file in the mm_struct. That reference would prevent the filesystem holding the executable file from being unmounted even after unmapping the VMAs. So we track the number of VM_EXECUTABLE VMAs and drop the new reference when the last one is unmapped. This avoids pinning the mounted filesystem. [akpm@linux-foundation.org: improve comments] [yamamoto@valinux.co.jp: fix dup_mmap] Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com> Cc:"Eric W. Biederman" <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28mm/nommu.c: return 0 from kobjsize with invalid objectsMichael Hennerich
Don't perform kobjsize operations on objects the kernel doesn't manage. On Blackfin, drivers can get dma coherent memory by calling a function dma_alloc_coherent(). We do this in nommu by configuring a chunk of uncached memory at the top of memory. Since we don't want the kernel to use the uncached memory, we lie to the kernel, and tell it that it's max memory is between 0, and the start of the uncached dma coherent section. this all works well, until this memory gets exposed into userspace (with a frame buffer), when you look at the process's maps, it shows the framebuf: root:/proc> cat maps [snip] 03f0ef00-03f34700 rw-p 00000000 1f:00 192 /dev/fb0 root:/proc> This is outside the "normal" range for the kernel. When the kernel tries to find the size of this object (when you run ps), it dies in nommu.c in kobjsize. BUG_ON(page->index >= MAX_ORDER); since the page we are referring to is outside what the kernel thinks is it's max valid memory. root:~> while [ 1 ]; ps > /dev/null; done kernel BUG at mm/nommu.c:119! Kernel panic - not syncing: BUG! We fixed this by adding a check to reject out of range object pointers as it already does that for NULL pointers. Signed-off-by: Michael Hennerich <Michael.Hennerich@analog.com> Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05nommu: add new vmalloc_user() and remap_vmalloc_range() interfaces.Paul Mundt
This builds on top of the earlier vmalloc_32_user() work introduced by b50731732f926d6c49fd0724616a7344c31cd5cf, as we now have places in the nommu allmodconfig that hit up against these missing APIs. As vmalloc_32_user() is already implemented, this is moved over to vmalloc_user() and simply made a wrapper. As all current nommu platforms are 32-bit addressable, there's no special casing we have to do for ZONE_DMA and things of that nature as per GFP_VMALLOC32. remap_vmalloc_range() needs to check VM_USERMAP in order to figure out whether we permit the remap or not, which means that we also have to rework the vmalloc_user() code to grovel for the VMA and set the flag. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: David McCullough <david_mccullough@securecomputing.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05vmalloc: add const to void* parametersChristoph Lameter
Make vmalloc functions work the same way as kfree() and friends that take a const void * argument. [akpm@linux-foundation.org: fix consts, coding-style] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-06Security: round mmap hint address above mmap_min_addrEric Paris
If mmap_min_addr is set and a process attempts to mmap (not fixed) with a non-null hint address less than mmap_min_addr the mapping will fail the security checks. Since this is just a hint address this patch will round such a hint address above mmap_min_addr. gcj was found to try to be very frugal with vm usage and give hint addresses in the 8k-32k range. Without this patch all such programs failed and with the patch they happily get a higher address. This patch is wrappad in CONFIG_SECURITY since mmap_min_addr doesn't exist without it and there would be no security check possible no matter what. So we should not bother compiling in this rounding if it is just a waste of time. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2007-10-29NOMMU: mm/nommu.c needs linux/module.hDavid Howells
mm/nommu.c needs to #include linux/module.h for it to understand EXPORT_*() macros. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19Explain clearly why kmalloc() can't use __GFP_HIGHMEM.Robert P. J. Day
Fix the wishy-washy comment to clearly explain why kmalloc() can't use the __GFP_HIGHMEM zone modifier. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-17security/ cleanupsAdrian Bunk
This patch contains the following cleanups that are now possible: - remove the unused security_operations->inode_xattr_getsuffix - remove the no longer used security_operations->unregister_security - remove some no longer required exit code - remove a bunch of no longer used exports Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-22fix NULL pointer dereference in __vm_enough_memory()Alan Cox
The new exec code inserts an accounted vma into an mm struct which is not current->mm. The existing memory check code has a hard coded assumption that this does not happen as does the security code. As the correct mm is known we pass the mm to the security method and the helper function. A new security test is added for the case where we need to pass the mm and the existing one is modified to pass current->mm to avoid the need to change large amounts of code. (Thanks to Tobias for fixing rejects and testing) Signed-off-by: Alan Cox <alan@redhat.com> Cc: WU Fengguang <wfg@mail.ustc.edu.cn> Cc: James Morris <jmorris@redhat.com> Cc: Tobias Diedrich <ranma+kernel@tdiedrich.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21nommu: vmalloc_32_user()/vm_insert_page() and symbol exports.Paul Mundt
Trying to survive an allmodconfig on a nommu platform results in many screen lengths of module unhappiness. Many of the mmap related things that binfmt_flat hooks in to are never exported despite being global, and there are also missing definitions for vmalloc_32_user() and vm_insert_page(). I've implemented vmalloc_32_user() trying to stick as close to the mm/vmalloc.c implementation as possible, though we don't have any need for VM_USERMAP, so groveling for the VMA can be skipped. vm_insert_page() has been stubbed for now in order to keep the build happy. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19mm: fault feedback #1Nick Piggin
Change ->fault prototype. We now return an int, which contains VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte. FAULT_RET_ code tells the VM whether a page was found, whether it has been locked, and potentially other things. This is not quite the way he wanted it yet, but that's changed in the next patch (which requires changes to arch code). This means we no longer set VM_CAN_INVALIDATE in the vma in order to say that a page is locked which requires filemap_nopage to go away (because we can no longer remain backward compatible without that flag), but we were going to do that anyway. struct fault_data is renamed to struct vm_fault as Linus asked. address is now a void __user * that we should firmly encourage drivers not to use without really good reason. The page is now returned via a page pointer in the vm_fault struct. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19mm: merge populate and nopage into fault (fixes nonlinear)Nick Piggin
Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16nommu: stub expand_stack() for nommu caseGreg Ungerer
Be consistent with VM mmap, implement expand_stack(). We can't actually do anything other than return an error in the no MMU case though. Signed-off-by: Greg Ungerer <gerg@uclinux.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-11security: Protection for exploiting null dereference using mmapEric Paris
Add a new security check on mmap operations to see if the user is attempting to mmap to low area of the address space. The amount of space protected is indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults to 0, preserving existing behavior. This patch uses a new SELinux security class "memprotect." Policy already contains a number of allow rules like a_t self:process * (unconfined_t being one of them) which mean that putting this check in the process class (its best current fit) would make it useless as all user processes, which we also want to protect against, would be allowed. By taking the memprotect name of the new class it will also make it possible for us to move some of the other memory protect permissions out of 'process' and into the new class next time we bump the policy version number (which I also think is a good future idea) Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2007-05-08move die notifier handling to common codeChristoph Hellwig
This patch moves the die notifier handling to common code. Previous various architectures had exactly the same code for it. Note that the new code is compiled unconditionally, this should be understood as an appel to the other architecture maintainer to implement support for it aswell (aka sprinkling a notify_die or two in the proper place) arm had a notifiy_die that did something totally different, I renamed it to arm_notify_die as part of the patch and made it static to the file it's declared and used at. avr32 used to pass slightly less information through this interface and I brought it into line with the other architectures. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix vmalloc_sync_all bustage] [bryan.wu@analog.com: fix vmalloc_sync_all in nommu] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-12[PATCH] nommu: fix bug ip_conntrack does not work on nommuWu, Bryan
num_physpages is not exported out in mm/nommu.c, so the ip_conntrack module link will fail. Signed-off-by: Bryan Wu <bryan.wu@analog.com> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-22[PATCH] NOMMU: make SYSV SHM nattch work correctlyDavid Howells
Make the SYSV SHM nattch counter work correctly by forcing multiple VMAs to be produced to represent MAP_SHARED segments, even if they overlap exactly. Using this test program: http://people.redhat.com/~dhowells/doshm.c Run as: doshm sysv I can see nattch going from one before the patch: # /doshm sysv Command: sysv shmid: 65536 memory: 0xc3700000 c0b00000-c0b04000 rw-p 00000000 00:00 0 c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so c3180000-c31dede4 r-xs 00000000 00:0b 14582179 /lib/libuClibc-0.9.28.so c3520000-c352278c rw-p 00000000 00:0b 13763417 /doshm c3584000-c35865e8 r-xs 00000000 00:0b 13763417 /doshm c3588000-c358aa00 rw-p 00008000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so c3590000-c359b6c0 rw-p 00000000 00:00 0 c3620000-c3640000 rwxp 00000000 00:00 0 c3700000-c37fa000 rw-S 00000000 00:06 1411 /SYSV00000000 (deleted) c3700000-c37fa000 rw-S 00000000 00:06 1411 /SYSV00000000 (deleted) nattch 1 To two after the patch: # /doshm sysv Command: sysv shmid: 0 memory: 0xc3700000 c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so c3180000-c31dede4 r-xs 00000000 00:0b 14582179 /lib/libuClibc-0.9.28.so c3320000-c3340000 rwxp 00000000 00:00 0 c3530000-c35325e8 r-xs 00000000 00:0b 13763417 /doshm c3534000-c353678c rw-p 00000000 00:0b 13763417 /doshm c3538000-c353aa00 rw-p 00008000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so c3590000-c359b6c0 rw-p 00000000 00:00 0 c35a4000-c35a8000 rw-p 00000000 00:00 0 c3700000-c37fa000 rw-S 00000000 00:06 1369 /SYSV00000000 (deleted) c3700000-c37fa000 rw-S 00000000 00:06 1369 /SYSV00000000 (deleted) nattch 2 That's +1 to nattch for each shmat() made. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-22[PATCH] NOMMU: supply get_unmapped_area() to fix NOMMU SYSV SHMDavid Howells
Supply a get_unmapped_area() to fix NOMMU SYSV SHM support. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-12-08[PATCH] struct path: convert mmJosef Sipek
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] kernel core: replace kmalloc+memset with kzallocBurman Yan
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-06[PATCH] uclinux: fix mmap() of directory for nommu caseMike Frysinger
I was playing with blackfin when i hit a neat bug ... doing an open() on a directory and then passing that fd to mmap() would cause the kernel to hang after poking into the code a bit more, i found that mm/nommu.c:validate_mmap_request() checks the length and if it is 0, just returns the address ... this is in stark contrast to mmu's mm/mmap.c:do_mmap_pgoff() where it returns -EINVAL for 0 length requests ... i then noticed that some other parts of the logic is out of date between the two funcs, so perhaps that's the easy fix ? Signed-off-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03Spelling fix: "control" instead of "cotrol"Michael Opdenacker
This patch against fixes a spelling mistake ("control" instead of "cotrol"). Signed-off-by: Michael Opdenacker <michael@free-electrons.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-01[PATCH] NOMMU: don't try and give NULL to fput()Gavin Lambert
Don't try and give NULL to fput() in the error handling in do_mmap_pgoff() as it'll cause an oops. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27[PATCH] NOMMU: Make futexes work under NOMMU conditionsDavid Howells
Make futexes work under NOMMU conditions. This can be tested by running this in one shell: #define SYSERROR(X, Y) \ do { if ((long)(X) == -1L) { perror(Y); exit(1); }} while(0) int main() { int shmid, tmp, *f, n; shmid = shmget(23, 4, IPC_CREAT|0666); SYSERROR(shmid, "shmget"); f = shmat(shmid, NULL, 0); SYSERROR(f, "shmat"); n = *f; printf("WAIT: %p{%x}\n", f, n); tmp = futex(f, FUTEX_WAIT, n, NULL, NULL, 0); SYSERROR(tmp, "futex"); printf("WAITED: %d\n", tmp); tmp = shmdt(f); SYSERROR(tmp, "shmdt"); exit(0); } And then this in the other shell: #define SYSERROR(X, Y) \ do { if ((long)(X) == -1L) { perror(Y); exit(1); }} while(0) int main() { int shmid, tmp, *f; shmid = shmget(23, 4, IPC_CREAT|0666); SYSERROR(shmid, "shmget"); f = shmat(shmid, NULL, 0); SYSERROR(f, "shmat"); (*f)++; printf("WAKE: %p{%x}\n", f, *f); tmp = futex(f, FUTEX_WAKE, 1, NULL, NULL, 0); SYSERROR(tmp, "futex"); printf("WOKE: %d\n", tmp); tmp = shmdt(f); SYSERROR(tmp, "shmdt"); exit(0); } The first program will set up a SYSV IPC SHM segment and wait on a futex in it for the number at the start to change. The program will increment that number and wake the first program up. This leads to output of the form: SHELL 1 SHELL 2 ======================= ======================= # /dowait WAIT: 0xc32ac000{0} # /dowake WAKE: 0xc32ac000{1} WAITED: 0 WOKE: 1 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27[PATCH] NOMMU: Make mremap() partially work for NOMMU kernelsDavid Howells
Make mremap() partially work for NOMMU kernels. It may resize a VMA provided that it doesn't exceed the size of the slab object in which the storage is allocated that the VMA refers to. Shareable VMAs may not be resized. Moving VMAs (as permitted by MREMAP_MAYMOVE) is not currently supported. This patch also makes use of the fact that the VMA list is now ordered to cut it short when possible. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27[PATCH] NOMMU: Order the per-mm_struct VMA listDavid Howells
Order the per-mm_struct VMA list by address so that searching it can be cut short when the appropriate address has been exceeded. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27[PATCH] NOMMU: Permit ptrace to ignore non-PROT_WRITE VMAs in NOMMU modeDavid Howells
Permit ptrace to modify a section that's non-shared but is marked unwritable, such as is obtained by mapping the text segment of an ELF-FDPIC executable binary with into a binary that's being ptraced[*]. [*] Under NOMMU conditions ptrace causes read-only MAP_PRIVATE mmaps to become totally private copies because if a private mapping was actually shared then the debugging setting breakpoints in it would potentially crash other processes. This is done by using the VM_MAYWRITE flag rather than the VM_WRITE flag when deciding whether to permit a write. Without this patch a debugger can't set breakpoints in the mapped text sections of executables that are mapped read-only private, even if the mmap() syscall has taken a private copy because PT_PTRACED is set. In addition, VM_MAYREAD is used instead of VM_READ for similar reasons. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27[PATCH] NOMMU: Check VMA protectionsDavid Howells
Check the VMA protections in get_user_pages() against what's being asked. This checks to see that we don't accidentally write on a non-writable VMA or permit an I/O mapping VMA to be accessed (which may lack page structs). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27[PATCH] Check if start address is in vma region in NOMMU function ↵Sonic Zhang
get_user_pages() In NOMMU arch, if run "cat /proc/self/mem", data from physical address 0 are read. This behavior is different from MMU arch. In IA32, message "cat: /proc/self/mem: Input/output error" is reported. This issue is rootcaused by not validate the start address in NOMMU function get_user_pages(). Following patch solves this issue. Signed-off-by: Sonic Zhang <sonic.adi@gmail.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27[PATCH] NOMMU: Use find_vma() rather than reimplementing a VMA searchDavid Howells
Use find_vma() in the NOMMU version of access_process_vm() rather than reimplementing it. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27[PATCH] NOMMU: Check that access_process_vm() has a valid targetDavid Howells
Check that access_process_vm() is accessing a valid mapping in the target process. This limits ptrace() accesses and accesses through /proc/<pid>/maps to only those regions actually mapped by a program. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] ZVC: Support NR_SLAB_RECLAIMABLE / NR_SLAB_UNRECLAIMABLEChristoph Lameter
Remove the atomic counter for slab_reclaim_pages and replace the counter and NR_SLAB with two ZVC counter that account for unreclaimable and reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE. Change the check in vmscan.c to refer to to NR_SLAB_RECLAIMABLE. The intend seems to be to check for slab pages that could be freed. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14[PATCH] nommu: export two symbols for drivers to useLuke Yang
nommu.c needs to export two more symbols for drivers to use: remap_pfn_range and unmap_mapping_range. Signed-off-by: Luke Yang <luke.adi@gmail.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30[PATCH] zoned vm counters: conversion of nr_pagecache to per zone counterChristoph Lameter
Currently a single atomic variable is used to establish the size of the page cache in the whole machine. The zoned VM counters have the same method of implementation as the nr_pagecache code but also allow the determination of the pagecache size per zone. Remove the special implementation for nr_pagecache and make it a zoned counter named NR_FILE_PAGES. Updates of the page cache counters are always performed with interrupts off. We can therefore use the __ variant here. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11[PATCH] overcommit: use totalreserve_pages for nommuHideo AOKI
This patch is an enhancement of OVERCOMMIT_GUESS algorithm in __vm_enough_memory() in mm/nommu.c. When the OVERCOMMIT_GUESS algorithm calculates the number of free pages, the algorithm subtracts the number of reserved pages from the result nr_free_pages(). Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22[PATCH] mm: nommu use compound pagesNick Piggin
Now that compound page handling is properly fixed in the VM, move nommu over to using compound pages rather than rolling their own refcounting. nommu vm page refcounting is broken anyway, but there is no need to have divergent code in the core VM now, nor when it gets fixed. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: David Howells <dhowells@redhat.com> (Needs testing, please). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-28[PATCH] nommu: implement vmalloc_node()Andrew Morton
Fix oprofile linkage. Pointed out by "Luke Yang" <luke.adi@gmail.com>. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20[PATCH] Fix undefined symbols for nommu architectureLuke Yang
Signed-off-by: Luke Yang <luke.adi@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] NOMMU: Make SYSV IPC SHM use ramfs facilities on NOMMUDavid Howells
The attached patch makes the SYSV IPC shared memory facilities use the new ramfs facilities on a no-MMU kernel. The following changes are made: (1) There are now shmem_mmap() and shmem_get_unmapped_area() functions to allow the IPC SHM facilities to commune with the tiny-shmem and shmem code. (2) ramfs files now need resizing using do_truncate() rather than by modifying the inode size directly (see shmem_file_setup()). This causes ramfs to attempt to bind a block of pages of sufficient size to the inode. (3) CONFIG_SYSVIPC is no longer contingent on CONFIG_MMU. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28mm: re-architect the VM_UNPAGED logicLinus Torvalds
This replaces the (in my opinion horrible) VM_UNMAPPED logic with very explicit support for a "remapped page range" aka VM_PFNMAP. It allows a VM area to contain an arbitrary range of page table entries that the VM never touches, and never considers to be normal pages. Any user of "remap_pfn_range()" automatically gets this new functionality, and doesn't even have to mark the pages reserved or indeed mark them any other way. It just works. As a side effect, doing mmap() on /dev/mem works for arbitrary ranges. Sparc update from David in the next commit. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07[PATCH] mm/{mmap,nommu}.c: several unexportsAdrian Bunk
I didn't find any possible modular usage in the kernel. This patch was already ACK'ed by Christoph Hellwig. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29[PATCH] mm: follow_page with inner ptlockHugh Dickins
Final step in pushing down common core's page_table_lock. follow_page no longer wants caller to hold page_table_lock, uses pte_offset_map_lock itself; and so no page_table_lock is taken in get_user_pages itself. But get_user_pages (and get_futex_key) do then need follow_page to pin the page for them: take Daniel's suggestion of bitflags to follow_page. Need one for WRITE, another for TOUCH (it was the accessed flag before: vanished along with check_user_page_readable, but surely get_numa_maps is wrong to mark every page it finds as accessed), another for GET. And another, ANON to dispose of untouched_anonymous_page: it seems silly for that to descend a second time, let follow_page observe if there was no page table and return ZERO_PAGE if so. Fix minor bug in that: check VM_LOCKED - make_pages_present ought to make readonly anonymous present. Give get_numa_maps a cond_resched while we're there. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29[PATCH] mm: update_hiwaters just in timeHugh Dickins
update_mem_hiwater has attracted various criticisms, in particular from those concerned with mm scalability. Originally it was called whenever rss or total_vm got raised. Then many of those callsites were replaced by a timer tick call from account_system_time. Now Frank van Maarseveen reports that to be found inadequate. How about this? Works for Frank. Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros update_hiwater_rss and update_hiwater_vm. Don't attempt to keep mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually by 1): those are hot paths. Do the opposite, update only when about to lower rss (usually by many), or just before final accounting in do_exit. Handle mm->hiwater_vm in the same way, though it's much less of an issue. Demand that whoever collects these hiwater statistics do the work of taking the maximum with rss or total_vm. And there has been no collector of these hiwater statistics in the tree. The new convention needs an example, so match Frank's usage by adding a VmPeak line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS (High-Water-Mark or High-Water-Memory). There was a particular anomaly during mremap move, that hiwater_vm might be captured too high. A fleeting such anomaly remains, but it's quickly corrected now, whereas before it would stick. What locking? None: if the app is racy then these statistics will be racy, it's not worth any overhead to make them exact. But whenever it suits, hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under page_table_lock (for now) or with preemption disabled (later on): without going to any trouble, minimize the time between reading current values and updating, to minimize those occasions when a racing thread bumps a count up and back down in between. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29[PATCH] mm: rss = file_rss + anon_rssHugh Dickins
I was lazy when we added anon_rss, and chose to change as few places as possible. So currently each anonymous page has to be counted twice, in rss and in anon_rss. Which won't be so good if those are atomic counts in some configurations. Change that around: keep file_rss and anon_rss separately, and add them together (with get_mm_rss macro) when the total is needed - reading two atomics is much cheaper than updating two atomics. And update anon_rss upfront, typically in memory.c, not tucked away in page_add_anon_rmap. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-08[PATCH] gfp flags annotations - part 1Al Viro
- added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-11[PATCH] uclinux: add NULL check, 0 end valid check and some more exports to ↵Greg Ungerer
nommu.c Move call to get_mm_counter() in update_mem_hiwater() to be inside the check for tsk->mm being null. Otherwise you can be following a null pointer here. This patch submitted by Javier Herrero <jherrero@hvsistemas.es>. Modify the end check for munmap regions to allow for the legacy behavior of 0 being valid. Pretty much all current uClinux system libc malloc's pass in 0 as the end point. A hard check will fail on these, so change the check so that if it is non-zero it must be valid otherwise it fails. A passed in value will always succeed (as it used too). Also export a few more mm system functions - to be consistent with the VM code exports. Signed-off-by: Greg Ungerer <gerg@uclinux.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-04[PATCH] __vm_enough_memory() signedness fixSimon Derr
We have found what seems to be a small bug in __vm_enough_memory() when sysctl_overcommit_memory is set to OVERCOMMIT_NEVER. When this bug occurs the systems fails to boot, with /sbin/init whining about fork() returning ENOMEM. We hunted down the problem to this: The deferred update mecanism used in vm_acct_memory(), on a SMP system, allows the vm_committed_space counter to have a negative value. This should not be a problem since this counter is known to be inaccurate. But in __vm_enough_memory() this counter is compared to the `allowed' variable, which is an unsigned long. This comparison is broken since it will consider the negative values of vm_committed_space to be huge positive values, resulting in a memory allocation failure. Signed-off-by: <Jean-Marc.Saffroy@ext.bull.net> Signed-off-by: <Simon.Derr@bull.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] Avoiding mmap fragmentationWolfgang Wander
Ingo recently introduced a great speedup for allocating new mmaps using the free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and causes huge performance increases in thread creation. The downside of this patch is that it does lead to fragmentation in the mmap-ed areas (visible via /proc/self/maps), such that some applications that work fine under 2.4 kernels quickly run out of memory on any 2.6 kernel. The problem is twofold: 1) the free_area_cache is used to continue a search for memory where the last search ended. Before the change new areas were always searched from the base address on. So now new small areas are cluttering holes of all sizes throughout the whole mmap-able region whereas before small holes tended to close holes near the base leaving holes far from the base large and available for larger requests. 2) the free_area_cache also is set to the location of the last munmap-ed area so in scenarios where we allocate e.g. five regions of 1K each, then free regions 4 2 3 in this order the next request for 1K will be placed in the position of the old region 3, whereas before we appended it to the still active region 1, placing it at the location of the old region 2. Before we had 1 free region of 2K, now we only get two free regions of 1K -> fragmentation. The patch addresses thes issues by introducing yet another cache descriptor cached_hole_size that contains the largest known hole size below the current free_area_cache. If a new request comes in the size is compared against the cached_hole_size and if the request can be filled with a hole below free_area_cache the search is started from the base instead. The results look promising: Whereas 2.6.12-rc4 fragments quickly and my (earlier posted) leakme.c test program terminates after 50000+ iterations with 96 distinct and fragmented maps in /proc/self/maps it performs nicely (as expected) with thread creation, Ingo's test_str02 with 20000 threads requires 0.7s system time. Taking out Ingo's patch (un-patch available per request) by basically deleting all mentions of free_area_cache from the kernel and starting the search for new memory always at the respective bases we observe: leakme terminates successfully with 11 distinctive hardly fragmented areas in /proc/self/maps but thread creating is gringdingly slow: 30+s(!) system time for Ingo's test_str02 with 20000 threads. Now - drumroll ;-) the appended patch works fine with leakme: it ends with only 7 distinct areas in /proc/self/maps and also thread creation seems sufficiently fast with 0.71s for 20000 threads. Signed-off-by: Wolfgang Wander <wwc@rentec.com> Credit-to: "Richard Purdie" <rpurdie@rpsys.net> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> (partly) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>