From 66aebce747eaf9bc456bf1f1b217d8db843031d0 Mon Sep 17 00:00:00 2001 From: Chris Metcalf Date: Thu, 12 Apr 2012 12:49:15 -0700 Subject: hugetlb: fix race condition in hugetlb_fault() The race is as follows: Suppose a multi-threaded task forks a new process (on cpu A), thus bumping up the ref count on all the pages. While the fork is occurring (and thus we have marked all the PTEs as read-only), another thread in the original process (on cpu B) tries to write to a huge page, taking an access violation from the write-protect and calling hugetlb_cow(). Now, suppose the fork() fails. It will undo the COW and decrement the ref count on the pages, so the ref count on the huge page drops back to 1. Meanwhile hugetlb_cow() also decrements the ref count by one on the original page, since the original address space doesn't need it any more, having copied a new page to replace the original page. This leaves the ref count at zero, and when we call unlock_page(), we panic. fork on CPU A fault on CPU B ============= ============== ... down_write(&parent->mmap_sem); down_write_nested(&child->mmap_sem); ... while duplicating vmas if error break; ... up_write(&child->mmap_sem); up_write(&parent->mmap_sem); ... down_read(&parent->mmap_sem); ... lock_page(page); handle COW page_mapcount(old_page) == 2 alloc and prepare new_page ... handle error page_remove_rmap(page); put_page(page); ... fold new_page into pte page_remove_rmap(page); put_page(page); ... oops ==> unlock_page(page); up_read(&parent->mmap_sem); The solution is to take an extra reference to the page while we are holding the lock on it. Signed-off-by: Chris Metcalf Cc: Hillf Danton Cc: Michal Hocko Cc: KAMEZAWA Hiroyuki Cc: Hugh Dickins Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'mm/hugetlb.c') diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b8ce6f45095..cd65cb19c94 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2791,6 +2791,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * so no worry about deadlock. */ page = pte_page(entry); + get_page(page); if (page != pagecache_page) lock_page(page); @@ -2822,6 +2823,7 @@ out_page_table_lock: } if (page != pagecache_page) unlock_page(page); + put_page(page); out_mutex: mutex_unlock(&hugetlb_instantiation_mutex); -- cgit v1.2.3-70-g09d2 From b1c12cbcd0a02527c180a862e8971e249d3b347d Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Wed, 25 Apr 2012 16:01:46 -0700 Subject: mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma Fix a gcc warning (and bug?) introduced in cc9a6c877 ("cpuset: mm: reduce large amounts of memory barrier related damage v3") Local variable "page" can be uninitialized if the nodemask from vma policy does not intersects with nodemask from cpuset. Even if it doesn't happens it is better to initialize this variable explicitly than to introduce a kernel oops in a weird corner case. mm/hugetlb.c: In function `alloc_huge_page': mm/hugetlb.c:1135:5: warning: `page' may be used uninitialized in this function Signed-off-by: Konstantin Khlebnikov Acked-by: Mel Gorman Acked-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm/hugetlb.c') diff --git a/mm/hugetlb.c b/mm/hugetlb.c index cd65cb19c94..5a16423a512 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -532,7 +532,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address, int avoid_reserve) { - struct page *page; + struct page *page = NULL; struct mempolicy *mpol; nodemask_t *nodemask; struct zonelist *zonelist; -- cgit v1.2.3-70-g09d2