summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2012-10-09rbtree: add RB_DECLARE_CALLBACKS() macroMichel Lespinasse
As proposed by Peter Zijlstra, this makes it easier to define the augmented rbtree callbacks. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09rbtree: remove prior augmented rbtree implementationMichel Lespinasse
convert arch/x86/mm/pat_rbtree.c to the proposed augmented rbtree api and remove the old augmented rbtree implementation. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09rbtree: faster augmented rbtree manipulationMichel Lespinasse
Introduce new augmented rbtree APIs that allow minimal recalculation of augmented node information. A new callback is added to the rbtree insertion and erase rebalancing functions, to be called on each tree rotations. Such rotations preserve the subtree's root augmented value, but require recalculation of the one child that was previously located at the subtree root. In the insertion case, the handcoded search phase must be updated to maintain the augmented information on insertion, and then the rbtree coloring/rebalancing algorithms keep it up to date. In the erase case, things are more complicated since it is library code that manipulates the rbtree in order to remove internal nodes. This requires a couple additional callbacks to copy a subtree's augmented value when a new root is stitched in, and to recompute augmented values down the ancestry path when a node is removed from the tree. In order to preserve maximum speed for the non-augmented case, we provide two versions of each tree manipulation function. rb_insert_augmented() is the augmented equivalent of rb_insert_color(), and rb_erase_augmented() is the augmented equivalent of rb_erase(). Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09rbtree: move some implementation details from rbtree.h to rbtree.cMichel Lespinasse
rbtree users must use the documented APIs to manipulate the tree structure. Low-level helpers to manipulate node colors and parenthood are not part of that API, so move them to lib/rbtree.c [dwmw2@infradead.org: fix jffs2 build issue due to renamed __rb_parent_color field] Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09rbtree: empty nodes have no colorMichel Lespinasse
Empty nodes have no color. We can make use of this property to simplify the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros. Also, we can get rid of the rb_init_node function which had been introduced by commit 88d19cf37952 ("timers: Add rb_init_node() to allow for stack allocated rb nodes") to avoid some issue with the empty node's color not being initialized. I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are doing there, though. axboe introduced them in commit 10fd48f2376d ("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev"). The way I see it, the 'empty node' abstraction is only used by rbtree users to flag nodes that they haven't inserted in any rbtree, so asking the predecessor or successor of such nodes doesn't make any sense. One final rb_init_node() caller was recently added in sysctl code to implement faster sysctl name lookups. This code doesn't make use of RB_EMPTY_NODE at all, and from what I could see it only called rb_init_node() under the mistaken assumption that such initialization was required before node insertion. [sfr@canb.auug.org.au: fix net/ceph/osd_client.c build] Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09rbtree: reference Documentation/rbtree.txt for usage instructionsMichel Lespinasse
I recently started looking at the rbtree code (with an eye towards improving the augmented rbtree support, but I haven't gotten there yet). I noticed a lot of possible speed improvements, which I am now proposing in this patch set. Patches 1-4 are preparatory: remove internal functions from rbtree.h so that users won't be tempted to use them instead of the documented APIs, clean up some incorrect usages I've noticed (in particular, with the recently added fs/proc/proc_sysctl.c rbtree usage), reference the documentation so that people have one less excuse to miss it, etc. Patch 5 is a small module I wrote to check the rbtree performance. It creates 100 nodes with random keys and repeatedly inserts and erases them from an rbtree. Additionally, it has code to check for rbtree invariants after each insert or erase operation. Patches 6-12 is where the rbtree optimizations are done, and they touch only that one file, lib/rbtree.c . I am getting good results out of these - in my small benchmark doing rbtree insertion (including search) and erase, I'm seeing a 30% runtime reduction on Sandybridge E5, which is more than I initially thought would be possible. (the results aren't as impressive on my two other test hosts though, AMD barcelona and Intel Westmere, where I am seeing 14% runtime reduction only). The code size - both source (ommiting comments) and compiled - is also shorter after these changes. However, I do admit that the updated code is more arduous to read - one big reason for that is the removal of the tree rotation helpers, which added some overhead but also made it easier to reason about things locally. Overall, I believe this is an acceptable compromise, given that this code doesn't get modified very often, and that I have good tests for it. Upon Peter's suggestion, I added comments showing the rtree configuration before every rotation. I think they help; however it's still best to have a copy of the cormen/leiserson/rivest book when digging into this code. This patch: reference Documentation/rbtree.txt for usage instructions include/linux/rbtree.h included some basic usage instructions, while Documentation/rbtree.txt had some more complete and easier to follow instructions. Replacing the former with a reference to the latter. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09thp: introduce pmdp_invalidate()Gerald Schaefer
On s390, a valid page table entry must not be changed while it is attached to any CPU. So instead of pmd_mknotpresent() and set_pmd_at(), an IDTE operation would be necessary there. This patch introduces the pmdp_invalidate() function, to allow architecture-specific implementations. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09thp: remove assumptions on pgtable_t typeGerald Schaefer
The thp page table pre-allocation code currently assumes that pgtable_t is of type "struct page *". This may not be true for all architectures, so this patch removes that assumption by replacing the functions prepare_pmd_huge_pte() and get_pmd_huge_pte() with two new functions that can be defined architecture-specific. It also removes two VM_BUG_ON checks for page_count() and page_mapcount() operating on a pgtable_t. Apart from the VM_BUG_ON removal, there will be no functional change introduced by this patch. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09oom: remove deprecated oom_adjDavidlohr Bueso
The deprecated /proc/<pid>/oom_adj is scheduled for removal this month. Signed-off-by: Davidlohr Bueso <dave@gnu.org> Acked-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: mmu_notifier: have mmu_notifiers use a global SRCU so they may safely ↵Sagi Grimberg
schedule With an RCU based mmu_notifier implementation, any callout to mmu_notifier_invalidate_range_{start,end}() or mmu_notifier_invalidate_page() would not be allowed to call schedule() as that could potentially allow a modification to the mmu_notifier structure while it is currently being used. Since srcu allocs 4 machine words per instance per cpu, we may end up with memory exhaustion if we use srcu per mm. So all mms share a global srcu. Note that during large mmu_notifier activity exit & unregister paths might hang for longer periods, but it is tolerable for current mmu_notifier clients. Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Haggai Eran <haggaie@mellanox.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: mmu_notifier: fix inconsistent memory between secondary MMU and hostXiao Guangrong
There is a bug in set_pte_at_notify() which always sets the pte to the new page before releasing the old page in the secondary MMU. At this time, the process will access on the new page, but the secondary MMU still access on the old page, the memory is inconsistent between them The below scenario shows the bug more clearly: at the beginning: *p = 0, and p is write-protected by KSM or shared with parent process CPU 0 CPU 1 write 1 to p to trigger COW, set_pte_at_notify will be called: *pte = new_page + W; /* The W bit of pte is set */ *p = 1; /* pte is valid, so no #PF */ return back to secondary MMU, then the secondary MMU read p, but get: *p == 0; /* * !!!!!! * the host has already set p to 1, but the secondary * MMU still get the old value 0 */ call mmu_notifier_change_pte to release old page in secondary MMU We can fix it by release old page first, then set the pte to the new page. Note, the new page will be firstly used in secondary MMU before it is mapped into the page table of the process, but this is safe because it is protected by the page table lock, there is no race to change the pte [akpm@linux-foundation.org: add comment from Andrea] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mempolicy: fix a race in shared_policy_replace()Mel Gorman
shared_policy_replace() use of sp_alloc() is unsafe. 1) sp_node cannot be dereferenced if sp->lock is not held and 2) another thread can modify sp_node between spin_unlock for allocating a new sp node and next spin_lock. The bug was introduced before 2.6.12-rc2. Kosaki's original patch for this problem was to allocate an sp node and policy within shared_policy_replace and initialise it when the lock is reacquired. I was not keen on this approach because it partially duplicates sp_alloc(). As the paths were sp->lock is taken are not that performance critical this patch converts sp->lock to sp->mutex so it can sleep when calling sp_alloc(). [kosaki.motohiro@jp.fujitsu.com: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Josh Boyer <jwboyer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: compaction: capture a suitable high-order page immediately when it is ↵Mel Gorman
made available While compaction is migrating pages to free up large contiguous blocks for allocation it races with other allocation requests that may steal these blocks or break them up. This patch alters direct compaction to capture a suitable free page as soon as it becomes available to reduce this race. It uses similar logic to split_free_page() to ensure that watermarks are still obeyed. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: kill vma flag VM_RESERVED and mm->reserved_vm counterKonstantin Khlebnikov
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: | effect | alternative flags -+------------------------+--------------------------------------------- 1| account as reserved_vm | VM_IO 2| skip in core dump | VM_IO, VM_DONTDUMP 3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: prepare VM_DONTDUMP for using in driversKonstantin Khlebnikov
Rename VM_NODUMP into VM_DONTDUMP: this name matches other negative flags: VM_DONTEXPAND, VM_DONTCOPY. Currently this flag used only for sys_madvise. The next patch will use it for replacing the outdated flag VM_RESERVED. Also forbid madvise(MADV_DODUMP) for special kernel mappings VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP) Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmasKonstantin Khlebnikov
Currently the kernel sets mm->exe_file during sys_execve() and then tracks number of vmas with VM_EXECUTABLE flag in mm->num_exe_file_vmas, as soon as this counter drops to zero kernel resets mm->exe_file to NULL. Plus it resets mm->exe_file at last mmput() when mm->mm_users drops to zero. VMA with VM_EXECUTABLE flag appears after mapping file with flag MAP_EXECUTABLE, such vmas can appears only at sys_execve() or after vma splitting, because sys_mmap ignores this flag. Usually binfmt module sets mm->exe_file and mmaps executable vmas with this file, they hold mm->exe_file while task is running. comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"), where all this stuff was introduced: > The kernel implements readlink of /proc/pid/exe by getting the file from > the first executable VMA. Then the path to the file is reconstructed and > reported as the result. > > Because of the VMA walk the code is slightly different on nommu systems. > This patch avoids separate /proc/pid/exe code on nommu systems. Instead of > walking the VMAs to find the first executable file-backed VMA we store a > reference to the exec'd file in the mm_struct. > > That reference would prevent the filesystem holding the executable file > from being unmounted even after unmapping the VMAs. So we track the number > of VM_EXECUTABLE VMAs and drop the new reference when the last one is > unmapped. This avoids pinning the mounted filesystem. exe_file's vma accounting is hooked into every file mmap/unmmap and vma split/merge just to fix some hypothetical pinning fs from umounting by mm, which already unmapped all its executable files, but still alive. Seems like currently nobody depends on this behaviour. We can try to remove this logic and keep mm->exe_file until final mmput(). mm->exe_file is still protected with mm->mmap_sem, because we want to change it via new sys_prctl(PR_SET_MM_EXE_FILE). Also via this syscall task can change its mm->exe_file and unpin mountpoint explicitly. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: kill vma flag VM_CAN_NONLINEARKonstantin Khlebnikov
Move actual pte filling for non-linear file mappings into the new special vma operation: ->remap_pages(). Filesystems must implement this method to get non-linear mapping support, if it uses filemap_fault() then generic_file_remap_pages() can be used. Now device drivers can implement this method and obtain nonlinear vma support. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: kill vma flag VM_INSERTPAGEKonstantin Khlebnikov
Merge VM_INSERTPAGE into VM_MIXEDMAP. VM_MIXEDMAP VMA can mix pure-pfn ptes, special ptes and normal ptes. Now copy_page_range() always copies VM_MIXEDMAP VMA on fork like VM_PFNMAP. If driver populates whole VMA at mmap() it probably not expects page-faults. This patch removes special check from vma_wants_writenotify() which disables pages write tracking for VMA populated via vm_instert_page(). BDI below mapped file should not use dirty-accounting, moreover do_wp_page() can handle this. vm_insert_page() still marks vma after first usage. Usually it is called from f_op->mmap() handler under mm->mmap_sem write-lock, so it able to change vma->vm_flags. Caller must set VM_MIXEDMAP at mmap time if it wants to call this function from other places, for example from page-fault handler. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: introduce arch-specific vma flag VM_ARCH_1Konstantin Khlebnikov
Combine several arch-specific vma flags into one. before patch: 0x00000200 0x01000000 0x20000000 0x40000000 x86 VM_NOHUGEPAGE VM_HUGEPAGE - VM_PAT powerpc - - VM_SAO - parisc VM_GROWSUP - - - ia64 VM_GROWSUP - - - nommu - VM_MAPPED_COPY - - others - - - - after patch: 0x00000200 0x01000000 0x20000000 0x40000000 x86 - VM_PAT VM_HUGEPAGE VM_NOHUGEPAGE powerpc - VM_SAO - - parisc - VM_GROWSUP - - ia64 - VM_GROWSUP - - nommu - VM_MAPPED_COPY - - others - VM_ARCH_1 - - And voila! One completely free bit. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm, x86, pat: rework linear pfn-mmap trackingKonstantin Khlebnikov
Replace the generic vma-flag VM_PFN_AT_MMAP with x86-only VM_PAT. We can toss mapping address from remap_pfn_range() into track_pfn_vma_new(), and collect all PAT-related logic together in arch/x86/. This patch also restores orignal frustration-free is_cow_mapping() check in remap_pfn_range(), as it was before commit v2.6.28-rc8-88-g3c8bb73 ("x86: PAT: store vm_pgoff for all linear_over_vma_region mappings - v3") is_linear_pfn_mapping() checks can be removed from mm/huge_memory.c, because it already handled by VM_PFNMAP in VM_NO_THP bit-mask. [suresh.b.siddha@intel.com: Reset the VM_PAT flag as part of untrack_pfn_vma()] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09x86, pat: separate the pfn attribute tracking for remap_pfn_range and ↵Suresh Siddha
vm_insert_pfn With PAT enabled, vm_insert_pfn() looks up the existing pfn memory attribute and uses it. Expectation is that the driver reserves the memory attributes for the pfn before calling vm_insert_pfn(). remap_pfn_range() (when called for the whole vma) will setup a new attribute (based on the prot argument) for the specified pfn range. This addresses the legacy usage which typically calls remap_pfn_range() with a desired memory attribute. For ranges smaller than the vma size (which is typically not the case), remap_pfn_range() will use the existing memory attribute for the pfn range. Expose two different API's for these different behaviors. track_pfn_insert() for tracking the pfn attribute set by vm_insert_pfn() and track_pfn_remap() for the remap_pfn_range(). This cleanup also prepares the ground for the track/untrack pfn vma routines to take over the ownership of setting PAT specific vm_flag in the 'vma'. [khlebnikov@openvz.org: Clear checks in track_pfn_remap()] [akpm@linux-foundation.org: tweak a few comments] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Venkatesh Pallipadi <venki@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: remove __GFP_NO_KSWAPDRik van Riel
When transparent huge pages were introduced, memory compaction and swap storms were an issue, and the kernel had to be careful to not make THP allocations cause pageout or compaction. Now that we have working compaction deferral, kswapd is smart enough to invoke compaction and the quadratic behaviour around isolate_free_pages has been fixed, it should be safe to remove __GFP_NO_KSWAPD. [minchan@kernel.org: Comment fix] [mgorman@suse.de: Avoid direct reclaim for deferred compaction] Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09Merge tag 'asm-generic' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "This has three changes for asm-generic that did not really fit into any other branch as normal asm-generic changes do. One is a fix for a build warning, the other two are more interesting: * A patch from Mark Brown to allow using the common clock infrastructure on all architectures, so we can use the clock API in architecture independent device drivers. * The UAPI split patches from David Howells for the asm-generic files. There are other architecture specific series that are going through the arch maintainer tree and that depend on this one. There may be a few small merge conflicts between Mark's patch and the following arch header file split patches. In each case the solution will be to keep the new "generic-y += clkdev.h" line, even if it ends up being the only line in the Kbuild file." * tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: UAPI: (Scripted) Disintegrate include/asm-generic asm-generic: Add default clkdev.h asm-generic: xor: mark static functions as __maybe_unused
2012-10-09Merge branch 'linux-next' of git://git.open-osd.org/linux-open-osdLinus Torvalds
Pull exofs update from Boaz Harrosh: "Just three one liners" * 'linux-next' of git://git.open-osd.org/linux-open-osd: pnfs_osd_xdr: Remove unused #include from pnfs_osd_xdr.h ore: signedness bug in _sp2d_min_pg() exofs: check for allocation failure in uri_store()
2012-10-09Merge branches 'fixes-for-37', 'ec' and 'thermal' into releaseLen Brown
2012-10-09Merge branch 'release' of ↵Len Brown
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux into thermal Conflicts: drivers/staging/omap-thermal/omap-thermal-common. OMAP supplied dummy TC1 and TC2, at the same time that the thermal tree removed them from thermal_zone_device_register() drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c propogate the upstream MAX_IDR_LEVEL re-name to prevent a build failure Previously-fixed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Len Brown <len.brown@intel.com>
2012-10-09Merge tag 'sound-3.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "This contains pretty many small commits covering fairly large range of files in sound/ directory. Partly because of additional API support and partly because of constantly developed ASoC and ARM stuff. Some highlights: - Introduced the helper function and documentation for exposing the channel map via control API, as discussed in Plumbers; most of PCI drivers are covered, will follow more drivers later - Most of drivers have been replaced with the new PM callbacks (if the bus is supported) - HD-audio controller got the support of runtime PM and the support of D3 clock-stop. Also changing the power_save option in sysfs kicks off immediately to enable / disable the power-save mode. - Another significant code change in HD-audio is the rewrite of firmware loading code. Other than that, most of changes in HD-audio are continued cleanups and standardization for the generic auto parser and bug fixes (HBR, device-specific fixups), in addition to the support of channel-map API. - Addition of ASoC bindings for the compressed API, used by the mid-x86 drivers. - Lots of cleanups and API refreshes for ASoC codec drivers and DaVinci. - Conversion of OMAP to dmaengine. - New machine driver for Wolfson Microelectronics Bells. - New CODEC driver for Wolfson Microelectronics WM0010. - Enhancements to the ux500 and wm2000 drivers - A new driver for DA9055 and the support for regulator bypass mode." Fix up various arm soc header file reorg conflicts. * tag 'sound-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (339 commits) ALSA: hda - Add new codec ALC283 ALC290 support ALSA: hda - avoid unneccesary indices on "Headphone Jack" controls ALSA: hda - fix indices on boost volume on Conexant ALSA: aloop - add locking to timer access ALSA: hda - Fix hang caused by race during suspend. sound: Remove unnecessary semicolon ALSA: hda/realtek - Fix detection of ALC271X codec ALSA: hda - Add inverted internal mic quirk for Lenovo IdeaPad U310 ALSA: hda - make Realtek/Sigmatel/Conexant use the generic unsol event ALSA: hda - make a generic unsol event handler ASoC: codecs: Add DA9055 codec driver ASoC: eukrea-tlv320: Convert it to platform driver ALSA: ASoC: add DT bindings for CS4271 ASoC: wm_hubs: Ensure volume updates are handled during class W startup ASoC: wm5110: Adding missing volume update bits ASoC: wm5110: Add OUT3R support ASoC: wm5110: Add AEC loopback support ASoC: wm5110: Rename EPOUT to HPOUT3 ASoC: arizona: Add more clock rates ASoC: arizona: Add more DSP options for mixer input muxes ...
2012-10-08ipv4: Add FLOWI_FLAG_KNOWN_NHJulian Anastasov
Add flag to request that output route should be returned with known rt_gateway, in case we want to use it as nexthop for neighbour resolving. The returned route can be cached as follows: - in NH exception: because the cached routes are not shared with other destinations - in FIB NH: when using gateway because all destinations for NH share same gateway As last option, to return rt_gateway!=0 we have to set DST_NOCACHE. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08ipv4: introduce rt_uses_gatewayJulian Anastasov
Add new flag to remember when route is via gateway. We will use it to allow rt_gateway to contain address of directly connected host for the cases when DST_NOCACHE is used or when the NH exception caches per-destination route without DST_NOCACHE flag, i.e. when routes are not used for other destinations. By this way we force the neighbour resolving to work with the routed destination but we can use different address in the packet, feature needed for IPVS-DR where original packet for virtual IP is routed via route to real IP. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08ipv6: gro: fix PV6_GRO_CB(skb)->proto problemEric Dumazet
It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive() if a new skb is allocated (to serve as an anchor for frag_list) We copy NAPI_GRO_CB() only (not the IPV6 specific part) in : *NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p); So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead of IPPROTO_TCP (6) ipv6_gro_complete() isnt able to call ops->gro_complete() [ tcp6_gro_complete() ] Fix this by moving proto in NAPI_GRO_CB() and getting rid of IPV6_GRO_CB Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08vlan: don't deliver frames for unknown vlans to protocolsFlorian Zumbiehl
6a32e4f9dd9219261f8856f817e6655114cfec2f made the vlan code skip marking vlan-tagged frames for not locally configured vlans as PACKET_OTHERHOST if there was an rx_handler, as the rx_handler could cause the frame to be received on a different (virtual) vlan-capable interface where that vlan might be configured. As rx_handlers do not necessarily return RX_HANDLER_ANOTHER, this could cause frames for unknown vlans to be delivered to the protocol stack as if they had been received untagged. For example, if an ipv6 router advertisement that's tagged for a locally not configured vlan is received on an interface with macvlan interfaces attached, macvlan's rx_handler returns RX_HANDLER_PASS after delivering the frame to the macvlan interfaces, which caused it to be passed to the protocol stack, leading to ipv6 addresses for the announced prefix being configured even though those are completely unusable on the underlying interface. The fix moves marking as PACKET_OTHERHOST after the rx_handler so the rx_handler, if there is one, sees the frame unchanged, but afterwards, before the frame is delivered to the protocol stack, it gets marked whether there is an rx_handler or not. Signed-off-by: Florian Zumbiehl <florz@florz.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08net: gro: selective flush of packetsEric Dumazet
Current GRO can hold packets in gro_list for almost unlimited time, in case napi->poll() handler consumes its budget over and over. In this case, napi_complete()/napi_gro_flush() are not called. Another problem is that gro_list is flushed in non friendly way : We scan the list and complete packets in the reverse order. (youngest packets first, oldest packets last) This defeats priorities that sender could have cooked. Since GRO currently only store TCP packets, we dont really notice the bug because of retransmits, but this behavior can add unexpected latencies, particularly on mice flows clamped by elephant flows. This patch makes sure no packet can stay more than 1 ms in queue, and only in stress situations. It also complete packets in the right order to minimize latencies. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jesse Gross <jesse@nicira.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08Input: extend the number of event (and other) devicesDmitry Torokhov
Extend the amount of character devices, such as eventX, mouseX and jsX, from a hard limit of 32 per input handler to about 1024 shared across all handlers. To be compatible with legacy installations input handlers will start creating char devices with minors in their legacy range, however once legacy range is exhausted they will start allocating minors from the dynamic range 256-1024. Reviewed-by: David Herrmann <dh.herrmann@googlemail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2012-10-08Merge tag 'v3.6-rc7' into i2c-embedded/for-nextWolfram Sang
Linux 3.6-rc7 Needed to get updates from i2c-embedded/for-current into i2c-embedded/for-next
2012-10-08i2c: add Renesas R-Car I2C driverKuninori Morimoto
R-Car I2C is similar with SH7760 I2C. But the SH7760 I2C driver had many workaround operations, since H/W had bugs. Thus, it was pointless to keep compatible between SH7760 and R-Car I2C drivers. This patch creates new Renesas R-Car I2C driver. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
2012-10-08Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pill drm updates part 2 from Dave Airlie: "This is the follow-up pull, 3 pieces a) exynos next stuff, was delayed but looks okay to me, one patch in v4l bits but it was acked by v4l person. b) UAPI disintegration bits c) intel fixes - DP fixes, hang fixes, other misc fixes." * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (52 commits) drm: exynos: hdmi: remove drm common hdmi platform data struct drm: exynos: hdmi: add support for exynos5 hdmi drm: exynos: hdmi: replace is_v13 with version check in hdmi drm: exynos: hdmi: add support for exynos5 mixer drm: exynos: hdmi: add support to disable video processor in mixer drm: exynos: hdmi: add support for platform variants for mixer drm: exynos: hdmi: add support for exynos5 hdmiphy drm: exynos: hdmi: add support for exynos5 ddc drm: exynos: remove drm hdmi platform data struct drm: exynos: hdmi: turn off HPD interrupt in HDMI chip drm: exynos: hdmi: use s5p-hdmi platform data drm: exynos: hdmi: fix interrupt handling drm: exynos: hdmi: support for platform variants media: s5p-hdmi: add HPD GPIO to platform data UAPI: (Scripted) Disintegrate include/drm drm/i915: Fix GT_MODE default value drm/i915: don't frob the vblank ts in finish_page_flip drm/i915: call drm_handle_vblank before finish_page_flip drm/i915: print warning if vmi915_gem_fault error is not handled drm/i915: EBUSY status handling added to i915_gem_fault(). ...
2012-10-08MPILIB: Provide a function to read raw data into an MPIDavid Howells
Provide a function to read raw data of a predetermined size into an MPI rather than expecting the size to be encoded within the data. The data is assumed to represent an unsigned integer, and the resulting MPI will be positive. The function looks like this: MPI mpi_read_raw_data(const void *, size_t); This is useful for reading ASN.1 integer primitives where the length is encoded in the ASN.1 metadata. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08X.509: Add an ASN.1 decoderDavid Howells
Add an ASN.1 BER/DER/CER decoder. This uses the bytecode from the ASN.1 compiler in the previous patch to inform it as to what to expect to find in the encoded byte stream. The output from the compiler also tells it what functions to call on what tags, thus allowing the caller to retrieve information. The decoder is called as follows: int asn1_decoder(const struct asn1_decoder *decoder, void *context, const unsigned char *data, size_t datalen); The decoder argument points to the bytecode from the ASN.1 compiler. context is the caller's context and is passed to the action functions. data and datalen define the byte stream to be decoded. Note that the decoder is currently limited to datalen being less than 64K. This reduces the amount of stack space used by the decoder because ASN.1 is a nested construct. Similarly, the decoder is limited to a maximum of 10 levels of constructed data outside of a leaf node also in an effort to keep stack usage down. These restrictions can be raised if necessary. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08X.509: Add simple ASN.1 grammar compilerDavid Howells
Add a simple ASN.1 grammar compiler. This produces a bytecode output that can be fed to a decoder to inform the decoder how to interpret the ASN.1 stream it is trying to parse. Action functions can be specified in the grammar by interpolating: ({ foo }) after a type, for example: SubjectPublicKeyInfo ::= SEQUENCE { algorithm AlgorithmIdentifier, subjectPublicKey BIT STRING ({ do_key_data }) } The decoder is expected to call these after matching this type and parsing the contents if it is a constructed type. The grammar compiler does not currently support the SET type (though it does support SET OF) as I can't see a good way of tracking which members have been encountered yet without using up extra stack space. Currently, the grammar compiler will fail if more than 256 bytes of bytecode would be produced or more than 256 actions have been specified as it uses 8-bit jump values and action indices to keep space usage down. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08X.509: Add utility functions to render OIDs as stringsDavid Howells
Add a pair of utility functions to render OIDs as strings. The first takes an encoded OID and turns it into a "a.b.c.d" form string: int sprint_oid(const void *data, size_t datasize, char *buffer, size_t bufsize); The second takes an OID enum index and calls the first on the data held therein: int sprint_OID(enum OID oid, char *buffer, size_t bufsize); Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08X.509: Implement simple static OID registryDavid Howells
Implement a simple static OID registry that allows the mapping of an encoded OID to an enum value for ease of use. The OID registry index enum appears in the: linux/oid_registry.h header file. A script generates the registry from lines in the header file that look like: <sp*>OID_foo,<sp*>/*<sp*>1.2.3.4<sp*>*/ The actual OID is taken to be represented by the numbers with interpolated dots in the comment. All other lines in the header are ignored. The registry is queries by calling: OID look_up_oid(const void *data, size_t datasize); This returns a number from the registry enum representing the OID if found or OID__NR if not. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08KEYS: Provide signature verification with an asymmetric keyDavid Howells
Provide signature verification using an asymmetric-type key to indicate the public key to be used. The API is a single function that can be found in crypto/public_key.h: int verify_signature(const struct key *key, const struct public_key_signature *sig) The first argument is the appropriate key to be used and the second argument is the parsed signature data: struct public_key_signature { u8 *digest; u16 digest_size; enum pkey_hash_algo pkey_hash_algo : 8; union { MPI mpi[2]; struct { MPI s; /* m^d mod n */ } rsa; struct { MPI r; MPI s; } dsa; }; }; This should be filled in prior to calling the function. The hash algorithm should already have been called and the hash finalised and the output should be in a buffer pointed to by the 'digest' member. Any extra data to be added to the hash by the hash format (eg. PGP) should have been added by the caller prior to finalising the hash. It is assumed that the signature is made up of a number of MPI values. If an algorithm becomes available for which this is not the case, the above structure will have to change. It is also assumed that it will have been checked that the signature algorithm matches the key algorithm. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08KEYS: Asymmetric public-key algorithm crypto key subtypeDavid Howells
Add a subtype for supporting asymmetric public-key encryption algorithms such as DSA (FIPS-186) and RSA (PKCS#1 / RFC1337). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08KEYS: Asymmetric key pluggable data parsersDavid Howells
The instantiation data passed to the asymmetric key type are expected to be formatted in some way, and there are several possible standard ways to format the data. The two obvious standards are OpenPGP keys and X.509 certificates. The latter is especially useful when dealing with UEFI, and the former might be useful when dealing with, say, eCryptfs. Further, it might be desirable to provide formatted blobs that indicate hardware is to be accessed to retrieve the keys or that the keys live unretrievably in a hardware store, but that the keys can be used by means of the hardware. From userspace, the keys can be loaded using the keyctl command, for example, an X.509 binary certificate: keyctl padd asymmetric foo @s <dhowells.pem or a PGP key: keyctl padd asymmetric bar @s <dhowells.pub or a pointer into the contents of the TPM: keyctl add asymmetric zebra "TPM:04982390582905f8" @s Inside the kernel, pluggable parsers register themselves and then get to examine the payload data to see if they can handle it. If they can, they get to: (1) Propose a name for the key, to be used it the name is "" or NULL. (2) Specify the key subtype. (3) Provide the data for the subtype. The key type asks the parser to do its stuff before a key is allocated and thus before the name is set. If successful, the parser stores the suggested data into the key_preparsed_payload struct, which will be either used (if the key is successfully created and instantiated or updated) or discarded. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08KEYS: Implement asymmetric key typeDavid Howells
Create a key type that can be used to represent an asymmetric key type for use in appropriate cryptographic operations, such as encryption, decryption, signature generation and signature verification. The key type is "asymmetric" and can provide access to a variety of cryptographic algorithms. Possibly, this would be better as "public_key" - but that has the disadvantage that "public key" is an overloaded term. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08MPILIB: Provide count_leading/trailing_zeros() based on arch functionsDavid Howells
Provide count_leading/trailing_zeros() macros based on extant arch bit scanning functions rather than reimplementing from scratch in MPILIB. Whilst we're at it, turn count_foo_zeros(n, x) into n = count_foo_zeros(x). Also move the definition to asm-generic as other people may be interested in using it. Signed-off-by: David Howells <dhowells@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dmitry Kasatkin <dmitry.kasatkin@intel.com> Cc: Arnd Bergmann <arnd@arndb.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08KEYS: Add payload preparsing opportunity prior to key instantiate or updateDavid Howells
Give the key type the opportunity to preparse the payload prior to the instantiation and update routines being called. This is done with the provision of two new key type operations: int (*preparse)(struct key_preparsed_payload *prep); void (*free_preparse)(struct key_preparsed_payload *prep); If the first operation is present, then it is called before key creation (in the add/update case) or before the key semaphore is taken (in the update and instantiate cases). The second operation is called to clean up if the first was called. preparse() is given the opportunity to fill in the following structure: struct key_preparsed_payload { char *description; void *type_data[2]; void *payload; const void *data; size_t datalen; size_t quotalen; }; Before the preparser is called, the first three fields will have been cleared, the payload pointer and size will be stored in data and datalen and the default quota size from the key_type struct will be stored into quotalen. The preparser may parse the payload in any way it likes and may store data in the type_data[] and payload fields for use by the instantiate() and update() ops. The preparser may also propose a description for the key by attaching it as a string to the description field. This can be used by passing a NULL or "" description to the add_key() system call or the key_create_or_update() function. This cannot work with request_key() as that required the description to tell the upcall about the key to be created. This, for example permits keys that store PGP public keys to generate their own name from the user ID and public key fingerprint in the key. The instantiate() and update() operations are then modified to look like this: int (*instantiate)(struct key *key, struct key_preparsed_payload *prep); int (*update)(struct key *key, struct key_preparsed_payload *prep); and the new payload data is passed in *prep, whether or not it was preparsed. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pul ACPI & Power Management updates from Len Brown: - acpidump utility added - intel_idle driver now supports IVB Xeon - turbostat utility can now count SMIs - ACPI can now bind to USB3 hubs - misc fixes * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (49 commits) ACPI: Add new sysfs interface to export device description ACPI: Harden acpi_table_parse_entries() against BIOS bug tools/power/turbostat: add option to count SMIs, re-name some options tools/power turbostat: add [-d MSR#][-D MSR#] options to print counter deltas intel_idle: enable IVB Xeon support tools/power turbostat: add [-m MSR#] option tools/power turbostat: make -M output pretty tools/power turbostat: print more turbo-limit information tools/power turbostat: delete unused line tools/power turbostat: run on IVB Xeon tools/power/acpi/acpidump: create acpidump(8), local make install targets tools/power/acpi/acpidump: version 20101221 - find dynamic tables in sysfs ACPI: run _OSC after ACPI_FULL_INITIALIZATION tools/power/acpi/acpidump: create acpidump(8), local make install targets tools/power/acpi/acpidump: version 20101221 - find dynamic tables in sysfs tools/power/acpi/acpidump: version 20071116 tools/power/acpi/acpidump: version 20070714 tools/power/acpi/acpidump: version 20060606 tools/power/acpi/acpidump: version 20051111 xo15-ebook: convert to module_acpi_driver() ...
2012-10-07mmc: core: Fixup broken suspend and eMMC4.5 power off notifyUlf Hansson
This patch fixes up the broken suspend sequence for eMMC with sleep support. Additionally it reworks the eMMC4.5 Power Off Notification feature so it fits together with the existing sleep feature. The CMD0 based re-initialization of the eMMC at resume is re-introduced to maintain compatiblity for devices using sleep. A host shall use MMC_CAP2_POWEROFF_NOTIFY to enable the Power Off Notification feature. We might be able to remove this cap later on, if we think that Power Off Notification always is preferred over sleep, even if the host is not able to cut the eMMC VCCQ power. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Saugata Das <saugata.das@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-10-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "The bulk of this pull is a series from Alex that refactors and cleans up the RBD code to lay the groundwork for supporting the new image format and evolving feature set. There are also some cleanups in libceph, and for ceph there's fixed validation of file striping layouts and a bugfix in the code handling a shrinking MDS cluster." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits) ceph: avoid 32-bit page index overflow ceph: return EIO on invalid layout on GET_DATALOC ioctl rbd: BUG on invalid layout ceph: propagate layout error on osd request creation libceph: check for invalid mapping ceph: convert to use le32_add_cpu() ceph: Fix oops when handling mdsmap that decreases max_mds rbd: update remaining header fields for v2 rbd: get snapshot name for a v2 image rbd: get the snapshot context for a v2 image rbd: get image features for a v2 image rbd: get the object prefix for a v2 rbd image rbd: add code to get the size of a v2 rbd image rbd: lay out header probe infrastructure rbd: encapsulate code that gets snapshot info rbd: add an rbd features field rbd: don't use index in __rbd_add_snap_dev() rbd: kill create_snap sysfs entry rbd: define rbd_dev_image_id() rbd: define some new format constants ...