asmadeus/linux.git - The linux kernel

Age	Commit message (Collapse)	Author
2013-10-17	powerpc: book3e: _PAGE_LENDIAN must be _PAGE_ENDIAN	Bharat Bhushan
	For booke3e _PAGE_ENDIAN is not defined. Infact what is defined is "_PAGE_LENDIAN" which is wrong and that should be _PAGE_ENDIAN. There are no compilation errors as arch/powerpc/include/asm/pte-common.h defines _PAGE_ENDIAN to 0 as it is not defined anywhere. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode	Paul Mackerras
	When an interrupt or exception happens in the guest that comes to the host, the CPU goes to hypervisor real mode (MMU off) to handle the exception but doesn't change the MMU context. After saving a few registers, we then clear the "in guest" flag. If, for any reason, we get an exception in the real-mode code, that then gets handled by the normal kernel exception handlers, which turn the MMU on. This is disastrous if the MMU is still set to the guest context, since we end up executing instructions from random places in the guest kernel with hypervisor privilege. In order to catch this situation, we define a new value for the "in guest" flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real mode with guest MMU context. If the "in guest" flag is set to this value, we branch off to an emergency handler. For the moment, this just does a branch to self to stop the CPU from doing anything further. While we're here, we define another new flag value to indicate that we are in a HV guest, as distinct from a PR guest. This will be useful when we have a kernel that can support both PR and HV guests concurrently. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	kvm: powerpc: book3s hv: Fix vcore leak	Paul Mackerras
	add kvmppc_free_vcores() to free the kvmppc_vcore structures that we allocate for a guest, which are currently being leaked. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Reduce number of shadow PTEs invalidated by MMU notifiers	Paul Mackerras
	Currently, whenever any of the MMU notifier callbacks get called, we invalidate all the shadow PTEs. This is inefficient because it means that we typically then get a lot of DSIs and ISIs in the guest to fault the shadow PTEs back in. We do this even if the address range being notified doesn't correspond to guest memory. This commit adds code to scan the memslot array to find out what range(s) of guest physical addresses corresponds to the host virtual address range being affected. For each such range we flush only the shadow PTEs for the range, on all cpus. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Mark pages accessed, and dirty if being written	Paul Mackerras
	The mark_page_dirty() function, despite what its name might suggest, doesn't actually mark the page as dirty as far as the MM subsystem is concerned. It merely sets a bit in KVM's map of dirty pages, if userspace has requested dirty tracking for the relevant memslot. To tell the MM subsystem that the page is dirty, we have to call kvm_set_pfn_dirty() (or an equivalent such as SetPageDirty()). This adds a call to kvm_set_pfn_dirty(), and while we are here, also adds a call to kvm_set_pfn_accessed() to tell the MM subsystem that the page has been accessed. Since we are now using the pfn in several places, this adds a 'pfn' variable to store it and changes the places that used hpaddr >> PAGE_SHIFT to use pfn instead, which is the same thing. This also changes a use of HPTE_R_PP to PP_RXRX. Both are 3, but PP_RXRX is more informative as being the read-only page permission bit setting. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page()	Paul Mackerras
	When the MM code is invalidating a range of pages, it calls the KVM kvm_mmu_notifier_invalidate_range_start() notifier function, which calls kvm_unmap_hva_range(), which arranges to flush all the existing host HPTEs for guest pages. However, the Linux PTEs for the range being flushed are still valid at that point. We are not supposed to establish any new references to pages in the range until the ...range_end() notifier gets called. The PPC-specific KVM code doesn't get any explicit notification of that; instead, we are supposed to use mmu_notifier_retry() to test whether we are or have been inside a range flush notifier pair while we have been getting a page and instantiating a host HPTE for the page. This therefore adds a call to mmu_notifier_retry inside kvmppc_mmu_map_page(). This call is inside a region locked with kvm->mmu_lock, which is the same lock that is called by the KVM MMU notifier functions, thus ensuring that no new notification can proceed while we are in the locked region. Inside this region we also create the host HPTE and link the corresponding hpte_cache structure into the lists used to find it later. We cannot allocate the hpte_cache structure inside this locked region because that can lead to deadlock, so we allocate it outside the region and free it if we end up not using it. This also moves the updates of vcpu3s->hpte_cache_count inside the regions locked with vcpu3s->mmu_lock, and does the increment in kvmppc_mmu_hpte_cache_map() when the pte is added to the cache rather than when it is allocated, in order that the hpte_cache_count is accurate. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Better handling of host-side read-only pages	Paul Mackerras
	Currently we request write access to all pages that get mapped into the guest, even if the guest is only loading from the page. This reduces the effectiveness of KSM because it means that we unshare every page we access. Also, we always set the changed (C) bit in the guest HPTE if it allows writing, even for a guest load. This fixes both these problems. We pass an 'iswrite' flag to the mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether the access is a load or a store. The mmu.xlate() functions now only set C for stores. kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot() instead of gfn_to_pfn() so that it can indicate whether we need write access to the page, and get back a 'writable' flag to indicate whether the page is writable or not. If that 'writable' flag is clear, we then make the host HPTE read-only even if the guest HPTE allowed writing. This means that we can get a protection fault when the guest writes to a page that it has mapped read-write but which is read-only on the host side (perhaps due to KSM having merged the page). Thus we now call kvmppc_handle_pagefault() for protection faults as well as HPTE not found faults. In kvmppc_handle_pagefault(), if the access was allowed by the guest HPTE and we thus need to install a new host HPTE, we then need to remove the old host HPTE if there is one. This is done with a new function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to find and remove the old host HPTE. Since the memslot-related functions require the KVM SRCU read lock to be held, this adds srcu_read_lock/unlock pairs around the calls to kvmppc_handle_pagefault(). Finally, this changes kvmppc_mmu_book3s_32_xlate_pte() to not ignore guest HPTEs that don't permit access, and to return -EPERM for accesses that are not permitted by the page protections. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S: Move skip-interrupt handlers to common code	Paul Mackerras
	Both PR and HV KVM have separate, identical copies of the kvmppc_skip_interrupt and kvmppc_skip_Hinterrupt handlers that are used for the situation where an interrupt happens when loading the instruction that caused an exit from the guest. To eliminate this duplication and make it easier to compile in both PR and HV KVM, this moves this code to arch/powerpc/kernel/exceptions-64s.S along with other kernel interrupt handler code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache	Paul Mackerras
	This makes PR KVM allocate its kvm_vcpu structs from the kvm_vcpu_cache rather than having them embedded in the kvmppc_vcpu_book3s struct, which is allocated with vzalloc. The reason is to reduce the differences between PR and HV KVM in order to make is easier to have them coexist in one kernel binary. With this, the kvm_vcpu struct has a pointer to the kvmppc_vcpu_book3s struct. The pointer to the kvmppc_book3s_shadow_vcpu struct has moved from the kvmppc_vcpu_book3s struct to the kvm_vcpu struct, and is only present for 32-bit, since it is only used for 32-bit. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: squash in compile fix from Aneesh] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Make HPT accesses and updates SMP-safe	Paul Mackerras
	This adds a per-VM mutex to provide mutual exclusion between vcpus for accesses to and updates of the guest hashed page table (HPT). This also makes the code use single-byte writes to the HPT entry when updating of the reference (R) and change (C) bits. The reason for doing this, rather than writing back the whole HPTE, is that on non-PAPR virtual machines, the guest OS might be writing to the HPTE concurrently, and writing back the whole HPTE might conflict with that. Also, real hardware does single-byte writes to update R and C. The new mutex is taken in kvmppc_mmu_book3s_64_xlate() when reading the HPT and updating R and/or C, and in the PAPR HPT update hcalls (H_ENTER, H_REMOVE, etc.). Having the mutex means that we don't need to use a hypervisor lock bit in the HPT update hcalls, and we don't need to be careful about the order in which the bytes of the HPTE are updated by those hcalls. The other change here is to make emulated TLB invalidations (tlbie) effective across all vcpus. To do this we call kvmppc_mmu_pte_vflush for all vcpus in kvmppc_ppc_book3s_64_tlbie(). For 32-bit, this makes the setting of the accessed and dirty bits use single-byte writes, and makes tlbie invalidate shadow HPTEs for all vcpus. With this, PR KVM can successfully run SMP guests. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Correct errors in H_ENTER implementation	Paul Mackerras
	The implementation of H_ENTER in PR KVM has some errors: * With H_EXACT not set, if the HPTEG is full, we return H_PTEG_FULL as the return value of kvmppc_h_pr_enter, but the caller is expecting one of the EMULATE_* values. The H_PTEG_FULL needs to go in the guest's R3 instead. * With H_EXACT set, if the selected HPTE is already valid, the H_ENTER call should return a H_PTEG_FULL error. This fixes these errors and also makes it write only the selected HPTE, not the whole group, since only the selected HPTE has been modified. This also micro-optimizes the calculations involving pte_index and i. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Handle PP0 page-protection bit in guest HPTEs	Paul Mackerras
	64-bit POWER processors have a three-bit field for page protection in the hashed page table entry (HPTE). Currently we only interpret the two bits that were present in older versions of the architecture. The only defined combination that has the new bit set is 110, meaning read-only for supervisor and no access for user mode. This adds code to kvmppc_mmu_book3s_64_xlate() to interpret the extra bit appropriately. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Use 64k host pages where possible	Paul Mackerras
	Currently, PR KVM uses 4k pages for the host-side mappings of guest memory, regardless of the host page size. When the host page size is 64kB, we might as well use 64k host page mappings for guest mappings of 64kB and larger pages and for guest real-mode mappings. However, the magic page has to remain a 4k page. To implement this, we first add another flag bit to the guest VSID values we use, to indicate that this segment is one where host pages should be mapped using 64k pages. For segments with this bit set we set the bits in the shadow SLB entry to indicate a 64k base page size. When faulting in host HPTEs for this segment, we make them 64k HPTEs instead of 4k. We record the pagesize in struct hpte_cache for use when invalidating the HPTE. For now we restrict the segment containing the magic page (if any) to 4k pages. It should be possible to lift this restriction in future by ensuring that the magic 4k page is appropriately positioned within a host 64k page. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Allow guest to use 64k pages	Paul Mackerras
	This adds the code to interpret 64k HPTEs in the guest hashed page table (HPT), 64k SLB entries, and to tell the guest about 64k pages in kvm_vm_ioctl_get_smmu_info(). Guest 64k pages are still shadowed by 4k pages. This also adds another hash table to the four we have already in book3s_mmu_hpte.c to allow us to find all the PTEs that we have instantiated that match a given 64k guest page. The tlbie instruction changed starting with POWER6 to use a bit in the RB operand to indicate large page invalidations, and to use other RB bits to indicate the base and actual page sizes and the segment size. 64k pages came in slightly earlier, with POWER5++. We use one bit in vcpu->arch.hflags to indicate that the emulated cpu supports 64k pages, and another to indicate that it has the new tlbie definition. The KVM_PPC_GET_SMMU_INFO ioctl presents a bit of a problem, because the MMU capabilities depend on which CPU model we're emulating, but it is a VM ioctl not a VCPU ioctl and therefore doesn't get passed a VCPU fd. In addition, commonly-used userspace (QEMU) calls it before setting the PVR for any VCPU. Therefore, as a best effort we look at the first vcpu in the VM and return 64k pages or not depending on its capabilities. We also make the PVR default to the host PVR on recent CPUs that support 1TB segments (and therefore multiple page sizes as well) so that KVM_PPC_GET_SMMU_INFO will include 64k page and 1TB segment support on those CPUs. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu	Paul Mackerras
	Currently PR-style KVM keeps the volatile guest register values (R0 - R13, CR, LR, CTR, XER, PC) in a shadow_vcpu struct rather than the main kvm_vcpu struct. For 64-bit, the shadow_vcpu exists in two places, a kmalloc'd struct and in the PACA, and it gets copied back and forth in kvmppc_core_vcpu_load/put(), because the real-mode code can't rely on being able to access the kmalloc'd struct. This changes the code to copy the volatile values into the shadow_vcpu as one of the last things done before entering the guest. Similarly the values are copied back out of the shadow_vcpu to the kvm_vcpu immediately after exiting the guest. We arrange for interrupts to be still disabled at this point so that we can't get preempted on 64-bit and end up copying values from the wrong PACA. This means that the accessor functions in kvm_book3s.h for these registers are greatly simplified, and are same between PR and HV KVM. In places where accesses to shadow_vcpu fields are now replaced by accesses to the kvm_vcpu, we can also remove the svcpu_get/put pairs. Finally, on 64-bit, we don't need the kmalloc'd struct at all any more. With this, the time to read the PVR one million times in a loop went from 567.7ms to 575.5ms (averages of 6 values), an increase of about 1.4% for this worse-case test for guest entries and exits. The standard deviation of the measurements is about 11ms, so the difference is only marginally significant statistically. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S PR: Fix compilation without CONFIG_ALTIVEC	Paul Mackerras
	Commit 9d1ffdd8f3 ("KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX") added a call to kvmppc_load_up_altivec() that isn't guarded by CONFIG_ALTIVEC, causing a link failure when building a kernel without CONFIG_ALTIVEC set. This adds an #ifdef to fix this. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Don't crash host on unknown guest interrupt	Paul Mackerras
	If we come out of a guest with an interrupt that we don't know about, instead of crashing the host with a BUG(), we now return to userspace with the exit reason set to KVM_EXIT_UNKNOWN and the trap vector in the hw.hardware_exit_reason field of the kvm_run structure, as is done on x86. Note that run->exit_reason is already set to KVM_EXIT_UNKNOWN at the beginning of kvmppc_handle_exit(). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7	Paul Mackerras
	This enables us to use the Processor Compatibility Register (PCR) on POWER7 to put the processor into architecture 2.05 compatibility mode when running a guest. In this mode the new instructions and registers that were introduced on POWER7 are disabled in user mode. This includes all the VSX facilities plus several other instructions such as ldbrx, stdbrx, popcntw, popcntd, etc. To select this mode, we have a new register accessible through the set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT. Setting this to zero gives the full set of capabilities of the processor. Setting it to one of the "logical" PVR values defined in PAPR puts the vcpu into the compatibility mode for the corresponding architecture level. The supported values are: 0x0f000002 Architecture 2.05 (POWER6) 0x0f000003 Architecture 2.06 (POWER7) 0x0f100003 Architecture 2.06+ (POWER7+) Since the PCR is per-core, the architecture compatibility level and the corresponding PCR value are stored in the struct kvmppc_vcore, and are therefore shared between all vcpus in a virtual core. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: squash in fix to add missing break statements and documentation] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Add support for guest Program Priority Register	Paul Mackerras
	POWER7 and later IBM server processors have a register called the Program Priority Register (PPR), which controls the priority of each hardware CPU SMT thread, and affects how fast it runs compared to other SMT threads. This priority can be controlled by writing to the PPR or by use of a set of instructions of the form or rN,rN,rN which are otherwise no-ops but have been defined to set the priority to particular levels. This adds code to context switch the PPR when entering and exiting guests and to make the PPR value accessible through the SET/GET_ONE_REG interface. When entering the guest, we set the PPR as late as possible, because if we are setting a low thread priority it will make the code run slowly from that point on. Similarly, the first-level interrupt handlers save the PPR value in the PACA very early on, and set the thread priority to the medium level, so that the interrupt handling code runs at a reasonable speed. Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Store LPCR value for each virtual core	Paul Mackerras
	This adds the ability to have a separate LPCR (Logical Partitioning Control Register) value relating to a guest for each virtual core, rather than only having a single value for the whole VM. This corresponds to what real POWER hardware does, where there is a LPCR per CPU thread but most of the fields are required to have the same value on all active threads in a core. The per-virtual-core LPCR can be read and written using the GET/SET_ONE_REG interface. Userspace can can only modify the following fields of the LPCR value: DPFD Default prefetch depth ILE Interrupt little-endian TC Translation control (secondary HPT hash group search disable) We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which contains bits relating to memory management, i.e. the Virtualized Partition Memory (VPM) bits and the bits relating to guest real mode. When this default value is updated, the update needs to be propagated to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do that. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix whitespace] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: BookE: Add GET/SET_ONE_REG interface for VRSAVE	Paul Mackerras
	This makes the VRSAVE register value for a vcpu accessible through the GET/SET_ONE_REG interface on Book E systems (in addition to the existing GET/SET_SREGS interface), for consistency with Book 3S. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Avoid unbalanced increments of VPA yield count	Paul Mackerras
	The yield count in the VPA is supposed to be incremented every time we enter the guest, and every time we exit the guest, so that its value is even when the vcpu is running in the guest and odd when it isn't. However, it's currently possible that we increment the yield count on the way into the guest but then find that other CPU threads are already exiting the guest, so we go back to nap mode via the secondary_too_late label. In this situation we don't increment the yield count again, breaking the relationship between the LSB of the count and whether the vcpu is in the guest. To fix this, we move the increment of the yield count to a point after we have checked whether other CPU threads are exiting. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Pull out interrupt-reading code into a subroutine	Paul Mackerras
	This moves the code in book3s_hv_rmhandlers.S that reads any pending interrupt from the XICS interrupt controller, and works out whether it is an IPI for the guest, an IPI for the host, or a device interrupt, into a new function called kvmppc_read_intr. Later patches will need this. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Restructure kvmppc_hv_entry to be a subroutine	Paul Mackerras
	We have two paths into and out of the low-level guest entry and exit code: from a vcpu task via kvmppc_hv_entry_trampoline, and from the system reset vector for an offline secondary thread on POWER7 via kvm_start_guest. Currently both just branch to kvmppc_hv_entry to enter the guest, and on guest exit, we test the vcpu physical thread ID to detect which way we came in and thus whether we should return to the vcpu task or go back to nap mode. In order to make the code flow clearer, and to keep the code relating to each flow together, this turns kvmppc_hv_entry into a subroutine that follows the normal conventions for call and return. This means that kvmppc_hv_entry_trampoline() and kvmppc_hv_entry() now establish normal stack frames, and we use the normal stack slots for saving return addresses rather than local_paca->kvm_hstate.vmhandler. Apart from that this is mostly moving code around unchanged. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Implement H_CONFER	Paul Mackerras
	The H_CONFER hypercall is used when a guest vcpu is spinning on a lock held by another vcpu which has been preempted, and the spinning vcpu wishes to give its timeslice to the lock holder. We implement this in the straightforward way using kvm_vcpu_yield_to(). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S: Add GET/SET_ONE_REG interface for VRSAVE	Paul Mackerras
	The VRSAVE register value for a vcpu is accessible through the GET/SET_SREGS interface for Book E processors, but not for Book 3S processors. In order to make this accessible for Book 3S processors, this adds a new register identifier for GET/SET_ONE_REG, and adds the code to implement it. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Implement timebase offset for guests	Paul Mackerras
	This allows guests to have a different timebase origin from the host. This is needed for migration, where a guest can migrate from one host to another and the two hosts might have a different timebase origin. However, the timebase seen by the guest must not go backwards, and should go forwards only by a small amount corresponding to the time taken for the migration. Therefore this provides a new per-vcpu value accessed via the one_reg interface using the new KVM_REG_PPC_TB_OFFSET identifier. This value defaults to 0 and is not modified by KVM. On entering the guest, this value is added onto the timebase, and on exiting the guest, it is subtracted from the timebase. This is only supported for recent POWER hardware which has the TBU40 (timebase upper 40 bits) register. Writing to the TBU40 register only alters the upper 40 bits of the timebase, leaving the lower 24 bits unchanged. This provides a way to modify the timebase for guest migration without disturbing the synchronization of the timebase registers across CPU cores. The kernel rounds up the value given to a multiple of 2^24. Timebase values stored in KVM structures (struct kvm_vcpu, struct kvmppc_vcore, etc.) are stored as host timebase values. The timebase values in the dispatch trace log need to be guest timebase values, however, since that is read directly by the guest. This moves the setting of vcpu->arch.dec_expires on guest exit to a point after we have restored the host timebase so that vcpu->arch.dec_expires is a host timebase value. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers	Paul Mackerras
	Currently we are not saving and restoring the SIAR and SDAR registers in the PMU (performance monitor unit) on guest entry and exit. The result is that performance monitoring tools in the guest could get false information about where a program was executing and what data it was accessing at the time of a performance monitor interrupt. This fixes it by saving and restoring these registers along with the other PMU registers on guest entry/exit. This also provides a way for userspace to access these values for a vcpu via the one_reg interface. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17	KVM: PPC: Book3S HV: Reserve POWER8 space in get/set_one_reg	Michael Neuling
	This reserves space in get/set_one_reg ioctl for the extra guest state needed for POWER8. It doesn't implement these at all, it just reserves them so that the ABI is defined now. A few things to note here: - This add a lot state for transactional memory. TM suspend mode, this is unavoidable, you can't simply roll back all transactions and store only the checkpointed state. I've added this all to get/set_one_reg (including GPRs) rather than creating a new ioctl which returns a struct kvm_regs like KVM_GET_REGS does. This means we if we need to extract the TM state, we are going to need a bucket load of IOCTLs. Hopefully most of the time this will not be needed as we can look at the MSR to see if TM is active and only grab them when needed. If this becomes a bottle neck in future we can add another ioctl to grab all this state in one go. - The TM state is offset by 0x80000000. - For TM, I've done away with VMX and FP and created a single 64x128 bit VSX register space. - I've left a space of 1 (at 0x9c) since Paulus needs to add a value which applies to POWER7 as well. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-16	powerpc: Emulate sync instruction variants	James Yang
	Reserved fields of the sync instruction have been used for other instructions (e.g. lwsync). On processors that do not support variants of the sync instruction, emulate it by executing a sync to subsume the effect of the intended instruction. Signed-off-by: James Yang <James.Yang@freescale.com> [scottwood@freescale.com: whitespace and subject line fix] Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-10-16	powerpc/fsl-booke: Use common defines for SPE/FP interrupts numbers	Mihai Caraman
	On Book3E some SPE/FP/AltiVec interrupts share the same number. Use common defines to indentify these numbers. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [scottwood@freescale.com: fixed space-before-tab] Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-10-16	powerpc/booke64: Use common defines for AltiVec interrupts numbers	Mihai Caraman
	On Book3E some SPE/FP/AltiVec interrupts share the same number. Use common defines to indentify these numbers. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-10-16	powerpc: remove dependency on MV64360	Paul Bolle
	The Kconfig entry that allows to "Distribute interrupts on all CPUs by default" has a (negative) dependency on MV64360. But that Kconfig symbol was removed in v2.6.27, which means that this dependency has evaluated to true ever since. It can be removed too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-10-17	x86 / msr: add 64bit _on_cpu access functions	Jacob Pan
	Having 64-bit MSR access methods on given CPU can avoid shifting and simplify MSR content manipulation. We already have other combinations of rdmsrl_xxx and wrmsrl_xxx but missing the _on_cpu version. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16	ARM: AT91: DT: pm: Select ram controller standby based on DT	Jean-Christophe PLAGNIOL-VILLARD
	Move non-dt selection to ioremap_registers init which is only called not non-dt board. So we can support sam9n12/sam9x5/sama5d3 too. Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2013-10-16	ARM: AT91: pm: Factorize standby function	Jean-Christophe PLAGNIOL-VILLARD
	Detect presence of second bank. So we do not need to have on function per SoC Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2013-10-16	ARM: at91: cpuidle: Move driver to drivers/cpuidle	Daniel Lezcano
	As the cpuidle driver code has no more the dependency with the pm code, the 'standby' callback being passed as a parameter to the device's platform data, we can move the cpuidle driver in the drivers/cpuidle directory. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Conflicts: drivers/cpuidle/Kconfig.arm drivers/cpuidle/Makefile
2013-10-16	ARM: at91: cpuidle: Convert to platform driver	Daniel Lezcano
	Using the platform driver model is a good way to separate the cpuidle specific code from the low level pm code. It allows to remove the dependency between these two components. The platform_device is located in the pm code and a 'set' function has been added to set the standby function from the AT91_SOC_START initialization function. Each SoC with a cpuidle driver will set the standby function in the platform_data field at init time. Then pm code will register the cpuidle platform device. The cpuidle driver will register the platform_driver and use the device's platform_data as a standby callback in the idle path. The at91_pm_enter function contains a { if then else } based on cpu_is_xx similar to what was in cpuidle. This is considered dangerous when adding a new SoC. Like the cpuidle driver, a standby ops is defined and assigned when the SoC init function specifies what is its standby function and reused in the at91_pm_enter's 'case' block. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2013-10-16	ARM: Samsung: Remove the MIPI PHY setup code	Sylwester Nawrocki
	Generic PHY drivers are used to handle the MIPI CSIS and MIPI DSIM DPHYs so we can remove now unused code at arch/arm/plat-samsung. In case there is any board file for S5PV210 platforms using MIPI CSIS/DSIM (not any upstream currently) it should use the generic PHY API to bind the PHYs to respective PHY consumer drivers and a platform device for the PHY provider should be defined. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Felipe Balbi <balbi@ti.com> Acked-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-16	xtensa: Cocci spatch "noderef"	Thomas Meyer
	sizeof when applied to a pointer typed expression gives the size of the pointer. Found by coccinelle spatch "misc/noderef.cocci" Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Chris Zankel <chris@zankel.net>
2013-10-16	ARM: at91: remove init_machine() as default is suitable	Nicolas Ferre
	Since 883a106b0866ca8d75b5520bdb3ca1cf8e3730ba (ARM: default machine descriptor for multiplatform) we can remove the SoC-specific callback init_machine() to use the default code. This cleans up the code and reduces the number of lines. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
2013-10-16	ARM: at91/dt: split sama5d3 peripheral definitions	Boris BREZILLON
	This patch splits the sama5d3 SoCs definition: - a common base for all sama5d3 SoCs (sama5d3.dtsi) - several optional peripheral definitions which will be included by sama5d3 specific SoCs (sama5d3_'periph name'.dtsi) - sama5d3 specific SoC definitions (sama5d3x.dtsi) This provides a better representation of the real hardware (drop unneed dt nodes) and avoids peripheral id conflict (which is not the case for current sama5d3 SoCs, but could be if other SoCs of this family are released). Signed-off-by: Boris BREZILLON <b.brezillon@overkiz.com> [nicolas.ferre@atmel.com: add more "sama5d3?" compatibility strings] Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2013-10-16	ARM: at91/dt: split sam9x5 peripheral definitions	Boris BREZILLON
	This patch splits the sam9x5 peripheral definitions into: - a common base for all sam9x5 SoCs (at91sam9x5.dtsi) - several optional peripheral definitions which will be included by specific sam9x5 SoCs (at91sam9x5_'periph name'.dtsi) This provides a better representation of the real hardware (drop unneeded dt nodes) and avoids future peripheral id conflict (lcdc and isi both use peripheral id 25). Signed-off-by: Boris BREZILLON <b.brezillon@overkiz.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2013-10-16	perf/x86: Optimize intel_pmu_pebs_fixup_ip()	Peter Zijlstra
	There's been reports of high NMI handler overhead, highlighted by such kernel messages: [ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000 [ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs Don Zickus analyzed the source of the overhead and reported: > While there are a few places that are causing latencies, for now I focused on > the longest one first. It seems to be 'copy_user_from_nmi' > > intel_pmu_handle_irq -> > intel_pmu_drain_pebs_nhm -> > __intel_pmu_drain_pebs_nhm -> > __intel_pmu_pebs_event -> > intel_pmu_pebs_fixup_ip -> > copy_from_user_nmi > > In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of > all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles > (there are some cases where only 10 iterations are needed to go that high > too, but in generall over 50 or so). At this point copy_user_from_nmi > seems to account for over 90% of the nmi latency. The solution to that is to avoid having to call copy_from_user_nmi() for every instruction. Since we already limit the max basic block size, we can easily pre-allocate a piece of memory to copy the entire thing into in one go. Don reported this test result: > Your patch made a huge difference in improvement. The > copy_from_user_nmi() no longer hits the million of cycles. I still > have a batch of 100,000-300,000 cycles. My longest NMI paths used > to be dominated by copy_from_user_nmi, now it is not (I have to dig > up the new hot path). Reported-and-tested-by: Don Zickus <dzickus@redhat.com> Cc: jmario@redhat.com Cc: acme@infradead.org Cc: dave.hansen@linux.intel.com Cc: eranian@google.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-16	Merge tag 'kvm-arm-for-3.13-1' of ↵	Gleb Natapov
	git://git.linaro.org/people/cdall/linux-kvm-arm into next Updates for KVM/ARM including cpu=host and Cortex-A7 support
2013-10-16	ARM: integrator: core module registers from compatible strings	Linus Walleij
	This augments the core machine code for the Integrator platforms to get their references to the core module device nodes by using compatible strings instead of predefined node names and rename the CP syscon node to be simply "syscon". Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-10-16	ARM: integrator: use devm_ioremap() to remap CM	Linus Walleij
	In the PCIv3 driver, use devm_ioremap() instead of just ioremap() when remapping the system controller in the PCIv3 driver, so the mapping will be automatically released on probe failure. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-10-16	ARM: integrator: move CM base into device tree	Linus Walleij
	This moves the core module (CM) control base into the device tree. It is a simple memory range of 0x200 bytes. Move the cm header down into the machine directory and unexport the cm_control() symbol as no modules are using it. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-10-16	ARM: integrator: decommission the <mach/irqs.h> header	Linus Walleij
	This header is no longer needed when we boot from the device tree. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-10-16	ARM: integrator: delete non-devicetree boot path	Linus Walleij
	The Device Tree boot path now supports everything the ATAG boot can provide, and the two are equivalent. This deletes the ATAG boot path from the Integrator/AP and Integrator/CP platforms to move them on to the future. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>