summaryrefslogtreecommitdiffstats
path: root/arch
AgeCommit message (Collapse)Author
2014-05-30KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register numberPaul Mackerras
Commit b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs") added a definition of KVM_REG_PPC_WORT with the same register number as the existing KVM_REG_PPC_VRSAVE (though in fact the definitions are not identical because of the different register sizes.) For clarity, this moves KVM_REG_PPC_WORT to the next unused number, and also adds it to api.txt. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Add CAP to indicate hcall fixesAlexander Graf
We worked around some nasty KVM magic page hcall breakages: 1) NX bit not honored, so ignore NX when we detect it 2) LE guests swizzle hypercall instruction Without these fixes in place, there's no way it would make sense to expose kvm hypercalls to a guest. Chances are immensely high it would trip over and break. So add a new CAP that gives user space a hint that we have workarounds for the bugs above in place. It can use those as hint to disable PV hypercalls when the guest CPU is anything POWER7 or higher and the host does not have fixes in place. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: MPIC: Reset IRQ source private membersAlexander Graf
When we reset the in-kernel MPIC controller, we forget to reset some hidden state such as destmask and output. This state is usually set when the guest writes to the IDR register for a specific IRQ line. To make sure we stay in sync and don't forget hidden state, treat reset of the IDR register as a simple write of the IDR register. That automatically updates all the hidden state as well. Reported-by: Paul Janzen <pcj@pauljanzen.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Graciously fail broken LE hypercallsAlexander Graf
There are LE Linux guests out there that don't handle hypercalls correctly. Instead of interpreting the instruction stream from device tree as big endian they assume it's a little endian instruction stream and fail. When we see an illegal instruction from such a byte reversed instruction stream, bail out graciously and just declare every hcall as error. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30PPC: ePAPR: Fix hypercall on LE guestAlexander Graf
We get an array of instructions from the hypervisor via device tree that we write into a buffer that gets executed whenever we want to make an ePAPR compliant hypercall. However, the hypervisor passes us these instructions in BE order which we have to manually convert to LE when we want to run them in LE mode. With this fixup in place, I can successfully run LE kernels with KVM PV enabled on PR KVM. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handlerAneesh Kumar K.V
Use make_dsisr instead of open coding it. This also have the added benefit of handling alignment interrupt on additional instructions. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: BOOK3S: Always use the saved DAR valueAneesh Kumar K.V
Although it's optional, IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30PPC: KVM: Make NX bit available with magic pageAlexander Graf
Because old kernels enable the magic page and then choke on NXed trampoline code we have to disable NX by default in KVM when we use the magic page. However, since commit b18db0b8 we have successfully fixed that and can now leave NX enabled, so tell the hypervisor about this. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Disable NX for old magic page using guestsAlexander Graf
Old guests try to use the magic page, but map their trampoline code inside of an NX region. Since we can't fix those old kernels, try to detect whether the guest is sane or not. If not, just disable NX functionality in KVM so that old guests at least work at all. For newer guests, add a bit that we can set to keep NX functionality available. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: BOOK3S: HV: Add mixed page-size support for guestAneesh Kumar K.V
On recent IBM Power CPUs, while the hashed page table is looked up using the page size from the segmentation hardware (i.e. the SLB), it is possible to have the HPT entry indicate a larger page size. Thus for example it is possible to put a 16MB page in a 64kB segment, but since the hash lookup is done using a 64kB page size, it may be necessary to put multiple entries in the HPT for a single 16MB page. This capability is called mixed page-size segment (MPSS). With MPSS, there are two relevant page sizes: the base page size, which is the size used in searching the HPT, and the actual page size, which is the size indicated in the HPT entry. [ Note that the actual page size is always >= base page size ]. We use "ibm,segment-page-sizes" device tree node to advertise the MPSS support to PAPR guest. The penc encoding indicates whether we support a specific combination of base page size and actual page size in the same segment. We also use the penc value in the LP encoding of HPTE entry. This patch exposes MPSS support to KVM guest by advertising the feature via "ibm,segment-page-sizes". It also adds the necessary changes to decode the base page size and the actual page size correctly from the HPTE entry. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: BOOK3S: HV: Prefer CMA region for hash page table allocationAneesh Kumar K.V
Today when KVM tries to reserve memory for the hash page table it allocates from the normal page allocator first. If that fails it falls back to CMA's reserved region. One of the side effects of this is that we could end up exhausting the page allocator and get linux into OOM conditions while we still have plenty of space available in CMA. This patch addresses this issue by first trying hash page table allocation from CMA's reserved region before falling back to the normal page allocator. So if we run out of memory, we really are out of memory. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: Expose TM registersAlexander Graf
POWER8 introduces transactional memory which brings along a number of new registers and MSR bits. Implementing all of those is a pretty big headache, so for now let's at least emulate enough to make Linux's context switching code happy. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: Expose EBB registersAlexander Graf
POWER8 introduces a new facility called the "Event Based Branch" facility. It contains of a few registers that indicate where a guest should branch to when a defined event occurs and it's in PR mode. We don't want to really enable EBB as it will create a big mess with !PR guest mode while hardware is in PR and we don't really emulate the PMU anyway. So instead, let's just leave it at emulation of all its registers. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: Expose TAR facility to guestAlexander Graf
POWER8 implements a new register called TAR. This register has to be enabled in FSCR and then from KVM's point of view is mere storage. This patch enables the guest to use TAR. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: Handle Facility interrupt and FSCRAlexander Graf
POWER8 introduced a new interrupt type called "Facility unavailable interrupt" which contains its status message in a new register called FSCR. Handle these exits and try to emulate instructions for unhandled facilities. Follow-on patches enable KVM to expose specific facilities into the guest. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: Emulate TIR registerAlexander Graf
In parallel to the Processor ID Register (PIR) threaded POWER8 also adds a Thread ID Register (TIR). Since PR KVM doesn't emulate more than one thread per core, we can just always expose 0 here. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: Ignore PMU SPRsAlexander Graf
When we expose a POWER8 CPU into the guest, it will start accessing PMU SPRs that we don't emulate. Just ignore accesses to them. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S: Move little endian conflict to HV KVMAlexander Graf
With the previous patches applied, we can now successfully use PR KVM on little endian hosts which means we can now allow users to select it. However, HV KVM still needs some work, so let's keep the kconfig conflict on that one. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: Do dcbz32 patching with big endian instructionsAlexander Graf
When the host CPU we're running on doesn't support dcbz32 itself, but the guest wants to have dcbz only clear 32 bytes of data, we loop through every executable mapped page to search for dcbz instructions and patch them with a special privileged instruction that we emulate as dcbz32. The only guests that want to see dcbz act as 32byte are book3s_32 guests, so we don't have to worry about little endian instruction ordering. So let's just always search for big endian dcbz instructions, also when we're on a little endian host. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Make shared struct aka magic page guest endianAlexander Graf
The shared (magic) page is a data structure that contains often used supervisor privileged SPRs accessible via memory to the user to reduce the number of exits we have to take to read/write them. When we actually share this structure with the guest we have to maintain it in guest endianness, because some of the patch tricks only work with native endian load/store operations. Since we only share the structure with either host or guest in little endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv. For booke, the shared struct stays big endian. For book3s_64 hv we maintain the struct in host native endian, since it never gets shared with the guest. For book3s_64 pr we introduce a variable that tells us which endianness the shared struct is in and route every access to it through helper inline functions that evaluate this variable. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: PR: Fill pvinfo hcall instructions in big endianAlexander Graf
We expose a blob of hypercall instructions to user space that it gives to the guest via device tree again. That blob should contain a stream of instructions necessary to do a hypercall in big endian, as it just gets passed into the guest and old guests use them straight away. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: PAPR: Access RTAS in big endianAlexander Graf
When the guest does an RTAS hypercall it keeps all RTAS variables inside a big endian data structure. To make sure we don't have to bother about endianness inside the actual RTAS handlers, let's just convert the whole structure to host endian before we call our RTAS handlers and back to big endian when we return to the guest. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: PAPR: Access HTAB in big endianAlexander Graf
The HTAB on PPC is always in big endian. When we access it via hypercalls on behalf of the guest and we're running on a little endian host, we need to make sure we swap the bits accordingly. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S PR: Default to big endian guestAlexander Graf
The default MSR when user space does not define anything should be identical on little and big endian hosts, so remove MSR_LE from it. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S_64 PR: Access shadow slb in big endianAlexander Graf
The "shadow SLB" in the PACA is shared with the hypervisor, so it has to be big endian. We access the shadow SLB during world switch, so let's make sure we access it in big endian even when we're on a little endian host. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S_64 PR: Access HTAB in big endianAlexander Graf
The HTAB is always big endian. We access the guest's HTAB using copy_from/to_user, but don't yet take care of the fact that we might be running on an LE host. Wrap all accesses to the guest HTAB with big endian accessors. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S_32: PR: Access HTAB in big endianAlexander Graf
The HTAB is always big endian. We access the guest's HTAB using copy_from/to_user, but don't yet take care of the fact that we might be running on an LE host. Wrap all accesses to the guest HTAB with big endian accessors. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S: PR: Fix C/R bit settingAlexander Graf
Commit 9308ab8e2d made C/R HTAB updates go byte-wise into the target HTAB. However, it didn't update the guest's copy of the HTAB, but instead the host local copy of it. Write to the guest's HTAB instead. Signed-off-by: Alexander Graf <agraf@suse.de> CC: Paul Mackerras <paulus@samba.org> Acked-by: Paul Mackerras <paulus@samba.org>
2014-05-30KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options onAneesh Kumar K.V
With debug option "sleep inside atomic section checking" enabled we get the below WARN_ON during a PR KVM boot. This is because upstream now have PREEMPT_COUNT enabled even if we have preempt disabled. Fix the warning by adding preempt_disable/enable around floating point and altivec enable. WARNING: at arch/powerpc/kernel/process.c:156 Modules linked in: kvm_pr kvm CPU: 1 PID: 3990 Comm: qemu-system-ppc Tainted: G W 3.15.0-rc1+ #4 task: c0000000eb85b3a0 ti: c0000000ec59c000 task.ti: c0000000ec59c000 NIP: c000000000015c84 LR: d000000003334644 CTR: c000000000015c00 REGS: c0000000ec59f140 TRAP: 0700 Tainted: G W (3.15.0-rc1+) MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 42000024 XER: 20000000 CFAR: c000000000015c24 SOFTE: 1 GPR00: d000000003334644 c0000000ec59f3c0 c000000000e2fa40 c0000000e2f80000 GPR04: 0000000000000800 0000000000002000 0000000000000001 8000000000000000 GPR08: 0000000000000001 0000000000000001 0000000000002000 c000000000015c00 GPR12: d00000000333da18 c00000000fb80900 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 00003fffce4e0fa1 GPR20: 0000000000000010 0000000000000001 0000000000000002 00000000100b9a38 GPR24: 0000000000000002 0000000000000000 0000000000000000 0000000000000013 GPR28: 0000000000000000 c0000000eb85b3a0 0000000000002000 c0000000e2f80000 NIP [c000000000015c84] .enable_kernel_fp+0x84/0x90 LR [d000000003334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr] Call Trace: [c0000000ec59f3c0] [0000000000000010] 0x10 (unreliable) [c0000000ec59f430] [d000000003334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr] [c0000000ec59f4c0] [d00000000324b380] .kvmppc_set_msr+0x30/0x50 [kvm] [c0000000ec59f530] [d000000003337cac] .kvmppc_core_emulate_op_pr+0x16c/0x5e0 [kvm_pr] [c0000000ec59f5f0] [d00000000324a944] .kvmppc_emulate_instruction+0x284/0xa80 [kvm] [c0000000ec59f6c0] [d000000003336888] .kvmppc_handle_exit_pr+0x488/0xb70 [kvm_pr] [c0000000ec59f790] [d000000003338d34] kvm_start_lightweight+0xcc/0xdc [kvm_pr] [c0000000ec59f960] [d000000003336288] .kvmppc_vcpu_run_pr+0xc8/0x190 [kvm_pr] [c0000000ec59f9f0] [d00000000324c880] .kvmppc_vcpu_run+0x30/0x50 [kvm] [c0000000ec59fa60] [d000000003249e74] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [kvm] [c0000000ec59faf0] [d000000003244948] .kvm_vcpu_ioctl+0x478/0x760 [kvm] [c0000000ec59fcb0] [c000000000224e34] .do_vfs_ioctl+0x4d4/0x790 [c0000000ec59fd90] [c000000000225148] .SyS_ioctl+0x58/0xb0 [c0000000ec59fe30] [c00000000000a1e4] syscall_exit+0x0/0x98 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: BOOK3S: PR: Enable Little Endian PR guestAneesh Kumar K.V
This patch make sure we inherit the LE bit correctly in different case so that we can run Little Endian distro in PR mode Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: E500: Add dcbtls emulationAlexander Graf
The dcbtls instruction is able to lock data inside the L1 cache. We don't want to give the guest actual access to hardware cache locks, as that could influence other VMs on the same system. But we can tell the guest that its locking attempt failed. By implementing the instruction we at least don't give the guest a program exception which it definitely does not expect. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: E500: Ignore L1CSR1_ICFI,ICLFRAlexander Graf
The L1 instruction cache control register contains bits that indicate that we're still handling a request. Mask those out when we set the SPR so that a read doesn't assume we're still doing something. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30MIPS: KVM: Remove redundant semicolonJames Hogan
Remove extra semicolon in kvm_arch_vcpu_dump_regs(). Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Remove redundant NULL checks before kfree()James Hogan
The kfree() function already NULL checks the parameter so remove the redundant NULL checks before kfree() calls in arch/mips/kvm/. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Quieten kvm_info() loggingJames Hogan
The logging from MIPS KVM is fairly noisy with kvm_info() in places where it shouldn't be, such as on VM creation and migration to a different CPU. Replace these kvm_info() calls with kvm_debug(). Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Remove ifdef DEBUG around kvm_debugJames Hogan
kvm_debug() uses pr_debug() which is already compiled out in the absence of a DEBUG define, so remove the unnecessary ifdef DEBUG lines around kvm_debug() calls which are littered around arch/mips/kvm/. As well as generally cleaning up, this prevents future bit-rot due to DEBUG not being commonly used. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Fix kvm_debug bit-rottageJames Hogan
Fix build errors when DEBUG is defined in arch/mips/kvm/. - The DEBUG code in kvm_mips_handle_tlbmod() was missing some variables. - The DEBUG code in kvm_mips_host_tlb_write() was conditional on an undefined "debug" variable. - The DEBUG code in kvm_mips_host_tlb_inv() accessed asid_map directly rather than using kvm_mips_get_user_asid(). Also fixed brace placement. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Whitespace fixes in kvm_mips_callbacksJames Hogan
Fix whitespace in struct kvm_mips_callbacks function pointers. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Make kvm_mips_comparecount_{func,wakeup} staticJames Hogan
The kvm_mips_comparecount_func() and kvm_mips_comparecount_wakeup() functions are only used within arch/mips/kvm/kvm_mips.c, so make them static. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Add count frequency KVM registerJames Hogan
Expose the KVM guest CP0_Count frequency to userland via a new KVM_REG_MIPS_COUNT_HZ register accessible with the KVM_{GET,SET}_ONE_REG ioctls. When the frequency is altered the bias is adjusted such that the guest CP0_Count doesn't jump discontinuously or lose any timer interrupts. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: David Daney <david.daney@cavium.com> Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Add master disable count interfaceJames Hogan
Expose two new virtual registers to userland via the KVM_{GET,SET}_ONE_REG ioctls. KVM_REG_MIPS_COUNT_CTL is for timer configuration fields and just contains a master disable count bit. This can be used by userland to freeze the timer in order to read a consistent state from the timer count value and timer interrupt pending bit. This cannot be done with the CP0_Cause.DC bit because the timer interrupt pending bit (TI) is also in CP0_Cause so it would be impossible to stop the timer without also risking a race with an hrtimer interrupt and having to explicitly check whether an interrupt should have occurred. When the timer is re-enabled it resumes without losing time, i.e. the CP0_Count value jumps to what it would have been had the timer not been disabled, which would also be impossible to do from userland with CP0_Cause.DC. The timer interrupt also cannot be lost, i.e. if a timer interrupt would have occurred had the timer not been disabled it is queued when the timer is re-enabled. This works by storing the nanosecond monotonic time when the master disable is set, and using it for various operations instead of the current monotonic time (e.g. when recalculating the bias when the CP0_Count is set), until the master disable is cleared again, i.e. the timer state is read/written as it would have been at that time. This state is exposed to userland via the read-only KVM_REG_MIPS_COUNT_RESUME virtual register so that userland can determine the exact time the master disable took effect. This should allow userland to atomically save the state of the timer, and later restore it. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: David Daney <david.daney@cavium.com> Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Override guest kernel timer frequency directlyJames Hogan
The KVM_HOST_FREQ Kconfig symbol was used by KVM guest kernels to override the timer frequency calculation to a value based on the host frequency. Now that the KVM timer emulation is implemented independent of the host timer frequency and defaults to 100MHz, adjust the working of CONFIG_KVM_HOST_FREQ to match. The Kconfig symbol now specifies the guest timer frequency directly, and has been renamed accordingly to KVM_GUEST_TIMER_FREQ. It now defaults to 100MHz too and the help text is updated to make it clear that a zero value will allow the normal timer frequency calculation to take place (based on the emulated RTC). Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Rewrite count/compare timer emulationJames Hogan
Previously the emulation of the CPU timer was just enough to get a Linux guest running but some shortcuts were taken: - The guest timer interrupt was hard coded to always happen every 10 ms rather than being timed to when CP0_Count would match CP0_Compare. - The guest's CP0_Count register was based on the host's CP0_Count register. This isn't very portable and fails on cores without a CP_Count register implemented such as Ingenic XBurst. It also meant that the guest's CP0_Cause.DC bit to disable the CP0_Count register took no effect. - The guest's CP0_Count register was emulated by just dividing the host's CP0_Count register by 4. This resulted in continuity problems when used as a clock source, since when the host CP0_Count overflows from 0x7fffffff to 0x80000000, the guest CP0_Count transitions discontinuously from 0x1fffffff to 0xe0000000. Therefore rewrite & fix emulation of the guest timer based on the monotonic kernel time (i.e. ktime_get()). Internally a 32-bit count_bias value is added to the frequency scaled nanosecond monotonic time to get the guest's CP0_Count. The frequency of the timer is initialised to 100MHz and cannot yet be changed, but a later patch will allow the frequency to be configured via the KVM_{GET,SET}_ONE_REG ioctl interface. The timer can now be stopped via the CP0_Cause.DC bit (by the guest or via the KVM_SET_ONE_REG ioctl interface), at which point the current CP0_Count is stored and can be read directly. When it is restarted the bias is recalculated such that the CP0_Count value is continuous. Due to the nature of hrtimer interrupts any read of the guest's CP0_Count register while it is running triggers a check for whether the hrtimer has expired, so that the guest/userland cannot observe the CP0_Count passing CP0_Compare without queuing a timer interrupt. This is also taken advantage of when stopping the timer to ensure that a pending timer interrupt is queued. This replaces the implementation of: - Guest read of CP0_Count - Guest write of CP0_Count - Guest write of CP0_Compare - Guest write of CP0_Cause - Guest read of HWR 2 (CC) with RDHWR - Host read of CP0_Count via KVM_GET_ONE_REG ioctl interface - Host write of CP0_Count via KVM_SET_ONE_REG ioctl interface - Host write of CP0_Compare via KVM_SET_ONE_REG ioctl interface - Host write of CP0_Cause via KVM_SET_ONE_REG ioctl interface Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Migrate hrtimer to follow VCPUJames Hogan
When a VCPU is scheduled in on a different CPU, refresh the hrtimer used for emulating count/compare so that it gets migrated to the same CPU. This should prevent a timer interrupt occurring on a different CPU to where the guest it relates to is running, which would cause the guest timer interrupt not to be delivered until after the next guest exit. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Fix timer race modifying guest CP0_CauseJames Hogan
The hrtimer callback for guest timer timeouts sets the guest's CP0_Cause.TI bit to indicate to the guest that a timer interrupt is pending, however there is no mutual exclusion implemented to prevent this occurring while the guest's CP0_Cause register is being read-modify-written elsewhere. When this occurs the setting of the CP0_Cause.TI bit is undone and the guest misses the timer interrupt and doesn't reprogram the CP0_Compare register for the next timeout. Currently another timer interrupt will be triggered again in another 10ms anyway due to the way timers are emulated, but after the MIPS timer emulation is fixed this would result in Linux guest time standing still and the guest scheduler not being invoked until the guest CP0_Count has looped around again, which at 100MHz takes just under 43 seconds. Currently this is the only asynchronous modification of guest registers, therefore it is fixed by adjusting the implementations of the kvm_set_c0_guest_cause(), kvm_clear_c0_guest_cause(), and kvm_change_c0_guest_cause() macros which are used for modifying the guest CP0_Cause register to use ll/sc to ensure atomic modification. This should work in both UP and SMP cases without requiring interrupts to be disabled. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Deliver guest interrupts after local_irq_disable()James Hogan
When about to run the guest, deliver guest interrupts after disabling host interrupts. This should prevent an hrtimer interrupt from being handled after delivering guest interrupts, and therefore not delivering the guest timer interrupt until after the next guest exit. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Add CP0_HWREna KVM register accessJames Hogan
Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0 HWREna register. This is so that userland can save and restore its value so that RDHWR instructions don't have to be emulated by the guest. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: David Daney <david.daney@cavium.com> Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Add CP0_UserLocal KVM register accessJames Hogan
Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0 UserLocal register. This is so that userland can save and restore its value. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: David Daney <david.daney@cavium.com> Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Add CP0_Count/Compare KVM register accessJames Hogan
Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0 Count and Compare registers. These registers are special in that writing to them has side effects (adjusting the time until the next timer interrupt) and reading of Count depends on the time. Therefore add a couple of callbacks so that different implementations (trap & emulate or VZ) can implement them differently depending on what the hardware provides. The trap & emulate versions mostly duplicate what happens when a T&E guest reads or writes these registers, so it inherits the same limitations which can be fixed in later patches. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: David Daney <david.daney@cavium.com> Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-30MIPS: KVM: Move KVM_{GET,SET}_ONE_REG definitions into kvm_host.hJames Hogan
Move the KVM_{GET,SET}_ONE_REG MIPS register id definitions out of kvm_mips.c to kvm_host.h so that they can be shared between multiple source files. This allows register access to be indirected depending on the underlying implementation (trap & emulate or VZ). Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: David Daney <david.daney@cavium.com> Cc: Sanjay Lal <sanjayl@kymasys.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>