Age | Commit message (Collapse) | Author |
|
Instead of counting and then processing relocations, do it in a single
pass. This splits the processing logic into separate functions for
realmode and 32-bit (and paves the way for 64-bit). Also extracts helper
functions when emitting relocations.
Based on work by Neill Clift and Michael Davidson.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1365797627-20874-3-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
In preparation for making the reloc tool operate on 64-bit relocations,
generalize the structure names for easy recompilation via #defines.
Based on work by Neill Clift and Michael Davidson.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1365797627-20874-2-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
If guest vcpu is in VM86 mode the vcpu state should be checked as if in
real mode.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
KVM does not use the activity state VMCS field, and does not support
it in nested VMX either (the corresponding bits in the misc VMX feature
MSR are zero). Fail entry if the activity state is set to anything but
"active".
Since the value will always be the same for L1 and L2, we do not need
to read and write the corresponding VMCS field on L1/L2 transitions,
either.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
The "xen_cpu_die" and "xen_hvm_cpu_die" are very similar.
Lets coalesce them.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
There is no need to use the PV version of the IRQ_WORKER mechanism
as under PVHVM we are using the native version. The native
version is using the SMP API.
They just sit around unused:
69: 0 0 xen-percpu-ipi irqwork0
83: 0 0 xen-percpu-ipi irqwork1
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
See git commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
(xen: disable PV spinlocks on HVM) for details.
But we did not disable it everywhere - which means that when
we boot as PVHVM we end up allocating per-CPU irq line for
spinlock. This fixes that.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
The default (uninitialized) value of the IRQ line is -1.
Check if we already have allocated an spinlock interrupt line
and if somebody is trying to do it again. Also set it to -1
when we offline the CPU.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
If the timer interrupt has been de-init or is just now being
initialized, the default value of -1 should be preset as
interrupt line. Check for that and if something is odd
WARN us.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
When we online the CPU, we get this splat:
smpboot: Booting Node 0 Processor 1 APIC 0x2
installing Xen timer for CPU 1
BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
Call Trace:
[<ffffffff810c1fea>] __might_sleep+0xda/0x100
[<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
[<ffffffff81303758>] ? kasprintf+0x38/0x40
[<ffffffff813036eb>] kvasprintf+0x5b/0x90
[<ffffffff81303758>] kasprintf+0x38/0x40
[<ffffffff81044510>] xen_setup_timer+0x30/0xb0
[<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
[<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
The solution to that is use kasprintf in the CPU hotplug path
that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify,
and remove the call to in xen_hvm_setup_cpu_clockevents.
Unfortunatly the later is not a good idea as the bootup path
does not use xen_hvm_cpu_notify so we would end up never allocating
timer%d interrupt lines when booting. As such add the check for
atomic() to continue.
CC: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
If posted interrupt is avaliable, then uses it to inject virtual
interrupt to guest.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Only deliver the posted interrupt when target vcpu is running
and there is no previous interrupt pending in pir.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
We already know the trigger mode of a given interrupt when programming
the ioapice entry. So it's not necessary to set it in each interrupt
delivery.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Both TMR and EOI exit bitmap need to be updated when ioapic changed
or vcpu's id/ldr/dfr changed. So use common function instead eoi exit
bitmap specific function.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Detect the posted interrupt feature. If it exists, then set it in vmcs_config.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Posted Interrupt feature requires a special IPI to deliver posted interrupt
to guest. And it should has a high priority so the interrupt will not be
blocked by others.
Normally, the posted interrupt will be consumed by vcpu if target vcpu is
running and transparent to OS. But in some cases, the interrupt will arrive
when target vcpu is scheduled out. And host will see it. So we need to
register a dump handler to handle it.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
The "acknowledge interrupt on exit" feature controls processor behavior
for external interrupt acknowledgement. When this control is set, the
processor acknowledges the interrupt controller to acquire the
interrupt vector on VM exit.
After enabling this feature, an interrupt which arrived when target cpu is
running in vmx non-root mode will be handled by vmx handler instead of handler
in idt. Currently, vmx handler only fakes an interrupt stack and jump to idt
table to let real handler to handle it. Further, we will recognize the interrupt
and only delivery the interrupt which not belong to current vcpu through idt table.
The interrupt which belonged to current vcpu will be handled inside vmx handler.
This will reduce the interrupt handle cost of KVM.
Also, interrupt enable logic is changed if this feature is turnning on:
Before this patch, hypervior call local_irq_enable() to enable it directly.
Now IF bit is set on interrupt stack frame, and will be enabled on a return from
interrupt handler if exterrupt interrupt exists. If no external interrupt, still
call local_irq_enable() to enable it.
Refer to Intel SDM volum 3, chapter 33.2.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
online/offline
While we don't use the spinlock interrupt line (see for details
commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23 -
xen: disable PV spinlocks on HVM) - we should still do the proper
init / deinit sequence. We did not do that correctly and for the
CPU init for PVHVM guest we would allocate an interrupt line - but
failed to deallocate the old interrupt line.
This resulted in leakage of an irq_desc but more importantly this splat
as we online an offlined CPU:
genirq: Flags mismatch irq 71. 0002cc20 (spinlock1) vs. 0002cc20 (spinlock1)
Pid: 2542, comm: init.late Not tainted 3.9.0-rc6upstream #1
Call Trace:
[<ffffffff811156de>] __setup_irq+0x23e/0x4a0
[<ffffffff81194191>] ? kmem_cache_alloc_trace+0x221/0x250
[<ffffffff811161bb>] request_threaded_irq+0xfb/0x160
[<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
[<ffffffff813a8423>] bind_ipi_to_irqhandler+0xa3/0x160
[<ffffffff81303758>] ? kasprintf+0x38/0x40
[<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
[<ffffffff810cad35>] ? update_max_interval+0x15/0x40
[<ffffffff816605db>] xen_init_lock_cpu+0x3c/0x78
[<ffffffff81660029>] xen_hvm_cpu_notify+0x29/0x33
[<ffffffff81676bdd>] notifier_call_chain+0x4d/0x70
[<ffffffff810bb2a9>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8109402b>] __cpu_notify+0x1b/0x30
[<ffffffff8166834a>] _cpu_up+0xa0/0x14b
[<ffffffff816684ce>] cpu_up+0xd9/0xec
[<ffffffff8165f754>] store_online+0x94/0xd0
[<ffffffff8141d15b>] dev_attr_store+0x1b/0x20
[<ffffffff81218f44>] sysfs_write_file+0xf4/0x170
[<ffffffff811a2864>] vfs_write+0xb4/0x130
[<ffffffff811a302a>] sys_write+0x5a/0xa0
[<ffffffff8167ada9>] system_call_fastpath+0x16/0x1b
cpu 1 spinlock event irq -16
smpboot: Booting Node 0 Processor 1 APIC 0x2
And if one looks at the /proc/interrupts right after
offlining (CPU1):
70: 0 0 xen-percpu-ipi spinlock0
71: 0 0 xen-percpu-ipi spinlock1
77: 0 0 xen-percpu-ipi spinlock2
There is the oddity of the 'spinlock1' still being present.
CC: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
In the PVHVM path when we do CPU online/offline path we would
leak the timer%d IRQ line everytime we do a offline event. The
online path (xen_hvm_setup_cpu_clockevents via
x86_cpuinit.setup_percpu_clockev) would allocate a new interrupt
line for the timer%d.
But we would still use the old interrupt line leading to:
kernel BUG at /home/konrad/ssd/konrad/linux/kernel/hrtimer.c:1261!
invalid opcode: 0000 [#1] SMP
RIP: 0010:[<ffffffff810b9e21>] [<ffffffff810b9e21>] hrtimer_interrupt+0x261/0x270
.. snip..
<IRQ>
[<ffffffff810445ef>] xen_timer_interrupt+0x2f/0x1b0
[<ffffffff81104825>] ? stop_machine_cpu_stop+0xb5/0xf0
[<ffffffff8111434c>] handle_irq_event_percpu+0x7c/0x240
[<ffffffff811175b9>] handle_percpu_irq+0x49/0x70
[<ffffffff813a74a3>] __xen_evtchn_do_upcall+0x1c3/0x2f0
[<ffffffff813a760a>] xen_evtchn_do_upcall+0x2a/0x40
[<ffffffff8167c26d>] xen_hvm_callback_vector+0x6d/0x80
<EOI>
[<ffffffff81666d01>] ? start_secondary+0x193/0x1a8
[<ffffffff81666cfd>] ? start_secondary+0x18f/0x1a8
There is also the oddity (timer1) in the /proc/interrupts after
offlining CPU1:
64: 1121 0 xen-percpu-virq timer0
78: 0 0 xen-percpu-virq timer1
84: 0 2483 xen-percpu-virq timer2
This patch fixes it.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@vger.kernel.org
|
|
For quite a few Xen versions, this wasn't the IRQ vector anymore
anyway, and it is not being used by the kernel for anything. Hence
drop the field from struct irq_info, and respective function
parameters.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
During early setup of a dom0 kernel, populate boot_params with the
Enhanced Disk Drive (EDD) and MBR signature data. This makes
information on the BIOS boot device available in /sys/firmware/edd/.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Fixes build with CONFIG_EFI_VARS=m which was broken after the commit
"x86, efivars: firmware bug workarounds should be in platform code".
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
The commit "efi: Distinguish between "remaining space" and actually used
space" added usage of ucs2_*() functions to arch/x86/platform/efi/efi.c,
but the only thing which selected UCS2_STRING was EFI_VARS, which is
technically optional and can be built as a module.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
The valid mask for both offcore_response_0 and
offcore_response_1 was wrong for SNB/SNB-EP,
IVB/IVB-EP. It was possible to write to
reserved bit and cause a GP fault crashing
the kernel.
This patch fixes the problem by correctly marking the
reserved bits in the valid mask for all the processors
mentioned above.
A distinction between desktop and server parts is introduced
because bits 24-30 are only available on the server parts.
This version of the patch is just a rebase to perf/urgent tree
and should apply to older kernels as well.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: security@kernel.org
Cc: ak@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
All we want to do is return from this function so stop jumping around
like a flea for no good reason.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1365436666-9837-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
The idea with those routines is to slowly phase them out and not call
them on anything else besides K8. They even have a check for that which,
when called too early, fails. Let me explain:
It gets the cpuinfo_x86 pointer from the per_cpu array and when this
happens for cpu0, before its boot_cpu_data has been copied back to the
per_cpu array in smp_store_boot_cpu_info(), we get an empty struct and
thus the check fails.
Use boot_cpu_data directly instead.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1365436666-9837-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
Fold it into its single call site. No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1365436666-9837-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
GENERIC_GPIO has been made equivalent to GPIOLIB in architecture code
and all driver code has been switch to depend on GPIOLIB. It is thus
safe to have GENERIC_GPIO removed.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
Pull uprobes updates from Oleg Nesterov:
- "uretprobes" - an optimization to uprobes, like kretprobes are an optimization
to kprobes. "perf probe -x file sym%return" now works like kretprobes.
- PowerPC fixes plus a couple of cleanups/optimizations in uprobes and trace_uprobes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The memblock_find_in_range() return value addr is guaranteed
to be within "addr + aper_size" and not beyond GART_MAX_ADDR.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/20130416013734.GA14641@udknight
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Userspace may deliver RTC interrupt without query the status. So we
want to track RTC EOI for this case.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
restore rtc_status from migration or save/restore
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Add a new parameter to know vcpus who received the interrupt.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Add vcpu info to ioapic_update_eoi, so we can know which vcpu
issued this EOI.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
EFI implementations distinguish between space that is actively used by a
variable and space that merely hasn't been garbage collected yet. Space
that hasn't yet been garbage collected isn't available for use and so isn't
counted in the remaining_space field returned by QueryVariableInfo().
Combined with commit 68d9298 this can cause problems. Some implementations
don't garbage collect until the remaining space is smaller than the maximum
variable size, and as a result check_var_size() will always fail once more
than 50% of the variable store has been used even if most of that space is
marked as available for garbage collection. The user is unable to create
new variables, and deleting variables doesn't increase the remaining space.
The problem that 68d9298 was attempting to avoid was one where certain
platforms fail if the actively used space is greater than 50% of the
available storage space. We should be able to calculate that by simply
summing the size of each available variable and subtracting that from
the total storage space. With luck this will fix the problem described in
https://bugzilla.kernel.org/show_bug.cgi?id=55471 without permitting
damage to occur to the machines 68d9298 was attempting to fix.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
EFI variables can be flagged as being accessible only within boot services.
This makes it awkward for us to figure out how much space they use at
runtime. In theory we could figure this out by simply comparing the results
from QueryVariableInfo() to the space used by all of our variables, but
that fails if the platform doesn't garbage collect on every boot. Thankfully,
calling QueryVariableInfo() while still inside boot services gives a more
reliable answer. This patch passes that information from the EFI boot stub
up to the efi platform code.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
CONFIG_MEMORY_HOTREMOVE
kernel_physical_mapping_remove() is only called by
arch_remove_memory() in init_64.c, which is enclosed in
CONFIG_MEMORY_HOTREMOVE. So when we don't configure
CONFIG_MEMORY_HOTREMOVE, the compiler will give a warning:
warning: ‘kernel_physical_mapping_remove’ defined but not used
So put kernel_physical_mapping_remove() in
CONFIG_MEMORY_HOTREMOVE.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: linux-mm@kvack.org
Cc: gregkh@linuxfoundation.org
Cc: yinghai@kernel.org
Cc: wency@cn.fujitsu.com
Cc: mgorman@suse.de
Cc: tj@kernel.org
Cc: liwanp@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1366019207-27818-3-git-send-email-tangchen@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
As suggested by Peter Anvin.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: H . Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Apparently 'byts' should be 'bytes'.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: H . Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set
x86/mm/cpa/selftest: Fix false positive in CPA self test
x86/mm/cpa: Convert noop to functional fix
x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixlets"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix error return code
ftrace: Fix strncpy() use, use strlcpy() instead of strncpy()
perf: Fix strncpy() use, use strlcpy() instead of strncpy()
perf: Fix strncpy() use, always make sure it's NUL terminated
perf: Fix ring_buffer perf_output_space() boundary calculation
perf/x86: Fix uninitialized pt_regs in intel_pmu_drain_bts_buffer()
|
|
We only need to update vm_exit_intr_error_code if there is a valid exit
interruption information and it comes with a valid error code.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
If we are entering guest mode, we do not want L0 to interrupt this
vmentry with all its side effects on the vmcs. Therefore, injection
shall be disallowed during L1->L2 transitions, as in the previous
version. However, this check is conceptually independent of
nested_exit_on_intr, so decouple it.
If L1 traps external interrupts, we can kick the guest from L2 to L1,
also just like the previous code worked. But we no longer need to
consider L1's idt_vectoring_info_field. It will always be empty at this
point. Instead, if L2 has pending events, those are now found in the
architectural queues and will, thus, prevent vmx_interrupt_allowed from
being called at all.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
The basic idea is to always transfer the pending event injection on
vmexit into the architectural state of the VCPU and then drop it from
there if it turns out that we left L2 to enter L1, i.e. if we enter
prepare_vmcs12.
vmcs12_save_pending_events takes care to transfer pending L0 events into
the queue of L1. That is mandatory as L1 may decide to switch the guest
state completely, invalidating or preserving the pending events for
later injection (including on a different node, once we support
migration).
This concept is based on the rule that a pending vmlaunch/vmresume is
not canceled. Otherwise, we would risk to lose injected events or leak
them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
entry of nested_vmx_vmexit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Check if the interrupt or NMI window exit is for L1 by testing if it has
the corresponding controls enabled. This is required when we allow
direct injection from L0 to L2
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Emulation of undefined opcode should inject #UD instead of causing
emulation failure. Do that by moving Undefined flag check to emulation
stage and injection #UD there.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
guest state
During invalid guest state emulation vcpu cannot enter guest mode to try
to reexecute instruction that emulator failed to emulate, so emulation
will happen again and again. Prevent that by telling the emulator that
instruction reexecution should not be attempted.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Unimplemented instruction detection is broken for group instructions
since it relies on "flags" field of opcode to be zero, but all
instructions in a group inherit flags from a group encoding. Fix that by
having a separate flag for unimplemented instructions.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
Hijack the return address and replace it with a trampoline address.
Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
|