Age | Commit message (Collapse) | Author |
|
There are some AMD CPU models which have thresholding banks but which
cannot generate a thresholding interrupt. This is denoted by the bit
MCi_MISC[IntP]. Make sure to check that bit before assigning the
thresholding interrupt handler.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
[ Boris: save an indentation level and rewrite commit message. ]
Link: http://lkml.kernel.org/r/1412662128.28440.18.camel@debian
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Rusty noticed a Really Bad Bug (tm) in my NT fix. The entry code
reads out of bounds, causing the NT fix to be unreliable. But, and
this is much, much worse, if your stack is somehow just below the
top of the direct map (or a hole), you read out of bounds and crash.
Excerpt from the crash:
[ 1.129513] RSP: 0018:ffff88001da4bf88 EFLAGS: 00010296
2b:* f7 84 24 90 00 00 00 testl $0x4000,0x90(%rsp)
That read is deterministically above the top of the stack. I
thought I even single-stepped through this code when I wrote it to
check the offset, but I clearly screwed it up.
Fixes: 8c7aa698baca ("x86_64, entry: Filter RFLAGS.NT on entry from userspace")
Reported-by: Rusty Russell <rusty@ozlabs.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fixes from all around the place:
- hyper-V 32-bit PAE guest kernel fix
- two IRQ allocation fixes on certain x86 boards
- intel-mid boot crash fix
- intel-quark quirk
- /proc/interrupts duplicate irq chip name fix
- cma boot crash fix
- syscall audit fix
- boot crash fix with certain TSC configurations (seen on Qemu)
- smpboot.c build warning fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()
x86, intel-mid: Create IRQs for APB timers and RTC timers
x86: Don't enable F00F workaround on Intel Quark processors
x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts
x86, cma: Reserve DMA contiguous area after initmem_init()
i386/audit: stop scribbling on the stack frame
x86, apic: Handle a bad TSC more gracefully
x86: ACPI: Do not translate GSI number if IOAPIC is disabled
x86/smpboot: Move data structure to its primary usage scope
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Various scheduler fixes all over the place: three SCHED_DL fixes,
three sched/numa fixes, two generic race fixes and a comment fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/dl: Fix preemption checks
sched: Update comments for CLONE_NEWNS
sched: stop the unbound recursion in preempt_schedule_context()
sched/fair: Fix division by zero sysctl_numa_balancing_scan_size
sched/fair: Care divide error in update_task_scan_period()
sched/numa: Fix unsafe get_task_struct() in task_numa_assign()
sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer()
sched/deadline: Don't replenish from a !SCHED_DEADLINE entity
sched: Fix race between task_group and sched_task_group
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, plus on the kernel side:
- a revert for a newly introduced PMU driver which isn't complete yet
and where we ran out of time with fixes (to be tried again in
v3.19) - this makes up for a large chunk of the diffstat.
- compilation warning fixes
- a printk message fix
- event_idx usage fixes/cleanups"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf probe: Trivial typo fix for --demangle
perf tools: Fix report -F dso_from for data without branch info
perf tools: Fix report -F dso_to for data without branch info
perf tools: Fix report -F symbol_from for data without branch info
perf tools: Fix report -F symbol_to for data without branch info
perf tools: Fix report -F mispredict for data without branch info
perf tools: Fix report -F in_tx for data without branch info
perf tools: Fix report -F abort for data without branch info
perf tools: Make CPUINFO_PROC an array to support different kernel versions
perf callchain: Use global caching provided by libunwind
perf/x86/intel: Revert incomplete and undocumented Broadwell client support
perf/x86: Fix compile warnings for intel_uncore
perf: Fix typos in sample code in the perf_event.h header
perf: Fix and clean up initialization of pmu::event_idx
perf: Fix bogus kernel printk
perf diff: Add missing hists__init() call at tool start
|
|
The file /sys/kernel/debug/tracing/eneabled_functions is used to debug
ftrace function hooks. Add to the output what function is being called
by the trampoline if the arch supports it.
Add support for this feature in x86_64.
Cc: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The current method of handling multiple function callbacks is to register
a list function callback that calls all the other callbacks based on
their hash tables and compare it to the function that the callback was
called on. But this is very inefficient.
For example, if you are tracing all functions in the kernel and then
add a kprobe to a function such that the kprobe uses ftrace, the
mcount trampoline will switch from calling the function trace callback
to calling the list callback that will iterate over all registered
ftrace_ops (in this case, the function tracer and the kprobes callback).
That means for every function being traced it checks the hash of the
ftrace_ops for function tracing and kprobes, even though the kprobes
is only set at a single function. The kprobes ftrace_ops is checked
for every function being traced!
Instead of calling the list function for functions that are only being
traced by a single callback, we can call a dynamically allocated
trampoline that calls the callback directly. The function graph tracer
already uses a direct call trampoline when it is being traced by itself
but it is not dynamically allocated. It's trampoline is static in the
kernel core. The infrastructure that called the function graph trampoline
can also be used to call a dynamically allocated one.
For now, only ftrace_ops that are not dynamically allocated can have
a trampoline. That is, users such as function tracer or stack tracer.
kprobes and perf allocate their ftrace_ops, and until there's a safe
way to free the trampoline, it can not be used. The dynamically allocated
ftrace_ops may, although, use the trampoline if the kernel is not
compiled with CONFIG_PREEMPT. But that will come later.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
In order to access the shadow VMCS, we need to load it. At this point,
vmx->loaded_vmcs->vmcs and the actually loaded one start to differ. If
we now get preempted by Linux, vmx_vcpu_put and, on return, the
vmx_vcpu_load will work against the wrong vmcs. That can cause
copy_shadow_to_vmcs12 to corrupt the vmcs12 state.
Fix the issue by disabling preemption during the copy operation.
copy_vmcs12_to_shadow is safe from this issue as it is executed by
vmx_vcpu_run when preemption is already disabled before vmentry.
This bug is exposed by running Jailhouse within KVM on CPUs with
shadow VMCS support. Jailhouse never expects an interrupt pending
vmexit, but the bug can cause it if, after copy_shadow_to_vmcs12
is preempted, the active VMCS happens to have the virtual interrupt
pending flag set in the CPU-based execution controls.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far
jumps") introduced a bug that caused the fix to be incomplete. Due to
incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit
segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may
not trigger #GP. As we know, this imposes a security problem.
In addition, the condition for two warnings was incorrect.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
[Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Emulation of code that is 14 bytes to the segment limit or closer
(e.g. RIP = 0xFFFFFFF2 after reset) is broken because we try to read as
many as 15 bytes from the beginning of the instruction, and __linearize
fails when the passed (address, size) pair reaches out of the segment.
To fix this, let __linearize return the maximum accessible size (clamped
to 2^32-1) for usage in __do_insn_fetch_bytes, and avoid the limit check
by passing zero for the desired size.
For expand-down segments, __linearize is performing a redundant check.
(u32)(addr.ea + size - 1) <= lim can only happen if addr.ea is close
to 4GB; in this case, addr.ea + size - 1 will also fail the check against
the upper bound of the segment (which is provided by the D/B bit).
After eliminating the redundant check, it is simple to compute
the *max_size for expand-down segments too.
Now that the limit check is done in __do_insn_fetch_bytes, we want
to inject a general protection fault there if size < op_size (like
__linearize would have done), instead of just aborting.
This fixes booting Tiano Core from emulated flash with EPT disabled.
Cc: stable@vger.kernel.org
Fixes: 719d5a9b2487e0562f178f61e323c3dc18a8b200
Reported-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The error code for #GP and #SS is zero when the segment is used to
access an operand or an instruction. It is only non-zero when
a segment register is being loaded; for limit checks this means
cases such as:
* for #GP, when RIP is beyond the limit on a far call (before the first
instruction is executed). We do not implement this check, but it
would be in em_jmp_far/em_call_far.
* for #SS, if the new stack overflows during an inter-privilege-level
call to a non-conforming code segment. We do not implement stack
switching at all.
So use an error code of zero.
Reviewed-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
These patches:
86a349a28b24 ("perf/x86/intel: Add Broadwell core support")
c46e665f0377 ("perf/x86: Add INST_RETIRED.ALL workarounds")
fdda3c4aacec ("perf/x86/intel: Use Broadwell cache event list for Haswell")
introduced magic constants and unexplained changes:
https://lkml.org/lkml/2014/10/28/1128
https://lkml.org/lkml/2014/10/27/325
https://lkml.org/lkml/2014/8/27/546
https://lkml.org/lkml/2014/10/28/546
Peter Zijlstra has attempted to help out, to clean up the mess:
https://lkml.org/lkml/2014/10/28/543
But has not received helpful and constructive replies which makes
me doubt wether it can all be finished in time until v3.18 is
released.
Despite various review feedback the author (Andi Kleen) has answered
only few of the review questions and has generally been uncooperative,
only giving replies when prompted repeatedly, and only giving minimal
answers instead of constructively explaining and helping along the effort.
That kind of behavior is not acceptable.
There's also a boot crash on Intel E5-1630 v3 CPUs reported for another
commit from Andi Kleen:
e735b9db12d7 ("perf/x86/intel/uncore: Add Haswell-EP uncore support")
https://lkml.org/lkml/2014/10/22/730
Which is not yet resolved. The uncore driver is independent in theory,
but the crash makes me worry about how well all these patches were
tested and makes me uneasy about the level of interminging that the
Broadwell and Haswell code has received by the commits above.
As a first step to resolve the mess revert the Broadwell client commits
back to the v3.17 version, before we run out of time and problematic
code hits a stable upstream kernel.
( If the Haswell-EP crash is not resolved via a simple fix then we'll have
to revert the Haswell-EP uncore driver as well. )
The Broadwell client series has to be submitted in a clean fashion, with
single, well documented changes per patch. If they are submitted in time
and are accepted during review then they can possibly go into v3.19 but
will need additional scrutiny due to the rocky history of this patch set.
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
pte_pfn() returns a PFN of long (32 bits in 32-PAE), so "long <<
PAGE_SHIFT" will overflow for PFNs above 4GB.
Due to this issue, some Linux 32-PAE distros, running as guests on Hyper-V,
with 5GB memory assigned, can't load the netvsc driver successfully and
hence the synthetic network device can't work (we can use the kernel parameter
mem=3000M to work around the issue).
Cast pte_pfn() to phys_addr_t before shifting.
Fixes: "commit d76565344512: x86, mm: Create slow_virt_to_phys()"
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: gregkh@linuxfoundation.org
Cc: linux-mm@kvack.org
Cc: olaf@aepfle.de
Cc: apw@canonical.com
Cc: jasowang@redhat.com
Cc: dave.hansen@intel.com
Cc: riel@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1414580017-27444-1-git-send-email-decui@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Function mp_register_gsi() returns blindly the GSI number for the ACPI
SCI interrupt. That causes a regression when the GSI for ACPI SCI is
shared with other devices.
The regression was caused by commit 84245af7297ced9e8fe "x86, irq, ACPI:
Change __acpi_register_gsi to return IRQ number instead of GSI" and
exposed on a SuperMicro system, which shares one GSI between ACPI SCI
and PCI device, with following failure:
http://sourceforge.net/p/linux1394/mailman/linux1394-user/?viewmonth=201410
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 20 low
level)
[ 2.699224] firewire_ohci 0000:06:00.0: failed to allocate interrupt
20
Return mp_map_gsi_to_irq(gsi, 0) instead of the GSI number.
Reported-and-Tested-by: Daniel Robbins <drobbins@funtoo.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Intel MID platforms has no legacy interrupts, so no IRQ descriptors
preallocated. We need to call mp_map_gsi_to_irq() to create IRQ
descriptors for APB timers and RTC timers, otherwise it may cause
invalid memory access as:
[ 0.116839] BUG: unable to handle kernel NULL pointer dereference at
0000003a
[ 0.123803] IP: [<c1071c0e>] setup_irq+0xf/0x4d
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The Intel Quark processor is a part of family 5, but does not have the
F00F bug present in Pentiums of the same family.
Pentiums were models 0 through 8, Quark is model 9.
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Link: http://lkml.kernel.org/r/20141028175753.GA12743@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
It is a problem when configuring high memory off where the
vmalloc reserve area could end up overlapping the early_ioremap
fixmap area on i386.
The ordering of the VMALLOC_RESERVE space is:
FIXADDR_TOP
fixed_addresses
FIXADDR_START
early_ioremap fixed addresses
FIXADDR_BOOT_START
Persistent kmap area
PKMAP_BASE
VMALLOC_END
Vmalloc area
VMALLOC_START
high_memory
The available address we can use is lower than
FIXADDR_BOOT_START. So we will set the kmap boundary below the
FIXADDR_BOOT_START, if we configure high memory.
If we configure high memory, the vmalloc reserve area should
end up to PKMAP_BASE, otherwise should end up to
FIXADDR_BOOT_START.
Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/6B680A9E-6CE9-4C96-934B-CB01DCB58278@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Fix duplicate XT-PIC seen in /proc/interrupts on x86 systems
that make use of 8259A Programmable Interrupt Controllers.
Specifically convert output like this:
CPU0
0: 76573 XT-PIC-XT-PIC timer
1: 11 XT-PIC-XT-PIC i8042
2: 0 XT-PIC-XT-PIC cascade
4: 8 XT-PIC-XT-PIC serial
6: 3 XT-PIC-XT-PIC floppy
7: 0 XT-PIC-XT-PIC parport0
8: 1 XT-PIC-XT-PIC rtc0
10: 448 XT-PIC-XT-PIC fddi0
12: 23 XT-PIC-XT-PIC eth0
14: 2464 XT-PIC-XT-PIC ide0
NMI: 0 Non-maskable interrupts
ERR: 0
to one like this:
CPU0
0: 122033 XT-PIC timer
1: 11 XT-PIC i8042
2: 0 XT-PIC cascade
4: 8 XT-PIC serial
6: 3 XT-PIC floppy
7: 0 XT-PIC parport0
8: 1 XT-PIC rtc0
10: 145 XT-PIC fddi0
12: 31 XT-PIC eth0
14: 2245 XT-PIC ide0
NMI: 0 Non-maskable interrupts
ERR: 0
that is one like we used to have from ~2.2 till it was changed
sometime.
The rationale is there is no value in this duplicate
information, it merely clutters output and looks ugly. We only
have one handler for 8259A interrupts so there is no need to
give it a name separate from the name already given to
irq_chip.
We could define meaningful names for handlers based on bits in
the ELCR register on systems that have it or the value of the
LTIM bit we use in ICW1 otherwise (hardcoded to 0 though with
MCA support gone), to tell edge-triggered and level-triggered
inputs apart. While that information does not affect 8259A
interrupt handlers it could help people determine which lines
are shareable and which are not. That is material for a
separate change though.
Any tools that parse /proc/interrupts are supposed not to be
affected since it was many years we used the format this change
converts back to.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1410260147190.21390@eddie.linux-mips.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Drop double entry for pt_regs_bx.
This seems to be a typo - resulting in a double entry in the
generated include/generated/asm-offsets.h, which is not necessary.
Build-tested and booted on x86 64 box to make sure it was not
doing any strange magic.... after all it was in the kernel in
this form for almost 10 years.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141027172805.GA19760@opentech.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Now vdso/vma.c has a single initcall and no references to
"vsyscall".
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/945c463e2804fedd8b08d63a040cbe85d55195aa.1411494540.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
I think that the jiffies vvar was once used for the vgetcpu
cache. That code is long gone, so let's just make jiffies be a
normal variable.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/fcfee6f8749af14d96373a9e2656354ad0b95499.1411494540.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
IMO users ought not to be able to use 16-bit segments without
using modify_ldt. Fortunately, it's impossible to break
espfix64 by loading the PER_CPU segment into SS because it's
PER_CPU is marked read-only and SS cannot contain an RO segment,
but marking PER_CPU as 32-bit is less fragile.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/179f490d659307873eefd09206bebd417e2ab5ad.1411494540.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The first userspace attempt to read or write the PER_CPU segment
will write the accessed bit to the GDT. This is visible to
userspace using the LAR instruction, and it also pointlessly
dirties a cache line.
Set the segment's accessed bit at boot to prevent userspace
access to segments from having side effects.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/ac63814ca4c637a08ec2fd0360d67ca67560a9ee.1411494540.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This makes it easier to see what's going on. It produces
exactly the same segment descriptor as the old code.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/d492f7b55136cbc60f016adae79160707b2e03b7.1411494540.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This is pure cut-and-paste. At this point, vsyscall_64.c
contains only code needed for vsyscall emulation, but some of
the comments and function names are still confused.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/a244daf7d3cbe71afc08ad09fdfe1866ca1f1978.1411494540.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This code exists for the sole purpose of making the vsyscall
page look sort of like real userspace memory. Move it so that
it lives with the rest of the vsyscall code.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/a7ee266773671a05f00b7175ca65a0dd812d2e4b.1411494540.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The code is correct, but only for a rather subtle reason. This
confused me for quite a while when I read switch_mm, so clarify
the code to avoid confusing other people, too.
TBH, I wouldn't be surprised if this code was only correct by
accident.
[ I wouldn't normally send a comment-only patch, but it took me a long
time to first figure out wtf was going on here, and then to figure
out why this wasn't exploitable by malicious code, and then to
figure out why this oddity had no user-visible effect at all. Let's
spare future readers the same confusion. ]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/36275c99801a87d8dcf0502a41cf4e2ad81aae46.1412623954.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Now that the kernel always runs with clean flags (in particular,
NT is clear), there is no need to save and restore flags on
every context switch.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Sebastian Lackner <sebastian@fds-team.de>
Cc: Anish Bhatt <anish@chelsio.com>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/bf6fb790787eb95b922157838f52712c25dda157.1412187233.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The uncore drivers require PCI and generate compile time warnings when
!CONFIG_PCI.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Andy spotted the fail in what was intended as a conditional printk level.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Fixes: cc6cd47e7395 ("perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141007124757.GH19379@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
task_preempt_count() is pointless if preemption counter is per-cpu,
currently this is x86 only. It is only valid if the task is not
running, and even in this case the only info it can provide is the
state of PREEMPT_ACTIVE bit.
Change its single caller to check p->on_rq instead, this should be
the same if p->state != TASK_RUNNING, and kill this helper.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Alexander Graf <agraf@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20141008183348.GC17495@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
preempt_schedule_context() does preempt_enable_notrace() at the end
and this can call the same function again; exception_exit() is heavy
and it is quite possible that need-resched is true again.
1. Change this code to dec preempt_count() and check need_resched()
by hand.
2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid
the enable/disable dance around __schedule(). But in this case
we need to move into sched/core.c.
3. Cosmetic, but x86 forgets to declare this function. This doesn't
really matter because it is only called by asm helpers, still it
make sense to add the declaration into asm/preempt.h to match
preempt_schedule().
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Fengguang Wu reported a boot crash on the x86 platform
via the 0-day Linux Kernel Performance Test:
cma: dma_contiguous_reserve: reserving 31 MiB for global area
BUG: Int 6: CR2 (null)
[<41850786>] dump_stack+0x16/0x18
[<41d2b1db>] early_idt_handler+0x6b/0x6b
[<41072227>] ? __phys_addr+0x2e/0xca
[<41d4ee4d>] cma_declare_contiguous+0x3c/0x2d7
[<41d6d359>] dma_contiguous_reserve_area+0x27/0x47
[<41d6d4d1>] dma_contiguous_reserve+0x158/0x163
[<41d33e0f>] setup_arch+0x79b/0xc68
[<41d2b7cf>] start_kernel+0x9c/0x456
[<41d2b2ca>] i386_start_kernel+0x79/0x7d
(See details at: https://lkml.org/lkml/2014/10/8/708)
It is because dma_contiguous_reserve() is called before
initmem_init() in x86, the variable high_memory is not
initialized but accessed by __pa(high_memory) in
dma_contiguous_reserve().
This patch moves dma_contiguous_reserve() after initmem_init()
so that high_memory is initialized before accessed.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: iamjoonsoo.kim@lge.com
Cc: 'Linux-MM' <linux-mm@kvack.org>
Cc: 'Weijie Yang' <weijie.yang.kh@gmail.com>
Link: http://lkml.kernel.org/r/000101cfef69%2431e528a0%2495af79e0%24%25yang@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://anongit.freedesktop.org/drm-intel into drm-next
Ok, new attempt, this time around with full ppgtt disabled again.
drm-intel-next-2014-10-03:
- first batch of skl stage 1 enabling
- fixes from Rodrigo to the PSR, fbc and sink crc code
- kerneldoc for the frontbuffer tracking code, runtime pm code and the basic
interrupt enable/disable functions
- smaller stuff all over
drm-intel-next-2014-09-19:
- bunch more i830M fixes from Ville
- full ppgtt now again enabled by default
- more ppgtt fixes from Michel Thierry and Chris Wilson
- plane config work from Gustavo Padovan
- spinlock clarifications
- piles of smaller improvements all over, as usual
* tag 'drm-intel-next-2014-10-03-no-ppgtt' of git://anongit.freedesktop.org/drm-intel: (114 commits)
Revert "drm/i915: Enable full PPGTT on gen7"
drm/i915: Update DRIVER_DATE to 20141003
drm/i915: Remove the duplicated logic between the two shrink phases
drm/i915: kerneldoc for interrupt enable/disable functions
drm/i915: Use dev_priv instead of dev in irq setup functions
drm/i915: s/pm._irqs_disabled/pm.irqs_enabled/
drm/i915: Clear TX FIFO reset master override bits on chv
drm/i915: Make sure hardware uses the correct swing margin/deemph bits on chv
drm/i915: make sink_crc return -EIO on aux read/write failure
drm/i915: Constify send buffer for intel_dp_aux_ch
drm/i915: De-magic the PSR AUX message
drm/i915: Reinstate error level message for non-simulated gpu hangs
drm/i915: Kerneldoc for intel_runtime_pm.c
drm/i915: Call runtime_pm_disable directly
drm/i915: Move intel_display_set_init_power to intel_runtime_pm.c
drm/i915: Bikeshed rpm functions name a bit.
drm/i915: Extract intel_runtime_pm.c
drm/i915: Remove intel_modeset_suspend_hw
drm/i915: spelling fixes for frontbuffer tracking kerneldoc
drm/i915: Tighting frontbuffer tracking around flips
...
|
|
To query resources in ACPI systems we need to define a data structure. This
would be set as platform_info for the devices.
Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
This will be used to abstract the differances in ipc offsets for different
chips.
Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
git commit b4f0d3755c5e9cc86292d5fd78261903b4f23d4a was very very dumb.
It was writing over %esp/pt_regs semi-randomly on i686 with the expected
"system can't boot" results. As noted in:
https://bugs.freedesktop.org/show_bug.cgi?id=85277
This patch stops fscking with pt_regs. Instead it sets up the registers
for the call to __audit_syscall_entry in the most obvious conceivable
way. It then does just a tiny tiny touch of magic. We need to get what
started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp). This is as easy
as a pair of pushes.
After the call to __audit_syscall_entry all we need to do is get that
now useless junk off the stack (pair of pops) and reload %eax with the
original syscall so other stuff can keep going about it's business.
Reported-by: Paulo Zanoni <przanoni@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/1414037043-30647-1-git-send-email-eparis@redhat.com
Cc: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Reason:
Need to apply audit patch on top of v3.18-rc1.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Pull kvm fixes from Paolo Bonzini:
"This is a pretty large update. I think it is roughly as big as what I
usually had for the _whole_ rc period.
There are a few bad bugs where the guest can OOPS or crash the host.
We have also started looking at attack models for nested
virtualization; bugs that usually result in the guest ring 0 crashing
itself become more worrisome if you have nested virtualization,
because the nested guest might bring down the non-nested guest as
well. For current uses of nested virtualization these do not really
have a security impact, but you never know and bugs are bugs
nevertheless.
A lot of these bugs are in 3.17 too, resulting in a large number of
stable@ Ccs. I checked that all the patches apply there with no
conflicts"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: vfio: fix unregister kvm_device_ops of vfio
KVM: x86: Wrong assertion on paging_tmpl.h
kvm: fix excessive pages un-pinning in kvm_iommu_map error path.
KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag
KVM: x86: Emulator does not decode clflush well
KVM: emulate: avoid accessing NULL ctxt->memopp
KVM: x86: Decoding guest instructions which cross page boundary may fail
kvm: x86: don't kill guest on unknown exit reason
kvm: vmx: handle invvpid vm exit gracefully
KVM: x86: Handle errors when RIP is set during far jumps
KVM: x86: Emulator fixes for eip canonical checks on near branches
KVM: x86: Fix wrong masking on relative jump/call
KVM: x86: Improve thread safety in pit
KVM: x86: Prevent host from panicking on shared MSR writes.
KVM: x86: Check non-canonical addresses upon WRMSR
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- Fix regression in xen_clocksource_read() which caused all Xen guests
to crash early in boot.
- Several fixes for super rare race conditions in the p2m.
- Assorted other minor fixes.
* tag 'stable/for-linus-3.18-b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/pci: Allocate memory for physdev_pci_device_add's optarr
x86/xen: panic on bad Xen-provided memory map
x86/xen: Fix incorrect per_cpu accessor in xen_clocksource_read()
x86/xen: avoid race in p2m handling
x86/xen: delay construction of mfn_list_list
x86/xen: avoid writing to freed memory after race in p2m handling
xen/balloon: Don't continue ballooning when BP_ECANCELED is encountered
|
|
Even after the recent fix, the assertion on paging_tmpl.h is triggered.
Apparently, the assertion wants to check that the PAE is always set on
long-mode, but does it in incorrect way. Note that the assertion is not
enabled unless the code is debugged by defining MMU_DEBUG.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The decode phase of the x86 emulator assumes that every instruction with the
ModRM flag, and which can be used with RIP-relative addressing, has either
SrcMem or DstMem. This is not the case for several instructions - prefetch,
hint-nop and clflush.
Adding SrcMem|NoAccess for prefetch and hint-nop and SrcMem for clflush.
This fixes CVE-2014-8480.
Fixes: 41061cdb98a0bec464278b4db8e894a3121671f5
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently, all group15 instructions are decoded as clflush (e.g., mfence,
xsave). In addition, the clflush instruction requires no prefix (66/f2/f3)
would exist. If prefix exists it may encode a different instruction (e.g.,
clflushopt).
Creating a group for clflush, and different group for each prefix.
This has been the case forever, but the next patch needs the cflush group
in order to fix a bug introduced in 3.17.
Fixes: 41061cdb98a0bec464278b4db8e894a3121671f5
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
A failure to decode the instruction can cause a NULL pointer access.
This is fixed simply by moving the "done" label as close as possible
to the return.
This fixes CVE-2014-8481.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Cc: stable@vger.kernel.org
Fixes: 41061cdb98a0bec464278b4db8e894a3121671f5
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Once an instruction crosses a page boundary, the size read from the second page
disregards the common case that part of the operand resides on the first page.
As a result, fetch of long insturctions may fail, and thereby cause the
decoding to fail as well.
Cc: stable@vger.kernel.org
Fixes: 5cfc7e0f5e5e1adf998df94f8e36edaf5d30d38e
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was
triggered by a priveledged application. Let's not kill the guest: WARN
and inject #UD instead.
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
On systems with invvpid instruction support (corresponding bit in
IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid
causes vm exit, which is currently not handled and results in
propagation of unknown exit to userspace.
Fix this by installing an invvpid vm exit handler.
This is CVE-2014-3646.
Cc: stable@vger.kernel.org
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Far jmp/call/ret may fault while loading a new RIP. Currently KVM does not
handle this case, and may result in failed vm-entry once the assignment is
done. The tricky part of doing so is that loading the new CS affects the
VMCS/VMCB state, so if we fail during loading the new RIP, we are left in
unconsistent state. Therefore, this patch saves on 64-bit the old CS
descriptor and restores it if loading RIP failed.
This fixes CVE-2014-3647.
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Before changing rip (during jmp, call, ret, etc.) the target should be asserted
to be canonical one, as real CPUs do. During sysret, both target rsp and rip
should be canonical. If any of these values is noncanonical, a #GP exception
should occur. The exception to this rule are syscall and sysenter instructions
in which the assigned rip is checked during the assignment to the relevant
MSRs.
This patch fixes the emulator to behave as real CPUs do for near branches.
Far branches are handled by the next patch.
This fixes CVE-2014-3647.
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Relative jumps and calls do the masking according to the operand size, and not
according to the address size as the KVM emulator does today.
This patch fixes KVM behavior.
Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|