Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
"Way back, before the current percpu allocator was implemented, static
and dynamic percpu memory areas were allocated and handled separately
and had their own accessors. The distinction has been gone for many
years now; however, the now duplicate two sets of accessors remained
with the pointer based ones - this_cpu_*() - evolving various other
operations over time. During the process, we also accumulated other
inconsistent operations.
This pull request contains Christoph's patches to clean up the
duplicate accessor situation. __get_cpu_var() uses are replaced with
with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().
Unfortunately, the former sometimes is tricky thanks to C being a bit
messy with the distinction between lvalues and pointers, which led to
a rather ugly solution for cpumask_var_t involving the introduction of
this_cpu_cpumask_var_ptr().
This converts most of the uses but not all. Christoph will follow up
with the remaining conversions in this merge window and hopefully
remove the obsolete accessors"
* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
irqchip: Properly fetch the per cpu offset
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
Revert "powerpc: Replace __get_cpu_var uses"
percpu: Remove __this_cpu_ptr
clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
sparc: Replace __get_cpu_var uses
avr32: Replace __get_cpu_var with __this_cpu_write
blackfin: Replace __get_cpu_var uses
tile: Use this_cpu_ptr() for hardware counters
tile: Replace __get_cpu_var uses
powerpc: Replace __get_cpu_var uses
alpha: Replace __get_cpu_var
ia64: Replace __get_cpu_var uses
s390: cio driver &__get_cpu_var replacements
s390: Replace __get_cpu_var uses
mips: Replace __get_cpu_var uses
MIPS: Replace __get_cpu_var uses in FPU emulator.
arm: Replace __this_cpu_ptr with raw_cpu_ptr
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc smaller fixes that missed the v3.17 cycle"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Add arch/x86/purgatory/ make generated files to gitignore
x86: Fix section conflict for numachip
x86: Reject x32 executables if x32 ABI not supported
x86_64, entry: Filter RFLAGS.NT on entry from userspace
x86, boot, kaslr: Fix nuisance warning on 32-bit builds
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 seccomp changes from Ingo Molnar:
"This tree includes x86 seccomp filter speedups and related preparatory
work, which touches core seccomp facilities as well.
The main idea is to split seccomp into two phases, to be able to enter
a simple fast path for syscalls with ptrace side effects.
There's no substantial user-visible (and ABI) effects expected from
this, except a change in how we emit a better audit record for
SECCOMP_RET_TRACE events"
* 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls
x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
x86: Split syscall_trace_enter into two phases
x86, entry: Only call user_exit if TIF_NOHZ
x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
seccomp: Document two-phase seccomp and arch-provided seccomp_data
seccomp: Allow arch code to provide seccomp_data
seccomp: Refactor the filter callback and the API
seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
"This tree includes the following changes:
- fix memory hotplug
- fix hibernation bootup memory layout assumptions
- fix hyperv numa guest kernel messages
- remove dead code
- update documentation"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Update memory map description to list hypervisor-reserved area
x86/mm, hibernate: Do not assume the first e820 area to be RAM
x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data()
x86/mm/hotplug: Modify PGD entry when removing memory
x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable()
x86: Remove set_pmd_pfn
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Ingo Molnar:
"Misc smaller cleanups"
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, microcode, intel: Fix total_size computation
x86, microcode, intel: Rename apply_microcode and declare it static
x86, microcode, intel: Fix typos
x86, microcode, intel: Add missing static declarations
x86, microcode, amd: Fix missing static declaration
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU updates from Ingo Molnar:
"x86 FPU handling fixes, cleanups and enhancements from Oleg.
The signal handling race fix and the __restore_xstate_sig() preemption
fix for eager-mode is marked for -stable as well"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: copy_thread: Don't nullify ->ptrace_bps twice
x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct()
x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()
x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature updates from Ingo Molnar:
"This tree includes the following changes:
- Introduce DISABLED_MASK to list disabled CPU features, to simplify
CPU feature handling and avoid excessive #ifdefs
- Remove the lightly used cpu_has_pae() primitive"
* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Add more disabled features
x86: Introduce disabled-features
x86: Axe the lightly-used cpu_has_pae
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Three small cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tty/serial/8250: Clean up the asm/serial.h include file a bit
x86/tty/serial/8250: Resolve missing-field-initializers warnings
x86: Remove obsolete comment in uapi/e820.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel side updates:
- Fix and enhance poll support (Jiri Olsa)
- Re-enable inheritance optimization (Jiri Olsa)
- Enhance Intel memory events support (Stephane Eranian)
- Refactor the Intel uncore driver to be more maintainable (Zheng
Yan)
- Enhance and fix Intel CPU and uncore PMU drivers (Peter Zijlstra,
Andi Kleen)
- [ plus various smaller fixes/cleanups ]
User visible tooling updates:
- Add +field argument support for --field option, so that one can add
fields to the default list of fields to show, ie now one can just
do:
perf report --fields +pid
And the pid will appear in addition to the default fields (Jiri
Olsa)
- Add +field argument support for --sort option (Jiri Olsa)
- Honour -w in the report tools (report, top), allowing to specify
the widths for the histogram entries columns (Namhyung Kim)
- Properly show submicrosecond times in 'perf kvm stat' (Christian
Borntraeger)
- Add beautifier for mremap flags param in 'trace' (Alex Snast)
- perf script: Allow callchains if any event samples them
- Don't truncate Intel style addresses in 'annotate' (Alex Converse)
- Allow profiling when kptr_restrict == 1 for non root users, kernel
samples will just remain unresolved (Andi Kleen)
- Allow configuring default options for callchains in config file
(Namhyung Kim)
- Support operations for shared futexes. (Davidlohr Bueso)
- "perf kvm stat report" improvements by Alexander Yarygin:
- Save pid string in opts.target.pid
- Enable the target.system_wide flag
- Unify the title bar output
- [ plus lots of other fixes and small improvements. ]
Tooling infrastructure changes:
- Refactor unit and scale function parameters for PMU parsing
routines (Matt Fleming)
- Improve DSO long names lookup with rbtree, resulting in great
speedup for workloads with lots of DSOs (Waiman Long)
- We were not handling POLLHUP notifications for event file
descriptors
Fix it by filtering entries in the events file descriptor array
after poll() returns, refcounting mmaps so that when the last fd
pointing to a perf mmap goes away we do the unmap (Arnaldo Carvalho
de Melo)
- Intel PT prep work, from Adrian Hunter, including:
- Let a user specify a PMU event without any config terms
- Add perf-with-kcore script
- Let default config be defined for a PMU
- Add perf_pmu__scan_file()
- Add a 'perf test' for tracking with sched_switch
- Add 'flush' callback to scripting API
- Use ring buffer consume method to look like other tools (Arnaldo
Carvalho de Melo)
- hists browser (used in top and report) refactorings, getting rid of
unused variables and reducing source code size by handling similar
cases in a fewer functions (Namhyung Kim).
- Replace thread unsafe strerror() with strerror_r() accross the
whole tools/perf/ tree (Masami Hiramatsu)
- Rename ordered_samples to ordered_events and allow setting a queue
size for ordering events (Jiri Olsa)
- [ plus lots of fixes, cleanups and other improvements ]"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (198 commits)
perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment
perf/x86/intel/uncore: Fix minor race in box set up
perf record: Fix error message for --filter option not coming after tracepoint
perf tools: Fix build breakage on arm64 targets
perf symbols: Improve DSO long names lookup speed with rbtree
perf symbols: Encapsulate dsos list head into struct dsos
perf bench futex: Sanitize -q option in requeue
perf bench futex: Support operations for shared futexes
perf trace: Fix mmap return address truncation to 32-bit
perf tools: Refactor unit and scale function parameters
perf tools: Fix line number in the config file error message
perf tools: Convert {record,top}.call-graph option to call-graph.record-mode
perf tools: Introduce perf_callchain_config()
perf callchain: Move some parser functions to callchain.c
perf tools: Move callchain config from record_opts to callchain_param
perf hists browser: Fix callchain print bug on TUI
perf tools: Use ACCESS_ONCE() instead of volatile cast
perf tools: Modify error code for when perf_session__new() fails
perf tools: Fix perf record as non root with kptr_restrict == 1
perf stat: Fix --per-core on multi socket systems
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking updates from Ingo Molnar:
"The main updates in this cycle were:
- mutex MCS refactoring finishing touches: improve comments, refactor
and clean up code, reduce debug data structure footprint, etc.
- qrwlock finishing touches: remove old code, self-test updates.
- small rwsem optimization
- various smaller fixes/cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Revert qrwlock recusive stuff
locking/rwsem: Avoid double checking before try acquiring write lock
locking/rwsem: Move EXPORT_SYMBOL() lines to follow function definition
locking/rwlock, x86: Delete unused asm/rwlock.h and rwlock.S
locking/rwlock, x86: Clean up asm/spinlock*.h to remove old rwlock code
locking/semaphore: Resolve some shadow warnings
locking/selftest: Support queued rwlock
locking/lockdep: Restrict the use of recursive read_lock() with qrwlock
locking/spinlocks: Always evaluate the second argument of spin_lock_nested()
locking/Documentation: Update locking/mutex-design.txt disadvantages
locking/Documentation: Move locking related docs into Documentation/locking/
locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate
locking/mutexes: Refactor optimistic spinning code
locking/mcs: Remove obsolete comment
locking/mutexes: Document quick lock release when unlocking
locking/mutexes: Standardize arguments in lock/unlock slowpaths
locking: Remove deprecated smp_mb__() barriers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull arch atomic cleanups from Ingo Molnar:
"This is a series kept separate from the main locking tree, which
cleans up and improves various details in the atomics type handling:
- Remove the unused atomic_or_long() method
- Consolidate and compress atomic ops implementations between
architectures, to reduce linecount and to make it easier to add new
ops.
- Rewrite generic atomic support to only require cmpxchg() from an
architecture - generate all other methods from that"
* 'locking-arch-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read()
locking, mips: Fix atomics
locking, sparc64: Fix atomics
locking,arch: Rewrite generic atomic support
locking,arch,xtensa: Fold atomic_ops
locking,arch,sparc: Fold atomic_ops
locking,arch,sh: Fold atomic_ops
locking,arch,powerpc: Fold atomic_ops
locking,arch,parisc: Fold atomic_ops
locking,arch,mn10300: Fold atomic_ops
locking,arch,mips: Fold atomic_ops
locking,arch,metag: Fold atomic_ops
locking,arch,m68k: Fold atomic_ops
locking,arch,m32r: Fold atomic_ops
locking,arch,ia64: Fold atomic_ops
locking,arch,hexagon: Fold atomic_ops
locking,arch,cris: Fold atomic_ops
locking,arch,avr32: Fold atomic_ops
locking,arch,arm64: Fold atomic_ops
locking,arch,arm: Fold atomic_ops
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen updates from David Vrabel:
"Features and fixes:
- Add pvscsi frontend and backend drivers.
- Remove _PAGE_IOMAP PTE flag, freeing it for alternate uses.
- Try and keep memory contiguous during PV memory setup (reduces
SWIOTLB usage).
- Allow front/back drivers to use threaded irqs.
- Support large initrds in PV guests.
- Fix PVH guests in preparation for Xen 4.5"
* tag 'stable/for-linus-3.18-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (22 commits)
xen: remove DEFINE_XENBUS_DRIVER() macro
xen/xenbus: Remove BUG_ON() when error string trucated
xen/xenbus: Correct the comments for xenbus_grant_ring()
x86/xen: Set EFER.NX and EFER.SCE in PVH guests
xen: eliminate scalability issues from initrd handling
xen: sync some headers with xen tree
xen: make pvscsi frontend dependant on xenbus frontend
arm{,64}/xen: Remove "EXPERIMENTAL" in the description of the Xen options
xen-scsifront: don't deadlock if the ring becomes full
x86: remove the Xen-specific _PAGE_IOMAP PTE flag
x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings
x86: skip check for spurious faults for non-present faults
xen/efi: Directly include needed headers
xen-scsiback: clean up a type issue in scsiback_make_tpg()
xen-scsifront: use GFP_ATOMIC under spin_lock
MAINTAINERS: Add xen pvscsi maintainer
xen-scsiback: Add Xen PV SCSI backend driver
xen-scsifront: Add Xen PV SCSI frontend driver
xen: Add Xen pvSCSI protocol description
xen/events: support threaded irqs for interdomain event channels
...
|
|
ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented
_PAGE_NUMA using _PROT_NONE. This saved using an additional PTE bit and
relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting
fault scanner. This was found to be conceptually confusing with a lot of
implicit assumptions and it was asked that an alternative be found.
Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the
PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE
bits and shrunk the maximum possible swap size but it did not go far
enough. There are no architectures that reuse _PROT_NONE as _PROT_NUMA
but the relics still exist.
This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary
duplication in powerpc vs the generic implementation by defining the types
the core NUMA helpers expected to exist from x86 with their ppc64
equivalent. This necessitated that a PTE bit mask be created that
identified the bits that distinguish present from NUMA pte entries but it
is expected this will only differ between arches based on _PAGE_PROTNONE.
The naming for the generic helpers was taken from x86 originally but ppc64
has types that are equivalent for the purposes of the helper so they are
mapped instead of duplicating code.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Main changes:
- Fix the deadlock reported by Dave Jones et al
- Clean up and fix nohz_full interaction with arch abilities
- nohz init code consolidation/cleanup"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz: nohz full depends on irq work self IPI support
nohz: Consolidate nohz full init code
arm64: Tell irq work about self IPI support
arm: Tell irq work about self IPI support
x86: Tell irq work about self IPI support
irq_work: Force raised irq work to run on irq work interrupt
irq_work: Introduce arch_irq_work_has_interrupt()
nohz: Move nohz full init call to tick init
|
|
Pull KVM updates from Paolo Bonzini:
"Fixes and features for 3.18.
Apart from the usual cleanups, here is the summary of new features:
- s390 moves closer towards host large page support
- PowerPC has improved support for debugging (both inside the guest
and via gdbstub) and support for e6500 processors
- ARM/ARM64 support read-only memory (which is necessary to put
firmware in emulated NOR flash)
- x86 has the usual emulator fixes and nested virtualization
improvements (including improved Windows support on Intel and
Jailhouse hypervisor support on AMD), adaptive PLE which helps
overcommitting of huge guests. Also included are some patches that
make KVM more friendly to memory hot-unplug, and fixes for rare
caching bugs.
Two patches have trivial mm/ parts that were acked by Rik and Andrew.
Note: I will soon switch to a subkey for signing purposes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits)
kvm: do not handle APIC access page if in-kernel irqchip is not in use
KVM: s390: count vcpu wakeups in stat.halt_wakeup
KVM: s390/facilities: allow TOD-CLOCK steering facility bit
KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
arm/arm64: KVM: Report correct FSC for unsupported fault types
arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
kvm: Fix kvm_get_page_retry_io __gup retval check
arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
kvm: x86: Unpin and remove kvm_arch->apic_access_page
kvm: vmx: Implement set_apic_access_page_addr
kvm: x86: Add request bit to reload APIC access page address
kvm: Add arch specific mmu notifier for page invalidation
kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static
kvm: Fix page ageing bugs
kvm/x86/mmu: Pass gfn and level to rmapp callback.
x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
kvm: x86: use macros to compute bank MSRs
KVM: x86: Remove debug assertion of non-PAE reserved bits
kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
kvm: Faults which trigger IO release the mmap_sem
...
|
|
It is currently possible to execve() an x32 executable on an x86_64
kernel that has only ia32 compat enabled. However all its syscalls
will fail, even _exit(). This usually causes it to segfault.
Change the ELF compat architecture check so that x32 executables are
rejected if we don't support the x32 ABI.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.uk
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux
Pull "tinification" patches from Josh Triplett.
Work on making smaller kernels.
* tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux:
bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_
mm: Support compiling out madvise and fadvise
x86: Support compiling out human-friendly processor feature names
x86: Drop support for /proc files when !CONFIG_PROC_FS
x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK
x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE
x86, boot: Use the usual -y -n mechanism for objects in vmlinux
x86: Add "make tinyconfig" to configure the tiniest possible kernel
x86, platform, kconfig: move kvmconfig functionality to a helper
|
|
Use the much more reader friendly ACCESS_ONCE() instead of the cast to volatile.
This is purely a stylistic change.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1411482607-20948-1-git-send-email-bobby.prani@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"This has:
- EFI revert to fix a boot regression
- early_ioremap() fix for boot failure
- KASLR fix for possible boot failures
- EFI fix for corrupted string printing
- remove a misleading EFI bootup 'failed!' error message
Unfortunately it's all rather close to the merge window"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Truncate 64-bit values when calling 32-bit OutputString()
x86/efi: Delete misleading efi_printk() error message
Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>"
x86/kaslr: Avoid the setup_data area when picking location
x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
* Revert the static library changes from the merge window since they're
causing issues for Macbooks and Fedora + Grub2 (Matt Fleming)
* Delete the misleading "setup_efi_pci() failed!" message which some
people are seeing when booting EFI (Matt Fleming)
* Fix printing strings from the 32-bit EFI boot stub by only passing
32-bit addresses to the firmware (Matt Fleming)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In order to make the APIC access page migratable, stop pinning it in
memory.
And because the APIC access page is not pinned in memory, we can
remove kvm_arch->apic_access_page. When we need to write its
physical address into vmcs, we use gfn_to_page() to get its page
struct, which is needed to call page_to_phys(); the page is then
immediately unpinned.
Suggested-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently, the APIC access page is pinned by KVM for the entire life
of the guest. We want to make it migratable in order to make memory
hot-unplug available for machines that run KVM.
This patch prepares to handle this in generic code, through a new
request bit (that will be set by the MMU notifier) and a new hook
that is called whenever the request bit is processed.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This will be used to let the guest run while the APIC access page is
not pinned. Because subsequent patches will fill in the function
for x86, place the (still empty) x86 implementation in the x86.c file
instead of adding an inline function in kvm_host.h.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
1. We were calling clear_flush_young_notify in unmap_one, but we are
within an mmu notifier invalidate range scope. The spte exists no more
(due to range_start) and the accessed bit info has already been
propagated (due to kvm_pfn_set_accessed). Simply call
clear_flush_young.
2. We clear_flush_young on a primary MMU PMD, but this may be mapped
as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
This required expanding the interface of the clear_flush_young mmu
notifier, so a lot of code has been trivially touched.
3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
the access bit by blowing the spte. This requires proper synchronizing
with MMU notifier consumers, like every other removal of spte's does.
Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA.
In that case, KVM will fail to patch VMCALL instructions to VMMCALL
as required on AMD processors.
The failure mode is currently a divide-by-zero exception, which obviously
is a KVM bug that has to be fixed. However, picking the right instruction
between VMCALL and VMMCALL will be faster and will help if you cannot upgrade
the hypervisor.
Reported-by: Chris Webb <chris@arachsys.com>
Tested-by: Chris Webb <chris@arachsys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
A one-line wrapper around kvm_make_request is not particularly
useful. Replace kvm_mmu_flush_tlb() with kvm_make_request().
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This reverts commit f23cf8bd5c1f ("efi/x86: efistub: Move shared
dependencies to <asm/efi.h>") as well as the x86 parts of commit
f4f75ad5741f ("efi: efistub: Convert into static library").
The road leading to these two reverts is long and winding.
The above two commits were merged during the v3.17 merge window and
turned the common EFI boot stub code into a static library. This
necessitated making some symbols global in the x86 boot stub which
introduced new entries into the early boot GOT.
The problem was that we weren't fixing up the newly created GOT entries
before invoking the EFI boot stub, which sometimes resulted in hangs or
resets. This failure was reported by Maarten on his Macbook pro.
The proposed fix was commit 9cb0e394234d ("x86/efi: Fixup GOT in all
boot code paths"). However, that caused issues for Linus when booting
his Sony Vaio Pro 11. It was subsequently reverted in commit
f3670394c29f.
So that leaves us back with Maarten's Macbook pro not booting.
At this stage in the release cycle the least risky option is to revert
the x86 EFI boot stub to the pre-merge window code structure where we
explicitly #include efi-stub-helper.c instead of linking with the static
library. The arm64 code remains unaffected.
We can take another swing at the x86 parts for v3.18.
Conflicts:
arch/x86/include/asm/efi.h
Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org> [arm64]
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
The _PAGE_IO_MAP PTE flag was only used by Xen PV guests to mark PTEs
that were used to map I/O regions that are 1:1 in the p2m. This
allowed Xen to obtain the correct PFN when converting the MFNs read
from a PTE back to their PFN.
Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn()
returns the correct PFN by using a combination of the m2p and p2m to
determine if an MFN corresponds to a 1:1 mapping in the the p2m.
Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for
future uses of the PTE flag.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
EFI fixes, a build fix, a page table dumping (debug) fix and a clang
build fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/arm64: Fix fdt-related memory reservation
x86/mm: Apply the section attribute to the variable, not its type
x86/efi: Fixup GOT in all boot code paths
x86/efi: Only load initrd above 4g on second try
x86-64, ptdump: Mark espfix area only if existent
x86, irq: Fix build error caused by 9eabc99a635a77cbf09
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fix from Matt Fleming:
* Increase the number of early_ioremap() slots to fix a regression with
earlyprintk=efi after recent changes to the ACPI code (Dave Young)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
it is never used to refer to the page at all.
In vcpu initialization, it indicates two things:
1. indicates if ept page is allocated
2. indicates if a memory slot for identity page is initialized
Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
identity pagetable is initialized. So we can remove ept_identity_pagetable.
NOTE: In the original code, ept identity pagetable page is pinned in memroy.
As a result, it cannot be migrated/hot-removed. After this patch, since
kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page
is no longer pinned in memory. And it can be migrated/hot-removed.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The setup_node_data() function allocates a pg_data_t object,
inserts it into the node_data[] array and initializes the
following fields: node_id, node_start_pfn and
node_spanned_pages.
However, a few function calls later during the kernel boot,
free_area_init_node() re-initializes those fields, possibly with
setup_node_data() is not used.
This causes a small glitch when running Linux as a hyperv numa
guest:
SRAT: PXM 0 -> APIC 0x00 -> Node 0
SRAT: PXM 0 -> APIC 0x01 -> Node 0
SRAT: PXM 1 -> APIC 0x02 -> Node 1
SRAT: PXM 1 -> APIC 0x03 -> Node 1
SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
Initmem setup node 0 [mem 0x00000000-0x7fffffff]
NODE_DATA [mem 0x7ffdc000-0x7ffeffff]
Initmem setup node 1 [mem 0x80800000-0x1081fffff]
NODE_DATA [mem 0x1081ea000-0x1081fdfff]
crashkernel: memory value expected
[ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
[ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
Zone ranges:
DMA [mem 0x00001000-0x00ffffff]
DMA32 [mem 0x01000000-0xffffffff]
Normal [mem 0x100000000-0x1081fffff]
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x00001000-0x0009efff]
node 0: [mem 0x00100000-0x7ffeffff]
node 1: [mem 0x80200000-0xf7ffffff]
node 1: [mem 0x100000000-0x1081fffff]
On node 0 totalpages: 524174
DMA zone: 64 pages used for memmap
DMA zone: 21 pages reserved
DMA zone: 3998 pages, LIFO batch:0
DMA32 zone: 8128 pages used for memmap
DMA32 zone: 520176 pages, LIFO batch:31
On node 1 totalpages: 524288
DMA32 zone: 7672 pages used for memmap
DMA32 zone: 491008 pages, LIFO batch:31
Normal zone: 520 pages used for memmap
Normal zone: 33280 pages, LIFO batch:7
In this dmesg, the SRAT table reports that the memory range for
node 1 starts at 0x80200000. However, the line starting with
"Initmem" reports that node 1 memory range starts at 0x80800000.
The "Initmem" line is reported by setup_node_data() and is
wrong, because the kernel ends up using the range as reported in
the SRAT table.
This commit drops all that dead code from setup_node_data(),
renames it to alloc_node_data() and adds a printk() to
free_area_init_node() so that we report a node's memory range
accurately.
Here's the same dmesg section with this patch applied:
SRAT: PXM 0 -> APIC 0x00 -> Node 0
SRAT: PXM 0 -> APIC 0x01 -> Node 0
SRAT: PXM 1 -> APIC 0x02 -> Node 1
SRAT: PXM 1 -> APIC 0x03 -> Node 1
SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
NODE_DATA(0) allocated [mem 0x7ffdc000-0x7ffeffff]
NODE_DATA(1) allocated [mem 0x1081ea000-0x1081fdfff]
crashkernel: memory value expected
[ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
[ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
Zone ranges:
DMA [mem 0x00001000-0x00ffffff]
DMA32 [mem 0x01000000-0xffffffff]
Normal [mem 0x100000000-0x1081fffff]
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x00001000-0x0009efff]
node 0: [mem 0x00100000-0x7ffeffff]
node 1: [mem 0x80200000-0xf7ffffff]
node 1: [mem 0x100000000-0x1081fffff]
Initmem setup node 0 [mem 0x00001000-0x7ffeffff]
On node 0 totalpages: 524174
DMA zone: 64 pages used for memmap
DMA zone: 21 pages reserved
DMA zone: 3998 pages, LIFO batch:0
DMA32 zone: 8128 pages used for memmap
DMA32 zone: 520176 pages, LIFO batch:31
Initmem setup node 1 [mem 0x80200000-0x1081fffff]
On node 1 totalpages: 524288
DMA32 zone: 7672 pages used for memmap
DMA32 zone: 491008 pages, LIFO batch:31
Normal zone: 520 pages used for memmap
Normal zone: 33280 pages, LIFO batch:7
This commit was tested on a two node bare-metal NUMA machine and
Linux as a numa guest on hyperv and qemu/kvm.
PS: The wrong memory range reported by setup_node_data() seems to be
harmless in the current kernel because it's just not used. However,
that bad range is used in kernel 2.6.32 to initialize the old boot
memory allocator, which causes a crash during boot.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When hot-adding/removing memory, sync_global_pgds() is called
for synchronizing PGD to PGD entries of all processes MM. But
when hot-removing memory, sync_global_pgds() does not work
correctly.
At first, sync_global_pgds() checks whether target PGD is none
or not. And if PGD is none, the PGD is skipped. But when
hot-removing memory, PGD may be none since PGD may be cleared by
free_pud_table(). So when sync_global_pgds() is called after
hot-removing memory, sync_global_pgds() should not skip PGD even
if the PGD is none. And sync_global_pgds() must clear PGD
entries of all processes MM.
Currently sync_global_pgds() does not clear PGD entries of all
processes MM when hot-removing memory. So when hot adding
memory which is same memory range as removed memory after
hot-removing memory, following call traces are shown:
kernel BUG at arch/x86/mm/init_64.c:206!
...
[<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2
[<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380
[<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0
[<ffffffff815d03d9>] add_memory+0xb9/0x1b0
[<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e
[<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0
[<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f
[<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
[<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
[<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5
[<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2
[<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e
[<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15
[<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32
[<ffffffff8107e02b>] process_one_work+0x17b/0x460
[<ffffffff8107edfb>] worker_thread+0x11b/0x400
[<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400
[<ffffffff81085aef>] kthread+0xcf/0xe0
[<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
[<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0
[<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
This patch clears PGD entries of all processes MM when
sync_global_pgds() is called after hot-removing memory
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
3.16 kernel boot fail with earlyprintk=efi, it keeps scrolling at the
bottom line of screen.
Bisected, the first bad commit is below:
commit 86dfc6f339886559d80ee0d4bd20fe5ee90450f0
Author: Lv Zheng <lv.zheng@intel.com>
Date: Fri Apr 4 12:38:57 2014 +0800
ACPICA: Tables: Fix table checksums verification before installation.
I did some debugging by enabling both serial and efi earlyprintk, below is
some debug dmesg, seems early_ioremap fails in scroll up function due to
no free slot, see below dmesg output:
WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:116 __early_ioremap+0x90/0x1c4()
__early_ioremap(ed00c800, 00000c80) not found slot
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.17.0-rc1+ #204
Hardware name: Hewlett-Packard HP Z420 Workstation/1589, BIOS J61 v03.15 05/09/2013
Call Trace:
dump_stack+0x4e/0x7a
warn_slowpath_common+0x75/0x8e
? __early_ioremap+0x90/0x1c4
warn_slowpath_fmt+0x47/0x49
__early_ioremap+0x90/0x1c4
? sprintf+0x46/0x48
early_ioremap+0x13/0x15
early_efi_map+0x24/0x26
early_efi_scroll_up+0x6d/0xc0
early_efi_write+0x1b0/0x214
call_console_drivers.constprop.21+0x73/0x7e
console_unlock+0x151/0x3b2
? vprintk_emit+0x49f/0x532
vprintk_emit+0x521/0x532
? console_unlock+0x383/0x3b2
printk+0x4f/0x51
acpi_os_vprintf+0x2b/0x2d
acpi_os_printf+0x43/0x45
acpi_info+0x5c/0x63
? __acpi_map_table+0x13/0x18
? acpi_os_map_iomem+0x21/0x147
acpi_tb_print_table_header+0x177/0x186
acpi_tb_install_table_with_override+0x4b/0x62
acpi_tb_install_standard_table+0xd9/0x215
? early_ioremap+0x13/0x15
? __acpi_map_table+0x13/0x18
acpi_tb_parse_root_table+0x16e/0x1b4
acpi_initialize_tables+0x57/0x59
acpi_table_init+0x50/0xce
acpi_boot_table_init+0x1e/0x85
setup_arch+0x9b7/0xcc4
start_kernel+0x94/0x42d
? early_idt_handlers+0x120/0x120
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0xf3/0x100
Quote reply from Lv.zheng about the early ioremap slot usage in this case:
"""
In early_efi_scroll_up(), 2 mapping entries will be used for the src/dst screen buffer.
In drivers/acpi/acpica/tbutils.c, we've improved the early table loading code in acpi_tb_parse_root_table().
We now need 2 mapping entries:
1. One mapping entry is used for RSDT table mapping. Each RSDT entry contains an address for another ACPI table.
2. For each entry in RSDP, we need another mapping entry to map the table to perform necessary check/override before installing it.
When acpi_tb_parse_root_table() prints something through EFI earlyprintk console, we'll have 4 mapping entries used.
The current 4 slots setting of early_ioremap() seems to be too small for such a use case.
"""
Thus increase the slot to 8 in this patch to fix this issue.
boot-time mappings become 512 page with this patch.
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org> # v3.16
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
It used to be an ad-hoc hack defined by the x86 version of
<asm/bitops.h> that enabled a couple of library routines to know whether
an integer multiply is faster than repeated shifts and additions.
This just makes it use the real Kconfig system instead, and makes x86
(which was the only architecture that did this) select the option.
NOTE! Even for x86, this really is kind of wrong. If we cared, we would
probably not enable this for builds optimized for netburst (P4), where
shifts-and-adds are generally faster than multiplies. This patch does
*not* change that kind of logic, though, it is purely a syntactic change
with no code changes.
This was triggered by the fact that we have other places that really
want to know "do I want to expand multiples by constants by hand or
not", particularly the hash generation code.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
x86 supports irq work self-IPIs when local apic is available. This is
partly known on runtime so lets implement arch_irq_work_has_interrupt()
accordingly.
This should be safely called after setup_arch().
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.
Lets introduce arch_irq_work_has_interrupt() that archs can override to
tell about their support for this ability.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen bug fixes from David Vrabel:
- fix for PVHVM suspend/resume and migration
- don't pointlessly retry certain ballooning ops
- fix gntalloc when grefs have run out.
- fix PV boot if KSALR is enable or very large modules are used.
* tag 'stable/for-linus-3.17-b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: don't copy bogus duplicate entries into kernel page tables
xen/gntalloc: safely delete grefs in add_grefs() undo path
xen/gntalloc: fix oops after runnning out of grant refs
xen/balloon: cancel ballooning if adding new memory failed
xen/manage: Always freeze/thaw processes when suspend/resuming
|
|
The original motivation for these patches was for an Intel CPU
feature called MPX. The patch to add a disabled feature for it
will go in with the other parts of the support.
But, in the meantime, there are a few other features than MPX
that we can make assumptions about at compile-time based on
compile options. Add them to disabled-features.h and check them
with cpu_feature_enabled().
Note that this gets rid of the last things that needed an #ifdef
CONFIG_X86_64 in cpufeature.h. Yay!
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
I believe the REQUIRED_MASK aproach was taken so that it was
easier to consult in assembly (arch/x86/kernel/verify_cpu.S).
DISABLED_MASK does not have the same restriction, but I
implemented it the same way for consistency.
We have a REQUIRED_MASK... which does two things:
1. Keeps a list of cpuid bits to check in very early boot and
refuse to boot if those are not present.
2. Consulted during cpu_has() checks, which allows us to
optimize out things at compile-time. In other words, if we
*KNOW* we will not boot with the feature off, then we can
safely assume that it will be present forever.
But, we don't have a similar mechanism for CPU features which
may be present but that we know we will not use. We simply
use our existing mechanisms to repeatedly check the status of
the bit at runtime (well, the alternatives patching helps here
but it does not provide compile-time optimization).
Adding a feature to disabled-features.h allows the bit to be
checked via a new macro: cpu_feature_enabled(). Note that
for features in DISABLED_MASK, checks with this macro have
all of the benefits of an #ifdef. Before, we would have done
this in a header:
#ifdef CONFIG_X86_INTEL_MPX
#define cpu_has_mpx cpu_has(X86_FEATURE_MPX)
#else
#define cpu_has_mpx 0
#endif
and this in the code:
if (cpu_has_mpx)
do_some_mpx_thing();
Now, just add your feature to DISABLED_MASK and you can do this
everywhere, and get the same benefits you would have from
#ifdefs:
if (cpu_feature_enabled(X86_FEATURE_MPX))
do_some_mpx_thing();
We need a new function and *not* a modification to cpu_has()
because there are cases where we actually need to check the CPU
itself, despite what features the kernel supports. The best
example of this is a hypervisor which has no control over what
features its guests are using and where the guest does not depend
on the host for support.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140911211513.9E35E931@viggo.jf.intel.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
cpu_has_pae is only referenced in one place: the X86_32 kexec
code (in a file not even built on 64-bit). It hardly warrants
its own macro, or the trouble we go to ensuring that it can't
be called in X86_64 code.
Axe the macro and replace it with a direct cpu feature check.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140911211511.AD76E774@viggo.jf.intel.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
When RANDOMIZE_BASE (KASLR) is enabled; or the sum of all loaded
modules exceeds 512 MiB, then loading modules fails with a warning
(and hence a vmalloc allocation failure) because the PTEs for the
newly-allocated vmalloc address space are not zero.
WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128
vmap_page_range_noflush+0x2a1/0x360()
This is caused by xen_setup_kernel_pagetables() copying
level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present
entries.
Without KASLR, the normal kernel image size only covers the first half
of level2_kernel_pgt and module space starts after that.
L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[ 0..255]->kernel
[256..511]->module
[511]->level2_fixmap_pgt[ 0..505]->module
This allows 512 MiB of of module vmalloc space to be used before
having to use the corrupted level2_fixmap_pgt entries.
With KASLR enabled, the kernel image uses the full PUD range of 1G and
module space starts in the level2_fixmap_pgt. So basically:
L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel
[511]->level2_fixmap_pgt[0..505]->module
And now no module vmalloc space can be used without using the corrupt
level2_fixmap_pgt entries.
Fix this by properly converting the level2_fixmap_pgt entries to MFNs,
and setting level1_fixmap_pgt as read-only.
A number of comments were also using the the wrong L3 offset for
level2_kernel_pgt. These have been corrected.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
|
|
This patch removes the unused asm/rwlock.h and rwlock.S files.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1408037251-45918-3-git-send-email-Waiman.Long@hp.com
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Francesco Fusco <ffusco@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
As the x86 architecture now uses qrwlock for its read/write lock
implementation, it is no longer necessary to keep the old rwlock code
around. This patch removes the old rwlock code in the asm/spinlock.h
and asm/spinlock_types.h files. Now the ARCH_USE_QUEUE_RWLOCK
config parameter cannot be removed from x86/Kconfig or there will be
a compilation error.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/1408037251-45918-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
For slowpath syscalls, we initialize regs->ax to -ENOSYS and stick
the syscall number into regs->orig_ax prior to any possible tracing
and syscall execution. This is user-visible ABI used by ptrace
syscall emulation and seccomp.
For fastpath syscalls, there's no good reason not to do the same
thing. It's even slightly simpler than what we're currently doing.
It probably has no measureable performance impact. It should have
no user-visible effect.
The purpose of this patch is to prepare for two-phase syscall
tracing, in which the first phase might modify the saved RAX without
leaving the fast path. This change is just subtle enough that I'm
keeping it separate.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/01218b493f12ae2f98034b78c9ae085e38e94350.1409954077.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
This splits syscall_trace_enter into syscall_trace_enter_phase1 and
syscall_trace_enter_phase2. Only phase 2 has full pt_regs, and only
phase 2 is permitted to modify any of pt_regs except for orig_ax.
The intent is that phase 1 can be called from the syscall fast path.
In this implementation, phase1 can handle any combination of
TIF_NOHZ (RCU context tracking), TIF_SECCOMP, and TIF_SYSCALL_AUDIT,
unless seccomp requests a ptrace event, in which case phase2 is
forced.
In principle, this could yield a big speedup for TIF_NOHZ as well as
for TIF_SECCOMP if syscall exit work were similarly split up.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2df320a600020fda055fccf2b668145729dd0c04.1409954077.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
- correct spelling
- align fields vertically to make things more readable
- make the layout of magic defines more obvious
Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Link: http://lkml.kernel.org/r/1409972149-26272-1-git-send-email-jeffrey.t.kirsher@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Resolve some missing-field-initializers warnings by using
designated initialization in the expansion of the
SERIAL_PORT_DFNS macro.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Link: http://lkml.kernel.org/r/1409972149-26272-1-git-send-email-jeffrey.t.kirsher@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently, if a permission error happens during the translation of
the final GPA to HPA, walk_addr_generic returns 0 but does not fill
in walker->fault. To avoid this, add an x86_exception* argument
to the translate_gpa function, and let it fill in walker->fault.
The nested_page_fault field will be true, since the walk_mmu is the
nested_mmu and translate_gpu instead operates on the "outer" (NPT)
instance.
Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|