summaryrefslogtreecommitdiffstats
path: root/arch/x86
AgeCommit message (Collapse)Author
2012-09-05KVM: x86: constify read_write_emulator_opsMathias Krause
We never change those, make them r/o. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-05KVM: x86 emulator: constify emulate_opsMathias Krause
We never change emulate_ops[] at runtime so it should be r/o. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-05KVM: x86 emulator: mark opcode tables constMathias Krause
The opcode tables never change at runtime, therefor mark them const. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-05KVM: x86 emulator: use aligned variants of SSE register opsMathias Krause
As the the compiler ensures that the memory operand is always aligned to a 16 byte memory location, use the aligned variant of MOVDQ for read_sse_reg() and write_sse_reg(). Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-05KVM: x86: minor size optimizationMathias Krause
Some fields can be constified and/or made static to reduce code and data size. Numbers for a 32 bit build: text data bss dec hex filename before: 3351 80 0 3431 d67 cpuid.o after: 3391 0 0 3391 d3f cpuid.o Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-05x86/kconfig: Remove outdated reference to Intel CPUs in CONFIG_SWIOTLBJoe Millenbach
Deleted the no longer valid example of which x86 CPUs lack a hardware IOMMU, and moved the "If unsure..." statement to a new line to follow the style of surrounding options. Signed-off-by: Joe Millenbach <jmillenbach@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: team-fjord@googlegroups.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1346632700-29113-1-git-send-email-jmillenbach@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86/iommu: Use NULL instead of plain 0 for __IOMMU_INITMathias Krause
IOMMU_INIT_POST and IOMMU_INIT_POST_FINISH pass the plain value 0 instead of NULL to __IOMMU_INIT. Fix this and make sparse happy by doing so. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1346621506-30857-8-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86/iommu: Drop duplicate const in __IOMMU_INITMathias Krause
It's redundant and makes sparse complain about it. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1346621506-30857-7-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86/fpu/xsave: Keep __user annotation in castsMathias Krause
Don't remove the __user annotation of the fpstate pointer, but drop the superfluous void * cast instead. This fixes the following sparse warnings: xsave.c:135:15: warning: cast removes address space of expression xsave.c:135:15: warning: incorrect type in argument 1 (different address spaces) xsave.c:135:15: expected void const volatile [noderef] <asn:1>*<noident> [...] Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1346621506-30857-6-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86/pci/probe_roms: Add missing __iomem annotation to pci_map_biosrom()Mathias Krause
Stay in sync with the declaration and fix the corresponding sparse warnings. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Dan Williams <dan.j.williams@intel.com> Link: http://lkml.kernel.org/r/1346621506-30857-5-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86/signals: ia32_signal.c: add __user casts to fix sparse warningsMathias Krause
Fix the following sparse warnings by adding appropriate __user casts and annotations: ia32_signal.c:165:38: warning: incorrect type in argument 1 (different address spaces) ia32_signal.c:165:38: expected struct sigaltstack const [noderef] [usertype] <asn:1>*<noident> ia32_signal.c:165:38: got struct sigaltstack * [...] Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1346621506-30857-4-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86/vdso: Add __user annotation to VDSO32_SYMBOLMathias Krause
The address calculated by VDSO32_SYMBOL() is a pointer into userland. Add the __user annotation to fix related sparse warnings in its users. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Andy Lutomirski <luto@MIT.EDU> Link: http://lkml.kernel.org/r/1346621506-30857-3-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86: Fix __user annotations in asm/sys_ia32.hMathias Krause
Fix the following sparse warnings: sys_ia32.c:293:38: warning: incorrect type in argument 2 (different address spaces) sys_ia32.c:293:38: expected unsigned int [noderef] [usertype] <asn:1>*stat_addr sys_ia32.c:293:38: got unsigned int *stat_addr Ironically, sys_ia32.h was introduced to fix sparse warnings but missed that one. Signed-off-by: Mathias Krause <minipli@googlemail.com> Link: http://lkml.kernel.org/r/1346621506-30857-2-git-send-email-minipli@googlemail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86/Kconfig: Turn off DEBUG_NX_TEST module in defconfigsJosh Triplett
The x86 defconfigs include exactly one module: test_nx.ko, a special-purpose module which just exists to do evil things like executing code off the stack to see if the kernel has enabled NX support. Anyone who actually uses that module can easily enable it themselves, but the vast majority of kernel builds don't need it; disable it by default. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: http://lkml.kernel.org/r/e72faf875e1172fb1cbec5e6d3cd4122df508a97.1346649518.git.josh@joshtriplett.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86/Kconfig: Turn off CONFIG_BLK_DEV_RAMJosh Triplett
The vast majority of systems either use initramfs or mount a root filesystem directly from the kernel. Distros have defaulted to initramfs for years. Only highly specialized systems would use an actual filesystem-image initrd at this point, and such systems don't rely on defconfig anyway. Drop initrd support (and specifically RAM block device support) from the defconfigs. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2521e983a63595cd7a331236d929577660f89c72.1346649518.git.josh@joshtriplett.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86/Kconfig: Disable CONFIG_CRC_T10DIF in defconfigsJosh Triplett
CONFIG_CRC_T10DIF explicitly states that it exists only for use by out-of-tree modules; anything in-kernel that needs it selects it. Thus, compile it out by default. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/3aaff7a0af1320427952d411a21b8ded29747a1f.1346649518.git.josh@joshtriplett.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86/Kconfig: Switch to ext4 in defconfigsJosh Triplett
The current x86 and x86-64 defconfigs do not enable ext4, which most current distributions default to. Switch the defconfigs to ext4, so they will boot on current systems without additional configuration. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/bd8a359506b7e1287c680823de16d67608ec52fe.1346649518.git.josh@joshtriplett.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-05x86/Kconfig: Update defconfigs to current results of "make savedefconfig"Josh Triplett
The x86 defconfigs have become somewhat out of date compared to the current result of "make savedefconfig". Update them to the current output, as a prelude to further defconfig changes, to avoid unrelated noise in those further changes. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/80c8a5fbeaf6cdb72fb78a016013427efee52668.1346649518.git.josh@joshtriplett.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04perf/x86: Enable Intel Cedarview Atom suppportStephane Eranian
This patch enables perf_events support for Intel Cedarview Atom (model 54) processors. Support includes PEBS and LBR. Tested on my Atom N2600 netbook. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120820092421.GA11284@quad Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04KVM: PIC: fix use of uninitialised variable.Jamie Iles
Commit aea218f3cbbc (KVM: PIC: call ack notifiers for irqs that are dropped form irr) used an uninitialised variable to track whether an appropriate apic had been found. This could result in calling the ack notifier incorrectly. Cc: Gleb Natapov <gleb@redhat.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-04KVM: cleanup pic resetGleb Natapov
kvm_pic_reset() is not used anywhere. Move reset logic from pic_ioport_write() there. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-01hpet: Remove unused PCI Vendor ID #defineJon Mason
HPET_ID_VENDOR_8086 is defined but never used. It would be a redefine of PCI_VENDOR_ID_INTEL if it was ever used. Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-08-30KVM: x86: remove unused variable from kvm_task_switch()Marcelo Tosatti
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-28Merge branch 'perf/urgent' into perf/coreIngo Molnar
Pick up the latest fixes because upcoming uprobes changes will rely on it. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-27KVM: VMX: Ignore segment G and D bits when considering whether we can virtualizeAvi Kivity
We will enter the guest with G and D cleared; as real hardware ignores D in real mode, and G is taken care of by the limit test, we allow more code to run in vm86 mode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: VMX: Save all segment data in real modeAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: VMX: Preserve segment limit and access rights in real modeAvi Kivity
While this is undocumented, real processors do not reload the segment limit and access rights when loading a segment register in real mode. Real programs rely on it so we need to comply with this behaviour. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: VMX: Return real real-mode segment data even if ↵Avi Kivity
emulate_invalid_guest_state=1 emulate_invalid_guest_state=1 doesn't mean we don't munge the segments in the vmcs; we do. So we need to return the real ones (maintained by vmx_set_segment). Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: x86 emulator: Fix #GP error code during linearizationAvi Kivity
We want the segment selector, nor segment number. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: x86 emulator: Check segment limits in real mode tooAvi Kivity
Segment limits are verified in real mode, not just protected mode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: x86 emulator: Leave segment limit and attributs alone in real modeAvi Kivity
When loading a segment in real mode, only the base and selector must be modified. The limit needs to be left alone, otherwise big real mode users will hit a #GP due to limit checking (currently this is suppressed because we don't check limits in real mode). Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: VMX: Allow vm86 virtualization of big real modeAvi Kivity
Usually, big real mode uses large (4GB) segments. Currently we don't virtualize this; if any segment has a limit other than 0xffff, we emulate. But if we set the vmx-visible limit to 0xffff, we can use vm86 to virtualize real mode; if an access overruns the segment limit, the guest will #GP, which we will trap and forward to the emulator. This results in significantly faster execution, and less risk of hitting an unemulated instruction. If the limit is less than 0xffff, we retain the existing behaviour. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: VMX: Allow real mode emulation using vm86 with dpl=0Avi Kivity
Real mode is always entered from protected mode with dpl=0. Since the dpl doesn't affect execution, and we already override it to 3 in the vmcs (as vmx requires), we can allow execution in that state. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: VMX: Retain limit and attributes when entering protected modeAvi Kivity
Real processors don't change segment limits and attributes while in real mode. Mimic that behaviour. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: VMX: Use kvm_segment to save protected-mode segments when entering realmodeAvi Kivity
Instead of using struct kvm_save_segment, use struct kvm_segment, which is what the other APIs use. This leads to some simplification. We replace save_rmode_seg() with a call to vmx_save_segment(). Since this depends on rmode.vm86_active, we move the call to before setting the flag. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: VMX: Fix incorrect lookup of segment S flag in fix_pmode_dataseg()Avi Kivity
fix_pmode_dataseg() looks up S in ->base instead of ->ar_bytes. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: VMX: Separate saving pre-realmode state from setting segmentsAvi Kivity
Commit b246dd5df139 ("KVM: VMX: Fix KVM_SET_SREGS with big real mode segments") moved fix_rmode_seg() to vmx_set_segment(), so that it is applied not just on transitions to real mode, but also on KVM_SET_SREGS (migration). However fix_rmode_seg() not only munges the vmcs segments, it also sets up the save area for us to restore when returning to protected mode or to return in vmx_get_segment(). Move saving the segment into a new function, save_rmode_seg(), and call it just during the transition. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: x86 emulator: access GPRs on demandAvi Kivity
Instead of populating the entire register file, read in registers as they are accessed, and write back only the modified ones. This saves a VMREAD and VMWRITE on Intel (for rsp, since it is not usually used during emulation), and a two 128-byte copies for the registers. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27KVM: x86: fix KVM_GET_MSR for PV EOIMichael S. Tsirkin
KVM_GET_MSR was missing support for PV EOI, which is needed for migration. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27perf/x86: Fix microcode revision check for SNB-PEBSStephane Eranian
The following patch makes the microcode update code path actually invoke the perf_check_microcode() function and thus potentially renabling SNB PEBS. By default, CONFIG_MICROCODE_OLD_INTERFACE is forced to Y in arch/x86/Kconfig. There is no way to disable this. That means that the code path used in arch/x86/kernel/microcode_core.c did not include the call to perf_check_microcode(). Thus, even though the microcode was updated to a version that fixes the SNB PEBS problem, perf_event would still return EOPNOTSUPP when enabling precise sampling. This patch simply adds a call to perf_check_microcode() in the call path used when OLD_INTERFACE=y. Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: peterz@infradead.org Cc: andi@firstfloor.org Link: http://lkml.kernel.org/r/20120824133434.GA8014@quad Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-26Merge remote-tracking branch 'upstream/master' into queueMarcelo Tosatti
Merging critical fixes from upstream required for development. * upstream/master: (809 commits) libata: Add a space to " 2GB ATA Flash Disk" DMA blacklist entry Revert "powerpc: Update g5_defconfig" powerpc/perf: Use pmc_overflow() to detect rolled back events powerpc: Fix VMX in interrupt check in POWER7 copy loops powerpc: POWER7 copy_to_user/copy_from_user patch applied twice powerpc: Fix personality handling in ppc64_personality() powerpc/dma-iommu: Fix IOMMU window check powerpc: Remove unnecessary ifdefs powerpc/kgdb: Restore current_thread_info properly powerpc/kgdb: Bail out of KGDB when we've been triggered powerpc/kgdb: Do not set kgdb_single_step on ppc powerpc/mpic_msgr: Add missing includes powerpc: Fix null pointer deref in perf hardware breakpoints powerpc: Fixup whitespace in xmon powerpc: Fix xmon dl command for new printk implementation xfs: check for possible overflow in xfs_ioc_trim xfs: unlock the AGI buffer when looping in xfs_dialloc xfs: fix uninitialised variable in xfs_rtbuf_get() powerpc/fsl: fix "Failed to mount /dev: No such device" errors powerpc/fsl: update defconfigs ... Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-25Merge tag 'stable/for-linus-3.6-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull three xen bug-fixes from Konrad Rzeszutek Wilk: - Revert the kexec fix which caused on non-kexec shutdowns a race. - Reuse existing P2M leafs - instead of requiring to allocate a large area of bootup virtual address estate. - Fix a one-off error when adding PFNs for balloon pages. * tag 'stable/for-linus-3.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M. xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID. Revert "xen PVonHVM: move shared_info to MMIO before kexec"
2012-08-25Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Marcelo Tosatti. * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86 emulator: use stack size attribute to mask rsp in stack ops KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended ppc: e500_tlb memset clears nothing KVM: PPC: Add cache flush on page map KVM: PPC: Book3S HV: Fix incorrect branch in H_CEDE code KVM: x86: update KVM_SAVE_MSRS_BEGIN to correct value
2012-08-23xen/mmu: If the revector fails, don't attempt to revector anything else.Konrad Rzeszutek Wilk
If the P2M revectoring would fail, we would try to continue on by cleaning the PMD for L1 (PTE) page-tables. The xen_cleanhighmap is greedy and erases the PMD on both boundaries. Since the P2M array can share the PMD, we would wipe out part of the __ka that is still used in the P2M tree to point to P2M leafs. This fixes it by bypassing the revectoring and continuing on. If the revector fails, a nice WARN is printed so we can still troubleshoot this. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/p2m: When revectoring deal with holes in the P2M array.Konrad Rzeszutek Wilk
When we free the PFNs and then subsequently populate them back during bootup: Freeing 20000-20200 pfn range: 512 pages freed 1-1 mapping on 20000->20200 Freeing 40000-40200 pfn range: 512 pages freed 1-1 mapping on 40000->40200 Freeing bad80-badf4 pfn range: 116 pages freed 1-1 mapping on bad80->badf4 Freeing badf6-bae7f pfn range: 137 pages freed 1-1 mapping on badf6->bae7f Freeing bb000-100000 pfn range: 282624 pages freed 1-1 mapping on bb000->100000 Released 283999 pages of unused memory Set 283999 page(s) to 1-1 mapping Populating 1acb8a-1f20e9 pfn range: 283999 pages added We end up having the P2M array (that is the one that was grafted on the P2M tree) filled with IDENTITY_FRAME or INVALID_P2M_ENTRY) entries. The patch titled "xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID." recycles said slots and replaces the P2M tree leaf's with &mfn_list[xx] with p2m_identity or p2m_missing. And re-uses the P2M array sections for other P2M tree leaf's. For the above mentioned bootup excerpt, the PFNs at 0x20000->0x20200 are going to be IDENTITY based: P2M[0][256][0] -> P2M[0][257][0] get turned in IDENTITY_FRAME. We can re-use that and replace P2M[0][256] to point to p2m_identity. The "old" page (the grafted P2M array provided by Xen) that was at P2M[0][256] gets put somewhere else. Specifically at P2M[6][358], b/c when we populate back: Populating 1acb8a-1f20e9 pfn range: 283999 pages added we fill P2M[6][358][0] (and P2M[6][358], P2M[6][359], ...) with the new MFNs. That is all OK, except when we revector we assume that the PFN count would be the same in the grafted P2M array and in the newly allocated. Since that is no longer the case, as we have holes in the P2M that point to p2m_missing or p2m_identity we have to take that into account. [v2: Check for overflow] [v3: Move within the __va check] [v4: Fix the computation] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: Release just the MFN list, not MFN list and part of pagetables.Konrad Rzeszutek Wilk
We call memblock_reserve for [start of mfn list] -> [PMD aligned end of mfn list] instead of <start of mfn list> -> <page aligned end of mfn list]. This has the disastrous effect that if at bootup the end of mfn_list is not PMD aligned we end up returning to memblock parts of the region past the mfn_list array. And those parts are the PTE tables with the disastrous effect of seeing this at bootup: Write protecting the kernel read-only data: 10240k Freeing unused kernel memory: 1860k freed Freeing unused kernel memory: 200k freed (XEN) mm.c:2429:d0 Bad type (saw 1400000000000002 != exp 7000000000000000) for mfn 116a80 (pfn 14e26) ... (XEN) mm.c:908:d0 Error getting mfn 116a83 (pfn 14e2a) from L1 entry 8000000116a83067 for l1e_owner=0, pg_owner=0 (XEN) mm.c:908:d0 Error getting mfn 4040 (pfn 5555555555555555) from L1 entry 0000000004040601 for l1e_owner=0, pg_owner=0 .. and so on. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: Remove from __ka space PMD entries for pagetables.Konrad Rzeszutek Wilk
Please first read the description in "xen/mmu: Copy and revector the P2M tree." At this stage, the __ka address space (which is what the old P2M tree was using) is partially disassembled. The cleanup_highmap has removed the PMD entries from 0-16MB and anything past _brk_end up to the max_pfn_mapped (which is the end of the ramdisk). The xen_remove_p2m_tree and code around has ripped out the __ka for the old P2M array. Here we continue on doing it to where the Xen page-tables were. It is safe to do it, as the page-tables are addressed using __va. For good measure we delete anything that is within MODULES_VADDR and up to the end of the PMD. At this point the __ka only contains PMD entries for the start of the kernel up to __brk. [v1: Per Stefano's suggestion wrapped the MODULES_VADDR in debug] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: Copy and revector the P2M tree.Konrad Rzeszutek Wilk
Please first read the description in "xen/p2m: Add logic to revector a P2M tree to use __va leafs" patch. The 'xen_revector_p2m_tree()' function allocates a new P2M tree copies the contents of the old one in it, and returns the new one. At this stage, the __ka address space (which is what the old P2M tree was using) is partially disassembled. The cleanup_highmap has removed the PMD entries from 0-16MB and anything past _brk_end up to the max_pfn_mapped (which is the end of the ramdisk). We have revectored the P2M tree (and the one for save/restore as well) to use new shiny __va address to new MFNs. The xen_start_info has been taken care of already in 'xen_setup_kernel_pagetable()' and xen_start_info->shared_info in 'xen_setup_shared_info()', so we are free to roam and delete PMD entries - which is exactly what we are going to do. We rip out the __ka for the old P2M array. [v1: Fix smatch warnings] [v2: memset was doing 0 instead of 0xff] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/p2m: Add logic to revector a P2M tree to use __va leafs.Konrad Rzeszutek Wilk
During bootup Xen supplies us with a P2M array. It sticks it right after the ramdisk, as can be seen with a 128GB PV guest: (certain parts removed for clarity): xc_dom_build_image: called xc_dom_alloc_segment: kernel : 0xffffffff81000000 -> 0xffffffff81e43000 (pfn 0x1000 + 0xe43 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000 xc_dom_alloc_segment: ramdisk : 0xffffffff81e43000 -> 0xffffffff925c7000 (pfn 0x1e43 + 0x10784 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000 xc_dom_alloc_segment: phys2mach : 0xffffffff925c7000 -> 0xffffffffa25c7000 (pfn 0x125c7 + 0x10000 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000 xc_dom_alloc_page : start info : 0xffffffffa25c7000 (pfn 0x225c7) xc_dom_alloc_page : xenstore : 0xffffffffa25c8000 (pfn 0x225c8) xc_dom_alloc_page : console : 0xffffffffa25c9000 (pfn 0x225c9) nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s) nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s) nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s) nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s) xc_dom_alloc_segment: page tables : 0xffffffffa25ca000 -> 0xffffffffa26e1000 (pfn 0x225ca + 0x117 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000 xc_dom_alloc_page : boot stack : 0xffffffffa26e1000 (pfn 0x226e1) xc_dom_build_image : virt_alloc_end : 0xffffffffa26e2000 xc_dom_build_image : virt_pgtab_end : 0xffffffffa2800000 So the physical memory and virtual (using __START_KERNEL_map addresses) layout looks as so: phys __ka /------------\ /-------------------\ | 0 | empty | 0xffffffff80000000| | .. | | .. | | 16MB | <= kernel starts | 0xffffffff81000000| | .. | | | | 30MB | <= kernel ends => | 0xffffffff81e43000| | .. | & ramdisk starts | .. | | 293MB | <= ramdisk ends=> | 0xffffffff925c7000| | .. | & P2M starts | .. | | .. | | .. | | 549MB | <= P2M ends => | 0xffffffffa25c7000| | .. | start_info | 0xffffffffa25c7000| | .. | xenstore | 0xffffffffa25c8000| | .. | cosole | 0xffffffffa25c9000| | 549MB | <= page tables => | 0xffffffffa25ca000| | .. | | | | 550MB | <= PGT end => | 0xffffffffa26e1000| | .. | boot stack | | \------------/ \-------------------/ As can be seen, the ramdisk, P2M and pagetables are taking a bit of __ka addresses space. Which is a problem since the MODULES_VADDR starts at 0xffffffffa0000000 - and P2M sits right in there! This results during bootup with the inability to load modules, with this error: ------------[ cut here ]------------ WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370() Call Trace: [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0 [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20 [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370 [<ffffffff81130c4d>] map_vm_area+0x2d/0x50 [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250 [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80 [<ffffffff810c6186>] ? load_module+0x66/0x19c0 [<ffffffff8105cadc>] module_alloc+0x5c/0x60 [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80 [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80 [<ffffffff810c70c3>] load_module+0xfa3/0x19c0 [<ffffffff812491f6>] ? security_file_permission+0x86/0x90 [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220 [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b ---[ end trace fd8f7704fdea0291 ]--- vmalloc: allocation failure, allocated 16384 of 20480 bytes modprobe: page allocation failure: order:0, mode:0xd2 Since the __va and __ka are 1:1 up to MODULES_VADDR and cleanup_highmap rids __ka of the ramdisk mapping, what we want to do is similar - get rid of the P2M in the __ka address space. There are two ways of fixing this: 1) All P2M lookups instead of using the __ka address would use the __va address. This means we can safely erase from __ka space the PMD pointers that point to the PFNs for P2M array and be OK. 2). Allocate a new array, copy the existing P2M into it, revector the P2M tree to use that, and return the old P2M to the memory allocate. This has the advantage that it sets the stage for using XEN_ELF_NOTE_INIT_P2M feature. That feature allows us to set the exact virtual address space we want for the P2M - and allows us to boot as initial domain on large machines. So we pick option 2). This patch only lays the groundwork in the P2M code. The patch that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree." Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: Recycle the Xen provided L4, L3, and L2 pagesKonrad Rzeszutek Wilk
As we are not using them. We end up only using the L1 pagetables and grafting those to our page-tables. [v1: Per Stefano's suggestion squashed two commits] [v2: Per Stefano's suggestion simplified loop] [v3: Fix smatch warnings] [v4: Add more comments] Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>