summaryrefslogtreecommitdiffstats
path: root/arch/x86
AgeCommit message (Collapse)Author
2012-10-20perf/x86: Disable uncore on virtualized CPUsYan, Zheng
Initializing uncore PMU on virtualized CPU may hang the kernel. This is because kvm does not emulate the entire hardware. Thers are lots of uncore related MSRs, making kvm enumerate them all is a non-trival task. So just disable uncore on virtualized CPU. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Tested-by: Pekka Enberg <penberg@kernel.org> Cc: a.p.zijlstra@chello.nl Cc: eranian@google.com Cc: andi@firstfloor.org Cc: avi@redhat.com Link: http://lkml.kernel.org/r/1345540117-14164-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-19Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Assorted small fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf python: Properly link with libtraceevent perf hists browser: Add back callchain folding symbol perf tools: Fix build on sparc. perf python: Link with libtraceevent perf python: Initialize 'page_size' variable tools lib traceevent: Fix missed freeing of subargs in free_arg() in filter lib tools traceevent: Add back pevent assignment in __pevent_parse_format() perf hists browser: Fix off-by-two bug on the first column perf tools: Remove warnings on JIT samples for srcline sort key perf tools: Fix segfault when using srcline sort key perf: Require exclude_guest to use PEBS - kernel side enforcement perf tool: Precise mode requires exclude_guest
2012-10-20Merge tag 'perf-urgent-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: * The python binding needs to link with libtraceevent and to initialize the 'page_size' variable so that mmaping works again. * The callchain folding character that appears on the TUI just before the overhead had disappeared due to recent changes, add it back. * Intel PEBS in VT-x context uses the DS address as a guest linear address, even though its programmed by the host as a host linear address. This either results in guest memory corruption and or the hardware faulting and 'crashing' the virtual machine. Therefore we have to disable PEBS on VT-x enter and re-enable on VT-x exit, enforcing a strict exclude_guest. Kernel side enforcement fix by Peter Zijlstra, tooling side fix by David Ahern. * Fix build on sparc due to UAPI, fix from David Miller. * Fixes for the srclike sort key for unresolved symbols and when processing samples in JITted code, where we don't have an ELF file, just an special symbol table, fixes from Namhyung Kim. * Fix some leaks in libtraceevent, from Steven Rostedt. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-19Merge commit 'v3.7-rc1' into stable/for-linus-3.7Konrad Rzeszutek Wilk
* commit 'v3.7-rc1': (10892 commits) Linux 3.7-rc1 x86, boot: Explicitly include autoconf.h for hostprogs perf: Fix UAPI fallout ARM: config: make sure that platforms are ordered by option string ARM: config: sort select statements alphanumerically UAPI: (Scripted) Disintegrate include/linux/byteorder UAPI: (Scripted) Disintegrate include/linux UAPI: Unexport linux/blk_types.h UAPI: Unexport part of linux/ppp-comp.h perf: Handle new rbtree implementation procfs: don't need a PATH_MAX allocation to hold a string representation of an int vfs: embed struct filename inside of names_cache allocation if possible audit: make audit_inode take struct filename vfs: make path_openat take a struct filename pointer vfs: turn do_path_lookup into wrapper around struct filename variant audit: allow audit code to satisfy getname requests from its names_list vfs: define struct filename and have getname() return it btrfs: Fix compilation with user namespace support enabled userns: Fix posix_acl_file_xattr_userns gid conversion userns: Properly print bluetooth socket uids ...
2012-10-19xen/x86: don't corrupt %eip when returning from a signal handlerDavid Vrabel
In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS (-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event /and/ the process has a pending signal then %eip (and %eax) are corrupted when returning to the main process after handling the signal. The application may then crash with SIGSEGV or a SIGILL or it may have subtly incorrect behaviour (depending on what instruction it returned to). The occurs because handle_signal() is incorrectly thinking that there is a system call that needs to restarted so it adjusts %eip and %eax to re-execute the system call instruction (even though user space had not done a system call). If %eax == -514 (-ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK (-516) then handle_signal() only corrupted %eax (by setting it to -EINTR). This may cause the application to crash or have incorrect behaviour. handle_signal() assumes that regs->orig_ax >= 0 means a system call so any kernel entry point that is not for a system call must push a negative value for orig_ax. For example, for physical interrupts on bare metal the inverse of the vector is pushed and page_fault() sets regs->orig_ax to -1, overwriting the hardware provided error code. xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax instead of -1. Classic Xen kernels pushed %eax which works as %eax cannot be both non-negative and -RESTARTSYS (etc.), but using -1 is consistent with other non-system call entry points and avoids some of the tests in handle_signal(). There were similar bugs in xen_failsafe_callback() of both 32 and 64-bit guests. If the fault was corrected and the normal return path was used then 0 was incorrectly pushed as the value for orig_ax. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Jan Beulich <JBeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-19xen: grant: use xen_pfn_t type for frame_list.Ian Campbell
This correctly sizes it as 64 bit on ARM but leaves it as unsigned long on x86 (therefore no intended change on x86). The long and ulong guest handles are now unused (and a bit dangerous) so remove them. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-19xen: sysfs: fix build warning.Ian Campbell
Define PRI macros for xen_ulong_t and xen_pfn_t and use to fix: drivers/xen/sys-hypervisor.c:288:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'xen_ulong_t' [-Wformat] Ideally this would use PRIx64 on ARM but these (or equivalent) don't seem to be available in the kernel. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-19xen/x86: remove duplicated include from enlighten.cWei Yongjun
Remove duplicated include. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) CC: stable@vger.kernel.org Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-19Merge commit '5bc66170dc486556a1e36fd384463536573f4b82' into x86/urgentH. Peter Anvin
From Borislav Petkov <bp@amd64.org>: Below is a RAS fix which reverts the addition of a sysfs attribute which we agreed is not needed, post-factum. And this should go in now because that sysfs attribute is going to end up in 3.7 otherwise and thus exposed to userspace; removing it then would be a lot harder. This is done as a merge rather than a simple patch/cherry-pick since the baseline for this patch was not in the previous x86/urgent. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-19x86, MCE: Remove bios_cmci_threshold sysfs attributeBorislav Petkov
450cc201038f3 ("x86/mce: Provide boot argument to honour bios-set CMCI threshold") added the bios_cmci_threshold sysfs attribute which was supposed to communicate to userspace tools that BIOS CMCI threshold has been honoured. However, this info is not of any importance to userspace - it should rather get the actual error count it has been thresholded already from MCi_STATUS[38:52]. So drop this before it becomes a used interface (good thing we caught this early in 3.7-rc1, right after the merge window closed). Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20121017105940.GA14590@x1.osrc.amd.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-10-18crypto: aesni - fix XTS mode on x86-32, add wrapper function for asmlinkage ↵Jussi Kivilinna
aesni_enc() Calling convention for internal functions and 'asmlinkage' functions is different on x86-32. Therefore do not directly cast aesni_enc as XTS tweak function, but use wrapper function in between. Fixes crash with "XTS + aesni_intel + x86-32" combination. Cc: stable@vger.kernel.org Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-18KVM: VMX: report internal error for MMIO #PF due to delivery eventXiao Guangrong
The #PF with PFEC.RSV = 1 indicates that the guest is accessing MMIO, we can not fix it if it is caused by delivery event. Reporting internal error for this case Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-18KVM: VMX: report internal error for the unhandleable eventXiao Guangrong
VM exits during Event Delivery is really unexpected if it is not caused by Exceptions/EPT-VIOLATION/TASK_SWITCH, we'd better to report an internal and freeze the guest, the VMM has the chance to check the guest Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-18KVM: do not de-cache cr4 bits needlesslyGleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17x86, amd, mce: Avoid NULL pointer reference on CPU northbridge lookupDaniel J Blueman
When booting on a federated multi-server system (NumaScale), the processor Northbridge lookup returns NULL; add guards to prevent this causing an oops. On those systems, the northbridge is accessed through MMIO and the "normal" northbridge enumeration in amd_nb.c doesn't work since we're generating the northbridge ID from the initial APIC ID and the last is not unique on those systems. Long story short, we end up without northbridge descriptors. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Cc: stable@vger.kernel.org # 3.6 Link: http://lkml.kernel.org/r/1349073725-14093-1-git-send-email-daniel@numascale-asia.com [ Boris: beef up commit message ] Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-17x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct ↵Jacob Shin
mapping. On systems with very large memory (1 TB in our case), BIOS may report a reserved region or a hole in the E820 map, even above the 4 GB range. Exclude these from the direct mapping. [ hpa: this should be done not just for > 4 GB but for everything above the legacy region (1 MB), at the very least. That, however, turns out to require significant restructuring. That work is well underway, but is not suitable for rc/stable. ] Cc: stable@kernel.org # > 2.6.32 Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-10-17KVM: MMU: introduce FNAME(prefetch_gpte)Xiao Guangrong
The only difference between FNAME(update_pte) and FNAME(pte_prefetch) is that the former is allowed to prefetch gfn from dirty logged slot, so introduce a common function to prefetch spte Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17KVM: MMU: move prefetch_invalid_gpte out of pagaing_tmp.hXiao Guangrong
The function does not depend on guest mmu mode, move it out from paging_tmpl.h Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17KVM: MMU: cleanup FNAME(page_fault)Xiao Guangrong
Let it return emulate state instead of spte like __direct_map Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17KVM: MMU: remove mmu_is_invalidXiao Guangrong
Remove mmu_is_invalid and use is_invalid_pfn instead Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-16perf: Require exclude_guest to use PEBS - kernel side enforcementPeter Zijlstra
Intel PEBS in VT-x context uses the DS address as a guest linear address, even though its programmed by the host as a host linear address. This either results in guest memory corruption and or the hardware faulting and 'crashing' the virtual machine. Therefore we have to disable PEBS on VT-x enter and re-enable on VT-x exit, enforcing a strict exclude_guest. This patch enforces exclude_guest kernel side. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Avi Kivity <avi@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Robert Richter <robert.richter@amd.com> Link: http://lkml.kernel.org/r/1347569955-54626-3-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-10-15kbuild: Fix accidental revert in commit fe04ddfMichal Marek
Commit fe04ddf7c291 ("kbuild: Do not package /boot and /lib in make tar-pkg") accidentally reverted two previous kbuild commits. I don't know what I was thinking. This brings back changes made by commits 24cc7fb69a5b ("x86/kbuild: archscripts depends on scripts_basic") and c1c1a59e37da ("firmware: fix directory creation rule matching with make 3.80") Reported-by: Jan Beulich <JBeulich@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-15crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instructionTim Chen
This patch adds the crc_pcl function that calculates CRC32C checksum using the PCLMULQDQ instruction on processors that support this feature. This will provide speedup over using CRC32 instruction only. The usage of PCLMULQDQ necessitate the invocation of kernel_fpu_begin and kernel_fpu_end and incur some overhead. So the new crc_pcl function is only invoked for buffer size of 512 bytes or more. Larger sized buffers will expect to see greater speedup. This feature is best used coupled with eager_fpu which reduces the kernel_fpu_begin/end overhead. For buffer size of 1K the speedup is around 1.6x and for buffer size greater than 4K, the speedup is around 3x compared to original implementation in crc32c-intel module. Test was performed on Sandy Bridge based platform with constant frequency set for cpu. A white paper detailing the algorithm can be found here: http://download.intel.com/design/intarch/papers/323405.pdf Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-10-15crypto: crc32c - Rename crc32c-intel.c to crc32c-intel_glue.cTim Chen
This patch renames the crc32c-intel.c file to crc32c-intel_glue.c file in preparation for linking with the new crc32c-pcl-intel-asm.S file, which contains optimized crc32c calculation based on PCLMULQDQ instruction. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-10-15oprofile, x86: Fix wrapping bug in op_x86_get_ctrl()Dan Carpenter
The "event" variable is a u16 so the shift will always wrap to zero making the line a no-op. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: <stable@vger.kernel.org> v2.6.32.. Signed-off-by: Robert Richter <robert.richter@amd.com>
2012-10-14Merge branch 'modules-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module signing support from Rusty Russell: "module signing is the highlight, but it's an all-over David Howells frenzy..." Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG. * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits) X.509: Fix indefinite length element skip error handling X.509: Convert some printk calls to pr_devel asymmetric keys: fix printk format warning MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking MODSIGN: Make mrproper should remove generated files. MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs MODSIGN: Use the same digest for the autogen key sig as for the module sig MODSIGN: Sign modules during the build process MODSIGN: Provide a script for generating a key ID from an X.509 cert MODSIGN: Implement module signature checking MODSIGN: Provide module signing public keys to the kernel MODSIGN: Automatically generate module signing keys if missing MODSIGN: Provide Kconfig options MODSIGN: Provide gitignore and make clean rules for extra files MODSIGN: Add FIPS policy module: signature checking hook X.509: Add a crypto key parser for binary (DER) X.509 certificates MPILIB: Provide a function to read raw data into an MPI X.509: Add an ASN.1 decoder X.509: Add simple ASN.1 grammar compiler ...
2012-10-14x86, boot: Explicitly include autoconf.h for hostprogsMatt Fleming
The hostprogs need access to the CONFIG_* symbols found in include/generated/autoconf.h. But commit abbf1590de22 ("UAPI: Partition the header include path sets and add uapi/ header directories") replaced $(LINUXINCLUDE) with $(USERINCLUDE) which doesn't contain the necessary include paths. This has the undesirable effect of breaking the EFI boot stub because the #ifdef CONFIG_EFI_STUB code in arch/x86/boot/tools/build.c is never compiled. It should also be noted that because $(USERINCLUDE) isn't exported by the top-level Makefile it's actually empty in arch/x86/boot/Makefile. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-13Merge tag 'for_linus-3.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb Pull KGDB/KDB fixes and cleanups from Jason Wessel: "Cleanups - Clean up compile warnings in kgdboc.c and x86/kernel/kgdb.c - Add module event hooks for simplified debugging with gdb Fixes - Fix kdb to stop paging with 'q' on bta and dmesg - Fix for data that scrolls off the vga console due to line wrapping when using the kdb pager New - The debug core registers for kernel module events which allows a kernel aware gdb to automatically load symbols and break on entry to a kernel module - Allow kgdboc=kdb to setup kdb on the vga console" * tag 'for_linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb: tty/console: fix warnings in drivers/tty/serial/kgdboc.c kdb,vt_console: Fix missed data due to pager overruns kdb: Fix dmesg/bta scroll to quit with 'q' kgdboc: Accept either kbd or kdb to activate the vga + keyboard kdb shell kgdb,x86: fix warning about unused variable mips,kgdb: fix recursive page fault with CONFIG_KPROBES kgdb: Add module event hooks
2012-10-13Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This tree includes some late late perf items that missed the first round: tools: - Bash auto completion improvements, now we can auto complete the tools long options, tracepoint event names, etc, from Namhyung Kim. - Look up thread using tid instead of pid in 'perf sched'. - Move global variables into a perf_kvm struct, from David Ahern. - Hists refactorings, preparatory for improved 'diff' command, from Jiri Olsa. - Hists refactorings, preparatory for event group viewieng work, from Namhyung Kim. - Remove double negation on optional feature macro definitions, from Namhyung Kim. - Remove several cases of needless global variables, on most builtins. - misc fixes kernel: - sysfs support for IBS on AMD CPUs, from Robert Richter. - Support for an upcoming Intel CPU, the Xeon-Phi / Knights Corner HPC blade PMU, from Vince Weaver. - misc fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) perf: Fix perf_cgroup_switch for sw-events perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu perf/AMD/IBS: Add sysfs support perf hists: Add more helpers for hist entry stat perf hists: Move he->stat.nr_events initialization to a template perf hists: Introduce struct he_stat perf diff: Removing the total_period argument from output code perf tool: Add hpp interface to enable/disable hpp column perf tools: Removing hists pair argument from output path perf hists: Separate overhead and baseline columns perf diff: Refactor diff displacement possition info perf hists: Add struct hists pointer to struct hist_entry perf tools: Complete tracepoint event names perf/x86: Add support for Intel Xeon-Phi Knights Corner PMU perf evlist: Remove some unused methods perf evlist: Introduce add_newtp method perf kvm: Move global variables into a perf_kvm struct perf tools: Convert to BACKTRACE_SUPPORT perf tools: Long option completion support for each subcommands perf tools: Complete long option names of perf command ...
2012-10-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull third pile of kernel_execve() patches from Al Viro: "The last bits of infrastructure for kernel_thread() et.al., with alpha/arm/x86 use of those. Plus sanitizing the asm glue and do_notify_resume() on alpha, fixing the "disabled irq while running task_work stuff" breakage there. At that point the rest of kernel_thread/kernel_execve/sys_execve work can be done independently for different architectures. The only pending bits that do depend on having all architectures converted are restrictred to fs/* and kernel/* - that'll obviously have to wait for the next cycle. I thought we'd have to wait for all of them done before we start eliminating the longjump-style insanity in kernel_execve(), but it turned out there's a very simple way to do that without flagday-style changes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to saner kernel_execve() semantics arm: switch to saner kernel_execve() semantics x86, um: convert to saner kernel_execve() semantics infrastructure for saner ret_from_kernel_thread semantics make sure that kernel_thread() callbacks call do_exit() themselves make sure that we always have a return path from kernel_execve() ppc: eeh_event should just use kthread_run() don't bother with kernel_thread/kernel_execve for launching linuxrc alpha: get rid of switch_stack argument of do_work_pending() alpha: don't bother passing switch_stack separately from regs alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c alpha: simplify TIF_NEED_RESCHED handling
2012-10-12x86, um: convert to saner kernel_execve() semanticsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12Merge tag 'stable/for-linus-3.7-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen fixes from Konrad Rzeszutek Wilk: "This has four bug-fixes and one tiny feature that I forgot to put initially in my tree due to oversight. The feature is for kdump kernels to speed up the /proc/vmcore reading. There is a ram_is_pfn helper function that the different platforms can register for. We are now doing that. The bug-fixes cover some embarrassing struct pv_cpu_ops variables being set to NULL on Xen (but not baremetal). We had a similar issue in the past with {write|read}_msr_safe and this fills the three missing ones. The other bug-fix is to make the console output (hvc) be capable of dealing with misbehaving backends and not fall flat on its face. Lastly, a quirk for older XenBus implementations that came with an ancient v3.4 hypervisor (so RHEL5 based) - reading of certain non-existent attributes just hangs the guest during bootup - so we take precaution of not doing that on such older installations. Feature: - Register a pfn_is_ram helper to speed up reading of /proc/vmcore. Bug-fixes: - Three pvops call for Xen were undefined causing BUG_ONs. - Add a quirk so that the shutdown watches (used by kdump) are not used with older Xen (3.4). - Fix ungraceful state transition for the HVC console." * tag 'stable/for-linus-3.7-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pv-on-hvm kexec: add quirk for Xen 3.4 and shutdown watches. xen/bootup: allow {read|write}_cr8 pvops call. xen/bootup: allow read_tscp call for Xen PV guests. xen pv-on-hvm: add pfn_is_ram helper for kdump xen/hvc: handle backend CLOSED without CLOSING
2012-10-12Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core update from Thomas Gleixner: - Bug fixes (one for a longstanding dead loop issue) - Rework of time related vsyscalls - Alarm timer updates - Jiffies updates to remove compile time dependencies * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Cast raw_interval to u64 to avoid shift overflow timers: Fix endless looping between cascade() and internal_add_timer() time/jiffies: bring back unconditional LATCH definition time: Convert x86_64 to using new update_vsyscall time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems time: Introduce new GENERIC_TIME_VSYSCALL time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD time: Move update_vsyscall definitions to timekeeper_internal.h time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes jiffies: Remove compile time assumptions about CLOCK_TICK_RATE jiffies: Kill unused TICK_USEC_TO_NSEC alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeue alarmtimer: Remove unused helpers & defines alarmtimer: Use hrtimer per-alarm instead of per-base alarmtimer: Implement minimum alarm interval for allowing suspend
2012-10-12xen/bootup: allow {read|write}_cr8 pvops call.Konrad Rzeszutek Wilk
We actually do not do anything about it. Just return a default value of zero and if the kernel tries to write anything but 0 we BUG_ON. This fixes the case when an user tries to suspend the machine and it blows up in save_processor_state b/c 'read_cr8' is set to NULL and we get: kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100! invalid opcode: 0000 [#1] SMP Pid: 2687, comm: init.late Tainted: G O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs RIP: e030:[<ffffffff814d5f42>] [<ffffffff814d5f42>] save_processor_state+0x212/0x270 .. snip.. Call Trace: [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150 [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5 CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-12xen/bootup: allow read_tscp call for Xen PV guests.Konrad Rzeszutek Wilk
The hypervisor will trap it. However without this patch, we would crash as the .read_tscp is set to NULL. This patch fixes it and sets it to the native_read_tscp call. CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-12kgdb,x86: fix warning about unused variableJason Wessel
When compiling without CONFIG_DEBUG_RODATA the following compiler warning is generated: arch/x86/kernel/kgdb.c: In function 'kgdb_arch_set_breakpoint': arch/x86/kernel/kgdb.c:749: warning: unused variable 'opc' The variable instantiation needs to be inside the #ifdef to make the warning go away. Reported-by: Thiago Rafael Becker <trbecker@trbecker.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2012-10-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull pile 2 of execve and kernel_thread unification work from Al Viro: "Stuff in there: kernel_thread/kernel_execve/sys_execve conversions for several more architectures plus assorted signal fixes and cleanups. There'll be more (in particular, real fixes for the alpha do_notify_resume() irq mess)..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (43 commits) alpha: don't open-code trace_report_syscall_{enter,exit} Uninclude linux/freezer.h m32r: trim masks avr32: trim masks tile: don't bother with SIGTRAP in setup_frame microblaze: don't bother with SIGTRAP in setup_rt_frame() mn10300: don't bother with SIGTRAP in setup_frame() frv: no need to raise SIGTRAP in setup_frame() x86: get rid of duplicate code in case of CONFIG_VM86 unicore32: remove pointless test h8300: trim _TIF_WORK_MASK parisc: decide whether to go to slow path (tracesys) based on thread flags parisc: don't bother looping in do_signal() parisc: fix double restarts bury the rest of TIF_IRET sanitize tsk_is_polling() bury _TIF_RESTORE_SIGMASK unicore32: unobfuscate _TIF_WORK_MASK mips: NOTIFY_RESUME is not needed in TIF masks mips: merge the identical "return from syscall" per-ABI code ... Conflicts: arch/arm/include/asm/thread_info.h
2012-10-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull generic execve() changes from Al Viro: "This introduces the generic kernel_thread() and kernel_execve() functions, and switches x86, arm, alpha, um and s390 over to them." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits) s390: convert to generic kernel_execve() s390: switch to generic kernel_thread() s390: fold kernel_thread_helper() into ret_from_fork() s390: fold execve_tail() into start_thread(), convert to generic sys_execve() um: switch to generic kernel_thread() x86, um/x86: switch to generic sys_execve and kernel_execve x86: split ret_from_fork alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve() alpha: switch to generic kernel_thread() alpha: switch to generic sys_execve() arm: get rid of execve wrapper, switch to generic execve() implementation arm: optimized current_pt_regs() arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve() arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk] generic sys_execve() generic kernel_execve() new helper: current_pt_regs() preparation for generic kernel_thread() um: kill thread->forking um: let signal_delivered() do SIGTRAP on singlestepping into handler ...
2012-10-10Merge branch 'for-linus-37rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML changes from Richard Weinberger: "UML receives this time only cleanups. The most outstanding change is the 'include "foo.h"' do 'include <foo.h>' conversion done by Al Viro. It touches many files, that's why the diffstat is rather big." * 'for-linus-37rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: typo in UserModeLinux-HOWTO hppfs: fix the return value of get_inode() hostfs: drop vmtruncate um: get rid of pointless include "..." where include <...> will do um: move sysrq.h out of include/shared um/x86: merge 32 and 64 bit variants of ptrace.h um/x86: merge 32 and 64bit variants of checksum.h
2012-10-09um: get rid of pointless include "..." where include <...> will doAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2012-10-09um: move sysrq.h out of include/sharedAl Viro
never used by userland-side objects Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2012-10-09um/x86: merge 32 and 64 bit variants of ptrace.hAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2012-10-09um/x86: merge 32 and 64bit variants of checksum.hAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2012-10-09Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge patches from Andrew Morton: "A few misc things and very nearly all of the MM tree. A tremendous amount of stuff (again), including a significant rbtree library rework." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (160 commits) sparc64: Support transparent huge pages. mm: thp: Use more portable PMD clearing sequenece in zap_huge_pmd(). mm: Add and use update_mmu_cache_pmd() in transparent huge page code. sparc64: Document PGD and PMD layout. sparc64: Eliminate PTE table memory wastage. sparc64: Halve the size of PTE tables sparc64: Only support 4MB huge pages and 8KB base pages. memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning mm: memcg: clean up mm_match_cgroup() signature mm: document PageHuge somewhat mm: use %pK for /proc/vmallocinfo mm, thp: fix mlock statistics mm, thp: fix mapped pages avoiding unevictable list on mlock memory-hotplug: update memory block's state and notify userspace memory-hotplug: preparation to notify memory block's state at memory hot remove mm: avoid section mismatch warning for memblock_type_name make GFP_NOTRACK definition unconditional cma: decrease cc.nr_migratepages after reclaiming pagelist CMA: migrate mlocked pages kpageflags: fix wrong KPF_THP on non-huge compound pages ...
2012-10-09mm: Add and use update_mmu_cache_pmd() in transparent huge page code.David Miller
The transparent huge page code passes a PMD pointer in as the third argument of update_mmu_cache(), which expects a PTE pointer. This never got noticed because X86 implements update_mmu_cache() as a macro and thus we don't get any type checking, and X86 is the only architecture which supports transparent huge pages currently. Before other architectures can support transparent huge pages properly we need to add a new interface which will take a PMD pointer as the third argument rather than a PTE pointer. [akpm@linux-foundation.org: implement update_mm_cache_pmd() for s390] Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THPAndrea Arcangeli
In many places !pmd_present has been converted to pmd_none. For pmds that's equivalent and pmd_none is quicker so using pmd_none is better. However (unless we delete pmd_present) we should provide an accurate pmd_present too. This will avoid the risk of code thinking the pmd is non present because it's under __split_huge_page_map, see the pmd_mknotpresent there and the comment above it. If the page has been mprotected as PROT_NONE, it would also lead to a pmd_present false negative in the same way as the race with split_huge_page. Because the PSE bit stays on at all times (both during split_huge_page and when the _PAGE_PROTNONE bit get set), we could only check for the PSE bit, but checking the PROTNONE bit too is still good to remember pmd_present must always keep PROT_NONE into account. This explains a not reproducible BUG_ON that was seldom reported on the lists. The same issue is in pmd_large, it would go wrong with both PROT_NONE and if it races with split_huge_page. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09readahead: fault retry breaks mmap file read random detectionShaohua Li
.fault now can retry. The retry can break state machine of .fault. In filemap_fault, if page is miss, ra->mmap_miss is increased. In the second try, since the page is in page cache now, ra->mmap_miss is decreased. And these are done in one fault, so we can't detect random mmap file access. Add a new flag to indicate .fault is tried once. In the second try, skip ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. I only tested x86, didn't test other archs, but looks the change for other archs is obvious, but who knows :) Signed-off-by: Shaohua Li <shaohua.li@fusionio.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09atomic: implement generic atomic_dec_if_positive()Shaohua Li
The x86 implementation of atomic_dec_if_positive is quite generic, so make it available to all architectures. This is needed for "swap: add a simple detector for inappropriate swapin readahead". [akpm@linux-foundation.org: do the "#define foo foo" trick in the conventional manner] Signed-off-by: Shaohua Li <shli@fusionio.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09rbtree: move augmented rbtree functionality to rbtree_augmented.hMichel Lespinasse
Provide rb_insert_augmented() and rb_erase_augmented() through a new rbtree_augmented.h include file. rb_erase_augmented() is defined there as an __always_inline function, in order to allow inlining of augmented rbtree callbacks into it. Since this generates a relatively large function, each augmented rbtree user should make sure to have a single call site. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09mm: replace vma prio_tree with an interval treeMichel Lespinasse
Implement an interval tree as a replacement for the VMA prio_tree. The algorithms are similar to lib/interval_tree.c; however that code can't be directly reused as the interval endpoints are not explicitly stored in the VMA. So instead, the common algorithm is moved into a template and the details (node type, how to get interval endpoints from the node, etc) are filled in using the C preprocessor. Once the interval tree functions are available, using them as a replacement to the VMA prio tree is a relatively simple, mechanical job. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>