summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/cpu
AgeCommit message (Collapse)Author
2012-01-26CPU: Introduce ARCH_HAS_CPU_AUTOPROBE and X86 partsThomas Renninger
This patch is based on Andi Kleen's work: Implement autoprobing/loading of modules serving CPU specific features (x86cpu autoloading). And Kay Siever's work to get rid of sysdev cpu structures and making use of struct device instead. Before, the cpuid driver had to be loaded to get the x86cpu autoloading feature. With this patch autoloading works through the /sys/devices/system/cpu object Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Dave Jones <davej@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Huang Ying <ying.huang@intel.com> Cc: Len Brown <lenb@kernel.org> Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-26X86: Introduce HW-Pstate scattered cpuid featureThomas Renninger
It is rather similar to CPB (boot capability) feature and exists since fam10h (can be looked up in AMD's BKDG). The feature is needed for powernow-k8 to cleanup init functions and to provide proper autoloading matching with the new x86cpu modalias feature. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Dave Jones <davej@redhat.com> Cc: Borislav Petkov <bp@amd64.org> Signed-off-by: Thomas Renninger <trenn@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-26Add driver auto probing for x86 features v4Andi Kleen
There's a growing number of drivers that support a specific x86 feature or CPU. Currently loading these drivers currently on a generic distribution requires various driver specific hacks and it often doesn't work. This patch adds auto probing for drivers based on the x86 cpuid information, in particular based on vendor/family/model number and also based on CPUID feature bits. For example a common issue is not loading the SSE 4.2 accelerated CRC module: this can significantly lower the performance of BTRFS which relies on fast CRC. Another issue is loading the right CPUFREQ driver for the current CPU. Currently distributions often try all all possible driver until one sticks, which is not really a good way to do this. It works with existing udev without any changes. The code exports the x86 information as a generic string in sysfs that can be matched by udev's pattern matching. This scheme does not support numeric ranges, so if you want to handle e.g. ranges of model numbers they have to be encoded in ASCII or simply all models or families listed. Fixing that would require changing udev. Another issue is that udev will happily load all drivers that match, there is currently no nice way to stop a specific driver from being loaded if it's not needed (e.g. if you don't need fast CRC) But there are not that many cpu specific drivers around and they're all not that bloated, so this isn't a particularly serious issue. Originally this patch added the modalias to the normal cpu sysdevs. However sysdevs don't have all the infrastructure needed for udev, so it couldn't really autoload drivers. This patch instead adds the CPU modaliases to the cpuid devices, which are real devices with full support for udev. This implies that the cpuid driver has to be loaded to use this. This patch just adds infrastructure, some driver conversions in followups. Thanks to Kay for helping with some sysfs magic. v2: Constifcation, some updates v4: (trenn@suse.de): - Use kzalloc instead of kmalloc to terminate modalias buffer - Use uppercase hex values to match correctly against hex values containing letters Cc: Dave Jones <davej@redhat.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Jen Axboe <axboe@kernel.dk> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Huang Ying <ying.huang@intel.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-16mce: fix warning messages about static struct mce_deviceGreg Kroah-Hartman
When suspending, there was a large list of warnings going something like: Device 'machinecheck1' does not have a release() function, it is broken and must be fixed This patch turns the static mce_devices into dynamically allocated, and properly frees them when they are removed from the system. It solves the warning messages on my laptop here. Reported-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Djalal Harouni <tixxdz@opendz.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@amd64.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-15Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) perf tools: Fix compile error on x86_64 Ubuntu perf report: Fix --stdio output alignment when --showcpuutilization used perf annotate: Get rid of field_sep check perf annotate: Fix usage string perf kmem: Fix a memory leak perf kmem: Add missing closedir() calls perf top: Add error message for EMFILE perf test: Change type of '-v' option to INCR perf script: Add missing closedir() calls tracing: Fix compile error when static ftrace is enabled recordmcount: Fix handling of elf64 big-endian objects. perf tools: Add const.h to MANIFEST to make perf-tar-src-pkg work again perf tools: Add support for guest/host-only profiling perf kvm: Do guest-only counting by default perf top: Don't update total_period on process_sample perf hists: Stop using 'self' for struct hist_entry perf hists: Rename total_session to total_period x86: Add counter when debug stack is used with interrupts enabled x86: Allow NMIs to hit breakpoints in i386 x86: Keep current stack in NMI breakpoints ...
2012-01-13x86/mce: Fix CPU hotplug and suspend regression related to MCESrivatsa S. Bhat
Commit 8a25a2fd126c ("cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem") changed how things are dealt with in the MCE subsystem. Some of the things that got broken due to this are CPU hotplug and suspend/hibernate. MCE uses per_cpu allocations of struct device. So, when a CPU goes offline and comes back online, in order to ensure that we start from a clean slate with respect to the MCE subsystem, zero out the entire per_cpu device structure to 0 before using it. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-08Merge branch 'tip/perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
2012-01-07Merge branch 'driver-core-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits) arm: fix up some samsung merge sysdev conversion problems firmware: Fix an oops on reading fw_priv->fw in sysfs loading file Drivers:hv: Fix a bug in vmbus_driver_unregister() driver core: remove __must_check from device_create_file debugfs: add missing #ifdef HAS_IOMEM arm: time.h: remove device.h #include driver-core: remove sysdev.h usage. clockevents: remove sysdev.h arm: convert sysdev_class to a regular subsystem arm: leds: convert sysdev_class to a regular subsystem kobject: remove kset_find_obj_hinted() m86k: gpio - convert sysdev_class to a regular subsystem mips: txx9_sram - convert sysdev_class to a regular subsystem mips: 7segled - convert sysdev_class to a regular subsystem sh: dma - convert sysdev_class to a regular subsystem sh: intc - convert sysdev_class to a regular subsystem power: suspend - convert sysdev_class to a regular subsystem power: qe_ic - convert sysdev_class to a regular subsystem power: cmm - convert sysdev_class to a regular subsystem s390: time - convert sysdev_class to a regular subsystem ... Fix up conflicts with 'struct sysdev' removal from various platform drivers that got changed: - arch/arm/mach-exynos/cpu.c - arch/arm/mach-exynos/irq-eint.c - arch/arm/mach-s3c64xx/common.c - arch/arm/mach-s3c64xx/cpu.c - arch/arm/mach-s5p64x0/cpu.c - arch/arm/mach-s5pv210/common.c - arch/arm/plat-samsung/include/plat/cpu.h - arch/powerpc/kernel/sysfs.c and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2012-01-07Merge branch 'tip/x86/core-3' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
2012-01-06Merge branch 'x86-mce-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: add IRQ context simulation in module mce-inject x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog x86, MCE: Drain mcelog buffer x86, mce: Add wrappers for registering on the decode chain
2012-01-06Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, centaur: Enable cx8 for VIA Eden too
2012-01-06Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Use "do { } while(0)" for empty lock_cmos()/unlock_cmos() macros x86: Use "do { } while(0)" for empty flush_tlb_fix_spurious_fault() macro x86, CPU: Drop superfluous get_cpu_cap() prototype arch/x86/mm/pageattr.c: Quiet sparse noise; local functions should be static arch/x86/kernel/ptrace.c: Quiet sparse noise x86: Use kmemdup() in copy_thread(), rather than duplicating its implementation x86: Replace the EVT_TO_HPET_DEV() macro with an inline function
2012-01-06Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86: Fix atomic64_xxx_cx8() functions x86: Fix and improve cmpxchg_double{,_local}() x86_64, asm: Optimise fls(), ffs() and fls64() x86, bitops: Move fls64.h inside __KERNEL__ x86: Fix and improve percpu_cmpxchg{8,16}b_double() x86: Report cpb and eff_freq_ro flags correctly x86/i386: Use less assembly in strlen(), speed things up a bit x86: Use the same node_distance for 32 and 64-bit x86: Fix rflags in FAKE_STACK_FRAME x86: Clean up and extend do_int3() x86: Call do_notify_resume() with interrupts enabled x86/div64: Add a micro-optimization shortcut if base is power of two x86-64: Cleanup some assembly entry points x86-64: Slightly shorten line system call entry and exit paths x86-64: Reduce amount of redundant code generated for invalidate_interruptNN x86-64: Slightly shorten int_ret_from_sys_call x86, efi: Convert efi_phys_get_time() args to physical addresses x86: Default to vsyscall=emulate x86-64: Set siginfo and context on vsyscall emulation faults x86: consolidate xchg and xadd macros ...
2012-01-06Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Skip cpus with apic-ids >= 255 in !x2apic_mode x86, x2apic: Allow "nox2apic" to disable x2apic mode setup by BIOS x86, x2apic: Fallback to xapic when BIOS doesn't setup interrupt-remapping x86, acpi: Skip acpi x2apic entries if the x2apic feature is not present x86, apic: Add probe() for apic_flat x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86' x86: Convert per-cpu counter icr_read_retry_count into a member of irq_stat x86: Add per-cpu stat counter for APIC ICR read tries pci, x86/io-apic: Allow PCI_IOAPIC to be user configurable on x86 x86: Fix the !CONFIG_NUMA build of the new CPU ID fixup code support x86: Add NumaChip support x86: Add x86_init platform override to fix up NUMA core numbering x86: Make flat_init_apic_ldr() available
2012-01-06Merge branch 'driver-core-next' into Linux 3.2Greg Kroah-Hartman
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file, and it fixes the build error in the arch/x86/kernel/microcode_core.c file, that the merge did not catch. The microcode_core.c patch was provided by Stephen Rothwell <sfr@canb.auug.org.au> who was invaluable in the merge issues involved with the large sysdev removal process in the driver-core tree. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (106 commits) perf kvm: Fix copy & paste error in description perf script: Kill script_spec__delete perf top: Fix a memory leak perf stat: Introduce get_ratio_color() helper perf session: Remove impossible condition check perf tools: Fix feature-bits rework fallout, remove unused variable perf script: Add generic perl handler to process events perf tools: Use for_each_set_bit() to iterate over feature flags perf tools: Unify handling of features when writing feature section perf report: Accept fifos as input file perf tools: Moving code in some files perf tools: Fix out-of-bound access to struct perf_session perf tools: Continue processing header on unknown features perf tools: Improve macros for struct feature_ops perf: builtin-record: Document and check that mmap_pages must be a power of two. perf: builtin-record: Provide advice if mmap'ing fails with EPERM. perf tools: Fix truncated annotation perf script: look up thread using tid instead of pid perf tools: Look up thread names for system wide profiling perf tools: Fix comm for processes with named threads ...
2012-01-06Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) cpu: Export cpu_up() rcu: Apply ACCESS_ONCE() to rcu_boost() return value Revert "rcu: Permit rt_mutex_unlock() with irqs disabled" docs: Additional LWN links to RCU API rcu: Augment rcu_batch_end tracing for idle and callback state rcu: Add rcutorture tests for srcu_read_lock_raw() rcu: Make rcutorture test for hotpluggability before offlining CPUs driver-core/cpu: Expose hotpluggability to the rest of the kernel rcu: Remove redundant rcu_cpu_stall_suppress declaration rcu: Adaptive dyntick-idle preparation rcu: Keep invoking callbacks if CPU otherwise idle rcu: Irq nesting is always 0 on rcu_enter_idle_common rcu: Don't check irq nesting from rcu idle entry/exit rcu: Permit dyntick-idle with callbacks pending rcu: Document same-context read-side constraints rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass rcu: Remove dynticks false positives and RCU failures rcu: Reduce latency of rcu_prepare_for_idle() rcu: Eliminate RCU_FAST_NO_HZ grace-period hang rcu: Avoid needlessly IPIing CPUs at GP end ...
2012-01-04Merge commit 'v3.2-rc7' into x86/asmIngo Molnar
Merge reason: Update from -rc4 to -rc7. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-23perf/x86: Fix raw_spin_unlock_irqrestore() usageRobert Richter
Use raw_spin_unlock_irqrestore() as equivalent to raw_spin_lock_irqsave(). Signed-off-by: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1324646665-13334-1-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystemKay Sievers
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Userspace relies on events and generic sysfs subsystem infrastructure from sysdev devices, which are made available with this conversion. Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@amd64.org> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Len Brown <lenb@kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21x86: Add counter when debug stack is used with interrupts enabledSteven Rostedt
Mathieu Desnoyers pointed out a case that can cause issues with NMIs running on the debug stack: int3 -> interrupt -> NMI -> int3 Because the interrupt changes the stack, the NMI will not see that it preempted the debug stack. Looking deeper at this case, interrupts only happen when the int3 is from userspace or in an a location in the exception table (fixup). userspace -> int3 -> interurpt -> NMI -> int3 All other int3s that happen in the kernel should be processed without ever enabling interrupts, as the do_trap() call will panic the kernel if it is called to process any other location within the kernel. Adding a counter around the sections that enable interrupts while using the debug stack allows the NMI to also check that case. If the NMI sees that it either interrupted a task using the debug stack or the debug counter is non-zero, then it will have to change the IDT table to make the int3 not change stacks (which will corrupt the stack if it does). Note, I had to move the debug_usage functions out of processor.h and into debugreg.h because of the static inlined functions to inc and dec the debug_usage counter. __get_cpu_var() requires smp.h which includes processor.h, and would fail to build. Link: http://lkml.kernel.org/r/1323976535.23971.112.camel@gandalf.stny.rr.com Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul Turner <pjt@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-21x86: Keep current stack in NMI breakpointsSteven Rostedt
We want to allow NMI handlers to have breakpoints to be able to remove stop_machine from ftrace, kprobes and jump_labels. But if an NMI interrupts a current breakpoint, and then it triggers a breakpoint itself, it will switch to the breakpoint stack and corrupt the data on it for the breakpoint processing that it interrupted. Instead, have the NMI check if it interrupted breakpoint processing by checking if the stack that is currently used is a breakpoint stack. If it is, then load a special IDT that changes the IST for the debug exception to keep the same stack in kernel context. When the NMI is done, it puts it back. This way, if the NMI does trigger a breakpoint, it will keep using the same stack and not stomp on the breakpoint data for the breakpoint it interrupted. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-21perf events: Add Intel x86 mapping for PERF_COUNT_HW_REF_CPU_CYCLESStephane Eranian
Add event maps for Intel x86 processors (with architected PMU v2 or later). On AMD, there is frequency scaling but no Turbo. There is no core cycle event not subject to frequency scaling, therefore we do not provide a mapping. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1323559734-3488-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21perf events: Enable raw event support for Intel unhalted_reference_cycles eventStephane Eranian
This patch adds the encoding and definitions necessary for the unhalted_reference_cycles event avaialble since Intel Core 2 processors. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1323559734-3488-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86'Kevin Winchester
Several fields in struct cpuinfo_x86 were not defined for the !SMP case, likely to save space. However, those fields still have some meaning for UP, and keeping them allows some #ifdef removal from other files. The additional size of the UP kernel from this change is not significant enough to worry about keeping up the distinction: text data bss dec hex filename 4737168 506459 972040 6215667 5ed7f3 vmlinux.o.before 4737444 506459 972040 6215943 5ed907 vmlinux.o.after for a difference of 276 bytes for an example UP config. If someone wants those 276 bytes back badly then it should be implemented in a cleaner way. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Cc: Steffen Persvold <sp@numascale.com> Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-20Merge commit 'v3.2-rc6' into perf/coreIngo Molnar
Merge reason: Update with the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-18Merge branch 'mce-inject' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce
2011-12-16x86: add IRQ context simulation in module mce-injectChen Gong
mce-inject provides a mechanism to simulate errors so that test scripts can check for correct operation of the kernel without requiring any specialized hardware to create rare events. The existing code can simulate events in normal process context and also in NMI context - but not in IRQ context. This patch fills that gap. Link: https://lkml.org/lkml/2011/12/7/537 Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-12-15x86, centaur: Enable cx8 for VIA Eden tooTimo Teräs
My box with following cpuinfo needs the cx8 enabling still: vendor_id : CentaurHauls cpu family : 6 model : 13 model name : VIA Eden Processor 1200MHz stepping : 0 cpu MHz : 1199.940 cache size : 128 KB This fixes valgrind to work on my box (it requires and checks cx8 from cpuinfo). Signed-off-by: Timo Teräs <timo.teras@iki.fi> Link: http://lkml.kernel.org/r/1323961888-10223-1-git-send-email-timo.teras@iki.fi Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-12-15x86: Report cpb and eff_freq_ro flags correctlyJoerg Roedel
Add the flags to get rid of the [9] and [10] feature names in cpuinfo's 'power management' fields and replace them with meaningful names. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1323875574-17881-1-git-send-email-joerg.roedel@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-15Merge branch 'early-mce-decode' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/mce
2011-12-14x86, mce, therm_throt: Don't report power limit and package level thermal ↵Fenghua Yu
throttle events in mcelog Thermal throttle and power limit events are not defined as MCE errors in x86 architecture and should not generate MCE errors in mcelog. Current kernel generates fake software defined MCE errors for these events. This may confuse users because they may think the machine has real MCE errors while actually only thermal throttle or power limit events happen. To make it worse, buggy firmware on some platforms may falsely generate the events. Therefore, kernel reports MCE errors which users think as real hardware errors. Although the firmware bugs should be fixed, on the other hand, kernel should not report MCE errors either. So mcelog is not a good mechanism to report these events. To report the events, we count them in respective counters (core_power_limit_count, package_power_limit_count, core_throttle_count, and package_throttle_count) in /sys/devices/system/cpu/cpu#/thermal_throttle/. Users can check the counters for each event on each CPU. Please note that all CPU's on one package report duplicate counters. It's user application's responsibity to retrieve a package level counter for one package. This patch doesn't report package level power limit, core level power limit, and package level thermal throttle events in mcelog. When the events happen, only report them in respective counters in sysfs. Since core level thermal throttle has been legacy code in kernel for a while and users accepted it as MCE error in mcelog, core level thermal throttle is still reported in mcelog. In the mean time, the event is counted in a counter in sysfs as well. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Borislav Petkov <bp@amd64.org> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20111215001945.GA21009@linux-os.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-14x86, MCE: Drain mcelog bufferBorislav Petkov
Add a function which drains whatever MCEs were logged in already during boot and before the decoder chains were registered. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-12-14x86, mce: Add wrappers for registering on the decode chainBorislav Petkov
No functionality change, this is done so that in a follow-on patch all queued-up MCEs can be decoded after registering on the chain. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-12-11x86: Call idle notifier after irq_enter()Frederic Weisbecker
Interrupts notify the idle exit state before calling irq_enter(). But the notifier code calls rcu_read_lock() and this is not allowed while rcu is in an extended quiescent state. We need to wait for irq_enter() -> rcu_idle_exit() to be called before doing so otherwise this results in a grumpy RCU: [ 0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110() [ 0.099991] Hardware name: AMD690VM-FMH [ 0.099991] Modules linked in: [ 0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255 [ 0.099991] Call Trace: [ 0.099991] <IRQ> [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0 [ 0.099991] [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20 [ 0.099991] [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110 [ 0.099991] [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20 [ 0.099991] [<ffffffff81001873>] exit_idle+0x43/0x50 [ 0.099991] [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0 [ 0.099991] [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20 [ 0.099991] <EOI> [<ffffffff8100ae67>] ? default_idle+0xa7/0x350 [ 0.099991] [<ffffffff8100ae65>] ? default_idle+0xa5/0x350 [ 0.099991] [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110 [ 0.099991] [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160 [ 0.099991] [<ffffffff810019a0>] cpu_idle+0xb0/0x110 [ 0.099991] [<ffffffff817a7505>] rest_init+0xe5/0x140 [ 0.099991] [<ffffffff817a7468>] ? rest_init+0x48/0x140 [ 0.099991] [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc [ 0.099991] [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135 [ 0.099991] [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Henroid <andrew.d.henroid@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-09x86, CPU: Drop superfluous get_cpu_cap() prototypeBorislav Petkov
The get_cpu_cap() external function prototype was declared twice so lose one of them. Clean up the header guard while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1322594083-14507-1-git-send-email-bp@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Expose perf capability to other modulesGleb Natapov
KVM needs to know perf capability to decide which PMU it can expose to a guest. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320929850-10480-8-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Implement arch event mask as quirkPeter Zijlstra
Implement the disabling of arch events as a quirk so that we can print a message along with it. This creates some visibility into the problem space and could allow us to work on adding more work-around like the AAJ80 one. Requested-by: Ingo Molnar <mingo@elte.hu> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-wcja2z48wklzu1b0nkz0a5y7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06x86, perf: Disable non available architectural eventsGleb Natapov
Intel CPUs report non-available architectural events in cpuid leaf 0AH.EBX. Use it to disable events that are not available according to CPU. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320929850-10480-7-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Prefer fixed-purpose counters when schedulingPeter Zijlstra
This avoids a scheduling failure for cases like: cycles, cycles, instructions, instructions (on Core2) Which would end up being programmed like: PMC0, PMC1, FP-instructions, fail Because all events will have the same weight. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Fix event scheduler for constraints with overlapping countersRobert Richter
The current x86 event scheduler fails to resolve scheduling problems of certain combinations of events and constraints. This happens if the counter mask of such an event is not a subset of any other counter mask of a constraint with an equal or higher weight, e.g. constraints of the AMD family 15h pmu: counter mask weight amd_f15_PMC30 0x09 2 <--- overlapping counters amd_f15_PMC20 0x07 3 amd_f15_PMC53 0x38 3 The scheduler does not find then an existing solution. Here is an example: event code counter failure possible solution 0x02E PMC[3,0] 0 3 0x043 PMC[2:0] 1 0 0x045 PMC[2:0] 2 1 0x046 PMC[2:0] FAIL 2 The event scheduler may not select the correct counter in the first cycle because it needs to know which subsequent events will be scheduled. It may fail to schedule the events then. To solve this, we now save the scheduler state of events with overlapping counter counstraints. If we fail to schedule the events we rollback to those states and try to use another free counter. Constraints with overlapping counters are marked with a new introduced overlap flag. We set the overlap flag for such constraints to give the scheduler a hint which events to select for counter rescheduling. The EVENT_CONSTRAINT_OVERLAP() macro can be used for this. Care must be taken as the rescheduling algorithm is O(n!) which will increase scheduling cycles for an over-commited system dramatically. The number of such EVENT_CONSTRAINT_OVERLAP() macros and its counter masks must be kept at a minimum. Thus, the current stack is limited to 2 states to limit the number of loops the algorithm takes in the worst case. On systems with no overlapping-counter constraints, this implementation does not increase the loop count compared to the previous algorithm. V2: * Renamed redo -> overlap. * Reimplementation using perf scheduling helper functions. V3: * Added WARN_ON_ONCE() if out of save states. * Changed function interface of perf_sched_restore_state() to use bool as return value. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06perf, x86: Implement event scheduler helper functionsRobert Richter
This patch introduces x86 perf scheduler code helper functions. We need this to later add more complex functionality to support overlapping counter constraints (next patch). The algorithm is modified so that the range of weight values is now generated from the constraints. There shouldn't be other functional changes. With the helper functions the scheduler is controlled. There are functions to initialize, traverse the event list, find unused counters etc. The scheduler keeps its own state. V3: * Added macro for_each_set_bit_cont(). * Changed functions interfaces of perf_sched_find_counter() and perf_sched_next_event() to use bool as return value. * Added some comments to make code better understandable. V4: * Fix broken event assignment if weight of the first event is not wmin (perf_sched_init()). Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06x86: Fix the !CONFIG_NUMA build of the new CPU ID fixup code supportSteffen Persvold
I used "ifdef CONFIG_NUMA" simply because it doesn't make sense in a non-numa configuration even with SMP enabled. Besides, the only place where it is called right now is in kernel/cpu/amd.c:srat_detect_node() within the "CONFIG_NUMA" protected part. Signed-off-by: Steffen Persvold <sp@numascale.com> Cc: Daniel J Blueman <daniel@numascale-asia.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Link: http://lkml.kernel.org/r/1323073238-32686-2-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intr_remapping: Fix section mismatch in ir_dev_scope_init() intel-iommu: Fix section mismatch in dmar_parse_rmrr_atsr_dev() x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions x86, AMD: Correct align_va_addr documentation x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platforms x86/mrst: Battery fixes x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode x86: Fix "Acer Aspire 1" reboot hang x86/mtrr: Resolve inconsistency with Intel processor manual x86: Document rdmsr_safe restrictions x86, microcode: Fix the failure path of microcode update driver init code Add TAINT_FIRMWARE_WORKAROUND on MTRR fixup x86/mpparse: Account for bus types other than ISA and PCI x86, mrst: Change the pmic_gpio device type to IPC mrst: Added some platform data for the SFI translations x86,mrst: Power control commands update x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot x86, UV: Fix UV2 hub part number x86: Add user_mode_vm check in stack_overflow_check
2011-12-05Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix loss of notification with multi-event perf, x86: Force IBS LVT offset assignment for family 10h perf, x86: Disable PEBS on SandyBridge chips trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter perf session: Fix crash with invalid CPU list perf python: Fix undefined symbol problem perf/x86: Enable raw event access to Intel offcore events perf: Don't use -ENOSPC for out of PMU resources perf: Do not set task_ctx pointer in cpuctx if there are no events in the context perf/x86: Fix PEBS instruction unwind oprofile, x86: Fix crash when unloading module (nmi timer mode) oprofile: Fix crash when unloading module (hr timer mode)
2011-12-05x86: Add x86_init platform override to fix up NUMA core numberingDaniel J Blueman
Add an x86_init vector for handling inconsistent core numbering. This is useful for multi-fabric platforms, such as Numascale NumaConnect. v2: - use struct x86_cpuinit_ops - provide default fall-back function to warn Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Cc: Steffen Persvold <sp@numascale.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Link: http://lkml.kernel.org/r/1323073238-32686-2-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86/mtrr: Resolve inconsistency with Intel processor manualAjaykumar Hotchandani
Following is from Notes of section 11.5.3 of Intel processor manual available at: http://www.intel.com/Assets/PDF/manual/325384.pdf For the Pentium 4 and Intel Xeon processors, after the sequence of steps given above has been executed, the cache lines containing the code between the end of the WBINVD instruction and before the MTRRS have actually been disabled may be retained in the cache hierarchy. Here, to remove code from the cache completely, a second WBINVD instruction must be executed after the MTRRs have been disabled. This patch provides resolution for that. Ideally, I will like to make changes only for Pentium 4 and Xeon processors. But, I am not finding easier way to do it. And, extra wbinvd() instruction does not hurt much for other processors. Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Link: http://lkml.kernel.org/r/4EBD1CC5.3030008@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05Add TAINT_FIRMWARE_WORKAROUND on MTRR fixupPrarit Bhargava
TAINT_FIRMWARE_WORKAROUND should be set when an MTRR fixup is done. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1318958650-12447-1-git-send-email-prarit@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05perf, x86: Force IBS LVT offset assignment for family 10hRobert Richter
On AMD family 10h we see firmware bug messages like the following: [Firmware Bug]: cpu 6, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu [Firmware Bug]: cpu 6, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100) [Firmware Bug]: using offset 1 for IBS interrupts [Firmware Bug]: workaround enabled for IBS LVT offset perf: AMD IBS detected (0x00000007) We always see this, since the offsets are not assigned by the BIOS for this family. Force LVT offset assignment in this case. If the OS assignment fails, fallback to BIOS settings and try to setup this. The fallback to BIOS settings weakens the family check since force_ibs_eilvt_setup() may fail e.g. in case of virtual machines. But setup may still succeed if BIOS offsets are correct. Other families don't have a workaround implemented that assigns LVT offsets. It's ok, to drop calling force_ibs_eilvt_setup() for that families. With the patch the [Firmware Bug] messages vanish. We see now: IBS: LVT offset 1 assigned perf: AMD IBS detected (0x00000007) Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111109162225.GO12451@erda.amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05perf, x86: Disable PEBS on SandyBridge chipsPeter Zijlstra
Cc: Stephane Eranian <eranian@google.com> Cc: stable@kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>