summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/cpu
AgeCommit message (Collapse)Author
2014-01-20Merge branch 'x86-kaslr-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kernel address space randomization support from Peter Anvin: "This enables kernel address space randomization for x86" * 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET x86, kaslr: Remove unused including <linux/version.h> x86, kaslr: Use char array to gain sizeof sanity x86, kaslr: Add a circular multiply for better bit diffusion x86, kaslr: Mix entropy sources together as needed x86/relocs: Add percpu fixup for GNU ld 2.23 x86, boot: Rename get_flags() and check_flags() to *_cpuflags() x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64 x86, kaslr: Report kernel offset on panic x86, kaslr: Select random position from e820 maps x86, kaslr: Provide randomness functions x86, kaslr: Return location from decompress_kernel x86, boot: Move CPU flags out of cpucheck x86, relocs: Add more per-cpu gold special cases
2014-01-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull leftover x86 fixes from Ingo Molnar: "Two leftover fixes that did not make it into v3.13" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add check for number of available vectors before CPU down x86, cpu, amd: Add workaround for family 16h, erratum 793
2014-01-20Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - SCI reporting for other error types not only correctable ones - GHES cleanups - Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant - PCIe AER tracepoint severity levels fix - Error path correction for the mce device init - MCE timer fix - Add more flexibility to the error injection (EINJ) debugfs interface * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix mce_start_timer semantics ACPI, APEI, GHES: Cleanup ghes memory error handling ACPI, APEI: Cleanup alignment-aware accesses ACPI, APEI, GHES: Do not report only correctable errors with SCI ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface ACPI, eMCA: Combine eMCA/EDAC event reporting priority EDAC, sb_edac: Modify H/W event reporting policy EDAC: Add an edac_report parameter to EDAC PCI, AER: Fix severity usage in aer trace event x86, mce: Call put_device on device_register failure
2014-01-20Merge branch 'x86-microcode-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loader updates from Ingo Molnar: "There are two main changes in this tree: - AMD microcode early loading fixes - some microcode loader source files reorganization" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Move to a proper location x86, microcode, AMD: Fix early ucode loading x86, microcode: Share native MSR accessing variants x86, ramdisk: Export relocated ramdisk VA
2014-01-20Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 TLB detection update from Ingo Molnar: "A single change that extends our TLB cache size detection+reporting code" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu: Detect more TLB configuration
2014-01-20Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu, amd: Fix a shadowed variable situation um, x86: Fix vDSO build x86: Delete non-required instances of include <linux/init.h> x86, realmode: Pointer walk cleanups, pull out invariant use of __pa() x86/traps: Clean up error exception handler definitions
2014-01-20Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: - Add the initial implementation of SCHED_DEADLINE support: a real-time scheduling policy where tasks that meet their deadlines and periodically execute their instances in less than their runtime quota see real-time scheduling and won't miss any of their deadlines. Tasks that go over their quota get delayed (Available to privileged users for now) - Clean up and fix preempt_enable_no_resched() abuse all around the tree - Do sched_clock() performance optimizations on x86 and elsewhere - Fix and improve auto-NUMA balancing - Fix and clean up the idle loop - Apply various cleanups and fixes * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) sched: Fix __sched_setscheduler() nice test sched: Move SCHED_RESET_ON_FORK into attr::sched_flags sched: Fix up attr::sched_priority warning sched: Fix up scheduler syscall LTP fails sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls sched/core: Fix htmldocs warnings sched/deadline: No need to check p if dl_se is valid sched/deadline: Remove unused variables sched/deadline: Fix sparse static warnings m68k: Fix build warning in mac_via.h sched, thermal: Clean up preempt_enable_no_resched() abuse sched, net: Fixup busy_loop_us_clock() sched, net: Clean up preempt_enable_no_resched() abuse sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding sched/preempt, locking: Rework local_bh_{dis,en}able() sched/clock, x86: Avoid a runtime condition in native_sched_clock() sched/clock: Fix up clear_sched_clock_stable() sched/clock, x86: Use a static_key for sched_clock_stable sched/clock: Remove local_irq_disable() from the clocks sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs ...
2014-01-16Merge branch 'perf/urgent' into perf/coreIngo Molnar
Pick up the latest fixes, refresh the development tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10hRobert Richter
On AMD family 10h we see following error messages while waking up from S3 for all non-boot CPUs leading to a failed IBS initialization: Enabling non-boot CPUs ... smpboot: Booting Node 0 Processor 1 APIC 0x1 [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu perf: IBS APIC setup failed on cpu #1 process: Switch to broadcast mode on CPU1 CPU1 is up ... ACPI: Waking up from system sleep state S3 Reason for this is that during suspend the LVT offset for the IBS vector gets lost and needs to be reinialized while resuming. The offset is read from the IBSCTL msr. On family 10h the offset needs to be 1 as offset 0 is used for the MCE threshold interrupt, but firmware assings it for IBS to 0 too. The kernel needs to reprogram the vector. The msr is a readonly node msr, but a new value can be written via pci config space access. The reinitialization is implemented for family 10h in setup_ibs_ctl() which is forced during IBS setup. This patch fixes IBS setup after waking up from S3 by adding resume/supend hooks for the boot cpu which does the offset reinitialization. Marking it as stable to let distros pick up this fix. Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> v3.2.. Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-15x86, cpu, amd: Fix a shadowed variable situationBorislav Petkov
Having u32 and struct cpuinfo_x86 * by the same name is not very smart, although it was ok in this case due to the limited scope of u32 c and it being used only once in there. Fix this. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1389786735-16751-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-01-14x86, cpu, amd: Add workaround for family 16h, erratum 793Borislav Petkov
This adds the workaround for erratum 793 as a precaution in case not every BIOS implements it. This addresses CVE-2013-6885. Erratum text: [Revision Guide for AMD Family 16h Models 00h-0Fh Processors, document 51810 Rev. 3.04 November 2013] 793 Specific Combination of Writes to Write Combined Memory Types and Locked Instructions May Cause Core Hang Description Under a highly specific and detailed set of internal timing conditions, a locked instruction may trigger a timing sequence whereby the write to a write combined memory type is not flushed, causing the locked instruction to stall indefinitely. Potential Effect on System Processor core hang. Suggested Workaround BIOS should set MSR C001_1020[15] = 1b. Fix Planned No fix planned [ hpa: updated description, fixed typo in MSR name ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-13x86, microcode: Move to a proper locationBorislav Petkov
We've grown a bunch of microcode loader files all prefixed with "microcode_". They should be under cpu/ because this is strictly CPU-related functionality so do that and drop the prefix since they're in their own directory now which gives that prefix. :) While at it, drop MICROCODE_INTEL_LIB config item and stash the functionality under CONFIG_MICROCODE_INTEL as it was its only user. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
2014-01-13sched/clock, x86: Use a static_key for sched_clock_stablePeter Zijlstra
In order to avoid the runtime condition and variable load turn sched_clock_stable into a static_key. Also provide a shorter implementation of local_clock() and cpu_clock(int) when sched_clock_stable==1. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 221876 215295 (cold) local_clock: 301773 234692 220773 (warm) sched_clock: 38375 25602 25659 (warm) local_clock: 100371 33265 27242 (warm) rdtsc: 27340 24214 24208 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 235941 237019 (cold) local_clock: 396890 297017 294819 (warm) sched_clock: 38194 25233 25609 (warm) local_clock: 143452 71234 71232 (warm) rdtsc: 27345 24245 24243 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQsPeter Zijlstra
Use a ring-buffer like multi-version object structure which allows always having a coherent object; we use this to avoid having to disable IRQs while reading sched_clock() and avoids a problem when getting an NMI while changing the cyc2ns data. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 331312 257223 (cold) local_clock: 301773 310296 309889 (warm) sched_clock: 38375 38247 25280 (warm) local_clock: 100371 102713 85268 (warm) rdtsc: 27340 27289 24247 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 372706 301224 (cold) local_clock: 396890 399275 399870 (warm) sched_clock: 38194 38124 25630 (warm) local_clock: 143452 148698 129629 (warm) rdtsc: 27345 27365 24307 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12Merge tag 'ras_for_3.14_p2' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull RAS updates from Borislav Petkov: " SCI reporting for other error types not only correctable ones + APEI GHES cleanups + mce timer fix " Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12Merge tag 'v3.13-rc8' into x86/ras, to pick up fixes.Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12x86, mce: Fix mce_start_timer semanticsBorislav Petkov
So mce_start_timer() has a 'cpu' argument which is supposed to mean to start a timer on that cpu. However, the code currently starts a timer on the *current* cpu the function runs on and causes the sanity-check in mce_timer_fn to fire: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn because it is running on the wrong cpu. This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining all the cpus in succession. Then, we were fiddling with the CMCI storm settings when starting the timer whereas there's no need for that - if there's storm happening on this newly restarted cpu, we're going to be in normal CMCI mode initially and then when the CMCI interrupt starts firing, we're going to go to the polling mode with the timer real soon. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Reviewed-by: Chen, Gong <gong.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
2014-01-12perf/x86/intel: Add Intel RAPL PP1 energy counter supportStephane Eranian
This patch adds support for the Intel RAPL energy counter PP1 (Power Plane 1). On client processors, it usually corresponds to the energy consumption of the builtin graphic card. That is why the sysfs event is called energy-gpu. New event: - name: power/energy-gpu/ - code: event=0x4 - unit: 2^-32 Joules On processors without graphics, this should count 0. The patch only enables this event on client processors. Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: vincent.weaver@maine.edu Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389176153-3128-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-06x86: Delete non-required instances of include <linux/init.h>Paul Gortmaker
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. [ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-01-03x86, cpu: Detect more TLB configurationKirill A. Shutemov
The Intel Software Developer’s Manual covers few more TLB configurations exposed as CPUID 2 descriptors: 61H Instruction TLB: 4 KByte pages, fully associative, 48 entries 63H Data TLB: 1 GByte pages, 4-way set associative, 4 entries 76H Instruction TLB: 2M/4M pages, fully associative, 8 entries B5H Instruction TLB: 4KByte pages, 8-way set associative, 64 entries B6H Instruction TLB: 4KByte pages, 8-way set associative, 128 entries C1H Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries C2H DTLB DTLB: 2 MByte/$MByte pages, 4-way associative, 16 entries Let's detect them as well. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/1387801018-14499-1-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-29Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "There is a small EFI fix and a big power regression fix in this batch. My queue also had a fix for downing a CPU when there are insufficient number of IRQ vectors available, but I'm holding that one for now due to recent bug reports" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Don't select EFI from certain special ACPI drivers x86 idle: Repair large-server 50-watt idle-power regression
2013-12-21ACPI, APEI, GHES: Do not report only correctable errors with SCIChen, Gong
Currently SCI is employed to handle corrected errors - memory corrected errors, more specifically but in fact SCI still can be used to handle any errors, e.g. uncorrected or even fatal ones if enabled by the BIOS. Enable logging for those kinds of errors too. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1385363701-12387-1-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit message, rename function arg. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-19x86 idle: Repair large-server 50-watt idle-power regressionLen Brown
Linux 3.10 changed the timing of how thread_info->flags is touched: x86: Use generic idle loop (7d1a941731fabf27e5fb6edbebb79fe856edb4e5) This caused Intel NHM-EX and WSM-EX servers to experience a large number of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote from deep C-states to shallow C-states, which caused these platforms to experience a significant increase in idle power. Note that this issue was already present before the commit above, however, it wasn't seen often enough to be noticed in power measurements. Here we extend an errata workaround from the Core2 EX "Dunnington" to extend to NHM-EX and WSM-EX, to prevent these immediate returns from MWAIT, reducing idle power on these platforms. While only acpi_idle ran on Dunnington, intel_idle may also run on these two newer systems. As of today, there are no other models that are known to need this tweak. Link: http://lkml.kernel.org/r/CAJvTdK=%2BaNN66mYpCGgbHGCHhYQAKx-vB0kJSWjVpsNb_hOAtQ@mail.gmail.com Signed-off-by: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/baff264285f6e585df757d58b17788feabc68918.1387403066.git.len.brown@intel.com Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-17perf/x86: enable Haswell Celeron RAPL supportStephane Eranian
Enable RAPL support for Haswell Celeron (model 69). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: vincent.weaver@maine.edu Cc: maria.n.dimakopoulou@gmail.com Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1387225224-27799-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-05perf/x86: Fix constraint table end marker bugMaria Dimakopoulou
The EVENT_CONSTRAINT_END() macro defines the end marker as a constraint with a weight of zero. This was all fine until we blacklisted the corrupting memory events on Intel IvyBridge. These events are blacklisted by using a counter bitmask of zero. Thus, they also get a constraint weight of zero. The iteration macro: for_each_constraint tests the weight==0. Therefore, it was stopping at the first blacklisted event, i.e., 0xd0. The corrupting events were therefore considered as unconstrained and were scheduled on any of the generic counters. This patch fixes the end marker to have a weight of -1. With this, the blacklisted events get an empty constraint and cannot be scheduled which is what we want for now. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: ak@linux.intel.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Link: http://lkml.kernel.org/r/20131204232437.GA10689@starlight Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-30x86, mce: Call put_device on device_register failureLevente Kurusa
This patch adds a call to put_device() when the device_register() call has failed. This is required so that the last reference to the device is given up. Signed-off-by: Levente Kurusa <levex@linux.com> Link: http://lkml.kernel.org/r/5298F900.9000208@linux.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-27perf/x86: Add RAPL hrtimer supportStephane Eranian
The RAPL PMU counters do not interrupt on overflow. Therefore, the kernel needs to poll the counters to avoid missing an overflow. This patch adds the hrtimer code to do this. The timer interval is calculated at boot time based on the power unit used by the HW. There is one hrtimer per-cpu to handle the case of multiple simultaneous use across cores on the same package + hotplug CPU. Thanks to Maria Dimakopoulou for her contributions to this patch especially on the math aspects. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> [ Applied 32-bit build fix. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1384275531-10892-5-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27perf/x86: Add Intel RAPL PMU supportStephane Eranian
This patch adds a new uncore PMU to expose the Intel RAPL energy consumption counters. Up to 3 counters, each counting a particular RAPL event are exposed. The RAPL counters are available on Intel SandyBridge, IvyBridge, Haswell. The server skus add a 3rd counter. The following events are available and exposed in sysfs: - power/energy-cores: power consumption of all cores on socket - power/energy-pkg: power consumption of all cores + LLc cache - power/energy-dram: power consumption of DRAM (servers only) For each event both the unit (Joules) and scale (2^-32 J) is exposed in sysfs for use by perf stat and other tools. The files are: /sys/devices/power/events/energy-*.unit /sys/devices/power/events/energy-*.scale The RAPL PMU is uncore by nature and is implemented such that it only works in system-wide mode. Measuring only one CPU per socket is sufficient. The /sys/devices/power/cpumask file can be used by tools to figure out which CPUs to monitor by default. For instance, on a 2-socket system, 2 CPUs (one on each socket) will be shown. All the counters measure in the same unit (exposed via sysfs). The perf_events API exposes all RAPL counters as 64-bit integers counting in unit of 1/2^32 Joules (about 0.23 nJ). User level tools must convert the counts by multiplying them by 2^-32 to obtain Joules. The reason for this is that the kernel avoids doing floating point math whenever possible because it is expensive (user floating-point state must be saved). The method used avoids kernel floating-point usage. There is no loss of precision. Thanks to PeterZ for suggesting this approach. To convert the raw count in Watt: W = C * 2.3 / (1e10 * time) or ldexp(C, -32). RAPL PMU is a new standalone PMU which registers with the perf_event core subsystem. The PMU type (attr->type) is dynamically allocated and is available from /sys/device/power/type. Sampling is not supported by the RAPL PMU. There is no privilege level filtering either. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Link: http://lkml.kernel.org/r/1384275531-10892-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual earth-shaking, news-breaking, rocket science pile from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits) doc: usb: Fix typo in Documentation/usb/gadget_configs.txt doc: add missing files to timers/00-INDEX timekeeping: Fix some trivial typos in comments mm: Fix some trivial typos in comments irq: Fix some trivial typos in comments NUMA: fix typos in Kconfig help text mm: update 00-INDEX doc: Documentation/DMA-attributes.txt fix typo DRM: comment: `halve' -> `half' Docs: Kconfig: `devlopers' -> `developers' doc: typo on word accounting in kprobes.c in mutliple architectures treewide: fix "usefull" typo treewide: fix "distingush" typo mm/Kconfig: Grammar s/an/a/ kexec: Typo s/the/then/ Documentation/kvm: Update cpuid documentation for steal time and pv eoi treewide: Fix common typo in "identify" __page_to_pfn: Fix typo in comment Correct some typos for word frequency clk: fixed-factor: Fix a trivial typo ...
2013-11-12Merge branch 'x86-mce-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: "The biggest change adds support for Intel 'CPER' (UEFI Common Platform Error Record) error logging, which builds upon an enhanced error logging mechanism available on Xeon processors. Full description is here: http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html This change provides a module (and support code) to check for an extended error log and prints extra details about the error on the console" * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ACPI, x86: Fix extended error log driver to depend on CONFIG_X86_LOCAL_APIC dmi: Avoid unaligned memory access in save_mem_devices() Move cper.c from drivers/acpi/apei to drivers/firmware/efi EDAC, GHES: Update ghes error record info ACPI, APEI, CPER: Cleanup CPER memory error output format ACPI, APEI, CPER: Enhance memory reporting capability ACPI, APEI, CPER: Add UEFI 2.4 support for memory error DMI: Parse memory device (type 17) in SMBIOS ACPI, x86: Extended error log driver for x86 platform bitops: Introduce a more generic BITMASK macro ACPI, CPER: Update cper info ACPI, APEI, CPER: Fix status check during error printing
2013-11-12Merge branch 'x86-hyperv-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/hyperv changes from Ingo Molnar: "These changes enable Linux guests to boot as 'Modern VM' guest kernels on MS-Hyperv hosts" * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, hyperv: Move a variable to avoid an unused variable warning x86, hyperv: Fix build error due to missing <asm/apic.h> include x86, hyperv: Correctly guard the local APIC calibration code x86, hyperv: Get the local APIC timer frequency from the hypervisor
2013-11-12Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu changes from Ingo Molnar: "The biggest change that stands out is the increase of the CONFIG_NR_CPUS range from 4096 to 8192 - as real hardware out there already went beyond 4k CPUs ... We only allow more than 512 CPUs if offstack cpumasks are enabled. CONFIG_MAXSMP=y remains to be the 'you are nuts!' extreme testcase, which now means a max of 8192 CPUs" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Increase max CPU count to 8192 x86/cpu: Allow higher NR_CPUS values x86/cpu: Always print SMP information in /proc/cpuinfo x86/cpu: Track legacy CPU model data only on 32-bit kernels
2013-11-12Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this cycle are: - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik van Riel, Peter Zijlstra et al. Yay! - optimize preemption counter handling: merge the NEED_RESCHED flag into the preempt_count variable, by Peter Zijlstra. - wait.h fixes and code reorganization from Peter Zijlstra - cfs_bandwidth fixes from Ben Segall - SMP load-balancer cleanups from Peter Zijstra - idle balancer improvements from Jason Low - other fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits) ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED stop_machine: Fix race between stop_two_cpus() and stop_cpus() sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus sched: Fix asymmetric scheduling for POWER7 sched: Move completion code from core.c to completion.c sched: Move wait code from core.c to wait.c sched: Move wait.c into kernel/sched/ sched/wait: Fix __wait_event_interruptible_lock_irq_timeout() sched: Avoid throttle_cfs_rq() racing with period_timer stopping sched: Guarantee new group-entities always have weight sched: Fix hrtimer_cancel()/rq->lock deadlock sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining sched: Fix race on toggling cfs_bandwidth_used sched: Remove extra put_online_cpus() inside sched_setaffinity() sched/rt: Fix task_tick_rt() comment sched/wait: Fix build breakage sched/wait: Introduce prepare_to_wait_event() sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too sched: Remove get_online_cpus() usage sched: Fix race in migrate_swap_stop() ...
2013-11-06x86, hyperv: Move a variable to avoid an unused variable warningH. Peter Anvin
The variable hv_lapic_frequency causes an unused variable warning if CONFIG_X86_LOCAL_APIC is disabled. Since the variable is only used inside a small if statement, move the declaration of that variable into the if statement itself. Cc: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1381444224-3303-1-git-send-email-kys@microsoft.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-11-06perf/x86/intel: Add Ivy Bridge-EP uncore IRP box supportYan, Zheng
Unlike other uncore boxes, IRP boxes live in PCI buses with no UBOX device. For PCI bus without UBOX device, we find the next bus that has UBOX device and use its 'bus to socket' mapping. Besides the counter/control registers in IRP boxes are not properly aligned. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: eranian@google.com Cc: "Yan Zheng" <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1383197815-17706-2-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf/x86/intel/uncore: Add filter support for IvyBridge-EP QPI boxesYan, Zheng
The encoding for filter registers of IvyBridge-EP uncore QPI boxes is completely the same as SandyBridge-EP. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: eranian@google.com Cc: "Yan Zheng" <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1383197815-17706-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Fix arch_perf_out_copy_user defaultPeter Zijlstra
The arch_perf_output_copy_user() default of __copy_from_user_inatomic() returns bytes not copied, while all other argument functions given DEFINE_OUTPUT_COPY() return bytes copied. Since copy_from_user_nmi() is the odd duck out by returning bytes copied where all other *copy_{to,from}* functions return bytes not copied, change it over and ammend DEFINE_OUTPUT_COPY() to expect bytes not copied. Oddly enough DEFINE_OUTPUT_COPY() already returned bytes not copied while expecting its worker functions to return bytes copied. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: will.deacon@arm.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20131030201622.GR16117@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06x86/cpu: Always print SMP information in /proc/cpuinfoHATAYAMA Daisuke
Currently show_cpuinfo_core() displays cpu core information only if the number of threads per a whole cores is 2 or larger. However, this condition doesn't care about the number of sockets. For example, this condition doesn't hold on systems with two logical cpus consisting of two sockets and a single core on each socket - yet the topology information would be interesting to see in that case as well. I don't know whether or not there are processors in real world by which such configurations are possible, but at least on vitual machine environments, such configuration can occur, typically when no explicit SMP information is provided in advance. For example, on qemu/KVM, SMP information is specified via -smp command-line option, more specifically, its syntax is: -smp n[,cores=cores][,threads=threads][,sockets=sockets][,maxcpus=maxcpus] If this is not specified, qemu tells configuration with n-sockets, 1-core and 1-thread to the guest machine, on which guest, MP information is not displayed in /proc/cpuinfo. I saw this situation on VMWare guest environment, too. To fix this issue, this patch simply removes the condition because this information is useful even if there's only 1 thread. Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/5277D644.4090707@jp.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06Merge tag 'v3.12' into x86/cpu, to refresh the branch before queueing up ↵Ingo Molnar
more changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-04Merge branch 'perf/urgent' into perf/core to fix conflictsIngo Molnar
Conflicts: tools/perf/bench/numa.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-30x86, hyperv: Fix build error due to missing <asm/apic.h> includeDavid Rientjes
9e7827b5ea4c ("x86, hyperv: Get the local APIC timer frequency from the hypervisor") breaks the build with some configs because apic.h isn't directly included: arch/x86/kernel/cpu/mshyperv.c: In function 'ms_hyperv_init_platform': arch/x86/kernel/cpu/mshyperv.c:90:3: error: 'lapic_timer_frequency' undeclared (first use in this function) arch/x86/kernel/cpu/mshyperv.c:90:3: note: each undeclared identifier is reported only once for each function it appears in Fix it by including asm/apic.h. Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1310111604160.31170@chino.kir.corp.google.com Acked-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-29perf/x86: Fix NMI measurementsPeter Zijlstra
OK, so what I'm actually seeing on my WSM is that sched/clock.c is 'broken' for the purpose we're using it for. What triggered it is that my WSM-EP is broken :-( [ 0.001000] tsc: Fast TSC calibration using PIT [ 0.002000] tsc: Detected 2533.715 MHz processor [ 0.500180] TSC synchronization [CPU#0 -> CPU#6]: [ 0.505197] Measured 3 cycles TSC warp between CPUs, turning off TSC clock. [ 0.004000] tsc: Marking TSC unstable due to check_tsc_sync_source failed For some reason it consistently detects TSC skew, even though NHM+ should have a single clock domain for 'reasonable' systems. This marks sched_clock_stable=0, which means that we do fancy stuff to try and get a 'sane' clock. Part of this fancy stuff relies on the tick, clearly that's gone when NOHZ=y. So for idle cpus time gets stuck, until it either wakes up or gets kicked by another cpu. While this is perfectly fine for the scheduler -- it only cares about actually running stuff, and when we're running stuff we're obviously not idle. This does somewhat break down for perf which can trigger events just fine on an otherwise idle cpu. So I've got NMIs get get 'measured' as taking ~1ms, which actually don't last nearly that long: <idle>-0 [013] d.h. 886.311970: rcu_nmi_enter <-do_nmi ... <idle>-0 [013] d.h. 886.311997: perf_sample_event_took: HERE!!! : 1040990 So ftrace (which uses sched_clock(), not the fancy bits) only sees ~27us, but we measure ~1ms !! Now since all this measurement stuff lives in x86 code, we can actually fix it. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: mingo@kernel.org Cc: dave.hansen@linux.intel.com Cc: eranian@google.com Cc: Don Zickus <dzickus@redhat.com> Cc: jmario@redhat.com Cc: acme@infradead.org Link: http://lkml.kernel.org/r/20131017133350.GG3364@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-26x86/cpu: Track legacy CPU model data only on 32-bit kernelsJan Beulich
struct cpu_dev's c_models is only ever set inside CONFIG_X86_32 conditionals (or code that's being built for 32-bit only), so there's no use of reserving the (empty) space for the model names in a 64-bit kernel. Similarly, c_size_cache is only used in the #else of a CONFIG_X86_64 conditional, so reserving space for (and in one case even initializing) that field is pointless for 64-bit kernels too. While moving both fields to the end of the structure, I also noticed that: - the c_models array size was one too small, potentially causing table_lookup_model() to return garbage on Intel CPUs (intel.c's instance was lacking the sentinel with family being zero), so the patch bumps that by one, - c_models' vendor sub-field was unused (and anyway redundant with the base structure's c_x86_vendor field), so the patch deletes it. Also rename the legacy fields so that their legacy nature stands out and comment their declarations. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/5265036802000078000FC4DB@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-23ACPI, APEI, CPER: Add UEFI 2.4 support for memory errorChen, Gong
In latest UEFI spec(by now it is 2.4) memory error definition for CPER (UEFI 2.4 Appendix N Common Platform Error Record) adds some new fields. These fields help people to locate memory error to an actual DIMM location. Original-author: Tony Luck <tony.luck@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-10-16perf/x86: Optimize intel_pmu_pebs_fixup_ip()Peter Zijlstra
There's been reports of high NMI handler overhead, highlighted by such kernel messages: [ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000 [ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs Don Zickus analyzed the source of the overhead and reported: > While there are a few places that are causing latencies, for now I focused on > the longest one first. It seems to be 'copy_user_from_nmi' > > intel_pmu_handle_irq -> > intel_pmu_drain_pebs_nhm -> > __intel_pmu_drain_pebs_nhm -> > __intel_pmu_pebs_event -> > intel_pmu_pebs_fixup_ip -> > copy_from_user_nmi > > In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of > all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles > (there are some cases where only 10 iterations are needed to go that high > too, but in generall over 50 or so). At this point copy_user_from_nmi > seems to account for over 90% of the nmi latency. The solution to that is to avoid having to call copy_from_user_nmi() for every instruction. Since we already limit the max basic block size, we can easily pre-allocate a piece of memory to copy the entire thing into in one go. Don reported this test result: > Your patch made a huge difference in improvement. The > copy_from_user_nmi() no longer hits the million of cycles. I still > have a batch of 100,000-300,000 cycles. My longest NMI paths used > to be dominated by copy_from_user_nmi, now it is not (I have to dig > up the new hot path). Reported-and-tested-by: Don Zickus <dzickus@redhat.com> Cc: jmario@redhat.com Cc: acme@infradead.org Cc: dave.hansen@linux.intel.com Cc: eranian@google.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-14treewide: fix "distingush" typoMichael Opdenacker
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-14treewide: Fix common typo in "identify"Maxime Jayat
Correct common misspelling of "identify" as "indentify" throughout the kernel Signed-off-by: Maxime Jayat <maxime@artisandeveloppeur.fr> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-13x86, kaslr: Provide randomness functionsKees Cook
Adds potential sources of randomness: RDRAND, RDTSC, or the i8254. This moves the pre-alternatives inline rdrand function into the header so both pieces of code can use it. Availability of RDRAND is then controlled by CONFIG_ARCH_RANDOM, if someone wants to disable it even for kASLR. Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/1381450698-28710-4-git-send-email-keescook@chromium.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-11Merge branch 'core/urgent' into sched/coreIngo Molnar
Merge in asm goto fix, to be able to apply the asm/rmwcc.h fix. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-10x86, hyperv: Correctly guard the local APIC calibration codeK. Y. Srinivasan
The code that gets the local APIC timer frequency from the hypervisor rather depends on there being a local APIC. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1381444224-3303-1-git-send-email-kys@microsoft.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>