summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/smpboot.c
AgeCommit message (Collapse)Author
2012-06-15Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/smp: Fix topology checks on AMD MCM CPUs x86/mm: Fix some kernel-doc warnings x86, um: Correct syscall table type attributes breaking gcc 4.8
2012-06-13x86/smp: Fix topology checks on AMD MCM CPUsBorislav Petkov
The warning below triggers on AMD MCM packages because physical package IDs on the cores of a _physical_ socket are the same. I.e., this field says which CPUs belong to the same physical package. However, the same two CPUs belong to two different internal, i.e. "logical" nodes in the same physical socket which is reflected in the CPU-to-node map on x86 with NUMA. Which makes this check wrong on the above topologies so circumvent it. [ 0.444413] Booting Node 0, Processors #1 #2 #3 #4 #5 Ok. [ 0.461388] ------------[ cut here ]------------ [ 0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81() [ 0.473960] Hardware name: Dinar [ 0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. [ 0.486860] Booting Node 1, Processors #6 [ 0.491104] Modules linked in: [ 0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1 [ 0.499510] Call Trace: [ 0.501946] [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81 [ 0.508185] [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d [ 0.514163] [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48 [ 0.519881] [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81 [ 0.525943] [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371 [ 0.532004] [<ffffffff8144c4ee>] start_secondary+0x19a/0x218 [ 0.537729] ---[ end trace 4eaa2a86a8e2da22 ]--- [ 0.628197] #7 #8 #9 #10 #11 Ok. [ 0.807108] Booting Node 3, Processors #12 #13 #14 #15 #16 #17 Ok. [ 0.897587] Booting Node 2, Processors #18 #19 #20 #21 #22 #23 Ok. [ 0.917443] Brought up 24 CPUs We ran a topology sanity check test we have here on it and it all looks ok... hopefully :). Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06sched/x86: Calculate booted cores after construction of sibling_maskKamalesh Babulal
Commit 316ad248307fb ("sched/x86: Rewrite set_cpu_sibling_map()") broke the booted_cores accounting. The problem is that the booted_cores accounting needs all the sibling links set up. So restore the second loop and add a comment as to why its needed. On qemu booted with -smp sockets=1,cores=2,threads=2; Before: $ grep cores /proc/cpuinfo cpu cores : 2 cpu cores : 1 cpu cores : 4 cpu cores : 3 With the patch: $ grep cores /proc/cpuinfo cpu cores : 2 cpu cores : 2 cpu cores : 2 cpu cores : 2 Reported-by: Prarit Bhargava <prarit@redhat.com> Reported-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120531073738.GH7511@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched/x86: Use cpu_llc_shared_mask(cpu) for coregroup_maskPeter Zijlstra
Commit commit 8e7fbcbc2 ("sched: Remove stale power aware scheduling remnants and dysfunctional knobs") made a boo-boo with removing the power aware scheduling muck from the x86 topology bits. We should unconditionally use the llc_shared mask for multi-core. Reported-and-tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/n/tip-lsksc2kfyeveb13avh327p0d@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-29Merge branch 'x86-trampoline-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 trampoline rework from H. Peter Anvin: "This code reworks all the "trampoline"/"realmode" code (various bits that need to live in the first megabyte of memory, most but not all of which runs in real mode at some point) in the kernel into a single object. The main reason for doing this is that it eliminates the last place in the kernel where we needed pages to be mapped RWX. This code separates all that code into proper R/RW/RX pages." Fix up conflicts in arch/x86/kernel/Makefile (mca removed next to reboot code), and arch/x86/kernel/reboot.c (reboot code moved around in one branch, modified in this one), and arch/x86/tools/relocs.c (mostly same code came in earlier due to working around the ld bugs just before the 3.4 release). Also remove stale x86-relocs entry from scripts/.gitignore as per Peter Anvin. * commit '61f5446169046c217a5479517edac3a890c3bee7': (36 commits) x86, realmode: Move end signature into header.S x86, relocs: When printing an error, say relative or absolute x86, relocs: More relocations which may end up as absolute x86, relocs: Workaround for binutils 2.22.52.0.1 section bug xen-acpi-processor: Add missing #include <xen/xen.h> acpi, bgrd: Add missing <linux/io.h> to drivers/acpi/bgrt.c x86, realmode: Change EFER to a single u64 field x86, realmode: Move kernel/realmode.c to realmode/init.c x86, realmode: Move not-common bits out of trampoline_common.S x86, realmode: Mask out EFER.LMA when saving trampoline EFER x86, realmode: Fix no cache bits test in reboot_32.S x86, realmode: Make sure all generated files are listed in targets x86, realmode: build fix: remove duplicate build x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline x86, realmode: fixes compilation issue in tboot.c x86, realmode: move relocs from scripts/ to arch/x86/tools x86, realmode: header for trampoline code x86, realmode: flattened rm hierachy x86, realmode: don't copy real_mode_header x86, realmode: fix 64-bit wakeup sequence ...
2012-05-22Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The biggest change is the cleanup/simplification of the load-balancer: instead of the current practice of architectures twiddling scheduler internal data structures and providing the scheduler domains in colorfully inconsistent ways, we now have generic scheduler code in kernel/sched/core.c:sched_init_numa() that looks at the architecture's node_distance() parameters and (while not fully trusting it) deducts a NUMA topology from it. This inevitably changes balancing behavior - hopefully for the better. There are various smaller optimizations, cleanups and fixlets as well" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Taint kernel with TAINT_WARN after sleep-in-atomic bug sched: Remove stale power aware scheduling remnants and dysfunctional knobs sched/debug: Fix printing large integers on 32-bit platforms sched/fair: Improve the ->group_imb logic sched/nohz: Fix rq->cpu_load[] calculations sched/numa: Don't scale the imbalance sched/fair: Revert sched-domain iteration breakage sched/x86: Rewrite set_cpu_sibling_map() sched/numa: Fix the new NUMA topology bits sched/numa: Rewrite the CONFIG_NUMA sched domain support sched/fair: Propagate 'struct lb_env' usage into find_busiest_group sched/fair: Add some serialization to the sched_domain load-balance walk sched/fair: Let minimally loaded cpu balance the group sched: Change rq->nr_running to unsigned int x86/numa: Check for nonsensical topologies on real hw as well x86/numa: Hard partition cpu topology masks on node boundaries x86/numa: Allow specifying node_distance() for numa=fake x86/sched: Make mwait_usable() heed to "idle=" kernel parameters properly sched: Update documentation and comments sched_rt: Avoid unnecessary dequeue and enqueue of pushable tasks in set_cpus_allowed_rt()
2012-05-17sched: Remove stale power aware scheduling remnants and dysfunctional knobsPeter Zijlstra
It's been broken forever (i.e. it's not scheduling in a power aware fashion), as reported by Suresh and others sending patches, and nobody cares enough to fix it properly ... so remove it to make space free for something better. There's various problems with the code as it stands today, first and foremost the user interface which is bound to topology levels and has multiple values per level. This results in a state explosion which the administrator or distro needs to master and almost nobody does. Furthermore large configuration state spaces aren't good, it means the thing doesn't just work right because it's either under so many impossibe to meet constraints, or even if there's an achievable state workloads have to be aware of it precisely and can never meet it for dynamic workloads. So pushing this kind of decision to user-space was a bad idea even with a single knob - it's exponentially worse with knobs on every node of the topology. There is a proposal to replace the user interface with a single 3 state knob: sched_balance_policy := { performance, power, auto } where 'auto' would be the preferred default which looks at things like Battery/AC mode and possible cpufreq state or whatever the hw exposes to show us power use expectations - but there's been no progress on it in the past many months. Aside from that, the actual implementation of the various knobs is known to be broken. There have been sporadic attempts at fixing things but these always stop short of reaching a mergable state. Therefore this wholesale removal with the hopes of spurring people who care to come forward once again and work on a coherent replacement. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-14sched/x86: Rewrite set_cpu_sibling_map()Peter Zijlstra
Commit ad7687dde ("x86/numa: Check for nonsensical topologies on real hw as well") is broken in that the condition can trigger for valid setups but only changes the end result for invalid setups with no real means of discerning between those. Rewrite set_cpu_sibling_map() to make the code clearer and make sure to only warn when the check changes the end result. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-klcwahu3gx467uhfiqjyhdcs@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09x86/numa: Check for nonsensical topologies on real hw as wellIngo Molnar
Instead of only checking nonsensical topologies on numa-emu, do it on real hardware as well, and print a warning. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/n/tip-re15l0jqjtpz709oxozt2zoh@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09x86/numa: Hard partition cpu topology masks on node boundariesPeter Zijlstra
When using numa=fake= you can get weird topologies where LLCs can span nodes and other such nonsense. Cure this by hard partitioning these masks on node boundaries. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/n/tip-di5vwjm96q5vrb76opwuflwx@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-08x86, realmode: header for trampoline codeJarkko Sakkinen
Added header for trampoline code that can be used to supply input data to it. This makes interface between real mode code and kernel cleaner and simpler. Replaced two confusing pointers to level4 pgt in trampoline_64.S with a single pointer to the beginning of the page table. Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Link: http://lkml.kernel.org/r/1336501366-28617-21-git-send-email-jarkko.sakkinen@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-08x86, realmode: don't copy real_mode_headerJarkko Sakkinen
Replaced copying of real_mode_header with a pointer to beginning of RM memory. Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Link: http://lkml.kernel.org/r/1336501366-28617-19-git-send-email-jarkko.sakkinen@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-08x86, realmode: Move SMP trampoline to unified realmode codeJarkko Sakkinen
Migrated SMP trampoline code to the real mode blob. SMP trampoline code is not yet removed from .x86_trampoline because it is needed by the wakeup code. [ hpa: always enable compiling startup_32_smp in head_32.S... it is only a few instructions which go into .init on UP builds, and it makes the rest of the code less #ifdef ugly. ] Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Link: http://lkml.kernel.org/r/1336501366-28617-6-git-send-email-jarkko.sakkinen@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-04-26x86: Use generic idle thread allocationThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20120420124557.246929343@linutronix.de
2012-04-26x86: Add task_struct argument to smp_ops.cpu_upThomas Gleixner
Preparatory patch to use the generic idle thread allocation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20120420124557.176604405@linutronix.de
2012-03-30Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull ACPI & Power Management changes from Len Brown: - ACPI 5.0 after-ripples, ACPICA/Linux divergence cleanup - cpuidle evolving, more ARM use - thermal sub-system evolving, ditto - assorted other PM bits Fix up conflicts in various cpuidle implementations due to ARM cpuidle cleanups (ARM at91 self-refresh and cpu idle code rewritten into "standby" in asm conflicting with the consolidation of cpuidle time keeping), trivial SH include file context conflict and RCU tracing fixes in generic code. * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (77 commits) ACPI throttling: fix endian bug in acpi_read_throttling_status() Disable MCP limit exceeded messages from Intel IPS driver ACPI video: Don't start video device until its associated input device has been allocated ACPI video: Harden video bus adding. ACPI: Add support for exposing BGRT data ACPI: export acpi_kobj ACPI: Fix logic for removing mappings in 'acpi_unmap' CPER failed to handle generic error records with multiple sections ACPI: Clean redundant codes in scan.c ACPI: Fix unprotected smp_processor_id() in acpi_processor_cst_has_changed() ACPI: consistently use should_use_kmap() PNPACPI: Fix device ref leaking in acpi_pnp_match ACPI: Fix use-after-free in acpi_map_lsapic ACPI: processor_driver: add missing kfree ACPI, APEI: Fix incorrect APEI register bit width check and usage Update documentation for parameter *notrigger* in einj.txt ACPI, APEI, EINJ, new parameter to control trigger action ACPI, APEI, EINJ, limit the range of einj_param ACPI, APEI, Fix ERST header length check cpuidle: power_usage should be declared signed integer ...
2012-03-30idle, x86: Allow off-lined CPU to enter deeper C statesBoris Ostrovsky
Currently when a CPU is off-lined it enters either MWAIT-based idle or, if MWAIT is not desired or supported, HLT-based idle (which places the processor in C1 state). This patch allows processors without MWAIT support to stay in states deeper than C1. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>
2012-03-29Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar. * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpusets: Remove an unused variable sched/rt: Improve pick_next_highest_task_rt() sched: Fix select_fallback_rq() vs cpu_active/cpu_online sched/x86/smp: Do not enable IRQs over calibrate_delay() sched: Fix compiler warning about declared inline after use MAINTAINERS: Update email address for SCHEDULER and PERF EVENTS
2012-03-28Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Avi Kivity: "Changes include timekeeping improvements, support for assigning host PCI devices that share interrupt lines, s390 user-controlled guests, a large ppc update, and random fixes." This is with the sign-off's fixed, hopefully next merge window we won't have rebased commits. * 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: Convert intx_mask_lock to spin lock KVM: x86: fix kvm_write_tsc() TSC matching thinko x86: kvmclock: abstract save/restore sched_clock_state KVM: nVMX: Fix erroneous exception bitmap check KVM: Ignore the writes to MSR_K7_HWCR(3) KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask KVM: PMU: add proper support for fixed counter 2 KVM: PMU: Fix raw event check KVM: PMU: warn when pin control is set in eventsel msr KVM: VMX: Fix delayed load of shared MSRs KVM: use correct tlbs dirty type in cmpxchg KVM: Allow host IRQ sharing for assigned PCI 2.3 devices KVM: Ensure all vcpus are consistent with in-kernel irqchip settings KVM: x86 emulator: Allow PM/VM86 switch during task switch KVM: SVM: Fix CPL updates KVM: x86 emulator: VM86 segments must have DPL 3 KVM: x86 emulator: Fix task switch privilege checks arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation KVM: mmu_notifier: Flush TLBs before releasing mmu_lock ...
2012-03-27sched/x86/smp: Do not enable IRQs over calibrate_delay()Peter Zijlstra
We should not ever enable IRQs until we're fully set up. This opens up a window where interrupts can hit the cpu and interrupts can do wakeups, wakeups need state that isn't set-up yet, in particular this cpu isn't elegible to run tasks, so if any cpu-affine task that got created in CPU_UP_PREPARE manages to get a wakeup, its affinity mask will get broken and we'll run into lots of 'interesting' problems. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-yaezmlbriluh166tfkgni22m@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-22Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar. Removes the Moorestown platform that nobody ever used. * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform: Move APIC ID validity check into platform APIC code x86/olpc/xo15/sci: Enable lid close wakeup control x86/geode/net5501: Add platform driver for Soekris Engineering net5501 x86/geode/alix2: Supplement driver to include GPIO button support x86/mid/powerbtn: Use MSIC read/write instead of ipc_scu x86/mid/thermal: Turn off thermistor x86/mid/thermal: Add msic_thermal alias x86/mid/thermal: Convert to use Intel MSIC API x86/mid/scu_ipc: Remove Moorestown support x86/mid: Kill off Moorestown x86/mrst: Add msic_thermal platform support x86/config: Select MSIC MFD driver on Intel Medfield platform x86/mid: Remove Intel Moorestown x86/mrst: Set ISA bus type for fake MP IRQs x86/ioapic: Use legacy_pic to set correct gsi-irq mapping
2012-03-22Merge branch 'x86-debug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/debug changes from Ingo Molnar. * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix section warnings x86-64: Fix CFI data for common_interrupt() x86: Properly _init-annotate NMI selftest code x86/debug: Fix/improve the show_msr=<cpus> debug print out
2012-03-22Merge branches 'x86-cpu-for-linus', 'x86-boot-for-linus', ↵Linus Torvalds
'x86-cpufeature-for-linus', 'x86-process-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull trivial x86 branches from Ingo Molnar: small one-liners to fix up details. * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove some noise from boot log when starting cpus * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, boot: Fix port argument to inl() function * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpufeature: Add CPU features from Intel document 319433-012A * 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64: Record stack pointer before task execution begins * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/UV: Lower UV rtc clocksource rating
2012-03-14x86/platform: Move APIC ID validity check into platform APIC codeDaniel J Blueman
Move APIC ID validity check into platform APIC code, so it can be overridden when needed. For NumaChip systems, always trust MADT, as it's constructed with high APIC IDs. Behaviour verifies on standard x86 systems and on NumaChip systems with this, and compile-tested with allyesconfig. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Reviewed-by: Steffen Persvold <sp@numascale.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1331709454-27966-1-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12sched: Cleanup cpu_active madnessPeter Zijlstra
Stepan found: CPU0 CPUn _cpu_up() __cpu_up() boostrap() notify_cpu_starting() set_cpu_online() while (!cpu_active()) cpu_relax() <PREEMPT-out> smp_call_function(.wait=1) /* we find cpu_online() is true */ arch_send_call_function_ipi_mask() /* wait-forever-more */ <PREEMPT-in> local_irq_enable() cpu_notify(CPU_ONLINE) sched_cpu_active() set_cpu_active() Now the purpose of cpu_active is mostly with bringing down a cpu, where we mark it !active to avoid the load-balancer from moving tasks to it while we tear down the cpu. This is required because we only update the sched_domain tree after we brought the cpu-down. And this is needed so that some tasks can still run while we bring it down, we just don't want new tasks to appear. On cpu-up however the sched_domain tree doesn't yet include the new cpu, so its invisible to the load-balancer, regardless of the active state. So instead of setting the active state after we boot the new cpu (and consequently having to wait for it before enabling interrupts) set the cpu active before we set it online and avoid the whole mess. Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1323965362.18942.71.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05x86: Introduce x86_cpuinit.early_percpu_clock_init hookIgor Mammedov
When kvm guest uses kvmclock, it may hang on vcpu hot-plug. This is caused by an overflow in pvclock_get_nsec_offset, u64 delta = tsc - shadow->tsc_timestamp; which in turn is caused by an undefined values from percpu hv_clock that hasn't been initialized yet. Uninitialized clock on being booted cpu is accessed from start_secondary -> smp_callin -> smp_store_cpu_info -> identify_secondary_cpu -> mtrr_ap_init -> mtrr_restore -> stop_machine_from_inactive_cpu -> queue_stop_cpus_work ... -> sched_clock -> kvm_clock_read which is well before x86_cpuinit.setup_percpu_clockev call in start_secondary, where percpu clock is initialized. This patch introduces a hook that allows to setup/initialize per_cpu clock early and avoid overflow due to reading - undefined values - old values if cpu was offlined and then onlined again Another possible early user of this clock source is ftrace that accesses it to get timestamps for ring buffer entries. So if mtrr_ap_init is moved from identify_secondary_cpu to past x86_cpuinit.setup_percpu_clockev in start_secondary, ftrace may cause the same overflow/hang on cpu hot-plug anyway. More complete description of the problem: https://lkml.org/lkml/2012/2/2/101 Credits to Marcelo Tosatti <mtosatti@redhat.com> for hook idea. Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-02-22x86: Remove some noise from boot log when starting cpusLuck, Tony
Printing the "start_ip" for every secondary cpu is very noisy on a large system - and doesn't add any value. Drop this message. Console log before: Booting Node 0, Processors #1 smpboot cpu 1: start_ip = 96000 #2 smpboot cpu 2: start_ip = 96000 #3 smpboot cpu 3: start_ip = 96000 #4 smpboot cpu 4: start_ip = 96000 ... #31 smpboot cpu 31: start_ip = 96000 Brought up 32 CPUs Console log after: Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 Ok. Booting Node 1, Processors #8 #9 #10 #11 #12 #13 #14 #15 Ok. Booting Node 0, Processors #16 #17 #18 #19 #20 #21 #22 #23 Ok. Booting Node 1, Processors #24 #25 #26 #27 #28 #29 #30 #31 Brought up 32 CPUs Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-02-12x86/debug: Fix/improve the show_msr=<cpus> debug print outYinghai Lu
Found out that show_msr=<cpus> is broken, when I asked a user to use it to capture debug info about broken MTRR's whose MTRR settings are probably different between CPUs. Only the first CPUs MSRs are printed, but that is not enough to track down the suspected bug. For years we called print_cpu_msr from print_cpu_info(), but this commit: | commit 2eaad1fddd7450a48ad464229775f97fbfe8af36 | Author: Mike Travis <travis@sgi.com> | Date: Thu Dec 10 17:19:36 2009 -0800 | | x86: Limit the number of processor bootup messages removed the print_cpu_info() call from all APs. Put it back - it will only print MSRs when the user specifically requests them via show_msr=<cpus>. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/1329069237-11483-1-git-send-email-yinghai@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-11Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel config: Fix the APB_TIMER selection x86/mrst: Add additional debug prints for pb_keys x86/intel config: Revamp configuration to allow for Moorestown and Medfield x86/intel/scu/ipc: Match the changes in the x86 configuration x86/apb: Fix configuration constraints x86: Fix INTEL_MID silly x86/Kconfig: Cyclone-timer depends on x86-summit x86: Reduce clock calibration time during slave cpu startup x86/config: Revamp configuration for MID devices x86/sfi: Kill the IRQ as id hack
2012-01-11Merge branch 'x86-debug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, reboot: Fix typo in nmi reboot path x86, NMI: Add to_cpumask() to silence compile warning x86, NMI: NMI selftest depends on the local apic x86: Add stack top margin for stack overflow checking x86, NMI: NMI-selftest should handle the UP case properly x86: Fix the 32-bit stackoverflow-debug build x86, NMI: Add knob to disable using NMI IPIs to stop cpus x86, NMI: Add NMI IPI selftest x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus x86: Clean up the range of stack overflow checking x86: Panic on detection of stack overflow x86: Check stack overflow in detail
2011-12-23x86: Skip cpus with apic-ids >= 255 in !x2apic_modeSuresh Siddha
If the x2apic mode is disabled for reasons like interrupt-remapping not available etc, then we need to skip the logical cpu bringup of apic-id's >= 255. Otherwise as the platform is in xapic mode, init/startup IPI's will consider only the low 8-bits and there is a possibility of re-sending init/startup IPI's to the logical cpu that is already online. This will avoid potential reboots/unpredictable behavior etc. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/20111222014632.702932458@sbsiddha-desk.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-05x86: Reduce clock calibration time during slave cpu startupJack Steiner
Reduce the startup time for slave cpus. Adds hooks for an arch-specific function for clock calibration. These hooks are used on x86. If a newly started cpu has the same phys_proc_id as a core already active, uses the TSC for the delay loop and has a CONSTANT_TSC, use the already-calculated value of loops_per_jiffy. This patch reduces the time required to start slave cpus on a 4096 cpu system from: 465 sec OLD 62 sec NEW This reduces boot time on a 4096p system by almost 7 minutes. Nice... Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Stultz <john.stultz@linaro.org> [fix CONFIG_SMP=n build] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86, NMI: Add NMI IPI selftestDon Zickus
The previous patch modified the stop cpus path to use NMI instead of IRQ as the way to communicate to the other cpus to shutdown. There were some concerns that various machines may have problems with using an NMI IPI. This patch creates a selftest to check if NMI is working at boot. The idea is to help catch any issues before the machine panics and we learn the hard way. Loosely based on the locking-selftest.c file, this separate file runs a couple of simple tests and reports the results. The output looks like: ... Brought up 4 CPUs ---------------- | NMI testsuite: -------------------- remote IPI: ok | local IPI: ok | -------------------- Good, all 2 testcases passed! | --------------------------------- Total of 4 processors activated (21330.61 BogoMIPS). ... Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: seiji.aguchi@hds.com Cc: vgoyal@redhat.com Cc: mjg@redhat.com Cc: tony.luck@intel.com Cc: gong.chen@intel.com Cc: satoru.moriya@hds.com Cc: avi@redhat.com Cc: Andi Kleen <andi@firstfloor.org> Link: http://lkml.kernel.org/r/1318533267-18880-3-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-22Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, smpboot: Mark the names[] array in __inquire_remote_apic() as const x86: Convert vmalloc()+memset() to vzalloc()
2011-07-21x86, smpboot: Mark the names[] array in __inquire_remote_apic() as constGreg Dietsche
This array is read-only. Make it explicit by marking as const. Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu> Link: http://lkml.kernel.org/r/1309482653-23648-1-git-send-email-Gregory.Dietsche@cuw.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-08x86: cpu-hotplug: Prevent softirq wakeup on wrong CPUThomas Gleixner
After a newly plugged CPU sets the cpu_online bit it enables interrupts and goes idle. The cpu which brought up the new cpu waits for the cpu_online bit and when it observes it, it sets the cpu_active bit for this cpu. The cpu_active bit is the relevant one for the scheduler to consider the cpu as a viable target. With forced threaded interrupt handlers which imply forced threaded softirqs we observed the following race: cpu 0 cpu 1 bringup(cpu1); set_cpu_online(smp_processor_id(), true); local_irq_enable(); while (!cpu_online(cpu1)); timer_interrupt() -> wake_up(softirq_thread_cpu1); -> enqueue_on(softirq_thread_cpu1, cpu0); ^^^^ cpu_notify(CPU_ONLINE, cpu1); -> sched_cpu_active(cpu1) -> set_cpu_active((cpu1, true); When an interrupt happens before the cpu_active bit is set by the cpu which brought up the newly onlined cpu, then the scheduler refuses to enqueue the woken thread which is bound to that newly onlined cpu on that newly onlined cpu due to the not yet set cpu_active bit and selects a fallback runqueue. Not really an expected and desirable behaviour. So far this has only been observed with forced hard/softirq threading, but in theory this could happen without forced threaded hard/softirqs as well. It's probably unobservable as it would take a massive interrupt storm on the newly onlined cpu which causes the softirq loop to wake up the softirq thread and an even longer delay of the cpu which waits for the cpu_online bit. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@kernel.org # 2.6.39
2011-05-30x86: Fix mwait_play_dead() faulting on mwait-incapable cpusAvi Kivity
A logic error in mwait_play_dead() causes the kernel to use mwait even on cpus which don't support it, such as KVM virtual cpus. Introduced by: 349c004e3d31: x86: A fast way to check capabilities of the current cpu Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=36222 Reported-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1306758237-9327-1-git-send-email-avi@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-29Merge branch 'idle-release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param x86 idle: deprecate "no-hlt" cmdline param x86 idle APM: deprecate CONFIG_APM_CPU_IDLE x86 idle floppy: deprecate disable_hlt() x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it x86 idle: clarify AMD erratum 400 workaround idle governor: Avoid lock acquisition to read pm_qos before entering idle cpuidle: menu: fixed wrapping timers at 4.294 seconds
2011-05-29x86 idle: clarify AMD erratum 400 workaroundLen Brown
The workaround for AMD erratum 400 uses the term "c1e" falsely suggesting: 1. Intel C1E is somehow involved 2. All AMD processors with C1E are involved Use the string "amd_c1e" instead of simply "c1e" to clarify that this workaround is specific to AMD's version of C1E. Use the string "e400" to clarify that the workaround is specific to AMD processors with Erratum 400. This patch is text-substitution only, with no functional change. cc: x86@kernel.org Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-03-29x86: A fast way to check capabilities of the current cpuChristoph Lameter
Add this_cpu_has() which determines if the current cpu has a certain ability using a segment prefix and a bit test operation. For that we need to add bit operations to x86s percpu.h. Many uses of cpu_has use a pointer passed to a function to determine the current flags. That is no longer necessary after this patch. However, this patch only converts the straightforward cases where cpu_has is used with this_cpu_ptr. The rest is work for later. -tj: Rolled up patch to add x86_ prefix and use percpu_read() instead of percpu_read_stable(). Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2011-03-16Merge branch 'x86-trampoline-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix binutils-2.21 symbol related build failures x86-64, trampoline: Remove unused variable x86, reboot: Fix the use of passed arguments in 32-bit BIOS reboot x86, reboot: Move the real-mode reboot code to an assembly file x86: Make the GDT_ENTRY() macro in <asm/segment.h> safe for assembly x86, trampoline: Use the unified trampoline setup for ACPI wakeup x86, trampoline: Common infrastructure for low memory trampolines Fix up trivial conflicts in arch/x86/kernel/Makefile
2011-03-15Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (93 commits) x86, tlb, UV: Do small micro-optimization for native_flush_tlb_others() x86-64, NUMA: Don't call numa_set_distanc() for all possible node combinations during emulation x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation() x86-64, NUMA: Clean up initmem_init() x86-64, NUMA: Fix numa_emulation code with node0 without RAM x86-64, NUMA: Revert NUMA affine page table allocation x86: Work around old gas bug x86-64, NUMA: Better explain numa_distance handling x86-64, NUMA: Fix distance table handling mm: Move early_node_map[] reverse scan helpers under HAVE_MEMBLOCK x86-64, NUMA: Fix size of numa_distance array x86: Rename e820_table_* to pgt_buf_* bootmem: Move __alloc_memory_core_early() to nobootmem.c bootmem: Move contig_page_data definition to bootmem.c/nobootmem.c bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance() x86-64, NUMA: Add proper function comments to global functions x86-64, NUMA: Move NUMA emulation into numa_emulation.c x86-64, NUMA: Prepare numa_emulation() for moving NUMA emulation into a separate file x86-64, NUMA: Do not scan two times for setup_node_bootmem() ... Fix up conflicts in arch/x86/kernel/smpboot.c
2011-03-15Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits) x86: Enable forced interrupt threading support x86: Mark low level interrupts IRQF_NO_THREAD x86: Use generic show_interrupts x86: ioapic: Avoid redundant lookup of irq_cfg x86: ioapic: Use new move_irq functions x86: Use the proper accessors in fixup_irqs() x86: ioapic: Use irq_data->state x86: ioapic: Simplify irq chip and handler setup x86: Cleanup the genirq name space genirq: Add chip flag to force mask on suspend genirq: Add desc->irq_data accessor genirq: Add comments to Kconfig switches genirq: Fixup fasteoi handler for oneshot mode genirq: Provide forced interrupt threading sched: Switch wait_task_inactive to schedule_hrtimeout() genirq: Add IRQF_NO_THREAD genirq: Allow shared oneshot interrupts genirq: Prepare the handling of shared oneshot interrupts genirq: Make warning in handle_percpu_event useful x86: ioapic: Move trigger defines to io_apic.h ... Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name space changes clashing with the Xen cleanups. The set_irq_msi() had moved to xen_bind_pirq_msi_to_irq().
2011-03-15Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix and clean up generic_processor_info() x86: Don't copy per_cpu cpuinfo for BSP two times x86: Move llc_shared_map out of cpu_info
2011-03-15Merge commit 'v2.6.38' into x86/mmIngo Molnar
Conflicts: arch/x86/mm/numa_64.c Merge reason: Resolve the conflict, update the branch to .38. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-23x86: Rework arch_disable_smp_support() for x86Henrik Kretzschmar
Currently arch_disable_smp_support() on x86 disables only the support for the IOAPIC and is also compiled in if SMP-support is not. Therefore this function is renamed to disable_ioapic_support(), which meets its purpose and is only compiled in the kernel when IOAPIC support is also. A new arch_disable_smp_support() is created in smpboot.c, which calls disable_ioapic_support() and gets only compiled in the kernel when SMP support is also. Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de> LKML-Reference: <1298385487-4708-3-git-send-email-henne@nachtwindheim.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-17x86, trampoline: Common infrastructure for low memory trampolinesH. Peter Anvin
Common infrastructure for low memory trampolines. This code installs the trampolines permanently in low memory very early. It also permits multiple pieces of code to be used for this purpose. This code also introduces a standard infrastructure for computing symbol addresses in the trampoline code. The only change to the actual SMP trampolines themselves is that the 64-bit trampoline has been made reusable -- the previous version would overwrite the code with a status variable; this moves the status variable to a separate location. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <4D5DFBE4.7090104@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Matthieu Castet <castet.matthieu@free.fr> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2011-02-16Merge branch 'x86/amd-nb' into x86/mmIngo Molnar
Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts with ongoing NUMA work. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14Merge commit 'v2.6.38-rc4' into x86/numaIngo Molnar
Merge reason: Merge latest fixes before applying new patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14Merge commit 'v2.6.38-rc4' into x86/cpuIngo Molnar
Merge reason: pick up the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>