Age | Commit message (Collapse) | Author |
|
Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as
the sum of kernel_percpu + PERCPU_MODULE_RESERVE. This is now common
to all architectures; if an architecture wants to set
PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is
the only one which does).
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
|
|
All were already in some header
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
machine_ops is an interface for the machine_* functions defined in
<linux/reboot.h>. This is intended to allow hypervisors to intercept
the reboot process, but it could be used to implement other x86
subarchtecture reboots.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Add a smp_ops interface. This abstracts the API defined by
<linux/smp.h> for use within arch/i386. The primary intent is that it
be used by a paravirtualizing hypervisor to implement SMP, but it
could also be used by non-APIC-using sub-architectures.
This is related to CONFIG_PARAVIRT, but is implemented unconditionally
since it is simpler that way and not a highly performance-sensitive
interface.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
|
|
Now we have an explicit per-cpu GDT variable, we don't need to keep the
descriptors around to use them to find the GDT: expose cpu_gdt directly.
We could go further and make load_gdt() pack the descriptor for us, or even
assume it means "load the current cpu's GDT" which is what it always does.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Fix "Section mismatch" warnings in arch/x86_64/kernel/time.c
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
commit 5e518d7672dea4cd7c60871e40d0490c52f01d13 did the same change to
i386's variant.
With this change, i386's and x86-64's versions are identical, raising
the question whether the x86-64 one should go (just like there's only
one instance of edd.S).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
operation.
This patch touches the NMI watchdog every MAX_ORDER_NR_PAGES
to inhibit the machine from triggering an NMI while the CPUs
are locked. This situation is happening on boxes with more
than 64CPUs and 128GB of RAM when Alt-SysRq-m is performed.
It has been succesfully tested for regression on uni, 2, 4, 8
32, and 64 CPU boxes with various memory configuration.
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Current vsyscall_gtod_data is large (3 or 4 cache lines dirtied at timer
interrupt). We can shrink it to exactly 64 bytes (1 cache line on AMD64)
Instead of copying a whole struct clocksource, we copy only needed fields.
I deleted an unused field : offset_base
This patch fixes one oddity in vgettimeofday(): It can returns a timeval with
tv_usec = 1000000. Maybe not a bug, but why not doing the right thing ?
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
There is a tiny probability that the return value from vtime(time_t *t) is
Signed-off-by: Andi Kleen <ak@suse.de>
different than the value stored in *t
Using a temporary variable solves the problem and gives a faster code.
17: 48 85 ff test %rdi,%rdi
1a: 48 8b 05 00 00 00 00 mov 0(%rip),%rax #
__vsyscall_gtod_data.wall_time_tv.tv_sec
21: 74 03 je 26
23: 48 89 07 mov %rax,(%rdi)
26: c9 leaveq
27: c3 retq
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
|
|
Many years ago, UNEXPECTED_IO_APIC() contained printk()'s (but nothing more).
Now that it's completely empty for years, we can as well remove it.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
- there's no reason for duplicating the prototype from
include/linux/syscalls.h in include/asm-x86_64/unistd.h
- every file should #include the headers containing the prototypes for
it's global functions
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
lists management.
x86_64 currently simulates a list using the index and private fields of the
page struct. Seems that the code was inherited from i386. But x86_64 does
not use the slab to allocate pgds and pmds etc. So the lru field is not
used by the slab and therefore available.
This patch uses standard list operations on page->lru to realize pgd
tracking.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
suggested by Jan Beulich
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Move X86_EFLAGS_IF et al out to a new header: processor-flags.h, so we
can include it from irqflags.h and use it in raw_irqs_disabled_flags().
As a side-effect, we could now use these flags in .S files.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
On x86-64, kernel memory freed after init can be entirely unmapped instead
of just getting 'poisoned' by overwriting with a debug pattern.
On i386 and x86-64 (under CONFIG_DEBUG_RODATA), kernel text and bug table
can also be write-protected.
Compared to the first version, this one prevents re-creating deleted
mappings in the kernel image range on x86-64, if those got removed
previously. This, together with the original changes, prevents temporarily
having inconsistent mappings when cacheability attributes are being
changed on such pages (e.g. from AGP code). While on i386 such duplicate
mappings don't exist, the same change is done there, too, both for
consistency and because checking pte_present() before using various other
pte_XXX functions is a requirement anyway. At once, i386 code gets
adjusted to use pte_huge() instead of open coding this.
AK: split out cpa() changes
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Fix various broken corner cases in i386 and x86-64 change_page_attr.
AK: split off from tighten kernel image access rights
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
paravirt.c used to implement native versions of all low-level
functions. Far cleaner is to have the native versions exposed in the
headers and as inline native_XXX, and if !CONFIG_PARAVIRT, then simply
#define XXX native_XXX.
There are several nice side effects:
1) write_dt_entry() now takes the correct "struct Xgt_desc_struct *"
not "void *".
2) load_TLS is reintroduced to the for loop, not manually unrolled
with a #error in case the bounds ever change.
3) Macros become inlines, with type checking.
4) Access to the native versions is trivial for KVM, lguest, Xen and
others who might want it.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Rename boot_gdt_table to boot_gdt to avoid the duplicate T(able).
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We now have cpu_init() and secondary_cpu_init() doing nothing but calling
_cpu_init() with the same arguments. Rename _cpu_init() to cpu_init() and use
it as a replcement for secondary_cpu_init().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now we are no longer dynamically allocating the GDT, we don't need the
"cpu_gdt_table" at all: we can switch straight from "boot_gdt_table" to the
per-cpu GDT. This means initializing the cpu_gdt array in C.
The boot CPU uses the per-cpu var directly, then in smp_prepare_cpus() it
switches to the per-cpu copy just allocated. For secondary CPUs, the
early_gdt_descr is set to point directly to their per-cpu copy.
For UP the code is very simple: it keeps using the "per-cpu" GDT as per SMP,
but we never have to move.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Allocating PDA and GDT at boot is a pain. Using simple per-cpu variables adds
happiness (although we need the GDT page-aligned for Xen, which we do in a
followup patch).
[akpm@linux-foundation.org: build fix]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Because the command line is increased to 2048 characters after 2.6.21, it's
not possible for boot loaders and userspace tools to determine the length
of the command line the kernel can understand. The benefit of knowing the
length is that users can be warned if the command line size is too long
which prevents surprise if things don't work after bootup.
This patch updates the boot protocol to contain a field called
"cmdline_size" that contain the length of the command line (excluding the
terminating zero).
The patch also adds missing fields (of protocol version 2.05) to the x86_64
setup code.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Alon Bar-Lev <alon.barlev@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The specific case I am encountering is kdump under Xen with a 64 bit
hypervisor and 32 bit kernel/userspace. The dump created is 64 bit due to
the hypervisor but the dump kernel is 32 bit for maximum compatibility.
It's possibly less likely to be useful in a purely native scenario but I
see no reason to disallow it.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Horms <horms@verge.net.au>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
GCC (4.1 at least) unrolls it anyway, but I can't believe this code
was ever justifiable. (I've also submitted a patch which cleans up
i386, which is even uglier).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Whenever we schedule, __switch_to calls load_esp0 which does:
tss->esp0 = thread->esp0;
This is never initialized for the initial thread (ie "swapper"), so when we're
scheduling that, we end up setting esp0 to 0. This is fine: the swapper never
leaves ring 0, so this field is never used.
lguest, however, gets upset that we're trying to used an unmapped page as our
kernel stack. Rather than work around it there, let's initialize it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The lguest patches somehow managed to trigger this:
In file included from arch/i386/lguest/lguest.c:38:
include/asm/asm-offsets.h:67:1: warning: "VDSO_PRELINK" redefined
In file included from include/linux/elf.h:7,
from include/linux/module.h:15,
from include/linux/device.h:21,
from include/linux/interrupt.h:15,
from arch/i386/lguest/lguest.c:27:
include/asm/elf.h:140:1: warning: this is the location of the previous definition
I assume that using the same identifier twice was a bad idea..
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Create a document to explain how to use numa=fake in conjunction with cpusets
for coarse memory resource management.
An attempt to get more awareness and testing for this feature.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
remove the reporting of the constant_tsc flag from the "power management"
field in /proc/cpuinfo. The NULL value there was replaced by "" because
the former would result in a printout of [8] if the flag is set.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Extends the numa=fake x86_64 command-line option to split the remaining system
memory into nodes of fixed size. Any leftover memory is allocated to a final
node unless the command-line ends with a comma.
For example:
numa=fake=2*512,*128 gives two 512M nodes and the remaining system
memory is split into nodes of 128M each.
This is beneficial for systems where the exact size of RAM is unknown or not
necessarily relevant, but the size of the remaining nodes to be allocated is
known based on their capacity for resource management.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Extends the numa=fake x86_64 command-line option to split the remaining
system memory into equal-sized nodes.
For example:
numa=fake=2*512,4* gives two 512M nodes and the remaining system
memory is split into four approximately equal
chunks.
This is beneficial for systems where the exact size of RAM is unknown or not
necessarily relevant, but the granularity with which nodes shall be allocated
is known.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Extends the numa=fake x86_64 command-line option to allow for configurable
node sizes. These nodes can be used in conjunction with cpusets for coarse
memory resource management.
The old command-line option is still supported:
numa=fake=32 gives 32 fake NUMA nodes, ignoring the NUMA setup of the
actual machine.
But now you may configure your system for the node sizes of your choice:
numa=fake=2*512,1024,2*256
gives two 512M nodes, one 1024M node, two 256M nodes, and
the rest of system memory to a sixth node.
The existing hash function is maintained to support the various node sizes
that are possible with this implementation.
Each node of the same size receives roughly the same amount of available
pages, regardless of any reserved memory with its address range. The total
available pages on the system is calculated and divided by the number of equal
nodes to allocate. These nodes are then dynamically allocated and their
borders extended until such time as their number of available pages reaches
the required size.
Configurable node sizes are recommended when used in conjunction with cpusets
for memory control because it eliminates the overhead associated with scanning
the zonelists of many smaller full nodes on page_alloc().
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Fix comments to represent the true number of quadwords in GDT.
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This patch makes the needlessly global vmi_pmd_clear() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Reorder code to avoid multiple inclusion of elf.h.
#undef several symbols to avoid build errors over redefinitions.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Change mark_tsc_unstable() so it takes a string argument, which holds the
reason the TSC was marked unstable.
This is then displayed the first time mark_tsc_unstable is called.
This should help us better debug why the TSC was marked unstable on certain
systems and allow us to make sure we're not being overly paranoid when
throwing out this troublesome clocksource.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Work around a warning with -Wmissing-prototypes in
arch/i386/kernel/asm-offsets.c
The warning isn't gcc's fault - asm-offsets.c is simply a special file.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
clean up unneeded type cast by properly declare data type.
Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
o Modpost generates warnings for i386 if compiled with CONFIG_RELOCATABLE=y
WARNING: vmlinux - Section mismatch: reference to .init.text:find_unisys_acpi_oem_table from .text between 'acpi_madt_oem_check' (at offset 0xc0101eda) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:acpi_get_table_header_early from .text between 'acpi_madt_oem_check' (at offset 0xc0101ef0) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'acpi_madt_oem_check' (at offset 0xc0101f2e) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:setup_unisys from .text between 'acpi_madt_oem_check' (at offset 0xc0101f37) and 'enable_apic_mode'WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'mps_oem_check' (at offset 0xc0101ec7) and 'acpi_madt_oem_check'
WARNING: vmlinux - Section mismatch: reference to .init.text:es7000_sw_apic from .text between 'enable_apic_mode' (at offset 0xc0101f48) and 'check_apicid_present'
o Some functions which are inline (acpi_madt_oem_check) are not inlined by
compiler as these functions are accessed using function pointer. These
functions are put in .text section and they in-turn access __init type
functions hence modpost generates warnings.
o Do not iniline acpi_madt_oem_check, instead make it __init.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Enable system hashtable memory to be distributed among nodes on x86_64 NUMA
Forcing the kernel to use node interleaved vmalloc instead of bootmem for
the system hashtable memory (alloc_large_system_hash) reduces the memory
imbalance on node 0 by around 40MB on a 8 node x86_64 NUMA box:
Before the following patch, on bootup of a 8 node box:
Node 0 MemTotal: 3407488 kB
Node 0 MemFree: 3206296 kB
Node 0 MemUsed: 201192 kB
Node 0 Active: 7012 kB
Node 0 Inactive: 512 kB
Node 0 Dirty: 0 kB
Node 0 Writeback: 0 kB
Node 0 FilePages: 1912 kB
Node 0 Mapped: 420 kB
Node 0 AnonPages: 5612 kB
Node 0 PageTables: 468 kB
Node 0 NFS_Unstable: 0 kB
Node 0 Bounce: 0 kB
Node 0 Slab: 5408 kB
Node 0 SReclaimable: 644 kB
Node 0 SUnreclaim: 4764 kB
After the patch (or using hashdist=1 on the kernel command line):
Node 0 MemTotal: 3407488 kB
Node 0 MemFree: 3247608 kB
Node 0 MemUsed: 159880 kB
Node 0 Active: 3012 kB
Node 0 Inactive: 616 kB
Node 0 Dirty: 0 kB
Node 0 Writeback: 0 kB
Node 0 FilePages: 2424 kB
Node 0 Mapped: 380 kB
Node 0 AnonPages: 1200 kB
Node 0 PageTables: 396 kB
Node 0 NFS_Unstable: 0 kB
Node 0 Bounce: 0 kB
Node 0 Slab: 6304 kB
Node 0 SReclaimable: 1596 kB
Node 0 SUnreclaim: 4708 kB
I guess it is a good idea to keep HASHDIST_DEFAULT "on" for x86_64 NUMA
since x86_64 has no dearth of vmalloc space? Or maybe enable hash
distribution for all 64bit NUMA arches? The following patch does it only
for x86_64.
I ran a HPC MPI benchmark -- 'Ansys wingsolid', which takes up quite a bit of
memory and uses up tlb entries. This was on a 4 way, 2 socket
Tyan AMD box (non vsmp), with 8G total memory (4G pernode).
The results with and without hash distribution are:
1. Vanilla - runtime of 1188.000s
2. With hashdist=1 runtime of 1154.000s
Oprofile output for the duration of run is:
1. Vanilla:
PU: AMD64 processors, speed 2411.16 MHz (estimated)
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
mask of 0x00 (No unit mask) count 500
samples % app name symbol name
163054 6.5513 libansys1.so MultiFront::decompose(int, int,
Elemset *, int *, int, int, int)
162061 6.5114 libansys3.so blockSaxpy6L_fd
162042 6.5107 libansys3.so blockInnerProduct6L_fd
156286 6.2794 libansys3.so maxb33_
87879 3.5309 libansys1.so elmatrixmultpcg_
84857 3.4095 libansys4.so saxpy_pcg
58637 2.3560 libansys4.so .st4560
46612 1.8728 libansys4.so .st4282
43043 1.7294 vmlinux-t copy_user_generic_string
41326 1.6604 libansys3.so blockSaxpyBackSolve6L_fd
41288 1.6589 libansys3.so blockInnerProductBackSolve6L_fd
2. With hashdist=1
CPU: AMD64 processors, speed 2411.13 MHz (estimated)
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
mask of 0x00 (No unit mask) count 500
samples % app name symbol name
162993 6.9814 libansys1.so MultiFront::decompose(int, int,
Elemset *, int *, int, int, int)
160799 6.8874 libansys3.so blockInnerProduct6L_fd
160459 6.8729 libansys3.so blockSaxpy6L_fd
156018 6.6826 libansys3.so maxb33_
84700 3.6279 libansys4.so saxpy_pcg
83434 3.5737 libansys1.so elmatrixmultpcg_
58074 2.4875 libansys4.so .st4560
46000 1.9703 libansys4.so .st4282
41166 1.7632 libansys3.so blockSaxpyBackSolve6L_fd
41033 1.7575 libansys3.so blockInnerProductBackSolve6L_fd
35762 1.5318 libansys1.so inner_product_sub
35591 1.5245 libansys1.so inner_product_sub2
28259 1.2104 libansys4.so addVectors
Signed-off-by: Pravin B. Shelar <pravin.shelar@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Christoph Lameter <clameter@engr.sgi.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Previously it wasn't enabled in the binfmt_aout is a module case.
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
When compiling with -Os (which is default) the compiler defaults to it
anyways. And with -O2 it probably generates somewhat better (although
also larger) code.
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
Fix for the following patch. Provide dummy cpufreq functions when
CPUFREQ is not compiled in.
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
|