diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/blackfin/00-INDEX | 11 | ||||
-rw-r--r-- | Documentation/blackfin/Filesystems | 169 | ||||
-rw-r--r-- | Documentation/blackfin/cache-lock.txt | 48 | ||||
-rw-r--r-- | Documentation/blackfin/cachefeatures.txt | 65 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 31 | ||||
-rw-r--r-- | Documentation/pcmcia/driver.txt | 30 | ||||
-rw-r--r-- | Documentation/power/interface.txt | 8 | ||||
-rw-r--r-- | Documentation/sysctl/vm.txt | 23 | ||||
-rw-r--r-- | Documentation/vm/slabinfo.c | 943 | ||||
-rw-r--r-- | Documentation/vm/slub.txt | 113 |
10 files changed, 1418 insertions, 23 deletions
diff --git a/Documentation/blackfin/00-INDEX b/Documentation/blackfin/00-INDEX new file mode 100644 index 00000000000..7cb3b356b24 --- /dev/null +++ b/Documentation/blackfin/00-INDEX @@ -0,0 +1,11 @@ +00-INDEX + - This file + +cache-lock.txt + - HOWTO for blackfin cache locking. + +cachefeatures.txt + - Supported cache features. + +Filesystems + - Requirements for mounting the root file system. diff --git a/Documentation/blackfin/Filesystems b/Documentation/blackfin/Filesystems new file mode 100644 index 00000000000..51260a1b803 --- /dev/null +++ b/Documentation/blackfin/Filesystems @@ -0,0 +1,169 @@ +/* + * File: Documentation/blackfin/Filesystems + * Based on: + * Author: + * + * Created: + * Description: This file contains the simple DMA Implementation for Blackfin + * + * Rev: $Id: Filesystems 2384 2006-11-01 04:12:43Z magicyang $ + * + * Modified: + * Copyright 2004-2006 Analog Devices Inc. + * + * Bugs: Enter bugs at http://blackfin.uclinux.org/ + * + */ + + How to mount the root file system in uClinux/Blackfin + ----------------------------------------------------- + +1 Mounting EXT3 File system. + ------------------------ + + Creating an EXT3 File system for uClinux/Blackfin: + + +Please follow the steps to form the EXT3 File system and mount the same as root +file system. + +a Make an ext3 file system as large as you want the final root file + system. + + mkfs.ext3 /dev/ram0 <your-rootfs-size-in-1k-blocks> + +b Mount this Empty file system on a free directory as: + + mount -t ext3 /dev/ram0 ./test + where ./test is the empty directory. + +c Copy your root fs directory that you have so carefully made over. + + cp -af /tmp/my_final_rootfs_files/* ./test + + (For ex: cp -af uClinux-dist/romfs/* ./test) + +d If you have done everything right till now you should be able to see + the required "root" dir's (that's etc, root, bin, lib, sbin...) + +e Now unmount the file system + + umount ./test + +f Create the root file system image. + + dd if=/dev/ram0 bs=1k count=<your-rootfs-size-in-1k-blocks> \ + > ext3fs.img + + +Now you have to tell the kernel that will be mounting this file system as +rootfs. +So do a make menuconfig under kernel and select the Ext3 journaling file system +support under File system --> submenu. + + +2. Mounting EXT2 File system. + ------------------------- + +By default the ext2 file system image will be created if you invoke make from +the top uClinux-dist directory. + + +3. Mounting CRAMFS File System + ---------------------------- + +To create a CRAMFS file system image execute the command + + mkfs.cramfs ./test cramfs.img + + where ./test is the target directory. + + +4. Mounting ROMFS File System + -------------------------- + +To create a ROMFS file system image execute the command + + genromfs -v -V "ROMdisk" -f romfs.img -d ./test + + where ./test is the target directory + + +5. Mounting the JFFS2 Filesystem + ----------------------------- + +To create a compressed JFFS filesystem (JFFS2), please execute the command + + mkfs.jffs2 -d ./test -o jffs2.img + + where ./test is the target directory. + +However, please make sure the following is in your kernel config. + +/* + * RAM/ROM/Flash chip drivers + */ +#define CONFIG_MTD_CFI 1 +#define CONFIG_MTD_ROM 1 +/* + * Mapping drivers for chip access + */ +#define CONFIG_MTD_COMPLEX_MAPPINGS 1 +#define CONFIG_MTD_BF533 1 +#undef CONFIG_MTD_UCLINUX + +Through the u-boot boot loader, use the jffs2.img in the corresponding +partition made in linux-2.6.x/drivers/mtd/maps/bf533_flash.c. + +NOTE - Currently the Flash driver is available only for EZKIT. Watch out for a + STAMP driver soon. + + +6. Mounting the NFS File system + ----------------------------- + + For mounting the NFS please do the following in the kernel config. + + In Networking Support --> Networking options --> TCP/IP networking --> + IP: kernel level autoconfiguration + + Enable BOOTP Support. + + In Kernel hacking --> Compiled-in kernel boot parameter add the following + + root=/dev/nfs rw ip=bootp + + In File system --> Network File system, Enable + + NFS file system support --> NFSv3 client support + Root File system on NFS + + in uClibc menuconfig, do the following + In Networking Support + enable Remote Procedure Call (RPC) support + Full RPC Support + + On the Host side, ensure that /etc/dhcpd.conf looks something like this + + ddns-update-style ad-hoc; + allow bootp; + subnet 10.100.4.0 netmask 255.255.255.0 { + default-lease-time 122209600; + max-lease-time 31557600; + group { + host bf533 { + hardware ethernet 00:CF:52:49:C3:01; + fixed-address 10.100.4.50; + option root-path "/home/nfsmount"; + } + } + + ensure that /etc/exports looks something like this + /home/nfsmount *(rw,no_root_squash,no_all_squash) + + run the following commands as root (may differ depending on your + distribution) : + - service nfs start + - service portmap start + - service dhcpd start + - /usr/sbin/exportfs diff --git a/Documentation/blackfin/cache-lock.txt b/Documentation/blackfin/cache-lock.txt new file mode 100644 index 00000000000..88ba1e6c31c --- /dev/null +++ b/Documentation/blackfin/cache-lock.txt @@ -0,0 +1,48 @@ +/* + * File: Documentation/blackfin/cache-lock.txt + * Based on: + * Author: + * + * Created: + * Description: This file contains the simple DMA Implementation for Blackfin + * + * Rev: $Id: cache-lock.txt 2384 2006-11-01 04:12:43Z magicyang $ + * + * Modified: + * Copyright 2004-2006 Analog Devices Inc. + * + * Bugs: Enter bugs at http://blackfin.uclinux.org/ + * + */ + +How to lock your code in cache in uClinux/blackfin +-------------------------------------------------- + +There are only a few steps required to lock your code into the cache. +Currently you can lock the code by Way. + +Below are the interface provided for locking the cache. + + +1. cache_grab_lock(int Ways); + +This function grab the lock for locking your code into the cache specified +by Ways. + + +2. cache_lock(int Ways); + +This function should be called after your critical code has been executed. +Once the critical code exits, the code is now loaded into the cache. This +function locks the code into the cache. + + +So, the example sequence will be: + + cache_grab_lock(WAY0_L); /* Grab the lock */ + + critical_code(); /* Execute the code of interest */ + + cache_lock(WAY0_L); /* Lock the cache */ + +Where WAY0_L signifies WAY0 locking. diff --git a/Documentation/blackfin/cachefeatures.txt b/Documentation/blackfin/cachefeatures.txt new file mode 100644 index 00000000000..0fbec23becb --- /dev/null +++ b/Documentation/blackfin/cachefeatures.txt @@ -0,0 +1,65 @@ +/* + * File: Documentation/blackfin/cachefeatures.txt + * Based on: + * Author: + * + * Created: + * Description: This file contains the simple DMA Implementation for Blackfin + * + * Rev: $Id: cachefeatures.txt 2384 2006-11-01 04:12:43Z magicyang $ + * + * Modified: + * Copyright 2004-2006 Analog Devices Inc. + * + * Bugs: Enter bugs at http://blackfin.uclinux.org/ + * + */ + + - Instruction and Data cache initialization. + icache_init(); + dcache_init(); + + - Instruction and Data cache Invalidation Routines, when flushing the + same is not required. + _icache_invalidate(); + _dcache_invalidate(); + + Also, for invalidating the entire instruction and data cache, the below + routines are provided (another method for invalidation, refer page no 267 and 287 of + ADSP-BF533 Hardware Reference manual) + + invalidate_entire_dcache(); + invalidate_entire_icache(); + + -External Flushing of Instruction and data cache routines. + + flush_instruction_cache(); + flush_data_cache(); + + - Internal Flushing of Instruction and Data Cache. + + icplb_flush(); + dcplb_flush(); + + - Locking the cache. + + cache_grab_lock(); + cache_lock(); + + Please refer linux-2.6.x/Documentation/blackfin/cache-lock.txt for how to + lock the cache. + + Locking the cache is optional feature. + + - Miscellaneous cache functions. + + flush_cache_all(); + flush_cache_mm(); + invalidate_dcache_range(); + flush_dcache_range(); + flush_dcache_page(); + flush_cache_range(); + flush_cache_page(); + invalidate_dcache_range(); + flush_page_to_ram(); + diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 7aaf09b86a5..3f4b226572e 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -122,21 +122,22 @@ subdirectory has the entries listed in Table 1-1. Table 1-1: Process specific entries in /proc .............................................................................. - File Content - cmdline Command line arguments - cpu Current and last cpu in which it was executed (2.4)(smp) - cwd Link to the current working directory - environ Values of environment variables - exe Link to the executable of this process - fd Directory, which contains all file descriptors - maps Memory maps to executables and library files (2.4) - mem Memory held by this process - root Link to the root directory of this process - stat Process status - statm Process memory status information - status Process status in human readable form - wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan - smaps Extension based on maps, presenting the rss size for each mapped file + File Content + clear_refs Clears page referenced bits shown in smaps output + cmdline Command line arguments + cpu Current and last cpu in which it was executed (2.4)(smp) + cwd Link to the current working directory + environ Values of environment variables + exe Link to the executable of this process + fd Directory, which contains all file descriptors + maps Memory maps to executables and library files (2.4) + mem Memory held by this process + root Link to the root directory of this process + stat Process status + statm Process memory status information + status Process status in human readable form + wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan + smaps Extension based on maps, the rss size for each mapped file .............................................................................. For example, to get the status information of a process, all you have to do is diff --git a/Documentation/pcmcia/driver.txt b/Documentation/pcmcia/driver.txt new file mode 100644 index 00000000000..0ac16792077 --- /dev/null +++ b/Documentation/pcmcia/driver.txt @@ -0,0 +1,30 @@ +PCMCIA Driver +------------- + + +sysfs +----- + +New PCMCIA IDs may be added to a device driver pcmcia_device_id table at +runtime as shown below: + +echo "match_flags manf_id card_id func_id function device_no \ +prod_id_hash[0] prod_id_hash[1] prod_id_hash[2] prod_id_hash[3]" > \ +/sys/bus/pcmcia/drivers/{driver}/new_id + +All fields are passed in as hexadecimal values (no leading 0x). +The meaning is described in the PCMCIA specification, the match_flags is +a bitwise or-ed combination from PCMCIA_DEV_ID_MATCH_* constants +defined in include/linux/mod_devicetable.h. + +Once added, the driver probe routine will be invoked for any unclaimed +PCMCIA device listed in its (newly updated) pcmcia_device_id list. + +A common use-case is to add a new device according to the manufacturer ID +and the card ID (form the manf_id and card_id file in the device tree). +For this, just use: + +echo "0x3 manf_id card_id 0 0 0 0 0 0 0" > \ + /sys/bus/pcmcia/drivers/{driver}/new_id + +after loading the driver. diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt index 8c5b41bf3f3..fd5192a8fa8 100644 --- a/Documentation/power/interface.txt +++ b/Documentation/power/interface.txt @@ -34,8 +34,12 @@ for 5 seconds, resume devices, unfreeze tasks and enable nonboot CPUs. Then, we are able to look in the log messages and work out, for example, which code is being slow and which device drivers are misbehaving. -Reading from this file will display what the mode is currently set -to. Writing to this file will accept one of +Reading from this file will display all supported modes and the currently +selected one in brackets, for example + + [shutdown] reboot test testproc + +Writing to this file will accept one of 'platform' (only if the platform supports it) 'shutdown' diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index e96a341eb7e..1d192565e18 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -197,11 +197,22 @@ and may not be fast. panic_on_oom -This enables or disables panic on out-of-memory feature. If this is set to 1, -the kernel panics when out-of-memory happens. If this is set to 0, the kernel -will kill some rogue process, called oom_killer. Usually, oom_killer can kill -rogue processes and system will survive. If you want to panic the system -rather than killing rogue processes, set this to 1. +This enables or disables panic on out-of-memory feature. -The default value is 0. +If this is set to 0, the kernel will kill some rogue process, +called oom_killer. Usually, oom_killer can kill rogue processes and +system will survive. + +If this is set to 1, the kernel panics when out-of-memory happens. +However, if a process limits using nodes by mempolicy/cpusets, +and those nodes become memory exhaustion status, one process +may be killed by oom-killer. No panic occurs in this case. +Because other nodes' memory may be free. This means system total status +may be not fatal yet. +If this is set to 2, the kernel panics compulsorily even on the +above-mentioned. + +The default value is 0. +1 and 2 are for failover of clustering. Please select either +according to your policy of failover. diff --git a/Documentation/vm/slabinfo.c b/Documentation/vm/slabinfo.c new file mode 100644 index 00000000000..41710ccf3a2 --- /dev/null +++ b/Documentation/vm/slabinfo.c @@ -0,0 +1,943 @@ +/* + * Slabinfo: Tool to get reports about slabs + * + * (C) 2007 sgi, Christoph Lameter <clameter@sgi.com> + * + * Compile by: + * + * gcc -o slabinfo slabinfo.c + */ +#include <stdio.h> +#include <stdlib.h> +#include <sys/types.h> +#include <dirent.h> +#include <string.h> +#include <unistd.h> +#include <stdarg.h> +#include <getopt.h> +#include <regex.h> + +#define MAX_SLABS 500 +#define MAX_ALIASES 500 +#define MAX_NODES 1024 + +struct slabinfo { + char *name; + int alias; + int refs; + int aliases, align, cache_dma, cpu_slabs, destroy_by_rcu; + int hwcache_align, object_size, objs_per_slab; + int sanity_checks, slab_size, store_user, trace; + int order, poison, reclaim_account, red_zone; + unsigned long partial, objects, slabs; + int numa[MAX_NODES]; + int numa_partial[MAX_NODES]; +} slabinfo[MAX_SLABS]; + +struct aliasinfo { + char *name; + char *ref; + struct slabinfo *slab; +} aliasinfo[MAX_ALIASES]; + +int slabs = 0; +int aliases = 0; +int alias_targets = 0; +int highest_node = 0; + +char buffer[4096]; + +int show_alias = 0; +int show_slab = 0; +int skip_zero = 1; +int show_numa = 0; +int show_track = 0; +int show_first_alias = 0; +int validate = 0; +int shrink = 0; +int show_inverted = 0; +int show_single_ref = 0; +int show_totals = 0; +int sort_size = 0; + +int page_size; + +regex_t pattern; + +void fatal(const char *x, ...) +{ + va_list ap; + + va_start(ap, x); + vfprintf(stderr, x, ap); + va_end(ap); + exit(1); +} + +void usage(void) +{ + printf("slabinfo [-ahnpvtsz] [slab-regexp]\n" + "-a|--aliases Show aliases\n" + "-h|--help Show usage information\n" + "-n|--numa Show NUMA information\n" + "-s|--shrink Shrink slabs\n" + "-v|--validate Validate slabs\n" + "-t|--tracking Show alloc/free information\n" + "-T|--Totals Show summary information\n" + "-l|--slabs Show slabs\n" + "-S|--Size Sort by size\n" + "-z|--zero Include empty slabs\n" + "-f|--first-alias Show first alias\n" + "-i|--inverted Inverted list\n" + "-1|--1ref Single reference\n" + ); +} + +unsigned long read_obj(char *name) +{ + FILE *f = fopen(name, "r"); + + if (!f) + buffer[0] = 0; + else { + if (!fgets(buffer,sizeof(buffer), f)) + buffer[0] = 0; + fclose(f); + if (buffer[strlen(buffer)] == '\n') + buffer[strlen(buffer)] = 0; + } + return strlen(buffer); +} + + +/* + * Get the contents of an attribute + */ +unsigned long get_obj(char *name) +{ + if (!read_obj(name)) + return 0; + + return atol(buffer); +} + +unsigned long get_obj_and_str(char *name, char **x) +{ + unsigned long result = 0; + char *p; + + *x = NULL; + + if (!read_obj(name)) { + x = NULL; + return 0; + } + result = strtoul(buffer, &p, 10); + while (*p == ' ') + p++; + if (*p) + *x = strdup(p); + return result; +} + +void set_obj(struct slabinfo *s, char *name, int n) +{ + char x[100]; + + sprintf(x, "%s/%s", s->name, name); + + FILE *f = fopen(x, "w"); + + if (!f) + fatal("Cannot write to %s\n", x); + + fprintf(f, "%d\n", n); + fclose(f); +} + +/* + * Put a size string together + */ +int store_size(char *buffer, unsigned long value) +{ + unsigned long divisor = 1; + char trailer = 0; + int n; + + if (value > 1000000000UL) { + divisor = 100000000UL; + trailer = 'G'; + } else if (value > 1000000UL) { + divisor = 100000UL; + trailer = 'M'; + } else if (value > 1000UL) { + divisor = 100; + trailer = 'K'; + } + + value /= divisor; + n = sprintf(buffer, "%ld",value); + if (trailer) { + buffer[n] = trailer; + n++; + buffer[n] = 0; + } + if (divisor != 1) { + memmove(buffer + n - 2, buffer + n - 3, 4); + buffer[n-2] = '.'; + n++; + } + return n; +} + +void decode_numa_list(int *numa, char *t) +{ + int node; + int nr; + + memset(numa, 0, MAX_NODES * sizeof(int)); + + while (*t == 'N') { + t++; + node = strtoul(t, &t, 10); + if (*t == '=') { + t++; + nr = strtoul(t, &t, 10); + numa[node] = nr; + if (node > highest_node) + highest_node = node; + } + while (*t == ' ') + t++; + } +} + +void slab_validate(struct slabinfo *s) +{ + set_obj(s, "validate", 1); +} + +void slab_shrink(struct slabinfo *s) +{ + set_obj(s, "shrink", 1); +} + +int line = 0; + +void first_line(void) +{ + printf("Name Objects Objsize Space " + "Slabs/Part/Cpu O/S O %%Fr %%Ef Flg\n"); +} + +/* + * Find the shortest alias of a slab + */ +struct aliasinfo *find_one_alias(struct slabinfo *find) +{ + struct aliasinfo *a; + struct aliasinfo *best = NULL; + + for(a = aliasinfo;a < aliasinfo + aliases; a++) { + if (a->slab == find && + (!best || strlen(best->name) < strlen(a->name))) { + best = a; + if (strncmp(a->name,"kmall", 5) == 0) + return best; + } + } + if (best) + return best; + fatal("Cannot find alias for %s\n", find->name); + return NULL; +} + +unsigned long slab_size(struct slabinfo *s) +{ + return s->slabs * (page_size << s->order); +} + + +void slabcache(struct slabinfo *s) +{ + char size_str[20]; + char dist_str[40]; + char flags[20]; + char *p = flags; + + if (skip_zero && !s->slabs) + return; + + store_size(size_str, slab_size(s)); + sprintf(dist_str,"%lu/%lu/%d", s->slabs, s->partial, s->cpu_slabs); + + if (!line++) + first_line(); + + if (s->aliases) + *p++ = '*'; + if (s->cache_dma) + *p++ = 'd'; + if (s->hwcache_align) + *p++ = 'A'; + if (s->poison) + *p++ = 'P'; + if (s->reclaim_account) + *p++ = 'a'; + if (s->red_zone) + *p++ = 'Z'; + if (s->sanity_checks) + *p++ = 'F'; + if (s->store_user) + *p++ = 'U'; + if (s->trace) + *p++ = 'T'; + + *p = 0; + printf("%-21s %8ld %7d %8s %14s %4d %1d %3ld %3ld %s\n", + s->name, s->objects, s->object_size, size_str, dist_str, + s->objs_per_slab, s->order, + s->slabs ? (s->partial * 100) / s->slabs : 100, + s->slabs ? (s->objects * s->object_size * 100) / + (s->slabs * (page_size << s->order)) : 100, + flags); +} + +void slab_numa(struct slabinfo *s) +{ + int node; + + if (!highest_node) + fatal("No NUMA information available.\n"); + + if (skip_zero && !s->slabs) + return; + + if (!line) { + printf("\nSlab Node "); + for(node = 0; node <= highest_node; node++) + printf(" %4d", node); + printf("\n----------------------"); + for(node = 0; node <= highest_node; node++) + printf("-----"); + printf("\n"); + } + printf("%-21s ", s->name); + for(node = 0; node <= highest_node; node++) { + char b[20]; + + store_size(b, s->numa[node]); + printf(" %4s", b); + } + printf("\n"); + line++; +} + +void show_tracking(struct slabinfo *s) +{ + printf("\n%s: Calls to allocate a slab object\n", s->name); + printf("---------------------------------------------------\n"); + if (read_obj("alloc_calls")) + printf(buffer); + + printf("%s: Calls to free a slab object\n", s->name); + printf("-----------------------------------------------\n"); + if (read_obj("free_calls")) + printf(buffer); + +} + +void totals(void) +{ + struct slabinfo *s; + + int used_slabs = 0; + char b1[20], b2[20], b3[20], b4[20]; + unsigned long long max = 1ULL << 63; + + /* Object size */ + unsigned long long min_objsize = max, max_objsize = 0, avg_objsize; + + /* Number of partial slabs in a slabcache */ + unsigned long long min_partial = max, max_partial = 0, + avg_partial, total_partial = 0; + + /* Number of slabs in a slab cache */ + unsigned long long min_slabs = max, max_slabs = 0, + avg_slabs, total_slabs = 0; + + /* Size of the whole slab */ + unsigned long long min_size = max, max_size = 0, + avg_size, total_size = 0; + + /* Bytes used for object storage in a slab */ + unsigned long long min_used = max, max_used = 0, + avg_used, total_used = 0; + + /* Waste: Bytes used for alignment and padding */ + unsigned long long min_waste = max, max_waste = 0, + avg_waste, total_waste = 0; + /* Number of objects in a slab */ + unsigned long long min_objects = max, max_objects = 0, + avg_objects, total_objects = 0; + /* Waste per object */ + unsigned long long min_objwaste = max, + max_objwaste = 0, avg_objwaste, + total_objwaste = 0; + + /* Memory per object */ + unsigned long long min_memobj = max, + max_memobj = 0, avg_memobj, + total_objsize = 0; + + /* Percentage of partial slabs per slab */ + unsigned long min_ppart = 100, max_ppart = 0, + avg_ppart, total_ppart = 0; + + /* Number of objects in partial slabs */ + unsigned long min_partobj = max, max_partobj = 0, + avg_partobj, total_partobj = 0; + + /* Percentage of partial objects of all objects in a slab */ + unsigned long min_ppartobj = 100, max_ppartobj = 0, + avg_ppartobj, total_ppartobj = 0; + + + for (s = slabinfo; s < slabinfo + slabs; s++) { + unsigned long long size; + unsigned long used; + unsigned long long wasted; + unsigned long long objwaste; + long long objects_in_partial_slabs; + unsigned long percentage_partial_slabs; + unsigned long percentage_partial_objs; + + if (!s->slabs || !s->objects) + continue; + + used_slabs++; + + size = slab_size(s); + used = s->objects * s->object_size; + wasted = size - used; + objwaste = s->slab_size - s->object_size; + + objects_in_partial_slabs = s->objects - + (s->slabs - s->partial - s ->cpu_slabs) * + s->objs_per_slab; + + if (objects_in_partial_slabs < 0) + objects_in_partial_slabs = 0; + + percentage_partial_slabs = s->partial * 100 / s->slabs; + if (percentage_partial_slabs > 100) + percentage_partial_slabs = 100; + + percentage_partial_objs = objects_in_partial_slabs * 100 + / s->objects; + + if (percentage_partial_objs > 100) + percentage_partial_objs = 100; + + if (s->object_size < min_objsize) + min_objsize = s->object_size; + if (s->partial < min_partial) + min_partial = s->partial; + if (s->slabs < min_slabs) + min_slabs = s->slabs; + if (size < min_size) + min_size = size; + if (wasted < min_waste) + min_waste = wasted; + if (objwaste < min_objwaste) + min_objwaste = objwaste; + if (s->objects < min_objects) + min_objects = s->objects; + if (used < min_used) + min_used = used; + if (objects_in_partial_slabs < min_partobj) + min_partobj = objects_in_partial_slabs; + if (percentage_partial_slabs < min_ppart) + min_ppart = percentage_partial_slabs; + if (percentage_partial_objs < min_ppartobj) + min_ppartobj = percentage_partial_objs; + if (s->slab_size < min_memobj) + min_memobj = s->slab_size; + + if (s->object_size > max_objsize) + max_objsize = s->object_size; + if (s->partial > max_partial) + max_partial = s->partial; + if (s->slabs > max_slabs) + max_slabs = s->slabs; + if (size > max_size) + max_size = size; + if (wasted > max_waste) + max_waste = wasted; + if (objwaste > max_objwaste) + max_objwaste = objwaste; + if (s->objects > max_objects) + max_objects = s->objects; + if (used > max_used) + max_used = used; + if (objects_in_partial_slabs > max_partobj) + max_partobj = objects_in_partial_slabs; + if (percentage_partial_slabs > max_ppart) + max_ppart = percentage_partial_slabs; + if (percentage_partial_objs > max_ppartobj) + max_ppartobj = percentage_partial_objs; + if (s->slab_size > max_memobj) + max_memobj = s->slab_size; + + total_partial += s->partial; + total_slabs += s->slabs; + total_size += size; + total_waste += wasted; + + total_objects += s->objects; + total_used += used; + total_partobj += objects_in_partial_slabs; + total_ppart += percentage_partial_slabs; + total_ppartobj += percentage_partial_objs; + + total_objwaste += s->objects * objwaste; + total_objsize += s->objects * s->slab_size; + } + + if (!total_objects) { + printf("No objects\n"); + return; + } + if (!used_slabs) { + printf("No slabs\n"); + return; + } + + /* Per slab averages */ + avg_partial = total_partial / used_slabs; + avg_slabs = total_slabs / used_slabs; + avg_size = total_size / used_slabs; + avg_waste = total_waste / used_slabs; + + avg_objects = total_objects / used_slabs; + avg_used = total_used / used_slabs; + avg_partobj = total_partobj / used_slabs; + avg_ppart = total_ppart / used_slabs; + avg_ppartobj = total_ppartobj / used_slabs; + + /* Per object object sizes */ + avg_objsize = total_used / total_objects; + avg_objwaste = total_objwaste / total_objects; + avg_partobj = total_partobj * 100 / total_objects; + avg_memobj = total_objsize / total_objects; + + printf("Slabcache Totals\n"); + printf("----------------\n"); + printf("Slabcaches : %3d Aliases : %3d->%-3d Active: %3d\n", + slabs, aliases, alias_targets, used_slabs); + + store_size(b1, total_size);store_size(b2, total_waste); + store_size(b3, total_waste * 100 / total_used); + printf("Memory used: %6s # Loss : %6s MRatio: %6s%%\n", b1, b2, b3); + + store_size(b1, total_objects);store_size(b2, total_partobj); + store_size(b3, total_partobj * 100 / total_objects); + printf("# Objects : %6s # PartObj: %6s ORatio: %6s%%\n", b1, b2, b3); + + printf("\n"); + printf("Per Cache Average Min Max Total\n"); + printf("---------------------------------------------------------\n"); + + store_size(b1, avg_objects);store_size(b2, min_objects); + store_size(b3, max_objects);store_size(b4, total_objects); + printf("#Objects %10s %10s %10s %10s\n", + b1, b2, b3, b4); + + store_size(b1, avg_slabs);store_size(b2, min_slabs); + store_size(b3, max_slabs);store_size(b4, total_slabs); + printf("#Slabs %10s %10s %10s %10s\n", + b1, b2, b3, b4); + + store_size(b1, avg_partial);store_size(b2, min_partial); + store_size(b3, max_partial);store_size(b4, total_partial); + printf("#PartSlab %10s %10s %10s %10s\n", + b1, b2, b3, b4); + store_size(b1, avg_ppart);store_size(b2, min_ppart); + store_size(b3, max_ppart); + store_size(b4, total_partial * 100 / total_slabs); + printf("%%PartSlab %10s%% %10s%% %10s%% %10s%%\n", + b1, b2, b3, b4); + + store_size(b1, avg_partobj);store_size(b2, min_partobj); + store_size(b3, max_partobj); + store_size(b4, total_partobj); + printf("PartObjs %10s %10s %10s %10s\n", + b1, b2, b3, b4); + + store_size(b1, avg_ppartobj);store_size(b2, min_ppartobj); + store_size(b3, max_ppartobj); + store_size(b4, total_partobj * 100 / total_objects); + printf("%% PartObj %10s%% %10s%% %10s%% %10s%%\n", + b1, b2, b3, b4); + + store_size(b1, avg_size);store_size(b2, min_size); + store_size(b3, max_size);store_size(b4, total_size); + printf("Memory %10s %10s %10s %10s\n", + b1, b2, b3, b4); + + store_size(b1, avg_used);store_size(b2, min_used); + store_size(b3, max_used);store_size(b4, total_used); + printf("Used %10s %10s %10s %10s\n", + b1, b2, b3, b4); + + store_size(b1, avg_waste);store_size(b2, min_waste); + store_size(b3, max_waste);store_size(b4, total_waste); + printf("Loss %10s %10s %10s %10s\n", + b1, b2, b3, b4); + + printf("\n"); + printf("Per Object Average Min Max\n"); + printf("---------------------------------------------\n"); + + store_size(b1, avg_memobj);store_size(b2, min_memobj); + store_size(b3, max_memobj); + printf("Memory %10s %10s %10s\n", + b1, b2, b3); + store_size(b1, avg_objsize);store_size(b2, min_objsize); + store_size(b3, max_objsize); + printf("User %10s %10s %10s\n", + b1, b2, b3); + + store_size(b1, avg_objwaste);store_size(b2, min_objwaste); + store_size(b3, max_objwaste); + printf("Loss %10s %10s %10s\n", + b1, b2, b3); +} + +void sort_slabs(void) +{ + struct slabinfo *s1,*s2; + + for (s1 = slabinfo; s1 < slabinfo + slabs; s1++) { + for (s2 = s1 + 1; s2 < slabinfo + slabs; s2++) { + int result; + + if (sort_size) + result = slab_size(s1) < slab_size(s2); + else + result = strcasecmp(s1->name, s2->name); + + if (show_inverted) + result = -result; + + if (result > 0) { + struct slabinfo t; + + memcpy(&t, s1, sizeof(struct slabinfo)); + memcpy(s1, s2, sizeof(struct slabinfo)); + memcpy(s2, &t, sizeof(struct slabinfo)); + } + } + } +} + +void sort_aliases(void) +{ + struct aliasinfo *a1,*a2; + + for (a1 = aliasinfo; a1 < aliasinfo + aliases; a1++) { + for (a2 = a1 + 1; a2 < aliasinfo + aliases; a2++) { + char *n1, *n2; + + n1 = a1->name; + n2 = a2->name; + if (show_alias && !show_inverted) { + n1 = a1->ref; + n2 = a2->ref; + } + if (strcasecmp(n1, n2) > 0) { + struct aliasinfo t; + + memcpy(&t, a1, sizeof(struct aliasinfo)); + memcpy(a1, a2, sizeof(struct aliasinfo)); + memcpy(a2, &t, sizeof(struct aliasinfo)); + } + } + } +} + +void link_slabs(void) +{ + struct aliasinfo *a; + struct slabinfo *s; + + for (a = aliasinfo; a < aliasinfo + aliases; a++) { + + for(s = slabinfo; s < slabinfo + slabs; s++) + if (strcmp(a->ref, s->name) == 0) { + a->slab = s; + s->refs++; + break; + } + if (s == slabinfo + slabs) + fatal("Unresolved alias %s\n", a->ref); + } +} + +void alias(void) +{ + struct aliasinfo *a; + char *active = NULL; + + sort_aliases(); + link_slabs(); + + for(a = aliasinfo; a < aliasinfo + aliases; a++) { + + if (!show_single_ref && a->slab->refs == 1) + continue; + + if (!show_inverted) { + if (active) { + if (strcmp(a->slab->name, active) == 0) { + printf(" %s", a->name); + continue; + } + } + printf("\n%-20s <- %s", a->slab->name, a->name); + active = a->slab->name; + } + else + printf("%-20s -> %s\n", a->name, a->slab->name); + } + if (active) + printf("\n"); +} + + +void rename_slabs(void) +{ + struct slabinfo *s; + struct aliasinfo *a; + + for (s = slabinfo; s < slabinfo + slabs; s++) { + if (*s->name != ':') + continue; + + if (s->refs > 1 && !show_first_alias) + continue; + + a = find_one_alias(s); + + s->name = a->name; + } +} + +int slab_mismatch(char *slab) +{ + return regexec(&pattern, slab, 0, NULL, 0); +} + +void read_slab_dir(void) +{ + DIR *dir; + struct dirent *de; + struct slabinfo *slab = slabinfo; + struct aliasinfo *alias = aliasinfo; + char *p; + char *t; + int count; + + dir = opendir("."); + while ((de = readdir(dir))) { + if (de->d_name[0] == '.' || + slab_mismatch(de->d_name)) + continue; + switch (de->d_type) { + case DT_LNK: + alias->name = strdup(de->d_name); + count = readlink(de->d_name, buffer, sizeof(buffer)); + + if (count < 0) + fatal("Cannot read symlink %s\n", de->d_name); + + buffer[count] = 0; + p = buffer + count; + while (p > buffer && p[-1] != '/') + p--; + alias->ref = strdup(p); + alias++; + break; + case DT_DIR: + if (chdir(de->d_name)) + fatal("Unable to access slab %s\n", slab->name); + slab->name = strdup(de->d_name); + slab->alias = 0; + slab->refs = 0; + slab->aliases = get_obj("aliases"); + slab->align = get_obj("align"); + slab->cache_dma = get_obj("cache_dma"); + slab->cpu_slabs = get_obj("cpu_slabs"); + slab->destroy_by_rcu = get_obj("destroy_by_rcu"); + slab->hwcache_align = get_obj("hwcache_align"); + slab->object_size = get_obj("object_size"); + slab->objects = get_obj("objects"); + slab->objs_per_slab = get_obj("objs_per_slab"); + slab->order = get_obj("order"); + slab->partial = get_obj("partial"); + slab->partial = get_obj_and_str("partial", &t); + decode_numa_list(slab->numa_partial, t); + slab->poison = get_obj("poison"); + slab->reclaim_account = get_obj("reclaim_account"); + slab->red_zone = get_obj("red_zone"); + slab->sanity_checks = get_obj("sanity_checks"); + slab->slab_size = get_obj("slab_size"); + slab->slabs = get_obj_and_str("slabs", &t); + decode_numa_list(slab->numa, t); + slab->store_user = get_obj("store_user"); + slab->trace = get_obj("trace"); + chdir(".."); + if (slab->name[0] == ':') + alias_targets++; + slab++; + break; + default : + fatal("Unknown file type %lx\n", de->d_type); + } + } + closedir(dir); + slabs = slab - slabinfo; + aliases = alias - aliasinfo; + if (slabs > MAX_SLABS) + fatal("Too many slabs\n"); + if (aliases > MAX_ALIASES) + fatal("Too many aliases\n"); +} + +void output_slabs(void) +{ + struct slabinfo *slab; + + for (slab = slabinfo; slab < slabinfo + slabs; slab++) { + + if (slab->alias) + continue; + + + if (show_numa) + slab_numa(slab); + else + if (show_track) + show_tracking(slab); + else + if (validate) + slab_validate(slab); + else + if (shrink) + slab_shrink(slab); + else { + if (show_slab) + slabcache(slab); + } + } +} + +struct option opts[] = { + { "aliases", 0, NULL, 'a' }, + { "slabs", 0, NULL, 'l' }, + { "numa", 0, NULL, 'n' }, + { "zero", 0, NULL, 'z' }, + { "help", 0, NULL, 'h' }, + { "validate", 0, NULL, 'v' }, + { "first-alias", 0, NULL, 'f' }, + { "shrink", 0, NULL, 's' }, + { "track", 0, NULL, 't'}, + { "inverted", 0, NULL, 'i'}, + { "1ref", 0, NULL, '1'}, + { NULL, 0, NULL, 0 } +}; + +int main(int argc, char *argv[]) +{ + int c; + int err; + char *pattern_source; + + page_size = getpagesize(); + if (chdir("/sys/slab")) + fatal("This kernel does not have SLUB support.\n"); + + while ((c = getopt_long(argc, argv, "afhil1npstvzTS", opts, NULL)) != -1) + switch(c) { + case '1': + show_single_ref = 1; + break; + case 'a': + show_alias = 1; + break; + case 'f': + show_first_alias = 1; + break; + case 'h': + usage(); + return 0; + case 'i': + show_inverted = 1; + break; + case 'n': + show_numa = 1; + break; + case 's': + shrink = 1; + break; + case 'l': + show_slab = 1; + break; + case 't': + show_track = 1; + break; + case 'v': + validate = 1; + break; + case 'z': + skip_zero = 0; + break; + case 'T': + show_totals = 1; + break; + case 'S': + sort_size = 1; + break; + + default: + fatal("%s: Invalid option '%c'\n", argv[0], optopt); + + } + + if (!show_slab && !show_alias && !show_track + && !validate && !shrink) + show_slab = 1; + + if (argc > optind) + pattern_source = argv[optind]; + else + pattern_source = ".*"; + + err = regcomp(&pattern, pattern_source, REG_ICASE|REG_NOSUB); + if (err) + fatal("%s: Invalid pattern '%s' code %d\n", + argv[0], pattern_source, err); + read_slab_dir(); + if (show_alias) + alias(); + else + if (show_totals) + totals(); + else { + link_slabs(); + rename_slabs(); + sort_slabs(); + output_slabs(); + } + return 0; +} diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt new file mode 100644 index 00000000000..727c8d81aea --- /dev/null +++ b/Documentation/vm/slub.txt @@ -0,0 +1,113 @@ +Short users guide for SLUB +-------------------------- + +First of all slub should transparently replace SLAB. If you enable +SLUB then everything should work the same (Note the word "should". +There is likely not much value in that word at this point). + +The basic philosophy of SLUB is very different from SLAB. SLAB +requires rebuilding the kernel to activate debug options for all +SLABS. SLUB always includes full debugging but its off by default. +SLUB can enable debugging only for selected slabs in order to avoid +an impact on overall system performance which may make a bug more +difficult to find. + +In order to switch debugging on one can add a option "slub_debug" +to the kernel command line. That will enable full debugging for +all slabs. + +Typically one would then use the "slabinfo" command to get statistical +data and perform operation on the slabs. By default slabinfo only lists +slabs that have data in them. See "slabinfo -h" for more options when +running the command. slabinfo can be compiled with + +gcc -o slabinfo Documentation/vm/slabinfo.c + +Some of the modes of operation of slabinfo require that slub debugging +be enabled on the command line. F.e. no tracking information will be +available without debugging on and validation can only partially +be performed if debugging was not switched on. + +Some more sophisticated uses of slub_debug: +------------------------------------------- + +Parameters may be given to slub_debug. If none is specified then full +debugging is enabled. Format: + +slub_debug=<Debug-Options> Enable options for all slabs +slub_debug=<Debug-Options>,<slab name> + Enable options only for select slabs + +Possible debug options are + F Sanity checks on (enables SLAB_DEBUG_FREE. Sorry + SLAB legacy issues) + Z Red zoning + P Poisoning (object and padding) + U User tracking (free and alloc) + T Trace (please only use on single slabs) + +F.e. in order to boot just with sanity checks and red zoning one would specify: + + slub_debug=FZ + +Trying to find an issue in the dentry cache? Try + + slub_debug=,dentry_cache + +to only enable debugging on the dentry cache. + +Red zoning and tracking may realign the slab. We can just apply sanity checks +to the dentry cache with + + slub_debug=F,dentry_cache + +In case you forgot to enable debugging on the kernel command line: It is +possible to enable debugging manually when the kernel is up. Look at the +contents of: + +/sys/slab/<slab name>/ + +Look at the writable files. Writing 1 to them will enable the +corresponding debug option. All options can be set on a slab that does +not contain objects. If the slab already contains objects then sanity checks +and tracing may only be enabled. The other options may cause the realignment +of objects. + +Careful with tracing: It may spew out lots of information and never stop if +used on the wrong slab. + +SLAB Merging +------------ + +If no debugging is specified then SLUB may merge similar slabs together +in order to reduce overhead and increase cache hotness of objects. +slabinfo -a displays which slabs were merged together. + +Getting more performance +------------------------ + +To some degree SLUB's performance is limited by the need to take the +list_lock once in a while to deal with partial slabs. That overhead is +governed by the order of the allocation for each slab. The allocations +can be influenced by kernel parameters: + +slub_min_objects=x (default 8) +slub_min_order=x (default 0) +slub_max_order=x (default 4) + +slub_min_objects allows to specify how many objects must at least fit +into one slab in order for the allocation order to be acceptable. +In general slub will be able to perform this number of allocations +on a slab without consulting centralized resources (list_lock) where +contention may occur. + +slub_min_order specifies a minim order of slabs. A similar effect like +slub_min_objects. + +slub_max_order specified the order at which slub_min_objects should no +longer be checked. This is useful to avoid SLUB trying to generate +super large order pages to fit slub_min_objects of a slab cache with +large object sizes into one high order page. + + +Christoph Lameter, <clameter@sgi.com>, April 10, 2007 |