Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/testing/sysfs-devices-cache_disable | 18
-rw-r--r-- Documentation/ABI/testing/sysfs-kernel-slab | 479
-rw-r--r-- Documentation/DMA-API.txt | 12
-rw-r--r-- Documentation/DocBook/Makefile | 5
-rw-r--r-- Documentation/DocBook/kgdb.tmpl | 2
-rw-r--r-- Documentation/RCU/trace.txt | 102
-rw-r--r-- Documentation/filesystems/Locking | 24
-rw-r--r-- Documentation/filesystems/tmpfs.txt | 2
-rw-r--r-- Documentation/futex-requeue-pi.txt | 131
-rw-r--r-- Documentation/hwmon/sysfs-interface | 6
-rw-r--r-- Documentation/input/bcm5974.txt | 65
-rw-r--r-- Documentation/input/multi-touch-protocol.txt | 195
-rw-r--r-- Documentation/kernel-doc-nano-HOWTO.txt | 7
-rw-r--r-- Documentation/kernel-parameters.txt | 31
-rw-r--r-- Documentation/lockdep-design.txt | 6
-rw-r--r-- Documentation/memory-barriers.txt | 129
-rw-r--r-- Documentation/networking/ip-sysctl.txt | 15
-rw-r--r-- Documentation/scheduler/sched-rt-group.txt | 20
-rw-r--r-- Documentation/sound/alsa/HD-Audio-Models.txt | 1
-rw-r--r-- Documentation/sound/alsa/Procfile.txt | 5
-rw-r--r-- Documentation/sysctl/vm.txt | 32
-rw-r--r-- Documentation/sysfs-rules.txt | 2
-rw-r--r-- Documentation/trace/ftrace.txt | 15
-rw-r--r-- Documentation/x86/boot.txt | 122
-rw-r--r-- Documentation/x86/x86_64/boot-options.txt | 5
-rw-r--r-- Documentation/x86/x86_64/mm.txt | 9
26 files changed, 1339 insertions, 101 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-cache_disable b/Documentation/ABI/testing/sysfs-devices-cache_disable
new file mode 100644
index 00000000000..175bb4f7051
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-cache_disable
@@ -0,0 +1,18 @@
+What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
+Date: August 2008
+KernelVersion: 2.6.27
+Contact: mark.langsdorf@amd.com
+Description: These files exist in every cpu's cache index directories.
+ There are currently 2 cache_disable_# files in each
+ directory. Reading from these files on a supported
+ processor will return the cache disable index value
+ for that processor and node. Writing to one of these
+ files will cause the specified cache index to be disabled.
+
+ Currently, only AMD Family 10h Processors support cache index
+ disable, and only for their L3 caches. See the BIOS and
+ Kernel Developer's Guide at
+ http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
+ for formatting information and other details on the
+ cache index disable.
+Users: joachim.deguara@amd.com
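
As an illustration of the cache_disable entry above (not part of the patch
itself), here is a minimal Python sketch that lists the cache_disable_* files
and prints what each one reports. The helper names and the `root` parameter
are hypothetical; only the path layout comes from the "What:" line. The value
format is defined in the BKDG, so the sketch does not try to interpret it.

```python
import glob
import os

# Default location, taken from the "What:" line of the ABI entry.
SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def find_cache_disable_files(root=SYSFS_CPU_ROOT):
    """Return sorted paths matching cpu*/cache/index*/cache_disable_*."""
    pattern = os.path.join(root, "cpu*", "cache", "index*", "cache_disable_*")
    return sorted(glob.glob(pattern))


def read_cache_disable(path):
    """Return the raw text reported by one file, with whitespace stripped.

    The meaning of the value is described in the BKDG, so it is returned
    uninterpreted.
    """
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    # On unsupported processors this loop may simply find nothing to print.
    for path in find_cache_disable_files():
        print(path, read_cache_disable(path))
```

Passing a different `root` makes it possible to try the helpers against a
scratch directory rather than a live /sys tree.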
diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab
new file mode 100644
index 00000000000..6dcf75e594f
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-slab
@@ -0,0 +1,479 @@
+What: /sys/kernel/slab
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The /sys/kernel/slab directory contains a snapshot of the
+ internal state of the SLUB allocator for each cache. Certain
+ files may be modified to change the behavior of the cache (and
+ any caches it aliases).
+Users: kernel memory tuning tools
+
+What: /sys/kernel/slab/cache/aliases
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The aliases file is read-only and specifies how many caches
+ have merged into this cache.
+
+What: /sys/kernel/slab/cache/align
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The align file is read-only and specifies the cache's object
+ alignment in bytes.
+
+What: /sys/kernel/slab/cache/alloc_calls
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The alloc_calls file is read-only and lists the kernel code
+ locations from which allocations for this cache were performed.
+ The alloc_calls file only contains information if debugging is
+ enabled for that cache (see Documentation/vm/slub.txt).
+
+What: /sys/kernel/slab/cache/alloc_fastpath
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The alloc_fastpath file is read-only and specifies how many
+ objects have been allocated using the fast path.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/alloc_from_partial
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The alloc_from_partial file is read-only and specifies how
+ many times a cpu slab has been full and it has been refilled
+ by using a slab from the list of partially used slabs.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/alloc_refill
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The alloc_refill file is read-only and specifies how many
+ times the per-cpu freelist was empty but there were objects
+ available as the result of remote cpu frees.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/alloc_slab
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The alloc_slab file is read-only and specifies how many times
+ a new slab had to be allocated from the page allocator.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/alloc_slowpath
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The alloc_slowpath file is read-only and specifies how many
+ objects have been allocated using the slow path because of a
+ refill or allocation from a partial or new slab.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/cache_dma
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The cache_dma file is read-only and specifies whether objects
+ are from ZONE_DMA.
+ Available when CONFIG_ZONE_DMA is enabled.
+
+What: /sys/kernel/slab/cache/cpu_slabs
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The cpu_slabs file is read-only and displays how many cpu slabs
+ are active and their NUMA locality.
+
+What: /sys/kernel/slab/cache/cpuslab_flush
+Date: April 2009
+KernelVersion: 2.6.31
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The file cpuslab_flush is read-only and specifies how many
+ times a cache's cpu slabs have been flushed as the result of
+ destroying or shrinking a cache, a cpu going offline, or as
+ the result of forcing an allocation from a certain node.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/ctor
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The ctor file is read-only and specifies the cache's object
+ constructor function, which is invoked for each object when a
+ new slab is allocated.
+
+What: /sys/kernel/slab/cache/deactivate_empty
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The file deactivate_empty is read-only and specifies how many
+ times an empty cpu slab was deactivated.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/deactivate_full
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The file deactivate_full is read-only and specifies how many
+ times a full cpu slab was deactivated.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/deactivate_remote_frees
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The file deactivate_remote_frees is read-only and specifies how
+ many times a cpu slab has been deactivated and contained free
+ objects that were freed remotely.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/deactivate_to_head
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The file deactivate_to_head is read-only and specifies how
+ many times a partial cpu slab was deactivated and added to the
+ head of its node's partial list.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/deactivate_to_tail
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The file deactivate_to_tail is read-only and specifies how
+ many times a partial cpu slab was deactivated and added to the
+ tail of its node's partial list.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/destroy_by_rcu
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The destroy_by_rcu file is read-only and specifies whether
+ slabs (not objects) are freed by rcu.
+
+What: /sys/kernel/slab/cache/free_add_partial
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The file free_add_partial is read-only and specifies how many
+ times an object has been freed in a full slab so that it had to
+ be added to its node's partial list.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/free_calls
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The free_calls file is read-only and lists the locations of
+ object frees if slab debugging is enabled (see
+ Documentation/vm/slub.txt).
+
+What: /sys/kernel/slab/cache/free_fastpath
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The free_fastpath file is read-only and specifies how many
+ objects have been freed using the fast path because it was an
+ object from the cpu slab.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/free_frozen
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The free_frozen file is read-only and specifies how many
+ objects have been freed to a frozen slab (i.e. a remote cpu
+ slab).
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/free_remove_partial
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The file free_remove_partial is read-only and specifies how
+ many times an object has been freed to a now-empty slab so
+ that it had to be removed from its node's partial list.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/free_slab
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The free_slab file is read-only and specifies how many times an
+ empty slab has been freed back to the page allocator.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/free_slowpath
+Date: February 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The free_slowpath file is read-only and specifies how many
+ objects have been freed using the slow path (i.e. to a full or
+ partial slab).
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/hwcache_align
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The hwcache_align file is read-only and specifies whether
+ objects are aligned on cachelines.
+
+What: /sys/kernel/slab/cache/min_partial
+Date: February 2009
+KernelVersion: 2.6.30
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ David Rientjes <rientjes@google.com>
+Description:
+ The min_partial file specifies how many empty slabs shall
+ remain on a node's partial list to avoid the overhead of
+ allocating new slabs. Such slabs may be reclaimed by utilizing
+ the shrink file.
+
+What: /sys/kernel/slab/cache/object_size
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The object_size file is read-only and specifies the cache's
+ object size.
+
+What: /sys/kernel/slab/cache/objects
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The objects file is read-only and displays how many objects are
+ active and which nodes they are from.
+
+What: /sys/kernel/slab/cache/objects_partial
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The objects_partial file is read-only and displays how many
+ objects are on partial slabs and which nodes they are
+ from.
+
+What: /sys/kernel/slab/cache/objs_per_slab
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The file objs_per_slab is read-only and specifies how many
+ objects may be allocated from a single slab of the order
+ specified in /sys/kernel/slab/cache/order.
+
+What: /sys/kernel/slab/cache/order
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The order file specifies the page order at which new slabs are
+ allocated. It is writable and can be changed to increase the
+ number of objects per slab. If a slab cannot be allocated
+ because of fragmentation, SLUB will retry with the minimum order
+ possible depending on its characteristics.
+
+What: /sys/kernel/slab/cache/order_fallback
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The file order_fallback is read-only and specifies how many
+ times an allocation of a new slab has not been possible at the
+ cache's order and instead fallen back to its minimum possible
+ order.
+ Available when CONFIG_SLUB_STATS is enabled.
+
+What: /sys/kernel/slab/cache/partial
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The partial file is read-only and displays how many
+ partial slabs there are and how long each node's list is.
+
+What: /sys/kernel/slab/cache/poison
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The poison file specifies whether objects should be poisoned
+ when a new slab is allocated.
+
+What: /sys/kernel/slab/cache/reclaim_account
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The reclaim_account file specifies whether the cache's objects
+ are reclaimable (and grouped by their mobility).
+
+What: /sys/kernel/slab/cache/red_zone
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The red_zone file specifies whether the cache's objects are red
+ zoned.
+
+What: /sys/kernel/slab/cache/remote_node_defrag_ratio
+Date: January 2008
+KernelVersion: 2.6.25
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The file remote_node_defrag_ratio specifies the percentage of
+ times SLUB will attempt to refill the cpu slab with a partial
+ slab from a remote node as opposed to allocating a new slab on
+ the local node. This reduces the amount of wasted memory over
+ the entire system but can be expensive.
+ Available when CONFIG_NUMA is enabled.
+
+What: /sys/kernel/slab/cache/sanity_checks
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The sanity_checks file specifies whether expensive checks
+ should be performed on free and, at minimum, enables double free
+ checks. Caches that enable sanity_checks cannot be merged with
+ caches that do not.
+
+What: /sys/kernel/slab/cache/shrink
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The shrink file is written when memory should be reclaimed from
+ a cache. Empty partial slabs are freed and the partial list is
+ sorted so the slabs with the fewest available objects are used
+ first.
+
+What: /sys/kernel/slab/cache/slab_size
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The slab_size file is read-only and specifies the object size
+ with metadata (debugging information and alignment) in bytes.
+
+What: /sys/kernel/slab/cache/slabs
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The slabs file is read-only and displays how many slabs
+ there are (both cpu and partial) and which nodes they are
+ from.
+
+What: /sys/kernel/slab/cache/store_user
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The store_user file specifies whether the location of
+ allocation or free should be tracked for a cache.
+
+What: /sys/kernel/slab/cache/total_objects
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The total_objects file is read-only and displays how many total
+ objects a cache has and which nodes they are from.
+
+What: /sys/kernel/slab/cache/trace
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ The trace file specifies whether object allocations and frees
+ should be traced.
+
+What: /sys/kernel/slab/cache/validate
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
+ Christoph Lameter <cl@linux-foundation.org>
+Description:
+ Writing to the validate file causes SLUB to traverse all of its
+ cache's objects and check the validity of metadata.
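
As an illustration of how the statistics files above might be consumed (not
part of the patch itself), the following Python sketch computes the share of
allocations served by the fast path for one cache. It assumes that
CONFIG_SLUB_STATS is enabled and that each statistics file begins with a
decimal total as its first whitespace-separated field; the ABI text does not
spell out the file format, so treat that as an assumption. The function names
are hypothetical.

```python
import os

# Directory described by the first "What:" entry.
SLAB_ROOT = "/sys/kernel/slab"


def read_counter(cache_dir, name):
    """Return the leading integer of one statistics file.

    Assumption: the file starts with a decimal total. Anything after the
    first whitespace-separated field is ignored.
    """
    with open(os.path.join(cache_dir, name)) as f:
        return int(f.read().split()[0])


def alloc_fastpath_ratio(cache_dir):
    """Return fastpath / (fastpath + slowpath), or None if both are zero."""
    fast = read_counter(cache_dir, "alloc_fastpath")
    slow = read_counter(cache_dir, "alloc_slowpath")
    total = fast + slow
    return fast / total if total else None


if __name__ == "__main__":
    # Caches without the statistics files are skipped.
    if os.path.isdir(SLAB_ROOT):
        for cache in sorted(os.listdir(SLAB_ROOT)):
            try:
                print(cache, alloc_fastpath_ratio(os.path.join(SLAB_ROOT, cache)))
            except (OSError, ValueError, IndexError):
                continue
```

The same read_counter() helper could be pointed at any of the other
read-only counters, such as free_fastpath and free_slowpath.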
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index d9aa43d78bc..25fb8bcf32a 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -704,12 +704,24 @@ this directory the following files can currently be found:
The current number of free dma_debug_entries
in the allocator.
+ dma-api/driver-filter
+ You can write the name of a driver into this file
+ to limit the debug output to requests from that
+ particular driver. Write an empty string to
+ that file to disable the filter and see
+ all errors again.
+
If you have this code compiled into your kernel it will be enabled by default.
If you want to boot without the bookkeeping anyway you can provide
'dma_debug=off' as a boot parameter. This will disable DMA-API debugging.
Notice that you can not enable it again at runtime. You have to reboot to do
so.
+If you want to see debug messages only for a specific device driver you can
+specify the dma_debug_driver=<drivername> parameter. This will enable the
+driver filter at boot time. The debug code will only print errors for that
+driver afterwards. This filter can be disabled or changed later using debugfs.
+
When the code disables itself at runtime this is most likely because it ran
out of dma_debug_entries. These entries are preallocated at boot. The number
of preallocated entries is defined per architecture. If it is too low for you
diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 8918a32c6b3..b1eb661e630 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -143,7 +143,8 @@ quiet_cmd_db2pdf = PDF $@
$(call cmd,db2pdf)
-main_idx = Documentation/DocBook/index.html
+index = index.html
+main_idx = Documentation/DocBook/$(index)
build_main_index = rm -rf $(main_idx) && \
echo '<h1>Linux Kernel HTML Documentation</h1>' >> $(main_idx) && \
echo '<h2>Kernel Version: $(KERNELVERSION)</h2>' >> $(main_idx) && \
@@ -232,7 +233,7 @@ clean-files := $(DOCBOOKS) \
$(patsubst %.xml, %.pdf, $(DOCBOOKS)) \
$(patsubst %.xml, %.html, $(DOCBOOKS)) \
$(patsubst %.xml, %.9, $(DOCBOOKS)) \
- $(C-procfs-example)
+ $(C-procfs-example) $(index)
clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man
diff --git a/Documentation/DocBook/kgdb.tmpl b/Documentation/DocBook/kgdb.tmpl
index 372dec20c8d..5cff41a5fa7 100644
--- a/Documentation/DocBook/kgdb.tmpl
+++ b/Documentation/DocBook/kgdb.tmpl
@@ -281,7 +281,7 @@
seriously wrong while debugging, it will most often be the case
that you want to enable gdb to be verbose about its target
communications. You do this prior to issuing the <constant>target
- remote</constant> command by typing in: <constant>set remote debug 1</constant>
+ remote</constant> command by typing in: <constant>set debug remote 1</constant>
</para>
</chapter>
<chapter id="KGDBTestSuite">
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 068848240a8..02cced183b2 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -192,23 +192,24 @@ rcu/rcuhier (which displays the struct rcu_node hierarchy).
The output of "cat rcu/rcudata" looks as follows:
rcu:
- 0 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=1 rp=3c2a dt=23301/73 dn=2 df=1882 of=0 ri=2126 ql=2 b=10
- 1 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=3 rp=39a6 dt=78073/1 dn=2 df=1402 of=0 ri=1875 ql=46 b=10
- 2 c=4010 g=4010 pq=1 pqc=4010 qp=0 rpfq=-5 rp=1d12 dt=16646/0 dn=2 df=3140 of=0 ri=2080 ql=0 b=10
- 3 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=2b50 dt=21159/1 dn=2 df=2230 of=0 ri=1923 ql=72 b=10
- 4 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1644 dt=5783/1 dn=2 df=3348 of=0 ri=2805 ql=7 b=10
- 5 c=4012 g=4013 pq=0 pqc=4011 qp=1 rpfq=3 rp=1aac dt=5879/1 dn=2 df=3140 of=0 ri=2066 ql=10 b=10
- 6 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=ed8 dt=5847/1 dn=2 df=3797 of=0 ri=1266 ql=10 b=10
- 7 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1fa2 dt=6199/1 dn=2 df=2795 of=0 ri=2162 ql=28 b=10
+rcu:
+ 0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10
+ 1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10
+ 2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10
+ 3 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=6681/1 dn=0 df=1545 of=0 ri=0 ql=0 b=10
+ 4 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1003/1 dn=0 df=1992 of=0 ri=0 ql=0 b=10
+ 5 c=17829 g=17830 pq=1 pqc=17829 qp=1 dt=3887/1 dn=0 df=3331 of=0 ri=4 ql=2 b=10
+ 6 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=859/1 dn=0 df=3224 of=0 ri=0 ql=0 b=10
+ 7 c=17829 g=17830 pq=0 pqc=17829 qp=1 dt=3761/1 dn=0 df=1818 of=0 ri=0 ql=2 b=10
rcu_bh:
- 0 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-145 rp=21d6 dt=23301/73 dn=2 df=0 of=0 ri=0 ql=0 b=10
- 1 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-170 rp=20ce dt=78073/1 dn=2 df=26 of=0 ri=5 ql=0 b=10
- 2 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-83 rp=fbd dt=16646/0 dn=2 df=28 of=0 ri=4 ql=0 b=10
- 3 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-105 rp=178c dt=21159/1 dn=2 df=28 of=0 ri=2 ql=0 b=10
- 4 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-30 rp=b54 dt=5783/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
- 5 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-29 rp=df5 dt=5879/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
- 6 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-28 rp=788 dt=5847/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
- 7 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-53 rp=1098 dt=6199/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
+ 0 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=10951/1 dn=0 df=0 of=0 ri=0 ql=0 b=10
+ 1 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=16117/1 dn=0 df=13 of=0 ri=0 ql=0 b=10
+ 2 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1445/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+ 3 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=6681/1 dn=0 df=9 of=0 ri=0 ql=0 b=10
+ 4 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1003/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+ 5 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3887/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+ 6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
+ 7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
The first section lists the rcu_data structures for rcu, the second for
rcu_bh. Each section has one line per CPU, or eight for this 8-CPU system.
@@ -253,12 +254,6 @@ o "pqc" indicates which grace period the last-observed quiescent
o "qp" indicates that RCU still expects a quiescent state from
this CPU.
-o "rpfq" is the number of rcu_pending() calls on this CPU required
- to induce this CPU to invoke force_quiescent_state().
-
-o "rp" is low-order four hex digits of the count of how many times
- rcu_pending() has been invoked on this CPU.
-
o "dt" is the current value of the dyntick counter that is incremented
when entering or leaving dynticks idle state, either by the
scheduler or by irq. The number after the "/" is the interrupt
@@ -305,6 +300,9 @@ o "b" is the batch limit for this CPU. If more than this number
of RCU callbacks is ready to invoke, then the remainder will
be deferred.
+There is also an rcu/rcudata.csv file with the same information in
+comma-separated-variable spreadsheet format.
+
The output of "cat rcu/rcugp" looks as follows:
@@ -411,3 +409,63 @@ o Each element of the form "1/1 0:127 ^0" represents one struct
For example, the first entry at the lowest level shows
"^0", indicating that it corresponds to bit zero in
the first entry at the middle level.
+
+
+The output of "cat rcu/rcu_pending" looks as follows:
+
+rcu:
+ 0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
+ 1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
+ 2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
+ 3 np=236249 qsp=48766 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
+ 4 np=221310 qsp=46850 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
+ 5 np=237332 qsp=48449 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
+ 6 np=219995 qsp=46718 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
+ 7 np=249893 qsp=49390 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
+rcu_bh:
+ 0 np=146741 qsp=1419 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
+ 1 np=155792 qsp=12597 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
+ 2 np=136629 qsp=18680 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
+ 3 np=137723 qsp=2843 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
+ 4 np=123110 qsp=12433 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
+ 5 np=137456 qsp=4210 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
+ 6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
+ 7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
+
+As always, this is once again split into "rcu" and "rcu_bh" portions.
+The fields are as follows:
+
+o "np" is the number of times that __rcu_pending() has been invoked
+ for the corresponding flavor of RCU.
+
+o "qsp" is the number of times that RCU was waiting for a
+ quiescent state from this CPU.
+
+o "cbr" is the number of times that this CPU had RCU callbacks
+ that had passed through a grace period, and were thus ready
+ to be invoked.
+
+o "cng" is the number of times that this CPU needed another
+ grace period while RCU was idle.
+
+o "gpc" is the number of times that an old grace period had
+ completed, but this CPU was not yet aware of it.
+
+o "gps" is the number of times that a new grace period had started,
+ but this CPU was not yet aware of it.
+
+o "nf" is the number of times that this CPU suspected that the
+ current grace period had run for too long, and thus needed to
+ be forced.
+
+ Please note that "forcing" consists of sending resched IPIs
+ to holdout CPUs. If that CPU really still is in an old RCU
+ read-side critical section, then we really do have to wait for it.
+ The assumption behind "forcing" is that the CPU is not still in
+ an old RCU read-side critical section, but has not yet responded
+ for some other reason.
+
+o "nn" is the number of times that this CPU needed nothing. Alert
+ readers will note that the rcu "nn" number for a given CPU very
+ closely matches the rcu_bh "np" number for that same CPU. This
+ is due to short-circuit evaluation in rcu_pending().
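
As an illustration of the rcu/rcu_pending format described above (not part of
the patch itself), here is a minimal Python sketch that parses the output into
per-flavor, per-CPU counters and compares the rcu "nn" value with the rcu_bh
"np" value. The function name is hypothetical, and the sample text is copied
from the first two CPUs of the listing above.

```python
def parse_rcu_pending(text):
    """Map flavor -> cpu number -> {field: count} for rcu_pending output."""
    result = {}
    flavor = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Section header such as "rcu:" or "rcu_bh:".
            flavor = line[:-1]
            result[flavor] = {}
            continue
        # Data line: CPU number followed by key=value fields.
        cpu, *fields = line.split()
        result[flavor][int(cpu)] = {
            key: int(value) for key, value in (f.split("=", 1) for f in fields)
        }
    return result


SAMPLE = """\
rcu:
  0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
  1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
rcu_bh:
  0 np=146741 qsp=1419 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
  1 np=155792 qsp=12597 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
"""

if __name__ == "__main__":
    stats = parse_rcu_pending(SAMPLE)
    for cpu in sorted(stats["rcu"]):
        # Print the two counters that the text says should closely match.
        print(cpu, stats["rcu"][cpu]["nn"], stats["rcu_bh"][cpu]["np"])
```

In this sample the two values are identical for both CPUs, which is
consistent with the short-circuit evaluation noted for "nn".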
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 76efe5b71d7..3120f8dd2c3 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -512,16 +512,24 @@ locking rules:
BKL mmap_sem PageLocked(page)
open: no yes
close: no yes
-fault: no yes
-page_mkwrite: no yes no
+fault: no yes can return with page locked
+page_mkwrite: no yes can return with page locked
access: no yes
- ->page_mkwrite() is called when a previously read-only page is
-about to become writeable. The file system is responsible for
-protecting against truncate races. Once appropriate action has been
-taking to lock out truncate, the page range should be verified to be
-within i_size. The page mapping should also be checked that it is not
-NULL.
+ ->fault() is called when a previously not present pte is about
+to be faulted in. The filesystem must find and return the page associated
+with the passed in "pgoff" in the vm_fault structure. If it is possible that
+the page may be truncated and/or invalidated, then the filesystem must lock
+the page, then ensure it is not already truncated (the page lock will block
+subsequent truncate), and then return with VM_FAULT_LOCKED, and the page
+locked. The VM will unlock the page.
+
+ ->page_mkwrite() is called when a previously read-only pte is
+about to become writeable. The filesystem again must ensure that there are
+no truncate/invalidate races, and then return with the page locked. If
+the page has been truncated, the filesystem should not look up a new page
+like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which
+will cause the VM to retry the fault.
->access() is called when get_user_pages() fails in
access_process_vm(), typically used to debug a process through
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
index 222437efd75..3015da0c6b2 100644
--- a/Documentation/filesystems/tmpfs.txt
+++ b/Documentation/filesystems/tmpfs.txt
@@ -133,4 +133,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root.
Author:
Christoph Rohland <cr@sap.com>, 1.12.01
Updated:
- Hugh Dickins <hugh@veritas.com>, 4 June 2007
+ Hugh Dickins, 4 June 2007
diff --git a/Documentation/futex-requeue-pi.txt b/Documentation/futex-requeue-pi.txt
new file mode 100644
index 00000000000..9dc1ff4fd53
--- /dev/null
+++ b/Documentation/futex-requeue-pi.txt
@@ -0,0 +1,131 @@
+Futex Requeue PI
+----------------
+
+Requeueing of tasks from a non-PI futex to a PI futex requires
+special handling in order to ensure the underlying rt_mutex is never
+left without an owner if it has waiters; doing so would break the PI
+boosting logic [see rt-mutex-design.txt]. For the purposes of
+brevity, this action will be referred to as "requeue_pi" throughout
+this document. Priority inheritance is abbreviated throughout as
+"PI".
+
+Motivation
+----------
+
+Without requeue_pi, the glibc implementation of
+pthread_cond_broadcast() must resort to waking all the tasks waiting
+on a pthread_condvar and letting them try to sort out which task
+gets to run first in classic thundering-herd formation. An ideal
+implementation would wake the highest-priority waiter, and leave the
+rest to the natural wakeup inherent in unlocking the mutex
+associated with the condvar.
+
+Consider the simplified glibc calls:
+
+/* caller must lock mutex */
+pthread_cond_wait(cond, mutex)
+{
+ lock(cond->__data.__lock);
+ unlock(mutex);
+ do {
+ unlock(cond->__data.__lock);
+ futex_wait(cond->__data.__futex);
+ lock(cond->__data.__lock);
+ } while(...)
+ unlock(cond->__data.__lock);
+ lock(mutex);
+}
+
+pthread_cond_broadcast(cond)
+{
+ lock(cond->__data.__lock);
+ unlock(cond->__data.__lock);
+ futex_requeue(cond->__data.__futex, cond->mutex);
+}
+
+Once pthread_cond_broadcast() requeues the tasks, the cond->mutex
+has waiters. Note that pthread_cond_wait() attempts to lock the
+mutex only after it has returned to user space. This will leave the
+underlying rt_mutex with waiters, and no owner, breaking the
+previously mentioned PI-boosting algorithms.
+
+In order to support PI-aware pthread_condvars, the kernel needs to
+be able to requeue tasks to PI futexes. This support implies that
+upon a successful futex_wait system call, the caller would return to
+user space already holding the PI futex. The glibc implementation
+would be modified as follows:
+
+
+/* caller must lock mutex */
+pthread_cond_wait_pi(cond, mutex)
+{
+ lock(cond->__data.__lock);
+ unlock(mutex);
+ do {
+ unlock(cond->__data.__lock);
+ futex_wait_requeue_pi(cond->__data.__futex);
+ lock(cond->__data.__lock);
+ } while(...)
+ unlock(cond->__data.__lock);
+ /* the kernel acquired the mutex for us */
+}
+
+pthread_cond_broadcast_pi(cond)
+{
+ lock(cond->__data.__lock);
+ unlock(cond->__data.__lock);
+ futex_requeue_pi(cond->__data.__futex, cond->mutex);
+}
+
+The actual glibc implementation will likely test for PI and make the
+necessary changes inside the existing calls rather than creating new
+calls for the PI cases. Similar changes are needed for
+pthread_cond_timedwait() and pthread_cond_signal().
+
+Implementation
+--------------
+
+In order to ensure the rt_mutex has an owner if it has waiters, it
+is necessary for both the requeue code and the waiting code to be
+able to acquire the rt_mutex before returning to user space.
+The requeue code cannot simply wake the waiter and leave it to
+acquire the rt_mutex, as that would open a race window between the
+requeue call returning to user space and the waiter waking and
+starting to run. This is especially true in the uncontended case.
+
+The solution involves two new rt_mutex helper routines,
+rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which
+allow the requeue code to acquire an uncontended rt_mutex on behalf
+of the waiter and to enqueue the waiter on a contended rt_mutex.
+Two new futex operations provide the kernel<->user interface to
+requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
+
+FUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait()
+and pthread_cond_timedwait()) to block on the initial futex and wait
+to be requeued to a PI-aware futex. The implementation is the
+result of a high-speed collision between futex_wait() and
+futex_lock_pi(), with some extra logic to check for the additional
+wake-up scenarios.
+
+FUTEX_CMP_REQUEUE_PI is called by the waker
+(pthread_cond_broadcast() and pthread_cond_signal()) to requeue and
+possibly wake the waiting tasks. Internally, this operation is
+still handled by futex_requeue() (by passing requeue_pi=1). Before
+requeueing, futex_requeue() attempts to acquire the requeue target
+PI futex on behalf of the top waiter. If it can, this waiter is
+woken. futex_requeue() then proceeds to requeue the remaining
+nr_wake+nr_requeue tasks to the PI futex, calling
+rt_mutex_start_proxy_lock() prior to each requeue to prepare the
+task as a waiter on the underlying rt_mutex. It is possible that
+the lock can be acquired at this stage as well; if so, the next
+waiter is woken to finish the acquisition of the lock.
+
+FUTEX_CMP_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but
+their sum is all that really matters. futex_requeue() will wake or
+requeue up to nr_wake + nr_requeue tasks. It will wake only as many
+tasks as it can acquire the lock for, which in the majority of cases
+should be 0, as good programming practice dictates that the caller of
+either pthread_cond_broadcast() or pthread_cond_signal() acquire the
+mutex prior to making the call. FUTEX_CMP_REQUEUE_PI requires that
+nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for
+signal.
diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface
index 2f10ce6a879..004ee161721 100644
--- a/Documentation/hwmon/sysfs-interface
+++ b/Documentation/hwmon/sysfs-interface
@@ -150,6 +150,11 @@ fan[1-*]_min Fan minimum value
Unit: revolution/min (RPM)
RW
+fan[1-*]_max Fan maximum value
+ Unit: revolution/min (RPM)
+ Only rarely supported by the hardware.
+ RW
+
fan[1-*]_input Fan input value.
Unit: revolution/min (RPM)
RO
@@ -390,6 +395,7 @@ OR
in[0-*]_min_alarm
in[0-*]_max_alarm
fan[1-*]_min_alarm
+fan[1-*]_max_alarm
temp[1-*]_min_alarm
temp[1-*]_max_alarm
temp[1-*]_crit_alarm
diff --git a/Documentation/input/bcm5974.txt b/Documentation/input/bcm5974.txt
new file mode 100644
index 00000000000..5e22dcf6d48
--- /dev/null
+++ b/Documentation/input/bcm5974.txt
@@ -0,0 +1,65 @@
+BCM5974 Driver (bcm5974)
+------------------------
+ Copyright (C) 2008-2009 Henrik Rydberg <rydberg@euromail.se>
+
+The USB initialization and packet decoding were done by Scott Shawcroft as
+part of the touchd user-space driver project:
+ Copyright (C) 2008 Scott Shawcroft (scott.shawcroft@gmail.com)
+
+The BCM5974 driver is based on the appletouch driver:
+ Copyright (C) 2001-2004 Greg Kroah-Hartman (greg@kroah.com)
+ Copyright (C) 2005 Johannes Berg (johannes@sipsolutions.net)
+ Copyright (C) 2005 Stelian Pop (stelian@popies.net)
+ Copyright (C) 2005 Frank Arnold (frank@scirocco-5v-turbo.de)
+ Copyright (C) 2005 Peter Osterlund (petero2@telia.com)
+ Copyright (C) 2005 Michael Hanselmann (linux-kernel@hansmi.ch)
+ Copyright (C) 2006 Nicolas Boichat (nicolas@boichat.ch)
+
+This driver adds support for the multi-touch trackpad on the new Apple
+Macbook Air and Macbook Pro laptops. It replaces the appletouch driver on
+those computers, and integrates well with the synaptics driver of the Xorg
+system.
+
+Known to work on Macbook Air, Macbook Pro Penryn and the new unibody
+Macbook 5 and Macbook Pro 5.
+
+Usage
+-----
+
+The driver loads automatically for the supported USB device IDs, and
+becomes available both as an event device (/dev/input/event*) and as a
+mouse via the mousedev driver (/dev/input/mice).
+
+USB Race
+--------
+
+The Apple multi-touch trackpads report both mouse and keyboard events via
+different interfaces of the same USB device. This creates a race condition
+with the HID driver, which, if not told otherwise, will find the standard
+HID mouse and keyboard, and claim the whole device. To remedy this, the USB
+product ID must be listed in the mouse_ignore list of the HID driver.
+
+Debug output
+------------
+
+To ease development for new hardware versions, verbose packet output can
+be switched on with the debug kernel module parameter. The range [1-9]
+yields different levels of verbosity. Example (as root):
+
+echo -n 9 > /sys/module/bcm5974/parameters/debug
+
+tail -f /var/log/debug
+
+echo -n 0 > /sys/module/bcm5974/parameters/debug
+
+Trivia
+------
+
+The driver was developed on the Ubuntu forums in June 2008 [1], and now has
+a more permanent home at bitmath.org [2].
+
+Links
+-----
+
+[1] http://ubuntuforums.org/showthread.php?t=840040
+[2] http://bitmath.org/code/
diff --git a/Documentation/input/multi-touch-protocol.txt b/Documentation/input/multi-touch-protocol.txt
new file mode 100644
index 00000000000..a12ea3b586e
--- /dev/null
+++ b/Documentation/input/multi-touch-protocol.txt
@@ -0,0 +1,195 @@
+Multi-touch (MT) Protocol
+-------------------------
+ Copyright (C) 2009 Henrik Rydberg <rydberg@euromail.se>
+
+
+Introduction
+------------
+
+In order to utilize the full power of the new multi-touch devices, a way to
+report detailed finger data to user space is needed. This document
+describes the multi-touch (MT) protocol which allows kernel drivers to
+report details for an arbitrary number of fingers.
+
+
+Usage
+-----
+
+Anonymous finger details are sent sequentially as separate packets of ABS
+events. Only the ABS_MT events are recognized as part of a finger
+packet. The end of a packet is marked by calling the input_mt_sync()
+function, which generates a SYN_MT_REPORT event. This instructs the
+receiver to accept the data for the current finger and prepare to receive
+another. The end of a multi-touch transfer is marked by calling the usual
+input_sync() function. This instructs the receiver to act upon events
+accumulated since the last EV_SYN/SYN_REPORT and prepare to receive a new
+set of events/packets.
+
+A set of ABS_MT events with the desired properties is defined. The events
+are divided into categories, to allow for partial implementation. The
+minimum set consists of ABS_MT_TOUCH_MAJOR, ABS_MT_POSITION_X and
+ABS_MT_POSITION_Y, which allows for multiple fingers to be tracked. If the
+device supports it, the ABS_MT_WIDTH_MAJOR may be used to provide the size
+of the approaching finger. Anisotropy and direction may be specified with
+ABS_MT_TOUCH_MINOR, ABS_MT_WIDTH_MINOR and ABS_MT_ORIENTATION. The
+ABS_MT_TOOL_TYPE may be used to specify whether the touching tool is a
+finger or a pen or something else. Devices with more granular information
+may specify general shapes as blobs, i.e., as a sequence of rectangular
+shapes grouped together by an ABS_MT_BLOB_ID. Finally, for the few devices
+that currently support it, the ABS_MT_TRACKING_ID event may be used to
+report finger tracking from hardware [5].
+
+Here is what a minimal event sequence for a two-finger touch would look
+like:
+
+ ABS_MT_TOUCH_MAJOR
+ ABS_MT_POSITION_X
+ ABS_MT_POSITION_Y
+ SYN_MT_REPORT
+ ABS_MT_TOUCH_MAJOR
+ ABS_MT_POSITION_X
+ ABS_MT_POSITION_Y
+ SYN_MT_REPORT
+ SYN_REPORT
+
+
+Event Semantics
+---------------
+
+The word "contact" is used to describe a tool which is in direct contact
+with the surface. A finger, a pen or a rubber all qualify as contacts.
+
+ABS_MT_TOUCH_MAJOR
+
+The length of the major axis of the contact. The length should be given in
+surface units. If the surface has an X times Y resolution, the largest
+possible value of ABS_MT_TOUCH_MAJOR is sqrt(X^2 + Y^2), the diagonal [4].
+
+ABS_MT_TOUCH_MINOR
+
+The length, in surface units, of the minor axis of the contact. If the
+contact is circular, this event can be omitted [4].
+
+ABS_MT_WIDTH_MAJOR
+
+The length, in surface units, of the major axis of the approaching
+tool. This should be understood as the size of the tool itself. The
+orientation of the contact and the approaching tool are assumed to be the
+same [4].
+
+ABS_MT_WIDTH_MINOR
+
+The length, in surface units, of the minor axis of the approaching
+tool. Omit if circular [4].
+
+The above four values can be used to derive additional information about
+the contact. The ratio ABS_MT_TOUCH_MAJOR / ABS_MT_WIDTH_MAJOR approximates
+the notion of pressure. The fingers of the hand and the palm all have
+different characteristic widths [1].
+
+ABS_MT_ORIENTATION
+
+The orientation of the ellipse. The value should describe a signed quarter
+of a revolution clockwise around the touch center. The signed value range
+is arbitrary, but zero should be returned for a finger aligned along the Y
+axis of the surface, a negative value when the finger is turned to the left,
+and a positive value when the finger is turned to the right. When completely
+aligned with the X axis, the range max should be returned. Orientation can be
+omitted if the touching object is circular, or if the information is not
+available in the kernel driver. Partial orientation support is possible if
+the device can distinguish between the two axes, but not (uniquely) any values
+in between. In such cases, the range of ABS_MT_ORIENTATION should be [0, 1]
+[4].
+
+ABS_MT_POSITION_X
+
+The surface X coordinate of the center of the touching ellipse.
+
+ABS_MT_POSITION_Y
+
+The surface Y coordinate of the center of the touching ellipse.
+
+ABS_MT_TOOL_TYPE
+
+The type of approaching tool. A lot of kernel drivers cannot distinguish
+between different tool types, such as a finger or a pen. In such cases, the
+event should be omitted. The protocol currently supports MT_TOOL_FINGER and
+MT_TOOL_PEN [2].
+
+ABS_MT_BLOB_ID
+
+The BLOB_ID groups several packets together into one arbitrarily shaped
+contact. This is a low-level anonymous grouping, and should not be confused
+with the high-level trackingID [5]. Most kernel drivers will not have blob
+capability, and can safely omit the event.
+
+ABS_MT_TRACKING_ID
+
+The TRACKING_ID identifies an initiated contact throughout its life cycle
+[5]. There are currently only a few devices that support it, so this event
+should normally be omitted.
+
+
+Event Computation
+-----------------
+
+The variety of hardware unavoidably leads to some devices fitting the MT
+protocol better than others. To simplify and unify the mapping,
+this section gives recipes for how to compute certain events.
+
+For devices reporting contacts as rectangular shapes, signed orientation
+cannot be obtained. Assuming X and Y are the lengths of the sides of the
+touching rectangle, here is a simple formula that retains the most
+information possible:
+
+ ABS_MT_TOUCH_MAJOR := max(X, Y)
+ ABS_MT_TOUCH_MINOR := min(X, Y)
+ ABS_MT_ORIENTATION := bool(X > Y)
+
+The range of ABS_MT_ORIENTATION should be set to [0, 1], to indicate that
+the device can distinguish between a finger along the Y axis (0) and a
+finger along the X axis (1).
+
+
+Finger Tracking
+---------------
+
+The kernel driver should generate an arbitrary enumeration of the set of
+anonymous contacts currently on the surface. The order in which the packets
+appear in the event stream is not important.
+
+The process of finger tracking, i.e., assigning a unique trackingID to each
+initiated contact on the surface, is left to user space, preferably the
+multi-touch X driver [3]. In that driver, the trackingID stays the same and
+unique until the contact vanishes (when the finger leaves the surface). The
+problem of assigning a set of anonymous fingers to a set of identified
+fingers is a Euclidean bipartite matching problem at each event update, and
+relies on a sufficiently rapid update rate.
+
+There are a few devices that support trackingID in hardware. User space can
+make use of these native identifiers to reduce bandwidth and CPU usage.
+
+
+Notes
+-----
+
+In order to stay compatible with existing applications, the data
+reported in a finger packet must not be recognized as single-touch
+events. In addition, all finger data must bypass input filtering,
+since subsequent events of the same type refer to different fingers.
+
+The first kernel driver to utilize the MT protocol is the bcm5974 driver,
+where examples can be found.
+
+[1] With the extension ABS_MT_APPROACH_X and ABS_MT_APPROACH_Y, the
+difference between the contact position and the approaching tool position
+could be used to derive tilt.
+[2] The list can of course be extended.
+[3] The multi-touch X driver is currently in the prototyping stage. At the
+time of writing (April 2009), the MT protocol is not yet merged, and the
+prototype implements finger matching, basic mouse support and two-finger
+scrolling. The project aims at improving the quality of current multi-touch
+functionality available in the Synaptics X driver and, in addition, at
+implementing more advanced gestures.
+[4] See the section on event computation.
+[5] See the section on finger tracking.
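Review note on multi-touch-protocol.txt: the rectangle recipe from its "Event Computation" section can be sketched in plain C as below. This is only an illustration under stated assumptions, not driver code: `struct mt_contact` and `mt_from_rect()` are hypothetical names that do not exist in the kernel, and a real driver would hand the resulting values to the input core rather than return them.

```c
/* Hypothetical container for the three values; not a kernel type. */
struct mt_contact {
	int touch_major;	/* value for ABS_MT_TOUCH_MAJOR */
	int touch_minor;	/* value for ABS_MT_TOUCH_MINOR */
	int orientation;	/* value for ABS_MT_ORIENTATION, range [0, 1] */
};

/* Map a touching rectangle with side lengths x and y to MT values. */
struct mt_contact mt_from_rect(int x, int y)
{
	struct mt_contact c;

	c.touch_major = x > y ? x : y;	/* max(X, Y) */
	c.touch_minor = x > y ? y : x;	/* min(X, Y) */
	c.orientation = x > y;		/* 1: along the X axis, 0: along the Y axis */
	return c;
}
```

A square contact (X equal to Y) maps to orientation 0 here, which follows directly from the bool(X > Y) rule in the text.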
diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt
index 026ec7d5738..4d04572b654 100644
--- a/Documentation/kernel-doc-nano-HOWTO.txt
+++ b/Documentation/kernel-doc-nano-HOWTO.txt
@@ -269,7 +269,10 @@ Use the argument mechanism to document members or constants.
Inside a struct description, you can use the "private:" and "public:"
comment tags. Structure fields that are inside a "private:" area
-are not listed in the generated output documentation.
+are not listed in the generated output documentation. The "private:"
+and "public:" tags must begin immediately following a "/*" comment
+marker. They may optionally include comments between the ":" and the
+ending "*/" marker.
Example:
@@ -283,7 +286,7 @@ Example:
struct my_struct {
int a;
int b;
-/* private: */
+/* private: internal use only */
int c;
};
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6ce5f48859c..f08e2bebb18 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -17,6 +17,12 @@ are specified on the kernel command line with the module name plus
usbcore.blinkenlights=1
+Hyphens (dashes) and underscores are equivalent in parameter names, so
+ log_buf_len=1M print-fatal-signals=1
+can also be entered as
+ log-buf-len=1M print_fatal_signals=1
+
+
This document may not be entirely up to date and comprehensive. The command
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
module. Loadable modules, after being loaded into the running kernel, also
@@ -323,11 +329,6 @@ and is between 256 and 4096 characters. It is defined in the file
flushed before they will be reused, which
is a lot of faster
- amd_iommu_size= [HW,X86-64]
- Define the size of the aperture for the AMD IOMMU
- driver. Possible values are:
- '32M', '64M' (default), '128M', '256M', '512M', '1G'
-
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: <a>,<b>
@@ -345,7 +346,7 @@ and is between 256 and 4096 characters. It is defined in the file
not play well with APC CPU idle - disable it if you have
APC and your system crashes randomly.
- apic= [APIC,i386] Advanced Programmable Interrupt Controller
+ apic= [APIC,X86-32] Advanced Programmable Interrupt Controller
Change the output verbosity whilst booting
Format: { quiet (default) | verbose | debug }
Change the amount of debugging information output
@@ -640,6 +641,13 @@ and is between 256 and 4096 characters. It is defined in the file
DMA-API debugging code disables itself because the
architectural default is too low.
+ dma_debug_driver=<driver_name>
+ With this option the DMA-API debugging driver
+ filter feature can be enabled at boot time. Just
+ pass the driver to filter for as the parameter.
+ The filter can be disabled or changed to another
+ driver later using sysfs.
+
dscc4.setup= [NET]
dtc3181e= [HW,SCSI]
@@ -702,7 +710,7 @@ and is between 256 and 4096 characters. It is defined in the file
to discrete, to make X server driver able to add WB
entry later. This parameter enables that.
- enable_timer_pin_1 [i386,x86-64]
+ enable_timer_pin_1 [X86]
Enable PIN 1 of APIC timer
Can be useful to work around chipset bugs
(in particular on some ATI chipsets).
@@ -775,7 +783,7 @@ and is between 256 and 4096 characters. It is defined in the file
hashdist= [KNL,NUMA] Large hashes allocated during boot
are distributed across NUMA nodes. Defaults on
- for IA-64, off otherwise.
+ for 64bit NUMA, off otherwise.
Format: 0 | 1 (for off | on)
hcl= [IA-64] SGI's Hardware Graph compatibility layer
@@ -1529,6 +1537,10 @@ and is between 256 and 4096 characters. It is defined in the file
register save and restore. The kernel will only save
legacy floating-point registers on task switch.
+ noxsave [BUGS=X86] Disables x86 extended register state save
+ and restore using xsave. The kernel will fall back to
+ enabling legacy floating-point and SSE state.
+
nohlt [BUGS=ARM,SH] Tells the kernel that the sleep(SH) or
wfi(ARM) instruction doesn't work correctly and not to
use it. This is also useful when using JTAG debugger.
@@ -1565,6 +1577,9 @@ and is between 256 and 4096 characters. It is defined in the file
noinitrd [RAM] Tells the kernel not to load any configured
initial RAM disk.
+ nointremap [X86-64, Intel-IOMMU] Do not enable interrupt
+ remapping.
+
nointroute [IA-64]
nojitter [IA64] Disables jitter checking for ITC timers.
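Review note on kernel-parameters.txt: the new paragraph about hyphens and underscores describes a name comparison that treats the two characters as the same. A minimal sketch of such a comparison follows. `fold_dash()` and `param_name_eq()` are hypothetical names chosen for illustration and are not claimed to be the kernel's own helpers; the sketch also assumes that only the parameter name, not its value, is compared.

```c
/* Treat '-' as '_' so that both spellings compare equal. */
static char fold_dash(char c)
{
	return c == '-' ? '_' : c;
}

/* Return 1 if the two parameter names match, 0 otherwise. */
int param_name_eq(const char *a, const char *b)
{
	while (*a && *b) {
		if (fold_dash(*a) != fold_dash(*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;	/* both strings must end at the same point */
}
```

With this rule, "log_buf_len" and "log-buf-len" name the same parameter, as in the example given in the text.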
diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
index 938ea22f2cc..e20d913d591 100644
--- a/Documentation/lockdep-design.txt
+++ b/Documentation/lockdep-design.txt
@@ -54,9 +54,9 @@ locking error messages, inside curlies. A contrived example:
The bit position indicates STATE, STATE-read, for each of the states listed
above, and the character displayed in each indicates:
- '.' acquired while irqs disabled
- '+' acquired in irq context
- '-' acquired with irqs enabled
+ '.' acquired while irqs disabled and not in irq context
+ '-' acquired in irq context
+ '+' acquired with irqs enabled
'?' acquired in irq context with irqs enabled.
Unused mutexes cannot be part of the cause of an error.
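Review note on lockdep-design.txt: the corrected character table can be read as a function of two facts about a lock class. The sketch below encodes only the table as written in this hunk. `usage_char()` and its parameter names are hypothetical and are not taken from the lockdep implementation.

```c
/*
 * Pick the state character for one STATE bit.
 * used_in_irq:       the lock was acquired in irq context
 * used_irqs_enabled: the lock was acquired with irqs enabled
 */
char usage_char(int used_in_irq, int used_irqs_enabled)
{
	if (used_in_irq && used_irqs_enabled)
		return '?';
	if (used_in_irq)
		return '-';
	if (used_irqs_enabled)
		return '+';
	return '.';	/* irqs disabled and not in irq context */
}
```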
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index f5b7127f54a..7f5809eddee 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -31,6 +31,7 @@ Contents:
- Locking functions.
- Interrupt disabling functions.
+ - Sleep and wake-up functions.
- Miscellaneous functions.
(*) Inter-CPU locking barrier effects.
@@ -1217,6 +1218,132 @@ barriers are required in such a situation, they must be provided from some
other means.
+SLEEP AND WAKE-UP FUNCTIONS
+---------------------------
+
+Sleeping and waking on an event flagged in global data can be viewed as an
+interaction between two pieces of data: the task state of the task waiting for
+the event and the global data used to indicate the event. To make sure that
+these appear to happen in the right order, the primitives to begin the process
+of going to sleep, and the primitives to initiate a wake up imply certain
+barriers.
+
+Firstly, the sleeper normally follows something like this sequence of events:
+
+ for (;;) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (event_indicated)
+ break;
+ schedule();
+ }
+
+A general memory barrier is interpolated automatically by set_current_state()
+after it has altered the task state:
+
+ CPU 1
+ ===============================
+ set_current_state();
+ set_mb();
+ STORE current->state
+ <general barrier>
+ LOAD event_indicated
+
+set_current_state() may be wrapped by:
+
+ prepare_to_wait();
+ prepare_to_wait_exclusive();
+
+which therefore also imply a general memory barrier after setting the state.
+The whole sequence above is available in various canned forms, all of which
+interpolate the memory barrier in the right place:
+
+ wait_event();
+ wait_event_interruptible();
+ wait_event_interruptible_exclusive();
+ wait_event_interruptible_timeout();
+ wait_event_killable();
+ wait_event_timeout();
+ wait_on_bit();
+ wait_on_bit_lock();
+
+
+Secondly, code that performs a wake up normally follows something like this:
+
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+or:
+
+ event_indicated = 1;
+ wake_up_process(event_daemon);
+
+A write memory barrier is implied by wake_up() and co. if and only if they wake
+something up. The barrier occurs before the task state is cleared, and so sits
+between the STORE to indicate the event and the STORE to set TASK_RUNNING:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ set_current_state(); STORE event_indicated
+ set_mb(); wake_up();
+ STORE current->state <write barrier>
+ <general barrier> STORE current->state
+ LOAD event_indicated
+
+The available waker functions include:
+
+ complete();
+ wake_up();
+ wake_up_all();
+ wake_up_bit();
+ wake_up_interruptible();
+ wake_up_interruptible_all();
+ wake_up_interruptible_nr();
+ wake_up_interruptible_poll();
+ wake_up_interruptible_sync();
+ wake_up_interruptible_sync_poll();
+ wake_up_locked();
+ wake_up_locked_poll();
+ wake_up_nr();
+ wake_up_poll();
+ wake_up_process();
+
+
+[!] Note that the memory barriers implied by the sleeper and the waker do _not_
+order multiple stores before the wake-up with respect to loads of those stored
+values after the sleeper has called set_current_state(). For instance, if the
+sleeper does:
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (event_indicated)
+ break;
+ __set_current_state(TASK_RUNNING);
+ do_something(my_data);
+
+and the waker does:
+
+ my_data = value;
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+there's no guarantee that the change to event_indicated will be perceived by
+the sleeper as coming after the change to my_data. In such a circumstance, the
+code on both sides must interpolate its own memory barriers between the
+separate data accesses. Thus the above sleeper ought to do:
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (event_indicated) {
+ smp_rmb();
+ do_something(my_data);
+ }
+
+and the waker should do:
+
+ my_data = value;
+ smp_wmb();
+ event_indicated = 1;
+ wake_up(&event_wait_queue);
+
+
MISCELLANEOUS FUNCTIONS
-----------------------
@@ -1366,7 +1493,7 @@ WHERE ARE MEMORY BARRIERS NEEDED?
Under normal operation, memory operation reordering is generally not going to
be a problem as a single-threaded linear piece of code will still appear to
-work correctly, even if it's in an SMP kernel. There are, however, three
+work correctly, even if it's in an SMP kernel. There are, however, four
circumstances in which reordering definitely _could_ be a problem:
(*) Interprocessor interaction.
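Review note on memory-barriers.txt: the sleeper/waker fix at the end of the new "SLEEP AND WAKE-UP FUNCTIONS" section pairs a write barrier with a read barrier. A user-space analogue can be sketched with C11 fences. The assumptions are significant: `atomic_thread_fence(memory_order_release)` and `atomic_thread_fence(memory_order_acquire)` stand in for smp_wmb() and smp_rmb() and are somewhat stronger than those barriers, `publish()` and `try_consume()` are hypothetical names, and the task state and the wake-up itself are not modelled.

```c
#include <stdatomic.h>

static int my_data;			/* payload, an ordinary variable */
static atomic_int event_indicated;	/* flag that guards my_data */

/* Waker side: store the data, then the flag, with a barrier in between. */
void publish(int value)
{
	my_data = value;
	atomic_thread_fence(memory_order_release);	/* stand-in for smp_wmb() */
	atomic_store_explicit(&event_indicated, 1, memory_order_relaxed);
}

/* Sleeper side: return 1 and fill *out only once the flag has been seen. */
int try_consume(int *out)
{
	if (!atomic_load_explicit(&event_indicated, memory_order_relaxed))
		return 0;
	atomic_thread_fence(memory_order_acquire);	/* stand-in for smp_rmb() */
	*out = my_data;
	return 1;
}
```

The ordering matters only when publish() and try_consume() run on different threads; without the two fences, a consumer that sees the flag could still read a stale my_data.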
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ec5de02f543..b121c5db707 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1266,13 +1266,22 @@ sctp_rmem - vector of 3 INTEGERs: min, default, max
sctp_wmem - vector of 3 INTEGERs: min, default, max
See tcp_wmem for a description.
-UNDOCUMENTED:
/proc/sys/net/core/*
- dev_weight FIXME
+dev_weight - INTEGER
+ The maximum number of packets that the kernel can handle on a NAPI
+ interrupt. It is a per-CPU variable.
+
+ Default: 64
/proc/sys/net/unix/*
- max_dgram_qlen FIXME
+max_dgram_qlen - INTEGER
+ The maximum length of a dgram socket receive queue.
+
+ Default: 10
+
+
+UNDOCUMENTED:
/proc/sys/net/irda/*
fast_poll_increase FIXME
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index 5ba4d3fc625..1df7f9cdab0 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -4,6 +4,7 @@
CONTENTS
========
+0. WARNING
1. Overview
1.1 The problem
1.2 The solution
@@ -14,6 +15,23 @@ CONTENTS
3. Future plans
+0. WARNING
+==========
+
+ Fiddling with these settings can result in an unstable system. The knobs are
+ root only, and they assume root knows what he is doing.
+
+Most notable:
+
+ * very small values in sched_rt_period_us can result in an unstable
+ system when the period is smaller than either the available hrtimer
+ resolution, or the time it takes to handle the budget refresh itself.
+
+ * very small values in sched_rt_runtime_us can result in an unstable
+ system when the runtime is so small the system has difficulty making
+ forward progress (NOTE: the migration thread and kstopmachine both
+ are real-time processes).
+
1. Overview
===========
@@ -169,7 +187,7 @@ get their allocated time.
Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
the biggest challenge as the current linux PI infrastructure is geared towards
-the limited static priority levels 0-139. With deadline scheduling you need to
+the limited static priority levels 0-99. With deadline scheduling you need to
do deadline inheritance (since priority is inversely proportional to the
deadline delta (deadline - now).
diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt
index 8eec05bc079..322869fc8a9 100644
--- a/Documentation/sound/alsa/HD-Audio-Models.txt
+++ b/Documentation/sound/alsa/HD-Audio-Models.txt
@@ -334,6 +334,7 @@ STAC9227/9228/9229/927x
ref-no-jd Reference board without HP/Mic jack detection
3stack D965 3stack
5stack D965 5stack + SPDIF
+ 5stack-no-fp D965 5stack without front panel
dell-3stack Dell Dimension E520
dell-bios Fixes with Dell BIOS setup
auto BIOS setup (default)
diff --git a/Documentation/sound/alsa/Procfile.txt b/Documentation/sound/alsa/Procfile.txt
index bba2dbb79d8..cfac20cf9e3 100644
--- a/Documentation/sound/alsa/Procfile.txt
+++ b/Documentation/sound/alsa/Procfile.txt
@@ -104,6 +104,11 @@ card*/pcm*/xrun_debug
When this value is greater than 1, the driver will show the
stack trace additionally. This may help the debugging.
+ Since 2.6.30, this option also enables the hwptr check using
+ jiffies. This detects spontaneous invalid pointer callback
+ values, but can lead to too many corrections for (mostly
+ buggy) hardware that doesn't give smooth pointer updates.
+
card*/pcm*/sub*/info
The general information of this PCM sub-stream.
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 97c4b328432..c302ddf629a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -39,8 +39,6 @@ Currently, these files are in /proc/sys/vm:
- nr_hugepages
- nr_overcommit_hugepages
- nr_pdflush_threads
-- nr_pdflush_threads_min
-- nr_pdflush_threads_max
- nr_trim_pages (only if CONFIG_MMU=n)
- numa_zonelist_order
- oom_dump_tasks
@@ -90,6 +88,10 @@ will itself start writeback.
If dirty_bytes is written, dirty_ratio becomes a function of its value
(dirty_bytes / the amount of dirtyable system memory).
+Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
+value lower than this limit will be ignored and the old configuration will be
+retained.
+
==============================================================
dirty_expire_centisecs
@@ -465,32 +467,6 @@ The default value is 0.
==============================================================
-nr_pdflush_threads_min
-
-This value controls the minimum number of pdflush threads.
-
-At boot time, the kernel will create and maintain 'nr_pdflush_threads_min'
-threads for the kernel's lifetime.
-
-The default value is 2. The minimum value you can specify is 1, and
-the maximum value is the current setting of 'nr_pdflush_threads_max'.
-
-See 'nr_pdflush_threads_max' below for more information.
-
-==============================================================
-
-nr_pdflush_threads_max
-
-This value controls the maximum number of pdflush threads that can be
-created. The pdflush algorithm will create a new pdflush thread (up to
-this maximum) if no pdflush threads have been available for >= 1 second.
-
-The default value is 8. The minimum value you can specify is the
-current value of 'nr_pdflush_threads_min' and the
-maximum is 1000.
-
-==============================================================
-
overcommit_memory:
This value contains a flag that enables memory overcommitment.
diff --git a/Documentation/sysfs-rules.txt b/Documentation/sysfs-rules.txt
index 6049a2a84dd..5d8bc2cd250 100644
--- a/Documentation/sysfs-rules.txt
+++ b/Documentation/sysfs-rules.txt
@@ -113,7 +113,7 @@ versions of the sysfs interface.
"devices" directory at /sys/subsystem/<name>/devices.
If /sys/subsystem exists, /sys/bus, /sys/class and /sys/block can be
- ignored. If it does not exist, you have always to scan all three
+ ignored. If it does not exist, you always have to scan all three
places, as the kernel is free to move a subsystem from one place to
the other, as long as the devices are still reachable by the same
subsystem name.
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index fd9a3e69381..e362f50c496 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -518,9 +518,18 @@ priority with zero (0) being the highest priority and the nice
values starting at 100 (nice -20). Below is a quick chart to map
the kernel priority to user land priorities.
- Kernel priority: 0 to 99 ==> user RT priority 99 to 0
- Kernel priority: 100 to 139 ==> user nice -20 to 19
- Kernel priority: 140 ==> idle task priority
+  Kernel Space                     User Space
+    ===============================================================
+      0(high) to  98(low)     user RT priority 99(high) to 1(low)
+                              with SCHED_RR or SCHED_FIFO
+    ---------------------------------------------------------------
+     99                       sched_priority is not used in scheduling
+                              decisions (it must be specified as 0)
+    ---------------------------------------------------------------
+     100(high) to 139(low)    user nice -20(high) to 19(low)
+    ---------------------------------------------------------------
+     140                      idle task priority
+    ---------------------------------------------------------------
The task states are:
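
Reviewer note: the priority chart added to ftrace.txt in the hunk above can be read as a small mapping function. The following is only a sketch of that reading; the names prio_class, user_prio and kernel_to_user_prio are hypothetical and are not kernel or ftrace interfaces.

```c
/* Hypothetical classification of a kernel priority, following the chart. */
enum prio_class {
    PRIO_RT,         /* SCHED_RR / SCHED_FIFO, value = user RT priority  */
    PRIO_RT_UNUSED,  /* kernel 99: sched_priority must be specified as 0 */
    PRIO_NICE,       /* value = user nice level                          */
    PRIO_IDLE,       /* kernel 140: idle task                            */
    PRIO_INVALID     /* outside 0..140                                   */
};

struct user_prio {
    enum prio_class cls;
    int value;
};

/* Map a kernel priority (0..140) to its user-space meaning. */
static struct user_prio kernel_to_user_prio(int kprio)
{
    struct user_prio up = { PRIO_INVALID, 0 };

    if (kprio < 0 || kprio > 140)
        return up;

    if (kprio <= 98) {
        up.cls = PRIO_RT;
        up.value = 99 - kprio;      /* 0 -> 99 (high), 98 -> 1 (low) */
    } else if (kprio == 99) {
        up.cls = PRIO_RT_UNUSED;
        up.value = 0;
    } else if (kprio <= 139) {
        up.cls = PRIO_NICE;
        up.value = kprio - 120;     /* 100 -> -20, 139 -> 19 */
    } else {
        up.cls = PRIO_IDLE;
        up.value = 0;
    }
    return up;
}
```

For instance, a trace line showing kernel priority 120 corresponds to a task running at nice 0 under this mapping.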
diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index e0203662f9e..8da3a795083 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -50,6 +50,10 @@ Protocol 2.08: (Kernel 2.6.26) Added crc32 checksum and ELF format
Protocol 2.09: (Kernel 2.6.26) Added a field of 64-bit physical
pointer to single linked list of struct setup_data.
+Protocol 2.10: (Kernel 2.6.31) Added a protocol for relaxed alignment
+ beyond the kernel_alignment field, and new init_size and
+ pref_address fields. Added extended boot loader IDs.
+
**** MEMORY LAYOUT
The traditional memory map for the kernel loader, used for Image or
@@ -168,12 +172,13 @@ Offset Proto Name Meaning
021C/4 2.00+ ramdisk_size initrd size (set by boot loader)
0220/4 2.00+ bootsect_kludge DO NOT USE - for bootsect.S use only
0224/2 2.01+ heap_end_ptr Free memory after setup end
-0226/2 N/A pad1 Unused
+0226/1 2.02+(3 ext_loader_ver Extended boot loader version
+0227/1 2.02+(3 ext_loader_type Extended boot loader ID
0228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line
022C/4 2.03+ ramdisk_max Highest legal initrd address
0230/4 2.05+ kernel_alignment Physical addr alignment required for kernel
0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not
-0235/1 N/A pad2 Unused
+0235/1 2.10+ min_alignment Minimum alignment, as a power of two
0236/2 N/A pad3 Unused
0238/4 2.06+ cmdline_size Maximum size of the kernel command line
023C/4 2.07+ hardware_subarch Hardware subarchitecture
@@ -182,6 +187,8 @@ Offset Proto Name Meaning
024C/4 2.08+ payload_length Length of kernel payload
0250/8 2.09+ setup_data 64-bit physical pointer to linked list
of struct setup_data
+0258/8 2.10+ pref_address Preferred loading address
+0260/4 2.10+ init_size Linear memory required during initialization
(1) For backwards compatibility, if the setup_sects field contains 0, the
real value is 4.
@@ -190,6 +197,8 @@ Offset Proto Name Meaning
field are unusable, which means the size of a bzImage kernel
cannot be determined.
+(3) Ignored, but safe to set, for boot protocols 2.02-2.09.
+
If the "HdrS" (0x53726448) magic number is not found at offset 0x202,
the boot protocol version is "old". Loading an old kernel, the
following parameters should be assumed:
@@ -343,18 +352,32 @@ Protocol: 2.00+
0xTV here, where T is an identifier for the boot loader and V is
a version number. Otherwise, enter 0xFF here.
+ For boot loader IDs above T = 0xD, write T = 0xE to this field and
+ write the extended ID minus 0x10 to the ext_loader_type field.
+ Similarly, the ext_loader_ver field can be used to provide more than
+ four bits for the bootloader version.
+
+ For example, for T = 0x15, V = 0x234, write:
+
+ type_of_loader <- 0xE4
+ ext_loader_type <- 0x05
+ ext_loader_ver <- 0x23
+
Assigned boot loader ids:
0 LILO (0x00 reserved for pre-2.00 bootloader)
1 Loadlin
2 bootsect-loader (0x20, all other values reserved)
- 3 SYSLINUX
- 4 EtherBoot
+ 3 Syslinux
+ 4 Etherboot/gPXE
5 ELILO
7 GRUB
- 8 U-BOOT
+ 8 U-Boot
9 Xen
A Gujin
B Qemu
+ C Arcturus Networks uCbootloader
+ E Extended (see ext_loader_type)
+ F Special (0xFF = undefined)
Please contact <hpa@zytor.com> if you need a bootloader ID
value assigned.
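
Reviewer note: the extended boot loader ID scheme in this hunk splits an ID and a version across three one-byte fields. The sketch below shows one way a loader might pack and unpack them. It assumes that IDs of 0x10 and above are the ones needing the extension (0xE and 0xF being reserved values of the 4-bit field) and that the version fits in 12 bits. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the three header bytes involved. */
struct loader_id_fields {
    uint8_t type_of_loader;   /* 0xTV: T = loader ID, V = low version nibble */
    uint8_t ext_loader_type;  /* extended ID minus 0x10 when T == 0xE        */
    uint8_t ext_loader_ver;   /* version bits above the low nibble           */
};

/* Pack a loader ID and a version (assumed <= 0xFFF) into the fields. */
static struct loader_id_fields encode_loader_id(unsigned type, unsigned ver)
{
    struct loader_id_fields f = { 0, 0, 0 };
    unsigned t = type;

    if (type >= 0x10) {                 /* needs the extended ID field */
        t = 0xE;
        f.ext_loader_type = (uint8_t)(type - 0x10);
    }
    f.type_of_loader = (uint8_t)((t << 4) | (ver & 0x0f));
    f.ext_loader_ver = (uint8_t)((ver >> 4) & 0xff);
    return f;
}

/* Recover the full loader ID. */
static unsigned decode_loader_type(struct loader_id_fields f)
{
    unsigned t = f.type_of_loader >> 4;

    return (t == 0xE) ? (unsigned)f.ext_loader_type + 0x10u : t;
}

/* Recover the full version: low nibble plus ext_loader_ver shifted by 4. */
static unsigned decode_loader_ver(struct loader_id_fields f)
{
    return (f.type_of_loader & 0x0fu) + ((unsigned)f.ext_loader_ver << 4);
}
```

With T = 0x15 and V = 0x234 this yields the same three bytes as the example in the hunk: 0xE4, 0x05 and 0x23.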
@@ -453,6 +476,35 @@ Protocol: 2.01+
Set this field to the offset (from the beginning of the real-mode
code) of the end of the setup stack/heap, minus 0x0200.
+Field name: ext_loader_ver
+Type: write (optional)
+Offset/size: 0x226/1
+Protocol: 2.02+
+
+ This field is used as an extension of the version number in the
+ type_of_loader field. The total version number is considered to be
+ (type_of_loader & 0x0f) + (ext_loader_ver << 4).
+
+ The use of this field is boot loader specific. If not written, it
+ is zero.
+
+ Kernels prior to 2.6.31 did not recognize this field, but it is safe
+ to write for protocol version 2.02 or higher.
+
+Field name: ext_loader_type
+Type: write (obligatory if (type_of_loader & 0xf0) == 0xe0)
+Offset/size: 0x227/1
+Protocol: 2.02+
+
+ This field is used as an extension of the type number in the
+ type_of_loader field. If the type in type_of_loader is 0xE, then
+ the actual type is (ext_loader_type + 0x10).
+
+ This field is ignored if the type in type_of_loader is not 0xE.
+
+ Kernels prior to 2.6.31 did not recognize this field, but it is safe
+ to write for protocol version 2.02 or higher.
+
Field name: cmd_line_ptr
Type: write (obligatory)
Offset/size: 0x228/4
@@ -482,11 +534,19 @@ Protocol: 2.03+
0x37FFFFFF, you can start your ramdisk at 0x37FE0000.)
Field name: kernel_alignment
-Type: read (reloc)
+Type: read/modify (reloc)
Offset/size: 0x230/4
-Protocol: 2.05+
+Protocol: 2.05+ (read), 2.10+ (modify)
+
+ Alignment unit required by the kernel (if relocatable_kernel is
+ true.) A relocatable kernel that is loaded at an alignment
+ incompatible with the value in this field will be realigned during
+ kernel initialization.
- Alignment unit required by the kernel (if relocatable_kernel is true.)
+ Starting with protocol version 2.10, this reflects the kernel
+ alignment preferred for optimal performance; it is possible for the
+ loader to modify this field to permit a lesser alignment. See the
+ min_alignment and pref_address fields below.
Field name: relocatable_kernel
Type: read (reloc)
@@ -498,6 +558,22 @@ Protocol: 2.05+
After loading, the boot loader must set the code32_start field to
point to the loaded code, or to a boot loader hook.
+Field name: min_alignment
+Type: read (reloc)
+Offset/size: 0x235/1
+Protocol: 2.10+
+
+ This field, if nonzero, indicates as a power of two the minimum
+ alignment required, as opposed to preferred, by the kernel to boot.
+ If a boot loader makes use of this field, it should update the
+ kernel_alignment field with the alignment unit desired; typically:
+
+ kernel_alignment = 1 << min_alignment
+
+ There may be a considerable performance cost with an excessively
+ misaligned kernel. Therefore, a loader should typically try each
+ power-of-two alignment from kernel_alignment down to this alignment.
+
Field name: cmdline_size
Type: read
Offset/size: 0x238/4
@@ -582,6 +658,36 @@ Protocol: 2.09+
sure to consider the case where the linked list already contains
entries.
+Field name: pref_address
+Type: read (reloc)
+Offset/size: 0x258/8
+Protocol: 2.10+
+
+ This field, if nonzero, represents a preferred load address for the
+ kernel. A relocating bootloader should attempt to load at this
+ address if possible.
+
+ A non-relocatable kernel will unconditionally move itself to this
+ address and run there.
+
+Field name: init_size
+Type: read
+Offset/size: 0x260/4
+
+ This field indicates the amount of linear contiguous memory starting
+ at the kernel runtime start address that the kernel needs before it
+ is capable of examining its memory map. This is not the same thing
+ as the total amount of memory the kernel needs to boot, but it can
+ be used by a relocating boot loader to help select a safe load
+ address for the kernel.
+
+ The kernel runtime start address is determined by the following algorithm:
+
+ if (relocatable_kernel)
+ runtime_start = align_up(load_address, kernel_alignment)
+ else
+ runtime_start = pref_address
+
**** THE IMAGE CHECKSUM
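
Reviewer note: the runtime start rule and the min_alignment fallback described in the boot.txt hunks above can be sketched in C as follows. This is an illustration under stated assumptions, not loader code: struct boot_hdr_view is a hypothetical subset of the header (not its real layout), kernel_alignment is assumed to be a nonzero power of two, and pick_alignment is a made-up helper that tries each power-of-two alignment from kernel_alignment down to 1 << min_alignment for a single free region [region_start, region_end).

```c
#include <stdint.h>

/* Hypothetical view of the header fields used here; not the real layout. */
struct boot_hdr_view {
    uint8_t  relocatable_kernel;
    uint8_t  min_alignment;     /* exponent of two, 0 if not provided */
    uint32_t kernel_alignment;  /* assumed nonzero power of two       */
    uint32_t init_size;
    uint64_t pref_address;
};

/* Round addr up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Kernel runtime start address, following the algorithm in the text. */
static uint64_t runtime_start(const struct boot_hdr_view *h,
                              uint64_t load_address)
{
    if (h->relocatable_kernel)
        return align_up(load_address, h->kernel_alignment);
    return h->pref_address;
}

/*
 * Return the largest alignment, from kernel_alignment down to
 * 1 << min_alignment, at which init_size bytes fit in the region.
 * Returns 0 if nothing fits or min_alignment is out of range.
 */
static uint32_t pick_alignment(const struct boot_hdr_view *h,
                               uint64_t region_start, uint64_t region_end)
{
    uint64_t min_align, a;

    if (h->min_alignment > 31)
        return 0;
    min_align = h->min_alignment ? (1ULL << h->min_alignment)
                                 : h->kernel_alignment;

    for (a = h->kernel_alignment; a != 0 && a >= min_align; a >>= 1) {
        uint64_t start = align_up(region_start, a);

        if (start >= region_start && start <= region_end &&
            region_end - start >= h->init_size)
            return (uint32_t)a;
    }
    return 0;
}
```

A loader that accepts a smaller alignment this way would then write the chosen value back to kernel_alignment, as the min_alignment description requires.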
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 34c13040a71..2db5893d6c9 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -150,11 +150,6 @@ NUMA
Otherwise, the remaining system RAM is allocated to an
additional node.
- numa=hotadd=percent
- Only allow hotadd memory to preallocate page structures upto
- percent of already available memory.
- numa=hotadd=0 will disable hotadd memory.
-
ACPI
acpi=off Don't enable ACPI
diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 29b52b14d0b..d6498e3cd71 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -6,10 +6,11 @@ Virtual memory map with 4 level page tables:
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole
-ffff880000000000 - ffffc0ffffffffff (=57 TB) direct mapping of all phys. memory
-ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole
-ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space
-ffffe20000000000 - ffffe2ffffffffff (=40 bits) virtual memory map (1TB)
+ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
+ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
+ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
+ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
+ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0
ffffffffa0000000 - fffffffffff00000 (=1536 MB) module mapping space
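
Reviewer note: the sizes quoted in the updated mm.txt map can be checked with a little arithmetic. The helpers below are hypothetical and treat each range as inclusive of its last address, which is how the rows changed by this hunk are written.

```c
#include <stdint.h>

/* Size in bytes of the inclusive range [first, last]. */
static uint64_t range_size(uint64_t first, uint64_t last)
{
    return last - first + 1;
}

/* log2 of size if it is a power of two, otherwise -1. */
static int size_bits(uint64_t size)
{
    int bits = 0;

    if (size == 0 || (size & (size - 1)) != 0)
        return -1;
    while (size > 1) {
        size >>= 1;
        bits++;
    }
    return bits;
}
```

Applied to the new direct mapping row (ffff880000000000 to ffffc7ffffffffff), range_size gives 2^46 bytes, which matches the stated 64 TB.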