diff options
author | James Bottomley <jejb@mulgrave.il.steeleye.com> | 2007-05-30 23:57:05 -0500 |
---|---|---|
committer | James Bottomley <jejb@mulgrave.il.steeleye.com> | 2007-05-30 23:57:05 -0500 |
commit | 5bc65793cbf8da0d35f19ef025dda22887e79e80 (patch) | |
tree | 8291998abd73055de6f487fafa174ee2a5d3afee /Documentation | |
parent | 6edae708bf77e012d855a7e2c7766f211d234f4f (diff) | |
parent | 3f0a6766e0cc5a577805732e5adb50a585c58175 (diff) |
[SCSI] Merge up to linux-2.6 head
Conflicts:
drivers/scsi/jazz_esp.c
Same changes made by both SCSI and SPARC trees: problem with UTF-8
conversion in the copyright.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/CodingStyle | 49 | ||||
-rw-r--r-- | Documentation/DocBook/gadget.tmpl | 2 | ||||
-rw-r--r-- | Documentation/DocBook/kernel-locking.tmpl | 123 | ||||
-rw-r--r-- | Documentation/DocBook/usb.tmpl | 28 | ||||
-rw-r--r-- | Documentation/HOWTO | 20 | ||||
-rw-r--r-- | Documentation/block/capability.txt | 15 | ||||
-rw-r--r-- | Documentation/dontdiff | 42 | ||||
-rw-r--r-- | Documentation/feature-removal-schedule.txt | 27 | ||||
-rw-r--r-- | Documentation/filesystems/directory-locking | 5 | ||||
-rw-r--r-- | Documentation/filesystems/porting | 8 | ||||
-rw-r--r-- | Documentation/gpio.txt | 8 | ||||
-rw-r--r-- | Documentation/i386/boot.txt | 401 | ||||
-rw-r--r-- | Documentation/initrd.txt | 74 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 24 | ||||
-rw-r--r-- | Documentation/ldm.txt | 21 | ||||
-rw-r--r-- | Documentation/memory-barriers.txt | 98 | ||||
-rw-r--r-- | Documentation/networking/netdevices.txt | 2 | ||||
-rw-r--r-- | Documentation/s390/cds.txt | 82 | ||||
-rw-r--r-- | Documentation/spi/spi-summary | 53 | ||||
-rw-r--r-- | Documentation/vm/slabinfo.c | 26 |
20 files changed, 725 insertions, 383 deletions
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index afc28677589..b49b92edb39 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle @@ -495,29 +495,40 @@ re-formatting you may want to take a look at the man page. But remember: "indent" is not a fix for bad programming. - Chapter 10: Configuration-files + Chapter 10: Kconfig configuration files -For configuration options (arch/xxx/Kconfig, and all the Kconfig files), -somewhat different indentation is used. +For all of the Kconfig* configuration files throughout the source tree, +the indentation is somewhat different. Lines under a "config" definition +are indented with one tab, while help text is indented an additional two +spaces. Example: -Help text is indented with 2 spaces. - -if CONFIG_EXPERIMENTAL - tristate CONFIG_BOOM - default n - help - Apply nitroglycerine inside the keyboard (DANGEROUS) - bool CONFIG_CHEER - depends on CONFIG_BOOM - default y +config AUDIT + bool "Auditing support" + depends on NET help - Output nice messages when you explode -endif + Enable auditing infrastructure that can be used with another + kernel subsystem, such as SELinux (which requires this for + logging of avc messages output). Does not do system-call + auditing without CONFIG_AUDITSYSCALL. + +Features that might still be considered unstable should be defined as +dependent on "EXPERIMENTAL": + +config SLUB + depends on EXPERIMENTAL && !ARCH_USES_SLAB_PAGE_STRUCT + bool "SLUB (Unqueued Allocator)" + ... + +while seriously dangerous features (such as write support for certain +filesystems) should advertise this prominently in their prompt string: + +config ADFS_FS_RW + bool "ADFS write support (DANGEROUS)" + depends on ADFS_FS + ... -Generally, CONFIG_EXPERIMENTAL should surround all options not considered -stable. All options that are known to trash data (experimental write- -support for file-systems, for instance) should be denoted (DANGEROUS), other -experimental options should be denoted (EXPERIMENTAL). +For full documentation on the configuration files, see the file +Documentation/kbuild/kconfig-language.txt. Chapter 11: Data structures diff --git a/Documentation/DocBook/gadget.tmpl b/Documentation/DocBook/gadget.tmpl index e7fc9643340..6996d977bf8 100644 --- a/Documentation/DocBook/gadget.tmpl +++ b/Documentation/DocBook/gadget.tmpl @@ -52,7 +52,7 @@ <toc></toc> -<chapter><title>Introduction</title> +<chapter id="intro"><title>Introduction</title> <para>This document presents a Linux-USB "Gadget" kernel mode diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index 644c3884fab..0a441f73261 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl @@ -551,10 +551,12 @@ <function>spin_lock_irqsave()</function>, which is a superset of all other spinlock primitives. </para> + <table> <title>Table of Locking Requirements</title> <tgroup cols="11"> <tbody> + <row> <entry></entry> <entry>IRQ Handler A</entry> @@ -576,97 +578,128 @@ <row> <entry>IRQ Handler B</entry> -<entry>spin_lock_irqsave</entry> +<entry>SLIS</entry> <entry>None</entry> </row> <row> <entry>Softirq A</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock</entry> +<entry>SLI</entry> +<entry>SLI</entry> +<entry>SL</entry> </row> <row> <entry>Softirq B</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock</entry> -<entry>spin_lock</entry> +<entry>SLI</entry> +<entry>SLI</entry> +<entry>SL</entry> +<entry>SL</entry> </row> <row> <entry>Tasklet A</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock</entry> -<entry>spin_lock</entry> +<entry>SLI</entry> +<entry>SLI</entry> +<entry>SL</entry> +<entry>SL</entry> <entry>None</entry> </row> <row> <entry>Tasklet B</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock</entry> -<entry>spin_lock</entry> -<entry>spin_lock</entry> +<entry>SLI</entry> +<entry>SLI</entry> +<entry>SL</entry> +<entry>SL</entry> +<entry>SL</entry> <entry>None</entry> </row> <row> <entry>Timer A</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock</entry> -<entry>spin_lock</entry> -<entry>spin_lock</entry> -<entry>spin_lock</entry> +<entry>SLI</entry> +<entry>SLI</entry> +<entry>SL</entry> +<entry>SL</entry> +<entry>SL</entry> +<entry>SL</entry> <entry>None</entry> </row> <row> <entry>Timer B</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock</entry> -<entry>spin_lock</entry> -<entry>spin_lock</entry> -<entry>spin_lock</entry> -<entry>spin_lock</entry> +<entry>SLI</entry> +<entry>SLI</entry> +<entry>SL</entry> +<entry>SL</entry> +<entry>SL</entry> +<entry>SL</entry> +<entry>SL</entry> <entry>None</entry> </row> <row> <entry>User Context A</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock_bh</entry> -<entry>spin_lock_bh</entry> -<entry>spin_lock_bh</entry> -<entry>spin_lock_bh</entry> -<entry>spin_lock_bh</entry> -<entry>spin_lock_bh</entry> +<entry>SLI</entry> +<entry>SLI</entry> +<entry>SLBH</entry> +<entry>SLBH</entry> +<entry>SLBH</entry> +<entry>SLBH</entry> +<entry>SLBH</entry> +<entry>SLBH</entry> <entry>None</entry> </row> <row> <entry>User Context B</entry> +<entry>SLI</entry> +<entry>SLI</entry> +<entry>SLBH</entry> +<entry>SLBH</entry> +<entry>SLBH</entry> +<entry>SLBH</entry> +<entry>SLBH</entry> +<entry>SLBH</entry> +<entry>DI</entry> +<entry>None</entry> +</row> + +</tbody> +</tgroup> +</table> + + <table> +<title>Legend for Locking Requirements Table</title> +<tgroup cols="2"> +<tbody> + +<row> +<entry>SLIS</entry> +<entry>spin_lock_irqsave</entry> +</row> +<row> +<entry>SLI</entry> <entry>spin_lock_irq</entry> -<entry>spin_lock_irq</entry> -<entry>spin_lock_bh</entry> -<entry>spin_lock_bh</entry> -<entry>spin_lock_bh</entry> -<entry>spin_lock_bh</entry> -<entry>spin_lock_bh</entry> +</row> +<row> +<entry>SL</entry> +<entry>spin_lock</entry> +</row> +<row> +<entry>SLBH</entry> <entry>spin_lock_bh</entry> +</row> +<row> +<entry>DI</entry> <entry>down_interruptible</entry> -<entry>None</entry> </row> </tbody> </tgroup> </table> + </sect1> </chapter> diff --git a/Documentation/DocBook/usb.tmpl b/Documentation/DocBook/usb.tmpl index a2ebd651b05..af293606fbe 100644 --- a/Documentation/DocBook/usb.tmpl +++ b/Documentation/DocBook/usb.tmpl @@ -185,7 +185,7 @@ </chapter> -<chapter><title>USB-Standard Types</title> +<chapter id="types"><title>USB-Standard Types</title> <para>In <filename><linux/usb/ch9.h></filename> you will find the USB data types defined in chapter 9 of the USB specification. @@ -197,7 +197,7 @@ </chapter> -<chapter><title>Host-Side Data Types and Macros</title> +<chapter id="hostside"><title>Host-Side Data Types and Macros</title> <para>The host side API exposes several layers to drivers, some of which are more necessary than others. @@ -211,7 +211,7 @@ </chapter> - <chapter><title>USB Core APIs</title> + <chapter id="usbcore"><title>USB Core APIs</title> <para>There are two basic I/O models in the USB API. The most elemental one is asynchronous: drivers submit requests @@ -248,7 +248,7 @@ !Edrivers/usb/core/hub.c </chapter> - <chapter><title>Host Controller APIs</title> + <chapter id="hcd"><title>Host Controller APIs</title> <para>These APIs are only for use by host controller drivers, most of which implement standard register interfaces such as @@ -285,7 +285,7 @@ !Idrivers/usb/core/buffer.c </chapter> - <chapter> + <chapter id="usbfs"> <title>The USB Filesystem (usbfs)</title> <para>This chapter presents the Linux <emphasis>usbfs</emphasis>. @@ -317,7 +317,7 @@ not it has a kernel driver. </para> - <sect1> + <sect1 id="usbfs-files"> <title>What files are in "usbfs"?</title> <para>Conventionally mounted at @@ -356,7 +356,7 @@ </sect1> - <sect1> + <sect1 id="usbfs-fstab"> <title>Mounting and Access Control</title> <para>There are a number of mount options for usbfs, which will @@ -439,7 +439,7 @@ </sect1> - <sect1> + <sect1 id="usbfs-devices"> <title>/proc/bus/usb/devices</title> <para>This file is handy for status viewing tools in user @@ -473,7 +473,7 @@ for (;;) { </para> </sect1> - <sect1> + <sect1 id="usbfs-bbbddd"> <title>/proc/bus/usb/BBB/DDD</title> <para>Use these files in one of these basic ways: @@ -510,7 +510,7 @@ for (;;) { </sect1> - <sect1> + <sect1 id="usbfs-lifecycle"> <title>Life Cycle of User Mode Drivers</title> <para>Such a driver first needs to find a device file @@ -565,7 +565,7 @@ for (;;) { </sect1> - <sect1><title>The ioctl() Requests</title> + <sect1 id="usbfs-ioctl"><title>The ioctl() Requests</title> <para>To use these ioctls, you need to include the following headers in your userspace program: @@ -604,7 +604,7 @@ for (;;) { </para> - <sect2> + <sect2 id="usbfs-mgmt"> <title>Management/Status Requests</title> <para>A number of usbfs requests don't deal very directly @@ -736,7 +736,7 @@ usbdev_ioctl (int fd, int ifno, unsigned request, void *param) </sect2> - <sect2> + <sect2 id="usbfs-sync"> <title>Synchronous I/O Support</title> <para>Synchronous requests involve the kernel blocking @@ -865,7 +865,7 @@ usbdev_ioctl (int fd, int ifno, unsigned request, void *param) </variablelist> </sect2> - <sect2> + <sect2 id="usbfs-async"> <title>Asynchronous I/O Support</title> <para>As mentioned above, there are situations where it may be diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 48123dba5e6..ced9207bedc 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO @@ -396,26 +396,6 @@ bugme-janitor mailing list (every change in the bugzilla is mailed here) -Managing bug reports --------------------- - -One of the best ways to put into practice your hacking skills is by fixing -bugs reported by other people. Not only you will help to make the kernel -more stable, you'll learn to fix real world problems and you will improve -your skills, and other developers will be aware of your presence. Fixing -bugs is one of the best ways to get merits among other developers, because -not many people like wasting time fixing other people's bugs. - -To work in the already reported bug reports, go to http://bugzilla.kernel.org. -If you want to be advised of the future bug reports, you can subscribe to the -bugme-new mailing list (only new bug reports are mailed here) or to the -bugme-janitor mailing list (every change in the bugzilla is mailed here) - - http://lists.osdl.org/mailman/listinfo/bugme-new - http://lists.osdl.org/mailman/listinfo/bugme-janitors - - - Mailing lists ------------- diff --git a/Documentation/block/capability.txt b/Documentation/block/capability.txt new file mode 100644 index 00000000000..2f1729424ef --- /dev/null +++ b/Documentation/block/capability.txt @@ -0,0 +1,15 @@ +Generic Block Device Capability +=============================================================================== +This file documents the sysfs file block/<disk>/capability + +capability is a hex word indicating which capabilities a specific disk +supports. For more information on bits not listed here, see +include/linux/genhd.h + +Capability Value +------------------------------------------------------------------------------- +GENHD_FL_MEDIA_CHANGE_NOTIFY 4 + When this bit is set, the disk supports Asynchronous Notification + of media change events. These events will be broadcast to user + space via kernel uevent. + diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 64e9f6c4826..595a5ea4c69 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff @@ -10,10 +10,12 @@ *.grp *.gz *.html +*.i *.jpeg *.ko *.log *.lst +*.moc *.mod.c *.o *.orig @@ -25,6 +27,9 @@ *.s *.sgml *.so +*.symtypes +*.tab.c +*.tab.h *.tex *.ver *.xml @@ -32,9 +37,13 @@ *_vga16.c *cscope* *~ +*.9 +*.9.gz .* .cscope 53c700_d.h +53c7xx_d.h +53c7xx_u.h 53c8xx_d.h* BitKeeper COPYING @@ -70,9 +79,11 @@ bzImage* classlist.h* comp*.log compile.h* +conf config config-* config_data.h* +config_data.gz* conmakehash consolemap_deftbl.c* crc32table.h* @@ -81,18 +92,23 @@ defkeymap.c* devlist.h* docproc dummy_sym.c* +elf2ecoff elfconfig.h* filelist fixdep fore200e_mkfirm fore200e_pca_fw.c* +gconf gen-devlist gen-kdb_cmds.c* gen_crc32table gen_init_cpio genksyms gentbl +*_gray256.c ikconfig.h* +initramfs_data.cpio +initramfs_data.cpio.gz initramfs_list kallsyms kconfig @@ -100,19 +116,30 @@ kconfig.tk keywords.c* ksym.c* ksym.h* +kxgettext +lkc_defs.h lex.c* +lex.*.c +lk201-map.c logo_*.c logo_*_clut224.c logo_*_mono.c lxdialog mach-types mach-types.h +machtypes.h make_times_h map maui_boot.h +mconf +miboot* mk_elfconfig +mkboot +mkbugboot mkdep +mkprep mktables +mktree modpost modversions.h* offset.h @@ -120,18 +147,28 @@ offsets.h oui.c* parse.c* parse.h* +patches* +pca200e.bin +pca200e_ecd.bin2 +piggy.gz +piggyback pnmtologo ppc_defs.h* promcon_tbl.c* pss_boot.h +qconf raid6altivec*.c raid6int*.c raid6tables.c +relocs +series setup sim710_d.h* +sImage sm_tbl* split-include tags +tftpboot.img times.h* tkparse trix_boot.h @@ -139,8 +176,11 @@ utsrelease.h* version.h* vmlinux vmlinux-* +vmlinux.aout vmlinux.lds vsyscall.lds wanxlfw.inc uImage -zImage +unifdef +zImage* +zconf.hash.c diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 498ff31f3aa..2d7ea85075b 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -62,7 +62,7 @@ Who: Dan Dennedy <dan@dennedy.org>, Stefan Richter <stefanr@s5r6.in-berlin.de> What: old NCR53C9x driver When: October 2007 Why: Replaced by the much better esp_scsi driver. Actual low-level - driver can ported over almost trivially. + driver can be ported over almost trivially. Who: David Miller <davem@davemloft.net> Christoph Hellwig <hch@lst.de> @@ -328,21 +328,20 @@ Who: Adrian Bunk <bunk@stusta.de> --------------------------- -What: libata.spindown_compat module parameter +What: libata spindown skipping and warning When: Dec 2008 -Why: halt(8) synchronizes caches for and spins down libata disks - because libata didn't use to spin down disk on system halt - (only synchronized caches). - Spin down on system halt is now implemented and can be tested - using sysfs node /sys/class/scsi_disk/h:c:i:l/manage_start_stop. +Why: Some halt(8) implementations synchronize caches for and spin + down libata disks because libata didn't use to spin down disk on + system halt (only synchronized caches). + Spin down on system halt is now implemented. sysfs node + /sys/class/scsi_disk/h:c:i:l/manage_start_stop is present if + spin down support is available. Because issuing spin down command to an already spun down disk - makes some disks spin up just to spin down again, the old - behavior needs to be maintained till userspace tool is updated - to check the sysfs node and not to spin down disks with the - node set to one. - This module parameter is to give userspace tool the time to - get updated and should be removed after userspace is - reasonably updated. + makes some disks spin up just to spin down again, libata tracks + device spindown status to skip the extra spindown command and + warn about it. + This is to give userspace tools the time to get updated and will + be removed after userspace is reasonably updated. Who: Tejun Heo <htejun@gmail.com> --------------------------- diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking index d7099a9266f..ff7b611abf3 100644 --- a/Documentation/filesystems/directory-locking +++ b/Documentation/filesystems/directory-locking @@ -1,5 +1,6 @@ Locking scheme used for directory operations is based on two -kinds of locks - per-inode (->i_sem) and per-filesystem (->s_vfs_rename_sem). +kinds of locks - per-inode (->i_mutex) and per-filesystem +(->s_vfs_rename_mutex). For our purposes all operations fall in 5 classes: @@ -63,7 +64,7 @@ objects - A < B iff A is an ancestor of B. attempt to acquire some lock and already holds at least one lock. Let's consider the set of contended locks. First of all, filesystem lock is not contended, since any process blocked on it is not holding any locks. -Thus all processes are blocked on ->i_sem. +Thus all processes are blocked on ->i_mutex. Non-directory objects are not contended due to (3). Thus link creation can't be a part of deadlock - it can't be blocked on source diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 5531694059a..dac45c92d87 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -107,7 +107,7 @@ free to drop it... --- [informational] -->link() callers hold ->i_sem on the object we are linking to. Some of your +->link() callers hold ->i_mutex on the object we are linking to. Some of your problems might be over... --- @@ -130,9 +130,9 @@ went in - and hadn't been documented ;-/). Just remove it from fs_flags --- [mandatory] -->setattr() is called without BKL now. Caller _always_ holds ->i_sem, so -watch for ->i_sem-grabbing code that might be used by your ->setattr(). -Callers of notify_change() need ->i_sem now. +->setattr() is called without BKL now. Caller _always_ holds ->i_mutex, so +watch for ->i_mutex-grabbing code that might be used by your ->setattr(). +Callers of notify_change() need ->i_mutex now. --- [recommended] diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt index e8be0abb346..36af58eba13 100644 --- a/Documentation/gpio.txt +++ b/Documentation/gpio.txt @@ -111,7 +111,9 @@ setting up a platform_device using the GPIO, is mark its direction: The return value is zero for success, else a negative errno. It should be checked, since the get/set calls don't have error returns and since -misconfiguration is possible. (These calls could sleep.) +misconfiguration is possible. You should normally issue these calls from +a task context. However, for spinlock-safe GPIOs it's OK to use them +before tasking is enabled, as part of early board setup. For output GPIOs, the value provided becomes the initial output value. This helps avoid signal glitching during system startup. @@ -197,7 +199,9 @@ However, many platforms don't currently support this mechanism. Passing invalid GPIO numbers to gpio_request() will fail, as will requesting GPIOs that have already been claimed with that call. The return value of -gpio_request() must be checked. (These calls could sleep.) +gpio_request() must be checked. You should normally issue these calls from +a task context. However, for spinlock-safe GPIOs it's OK to request GPIOs +before tasking is enabled, as part of early board setup. These calls serve two basic purposes. One is marking the signals which are actually in use as GPIOs, for better diagnostics; systems may have diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt index d01b7a2a0f2..35985b34d5a 100644 --- a/Documentation/i386/boot.txt +++ b/Documentation/i386/boot.txt @@ -2,7 +2,7 @@ ---------------------------- H. Peter Anvin <hpa@zytor.com> - Last update 2007-05-07 + Last update 2007-05-23 On the i386 platform, the Linux kernel uses a rather complicated boot convention. This has evolved partially due to historical aspects, as @@ -52,7 +52,8 @@ zImage kernels, typically looks like: 0A0000 +------------------------+ | Reserved for BIOS | Do not use. Reserved for BIOS EBDA. 09A000 +------------------------+ - | Stack/heap/cmdline | For use by the kernel real-mode code. + | Command line | + | Stack/heap | For use by the kernel real-mode code. 098000 +------------------------+ | Kernel setup | The kernel real-mode code. 090200 +------------------------+ @@ -73,10 +74,9 @@ zImage kernels, typically looks like: When using bzImage, the protected-mode kernel was relocated to 0x100000 ("high memory"), and the kernel real-mode block (boot sector, setup, and stack/heap) was made relocatable to any address between -0x10000 and end of low memory. Unfortunately, in protocols 2.00 and -2.01 the command line is still required to live in the 0x9XXXX memory -range, and that memory range is still overwritten by the early kernel. -The 2.02 protocol resolves that problem. +0x10000 and end of low memory. Unfortunately, in protocols 2.00 and +2.01 the 0x90000+ memory range is still used internally by the kernel; +the 2.02 protocol resolves that problem. It is desirable to keep the "memory ceiling" -- the highest point in low memory touched by the boot loader -- as low as possible, since @@ -93,6 +93,35 @@ zImage or old bzImage kernels, which need data written into the 0x90000 segment, the boot loader should make sure not to use memory above the 0x9A000 point; too many BIOSes will break above that point. +For a modern bzImage kernel with boot protocol version >= 2.02, a +memory layout like the following is suggested: + + ~ ~ + | Protected-mode kernel | +100000 +------------------------+ + | I/O memory hole | +0A0000 +------------------------+ + | Reserved for BIOS | Leave as much as possible unused + ~ ~ + | Command line | (Can also be below the X+10000 mark) +X+10000 +------------------------+ + | Stack/heap | For use by the kernel real-mode code. +X+08000 +------------------------+ + | Kernel setup | The kernel real-mode code. + | Kernel boot sector | The kernel legacy boot sector. +X +------------------------+ + | Boot loader | <- Boot sector entry point 0000:7C00 +001000 +------------------------+ + | Reserved for MBR/BIOS | +000800 +------------------------+ + | Typically used by MBR | +000600 +------------------------+ + | BIOS use only | +000000 +------------------------+ + +... where the address X is as low as the design of the boot loader +permits. + **** THE REAL-MODE KERNEL HEADER @@ -160,29 +189,147 @@ e.g. protocol version 2.01 will contain 0x0201 in this field. When setting fields in the header, you must make sure only to set fields supported by the protocol version in use. -The "kernel_version" field, if set to a nonzero value, contains a -pointer to a null-terminated human-readable kernel version number -string, less 0x200. This can be used to display the kernel version to -the user. This value should be less than (0x200*setup_sects). For -example, if this value is set to 0x1c00, the kernel version number -string can be found at offset 0x1e00 in the kernel file. This is a -valid value if and only if the "setup_sects" field contains the value -14 or higher. - -Most boot loaders will simply load the kernel at its target address -directly. Such boot loaders do not need to worry about filling in -most of the fields in the header. The following fields should be -filled out, however: - - vid_mode: - Please see the section on SPECIAL COMMAND LINE OPTIONS. - - type_of_loader: - If your boot loader has an assigned id (see table below), enter - 0xTV here, where T is an identifier for the boot loader and V is - a version number. Otherwise, enter 0xFF here. - - Assigned boot loader ids: + +**** DETAILS OF HEADER FIELDS + +For each field, some are information from the kernel to the bootloader +("read"), some are expected to be filled out by the bootloader +("write"), and some are expected to be read and modified by the +bootloader ("modify"). + +All general purpose boot loaders should write the fields marked +(obligatory). Boot loaders who want to load the kernel at a +nonstandard address should fill in the fields marked (reloc); other +boot loaders can ignore those fields. + +The byte order of all fields is littleendian (this is x86, after all.) + +Field name: setup_secs +Type: read +Offset/size: 0x1f1/1 +Protocol: ALL + + The size of the setup code in 512-byte sectors. If this field is + 0, the real value is 4. The real-mode code consists of the boot + sector (always one 512-byte sector) plus the setup code. + +Field name: root_flags +Type: modify (optional) +Offset/size: 0x1f2/2 +Protocol: ALL + + If this field is nonzero, the root defaults to readonly. The use of + this field is deprecated; use the "ro" or "rw" options on the + command line instead. + +Field name: syssize +Type: read +Offset/size: 0x1f4/4 (protocol 2.04+) 0x1f4/2 (protocol ALL) +Protocol: 2.04+ + + The size of the protected-mode code in units of 16-byte paragraphs. + For protocol versions older than 2.04 this field is only two bytes + wide, and therefore cannot be trusted for the size of a kernel if + the LOAD_HIGH flag is set. + +Field name: ram_size +Type: kernel internal +Offset/size: 0x1f8/2 +Protocol: ALL + + This field is obsolete. + +Field name: vid_mode +Type: modify (obligatory) +Offset/size: 0x1fa/2 + + Please see the section on SPECIAL COMMAND LINE OPTIONS. + +Field name: root_dev +Type: modify (optional) +Offset/size: 0x1fc/2 +Protocol: ALL + + The default root device device number. The use of this field is + deprecated, use the "root=" option on the command line instead. + +Field name: boot_flag +Type: read +Offset/size: 0x1fe/2 +Protocol: ALL + + Contains 0xAA55. This is the closest thing old Linux kernels have + to a magic number. + +Field name: jump +Type: read +Offset/size: 0x200/2 +Protocol: 2.00+ + + Contains an x86 jump instruction, 0xEB followed by a signed offset + relative to byte 0x202. This can be used to determine the size of + the header. + +Field name: header +Type: read +Offset/size: 0x202/4 +Protocol: 2.00+ + + Contains the magic number "HdrS" (0x53726448). + +Field name: version +Type: read +Offset/size: 0x206/2 +Protocol: 2.00+ + + Contains the boot protocol version, in (major << 8)+minor format, + e.g. 0x0204 for version 2.04, and 0x0a11 for a hypothetical version + 10.17. + +Field name: readmode_swtch +Type: modify (optional) +Offset/size: 0x208/4 +Protocol: 2.00+ + + Boot loader hook (see ADVANCED BOOT LOADER HOOKS below.) + +Field name: start_sys +Type: read +Offset/size: 0x20c/4 +Protocol: 2.00+ + + The load low segment (0x1000). Obsolete. + +Field name: kernel_version +Type: read +Offset/size: 0x20e/2 +Protocol: 2.00+ + + If set to a nonzero value, contains a pointer to a NUL-terminated + human-readable kernel version number string, less 0x200. This can + be used to display the kernel version to the user. This value + should be less than (0x200*setup_sects). + + For example, if this value is set to 0x1c00, the kernel version + number string can be found at offset 0x1e00 in the kernel file. + This is a valid value if and only if the "setup_sects" field + contains the value 15 or higher, as: + + 0x1c00 < 15*0x200 (= 0x1e00) but + 0x1c00 >= 14*0x200 (= 0x1c00) + + 0x1c00 >> 9 = 14, so the minimum value for setup_secs is 15. + +Field name: type_of_loader +Type: write (obligatory) +Offset/size: 0x210/1 +Protocol: 2.00+ + + If your boot loader has an assigned id (see table below), enter + 0xTV here, where T is an identifier for the boot loader and V is + a version number. Otherwise, enter 0xFF here. + + Assigned boot loader ids: 0 LILO (0x00 reserved for pre-2.00 bootloader) 1 Loadlin 2 bootsect-loader (0x20, all other values reserved) @@ -193,60 +340,145 @@ filled out, however: 8 U-BOOT 9 Xen A Gujin + B Qemu + + Please contact <hpa@zytor.com> if you need a bootloader ID + value assigned. + +Field name: loadflags +Type: modify (obligatory) +Offset/size: 0x211/1 +Protocol: 2.00+ + + This field is a bitmask. + + Bit 0 (read): LOADED_HIGH + - If 0, the protected-mode code is loaded at 0x10000. + - If 1, the protected-mode code is loaded at 0x100000. + + Bit 7 (write): CAN_USE_HEAP + Set this bit to 1 to indicate that the value entered in the + heap_end_ptr is valid. If this field is clear, some setup code + functionality will be disabled. + +Field name: setup_move_size +Type: modify (obligatory) +Offset/size: 0x212/2 +Protocol: 2.00-2.01 + + When using protocol 2.00 or 2.01, if the real mode kernel is not + loaded at 0x90000, it gets moved there later in the loading + sequence. Fill in this field if you want additional data (such as + the kernel command line) moved in addition to the real-mode kernel + itself. + + The unit is bytes starting with the beginning of the boot sector. + + This field is can be ignored when the protocol is 2.02 or higher, or + if the real-mode code is loaded at 0x90000. + +Field name: code32_start +Type: modify (optional, reloc) +Offset/size: 0x214/4 +Protocol: 2.00+ + + The address to jump to in protected mode. This defaults to the load + address of the kernel, and can be used by the boot loader to + determine the proper load address. + + This field can be modified for two purposes: + + 1. as a boot loader hook (see ADVANCED BOOT LOADER HOOKS below.) + + 2. if a bootloader which does not install a hook loads a + relocatable kernel at a nonstandard address it will have to modify + this field to point to the load address. + +Field name: ramdisk_image +Type: write (obligatory) +Offset/size: 0x218/4 +Protocol: 2.00+ + + The 32-bit linear address of the initial ramdisk or ramfs. Leave at + zero if there is no initial ramdisk/ramfs. + +Field name: ramdisk_size +Type: write (obligatory) +Offset/size: 0x21c/4 +Protocol: 2.00+ + + Size of the initial ramdisk or ramfs. Leave at zero if there is no + initial ramdisk/ramfs. + +Field name: bootsect_kludge +Type: kernel internal +Offset/size: 0x220/4 +Protocol: 2.00+ + + This field is obsolete. + +Field name: heap_end_ptr +Type: write (obligatory) +Offset/size: 0x224/2 +Protocol: 2.01+ + + Set this field to the offset (from the beginning of the real-mode + code) of the end of the setup stack/heap, minus 0x0200. + +Field name: cmd_line_ptr +Type: write (obligatory) +Offset/size: 0x228/4 +Protocol: 2.02+ + + Set this field to the linear address of the kernel command line. + The kernel command line can be located anywhere between the end of + the setup heap and 0xA0000; it does not have to be located in the + same 64K segment as the real-mode code itself. + + Fill in this field even if your boot loader does not support a + command line, in which case you can point this to an empty string + (or better yet, to the string "auto".) If this field is left at + zero, the kernel will assume that your boot loader does not support + the 2.02+ protocol. + +Field name: initrd_addr_max +Type: read +Offset/size: 0x22c/4 +Protocol: 2.03+ + + The maximum address that may be occupied by the initial + ramdisk/ramfs contents. For boot protocols 2.02 or earlier, this + field is not present, and the maximum address is 0x37FFFFFF. (This + address is defined as the address of the highest safe byte, so if + your ramdisk is exactly 131072 bytes long and this field is + 0x37FFFFFF, you can start your ramdisk at 0x37FE0000.) + +Field name: kernel_alignment +Type: read (reloc) +Offset/size: 0x230/4 +Protocol: 2.05+ + + Alignment unit required by the kernel (if relocatable_kernel is true.) + +Field name: relocatable_kernel +Type: read (reloc) +Offset/size: 0x234/1 +Protocol: 2.05+ + + If this field is nonzero, the protected-mode part of the kernel can + be loaded at any address that satisfies the kernel_alignment field. + After loading, the boot loader must set the code32_start field to + point to the loaded code, or to a boot loader hook. + +Field name: cmdline_size +Type: read +Offset/size: 0x238/4 +Protocol: 2.06+ - Please contact <hpa@zytor.com> if you need a bootloader ID - value assigned. - - loadflags, heap_end_ptr: - If the protocol version is 2.01 or higher, enter the - offset limit of the setup heap into heap_end_ptr and set the - 0x80 bit (CAN_USE_HEAP) of loadflags. heap_end_ptr appears to - be relative to the start of setup (offset 0x0200). - - setup_move_size: - When using protocol 2.00 or 2.01, if the real mode - kernel is not loaded at 0x90000, it gets moved there later in - the loading sequence. Fill in this field if you want - additional data (such as the kernel command line) moved in - addition to the real-mode kernel itself. - - The unit is bytes starting with the beginning of the boot - sector. - - ramdisk_image, ramdisk_size: - If your boot loader has loaded an initial ramdisk (initrd), - set ramdisk_image to the 32-bit pointer to the ramdisk data - and the ramdisk_size to the size of the ramdisk data. - - The initrd should typically be located as high in memory as - possible, as it may otherwise get overwritten by the early - kernel initialization sequence. However, it must never be - located above the address specified in the initrd_addr_max - field. The initrd should be at least 4K page aligned. - - cmd_line_ptr: - If the protocol version is 2.02 or higher, this is a 32-bit - pointer to the kernel command line. The kernel command line - can be located anywhere between the end of setup and 0xA0000. - Fill in this field even if your boot loader does not support a - command line, in which case you can point this to an empty - string (or better yet, to the string "auto".) If this field - is left at zero, the kernel will assume that your boot loader - does not support the 2.02+ protocol. - - ramdisk_max: - The maximum address that may be occupied by the initrd - contents. For boot protocols 2.02 or earlier, this field is - not present, and the maximum address is 0x37FFFFFF. (This - address is defined as the address of the highest safe byte, so - if your ramdisk is exactly 131072 bytes long and this field is - 0x37FFFFFF, you can start your ramdisk at 0x37FE0000.) - - cmdline_size: - The maximum size of the command line without the terminating - zero. This means that the command line can contain at most - cmdline_size characters. With protocol version 2.05 and - earlier, the maximum size was 255. + The maximum size of the command line without the terminating + zero. This means that the command line can contain at most + cmdline_size characters. With protocol version 2.05 and earlier, the + maximum size was 255. **** THE KERNEL COMMAND LINE @@ -494,7 +726,7 @@ switched off, especially if the loaded kernel has the floppy driver as a demand-loaded module! -**** ADVANCED BOOT TIME HOOKS +**** ADVANCED BOOT LOADER HOOKS If the boot loader runs in a particularly hostile environment (such as LOADLIN, which runs under DOS) it may be impossible to follow the @@ -519,4 +751,5 @@ IMPORTANT: All the hooks are required to preserve %esp, %ebp, %esi and set them up to BOOT_DS (0x18) yourself. After completing your hook, you should jump to the address - that was in this field before your boot loader overwrote it. + that was in this field before your boot loader overwrote it + (relocated, if appropriate.) diff --git a/Documentation/initrd.txt b/Documentation/initrd.txt index 15f1b35deb3..d3dc505104d 100644 --- a/Documentation/initrd.txt +++ b/Documentation/initrd.txt @@ -27,16 +27,20 @@ When using initrd, the system typically boots as follows: 1) the boot loader loads the kernel and the initial RAM disk 2) the kernel converts initrd into a "normal" RAM disk and frees the memory used by initrd - 3) initrd is mounted read-write as root - 4) /linuxrc is executed (this can be any valid executable, including + 3) if the root device is not /dev/ram0, the old (deprecated) + change_root procedure is followed. see the "Obsolete root change + mechanism" section below. + 4) root device is mounted. if it is /dev/ram0, the initrd image is + then mounted as root + 5) /sbin/init is executed (this can be any valid executable, including shell scripts; it is run with uid 0 and can do basically everything - init can do) - 5) linuxrc mounts the "real" root file system - 6) linuxrc places the root file system at the root directory using the + init can do). + 6) init mounts the "real" root file system + 7) init places the root file system at the root directory using the pivot_root system call - 7) the usual boot sequence (e.g. invocation of /sbin/init) is performed - on the root file system - 8) the initrd file system is removed + 8) init execs the /sbin/init on the new root filesystem, performing + the usual boot sequence + 9) the initrd file system is removed Note that changing the root directory does not involve unmounting it. It is therefore possible to leave processes running on initrd during that @@ -70,7 +74,7 @@ initrd adds the following new options: root=/dev/ram0 initrd is mounted as root, and the normal boot procedure is followed, - with the RAM disk still mounted as root. + with the RAM disk mounted as root. Compressed cpio images ---------------------- @@ -137,11 +141,11 @@ We'll describe the loopback device method: # mkdir /mnt/dev # mknod /mnt/dev/console c 5 1 5) copy all the files that are needed to properly use the initrd - environment. Don't forget the most important file, /linuxrc - Note that /linuxrc's permissions must include "x" (execute). + environment. Don't forget the most important file, /sbin/init + Note that /sbin/init's permissions must include "x" (execute). 6) correct operation the initrd environment can frequently be tested even without rebooting with the command - # chroot /mnt /linuxrc + # chroot /mnt /sbin/init This is of course limited to initrds that do not interfere with the general system state (e.g. by reconfiguring network interfaces, overwriting mounted devices, trying to start already running demons, @@ -154,7 +158,7 @@ We'll describe the loopback device method: # gzip -9 initrd For experimenting with initrd, you may want to take a rescue floppy and -only add a symbolic link from /linuxrc to /bin/sh. Alternatively, you +only add a symbolic link from /sbin/init to /bin/sh. Alternatively, you can try the experimental newlib environment [2] to create a small initrd. @@ -163,15 +167,14 @@ boot loaders support initrd. Since the boot process is still compatible with an older mechanism, the following boot command line parameters have to be given: - root=/dev/ram0 init=/linuxrc rw + root=/dev/ram0 rw (rw is only necessary if writing to the initrd file system.) With LOADLIN, you simply execute LOADLIN <kernel> initrd=<disk_image> -e.g. LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 - init=/linuxrc rw +e.g. LOADLIN C:\LINUX\BZIMAGE initrd=C:\LINUX\INITRD.GZ root=/dev/ram0 rw With LILO, you add the option INITRD=<path> to either the global section or to the section of the respective kernel in /etc/lilo.conf, and pass @@ -179,7 +182,7 @@ the options using APPEND, e.g. image = /bzImage initrd = /boot/initrd.gz - append = "root=/dev/ram0 init=/linuxrc rw" + append = "root=/dev/ram0 rw" and run /sbin/lilo @@ -191,7 +194,7 @@ Now you can boot and enjoy using initrd. Changing the root device ------------------------ -When finished with its duties, linuxrc typically changes the root device +When finished with its duties, init typically changes the root device and proceeds with starting the Linux system on the "real" root device. The procedure involves the following steps: @@ -217,7 +220,7 @@ must exist before calling pivot_root. Example: # mkdir initrd # pivot_root . initrd -Now, the linuxrc process may still access the old root via its +Now, the init process may still access the old root via its executable, shared libraries, standard input/output/error, and its current root directory. All these references are dropped by the following command: @@ -249,10 +252,6 @@ disk can be freed: It is also possible to use initrd with an NFS-mounted root, see the pivot_root(8) man page for details. -Note: if linuxrc or any program exec'ed from it terminates for some -reason, the old change_root mechanism is invoked (see section "Obsolete -root change mechanism"). - Usage scenarios --------------- @@ -264,15 +263,15 @@ as follows: 1) system boots from floppy or other media with a minimal kernel (e.g. support for RAM disks, initrd, a.out, and the Ext2 FS) and loads initrd - 2) /linuxrc determines what is needed to (1) mount the "real" root FS + 2) /sbin/init determines what is needed to (1) mount the "real" root FS (i.e. device type, device drivers, file system) and (2) the distribution media (e.g. CD-ROM, network, tape, ...). This can be done by asking the user, by auto-probing, or by using a hybrid approach. - 3) /linuxrc loads the necessary kernel modules - 4) /linuxrc creates and populates the root file system (this doesn't + 3) /sbin/init loads the necessary kernel modules + 4) /sbin/init creates and populates the root file system (this doesn't have to be a very usable system yet) - 5) /linuxrc invokes pivot_root to change the root file system and + 5) /sbin/init invokes pivot_root to change the root file system and execs - via chroot - a program that continues the installation 6) the boot loader is installed 7) the boot loader is configured to load an initrd with the set of @@ -291,7 +290,7 @@ different hardware configurations in a single administrative domain. In such cases, it is desirable to generate only a small set of kernels (ideally only one) and to keep the system-specific part of configuration information as small as possible. In this case, a common initrd could be -generated with all the necessary modules. Then, only /linuxrc or a file +generated with all the necessary modules. Then, only /sbin/init or a file read by it would have to be different. A third scenario are more convenient recovery disks, because information @@ -337,6 +336,25 @@ This old, deprecated mechanism is commonly called "change_root", while the new, supported mechanism is called "pivot_root". +Mixed change_root and pivot_root mechanism +------------------------------------------ + +In case you did not want to use root=/dev/ram0 to trig the pivot_root mechanism, +you may create both /linuxrc and /sbin/init in your initrd image. + +/linuxrc would contain only the following: + +#! /bin/sh +mount -n -t proc proc /proc +echo 0x0100 >/proc/sys/kernel/real-root-dev +umount -n /proc + +Once linuxrc exited, the kernel would mount again your initrd as root, +this time executing /sbin/init. Again, it would be duty of this init +to build the right environment (maybe using the root= device passed on +the cmdline) before the final execution of the real /sbin/init. + + Resources --------- diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 09220a1e22d..aae2282600c 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -396,6 +396,26 @@ and is between 256 and 4096 characters. It is defined in the file clocksource is not available, it defaults to PIT. Format: { pit | tsc | cyclone | pmtmr } + clocksource= [GENERIC_TIME] Override the default clocksource + Format: <string> + Override the default clocksource and use the clocksource + with the name specified. + Some clocksource names to choose from, depending on + the platform: + [all] jiffies (this is the base, fallback clocksource) + [ACPI] acpi_pm + [ARM] imx_timer1,OSTS,netx_timer,mpu_timer2, + pxa_timer,timer3,32k_counter,timer0_1 + [AVR32] avr32 + [IA-32] pit,hpet,tsc,vmi-timer; + scx200_hrt on Geode; cyclone on IBM x440 + [MIPS] MIPS + [PARISC] cr16 + [S390] tod + [SH] SuperH + [SPARC64] tick + [X86-64] hpet,tsc + code_bytes [IA32] How many bytes of object code to print in an oops report. Range: 0 - 8192 @@ -1807,10 +1827,6 @@ and is between 256 and 4096 characters. It is defined in the file time Show timing data prefixed to each printk message line - clocksource= [GENERIC_TIME] Override the default clocksource - Override the default clocksource and use the clocksource - with the name specified. - tipar.timeout= [HW,PPT] Set communications timeout in tenths of a second (default 15). diff --git a/Documentation/ldm.txt b/Documentation/ldm.txt index e266e11c19a..718085bc9f1 100644 --- a/Documentation/ldm.txt +++ b/Documentation/ldm.txt @@ -2,10 +2,13 @@ LDM - Logical Disk Manager (Dynamic Disks) ------------------------------------------ +Originally Written by FlatCap - Richard Russon <ldm@flatcap.org>. +Last Updated by Anton Altaparmakov on 30 March 2007 for Windows Vista. + Overview -------- -Windows 2000 and XP use a new partitioning scheme. It is a complete +Windows 2000, XP, and Vista use a new partitioning scheme. It is a complete replacement for the MSDOS style partitions. It stores its information in a 1MiB journalled database at the end of the physical disk. The size of partitions is limited only by disk space. The maximum number of partitions is @@ -23,7 +26,11 @@ Once the LDM driver has divided up the disk, you can use the MD driver to assemble any multi-partition volumes, e.g. Stripes, RAID5. To prevent legacy applications from repartitioning the disk, the LDM creates a -dummy MSDOS partition containing one disk-sized partition. +dummy MSDOS partition containing one disk-sized partition. This is what is +supported with the Linux LDM driver. + +A newer approach that has been implemented with Vista is to put LDM on top of a +GPT label disk. This is not supported by the Linux LDM driver yet. Example @@ -88,13 +95,13 @@ and cannot boot from a Dynamic Disk. More Documentation ------------------ -There is an Overview of the LDM online together with complete Technical -Documentation. It can also be downloaded in html. +There is an Overview of the LDM together with complete Technical Documentation. +It is available for download. - http://linux-ntfs.sourceforge.net/ldm/index.html - http://linux-ntfs.sourceforge.net/downloads.html + http://www.linux-ntfs.org/content/view/19/37/ -If you have any LDM questions that aren't answered on the website, email me. +If you have any LDM questions that aren't answered in the documentation, email +me. Cheers, FlatCap - Richard Russon diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 58408dd023c..650657c5473 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -24,7 +24,7 @@ Contents: (*) Explicit kernel barriers. - Compiler barrier. - - The CPU memory barriers. + - CPU memory barriers. - MMIO write barrier. (*) Implicit kernel memory barriers. @@ -265,7 +265,7 @@ Memory barriers are such interventions. They impose a perceived partial ordering over the memory operations on either side of the barrier. Such enforcement is important because the CPUs and other devices in a system -can use a variety of tricks to improve performance - including reordering, +can use a variety of tricks to improve performance, including reordering, deferral and combination of memory operations; speculative loads; speculative branch prediction and various types of caching. Memory barriers are used to override or suppress these tricks, allowing the code to sanely control the @@ -457,7 +457,7 @@ sequence, Q must be either &A or &B, and that: (Q == &A) implies (D == 1) (Q == &B) implies (D == 4) -But! CPU 2's perception of P may be updated _before_ its perception of B, thus +But! CPU 2's perception of P may be updated _before_ its perception of B, thus leading to the following situation: (Q == &B) and (D == 2) ???? @@ -573,7 +573,7 @@ Basically, the read barrier always has to be there, even though it can be of the "weaker" type. [!] Note that the stores before the write barrier would normally be expected to -match the loads after the read barrier or data dependency barrier, and vice +match the loads after the read barrier or the data dependency barrier, and vice versa: CPU 1 CPU 2 @@ -588,7 +588,7 @@ versa: EXAMPLES OF MEMORY BARRIER SEQUENCES ------------------------------------ -Firstly, write barriers act as a partial orderings on store operations. +Firstly, write barriers act as partial orderings on store operations. Consider the following sequence of events: CPU 1 @@ -608,15 +608,15 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E +-------+ : : | | +------+ | |------>| C=3 | } /\ - | | : +------+ }----- \ -----> Events perceptible - | | : | A=1 | } \/ to rest of system + | | : +------+ }----- \ -----> Events perceptible to + | | : | A=1 | } \/ the rest of the system | | : +------+ } | CPU 1 | : | B=2 | } | | +------+ } | | wwwwwwwwwwwwwwww } <--- At this point the write barrier | | +------+ } requires all stores prior to the | | : | E=5 | } barrier to be committed before - | | : +------+ } further stores may be take place. + | | : +------+ } further stores may take place | |------>| D=4 | } | | +------+ +-------+ : : @@ -626,7 +626,7 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E V -Secondly, data dependency barriers act as a partial orderings on data-dependent +Secondly, data dependency barriers act as partial orderings on data-dependent loads. Consider the following sequence of events: CPU 1 CPU 2 @@ -975,7 +975,7 @@ compiler from moving the memory accesses either side of it to the other side: barrier(); -This a general barrier - lesser varieties of compiler barrier do not exist. +This is a general barrier - lesser varieties of compiler barrier do not exist. The compiler barrier has no direct effect on the CPU, which may then reorder things however it wishes. @@ -997,7 +997,7 @@ The Linux kernel has eight basic CPU memory barriers: All CPU memory barriers unconditionally imply compiler barriers. SMP memory barriers are reduced to compiler barriers on uniprocessor compiled -systems because it is assumed that a CPU will be appear to be self-consistent, +systems because it is assumed that a CPU will appear to be self-consistent, and will order overlapping accesses correctly with respect to itself. [!] Note that SMP memory barriers _must_ be used to control the ordering of @@ -1146,9 +1146,9 @@ for each construct. These operations all imply certain barriers: Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is equivalent to a full barrier, but a LOCK followed by an UNLOCK is not. -[!] Note: one of the consequence of LOCKs and UNLOCKs being only one-way - barriers is that the effects instructions outside of a critical section may - seep into the inside of the critical section. +[!] Note: one of the consequences of LOCKs and UNLOCKs being only one-way + barriers is that the effects of instructions outside of a critical section + may seep into the inside of the critical section. A LOCK followed by an UNLOCK may not be assumed to be full memory barrier because it is possible for an access preceding the LOCK to happen after the @@ -1239,7 +1239,7 @@ three CPUs; then should the following sequence of events occur: UNLOCK M UNLOCK Q *D = d; *H = h; -Then there is no guarantee as to what order CPU #3 will see the accesses to *A +Then there is no guarantee as to what order CPU 3 will see the accesses to *A through *H occur in, other than the constraints imposed by the separate locks on the separate CPUs. It might, for example, see: @@ -1269,12 +1269,12 @@ However, if the following occurs: UNLOCK M [2] *H = h; -CPU #3 might see: +CPU 3 might see: *E, LOCK M [1], *C, *B, *A, UNLOCK M [1], LOCK M [2], *H, *F, *G, UNLOCK M [2], *D -But assuming CPU #1 gets the lock first, it won't see any of: +But assuming CPU 1 gets the lock first, CPU 3 won't see any of: *B, *C, *D, *F, *G or *H preceding LOCK M [1] *A, *B or *C following UNLOCK M [1] @@ -1327,12 +1327,12 @@ spinlock, for example: mmiowb(); spin_unlock(Q); -this will ensure that the two stores issued on CPU #1 appear at the PCI bridge -before either of the stores issued on CPU #2. +this will ensure that the two stores issued on CPU 1 appear at the PCI bridge +before either of the stores issued on CPU 2. -Furthermore, following a store by a load to the same device obviates the need -for an mmiowb(), because the load forces the store to complete before the load +Furthermore, following a store by a load from the same device obviates the need +for the mmiowb(), because the load forces the store to complete before the load is performed: CPU 1 CPU 2 @@ -1363,7 +1363,7 @@ circumstances in which reordering definitely _could_ be a problem: (*) Atomic operations. - (*) Accessing devices (I/O). + (*) Accessing devices. (*) Interrupts. @@ -1399,7 +1399,7 @@ To wake up a particular waiter, the up_read() or up_write() functions have to: (1) read the next pointer from this waiter's record to know as to where the next waiter record is; - (4) read the pointer to the waiter's task structure; + (2) read the pointer to the waiter's task structure; (3) clear the task pointer to tell the waiter it has been given the semaphore; @@ -1407,7 +1407,7 @@ To wake up a particular waiter, the up_read() or up_write() functions have to: (5) release the reference held on the waiter's task struct. -In otherwords, it has to perform this sequence of events: +In other words, it has to perform this sequence of events: LOAD waiter->list.next; LOAD waiter->task; @@ -1502,7 +1502,7 @@ operations and adjusting reference counters towards object destruction, and as such the implicit memory barrier effects are necessary. -The following operation are potential problems as they do _not_ imply memory +The following operations are potential problems as they do _not_ imply memory barriers, but might be used for implementing such things as UNLOCK-class operations: @@ -1517,7 +1517,7 @@ With these the appropriate explicit memory barrier should be used if necessary The following also do _not_ imply memory barriers, and so may require explicit memory barriers under some circumstances (smp_mb__before_atomic_dec() for -instance)): +instance): atomic_add(); atomic_sub(); @@ -1641,8 +1641,8 @@ functions: indeed have special I/O space access cycles and instructions, but many CPUs don't have such a concept. - The PCI bus, amongst others, defines an I/O space concept - which on such - CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O + The PCI bus, amongst others, defines an I/O space concept which - on such + CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O space. However, it may also be mapped as a virtual I/O space in the CPU's memory map, particularly on those CPUs that don't support alternate I/O spaces. @@ -1664,7 +1664,7 @@ functions: i386 architecture machines, for example, this is controlled by way of the MTRR registers. - Ordinarily, these will be guaranteed to be fully ordered and uncombined,, + Ordinarily, these will be guaranteed to be fully ordered and uncombined, provided they're not accessing a prefetchable device. However, intermediary hardware (such as a PCI bridge) may indulge in @@ -1689,7 +1689,7 @@ functions: (*) ioreadX(), iowriteX() - These will perform as appropriate for the type of access they're actually + These will perform appropriately for the type of access they're actually doing, be it inX()/outX() or readX()/writeX(). @@ -1705,7 +1705,7 @@ of arch-specific code. This means that it must be considered that the CPU will execute its instruction stream in any order it feels like - or even in parallel - provided that if an -instruction in the stream depends on the an earlier instruction, then that +instruction in the stream depends on an earlier instruction, then that earlier instruction must be sufficiently complete[*] before the later instruction may proceed; in other words: provided that the appearance of causality is maintained. @@ -1795,8 +1795,8 @@ eventually become visible on all CPUs, there's no guarantee that they will become apparent in the same order on those other CPUs. -Consider dealing with a system that has pair of CPUs (1 & 2), each of which has -a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D): +Consider dealing with a system that has a pair of CPUs (1 & 2), each of which +has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D): : : +--------+ @@ -1835,7 +1835,7 @@ Imagine the system has the following properties: (*) the coherency queue is not flushed by normal loads to lines already present in the cache, even though the contents of the queue may - potentially effect those loads. + potentially affect those loads. Imagine, then, that two writes are made on the first CPU, with a write barrier between them to guarantee that they will appear to reach that CPU's caches in @@ -1845,7 +1845,7 @@ the requisite order: =============== =============== ======================================= u == 0, v == 1 and p == &u, q == &u v = 2; - smp_wmb(); Make sure change to v visible before + smp_wmb(); Make sure change to v is visible before change to p <A:modify v=2> v is now in cache A exclusively p = &v; @@ -1853,7 +1853,7 @@ the requisite order: The write memory barrier forces the other CPUs in the system to perceive that the local CPU's caches have apparently been updated in the correct order. But -now imagine that the second CPU that wants to read those values: +now imagine that the second CPU wants to read those values: CPU 1 CPU 2 COMMENT =============== =============== ======================================= @@ -1861,7 +1861,7 @@ now imagine that the second CPU that wants to read those values: q = p; x = *q; -The above pair of reads may then fail to happen in expected order, as the +The above pair of reads may then fail to happen in the expected order, as the cacheline holding p may get updated in one of the second CPU's caches whilst the update to the cacheline holding v is delayed in the other of the second CPU's caches by some other cache event: @@ -1916,7 +1916,7 @@ access depends on a read, not all do, so it may not be relied on. Other CPUs may also have split caches, but must coordinate between the various cachelets for normal memory accesses. The semantics of the Alpha removes the -need for coordination in absence of memory barriers. +need for coordination in the absence of memory barriers. CACHE COHERENCY VS DMA @@ -1931,10 +1931,10 @@ invalidate them as well). In addition, the data DMA'd to RAM by a device may be overwritten by dirty cache lines being written back to RAM from a CPU's cache after the device has -installed its own data, or cache lines simply present in a CPUs cache may -simply obscure the fact that RAM has been updated, until at such time as the -cacheline is discarded from the CPU's cache and reloaded. To deal with this, -the appropriate part of the kernel must invalidate the overlapping bits of the +installed its own data, or cache lines present in the CPU's cache may simply +obscure the fact that RAM has been updated, until at such time as the cacheline +is discarded from the CPU's cache and reloaded. To deal with this, the +appropriate part of the kernel must invalidate the overlapping bits of the cache on each CPU. See Documentation/cachetlb.txt for more information on cache management. @@ -1944,7 +1944,7 @@ CACHE COHERENCY VS MMIO ----------------------- Memory mapped I/O usually takes place through memory locations that are part of -a window in the CPU's memory space that have different properties assigned than +a window in the CPU's memory space that has different properties assigned than the usual RAM directed window. Amongst these properties is usually the fact that such accesses bypass the @@ -1960,7 +1960,7 @@ THE THINGS CPUS GET UP TO ========================= A programmer might take it for granted that the CPU will perform memory -operations in exactly the order specified, so that if a CPU is, for example, +operations in exactly the order specified, so that if the CPU is, for example, given the following piece of code to execute: a = *A; @@ -1969,7 +1969,7 @@ given the following piece of code to execute: d = *D; *E = e; -They would then expect that the CPU will complete the memory operation for each +they would then expect that the CPU will complete the memory operation for each instruction before moving on to the next one, leading to a definite sequence of operations as seen by external observers in the system: @@ -1986,8 +1986,8 @@ assumption doesn't hold because: (*) loads may be done speculatively, and the result discarded should it prove to have been unnecessary; - (*) loads may be done speculatively, leading to the result having being - fetched at the wrong time in the expected sequence of events; + (*) loads may be done speculatively, leading to the result having been fetched + at the wrong time in the expected sequence of events; (*) the order of the memory accesses may be rearranged to promote better use of the CPU buses and caches; @@ -2069,12 +2069,12 @@ AND THEN THERE'S THE ALPHA The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that, some versions of the Alpha CPU have a split data cache, permitting them to have -two semantically related cache lines updating at separate times. This is where +two semantically-related cache lines updated at separate times. This is where the data dependency barrier really becomes necessary as this synchronises both caches with the memory coherence system, thus making it seem like pointer changes vs new data occur in the right order. -The Alpha defines the Linux's kernel's memory barrier model. +The Alpha defines the Linux kernel's memory barrier model. See the subsection on "Cache Coherency" above. diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index 847cedb238f..ce1361f9524 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt @@ -49,7 +49,7 @@ dev->hard_start_xmit: for this and return -1 when the spin lock fails. The locking there should also properly protect against set_multicast_list - Context: BHs disabled + Context: Process with BHs disabled or BH (timer). Notes: netif_queue_stopped() is guaranteed false Interrupts must be enabled when calling hard_start_xmit. (Interrupts must also be enabled when enabling the BH handler.) diff --git a/Documentation/s390/cds.txt b/Documentation/s390/cds.txt index 05a2b4f7e38..58919d6a593 100644 --- a/Documentation/s390/cds.txt +++ b/Documentation/s390/cds.txt @@ -51,13 +51,8 @@ The major changes are: * The interrupt handlers must be adapted to use a ccw_device as argument. Moreover, they don't return a devstat, but an irb. * Before initiating an io, the options must be set via ccw_device_set_options(). - -read_dev_chars() - read device characteristics - -read_conf_data() -read_conf_data_lpm() - read configuration data. +* Instead of calling read_dev_chars()/read_conf_data(), the driver issues + the channel program and handles the interrupt itself. ccw_device_get_ciw() get commands from extended sense data. @@ -130,11 +125,6 @@ present their hardware status by the same (shared) IRQ, the operating system has to call every single device driver registered on this IRQ in order to determine the device driver owning the device that raised the interrupt. -In order not to introduce a new I/O concept to the common Linux code, -Linux/390 preserves the IRQ concept and semantically maps the ESA/390 -subchannels to Linux as IRQs. This allows Linux/390 to support up to 64k -different IRQs, uniquely representing a single device each. - Up to kernel 2.4, Linux/390 used to provide interfaces via the IRQ (subchannel). For internal use of the common I/O layer, these are still there. However, device drivers should use the new calling interface via the ccw_device only. @@ -151,9 +141,8 @@ information during their initialization step to recognize the devices they support using the information saved in the struct ccw_device given to them. This methods implies that Linux/390 doesn't require to probe for free (not armed) interrupt request lines (IRQs) to drive its devices with. Where -applicable, the device drivers can use the read_dev_chars() to retrieve device -characteristics. This can be done without having to request device ownership -previously. +applicable, the device drivers can use issue the READ DEVICE CHARACTERISTICS +ccw to retrieve device characteristics in its online routine. In order to allow for easy I/O initiation the CDS layer provides a ccw_device_start() interface that takes a device specific channel program (one @@ -170,69 +159,6 @@ SUBCHANNEL (HSCH) command without having pending I/O requests. This function is also covered by ccw_device_halt(). -read_dev_chars() - Read Device Characteristics - -This routine returns the characteristics for the device specified. - -The function is meant to be called with the device already enabled; that is, -at earliest during set_online() processing. - -The ccw_device must not be locked prior to calling read_dev_chars(). - -The function may be called enabled or disabled. - -int read_dev_chars(struct ccw_device *cdev, void **buffer, int length ); - -cdev - the ccw_device the information is requested for. -buffer - pointer to a buffer pointer. The buffer pointer itself - must contain a valid buffer area. -length - length of the buffer provided. - -The read_dev_chars() function returns : - - 0 - successful completion --ENODEV - cdev invalid --EINVAL - an invalid parameter was detected, or the function was called early. --EBUSY - an irrecoverable I/O error occurred or the device is not - operational. - - -read_conf_data(), read_conf_data_lpm() - Read Configuration Data - -Retrieve the device dependent configuration data. Please have a look at your -device dependent I/O commands for the device specific layout of the node -descriptor elements. read_conf_data_lpm() will retrieve the configuration data -for a specific path. - -The function is meant to be called with the device already enabled; that is, -at earliest during set_online() processing. - -The function may be called enabled or disabled, but the device must not be -locked - -int read_conf_data(struct ccw_device, void **buffer, int *length); -int read_conf_data_lpm(struct ccw_device, void **buffer, int *length, __u8 lpm); - -cdev - the ccw_device the data is requested for. -buffer - Pointer to a buffer pointer. The read_conf_data() routine - will allocate a buffer and initialize the buffer pointer - accordingly. It's the device driver's responsibility to - release the kernel memory if no longer needed. -length - Length of the buffer allocated and retrieved. -lpm - Logical path mask to be used for retrieving the data. If - zero the data is retrieved on the next path available. - -The read_conf_data() function returns : - 0 - Successful completion --ENODEV - cdev invalid. --EINVAL - An invalid parameter was detected, or the function was called early. --EIO - An irrecoverable I/O error occurred or the device is - not operational. --ENOMEM - The read_conf_data() routine couldn't obtain storage. --EOPNOTSUPP - The device doesn't support the read configuration - data command. - - get_ciw() - get command information word This call enables a device driver to get information about supported commands diff --git a/Documentation/spi/spi-summary b/Documentation/spi/spi-summary index 795fbb48ffa..76ea6c837be 100644 --- a/Documentation/spi/spi-summary +++ b/Documentation/spi/spi-summary @@ -1,26 +1,30 @@ Overview of Linux kernel SPI support ==================================== -02-Dec-2005 +21-May-2007 What is SPI? ------------ The "Serial Peripheral Interface" (SPI) is a synchronous four wire serial link used to connect microcontrollers to sensors, memory, and peripherals. +It's a simple "de facto" standard, not complicated enough to acquire a +standardization body. SPI uses a master/slave configuration. The three signal wires hold a clock (SCK, often on the order of 10 MHz), and parallel data lines with "Master Out, Slave In" (MOSI) or "Master In, Slave Out" (MISO) signals. (Other names are also used.) There are four clocking modes through which data is exchanged; mode-0 and mode-3 are most commonly used. Each clock cycle shifts data out and data in; the clock -doesn't cycle except when there is data to shift. +doesn't cycle except when there is a data bit to shift. Not all data bits +are used though; not every protocol uses those full duplex capabilities. -SPI masters may use a "chip select" line to activate a given SPI slave +SPI masters use a fourth "chip select" line to activate a given SPI slave device, so those three signal wires may be connected to several chips -in parallel. All SPI slaves support chipselects. Some devices have +in parallel. All SPI slaves support chipselects; they are usually active +low signals, labeled nCSx for slave 'x' (e.g. nCS0). Some devices have other signals, often including an interrupt to the master. -Unlike serial busses like USB or SMBUS, even low level protocols for +Unlike serial busses like USB or SMBus, even low level protocols for SPI slave functions are usually not interoperable between vendors (except for commodities like SPI memory chips). @@ -33,6 +37,11 @@ SPI slave functions are usually not interoperable between vendors - Some devices may use eight bit words. Others may different word lengths, such as streams of 12-bit or 20-bit digital samples. + - Words are usually sent with their most significant bit (MSB) first, + but sometimes the least significant bit (LSB) goes first instead. + + - Sometimes SPI is used to daisy-chain devices, like shift registers. + In the same way, SPI slaves will only rarely support any kind of automatic discovery/enumeration protocol. The tree of slave devices accessible from a given SPI master will normally be set up manually, with configuration @@ -44,6 +53,14 @@ half-duplex SPI, for request/response protocols), SSP ("Synchronous Serial Protocol"), PSP ("Programmable Serial Protocol"), and other related protocols. +Some chips eliminate a signal line by combining MOSI and MISO, and +limiting themselves to half-duplex at the hardware level. In fact +some SPI chips have this signal mode as a strapping option. These +can be accessed using the same programming interface as SPI, but of +course they won't handle full duplex transfers. You may find such +chips described as using "three wire" signaling: SCK, data, nCSx. +(That data line is sometimes called MOMI or SISO.) + Microcontrollers often support both master and slave sides of the SPI protocol. This document (and Linux) currently only supports the master side of SPI interactions. @@ -74,6 +91,32 @@ interfaces with SPI modes. Given SPI support, they could use MMC or SD cards without needing a special purpose MMC/SD/SDIO controller. +I'm confused. What are these four SPI "clock modes"? +----------------------------------------------------- +It's easy to be confused here, and the vendor documentation you'll +find isn't necessarily helpful. The four modes combine two mode bits: + + - CPOL indicates the initial clock polarity. CPOL=0 means the + clock starts low, so the first (leading) edge is rising, and + the second (trailing) edge is falling. CPOL=1 means the clock + starts high, so the first (leading) edge is falling. + + - CPHA indicates the clock phase used to sample data; CPHA=0 says + sample on the leading edge, CPHA=1 means the trailing edge. + + Since the signal needs to stablize before it's sampled, CPHA=0 + implies that its data is written half a clock before the first + clock edge. The chipselect may have made it become available. + +Chip specs won't always say "uses SPI mode X" in as many words, +but their timing diagrams will make the CPOL and CPHA modes clear. + +In the SPI mode number, CPOL is the high order bit and CPHA is the +low order bit. So when a chip's timing diagram shows the clock +starting low (CPOL=0) and data stabilized for sampling during the +trailing clock edge (CPHA=1), that's SPI mode 1. + + How do these driver programming interfaces work? ------------------------------------------------ The <linux/spi/spi.h> header file includes kerneldoc, as does the diff --git a/Documentation/vm/slabinfo.c b/Documentation/vm/slabinfo.c index 686a8e04a4f..d4f21ffd140 100644 --- a/Documentation/vm/slabinfo.c +++ b/Documentation/vm/slabinfo.c @@ -242,6 +242,9 @@ void decode_numa_list(int *numa, char *t) memset(numa, 0, MAX_NODES * sizeof(int)); + if (!t) + return; + while (*t == 'N') { t++; node = strtoul(t, &t, 10); @@ -259,11 +262,17 @@ void decode_numa_list(int *numa, char *t) void slab_validate(struct slabinfo *s) { + if (strcmp(s->name, "*") == 0) + return; + set_obj(s, "validate", 1); } void slab_shrink(struct slabinfo *s) { + if (strcmp(s->name, "*") == 0) + return; + set_obj(s, "shrink", 1); } @@ -386,7 +395,9 @@ void report(struct slabinfo *s) { if (strcmp(s->name, "*") == 0) return; - printf("\nSlabcache: %-20s Aliases: %2d Order : %2d\n", s->name, s->aliases, s->order); + + printf("\nSlabcache: %-20s Aliases: %2d Order : %2d Objects: %d\n", + s->name, s->aliases, s->order, s->objects); if (s->hwcache_align) printf("** Hardware cacheline aligned\n"); if (s->cache_dma) @@ -545,6 +556,9 @@ int slab_empty(struct slabinfo *s) void slab_debug(struct slabinfo *s) { + if (strcmp(s->name, "*") == 0) + return; + if (sanity && !s->sanity_checks) { set_obj(s, "sanity", 1); } @@ -791,11 +805,11 @@ void totals(void) store_size(b1, total_size);store_size(b2, total_waste); store_size(b3, total_waste * 100 / total_used); - printf("Memory used: %6s # Loss : %6s MRatio: %6s%%\n", b1, b2, b3); + printf("Memory used: %6s # Loss : %6s MRatio:%6s%%\n", b1, b2, b3); store_size(b1, total_objects);store_size(b2, total_partobj); store_size(b3, total_partobj * 100 / total_objects); - printf("# Objects : %6s # PartObj: %6s ORatio: %6s%%\n", b1, b2, b3); + printf("# Objects : %6s # PartObj: %6s ORatio:%6s%%\n", b1, b2, b3); printf("\n"); printf("Per Cache Average Min Max Total\n"); @@ -818,7 +832,7 @@ void totals(void) store_size(b1, avg_ppart);store_size(b2, min_ppart); store_size(b3, max_ppart); store_size(b4, total_partial * 100 / total_slabs); - printf("%%PartSlab %10s%% %10s%% %10s%% %10s%%\n", + printf("%%PartSlab%10s%% %10s%% %10s%% %10s%%\n", b1, b2, b3, b4); store_size(b1, avg_partobj);store_size(b2, min_partobj); @@ -830,7 +844,7 @@ void totals(void) store_size(b1, avg_ppartobj);store_size(b2, min_ppartobj); store_size(b3, max_ppartobj); store_size(b4, total_partobj * 100 / total_objects); - printf("%% PartObj %10s%% %10s%% %10s%% %10s%%\n", + printf("%% PartObj%10s%% %10s%% %10s%% %10s%%\n", b1, b2, b3, b4); store_size(b1, avg_size);store_size(b2, min_size); @@ -1100,6 +1114,8 @@ void output_slabs(void) ops(slab); else if (show_slab) slabcache(slab); + else if (show_report) + report(slab); } } |