diff options
Diffstat (limited to 'Documentation')
75 files changed, 1283 insertions, 303 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index 38f8444bdd0..07de7e19b4c 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX @@ -29,6 +29,8 @@ DMA-ISA-LPC.txt - How to do DMA with ISA (and LPC) devices. DMA-attributes.txt - listing of the various possible attributes a DMA region can have +dmatest.txt + - how to compile, configure and use the dmatest system. DocBook/ - directory with DocBook templates etc. for kernel documentation. EDID/ @@ -77,6 +79,8 @@ arm/ - directory with info about Linux on the ARM architecture. arm64/ - directory with info about Linux on the 64 bit ARM architecture. +assoc_array.txt + - generic associative array intro. atomic_ops.txt - semantics and behavior of atomic and bitmask operations. auxdisplay/ @@ -87,6 +91,8 @@ bad_memory.txt - how to use kernel parameters to exclude bad RAM regions. basic_profiling.txt - basic instructions for those who wants to profile Linux kernel. +bcache.txt + - Block-layer cache on fast SSDs to improve slow (raid) I/O performance. binfmt_misc.txt - info on the kernel support for extra binary formats. blackfin/ @@ -171,6 +177,8 @@ early-userspace/ - info about initramfs, klibc, and userspace early during boot. edac.txt - information on EDAC - Error Detection And Correction +efi-stub.txt + - How to use the EFI boot stub to bypass GRUB or elilo on EFI systems. eisa.txt - info on EISA bus support. email-clients.txt @@ -195,8 +203,8 @@ futex-requeue-pi.txt - info on requeueing of tasks from a non-PI futex to a PI futex gcov.txt - use of GCC's coverage testing tool "gcov" with the Linux kernel -gpio.txt - - overview of GPIO (General Purpose Input/Output) access conventions. +gpio/ + - gpio related documentation hid/ - directory with information on human interface devices highuid.txt @@ -255,6 +263,8 @@ kernel-docs.txt - listing of various WWW + books that document kernel internals. kernel-parameters.txt - summary listing of command line / boot prompt args for the kernel. +kernel-per-CPU-kthreads.txt + - List of all per-CPU kthreads and how they introduce jitter. kmemcheck.txt - info on dynamic checker that detects uses of uninitialized memory. kmemleak.txt @@ -299,8 +309,6 @@ memory-devices/ - directory with info on parts like the Texas Instruments EMIF driver memory-hotplug.txt - Hotpluggable memory support, how to use and current status. -memory.txt - - info on typical Linux memory problems. metag/ - directory with info about Linux on Meta architecture. mips/ @@ -311,6 +319,8 @@ mmc/ - directory with info about the MMC subsystem mn10300/ - directory with info about the mn10300 architecture port +module-signing.txt + - Kernel module signing for increased security when loading modules. mtd/ - directory with info about memory technology devices (flash) mono.txt @@ -343,6 +353,8 @@ pcmcia/ - info on the Linux PCMCIA driver. percpu-rw-semaphore.txt - RCU based read-write semaphore optimized for locking for reading +phy.txt + - Description of the generic PHY framework. pi-futex.txt - documentation on lightweight priority inheritance futexes. pinctrl.txt @@ -431,6 +443,8 @@ sysrq.txt - info on the magic SysRq key. target/ - directory with info on generating TCM v4 fabric .ko modules +this_cpu_ops.txt + - List rationale behind and the way to use this_cpu operations. thermal/ - directory with information on managing thermal issues (CPU/temp) trace/ @@ -469,6 +483,8 @@ wimax/ - directory with info about Intel Wireless Wimax Connections workqueue.txt - information on the Concurrency Managed Workqueue implementation +ww-mutex-design.txt + - Intro to Mutex wait/would deadlock handling.s x86/x86_64/ - directory with info on Linux support for AMD x86-64 (Hammer) machines. xtensa/ diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power index efe449bdf81..7dbf96b724e 100644 --- a/Documentation/ABI/testing/sysfs-devices-power +++ b/Documentation/ABI/testing/sysfs-devices-power @@ -187,7 +187,7 @@ Description: Not all drivers support this attribute. If it isn't supported, attempts to read or write it will yield I/O errors. -What: /sys/devices/.../power/pm_qos_latency_us +What: /sys/devices/.../power/pm_qos_resume_latency_us Date: March 2012 Contact: Rafael J. Wysocki <rjw@rjwysocki.net> Description: @@ -205,6 +205,31 @@ Description: This attribute has no effect on system-wide suspend/resume and hibernation. +What: /sys/devices/.../power/pm_qos_latency_tolerance_us +Date: January 2014 +Contact: Rafael J. Wysocki <rjw@rjwysocki.net> +Description: + The /sys/devices/.../power/pm_qos_latency_tolerance_us attribute + contains the PM QoS active state latency tolerance limit for the + given device in microseconds. That is the maximum memory access + latency the device can suffer without any visible adverse + effects on user space functionality. If that value is the + string "any", the latency does not matter to user space at all, + but hardware should not be allowed to set the latency tolerance + for the device automatically. + + Reading "auto" from this file means that the maximum memory + access latency for the device may be determined automatically + by the hardware as needed. Writing "auto" to it allows the + hardware to be switched to this mode if there are no other + latency tolerance requirements from the kernel side. + + This attribute is only present if the feature controlled by it + is supported by the hardware. + + This attribute has no effect on runtime suspend and resume of + devices and on system-wide suspend/resume and hibernation. + What: /sys/devices/.../power/pm_qos_no_power_off Date: September 2012 Contact: Rafael J. Wysocki <rjw@rjwysocki.net> diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power index 205a7387844..64c9276e942 100644 --- a/Documentation/ABI/testing/sysfs-power +++ b/Documentation/ABI/testing/sysfs-power @@ -12,8 +12,9 @@ Contact: Rafael J. Wysocki <rjw@rjwysocki.net> Description: The /sys/power/state file controls the system power state. Reading from this file returns what states are supported, - which is hard-coded to 'standby' (Power-On Suspend), 'mem' - (Suspend-to-RAM), and 'disk' (Suspend-to-Disk). + which is hard-coded to 'freeze' (Low-Power Idle), 'standby' + (Power-On Suspend), 'mem' (Suspend-to-RAM), and 'disk' + (Suspend-to-Disk). Writing to this file one of these strings causes the system to transition into that state. Please see the file diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt index a8d01005f48..10a93696e55 100644 --- a/Documentation/PCI/MSI-HOWTO.txt +++ b/Documentation/PCI/MSI-HOWTO.txt @@ -82,7 +82,19 @@ Most of the hard work is done for the driver in the PCI layer. It simply has to request that the PCI layer set up the MSI capability for this device. -4.2.1 pci_enable_msi_range +4.2.1 pci_enable_msi + +int pci_enable_msi(struct pci_dev *dev) + +A successful call allocates ONE interrupt to the device, regardless +of how many MSIs the device supports. The device is switched from +pin-based interrupt mode to MSI mode. The dev->irq number is changed +to a new number which represents the message signaled interrupt; +consequently, this function should be called before the driver calls +request_irq(), because an MSI is delivered via a vector that is +different from the vector of a pin-based interrupt. + +4.2.2 pci_enable_msi_range int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec) @@ -147,6 +159,11 @@ static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) return pci_enable_msi_range(pdev, nvec, nvec); } +Note, unlike pci_enable_msi_exact() function, which could be also used to +enable a particular number of MSI-X interrupts, pci_enable_msi_range() +returns either a negative errno or 'nvec' (not negative errno or 0 - as +pci_enable_msi_exact() does). + 4.2.1.3 Single MSI mode The most notorious example of the request type described above is @@ -158,7 +175,27 @@ static int foo_driver_enable_single_msi(struct pci_dev *pdev) return pci_enable_msi_range(pdev, 1, 1); } -4.2.2 pci_disable_msi +Note, unlike pci_enable_msi() function, which could be also used to +enable the single MSI mode, pci_enable_msi_range() returns either a +negative errno or 1 (not negative errno or 0 - as pci_enable_msi() +does). + +4.2.3 pci_enable_msi_exact + +int pci_enable_msi_exact(struct pci_dev *dev, int nvec) + +This variation on pci_enable_msi_range() call allows a device driver to +request exactly 'nvec' MSIs. + +If this function returns a negative number, it indicates an error and +the driver should not attempt to request any more MSI interrupts for +this device. + +By contrast with pci_enable_msi_range() function, pci_enable_msi_exact() +returns zero in case of success, which indicates MSI interrupts have been +successfully allocated. + +4.2.4 pci_disable_msi void pci_disable_msi(struct pci_dev *dev) @@ -172,7 +209,7 @@ on any interrupt for which it previously called request_irq(). Failure to do so results in a BUG_ON(), leaving the device with MSI enabled and thus leaking its vector. -4.2.3 pci_msi_vec_count +4.2.4 pci_msi_vec_count int pci_msi_vec_count(struct pci_dev *dev) @@ -257,8 +294,8 @@ possible, likely up to the limit returned by pci_msix_vec_count() function: static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) { - return pci_enable_msi_range(adapter->pdev, adapter->msix_entries, - 1, nvec); + return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, + 1, nvec); } Note the value of 'minvec' parameter is 1. As 'minvec' is inclusive, @@ -269,8 +306,8 @@ In this case the function could look like this: static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) { - return pci_enable_msi_range(adapter->pdev, adapter->msix_entries, - FOO_DRIVER_MINIMUM_NVEC, nvec); + return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, + FOO_DRIVER_MINIMUM_NVEC, nvec); } 4.3.1.2 Exact number of MSI-X interrupts @@ -282,10 +319,15 @@ parameters: static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) { - return pci_enable_msi_range(adapter->pdev, adapter->msix_entries, - nvec, nvec); + return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, + nvec, nvec); } +Note, unlike pci_enable_msix_exact() function, which could be also used to +enable a particular number of MSI-X interrupts, pci_enable_msix_range() +returns either a negative errno or 'nvec' (not negative errno or 0 - as +pci_enable_msix_exact() does). + 4.3.1.3 Specific requirements to the number of MSI-X interrupts As noted above, there could be devices that can not operate with just any @@ -332,7 +374,64 @@ Note how pci_enable_msix_range() return value is analized for a fallback - any error code other than -ENOSPC indicates a fatal error and should not be retried. -4.3.2 pci_disable_msix +4.3.2 pci_enable_msix_exact + +int pci_enable_msix_exact(struct pci_dev *dev, + struct msix_entry *entries, int nvec) + +This variation on pci_enable_msix_range() call allows a device driver to +request exactly 'nvec' MSI-Xs. + +If this function returns a negative number, it indicates an error and +the driver should not attempt to allocate any more MSI-X interrupts for +this device. + +By contrast with pci_enable_msix_range() function, pci_enable_msix_exact() +returns zero in case of success, which indicates MSI-X interrupts have been +successfully allocated. + +Another version of a routine that enables MSI-X mode for a device with +specific requirements described in chapter 4.3.1.3 might look like this: + +/* + * Assume 'minvec' and 'maxvec' are non-zero + */ +static int foo_driver_enable_msix(struct foo_adapter *adapter, + int minvec, int maxvec) +{ + int rc; + + minvec = roundup_pow_of_two(minvec); + maxvec = rounddown_pow_of_two(maxvec); + + if (minvec > maxvec) + return -ERANGE; + +retry: + rc = pci_enable_msix_exact(adapter->pdev, + adapter->msix_entries, maxvec); + + /* + * -ENOSPC is the only error code allowed to be analyzed + */ + if (rc == -ENOSPC) { + if (maxvec == 1) + return -ENOSPC; + + maxvec /= 2; + + if (minvec > maxvec) + return -ENOSPC; + + goto retry; + } else if (rc < 0) { + return rc; + } + + return maxvec; +} + +4.3.3 pci_disable_msix void pci_disable_msix(struct pci_dev *dev) diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX index 1d7a885761f..fa57139f50b 100644 --- a/Documentation/RCU/00-INDEX +++ b/Documentation/RCU/00-INDEX @@ -8,6 +8,8 @@ listRCU.txt - Using RCU to Protect Read-Mostly Linked Lists lockdep.txt - RCU and lockdep checking +lockdep-splat.txt + - RCU Lockdep splats explained. NMI-RCU.txt - Using RCU to Protect Dynamic NMI Handlers rcubarrier.txt diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt index 273e654d7d0..2f0fcb2112d 100644 --- a/Documentation/RCU/RTFP.txt +++ b/Documentation/RCU/RTFP.txt @@ -31,6 +31,14 @@ has lapsed, so this approach may be used in non-GPL software, if desired. (In contrast, implementation of RCU is permitted only in software licensed under either GPL or LGPL. Sorry!!!) +In 1987, Rashid et al. described lazy TLB-flush [RichardRashid87a]. +At first glance, this has nothing to do with RCU, but nevertheless +this paper helped inspire the update-side batching used in the later +RCU implementation in DYNIX/ptx. In 1988, Barbara Liskov published +a description of Argus that noted that use of out-of-date values can +be tolerated in some situations. Thus, this paper provides some early +theoretical justification for use of stale data. + In 1990, Pugh [Pugh90] noted that explicitly tracking which threads were reading a given data structure permitted deferred free to operate in the presence of non-terminating threads. However, this explicit @@ -41,11 +49,11 @@ providing a fine-grained locking design, however, it would be interesting to see how much of the performance advantage reported in 1990 remains today. -At about this same time, Adams [Adams91] described ``chaotic relaxation'', -where the normal barriers between successive iterations of convergent -numerical algorithms are relaxed, so that iteration $n$ might use -data from iteration $n-1$ or even $n-2$. This introduces error, -which typically slows convergence and thus increases the number of +At about this same time, Andrews [Andrews91textbook] described ``chaotic +relaxation'', where the normal barriers between successive iterations +of convergent numerical algorithms are relaxed, so that iteration $n$ +might use data from iteration $n-1$ or even $n-2$. This introduces +error, which typically slows convergence and thus increases the number of iterations required. However, this increase is sometimes more than made up for by a reduction in the number of expensive barrier operations, which are otherwise required to synchronize the threads at the end @@ -55,7 +63,8 @@ is thus inapplicable to most data structures in operating-system kernels. In 1992, Henry (now Alexia) Massalin completed a dissertation advising parallel programmers to defer processing when feasible to simplify -synchronization. RCU makes extremely heavy use of this advice. +synchronization [HMassalinPhD]. RCU makes extremely heavy use of +this advice. In 1993, Jacobson [Jacobson93] verbally described what is perhaps the simplest deferred-free technique: simply waiting a fixed amount of time @@ -90,27 +99,29 @@ mechanism, which is quite similar to RCU [Gamsa99]. These operating systems made pervasive use of RCU in place of "existence locks", which greatly simplifies locking hierarchies and helps avoid deadlocks. -2001 saw the first RCU presentation involving Linux [McKenney01a] -at OLS. The resulting abundance of RCU patches was presented the -following year [McKenney02a], and use of RCU in dcache was first -described that same year [Linder02a]. +The year 2000 saw an email exchange that would likely have +led to yet another independent invention of something like RCU +[RustyRussell2000a,RustyRussell2000b]. Instead, 2001 saw the first +RCU presentation involving Linux [McKenney01a] at OLS. The resulting +abundance of RCU patches was presented the following year [McKenney02a], +and use of RCU in dcache was first described that same year [Linder02a]. Also in 2002, Michael [Michael02b,Michael02a] presented "hazard-pointer" techniques that defer the destruction of data structures to simplify non-blocking synchronization (wait-free synchronization, lock-free synchronization, and obstruction-free synchronization are all examples of -non-blocking synchronization). In particular, this technique eliminates -locking, reduces contention, reduces memory latency for readers, and -parallelizes pipeline stalls and memory latency for writers. However, -these techniques still impose significant read-side overhead in the -form of memory barriers. Researchers at Sun worked along similar lines -in the same timeframe [HerlihyLM02]. These techniques can be thought -of as inside-out reference counts, where the count is represented by the -number of hazard pointers referencing a given data structure rather than -the more conventional counter field within the data structure itself. -The key advantage of inside-out reference counts is that they can be -stored in immortal variables, thus allowing races between access and -deletion to be avoided. +non-blocking synchronization). The corresponding journal article appeared +in 2004 [MagedMichael04a]. This technique eliminates locking, reduces +contention, reduces memory latency for readers, and parallelizes pipeline +stalls and memory latency for writers. However, these techniques still +impose significant read-side overhead in the form of memory barriers. +Researchers at Sun worked along similar lines in the same timeframe +[HerlihyLM02]. These techniques can be thought of as inside-out reference +counts, where the count is represented by the number of hazard pointers +referencing a given data structure rather than the more conventional +counter field within the data structure itself. The key advantage +of inside-out reference counts is that they can be stored in immortal +variables, thus allowing races between access and deletion to be avoided. By the same token, RCU can be thought of as a "bulk reference count", where some form of reference counter covers all reference by a given CPU @@ -123,8 +134,10 @@ can be thought of in other terms as well. In 2003, the K42 group described how RCU could be used to create hot-pluggable implementations of operating-system functions [Appavoo03a]. -Later that year saw a paper describing an RCU implementation of System -V IPC [Arcangeli03], and an introduction to RCU in Linux Journal +Later that year saw a paper describing an RCU implementation +of System V IPC [Arcangeli03] (following up on a suggestion by +Hugh Dickins [Dickins02a] and an implementation by Mingming Cao +[MingmingCao2002IPCRCU]), and an introduction to RCU in Linux Journal [McKenney03a]. 2004 has seen a Linux-Journal article on use of RCU in dcache @@ -383,6 +396,21 @@ for Programming Languages and Operating Systems}" } } +@phdthesis{HMassalinPhD +,author="H. Massalin" +,title="Synthesis: An Efficient Implementation of Fundamental Operating +System Services" +,school="Columbia University" +,address="New York, NY" +,year="1992" +,annotation={ + Mondo optimizing compiler. + Wait-free stuff. + Good advice: defer work to avoid synchronization. See page 90 + (PDF page 106), Section 5.4, fourth bullet point. +} +} + @unpublished{Jacobson93 ,author="Van Jacobson" ,title="Avoid Read-Side Locking Via Delayed Free" @@ -671,6 +699,20 @@ Orran Krieger and Rusty Russell and Dipankar Sarma and Maneesh Soni" [Viewed October 18, 2004]" } +@conference{Michael02b +,author="Maged M. Michael" +,title="High Performance Dynamic Lock-Free Hash Tables and List-Based Sets" +,Year="2002" +,Month="August" +,booktitle="{Proceedings of the 14\textsuperscript{th} Annual ACM +Symposium on Parallel +Algorithms and Architecture}" +,pages="73-82" +,annotation={ +Like the title says... +} +} + @Conference{Linder02a ,Author="Hanna Linder and Dipankar Sarma and Maneesh Soni" ,Title="Scalability of the Directory Entry Cache" @@ -727,6 +769,24 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell" } } +@conference{Michael02a +,author="Maged M. Michael" +,title="Safe Memory Reclamation for Dynamic Lock-Free Objects Using Atomic +Reads and Writes" +,Year="2002" +,Month="August" +,booktitle="{Proceedings of the 21\textsuperscript{st} Annual ACM +Symposium on Principles of Distributed Computing}" +,pages="21-30" +,annotation={ + Each thread keeps an array of pointers to items that it is + currently referencing. Sort of an inside-out garbage collection + mechanism, but one that requires the accessing code to explicitly + state its needs. Also requires read-side memory barriers on + most architectures. +} +} + @unpublished{Dickins02a ,author="Hugh Dickins" ,title="Use RCU for System-V IPC" @@ -735,6 +795,17 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell" ,note="private communication" } +@InProceedings{HerlihyLM02 +,author={Maurice Herlihy and Victor Luchangco and Mark Moir} +,title="The Repeat Offender Problem: A Mechanism for Supporting Dynamic-Sized, +Lock-Free Data Structures" +,booktitle={Proceedings of 16\textsuperscript{th} International +Symposium on Distributed Computing} +,year=2002 +,month="October" +,pages="339-353" +} + @unpublished{Sarma02b ,Author="Dipankar Sarma" ,Title="Some dcache\_rcu benchmark numbers" @@ -749,6 +820,19 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell" } } +@unpublished{MingmingCao2002IPCRCU +,Author="Mingming Cao" +,Title="[PATCH]updated ipc lock patch" +,month="October" +,year="2002" +,note="Available: +\url{https://lkml.org/lkml/2002/10/24/262} +[Viewed February 15, 2014]" +,annotation={ + Mingming Cao's patch to introduce RCU to SysV IPC. +} +} + @unpublished{LinusTorvalds2003a ,Author="Linus Torvalds" ,Title="Re: {[PATCH]} small fixes in brlock.h" @@ -982,6 +1066,23 @@ Realtime Applications" } } +@article{MagedMichael04a +,author="Maged M. Michael" +,title="Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects" +,Year="2004" +,Month="June" +,journal="IEEE Transactions on Parallel and Distributed Systems" +,volume="15" +,number="6" +,pages="491-504" +,url="Available: +\url{http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf} +[Viewed March 1, 2005]" +,annotation={ + New canonical hazard-pointer citation. +} +} + @phdthesis{PaulEdwardMcKenneyPhD ,author="Paul E. McKenney" ,title="Exploiting Deferred Destruction: diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 91266193b8f..9d10d1db16a 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -256,10 +256,10 @@ over a rather long period of time, but improvements are always welcome! variations on this theme. b. Limiting update rate. For example, if updates occur only - once per hour, then no explicit rate limiting is required, - unless your system is already badly broken. The dcache - subsystem takes this approach -- updates are guarded - by a global lock, limiting their rate. + once per hour, then no explicit rate limiting is + required, unless your system is already badly broken. + Older versions of the dcache subsystem take this approach, + guarding updates with a global lock, limiting their rate. c. Trusted update -- if updates can only be done manually by superuser or some other trusted user, then it might not @@ -268,7 +268,8 @@ over a rather long period of time, but improvements are always welcome! the machine. d. Use call_rcu_bh() rather than call_rcu(), in order to take - advantage of call_rcu_bh()'s faster grace periods. + advantage of call_rcu_bh()'s faster grace periods. (This + is only a partial solution, though.) e. Periodically invoke synchronize_rcu(), permitting a limited number of updates per grace period. @@ -276,6 +277,13 @@ over a rather long period of time, but improvements are always welcome! The same cautions apply to call_rcu_bh(), call_rcu_sched(), call_srcu(), and kfree_rcu(). + Note that although these primitives do take action to avoid memory + exhaustion when any given CPU has too many callbacks, a determined + user could still exhaust memory. This is especially the case + if a system with a large number of CPUs has been configured to + offload all of its RCU callbacks onto a single CPU, or if the + system has relatively little free memory. + 9. All RCU list-traversal primitives, which include rcu_dereference(), list_for_each_entry_rcu(), and list_for_each_safe_rcu(), must be either within an RCU read-side diff --git a/Documentation/arm/00-INDEX b/Documentation/arm/00-INDEX index 36420e116c9..a94090cc785 100644 --- a/Documentation/arm/00-INDEX +++ b/Documentation/arm/00-INDEX @@ -4,6 +4,8 @@ Booting - requirements for booting Interrupts - ARM Interrupt subsystem documentation +IXP4xx + - Intel IXP4xx Network processor. msm - MSM specific documentation Netwinder @@ -24,8 +26,16 @@ SPEAr - ST SPEAr platform Linux Overview VFP/ - Release notes for Linux Kernel Vector Floating Point support code +cluster-pm-race-avoidance.txt + - Algorithm for CPU and Cluster setup/teardown empeg/ - Ltd's Empeg MP3 Car Audio Player +firmware.txt + - Secure firmware registration and calling. +kernel_mode_neon.txt + - How to use NEON instructions in kernel mode +kernel_user_helpers.txt + - Helper functions in kernel space made available for userspace. mem_alignment - alignment abort handler documentation memory.txt @@ -34,3 +44,7 @@ nwfpe/ - NWFPE floating point emulator documentation swp_emulation - SWP/SWPB emulation handler/logging description +tcm.txt + - ARM Tightly Coupled Memory +vlocks.txt + - Voting locks, low-level mechanism relying on memory system atomic writes. diff --git a/Documentation/arm64/memory.txt b/Documentation/arm64/memory.txt index 5e054bfe4dd..85e24c4f215 100644 --- a/Documentation/arm64/memory.txt +++ b/Documentation/arm64/memory.txt @@ -35,11 +35,13 @@ ffffffbc00000000 ffffffbdffffffff 8GB vmemmap ffffffbe00000000 ffffffbffbbfffff ~8GB [guard, future vmmemap] -ffffffbffbc00000 ffffffbffbdfffff 2MB earlyprintk device +ffffffbffa000000 ffffffbffaffffff 16MB PCI I/O space + +ffffffbffb000000 ffffffbffbbfffff 12MB [guard] -ffffffbffbe00000 ffffffbffbe0ffff 64KB PCI I/O space +ffffffbffbc00000 ffffffbffbdfffff 2MB earlyprintk device -ffffffbffbe10000 ffffffbcffffffff ~2MB [guard] +ffffffbffbe00000 ffffffbffbffffff 2MB [guard] ffffffbffc000000 ffffffbfffffffff 64MB modules @@ -60,11 +62,13 @@ fffffdfc00000000 fffffdfdffffffff 8GB vmemmap fffffdfe00000000 fffffdfffbbfffff ~8GB [guard, future vmmemap] -fffffdfffbc00000 fffffdfffbdfffff 2MB earlyprintk device +fffffdfffa000000 fffffdfffaffffff 16MB PCI I/O space + +fffffdfffb000000 fffffdfffbbfffff 12MB [guard] -fffffdfffbe00000 fffffdfffbe0ffff 64KB PCI I/O space +fffffdfffbc00000 fffffdfffbdfffff 2MB earlyprintk device -fffffdfffbe10000 fffffdfffbffffff ~2MB [guard] +fffffdfffbe00000 fffffdfffbffffff 2MB [guard] fffffdfffc000000 fffffdffffffffff 64MB modules diff --git a/Documentation/blackfin/00-INDEX b/Documentation/blackfin/00-INDEX index 2df0365f2df..c54fcdd4ae9 100644 --- a/Documentation/blackfin/00-INDEX +++ b/Documentation/blackfin/00-INDEX @@ -1,8 +1,10 @@ 00-INDEX - This file - +Makefile + - Makefile for gptimers example file. bfin-gpio-notes.txt - Notes in developing/using bfin-gpio driver. - bfin-spi-notes.txt - Notes for using bfin spi bus driver. +gptimers-example.c + - gptimers example diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX index 929d9904f74..e840b47613f 100644 --- a/Documentation/block/00-INDEX +++ b/Documentation/block/00-INDEX @@ -14,6 +14,8 @@ deadline-iosched.txt - Deadline IO scheduler tunables ioprio.txt - Block io priorities (in CFQ scheduler) +null_blk.txt + - Null block for block-layer benchmarking. queue-sysfs.txt - Queue's sysfs entries request.txt diff --git a/Documentation/cpu-freq/core.txt b/Documentation/cpu-freq/core.txt index ce0666e5103..0060d76b445 100644 --- a/Documentation/cpu-freq/core.txt +++ b/Documentation/cpu-freq/core.txt @@ -92,7 +92,3 @@ values: cpu - number of the affected CPU old - old frequency new - new frequency - -If the cpufreq core detects the frequency has changed while the system -was suspended, these notifiers are called with CPUFREQ_RESUMECHANGE as -second argument. diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt index 8b1a4451422..48da5fdcb9f 100644 --- a/Documentation/cpu-freq/cpu-drivers.txt +++ b/Documentation/cpu-freq/cpu-drivers.txt @@ -61,7 +61,13 @@ target_index - See below on the differences. And optionally -cpufreq_driver.exit - A pointer to a per-CPU cleanup function. +cpufreq_driver.exit - A pointer to a per-CPU cleanup + function called during CPU_POST_DEAD + phase of cpu hotplug process. + +cpufreq_driver.stop_cpu - A pointer to a per-CPU stop function + called during CPU_DOWN_PREPARE phase of + cpu hotplug process. cpufreq_driver.resume - A pointer to a per-CPU resume function which is called with interrupts disabled diff --git a/Documentation/device-mapper/cache.txt b/Documentation/device-mapper/cache.txt index e6b72d35515..68c0f517c60 100644 --- a/Documentation/device-mapper/cache.txt +++ b/Documentation/device-mapper/cache.txt @@ -124,12 +124,11 @@ the default being 204800 sectors (or 100MB). Updating on-disk metadata ------------------------- -On-disk metadata is committed every time a REQ_SYNC or REQ_FUA bio is -written. If no such requests are made then commits will occur every -second. This means the cache behaves like a physical disk that has a -write cache (the same is true of the thin-provisioning target). If -power is lost you may lose some recent writes. The metadata should -always be consistent in spite of any crash. +On-disk metadata is committed every time a FLUSH or FUA bio is written. +If no such requests are made then commits will occur every second. This +means the cache behaves like a physical disk that has a volatile write +cache. If power is lost you may lose some recent writes. The metadata +should always be consistent in spite of any crash. The 'dirty' state for a cache block changes far too frequently for us to keep updating it on the fly. So we treat it as a hint. In normal diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt index 8a7a3d46e0d..05a27e9442b 100644 --- a/Documentation/device-mapper/thin-provisioning.txt +++ b/Documentation/device-mapper/thin-provisioning.txt @@ -116,6 +116,35 @@ Resuming a device with a new table itself triggers an event so the userspace daemon can use this to detect a situation where a new table already exceeds the threshold. +A low water mark for the metadata device is maintained in the kernel and +will trigger a dm event if free space on the metadata device drops below +it. + +Updating on-disk metadata +------------------------- + +On-disk metadata is committed every time a FLUSH or FUA bio is written. +If no such requests are made then commits will occur every second. This +means the thin-provisioning target behaves like a physical disk that has +a volatile write cache. If power is lost you may lose some recent +writes. The metadata should always be consistent in spite of any crash. + +If data space is exhausted the pool will either error or queue IO +according to the configuration (see: error_if_no_space). If metadata +space is exhausted or a metadata operation fails: the pool will error IO +until the pool is taken offline and repair is performed to 1) fix any +potential inconsistencies and 2) clear the flag that imposes repair. +Once the pool's metadata device is repaired it may be resized, which +will allow the pool to return to normal operation. Note that if a pool +is flagged as needing repair, the pool's data and metadata devices +cannot be resized until repair is performed. It should also be noted +that when the pool's metadata space is exhausted the current metadata +transaction is aborted. Given that the pool will cache IO whose +completion may have already been acknowledged to upper IO layers +(e.g. filesystem) it is strongly suggested that consistency checks +(e.g. fsck) be performed on those layers when repair of the pool is +required. + Thin provisioning ----------------- @@ -258,10 +287,9 @@ ii) Status should register for the event and then check the target's status. held metadata root: - The location, in sectors, of the metadata root that has been + The location, in blocks, of the metadata root that has been 'held' for userspace read access. '-' indicates there is no - held root. This feature is not yet implemented so '-' is - always returned. + held root. discard_passdown|no_discard_passdown Whether or not discards are actually being passed down to the diff --git a/Documentation/devicetree/00-INDEX b/Documentation/devicetree/00-INDEX index b78f691fd84..8c4102c6a5e 100644 --- a/Documentation/devicetree/00-INDEX +++ b/Documentation/devicetree/00-INDEX @@ -8,3 +8,5 @@ https://lists.ozlabs.org/listinfo/devicetree-discuss - this file booting-without-of.txt - Booting Linux without Open Firmware, describes history and format of device trees. +usage-model.txt + - How Linux uses DT and what DT aims to solve.
\ No newline at end of file diff --git a/Documentation/devicetree/bindings/arm/armada-370-xp-mpic.txt b/Documentation/devicetree/bindings/arm/armada-370-xp-mpic.txt index d74091a8a3b..5fc03134a99 100644 --- a/Documentation/devicetree/bindings/arm/armada-370-xp-mpic.txt +++ b/Documentation/devicetree/bindings/arm/armada-370-xp-mpic.txt @@ -1,4 +1,4 @@ -Marvell Armada 370 and Armada XP Interrupt Controller +Marvell Armada 370, 375, 38x, XP Interrupt Controller ----------------------------------------------------- Required properties: @@ -16,7 +16,13 @@ Required properties: automatically map to the interrupt controller registers of the current CPU) +Optional properties: +- interrupts: If defined, then it indicates that this MPIC is + connected as a slave to another interrupt controller. This is + typically the case on Armada 375 and Armada 38x, where the MPIC is + connected as a slave to the Cortex-A9 GIC. The provided interrupt + indicate to which GIC interrupt the MPIC output is connected. Example: diff --git a/Documentation/devicetree/bindings/arm/omap/omap.txt b/Documentation/devicetree/bindings/arm/omap/omap.txt index 34dc40cffdf..af9b4a0d902 100644 --- a/Documentation/devicetree/bindings/arm/omap/omap.txt +++ b/Documentation/devicetree/bindings/arm/omap/omap.txt @@ -91,7 +91,7 @@ Boards: compatible = "ti,omap3-beagle", "ti,omap3" - OMAP3 Tobi with Overo : Commercial expansion board with daughter board - compatible = "ti,omap3-tobi", "ti,omap3-overo", "ti,omap3" + compatible = "gumstix,omap3-overo-tobi", "gumstix,omap3-overo", "ti,omap3" - OMAP4 SDP : Software Development Board compatible = "ti,omap4-sdp", "ti,omap4430" diff --git a/Documentation/devicetree/bindings/ata/ahci-platform.txt b/Documentation/devicetree/bindings/ata/ahci-platform.txt index 89de1564950..48b285ffa3a 100644 --- a/Documentation/devicetree/bindings/ata/ahci-platform.txt +++ b/Documentation/devicetree/bindings/ata/ahci-platform.txt @@ -4,17 +4,33 @@ SATA nodes are defined to describe on-chip Serial ATA controllers. Each SATA controller should have its own node. Required properties: -- compatible : compatible list, contains "snps,spear-ahci" +- compatible : compatible list, one of "snps,spear-ahci", + "snps,exynos5440-ahci", "ibm,476gtr-ahci", + "allwinner,sun4i-a10-ahci", "fsl,imx53-ahci" + "fsl,imx6q-ahci" or "snps,dwc-ahci" - interrupts : <interrupt mapping for SATA IRQ> - reg : <registers mapping> Optional properties: - dma-coherent : Present if dma operations are coherent +- clocks : a list of phandle + clock specifier pairs +- target-supply : regulator for SATA target power -Example: +"fsl,imx53-ahci", "fsl,imx6q-ahci" required properties: +- clocks : must contain the sata, sata_ref and ahb clocks +- clock-names : must contain "ahb" for the ahb clock + +Examples: sata@ffe08000 { compatible = "snps,spear-ahci"; reg = <0xffe08000 0x1000>; interrupts = <115>; - }; + + ahci: sata@01c18000 { + compatible = "allwinner,sun4i-a10-ahci"; + reg = <0x01c18000 0x1000>; + interrupts = <56>; + clocks = <&pll6 0>, <&ahb_gates 25>; + target-supply = <®_ahci_5v>; + }; diff --git a/Documentation/devicetree/bindings/ata/apm-xgene.txt b/Documentation/devicetree/bindings/ata/apm-xgene.txt new file mode 100644 index 00000000000..7bcfbf59810 --- /dev/null +++ b/Documentation/devicetree/bindings/ata/apm-xgene.txt @@ -0,0 +1,76 @@ +* APM X-Gene 6.0 Gb/s SATA host controller nodes + +SATA host controller nodes are defined to describe on-chip Serial ATA +controllers. Each SATA controller (pair of ports) have its own node. + +Required properties: +- compatible : Shall contain: + * "apm,xgene-ahci" +- reg : First memory resource shall be the AHCI memory + resource. + Second memory resource shall be the host controller + core memory resource. + Third memory resource shall be the host controller + diagnostic memory resource. + 4th memory resource shall be the host controller + AXI memory resource. + 5th optional memory resource shall be the host + controller MUX memory resource if required. +- interrupts : Interrupt-specifier for SATA host controller IRQ. +- clocks : Reference to the clock entry. +- phys : A list of phandles + phy-specifiers, one for each + entry in phy-names. +- phy-names : Should contain: + * "sata-phy" for the SATA 6.0Gbps PHY + +Optional properties: +- status : Shall be "ok" if enabled or "disabled" if disabled. + Default is "ok". + +Example: + sataclk: sataclk { + compatible = "fixed-clock"; + #clock-cells = <1>; + clock-frequency = <100000000>; + clock-output-names = "sataclk"; + }; + + phy2: phy@1f22a000 { + compatible = "apm,xgene-phy"; + reg = <0x0 0x1f22a000 0x0 0x100>; + #phy-cells = <1>; + }; + + phy3: phy@1f23a000 { + compatible = "apm,xgene-phy"; + reg = <0x0 0x1f23a000 0x0 0x100>; + #phy-cells = <1>; + }; + + sata2: sata@1a400000 { + compatible = "apm,xgene-ahci"; + reg = <0x0 0x1a400000 0x0 0x1000>, + <0x0 0x1f220000 0x0 0x1000>, + <0x0 0x1f22d000 0x0 0x1000>, + <0x0 0x1f22e000 0x0 0x1000>, + <0x0 0x1f227000 0x0 0x1000>; + interrupts = <0x0 0x87 0x4>; + status = "ok"; + clocks = <&sataclk 0>; + phys = <&phy2 0>; + phy-names = "sata-phy"; + }; + + sata3: sata@1a800000 { + compatible = "apm,xgene-ahci-pcie"; + reg = <0x0 0x1a800000 0x0 0x1000>, + <0x0 0x1f230000 0x0 0x1000>, + <0x0 0x1f23d000 0x0 0x1000>, + <0x0 0x1f23e000 0x0 0x1000>, + <0x0 0x1f237000 0x0 0x1000>; + interrupts = <0x0 0x88 0x4>; + status = "ok"; + clocks = <&sataclk 0>; + phys = <&phy3 0>; + phy-names = "sata-phy"; + }; diff --git a/Documentation/devicetree/bindings/clock/renesas,cpg-mstp-clocks.txt b/Documentation/devicetree/bindings/clock/renesas,cpg-mstp-clocks.txt index a6a352c2771..5992dceec7a 100644 --- a/Documentation/devicetree/bindings/clock/renesas,cpg-mstp-clocks.txt +++ b/Documentation/devicetree/bindings/clock/renesas,cpg-mstp-clocks.txt @@ -21,9 +21,9 @@ Required Properties: must appear in the same order as the output clocks. - #clock-cells: Must be 1 - clock-output-names: The name of the clocks as free-form strings - - renesas,indices: Indices of the gate clocks into the group (0 to 31) + - renesas,clock-indices: Indices of the gate clocks into the group (0 to 31) -The clocks, clock-output-names and renesas,indices properties contain one +The clocks, clock-output-names and renesas,clock-indices properties contain one entry per gate clock. The MSTP groups are sparsely populated. Unimplemented gate clocks must not be declared. diff --git a/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt b/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt index 68b83ecc385..ee9be996152 100644 --- a/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt +++ b/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt @@ -1,12 +1,16 @@ * Freescale Smart Direct Memory Access (SDMA) Controller for i.MX Required properties: -- compatible : Should be "fsl,imx31-sdma", "fsl,imx31-to1-sdma", - "fsl,imx31-to2-sdma", "fsl,imx35-sdma", "fsl,imx35-to1-sdma", - "fsl,imx35-to2-sdma", "fsl,imx51-sdma", "fsl,imx53-sdma" or - "fsl,imx6q-sdma". The -to variants should be preferred since they - allow to determnine the correct ROM script addresses needed for - the driver to work without additional firmware. +- compatible : Should be one of + "fsl,imx25-sdma" + "fsl,imx31-sdma", "fsl,imx31-to1-sdma", "fsl,imx31-to2-sdma" + "fsl,imx35-sdma", "fsl,imx35-to1-sdma", "fsl,imx35-to2-sdma" + "fsl,imx51-sdma" + "fsl,imx53-sdma" + "fsl,imx6q-sdma" + The -to variants should be preferred since they allow to determnine the + correct ROM script addresses needed for the driver to work without additional + firmware. - reg : Should contain SDMA registers location and length - interrupts : Should contain SDMA interrupt - #dma-cells : Must be <3>. diff --git a/Documentation/devicetree/bindings/interrupt-controller/allwinner,sun4i-ic.txt b/Documentation/devicetree/bindings/interrupt-controller/allwinner,sun4i-ic.txt index 32cec4b26cd..b290ca150d3 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/allwinner,sun4i-ic.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/allwinner,sun4i-ic.txt @@ -2,7 +2,7 @@ Allwinner Sunxi Interrupt Controller Required properties: -- compatible : should be "allwinner,sun4i-ic" +- compatible : should be "allwinner,sun4i-a10-ic" - reg : Specifies base physical address and size of the registers. - interrupt-controller : Identifies the node as an interrupt controller - #interrupt-cells : Specifies the number of cells needed to encode an @@ -11,7 +11,7 @@ Required properties: Example: intc: interrupt-controller { - compatible = "allwinner,sun4i-ic"; + compatible = "allwinner,sun4i-a10-ic"; reg = <0x01c20400 0x400>; interrupt-controller; #interrupt-cells = <1>; diff --git a/Documentation/devicetree/bindings/interrupt-controller/allwinner,sun67i-sc-nmi.txt b/Documentation/devicetree/bindings/interrupt-controller/allwinner,sun67i-sc-nmi.txt new file mode 100644 index 00000000000..d1c5cdabc3e --- /dev/null +++ b/Documentation/devicetree/bindings/interrupt-controller/allwinner,sun67i-sc-nmi.txt @@ -0,0 +1,27 @@ +Allwinner Sunxi NMI Controller +============================== + +Required properties: + +- compatible : should be "allwinner,sun7i-a20-sc-nmi" or + "allwinner,sun6i-a31-sc-nmi" +- reg : Specifies base physical address and size of the registers. +- interrupt-controller : Identifies the node as an interrupt controller +- #interrupt-cells : Specifies the number of cells needed to encode an + interrupt source. The value shall be 2. The first cell is the IRQ number, the + second cell the trigger type as defined in interrupt.txt in this directory. +- interrupt-parent: Specifies the parent interrupt controller. +- interrupts: Specifies the interrupt line (NMI) which is handled by + the interrupt controller in the parent controller's notation. This value + shall be the NMI. + +Example: + +sc-nmi-intc@01c00030 { + compatible = "allwinner,sun7i-a20-sc-nmi"; + interrupt-controller; + #interrupt-cells = <2>; + reg = <0x01c00030 0x0c>; + interrupt-parent = <&gic>; + interrupts = <0 0 4>; +}; diff --git a/Documentation/devicetree/bindings/mmc/atmel-hsmci.txt b/Documentation/devicetree/bindings/mmc/atmel-hsmci.txt index 0a85c70cd30..07ad02075a9 100644 --- a/Documentation/devicetree/bindings/mmc/atmel-hsmci.txt +++ b/Documentation/devicetree/bindings/mmc/atmel-hsmci.txt @@ -13,6 +13,9 @@ Required properties: - #address-cells: should be one. The cell is the slot id. - #size-cells: should be zero. - at least one slot node +- clock-names: tuple listing input clock names. + Required elements: "mci_clk" +- clocks: phandles to input clocks. The node contains child nodes for each slot that the platform uses @@ -24,6 +27,8 @@ mmc0: mmc@f0008000 { interrupts = <12 4>; #address-cells = <1>; #size-cells = <0>; + clock-names = "mci_clk"; + clocks = <&mci0_clk>; [ child node definitions...] }; diff --git a/Documentation/devicetree/bindings/net/allwinner,sun4i-emac.txt b/Documentation/devicetree/bindings/net/allwinner,sun4i-emac.txt index b90bfcd138f..863d5b8155c 100644 --- a/Documentation/devicetree/bindings/net/allwinner,sun4i-emac.txt +++ b/Documentation/devicetree/bindings/net/allwinner,sun4i-emac.txt @@ -1,7 +1,8 @@ * Allwinner EMAC ethernet controller Required properties: -- compatible: should be "allwinner,sun4i-emac". +- compatible: should be "allwinner,sun4i-a10-emac" (Deprecated: + "allwinner,sun4i-emac") - reg: address and length of the register set for the device. - interrupts: interrupt for the device - phy: A phandle to a phy node defining the PHY address (as the reg @@ -14,7 +15,7 @@ Optional properties: Example: emac: ethernet@01c0b000 { - compatible = "allwinner,sun4i-emac"; + compatible = "allwinner,sun4i-a10-emac"; reg = <0x01c0b000 0x1000>; interrupts = <55>; clocks = <&ahb_gates 17>; diff --git a/Documentation/devicetree/bindings/net/allwinner,sun4i-mdio.txt b/Documentation/devicetree/bindings/net/allwinner,sun4i-mdio.txt index 00b9f9a3ec1..4ec56413779 100644 --- a/Documentation/devicetree/bindings/net/allwinner,sun4i-mdio.txt +++ b/Documentation/devicetree/bindings/net/allwinner,sun4i-mdio.txt @@ -1,7 +1,8 @@ * Allwinner A10 MDIO Ethernet Controller interface Required properties: -- compatible: should be "allwinner,sun4i-mdio". +- compatible: should be "allwinner,sun4i-a10-mdio" + (Deprecated: "allwinner,sun4i-mdio"). - reg: address and length of the register set for the device. Optional properties: @@ -9,7 +10,7 @@ Optional properties: Example at the SoC level: mdio@01c0b080 { - compatible = "allwinner,sun4i-mdio"; + compatible = "allwinner,sun4i-a10-mdio"; reg = <0x01c0b080 0x14>; #address-cells = <1>; #size-cells = <0>; diff --git a/Documentation/devicetree/bindings/net/micrel-ks8851.txt b/Documentation/devicetree/bindings/net/micrel-ks8851.txt index 11ace3c3d80..4fc39276361 100644 --- a/Documentation/devicetree/bindings/net/micrel-ks8851.txt +++ b/Documentation/devicetree/bindings/net/micrel-ks8851.txt @@ -7,3 +7,4 @@ Required properties: Optional properties: - local-mac-address : Ethernet mac address to use +- vdd-supply: supply for Ethernet mac diff --git a/Documentation/devicetree/bindings/net/opencores-ethoc.txt b/Documentation/devicetree/bindings/net/opencores-ethoc.txt new file mode 100644 index 00000000000..2dc127c30d9 --- /dev/null +++ b/Documentation/devicetree/bindings/net/opencores-ethoc.txt @@ -0,0 +1,22 @@ +* OpenCores MAC 10/100 Mbps + +Required properties: +- compatible: Should be "opencores,ethoc". +- reg: two memory regions (address and length), + first region is for the device registers and descriptor rings, + second is for the device packet memory. +- interrupts: interrupt for the device. + +Optional properties: +- clocks: phandle to refer to the clk used as per + Documentation/devicetree/bindings/clock/clock-bindings.txt + +Examples: + + enet0: ethoc@fd030000 { + compatible = "opencores,ethoc"; + reg = <0xfd030000 0x4000 0xfd800000 0x4000>; + interrupts = <1>; + local-mac-address = [00 50 c2 13 6f 00]; + clocks = <&osc>; + }; diff --git a/Documentation/devicetree/bindings/net/sti-dwmac.txt b/Documentation/devicetree/bindings/net/sti-dwmac.txt new file mode 100644 index 00000000000..3dd3d0bf112 --- /dev/null +++ b/Documentation/devicetree/bindings/net/sti-dwmac.txt @@ -0,0 +1,58 @@ +STMicroelectronics SoC DWMAC glue layer controller + +The device node has following properties. + +Required properties: + - compatible : Can be "st,stih415-dwmac", "st,stih416-dwmac" or + "st,stid127-dwmac". + - reg : Offset of the glue configuration register map in system + configuration regmap pointed by st,syscon property and size. + + - reg-names : Should be "sti-ethconf". + + - st,syscon : Should be phandle to system configuration node which + encompases this glue registers. + + - st,tx-retime-src: On STi Parts for Giga bit speeds, 125Mhz clocks can be + wired up in from different sources. One via TXCLK pin and other via CLK_125 + pin. This wiring is totally board dependent. However the retiming glue + logic should be configured accordingly. Possible values for this property + + "txclk" - if 125Mhz clock is wired up via txclk line. + "clk_125" - if 125Mhz clock is wired up via clk_125 line. + + This property is only valid for Giga bit setup( GMII, RGMII), and it is + un-used for non-giga bit (MII and RMII) setups. Also note that internal + clockgen can not generate stable 125Mhz clock. + + - st,ext-phyclk: This boolean property indicates who is generating the clock + for tx and rx. This property is only valid for RMII case where the clock can + be generated from the MAC or PHY. + + - clock-names: should be "sti-ethclk". + - clocks: Should point to ethernet clockgen which can generate phyclk. + + +Example: + +ethernet0: dwmac@fe810000 { + device_type = "network"; + compatible = "st,stih416-dwmac", "snps,dwmac", "snps,dwmac-3.710"; + reg = <0xfe810000 0x8000>, <0x8bc 0x4>; + reg-names = "stmmaceth", "sti-ethconf"; + interrupts = <0 133 0>, <0 134 0>, <0 135 0>; + interrupt-names = "macirq", "eth_wake_irq", "eth_lpi"; + phy-mode = "mii"; + + st,syscon = <&syscfg_rear>; + + snps,pbl = <32>; + snps,mixed-burst; + + resets = <&softreset STIH416_ETH0_SOFTRESET>; + reset-names = "stmmaceth"; + pinctrl-0 = <&pinctrl_mii0>; + pinctrl-names = "default"; + clocks = <&CLK_S_GMAC0_PHY>; + clock-names = "stmmaceth"; +}; diff --git a/Documentation/devicetree/bindings/pinctrl/brcm,capri-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/brcm,bcm11351-pinctrl.txt index 9e9e9ef9f85..c119debe6ba 100644 --- a/Documentation/devicetree/bindings/pinctrl/brcm,capri-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/brcm,bcm11351-pinctrl.txt @@ -1,4 +1,4 @@ -Broadcom Capri Pin Controller +Broadcom BCM281xx Pin Controller This is a pin controller for the Broadcom BCM281xx SoC family, which includes BCM11130, BCM11140, BCM11351, BCM28145, and BCM28155 SoCs. @@ -7,14 +7,14 @@ BCM11130, BCM11140, BCM11351, BCM28145, and BCM28155 SoCs. Required Properties: -- compatible: Must be "brcm,capri-pinctrl". +- compatible: Must be "brcm,bcm11351-pinctrl" - reg: Base address of the PAD Controller register block and the size of the block. For example, the following is the bare minimum node: pinctrl@35004800 { - compatible = "brcm,capri-pinctrl"; + compatible = "brcm,bcm11351-pinctrl"; reg = <0x35004800 0x430>; }; @@ -119,7 +119,7 @@ Optional Properties (for HDMI pins): Example: // pin controller node pinctrl@35004800 { - compatible = "brcm,capri-pinctrl"; + compatible = "brcmbcm11351-pinctrl"; reg = <0x35004800 0x430>; // pin configuration node diff --git a/Documentation/devicetree/bindings/power/bq2415x.txt b/Documentation/devicetree/bindings/power/bq2415x.txt new file mode 100644 index 00000000000..d0327f0b59a --- /dev/null +++ b/Documentation/devicetree/bindings/power/bq2415x.txt @@ -0,0 +1,47 @@ +Binding for TI bq2415x Li-Ion Charger + +Required properties: +- compatible: Should contain one of the following: + * "ti,bq24150" + * "ti,bq24150" + * "ti,bq24150a" + * "ti,bq24151" + * "ti,bq24151a" + * "ti,bq24152" + * "ti,bq24153" + * "ti,bq24153a" + * "ti,bq24155" + * "ti,bq24156" + * "ti,bq24156a" + * "ti,bq24158" +- reg: integer, i2c address of the device. +- ti,current-limit: integer, initial maximum current charger can pull + from power supply in mA. +- ti,weak-battery-voltage: integer, weak battery voltage threshold in mV. + The chip will use slow precharge if battery voltage + is below this value. +- ti,battery-regulation-voltage: integer, maximum charging voltage in mV. +- ti,charge-current: integer, maximum charging current in mA. +- ti,termination-current: integer, charge will be terminated when current in + constant-voltage phase drops below this value (in mA). +- ti,resistor-sense: integer, value of sensing resistor in milliohm. + +Optional properties: +- ti,usb-charger-detection: phandle to usb charger detection device. + (required for auto mode) + +Example from Nokia N900: + +bq24150a { + compatible = "ti,bq24150a"; + reg = <0x6b>; + + ti,current-limit = <100>; + ti,weak-battery-voltage = <3400>; + ti,battery-regulation-voltage = <4200>; + ti,charge-current = <650>; + ti,termination-current = <100>; + ti,resistor-sense = <68>; + + ti,usb-charger-detection = <&isp1704>; +}; diff --git a/Documentation/devicetree/bindings/spi/spi_atmel.txt b/Documentation/devicetree/bindings/spi/spi_atmel.txt index 07e04cdc0c9..4f8184d069c 100644 --- a/Documentation/devicetree/bindings/spi/spi_atmel.txt +++ b/Documentation/devicetree/bindings/spi/spi_atmel.txt @@ -5,6 +5,9 @@ Required properties: - reg: Address and length of the register set for the device - interrupts: Should contain spi interrupt - cs-gpios: chipselects +- clock-names: tuple listing input clock names. + Required elements: "spi_clk" +- clocks: phandles to input clocks. Example: @@ -14,6 +17,8 @@ spi1: spi@fffcc000 { interrupts = <13 4 5>; #address-cells = <1>; #size-cells = <0>; + clocks = <&spi1_clk>; + clock-names = "spi_clk"; cs-gpios = <&pioB 3 0>; status = "okay"; diff --git a/Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt b/Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt index 48aeb7884ed..5c2e23574ca 100644 --- a/Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt +++ b/Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt @@ -2,7 +2,7 @@ Allwinner A1X SoCs Timer Controller Required properties: -- compatible : should be "allwinner,sun4i-timer" +- compatible : should be "allwinner,sun4i-a10-timer" - reg : Specifies base physical address and size of the registers. - interrupts : The interrupt of the first timer - clocks: phandle to the source clock (usually a 24 MHz fixed clock) @@ -10,7 +10,7 @@ Required properties: Example: timer { - compatible = "allwinner,sun4i-timer"; + compatible = "allwinner,sun4i-a10-timer"; reg = <0x01c20c00 0x400>; interrupts = <22>; clocks = <&osc>; diff --git a/Documentation/devicetree/bindings/timer/ti,keystone-timer.txt b/Documentation/devicetree/bindings/timer/ti,keystone-timer.txt new file mode 100644 index 00000000000..5fbe361252b --- /dev/null +++ b/Documentation/devicetree/bindings/timer/ti,keystone-timer.txt @@ -0,0 +1,29 @@ +* Device tree bindings for Texas instruments Keystone timer + +This document provides bindings for the 64-bit timer in the KeyStone +architecture devices. The timer can be configured as a general-purpose 64-bit +timer, dual general-purpose 32-bit timers. When configured as dual 32-bit +timers, each half can operate in conjunction (chain mode) or independently +(unchained mode) of each other. + +It is global timer is a free running up-counter and can generate interrupt +when the counter reaches preset counter values. + +Documentation: +http://www.ti.com/lit/ug/sprugv5a/sprugv5a.pdf + +Required properties: + +- compatible : should be "ti,keystone-timer". +- reg : specifies base physical address and count of the registers. +- interrupts : interrupt generated by the timer. +- clocks : the clock feeding the timer clock. + +Example: + +timer@22f0000 { + compatible = "ti,keystone-timer"; + reg = <0x022f0000 0x80>; + interrupts = <GIC_SPI 110 IRQ_TYPE_EDGE_RISING>; + clocks = <&clktimer15>; +}; diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index 3f900cd51bf..40ce2df0e0e 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -8,6 +8,7 @@ ad Avionic Design GmbH adi Analog Devices, Inc. aeroflexgaisler Aeroflex Gaisler AB ak Asahi Kasei Corp. +allwinner Allwinner Technology Co., Ltd. altr Altera Corp. amcc Applied Micro Circuits Corporation (APM, formally AMCC) amstaos AMS-Taos Inc. @@ -40,6 +41,7 @@ gmt Global Mixed-mode Technology, Inc. gumstix Gumstix, Inc. haoyu Haoyu Microelectronic Co. Ltd. hisilicon Hisilicon Limited. +honeywell Honeywell hp Hewlett Packard ibm International Business Machines (IBM) idt Integrated Device Technologies, Inc. @@ -55,6 +57,7 @@ maxim Maxim Integrated Products microchip Microchip Technology Inc. mosaixtech Mosaix Technologies, Inc. national National Semiconductor +neonode Neonode Inc. nintendo Nintendo nvidia NVIDIA nxp NXP Semiconductors @@ -64,7 +67,7 @@ phytec PHYTEC Messtechnik GmbH picochip Picochip Ltd powervr PowerVR (deprecated, use img) qca Qualcomm Atheros, Inc. -qcom Qualcomm, Inc. +qcom Qualcomm Technologies, Inc ralink Mediatek/Ralink Technology Corp. ramtron Ramtron International realtek Realtek Semiconductor Corp. @@ -78,6 +81,7 @@ silabs Silicon Laboratories simtek sirf SiRF Technology, Inc. snps Synopsys, Inc. +spansion Spansion Inc. st STMicroelectronics ste ST-Ericsson stericsson ST-Ericsson diff --git a/Documentation/fb/00-INDEX b/Documentation/fb/00-INDEX index 30a70542e82..fe85e7c5907 100644 --- a/Documentation/fb/00-INDEX +++ b/Documentation/fb/00-INDEX @@ -5,6 +5,8 @@ please mail me. 00-INDEX - this file. +api.txt + - The frame buffer API between applications and buffer devices. arkfb.txt - info on the fbdev driver for ARK Logic chips. aty128fb.txt @@ -51,12 +53,16 @@ sh7760fb.txt - info on the SH7760/SH7763 integrated LCDC Framebuffer driver. sisfb.txt - info on the framebuffer device driver for various SiS chips. +sm501.txt + - info on the framebuffer device driver for sm501 videoframebuffer. sstfb.txt - info on the frame buffer driver for 3dfx' Voodoo Graphics boards. tgafb.txt - info on the TGA (DECChip 21030) frame buffer driver. tridentfb.txt info on the framebuffer driver for some Trident chip based cards. +udlfb.txt + - Driver for DisplayLink USB 2.0 chips. uvesafb.txt - info on the userspace VESA (VBE2+ compliant) frame buffer device. vesafb.txt diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 632211cbdd5..ac28149aede 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -2,6 +2,8 @@ - this file (info on some of the filesystems supported by linux). Locking - info on locking rules as they pertain to Linux VFS. +Makefile + - Makefile for building the filsystems-part of DocBook. 9p.txt - 9p (v9fs) is an implementation of the Plan 9 remote fs protocol. adfs.txt diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX index 66eb6c8c533..53f3b596ac0 100644 --- a/Documentation/filesystems/nfs/00-INDEX +++ b/Documentation/filesystems/nfs/00-INDEX @@ -12,6 +12,8 @@ nfs41-server.txt - info on the Linux server implementation of NFSv4 minor version 1. nfs-rdma.txt - how to install and setup the Linux NFS/RDMA client and server software +nfsd-admin-interfaces.txt + - Administrative interfaces for nfsd. nfsroot.txt - short guide on setting up a diskless box with NFS root filesystem. pnfs.txt @@ -20,5 +22,5 @@ rpc-cache.txt - introduction to the caching mechanisms in the sunrpc layer. idmapper.txt - information for configuring request-keys to be used by idmapper -knfsd-rpcgss.txt +rpc-server-gss.txt - Information on GSS authentication support in the NFS Server diff --git a/Documentation/i2c/instantiating-devices b/Documentation/i2c/instantiating-devices index c70e7a7638d..0d85ac1935b 100644 --- a/Documentation/i2c/instantiating-devices +++ b/Documentation/i2c/instantiating-devices @@ -8,8 +8,8 @@ reason, the kernel code must instantiate I2C devices explicitly. There are several ways to achieve this, depending on the context and requirements. -Method 1: Declare the I2C devices by bus number ------------------------------------------------ +Method 1a: Declare the I2C devices by bus number +------------------------------------------------ This method is appropriate when the I2C bus is a system bus as is the case for many embedded systems. On such systems, each I2C bus has a number @@ -51,6 +51,43 @@ The devices will be automatically unbound and destroyed when the I2C bus they sit on goes away (if ever.) +Method 1b: Declare the I2C devices via devicetree +------------------------------------------------- + +This method has the same implications as method 1a. The declaration of I2C +devices is here done via devicetree as subnodes of the master controller. + +Example: + + i2c1: i2c@400a0000 { + /* ... master properties skipped ... */ + clock-frequency = <100000>; + + flash@50 { + compatible = "atmel,24c256"; + reg = <0x50>; + }; + + pca9532: gpio@60 { + compatible = "nxp,pca9532"; + gpio-controller; + #gpio-cells = <2>; + reg = <0x60>; + }; + }; + +Here, two devices are attached to the bus using a speed of 100kHz. For +additional properties which might be needed to set up the device, please refer +to its devicetree documentation in Documentation/devicetree/bindings/. + + +Method 1c: Declare the I2C devices via ACPI +------------------------------------------- + +ACPI can also describe I2C devices. There is special documentation for this +which is currently located at Documentation/acpi/enumeration.txt. + + Method 2: Instantiate the devices explicitly -------------------------------------------- diff --git a/Documentation/ide/00-INDEX b/Documentation/ide/00-INDEX index d6b778842b7..22f98ca7953 100644 --- a/Documentation/ide/00-INDEX +++ b/Documentation/ide/00-INDEX @@ -10,3 +10,5 @@ ide-tape.txt - info on the IDE ATAPI streaming tape driver ide.txt - important info for users of ATA devices (IDE/EIDE disks and CD-ROMS). +warm-plug-howto.txt + - using sysfs to remove and add IDE devices.
\ No newline at end of file diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 7116fda7077..121d5fcbd94 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -231,6 +231,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted. acpi_no_auto_ssdt [HW,ACPI] Disable automatic loading of SSDT + acpica_no_return_repair [HW, ACPI] + Disable AML predefined validation mechanism + This mechanism can repair the evaluation result to make + the return objects more ACPI specification compliant. + This option is useful for developers to identify the + root cause of an AML interpreter issue when the issue + has something to do with the repair mechanism. + acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS Format: To spoof as Windows 98: ="Microsoft Windows" @@ -1011,6 +1019,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Warning: use of this parameter will taint the kernel + and may cause unknown problems. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early @@ -2053,8 +2068,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. IOAPICs that may be present in the system. nokaslr [X86] - Disable kernel base offset ASLR (Address Space - Layout Randomization) if built into the kernel. + Disable kernel and module base offset ASLR (Address + Space Layout Randomization) if built into the kernel. noautogroup Disable scheduler automatic task group creation. diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb936..f3cd299fcc4 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,18 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. A given workqueue can be made visible in the sysfs filesystem + by passing the WQ_SYSFS to that workqueue's alloc_workqueue(). + Such a workqueue can be confined to a given subset of the + CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs + files. The set of WQ_SYSFS workqueues can be displayed using + "ls sys/devices/virtual/workqueue". That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs is + part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your application cannot tolerate: a. Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic diff --git a/Documentation/laptops/00-INDEX b/Documentation/laptops/00-INDEX index fa688538e75..d13b9a9a9e0 100644 --- a/Documentation/laptops/00-INDEX +++ b/Documentation/laptops/00-INDEX @@ -1,13 +1,15 @@ 00-INDEX - This file -acer-wmi.txt - - information on the Acer Laptop WMI Extras driver. +Makefile + - Makefile for building dslm example program. asus-laptop.txt - information on the Asus Laptop Extras driver. disk-shock-protection.txt - information on hard disk shock protection. dslm.c - Simple Disk Sleep Monitor program +hpfall.c + - (HP) laptop accelerometer program for disk protection. laptop-mode.txt - how to conserve battery power using laptop-mode. sony-laptop.txt diff --git a/Documentation/leds/00-INDEX b/Documentation/leds/00-INDEX index 1ecd1596633..b4ef1f34e25 100644 --- a/Documentation/leds/00-INDEX +++ b/Documentation/leds/00-INDEX @@ -1,3 +1,7 @@ +00-INDEX + - This file +leds-blinkm.txt + - Driver for BlinkM LED-devices. leds-class.txt - documents LED handling under Linux. leds-lp3944.txt @@ -12,3 +16,7 @@ leds-lp55xx.txt - description about lp55xx common driver. leds-lm3556.txt - notes on how to use the leds-lm3556 driver. +ledtrig-oneshot.txt + - One-shot LED trigger for both sporadic and dense events. +ledtrig-transient.txt + - LED Transient Trigger, one shot timer activation. diff --git a/Documentation/m68k/00-INDEX b/Documentation/m68k/00-INDEX index a014e9f0076..2be8c6b00e7 100644 --- a/Documentation/m68k/00-INDEX +++ b/Documentation/m68k/00-INDEX @@ -1,5 +1,7 @@ 00-INDEX - this file +README.buddha + - Amiga Buddha and Catweasel IDE Driver kernel-options.txt - command line options for Linux/m68k diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 102dc19c411..11c1d204966 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -608,26 +608,30 @@ as follows: b = p; /* BUG: Compiler can reorder!!! */ do_something(); -The solution is again ACCESS_ONCE(), which preserves the ordering between -the load from variable 'a' and the store to variable 'b': +The solution is again ACCESS_ONCE() and barrier(), which preserves the +ordering between the load from variable 'a' and the store to variable 'b': q = ACCESS_ONCE(a); if (q) { + barrier(); ACCESS_ONCE(b) = p; do_something(); } else { + barrier(); ACCESS_ONCE(b) = p; do_something_else(); } -You could also use barrier() to prevent the compiler from moving -the stores to variable 'b', but barrier() would not prevent the -compiler from proving to itself that a==1 always, so ACCESS_ONCE() -is also needed. +The initial ACCESS_ONCE() is required to prevent the compiler from +proving the value of 'a', and the pair of barrier() invocations are +required to prevent the compiler from pulling the two identical stores +to 'b' out from the legs of the "if" statement. It is important to note that control dependencies absolutely require a a conditional. For example, the following "optimized" version of -the above example breaks ordering: +the above example breaks ordering, which is why the barrier() invocations +are absolutely required if you have identical stores in both legs of +the "if" statement: q = ACCESS_ONCE(a); ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */ @@ -643,9 +647,11 @@ It is of course legal for the prior load to be part of the conditional, for example, as follows: if (ACCESS_ONCE(a) > 0) { + barrier(); ACCESS_ONCE(b) = q / 2; do_something(); } else { + barrier(); ACCESS_ONCE(b) = q / 3; do_something_else(); } @@ -659,9 +665,11 @@ the needed conditional. For example: q = ACCESS_ONCE(a); if (q % MAX) { + barrier(); ACCESS_ONCE(b) = p; do_something(); } else { + barrier(); ACCESS_ONCE(b) = p; do_something_else(); } @@ -723,8 +731,13 @@ In summary: use smb_rmb(), smp_wmb(), or, in the case of prior stores and later loads, smp_mb(). + (*) If both legs of the "if" statement begin with identical stores + to the same variable, a barrier() statement is required at the + beginning of each leg of the "if" statement. + (*) Control dependencies require at least one run-time conditional - between the prior load and the subsequent store. If the compiler + between the prior load and the subsequent store, and this + conditional must involve the prior load. If the compiler is able to optimize the conditional away, it will have also optimized away the ordering. Careful use of ACCESS_ONCE() can help to preserve the needed conditional. @@ -1249,6 +1262,23 @@ The ACCESS_ONCE() function can prevent any number of optimizations that, while perfectly safe in single-threaded code, can be fatal in concurrent code. Here are some examples of these sorts of optimizations: + (*) The compiler is within its rights to reorder loads and stores + to the same variable, and in some cases, the CPU is within its + rights to reorder loads to the same variable. This means that + the following code: + + a[0] = x; + a[1] = x; + + Might result in an older value of x stored in a[1] than in a[0]. + Prevent both the compiler and the CPU from doing this as follows: + + a[0] = ACCESS_ONCE(x); + a[1] = ACCESS_ONCE(x); + + In short, ACCESS_ONCE() provides cache coherence for accesses from + multiple CPUs to a single variable. + (*) The compiler is within its rights to merge successive loads from the same variable. Such merging can cause the compiler to "optimize" the following code: @@ -1644,12 +1674,12 @@ for each construct. These operations all imply certain barriers: Memory operations issued after the ACQUIRE will be completed after the ACQUIRE operation has completed. - Memory operations issued before the ACQUIRE may be completed after the - ACQUIRE operation has completed. An smp_mb__before_spinlock(), combined - with a following ACQUIRE, orders prior loads against subsequent stores and - stores and prior stores against subsequent stores. Note that this is - weaker than smp_mb()! The smp_mb__before_spinlock() primitive is free on - many architectures. + Memory operations issued before the ACQUIRE may be completed after + the ACQUIRE operation has completed. An smp_mb__before_spinlock(), + combined with a following ACQUIRE, orders prior loads against + subsequent loads and stores and also orders prior stores against + subsequent stores. Note that this is weaker than smp_mb()! The + smp_mb__before_spinlock() primitive is free on many architectures. (2) RELEASE operation implication: @@ -1694,24 +1724,21 @@ may occur as: ACQUIRE M, STORE *B, STORE *A, RELEASE M -This same reordering can of course occur if the lock's ACQUIRE and RELEASE are -to the same lock variable, but only from the perspective of another CPU not -holding that lock. - -In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full -memory barrier because it is possible for a preceding RELEASE to pass a -later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint -of the compiler. Note that deadlocks cannot be introduced by this -interchange because if such a deadlock threatened, the RELEASE would -simply complete. - -If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the -ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation. This -will produce a full barrier if either (a) the RELEASE and the ACQUIRE are -executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the -same variable. The smp_mb__after_unlock_lock() primitive is free on many -architectures. Without smp_mb__after_unlock_lock(), the critical sections -corresponding to the RELEASE and the ACQUIRE can cross: +When the ACQUIRE and RELEASE are a lock acquisition and release, +respectively, this same reordering can occur if the lock's ACQUIRE and +RELEASE are to the same lock variable, but only from the perspective of +another CPU not holding that lock. In short, a ACQUIRE followed by an +RELEASE may -not- be assumed to be a full memory barrier. + +Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not +imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE +pair to produce a full barrier, the ACQUIRE can be followed by an +smp_mb__after_unlock_lock() invocation. This will produce a full barrier +if either (a) the RELEASE and the ACQUIRE are executed by the same +CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable. +The smp_mb__after_unlock_lock() primitive is free on many architectures. +Without smp_mb__after_unlock_lock(), the CPU's execution of the critical +sections corresponding to the RELEASE and the ACQUIRE can cross, so that: *A = a; RELEASE M @@ -1722,7 +1749,36 @@ could occur as: ACQUIRE N, STORE *B, STORE *A, RELEASE M -With smp_mb__after_unlock_lock(), they cannot, so that: +It might appear that this reordering could introduce a deadlock. +However, this cannot happen because if such a deadlock threatened, +the RELEASE would simply complete, thereby avoiding the deadlock. + + Why does this work? + + One key point is that we are only talking about the CPU doing + the reordering, not the compiler. If the compiler (or, for + that matter, the developer) switched the operations, deadlock + -could- occur. + + But suppose the CPU reordered the operations. In this case, + the unlock precedes the lock in the assembly code. The CPU + simply elected to try executing the later lock operation first. + If there is a deadlock, this lock operation will simply spin (or + try to sleep, but more on that later). The CPU will eventually + execute the unlock operation (which preceded the lock operation + in the assembly code), which will unravel the potential deadlock, + allowing the lock operation to succeed. + + But what if the lock is a sleeplock? In that case, the code will + try to enter the scheduler, where it will eventually encounter + a memory barrier, which will force the earlier unlock operation + to complete, again unraveling the deadlock. There might be + a sleep-unlock race, but the locking primitive needs to resolve + such races properly in any case. + +With smp_mb__after_unlock_lock(), the two critical sections cannot overlap. +For example, with the following code, the store to *A will always be +seen by other CPUs before the store to *B: *A = a; RELEASE M @@ -1730,13 +1786,18 @@ With smp_mb__after_unlock_lock(), they cannot, so that: smp_mb__after_unlock_lock(); *B = b; -will always occur as either of the following: +The operations will always occur in one of the following orders: - STORE *A, RELEASE, ACQUIRE, STORE *B - STORE *A, ACQUIRE, RELEASE, STORE *B + STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B + STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B + ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B If the RELEASE and ACQUIRE were instead both operating on the same lock -variable, only the first of these two alternatives can occur. +variable, only the first of these alternatives can occur. In addition, +the more strongly ordered systems may rule out some of the above orders. +But in any case, as noted earlier, the smp_mb__after_unlock_lock() +ensures that the store to *A will always be seen as happening before +the store to *B. Locks and semaphores may not provide any guarantee of ordering on UP compiled systems, and so cannot be counted on in such a situation to actually achieve @@ -2757,7 +2818,7 @@ in that order, but, without intervention, the sequence may have almost any combination of elements combined or discarded, provided the program's view of the world remains consistent. Note that ACCESS_ONCE() is -not- optional in the above example, as there are architectures where a given CPU might -interchange successive loads to the same location. On such architectures, +reorder successive loads to the same location. On such architectures, ACCESS_ONCE() does whatever is necessary to prevent this, for example, on Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the special ld.acq and st.rel instructions that prevent such reordering. diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index f11580f8719..557b6ef70c2 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -6,8 +6,14 @@ - information on the 3Com Etherlink III Series Ethernet cards. 6pack.txt - info on the 6pack protocol, an alternative to KISS for AX.25 -DLINK.txt - - info on the D-Link DE-600/DE-620 parallel port pocket adapters +LICENSE.qla3xxx + - GPLv2 for QLogic Linux Networking HBA Driver +LICENSE.qlge + - GPLv2 for QLogic Linux qlge NIC Driver +LICENSE.qlcnic + - GPLv2 for QLogic Linux qlcnic NIC Driver +Makefile + - Makefile for docsrc. PLIP.txt - PLIP: The Parallel Line Internet Protocol device driver README.ipw2100 @@ -17,7 +23,7 @@ README.ipw2200 README.sb1000 - info on General Instrument/NextLevel SURFboard1000 cable modem. alias.txt - - info on using alias network devices + - info on using alias network devices. arcnet-hardware.txt - tons of info on ARCnet, hubs, jumper settings for ARCnet cards, etc. arcnet.txt @@ -80,7 +86,7 @@ framerelay.txt - info on using Frame Relay/Data Link Connection Identifier (DLCI). gen_stats.txt - Generic networking statistics for netlink users. -generic_hdlc.txt +generic-hdlc.txt - The generic High Level Data Link Control (HDLC) layer. generic_netlink.txt - info on Generic Netlink @@ -88,6 +94,8 @@ gianfar.txt - Gianfar Ethernet Driver. i40e.txt - README for the Intel Ethernet Controller XL710 Driver (i40e). +i40evf.txt + - Short note on the Driver for the Intel(R) XL710 X710 Virtual Function ieee802154.txt - Linux IEEE 802.15.4 implementation, API and drivers igb.txt @@ -102,6 +110,8 @@ ipddp.txt - AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation iphase.txt - Interphase PCI ATM (i)Chip IA Linux driver info. +ipsec.txt + - Note on not compressing IPSec payload and resulting failed policy check. ipv6.txt - Options to the ipv6 kernel module. ipvs-sysctl.txt @@ -120,6 +130,8 @@ lapb-module.txt - programming information of the LAPB module. ltpc.txt - the Apple or Farallon LocalTalk PC card driver +mac80211-auth-assoc-deauth.txt + - authentication and association / deauth-disassoc with max80211 mac80211-injection.txt - HOWTO use packet injection with mac80211 multiqueue.txt @@ -134,6 +146,10 @@ netdevices.txt - info on network device driver functions exported to the kernel. netif-msg.txt - Design of the network interface message level setting (NETIF_MSG_*). +netlink_mmap.txt + - memory mapped I/O with netlink +nf_conntrack-sysctl.txt + - list of netfilter-sysctl knobs. nfc.txt - The Linux Near Field Communication (NFS) subsystem. openvswitch.txt @@ -176,7 +192,7 @@ skfp.txt - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. smc9.txt - the driver for SMC's 9000 series of Ethernet cards -spider-net.txt +spider_net.txt - README for the Spidernet Driver (as found in PS3 / Cell BE). stmmac.txt - README for the STMicro Synopsys Ethernet driver. @@ -188,6 +204,8 @@ tcp.txt - short blurb on how TCP output takes place. tcp-thin.txt - kernel tuning options for low rate 'thin' TCP streams. +team.txt + - pointer to information for ethernet teaming devices. tlan.txt - ThunderLAN (Compaq Netelligent 10/100, Olicom OC-2xxx) driver info. tproxy.txt @@ -200,6 +218,8 @@ vortex.txt - info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards. vxge.txt - README for the Neterion X3100 PCIe Server Adapter. +vxlan.txt + - Virtual extensible LAN overview x25.txt - general info on X.25 development. x25-iface.txt diff --git a/Documentation/networking/3c505.txt b/Documentation/networking/3c505.txt deleted file mode 100644 index 72f38b13101..00000000000 --- a/Documentation/networking/3c505.txt +++ /dev/null @@ -1,45 +0,0 @@ -The 3Com Etherlink Plus (3c505) driver. - -This driver now uses DMA. There is currently no support for PIO operation. -The default DMA channel is 6; this is _not_ autoprobed, so you must -make sure you configure it correctly. If loading the driver as a -module, you can do this with "modprobe 3c505 dma=n". If the driver is -linked statically into the kernel, you must either use an "ether=" -statement on the command line, or change the definition of ELP_DMA in 3c505.h. - -The driver will warn you if it has to fall back on the compiled in -default DMA channel. - -If no base address is given at boot time, the driver will autoprobe -ports 0x300, 0x280 and 0x310 (in that order). If no IRQ is given, the driver -will try to probe for it. - -The driver can be used as a loadable module. - -Theoretically, one instance of the driver can now run multiple cards, -in the standard way (when loading a module, say "modprobe 3c505 -io=0x300,0x340 irq=10,11 dma=6,7" or whatever). I have not tested -this, though. - -The driver may now support revision 2 hardware; the dependency on -being able to read the host control register has been removed. This -is also untested, since I don't have a suitable card. - -Known problems: - I still see "DMA upload timed out" messages from time to time. These -seem to be fairly non-fatal though. - The card is old and slow. - -To do: - Improve probe/setup code - Test multicast and promiscuous operation - -Authors: - The driver is mainly written by Craig Southeren, email - <craigs@ineluki.apana.org.au>. - Parts of the driver (adapting the driver to 1.1.4+ kernels, - IRQ/address detection, some changes) and this README by - Juha Laiho <jlaiho@ichaos.nullnet.fi>. - DMA mode, more fixes, etc, by Philip Blundell <pjb27@cam.ac.uk> - Multicard support, Software configurable DMA, etc., by - Christopher Collins <ccollins@pcug.org.au> diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt index f3089d42351..0cbe6ec22d6 100644 --- a/Documentation/networking/can.txt +++ b/Documentation/networking/can.txt @@ -554,12 +554,6 @@ solution for a couple of reasons: not specified in the struct can_frame and therefore it is only valid in CANFD_MTU sized CAN FD frames. - As long as the payload length is <=8 the received CAN frames from CAN FD - capable CAN devices can be received and read by legacy sockets too. When - user-generated CAN FD frames have a payload length <=8 these can be send - by legacy CAN network interfaces too. Sending CAN FD frames with payload - length > 8 to a legacy CAN network interface returns an -EMSGSIZE error. - Implementation hint for new CAN applications: To build a CAN FD aware application use struct canfd_frame as basic CAN diff --git a/Documentation/networking/netlink_mmap.txt b/Documentation/networking/netlink_mmap.txt index b2612297352..c6af4bac5aa 100644 --- a/Documentation/networking/netlink_mmap.txt +++ b/Documentation/networking/netlink_mmap.txt @@ -226,9 +226,9 @@ Ring setup: void *rx_ring, *tx_ring; /* Configure ring parameters */ - if (setsockopt(fd, NETLINK_RX_RING, &req, sizeof(req)) < 0) + if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0) exit(1); - if (setsockopt(fd, NETLINK_TX_RING, &req, sizeof(req)) < 0) + if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0) exit(1) /* Calculate size of each individual ring */ diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt index 1404674c0a0..6fea79efb4c 100644 --- a/Documentation/networking/packet_mmap.txt +++ b/Documentation/networking/packet_mmap.txt @@ -453,7 +453,7 @@ TP_STATUS_COPY : This flag indicates that the frame (and associated enabled previously with setsockopt() and the PACKET_COPY_THRESH option. - The number of frames than can be buffered to + The number of frames that can be buffered to be read with recvfrom is limited like a normal socket. See the SO_RCVBUF option in the socket (7) man page. diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index 661d3c316a1..048c92b487f 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -21,26 +21,38 @@ has such a feature). SO_TIMESTAMPING: -Instructs the socket layer which kind of information is wanted. The -parameter is an integer with some of the following bits set. Setting -other bits is an error and doesn't change the current state. - -SOF_TIMESTAMPING_TX_HARDWARE: try to obtain send time stamp in hardware -SOF_TIMESTAMPING_TX_SOFTWARE: if SOF_TIMESTAMPING_TX_HARDWARE is off or - fails, then do it in software -SOF_TIMESTAMPING_RX_HARDWARE: return the original, unmodified time stamp - as generated by the hardware -SOF_TIMESTAMPING_RX_SOFTWARE: if SOF_TIMESTAMPING_RX_HARDWARE is off or - fails, then do it in software -SOF_TIMESTAMPING_RAW_HARDWARE: return original raw hardware time stamp -SOF_TIMESTAMPING_SYS_HARDWARE: return hardware time stamp transformed to - the system time base -SOF_TIMESTAMPING_SOFTWARE: return system time stamp generated in - software - -SOF_TIMESTAMPING_TX/RX determine how time stamps are generated. -SOF_TIMESTAMPING_RAW/SYS determine how they are reported in the -following control message: +Instructs the socket layer which kind of information should be collected +and/or reported. The parameter is an integer with some of the following +bits set. Setting other bits is an error and doesn't change the current +state. + +Four of the bits are requests to the stack to try to generate +timestamps. Any combination of them is valid. + +SOF_TIMESTAMPING_TX_HARDWARE: try to obtain send time stamps in hardware +SOF_TIMESTAMPING_TX_SOFTWARE: try to obtain send time stamps in software +SOF_TIMESTAMPING_RX_HARDWARE: try to obtain receive time stamps in hardware +SOF_TIMESTAMPING_RX_SOFTWARE: try to obtain receive time stamps in software + +The other three bits control which timestamps will be reported in a +generated control message. If none of these bits are set or if none of +the set bits correspond to data that is available, then the control +message will not be generated: + +SOF_TIMESTAMPING_SOFTWARE: report systime if available +SOF_TIMESTAMPING_SYS_HARDWARE: report hwtimetrans if available +SOF_TIMESTAMPING_RAW_HARDWARE: report hwtimeraw if available + +It is worth noting that timestamps may be collected for reasons other +than being requested by a particular socket with +SOF_TIMESTAMPING_[TR]X_(HARD|SOFT)WARE. For example, most drivers that +can generate hardware receive timestamps ignore +SOF_TIMESTAMPING_RX_HARDWARE. It is still a good idea to set that flag +in case future drivers pay attention. + +If timestamps are reported, they will appear in a control message with +cmsg_level==SOL_SOCKET, cmsg_type==SO_TIMESTAMPING, and a payload like +this: struct scm_timestamping { struct timespec systime; diff --git a/Documentation/phy.txt b/Documentation/phy.txt index 0103e4b15b0..ebff6ee5244 100644 --- a/Documentation/phy.txt +++ b/Documentation/phy.txt @@ -75,14 +75,26 @@ Before the controller can make use of the PHY, it has to get a reference to it. This framework provides the following APIs to get a reference to the PHY. struct phy *phy_get(struct device *dev, const char *string); +struct phy *phy_optional_get(struct device *dev, const char *string); struct phy *devm_phy_get(struct device *dev, const char *string); - -phy_get and devm_phy_get can be used to get the PHY. In the case of dt boot, -the string arguments should contain the phy name as given in the dt data and -in the case of non-dt boot, it should contain the label of the PHY. -The only difference between the two APIs is that devm_phy_get associates the -device with the PHY using devres on successful PHY get. On driver detach, -release function is invoked on the the devres data and devres data is freed. +struct phy *devm_phy_optional_get(struct device *dev, const char *string); + +phy_get, phy_optional_get, devm_phy_get and devm_phy_optional_get can +be used to get the PHY. In the case of dt boot, the string arguments +should contain the phy name as given in the dt data and in the case of +non-dt boot, it should contain the label of the PHY. The two +devm_phy_get associates the device with the PHY using devres on +successful PHY get. On driver detach, release function is invoked on +the the devres data and devres data is freed. phy_optional_get and +devm_phy_optional_get should be used when the phy is optional. These +two functions will never return -ENODEV, but instead returns NULL when +the phy cannot be found. + +It should be noted that NULL is a valid phy reference. All phy +consumer calls on the NULL phy become NOPs. That is the release calls, +the phy_init() and phy_exit() calls, and phy_power_on() and +phy_power_off() calls are all NOP when applied to a NULL phy. The NULL +phy is useful in devices for handling optional phy devices. 5. Releasing a reference to the PHY diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX index a4d682f5423..ad04cc8097e 100644 --- a/Documentation/power/00-INDEX +++ b/Documentation/power/00-INDEX @@ -4,6 +4,8 @@ apm-acpi.txt - basic info about the APM and ACPI support. basic-pm-debugging.txt - Debugging suspend and resume +charger-manager.txt + - Battery charger management. devices.txt - How drivers interact with system-wide power management drivers-testing.txt @@ -22,6 +24,8 @@ pm_qos_interface.txt - info on Linux PM Quality of Service interface power_supply_class.txt - Tells userspace about battery, UPS, AC or DC power supply properties +runtime_pm.txt + - Power management framework for I/O devices. s2ram.txt - How to get suspend to ram working (and debug it when it isn't) states.txt @@ -38,7 +42,5 @@ tricks.txt - How to trick software suspend (to disk) into working when it isn't userland-swsusp.txt - Experimental implementation of software suspend in userspace -video_extension.txt - - ACPI video extensions video.txt - Video issues during resume from suspend diff --git a/Documentation/power/pm_qos_interface.txt b/Documentation/power/pm_qos_interface.txt index 48363208778..a5da5c7e712 100644 --- a/Documentation/power/pm_qos_interface.txt +++ b/Documentation/power/pm_qos_interface.txt @@ -88,17 +88,19 @@ node. 2. PM QoS per-device latency and flags framework -For each device, there are two lists of PM QoS requests. One is maintained -along with the aggregated target of latency value and the other is for PM QoS -flags. Values are updated in response to changes of the request list. +For each device, there are three lists of PM QoS requests. Two of them are +maintained along with the aggregated targets of resume latency and active +state latency tolerance (in microseconds) and the third one is for PM QoS flags. +Values are updated in response to changes of the request list. -Target latency value is simply the minimum of the request values held in the -parameter list elements. The PM QoS flags aggregate value is a gather (bitwise -OR) of all list elements' values. Two device PM QoS flags are defined currently: -PM_QOS_FLAG_NO_POWER_OFF and PM_QOS_FLAG_REMOTE_WAKEUP. +The target values of resume latency and active state latency tolerance are +simply the minimum of the request values held in the parameter list elements. +The PM QoS flags aggregate value is a gather (bitwise OR) of all list elements' +values. Two device PM QoS flags are defined currently: PM_QOS_FLAG_NO_POWER_OFF +and PM_QOS_FLAG_REMOTE_WAKEUP. -Note: the aggregated target value is implemented as an atomic variable so that -reading the aggregated value does not require any locking mechanism. +Note: The aggregated target values are implemented in such a way that reading +the aggregated value does not require any locking mechanism. From kernel mode the use of this interface is the following: @@ -132,19 +134,21 @@ The meaning of the return values is as follows: PM_QOS_FLAGS_UNDEFINED: The device's PM QoS structure has not been initialized or the list of requests is empty. -int dev_pm_qos_add_ancestor_request(dev, handle, value) +int dev_pm_qos_add_ancestor_request(dev, handle, type, value) Add a PM QoS request for the first direct ancestor of the given device whose -power.ignore_children flag is unset. +power.ignore_children flag is unset (for DEV_PM_QOS_RESUME_LATENCY requests) +or whose power.set_latency_tolerance callback pointer is not NULL (for +DEV_PM_QOS_LATENCY_TOLERANCE requests). int dev_pm_qos_expose_latency_limit(device, value) -Add a request to the device's PM QoS list of latency constraints and create -a sysfs attribute pm_qos_resume_latency_us under the device's power directory -allowing user space to manipulate that request. +Add a request to the device's PM QoS list of resume latency constraints and +create a sysfs attribute pm_qos_resume_latency_us under the device's power +directory allowing user space to manipulate that request. void dev_pm_qos_hide_latency_limit(device) Drop the request added by dev_pm_qos_expose_latency_limit() from the device's -PM QoS list of latency constraints and remove sysfs attribute pm_qos_resume_latency_us -from the device's power directory. +PM QoS list of resume latency constraints and remove sysfs attribute +pm_qos_resume_latency_us from the device's power directory. int dev_pm_qos_expose_flags(device, value) Add a request to the device's PM QoS list of flags and create sysfs attributes @@ -163,7 +167,7 @@ a per-device notification tree and a global notification tree. int dev_pm_qos_add_notifier(device, notifier): Adds a notification callback function for the device. The callback is called when the aggregated value of the device constraints list -is changed. +is changed (for resume latency device PM QoS only). int dev_pm_qos_remove_notifier(device, notifier): Removes the notification callback function for the device. @@ -171,14 +175,48 @@ Removes the notification callback function for the device. int dev_pm_qos_add_global_notifier(notifier): Adds a notification callback function in the global notification tree of the framework. -The callback is called when the aggregated value for any device is changed. +The callback is called when the aggregated value for any device is changed +(for resume latency device PM QoS only). int dev_pm_qos_remove_global_notifier(notifier): Removes the notification callback function from the global notification tree of the framework. -From user mode: -No API for user space access to the per-device latency constraints is provided -yet - still under discussion. - +Active state latency tolerance + +This device PM QoS type is used to support systems in which hardware may switch +to energy-saving operation modes on the fly. In those systems, if the operation +mode chosen by the hardware attempts to save energy in an overly aggressive way, +it may cause excess latencies to be visible to software, causing it to miss +certain protocol requirements or target frame or sample rates etc. + +If there is a latency tolerance control mechanism for a given device available +to software, the .set_latency_tolerance callback in that device's dev_pm_info +structure should be populated. The routine pointed to by it is should implement +whatever is necessary to transfer the effective requirement value to the +hardware. + +Whenever the effective latency tolerance changes for the device, its +.set_latency_tolerance() callback will be executed and the effective value will +be passed to it. If that value is negative, which means that the list of +latency tolerance requirements for the device is empty, the callback is expected +to switch the underlying hardware latency tolerance control mechanism to an +autonomous mode if available. If that value is PM_QOS_LATENCY_ANY, in turn, and +the hardware supports a special "no requirement" setting, the callback is +expected to use it. That allows software to prevent the hardware from +automatically updating the device's latency tolerance in response to its power +state changes (e.g. during transitions from D3cold to D0), which generally may +be done in the autonomous latency tolerance control mode. + +If .set_latency_tolerance() is present for the device, sysfs attribute +pm_qos_latency_tolerance_us will be present in the devivce's power directory. +Then, user space can use that attribute to specify its latency tolerance +requirement for the device, if any. Writing "any" to it means "no requirement, +but do not let the hardware control latency tolerance" and writing "auto" to it +allows the hardware to be switched to the autonomous mode if there are no other +requirements from the kernel side in the device's list. + +Kernel code can use the functions described above along with the +DEV_PM_QOS_LATENCY_TOLERANCE device PM QoS type to add, remove and update +latency tolerance requirements for devices. diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c index a74d0a84d32..4aba0436da6 100644 --- a/Documentation/ptp/testptp.c +++ b/Documentation/ptp/testptp.c @@ -117,6 +117,7 @@ static void usage(char *progname) " -f val adjust the ptp clock frequency by 'val' ppb\n" " -g get the ptp clock time\n" " -h prints this message\n" + " -i val index for event/trigger\n" " -k val measure the time offset between system and phc clock\n" " for 'val' times (Maximum 25)\n" " -p val enable output with a period of 'val' nanoseconds\n" @@ -154,6 +155,7 @@ int main(int argc, char *argv[]) int capabilities = 0; int extts = 0; int gettime = 0; + int index = 0; int oneshot = 0; int pct_offset = 0; int n_samples = 0; @@ -167,7 +169,7 @@ int main(int argc, char *argv[]) progname = strrchr(argv[0], '/'); progname = progname ? 1+progname : argv[0]; - while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghk:p:P:sSt:v"))) { + while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghi:k:p:P:sSt:v"))) { switch (c) { case 'a': oneshot = atoi(optarg); @@ -190,6 +192,9 @@ int main(int argc, char *argv[]) case 'g': gettime = 1; break; + case 'i': + index = atoi(optarg); + break; case 'k': pct_offset = 1; n_samples = atoi(optarg); @@ -301,7 +306,7 @@ int main(int argc, char *argv[]) if (extts) { memset(&extts_request, 0, sizeof(extts_request)); - extts_request.index = 0; + extts_request.index = index; extts_request.flags = PTP_ENABLE_FEATURE; if (ioctl(fd, PTP_EXTTS_REQUEST, &extts_request)) { perror("PTP_EXTTS_REQUEST"); @@ -375,7 +380,7 @@ int main(int argc, char *argv[]) return -1; } memset(&perout_request, 0, sizeof(perout_request)); - perout_request.index = 0; + perout_request.index = index; perout_request.start.sec = ts.tv_sec + 2; perout_request.start.nsec = 0; perout_request.period.sec = 0; diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX index 3a2b96302ec..10c874ebdfe 100644 --- a/Documentation/s390/00-INDEX +++ b/Documentation/s390/00-INDEX @@ -16,11 +16,13 @@ Debugging390.txt - hints for debugging on s390 systems. driver-model.txt - information on s390 devices and the driver model. +kvm.txt + - ioctl calls to /dev/kvm on s390. monreader.txt - information on accessing the z/VM monitor stream from Linux. +qeth.txt + - HiperSockets Bridge Port Support. s390dbf.txt - information on using the s390 debug feature. -TAPE - - information on the driver for channel-attached tapes. -zfcpdump +zfcpdump.txt - information on the s390 SCSI dump tool. diff --git a/Documentation/scheduler/00-INDEX b/Documentation/scheduler/00-INDEX index 46702e4f89c..eccf7ad2e7f 100644 --- a/Documentation/scheduler/00-INDEX +++ b/Documentation/scheduler/00-INDEX @@ -2,6 +2,8 @@ - this file. sched-arch.txt - CPU Scheduler implementation hints for architecture specific code. +sched-bwc.txt + - CFS bandwidth control overview. sched-design-CFS.txt - goals, design and implementation of the Completely Fair Scheduler. sched-domains.txt diff --git a/Documentation/scsi/00-INDEX b/Documentation/scsi/00-INDEX index 2044be565d9..c4b978a72f7 100644 --- a/Documentation/scsi/00-INDEX +++ b/Documentation/scsi/00-INDEX @@ -36,6 +36,8 @@ NinjaSCSI.txt - info on WorkBiT NinjaSCSI-32/32Bi driver aacraid.txt - Driver supporting Adaptec RAID controllers +advansys.txt + - List of Advansys Host Adapters aha152x.txt - info on driver for Adaptec AHA152x based adapters aic79xx.txt @@ -44,6 +46,12 @@ aic7xxx.txt - info on driver for Adaptec controllers arcmsr_spec.txt - ARECA FIRMWARE SPEC (for IOP331 adapter) +bfa.txt + - Brocade FC/FCOE adapter driver. +bnx2fc.txt + - FCoE hardware offload for Broadcom network interfaces. +cxgb3i.txt + - Chelsio iSCSI Linux Driver dc395x.txt - README file for the dc395x SCSI driver dpti.txt @@ -52,18 +60,24 @@ dtc3x80.txt - info on driver for DTC 2x80 based adapters g_NCR5380.txt - info on driver for NCR5380 and NCR53c400 based adapters +hpsa.txt + - HP Smart Array Controller SCSI driver. hptiop.txt - HIGHPOINT ROCKETRAID 3xxx RAID DRIVER in2000.txt - info on in2000 driver libsas.txt - Serial Attached SCSI management layer. +link_power_management_policy.txt + - Link power management options. lpfc.txt - LPFC driver release notes megaraid.txt - Common Management Module, shared code handling ioctls for LSI drivers ncr53c8xx.txt - info on driver for NCR53c8xx based adapters +osd.txt + Object-Based Storage Device, command set introduction. osst.txt - info on driver for OnStream SC-x0 SCSI tape ppa.txt @@ -74,6 +88,8 @@ scsi-changer.txt - README for the SCSI media changer driver scsi-generic.txt - info on the sg driver for generic (non-disk/CD/tape) SCSI devices. +scsi-parameters.txt + - List of SCSI-parameters to pass to the kernel at module load-time. scsi.txt - short blurb on using SCSI support as a module. scsi_mid_low_api.txt diff --git a/Documentation/serial/00-INDEX b/Documentation/serial/00-INDEX index 1f1b22fbd73..f9c6b5ed03e 100644 --- a/Documentation/serial/00-INDEX +++ b/Documentation/serial/00-INDEX @@ -4,10 +4,12 @@ README.cycladesZ - info on Cyclades-Z firmware loading. digiepca.txt - info on Digi Intl. {PC,PCI,EISA}Xx and Xem series cards. -hayes-esp.txt - - info on using the Hayes ESP serial driver. +driver + - intro to the low level serial driver. moxa-smartio - file with info on installing/using Moxa multiport serial driver. +n_gsm.txt + - GSM 0710 tty multiplexer howto. riscom8.txt - notes on using the RISCom/8 multi-port serial driver. rocket.txt diff --git a/Documentation/spi/00-INDEX b/Documentation/spi/00-INDEX new file mode 100644 index 00000000000..a128fa83551 --- /dev/null +++ b/Documentation/spi/00-INDEX @@ -0,0 +1,22 @@ +00-INDEX + - this file. +Makefile + - Makefile for the example sourcefiles. +butterfly + - AVR Butterfly SPI driver overview and pin configuration. +ep93xx_spi + - Basic EP93xx SPI driver configuration. +pxa2xx + - PXA2xx SPI master controller build by spi_message fifo wq +spidev + - Intro to the userspace API for spi devices +spidev_fdx.c + - spidev example file +spi-lm70llp + - Connecting an LM70-LLP sensor to the kernel via the SPI subsys. +spi-sc18is602 + - NXP SC18IS602/603 I2C-bus to SPI bridge +spi-summary + - (Linux) SPI overview. If unsure about SPI or SPI in Linux, start here. +spidev_test.c + - SPI testing utility. diff --git a/Documentation/spi/spi-summary b/Documentation/spi/spi-summary index f72e0d1e0da..7982bcc4d15 100644 --- a/Documentation/spi/spi-summary +++ b/Documentation/spi/spi-summary @@ -543,7 +543,22 @@ SPI MASTER METHODS queuing transfers that arrive in the meantime. When the driver is finished with this message, it must call spi_finalize_current_message() so the subsystem can issue the next - transfer. This may sleep. + message. This may sleep. + + master->transfer_one(struct spi_master *master, struct spi_device *spi, + struct spi_transfer *transfer) + The subsystem calls the driver to transfer a single transfer while + queuing transfers that arrive in the meantime. When the driver is + finished with this transfer, it must call + spi_finalize_current_transfer() so the subsystem can issue the next + transfer. This may sleep. Note: transfer_one and transfer_one_message + are mutually exclusive; when both are set, the generic subsystem does + not call your transfer_one callback. + + Return values: + negative errno: error + 0: transfer is finished + 1: transfer is still in progress DEPRECATED METHODS diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index e55124e7c40..ec8be46bf48 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -320,10 +320,11 @@ This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. ============================================================== -hung_task_warning: +hung_task_warnings: The maximum number of warnings to report. During a check interval -When this value is reached, no more the warnings will be reported. +if a hung task is detected, this value is decreased by 1. +When this value reaches 0, no more warnings will be reported. This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. -1: report an infinite number of warnings. @@ -441,8 +442,7 @@ feature should be disabled. Otherwise, if the system overhead from the feature is too high then the rate the kernel samples for NUMA hinting faults may be controlled by the numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, -numa_balancing_scan_size_mb, numa_balancing_settle_count sysctls and -numa_balancing_migrate_deferred. +numa_balancing_scan_size_mb, and numa_balancing_settle_count sysctls. ============================================================== @@ -483,13 +483,6 @@ rate for each task. numa_balancing_scan_size_mb is how many megabytes worth of pages are scanned for a given scan. -numa_balancing_migrate_deferred is how many page migrations get skipped -unconditionally, after a page migration is skipped because a page is shared -with other tasks. This reduces page migration overhead, and determines -how much stronger the "move task near its memory" policy scheduler becomes, -versus the "move memory near its task" memory management policy, for workloads -with shared memory. - ============================================================== osrelease, ostype & version: diff --git a/Documentation/timers/00-INDEX b/Documentation/timers/00-INDEX index ef2ccbf77fa..6d042dc1cce 100644 --- a/Documentation/timers/00-INDEX +++ b/Documentation/timers/00-INDEX @@ -8,6 +8,8 @@ hpet_example.c - sample hpet timer test program hrtimers.txt - subsystem for high-resolution kernel timers +Makefile + - Build and link hpet_example NO_HZ.txt - Summary of the different methods for the scheduler clock-interrupts management. timers-howto.txt diff --git a/Documentation/trace/events-power.txt b/Documentation/trace/events-power.txt index 3bd33b8dc7c..21d514ced21 100644 --- a/Documentation/trace/events-power.txt +++ b/Documentation/trace/events-power.txt @@ -92,5 +92,5 @@ dev_pm_qos_remove_request "device=%s type=%s new_value=%d" The first parameter gives the device name which tries to add/update/remove QoS requests. -The second parameter gives the request type (e.g. "DEV_PM_QOS_LATENCY"). +The second parameter gives the request type (e.g. "DEV_PM_QOS_RESUME_LATENCY"). The third parameter is value to be added/updated/removed. diff --git a/Documentation/virtual/kvm/00-INDEX b/Documentation/virtual/kvm/00-INDEX index 641ec922017..fee9f2bf9c6 100644 --- a/Documentation/virtual/kvm/00-INDEX +++ b/Documentation/virtual/kvm/00-INDEX @@ -20,5 +20,7 @@ ppc-pv.txt - the paravirtualization interface on PowerPC. review-checklist.txt - review checklist for KVM patches. +s390-diag.txt + - Diagnose hypercall description (for IBM S/390) timekeeping.txt - timekeeping virtualization for x86-based architectures. diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX index a39d06680e1..081c49777ab 100644 --- a/Documentation/vm/00-INDEX +++ b/Documentation/vm/00-INDEX @@ -16,8 +16,6 @@ hwpoison.txt - explains what hwpoison is ksm.txt - how to use the Kernel Samepage Merging feature. -locking - - info on how locking and synchronization is done in the Linux vm code. numa - information about NUMA specific code in the Linux vm. numa_memory_policy.txt @@ -32,6 +30,8 @@ slub.txt - a short users guide for SLUB. soft-dirty.txt - short explanation for soft-dirty PTEs +split_page_table_lock + - Separate per-table lock to improve scalability of the old page_table_lock. transhuge.txt - Transparent Hugepage Support, alternative way of using hugepages. unevictable-lru.txt diff --git a/Documentation/w1/masters/00-INDEX b/Documentation/w1/masters/00-INDEX index d63fa024ac0..8330cf9325f 100644 --- a/Documentation/w1/masters/00-INDEX +++ b/Documentation/w1/masters/00-INDEX @@ -4,7 +4,9 @@ ds2482 - The Maxim/Dallas Semiconductor DS2482 provides 1-wire busses. ds2490 - The Maxim/Dallas Semiconductor DS2490 builds USB <-> W1 bridges. -mxc_w1 +mxc-w1 - W1 master controller driver found on Freescale MX2/MX3 SoCs +omap-hdq + - HDQ/1-wire module of TI OMAP 2430/3430. w1-gpio - GPIO 1-wire bus master driver. diff --git a/Documentation/w1/slaves/00-INDEX b/Documentation/w1/slaves/00-INDEX index 75613c9ac4d..6e18c70c347 100644 --- a/Documentation/w1/slaves/00-INDEX +++ b/Documentation/w1/slaves/00-INDEX @@ -4,3 +4,5 @@ w1_therm - The Maxim/Dallas Semiconductor ds18*20 temperature sensor. w1_ds2423 - The Maxim/Dallas Semiconductor ds2423 counter device. +w1_ds28e04 + - The Maxim/Dallas Semiconductor ds28e04 eeprom. diff --git a/Documentation/x86/00-INDEX b/Documentation/x86/00-INDEX index f37b46d3486..692264456f0 100644 --- a/Documentation/x86/00-INDEX +++ b/Documentation/x86/00-INDEX @@ -1,6 +1,20 @@ 00-INDEX - this file -mtrr.txt - - how to use x86 Memory Type Range Registers to increase performance +boot.txt + - List of boot protocol versions +early-microcode.txt + - How to load microcode from an initrd-CPIO archive early to fix CPU issues. +earlyprintk.txt + - Using earlyprintk with a USB2 debug port key. +entry_64.txt + - Describe (some of the) kernel entry points for x86. exception-tables.txt - why and how Linux kernel uses exception tables on x86 +mtrr.txt + - how to use x86 Memory Type Range Registers to increase performance +pat.txt + - Page Attribute Table intro and API +usb-legacy-support.txt + - how to fix/avoid quirks when using emulated PS/2 mouse/keyboard. +zero-page.txt + - layout of the first page of memory. diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt index cb81741d3b0..a75e3adaa39 100644 --- a/Documentation/x86/boot.txt +++ b/Documentation/x86/boot.txt @@ -182,7 +182,7 @@ Offset Proto Name Meaning 0226/1 2.02+(3 ext_loader_ver Extended boot loader version 0227/1 2.02+(3 ext_loader_type Extended boot loader ID 0228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line -022C/4 2.03+ ramdisk_max Highest legal initrd address +022C/4 2.03+ initrd_addr_max Highest legal initrd address 0230/4 2.05+ kernel_alignment Physical addr alignment required for kernel 0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not 0235/1 2.10+ min_alignment Minimum alignment, as a power of two @@ -534,7 +534,7 @@ Protocol: 2.02+ zero, the kernel will assume that your boot loader does not support the 2.02+ protocol. -Field name: ramdisk_max +Field name: initrd_addr_max Type: read Offset/size: 0x22c/4 Protocol: 2.03+ diff --git a/Documentation/zh_CN/arm64/booting.txt b/Documentation/zh_CN/arm64/booting.txt index 28fa325b746..6f6d956ac1c 100644 --- a/Documentation/zh_CN/arm64/booting.txt +++ b/Documentation/zh_CN/arm64/booting.txt @@ -7,7 +7,7 @@ help. Contact the Chinese maintainer if this translation is outdated or if there is a problem with the translation. Maintainer: Will Deacon <will.deacon@arm.com> -Chinese maintainer: Fu Wei <tekkamanninja@gmail.com> +Chinese maintainer: Fu Wei <wefu@redhat.com> --------------------------------------------------------------------- Documentation/arm64/booting.txt 的中文翻译 @@ -16,9 +16,9 @@ Documentation/arm64/booting.txt 的中文翻译 译存在问题,请联系中文版维护者。 英文版维护者: Will Deacon <will.deacon@arm.com> -中文版维护者: 傅炜 Fu Wei <tekkamanninja@gmail.com> -中文版翻译者: 傅炜 Fu Wei <tekkamanninja@gmail.com> -中文版校译者: 傅炜 Fu Wei <tekkamanninja@gmail.com> +中文版维护者: 傅炜 Fu Wei <wefu@redhat.com> +中文版翻译者: 傅炜 Fu Wei <wefu@redhat.com> +中文版校译者: 傅炜 Fu Wei <wefu@redhat.com> 以下为正文 --------------------------------------------------------------------- @@ -64,8 +64,8 @@ RAM,或可能使用对这个设备已知的 RAM 信息,还可能使用任何 必要性: 强制 -设备树数据块(dtb)大小必须不大于 2 MB,且位于从内核映像起始算起第一个 -512MB 内的 2MB 边界上。这使得内核可以通过初始页表中的单个节描述符来 +设备树数据块(dtb)必须 8 字节对齐,并位于从内核映像起始算起第一个 512MB +内,且不得跨越 2MB 对齐边界。这使得内核可以通过初始页表中的单个节描述符来 映射此数据块。 @@ -84,13 +84,23 @@ AArch64 内核当前没有提供自解压代码,因此如果使用了压缩内 必要性: 强制 -已解压的内核映像包含一个 32 字节的头,内容如下: +已解压的内核映像包含一个 64 字节的头,内容如下: - u32 magic = 0x14000008; /* 跳转到 stext, 小端 */ - u32 res0 = 0; /* 保留 */ + u32 code0; /* 可执行代码 */ + u32 code1; /* 可执行代码 */ u64 text_offset; /* 映像装载偏移 */ + u64 res0 = 0; /* 保留 */ u64 res1 = 0; /* 保留 */ u64 res2 = 0; /* 保留 */ + u64 res3 = 0; /* 保留 */ + u64 res4 = 0; /* 保留 */ + u32 magic = 0x644d5241; /* 魔数, 小端, "ARM\x64" */ + u32 res5 = 0; /* 保留 */ + + +映像头注释: + +- code0/code1 负责跳转到 stext. 映像必须位于系统 RAM 起始处的特定偏移(当前是 0x80000)。系统 RAM 的起始地址必须是以 2MB 对齐的。 @@ -118,9 +128,9 @@ AArch64 内核当前没有提供自解压代码,因此如果使用了压缩内 外部高速缓存(如果存在)必须配置并禁用。 - 架构计时器 - CNTFRQ 必须设定为计时器的频率。 - 如果在 EL1 模式下进入内核,则 CNTHCTL_EL2 中的 EL1PCTEN (bit 0) - 必须置位。 + CNTFRQ 必须设定为计时器的频率,且 CNTVOFF 必须设定为对所有 CPU + 都一致的值。如果在 EL1 模式下进入内核,则 CNTHCTL_EL2 中的 + EL1PCTEN (bit 0) 必须置位。 - 一致性 通过内核启动的所有 CPU 在内核入口地址上必须处于相同的一致性域中。 @@ -131,23 +141,40 @@ AArch64 内核当前没有提供自解压代码,因此如果使用了压缩内 在进入内核映像的异常级中,所有构架中可写的系统寄存器必须通过软件 在一个更高的异常级别下初始化,以防止在 未知 状态下运行。 +以上对于 CPU 模式、高速缓存、MMU、架构计时器、一致性、系统寄存器的 +必要条件描述适用于所有 CPU。所有 CPU 必须在同一异常级别跳入内核。 + 引导装载程序必须在每个 CPU 处于以下状态时跳入内核入口: - 主 CPU 必须直接跳入内核映像的第一条指令。通过此 CPU 传递的设备树 - 数据块必须在每个 CPU 节点中包含以下内容: - - 1、‘enable-method’属性。目前,此字段支持的值仅为字符串“spin-table”。 - - 2、‘cpu-release-addr’标识一个 64-bit、初始化为零的内存位置。 + 数据块必须在每个 CPU 节点中包含一个 ‘enable-method’ 属性,所 + 支持的 enable-method 请见下文。 引导装载程序必须生成这些设备树属性,并在跳入内核入口之前将其插入 数据块。 -- 任何辅助 CPU 必须在内存保留区(通过设备树中的 /memreserve/ 域传递 +- enable-method 为 “spin-table” 的 CPU 必须在它们的 CPU + 节点中包含一个 ‘cpu-release-addr’ 属性。这个属性标识了一个 + 64 位自然对齐且初始化为零的内存位置。 + + 这些 CPU 必须在内存保留区(通过设备树中的 /memreserve/ 域传递 给内核)中自旋于内核之外,轮询它们的 cpu-release-addr 位置(必须 包含在保留区中)。可通过插入 wfe 指令来降低忙循环开销,而主 CPU 将 发出 sev 指令。当对 cpu-release-addr 所指位置的读取操作返回非零值 - 时,CPU 必须直接跳入此值所指向的地址。 + 时,CPU 必须跳入此值所指向的地址。此值为一个单独的 64 位小端值, + 因此 CPU 须在跳转前将所读取的值转换为其本身的端模式。 + +- enable-method 为 “psci” 的 CPU 保持在内核外(比如,在 + memory 节点中描述为内核空间的内存区外,或在通过设备树 /memreserve/ + 域中描述为内核保留区的空间中)。内核将会发起在 ARM 文档(编号 + ARM DEN 0022A:用于 ARM 上的电源状态协调接口系统软件)中描述的 + CPU_ON 调用来将 CPU 带入内核。 + + *译者注:到文档翻译时,此文档已更新为 ARM DEN 0022B。 + + 设备树必须包含一个 ‘psci’ 节点,请参考以下文档: + Documentation/devicetree/bindings/arm/psci.txt + - 辅助 CPU 通用寄存器设置 x0 = 0 (保留,将来可能使用) diff --git a/Documentation/zh_CN/arm64/memory.txt b/Documentation/zh_CN/arm64/memory.txt index a5f6283829f..a782704c1cb 100644 --- a/Documentation/zh_CN/arm64/memory.txt +++ b/Documentation/zh_CN/arm64/memory.txt @@ -7,7 +7,7 @@ help. Contact the Chinese maintainer if this translation is outdated or if there is a problem with the translation. Maintainer: Catalin Marinas <catalin.marinas@arm.com> -Chinese maintainer: Fu Wei <tekkamanninja@gmail.com> +Chinese maintainer: Fu Wei <wefu@redhat.com> --------------------------------------------------------------------- Documentation/arm64/memory.txt 的中文翻译 @@ -16,9 +16,9 @@ Documentation/arm64/memory.txt 的中文翻译 译存在问题,请联系中文版维护者。 英文版维护者: Catalin Marinas <catalin.marinas@arm.com> -中文版维护者: 傅炜 Fu Wei <tekkamanninja@gmail.com> -中文版翻译者: 傅炜 Fu Wei <tekkamanninja@gmail.com> -中文版校译者: 傅炜 Fu Wei <tekkamanninja@gmail.com> +中文版维护者: 傅炜 Fu Wei <wefu@redhat.com> +中文版翻译者: 傅炜 Fu Wei <wefu@redhat.com> +中文版校译者: 傅炜 Fu Wei <wefu@redhat.com> 以下为正文 --------------------------------------------------------------------- @@ -41,7 +41,7 @@ AArch64 Linux 使用页大小为 4KB 的 3 级转换表配置,对于用户和 TTBR1 中,且从不写入 TTBR0。 -AArch64 Linux 内存布局: +AArch64 Linux 在页大小为 4KB 时的内存布局: 起始地址 结束地址 大小 用途 ----------------------------------------------------------------------- @@ -55,15 +55,42 @@ ffffffbc00000000 ffffffbdffffffff 8GB vmemmap ffffffbe00000000 ffffffbffbbfffff ~8GB [防护页,未来用于 vmmemap] +ffffffbffbc00000 ffffffbffbdfffff 2MB earlyprintk 设备 + ffffffbffbe00000 ffffffbffbe0ffff 64KB PCI I/O 空间 -ffffffbbffff0000 ffffffbcffffffff ~2MB [防护页] +ffffffbffbe10000 ffffffbcffffffff ~2MB [防护页] ffffffbffc000000 ffffffbfffffffff 64MB 模块 ffffffc000000000 ffffffffffffffff 256GB 内核逻辑内存映射 +AArch64 Linux 在页大小为 64KB 时的内存布局: + +起始地址 结束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 000003ffffffffff 4TB 用户空间 + +fffffc0000000000 fffffdfbfffeffff ~2TB vmalloc + +fffffdfbffff0000 fffffdfbffffffff 64KB [防护页] + +fffffdfc00000000 fffffdfdffffffff 8GB vmemmap + +fffffdfe00000000 fffffdfffbbfffff ~8GB [防护页,未来用于 vmmemap] + +fffffdfffbc00000 fffffdfffbdfffff 2MB earlyprintk 设备 + +fffffdfffbe00000 fffffdfffbe0ffff 64KB PCI I/O 空间 + +fffffdfffbe10000 fffffdfffbffffff ~2MB [防护页] + +fffffdfffc000000 fffffdffffffffff 64MB 模块 + +fffffe0000000000 ffffffffffffffff 2TB 内核逻辑内存映射 + + 4KB 页大小的转换表查找: +--------+--------+--------+--------+--------+--------+--------+--------+ @@ -91,3 +118,10 @@ ffffffc000000000 ffffffffffffffff 256GB 内核逻辑内存映射 | | +--------------------------> [41:29] L2 索引 (仅使用 38:29 ) | +-------------------------------> [47:42] L1 索引 (未使用) +-------------------------------------------------> [63] TTBR0/1 + +当使用 KVM 时, 管理程序(hypervisor)在 EL2 中通过相对内核虚拟地址的 +一个固定偏移来映射内核页(内核虚拟地址的高 24 位设为零): + +起始地址 结束地址 大小 用途 +----------------------------------------------------------------------- +0000004000000000 0000007fffffffff 256GB 在 HYP 中映射的内核对象 diff --git a/Documentation/zh_CN/arm64/tagged-pointers.txt b/Documentation/zh_CN/arm64/tagged-pointers.txt new file mode 100644 index 00000000000..2664d1bd5a1 --- /dev/null +++ b/Documentation/zh_CN/arm64/tagged-pointers.txt @@ -0,0 +1,52 @@ +Chinese translated version of Documentation/arm64/tagged-pointers.txt + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +Maintainer: Will Deacon <will.deacon@arm.com> +Chinese maintainer: Fu Wei <wefu@redhat.com> +--------------------------------------------------------------------- +Documentation/arm64/tagged-pointers.txt 的中文翻译 + +如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 +交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 +译存在问题,请联系中文版维护者。 + +英文版维护者: Will Deacon <will.deacon@arm.com> +中文版维护者: 傅炜 Fu Wei <wefu@redhat.com> +中文版翻译者: 傅炜 Fu Wei <wefu@redhat.com> +中文版校译者: 傅炜 Fu Wei <wefu@redhat.com> + +以下为正文 +--------------------------------------------------------------------- + Linux 在 AArch64 中带标记的虚拟地址 + ================================= + +作者: Will Deacon <will.deacon@arm.com> +日期: 2013 年 06 月 12 日 + +本文档简述了在 AArch64 地址转换系统中提供的带标记的虚拟地址及其在 +AArch64 Linux 中的潜在用途。 + +内核提供的地址转换表配置使通过 TTBR0 完成的虚拟地址转换(即用户空间 +映射),其虚拟地址的最高 8 位(63:56)会被转换硬件所忽略。这种机制 +让这些位可供应用程序自由使用,其注意事项如下: + + (1) 内核要求所有传递到 EL1 的用户空间地址带有 0x00 标记。 + 这意味着任何携带用户空间虚拟地址的系统调用(syscall) + 参数 *必须* 在陷入内核前使它们的最高字节被清零。 + + (2) 非零标记在传递信号时不被保存。这意味着在应用程序中利用了 + 标记的信号处理函数无法依赖 siginfo_t 的用户空间虚拟 + 地址所携带的包含其内部域信息的标记。此规则的一个例外是 + 当信号是在调试观察点的异常处理程序中产生的,此时标记的 + 信息将被保存。 + + (3) 当使用带标记的指针时需特别留心,因为仅对两个虚拟地址 + 的高字节,C 编译器很可能无法判断它们是不同的。 + +此构架会阻止对带标记的 PC 指针的利用,因此在异常返回时,其高字节 +将被设置成一个为 “55” 的扩展符。 |