diff options
Diffstat (limited to 'Documentation')
59 files changed, 1469 insertions, 350 deletions
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block index 2b5d56127fc..c1eb41cb987 100644 --- a/Documentation/ABI/testing/sysfs-block +++ b/Documentation/ABI/testing/sysfs-block @@ -206,16 +206,3 @@ Description: when a discarded area is read the discard_zeroes_data parameter will be set to one. Otherwise it will be 0 and the result of reading a discarded area is undefined. -What: /sys/block/<disk>/alias -Date: Aug 2011 -Contact: Nao Nishijima <nao.nishijima.xt@hitachi.com> -Description: - A raw device name of a disk does not always point a same disk - each boot-up time. Therefore, users have to use persistent - device names, which udev creates when the kernel finds a disk, - instead of raw device name. However, kernel doesn't show those - persistent names on its messages (e.g. dmesg). - This file can store an alias of the disk and it would be - appeared in kernel messages if it is set. A disk can have an - alias which length is up to 255bytes. Users can use alphabets, - numbers, "-" and "_" in alias name. This file is writeonce. diff --git a/Documentation/ABI/testing/sysfs-bus-rbd b/Documentation/ABI/testing/sysfs-bus-rbd index fa72ccb2282..dbedafb095e 100644 --- a/Documentation/ABI/testing/sysfs-bus-rbd +++ b/Documentation/ABI/testing/sysfs-bus-rbd @@ -57,13 +57,6 @@ create_snap $ echo <snap-name> > /sys/bus/rbd/devices/<dev-id>/snap_create -rollback_snap - - Rolls back data to the specified snapshot. This goes over the entire - list of rados blocks and sends a rollback command to each. - - $ echo <snap-name> > /sys/bus/rbd/devices/<dev-id>/snap_rollback - snap_* A directory per each snapshot diff --git a/Documentation/DocBook/debugobjects.tmpl b/Documentation/DocBook/debugobjects.tmpl index 08ff908aa7a..24979f691e3 100644 --- a/Documentation/DocBook/debugobjects.tmpl +++ b/Documentation/DocBook/debugobjects.tmpl @@ -96,6 +96,7 @@ <listitem><para>debug_object_deactivate</para></listitem> <listitem><para>debug_object_destroy</para></listitem> <listitem><para>debug_object_free</para></listitem> + <listitem><para>debug_object_assert_init</para></listitem> </itemizedlist> Each of these functions takes the address of the real object and a pointer to the object type specific debug description @@ -273,6 +274,26 @@ debug checks. </para> </sect1> + + <sect1 id="debug_object_assert_init"> + <title>debug_object_assert_init</title> + <para> + This function is called to assert that an object has been + initialized. + </para> + <para> + When the real object is not tracked by debugobjects, it calls + fixup_assert_init of the object type description structure + provided by the caller, with the hardcoded object state + ODEBUG_NOT_AVAILABLE. The fixup function can correct the problem + by calling debug_object_init and other specific initializing + functions. + </para> + <para> + When the real object is already tracked by debugobjects it is + ignored. + </para> + </sect1> </chapter> <chapter id="fixupfunctions"> <title>Fixup functions</title> @@ -381,6 +402,35 @@ statistics. </para> </sect1> + <sect1 id="fixup_assert_init"> + <title>fixup_assert_init</title> + <para> + This function is called from the debug code whenever a problem + in debug_object_assert_init is detected. + </para> + <para> + Called from debug_object_assert_init() with a hardcoded state + ODEBUG_STATE_NOTAVAILABLE when the object is not found in the + debug bucket. + </para> + <para> + The function returns 1 when the fixup was successful, + otherwise 0. The return value is used to update the + statistics. + </para> + <para> + Note, this function should make sure debug_object_init() is + called before returning. + </para> + <para> + The handling of statically initialized objects is a special + case. The fixup function should check if this is a legitimate + case of a statically initialized object or not. In this case only + debug_object_init() should be called to make the object known to + the tracker. Then the function should return 0 because this is not + a real fixup. + </para> + </sect1> </chapter> <chapter id="bugs"> <title>Known Bugs And Assumptions</title> diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl index c2791589397..196b8b9dba1 100644 --- a/Documentation/DocBook/drm.tmpl +++ b/Documentation/DocBook/drm.tmpl @@ -32,7 +32,7 @@ The Linux DRM layer contains code intended to support the needs of complex graphics devices, usually containing programmable pipelines well suited to 3D graphics acceleration. Graphics - drivers in the kernel can make use of DRM functions to make + drivers in the kernel may make use of DRM functions to make tasks like memory management, interrupt handling and DMA easier, and provide a uniform interface to applications. </para> @@ -57,10 +57,10 @@ existing drivers. </para> <para> - First, we'll go over some typical driver initialization + First, we go over some typical driver initialization requirements, like setting up command buffers, creating an initial output configuration, and initializing core services. - Subsequent sections will cover core internals in more detail, + Subsequent sections cover core internals in more detail, providing implementation notes and examples. </para> <para> @@ -74,7 +74,7 @@ </para> <para> The core of every DRM driver is struct drm_driver. Drivers - will typically statically initialize a drm_driver structure, + typically statically initialize a drm_driver structure, then pass it to drm_init() at load time. </para> @@ -88,8 +88,8 @@ </para> <programlisting> static struct drm_driver driver = { - /* don't use mtrr's here, the Xserver or user space app should - * deal with them for intel hardware. + /* Don't use MTRRs here; the Xserver or userspace app should + * deal with them for Intel hardware. */ .driver_features = DRIVER_USE_AGP | DRIVER_REQUIRE_AGP | @@ -154,8 +154,8 @@ </programlisting> <para> In the example above, taken from the i915 DRM driver, the driver - sets several flags indicating what core features it supports. - We'll go over the individual callbacks in later sections. Since + sets several flags indicating what core features it supports; + we go over the individual callbacks in later sections. Since flags indicate which features your driver supports to the DRM core, you need to set most of them prior to calling drm_init(). Some, like DRIVER_MODESET can be set later based on user supplied parameters, @@ -203,8 +203,8 @@ <term>DRIVER_HAVE_IRQ</term><term>DRIVER_IRQ_SHARED</term> <listitem> <para> - DRIVER_HAVE_IRQ indicates whether the driver has a IRQ - handler, DRIVER_IRQ_SHARED indicates whether the device & + DRIVER_HAVE_IRQ indicates whether the driver has an IRQ + handler. DRIVER_IRQ_SHARED indicates whether the device & handler support shared IRQs (note that this is required of PCI drivers). </para> @@ -214,8 +214,8 @@ <term>DRIVER_DMA_QUEUE</term> <listitem> <para> - If the driver queues DMA requests and completes them - asynchronously, this flag should be set. Deprecated. + Should be set if the driver queues DMA requests and completes them + asynchronously. Deprecated. </para> </listitem> </varlistentry> @@ -238,7 +238,7 @@ </variablelist> <para> In this specific case, the driver requires AGP and supports - IRQs. DMA, as we'll see, is handled by device specific ioctls + IRQs. DMA, as discussed later, is handled by device-specific ioctls in this case. It also supports the kernel mode setting APIs, though unlike in the actual i915 driver source, this example unconditionally exports KMS capability. @@ -269,36 +269,34 @@ initial output configuration. </para> <para> - Note that the tasks performed at driver load time must not - conflict with DRM client requirements. For instance, if user + If compatibility is a concern (e.g. with drivers converted over + to the new interfaces from the old ones), care must be taken to + prevent device initialization and control that is incompatible with + currently active userspace drivers. For instance, if user level mode setting drivers are in use, it would be problematic to perform output discovery & configuration at load time. - Likewise, if pre-memory management aware user level drivers are + Likewise, if user-level drivers unaware of memory management are in use, memory management and command buffer setup may need to - be omitted. These requirements are driver specific, and care + be omitted. These requirements are driver-specific, and care needs to be taken to keep both old and new applications and libraries working. The i915 driver supports the "modeset" module parameter to control whether advanced features are - enabled at load time or in legacy fashion. If compatibility is - a concern (e.g. with drivers converted over to the new interfaces - from the old ones), care must be taken to prevent incompatible - device initialization and control with the currently active - userspace drivers. + enabled at load time or in legacy fashion. </para> <sect2> <title>Driver private & performance counters</title> <para> The driver private hangs off the main drm_device structure and - can be used for tracking various device specific bits of + can be used for tracking various device-specific bits of information, like register offsets, command buffer status, register state for suspend/resume, etc. At load time, a - driver can simply allocate one and set drm_device.dev_priv - appropriately; at unload the driver can free it and set - drm_device.dev_priv to NULL. + driver may simply allocate one and set drm_device.dev_priv + appropriately; it should be freed and drm_device.dev_priv set + to NULL when the driver is unloaded. </para> <para> - The DRM supports several counters which can be used for rough + The DRM supports several counters which may be used for rough performance characterization. Note that the DRM stat counter system is not often used by applications, and supporting additional counters is completely optional. @@ -307,15 +305,15 @@ These interfaces are deprecated and should not be used. If performance monitoring is desired, the developer should investigate and potentially enhance the kernel perf and tracing infrastructure to export - GPU related performance information to performance monitoring - tools and applications. + GPU related performance information for consumption by performance + monitoring tools and applications. </para> </sect2> <sect2> <title>Configuring the device</title> <para> - Obviously, device configuration will be device specific. + Obviously, device configuration is device-specific. However, there are several common operations: finding a device's PCI resources, mapping them, and potentially setting up an IRQ handler. @@ -323,10 +321,10 @@ <para> Finding & mapping resources is fairly straightforward. The DRM wrapper functions, drm_get_resource_start() and - drm_get_resource_len() can be used to find BARs on the given + drm_get_resource_len(), may be used to find BARs on the given drm_device struct. Once those values have been retrieved, the driver load function can call drm_addmap() to create a new - mapping for the BAR in question. Note you'll probably want a + mapping for the BAR in question. Note that you probably want a drm_local_map_t in your driver private structure to track any mappings you create. <!-- !Fdrivers/gpu/drm/drm_bufs.c drm_get_resource_* --> @@ -335,20 +333,20 @@ <para> if compatibility with other operating systems isn't a concern (DRM drivers can run under various BSD variants and OpenSolaris), - native Linux calls can be used for the above, e.g. pci_resource_* + native Linux calls may be used for the above, e.g. pci_resource_* and iomap*/iounmap. See the Linux device driver book for more info. </para> <para> - Once you have a register map, you can use the DRM_READn() and + Once you have a register map, you may use the DRM_READn() and DRM_WRITEn() macros to access the registers on your device, or - use driver specific versions to offset into your MMIO space - relative to a driver specific base pointer (see I915_READ for - example). + use driver-specific versions to offset into your MMIO space + relative to a driver-specific base pointer (see I915_READ for + an example). </para> <para> If your device supports interrupt generation, you may want to - setup an interrupt handler at driver load time as well. This + set up an interrupt handler when the driver is loaded. This is done using the drm_irq_install() function. If your device supports vertical blank interrupts, it should call drm_vblank_init() to initialize the core vblank handling code before @@ -357,7 +355,7 @@ </para> <!--!Fdrivers/char/drm/drm_irq.c drm_irq_install--> <para> - Once your interrupt handler is registered (it'll use your + Once your interrupt handler is registered (it uses your drm_driver.irq_handler as the actual interrupt handling function), you can safely enable interrupts on your device, assuming any other state your interrupt handler uses is also @@ -371,10 +369,10 @@ using the pci_map_rom() call, a convenience function that takes care of mapping the actual ROM, whether it has been shadowed into memory (typically at address 0xc0000) or exists - on the PCI device in the ROM BAR. Note that once you've - mapped the ROM and extracted any necessary information, be - sure to unmap it; on many devices the ROM address decoder is - shared with other BARs, so leaving it mapped can cause + on the PCI device in the ROM BAR. Note that after the ROM + has been mapped and any necessary information has been extracted, + it should be unmapped; on many devices, the ROM address decoder is + shared with other BARs, so leaving it mapped could cause undesired behavior like hangs or memory corruption. <!--!Fdrivers/pci/rom.c pci_map_rom--> </para> @@ -389,9 +387,9 @@ should support a memory manager. </para> <para> - If your driver supports memory management (it should!), you'll + If your driver supports memory management (it should!), you need to set that up at load time as well. How you initialize - it depends on which memory manager you're using, TTM or GEM. + it depends on which memory manager you're using: TTM or GEM. </para> <sect3> <title>TTM initialization</title> @@ -401,7 +399,7 @@ and devices with dedicated video RAM (VRAM), i.e. most discrete graphics devices. If your device has dedicated RAM, supporting TTM is desirable. TTM also integrates tightly with your - driver specific buffer execution function. See the radeon + driver-specific buffer execution function. See the radeon driver for examples. </para> <para> @@ -429,21 +427,21 @@ created by the memory manager at runtime. Your global TTM should have a type of TTM_GLOBAL_TTM_MEM. The size field for the global object should be sizeof(struct ttm_mem_global), and the init and - release hooks should point at your driver specific init and - release routines, which will probably eventually call - ttm_mem_global_init and ttm_mem_global_release respectively. + release hooks should point at your driver-specific init and + release routines, which probably eventually call + ttm_mem_global_init and ttm_mem_global_release, respectively. </para> <para> Once your global TTM accounting structure is set up and initialized - (done by calling ttm_global_item_ref on the global object you - just created), you'll need to create a buffer object TTM to + by calling ttm_global_item_ref() on it, + you need to create a buffer object TTM to provide a pool for buffer object allocation by clients and the kernel itself. The type of this object should be TTM_GLOBAL_TTM_BO, and its size should be sizeof(struct ttm_bo_global). Again, - driver specific init and release functions can be provided, - likely eventually calling ttm_bo_global_init and - ttm_bo_global_release, respectively. Also like the previous - object, ttm_global_item_ref is used to create an initial reference + driver-specific init and release functions may be provided, + likely eventually calling ttm_bo_global_init() and + ttm_bo_global_release(), respectively. Also, like the previous + object, ttm_global_item_ref() is used to create an initial reference count for the TTM, which will call your initialization function. </para> </sect3> @@ -453,27 +451,26 @@ GEM is an alternative to TTM, designed specifically for UMA devices. It has simpler initialization and execution requirements than TTM, but has no VRAM management capability. Core GEM - initialization is comprised of a basic drm_mm_init call to create + is initialized by calling drm_mm_init() to create a GTT DRM MM object, which provides an address space pool for - object allocation. In a KMS configuration, the driver will - need to allocate and initialize a command ring buffer following - basic GEM initialization. Most UMA devices have a so-called + object allocation. In a KMS configuration, the driver + needs to allocate and initialize a command ring buffer following + core GEM initialization. A UMA device usually has what is called a "stolen" memory region, which provides space for the initial framebuffer and large, contiguous memory regions required by the - device. This space is not typically managed by GEM, and must + device. This space is not typically managed by GEM, and it must be initialized separately into its own DRM MM object. </para> <para> - Initialization will be driver specific, and will depend on - the architecture of the device. In the case of Intel + Initialization is driver-specific. In the case of Intel integrated graphics chips like 965GM, GEM initialization can be done by calling the internal GEM init function, i915_gem_do_init(). Since the 965GM is a UMA device - (i.e. it doesn't have dedicated VRAM), GEM will manage + (i.e. it doesn't have dedicated VRAM), GEM manages making regular RAM available for GPU operations. Memory set aside by the BIOS (called "stolen" memory by the i915 - driver) will be managed by the DRM memrange allocator; the - rest of the aperture will be managed by GEM. + driver) is managed by the DRM memrange allocator; the + rest of the aperture is managed by GEM. <programlisting> /* Basic memrange allocator for stolen space (aka vram) */ drm_memrange_init(&dev_priv->vram, 0, prealloc_size); @@ -483,7 +480,7 @@ <!--!Edrivers/char/drm/drm_memrange.c--> </para> <para> - Once the memory manager has been set up, we can allocate the + Once the memory manager has been set up, we may allocate the command buffer. In the i915 case, this is also done with a GEM function, i915_gem_init_ringbuffer(). </para> @@ -493,16 +490,25 @@ <sect2> <title>Output configuration</title> <para> - The final initialization task is output configuration. This involves - finding and initializing the CRTCs, encoders and connectors - for your device, creating an initial configuration and - registering a framebuffer console driver. + The final initialization task is output configuration. This involves: + <itemizedlist> + <listitem> + Finding and initializing the CRTCs, encoders, and connectors + for the device. + </listitem> + <listitem> + Creating an initial configuration. + </listitem> + <listitem> + Registering a framebuffer console driver. + </listitem> + </itemizedlist> </para> <sect3> <title>Output discovery and initialization</title> <para> - Several core functions exist to create CRTCs, encoders and - connectors, namely drm_crtc_init(), drm_connector_init() and + Several core functions exist to create CRTCs, encoders, and + connectors, namely: drm_crtc_init(), drm_connector_init(), and drm_encoder_init(), along with several "helper" functions to perform common tasks. </para> @@ -555,10 +561,10 @@ void intel_crt_init(struct drm_device *dev) </programlisting> <para> In the example above (again, taken from the i915 driver), a - CRT connector and encoder combination is created. A device - specific i2c bus is also created, for fetching EDID data and + CRT connector and encoder combination is created. A device-specific + i2c bus is also created for fetching EDID data and performing monitor detection. Once the process is complete, - the new connector is registered with sysfs, to make its + the new connector is registered with sysfs to make its properties available to applications. </para> <sect4> @@ -567,12 +573,12 @@ void intel_crt_init(struct drm_device *dev) Since many PC-class graphics devices have similar display output designs, the DRM provides a set of helper functions to make output management easier. The core helper routines handle - encoder re-routing and disabling of unused functions following - mode set. Using the helpers is optional, but recommended for + encoder re-routing and the disabling of unused functions following + mode setting. Using the helpers is optional, but recommended for devices with PC-style architectures (i.e. a set of display planes for feeding pixels to encoders which are in turn routed to connectors). Devices with more complex requirements needing - finer grained management can opt to use the core callbacks + finer grained management may opt to use the core callbacks directly. </para> <para> @@ -580,17 +586,25 @@ void intel_crt_init(struct drm_device *dev) </para> </sect4> <para> - For each encoder, CRTC and connector, several functions must - be provided, depending on the object type. Encoder objects - need to provide a DPMS (basically on/off) function, mode fixup - (for converting requested modes into native hardware timings), - and prepare, set and commit functions for use by the core DRM - helper functions. Connector helpers need to provide mode fetch and - validity functions as well as an encoder matching function for - returning an ideal encoder for a given connector. The core - connector functions include a DPMS callback, (deprecated) - save/restore routines, detection, mode probing, property handling, - and cleanup functions. + Each encoder object needs to provide: + <itemizedlist> + <listitem> + A DPMS (basically on/off) function. + </listitem> + <listitem> + A mode-fixup function (for converting requested modes into + native hardware timings). + </listitem> + <listitem> + Functions (prepare, set, and commit) for use by the core DRM + helper functions. + </listitem> + </itemizedlist> + Connector helpers need to provide functions (mode-fetch, validity, + and encoder-matching) for returning an ideal encoder for a given + connector. The core connector functions include a DPMS callback, + save/restore routines (deprecated), detection, mode probing, + property handling, and cleanup functions. </para> <!--!Edrivers/char/drm/drm_crtc.h--> <!--!Edrivers/char/drm/drm_crtc.c--> @@ -605,23 +619,34 @@ void intel_crt_init(struct drm_device *dev) <title>VBlank event handling</title> <para> The DRM core exposes two vertical blank related ioctls: - DRM_IOCTL_WAIT_VBLANK and DRM_IOCTL_MODESET_CTL. + <variablelist> + <varlistentry> + <term>DRM_IOCTL_WAIT_VBLANK</term> + <listitem> + <para> + This takes a struct drm_wait_vblank structure as its argument, + and it is used to block or request a signal when a specified + vblank event occurs. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>DRM_IOCTL_MODESET_CTL</term> + <listitem> + <para> + This should be called by application level drivers before and + after mode setting, since on many devices the vertical blank + counter is reset at that time. Internally, the DRM snapshots + the last vblank count when the ioctl is called with the + _DRM_PRE_MODESET command, so that the counter won't go backwards + (which is dealt with when _DRM_POST_MODESET is used). + </para> + </listitem> + </varlistentry> + </variablelist> <!--!Edrivers/char/drm/drm_irq.c--> </para> <para> - DRM_IOCTL_WAIT_VBLANK takes a struct drm_wait_vblank structure - as its argument, and is used to block or request a signal when a - specified vblank event occurs. - </para> - <para> - DRM_IOCTL_MODESET_CTL should be called by application level - drivers before and after mode setting, since on many devices the - vertical blank counter will be reset at that time. Internally, - the DRM snapshots the last vblank count when the ioctl is called - with the _DRM_PRE_MODESET command so that the counter won't go - backwards (which is dealt with when _DRM_POST_MODESET is used). - </para> - <para> To support the functions above, the DRM core provides several helper functions for tracking vertical blank counters, and requires drivers to provide several callbacks: @@ -632,24 +657,24 @@ void intel_crt_init(struct drm_device *dev) register. The enable and disable vblank callbacks should enable and disable vertical blank interrupts, respectively. In the absence of DRM clients waiting on vblank events, the core DRM - code will use the disable_vblank() function to disable - interrupts, which saves power. They'll be re-enabled again when + code uses the disable_vblank() function to disable + interrupts, which saves power. They are re-enabled again when a client calls the vblank wait ioctl above. </para> <para> - Devices that don't provide a count register can simply use an + A device that doesn't provide a count register may simply use an internal atomic counter incremented on every vertical blank - interrupt, and can make their enable and disable vblank - functions into no-ops. + interrupt (and then treat the enable_vblank() and disable_vblank() + callbacks as no-ops). </para> </sect1> <sect1> <title>Memory management</title> <para> - The memory manager lies at the heart of many DRM operations, and - is also required to support advanced client features like OpenGL - pbuffers. The DRM currently contains two memory managers, TTM + The memory manager lies at the heart of many DRM operations; it + is required to support advanced client features like OpenGL + pbuffers. The DRM currently contains two memory managers: TTM and GEM. </para> @@ -679,41 +704,46 @@ void intel_crt_init(struct drm_device *dev) <para> GEM-enabled drivers must provide gem_init_object() and gem_free_object() callbacks to support the core memory - allocation routines. They should also provide several driver - specific ioctls to support command execution, pinning, buffer + allocation routines. They should also provide several driver-specific + ioctls to support command execution, pinning, buffer read & write, mapping, and domain ownership transfers. </para> <para> - On a fundamental level, GEM involves several operations: memory - allocation and freeing, command execution, and aperture management - at command execution time. Buffer object allocation is relatively + On a fundamental level, GEM involves several operations: + <itemizedlist> + <listitem>Memory allocation and freeing</listitem> + <listitem>Command execution</listitem> + <listitem>Aperture management at command execution time</listitem> + </itemizedlist> + Buffer object allocation is relatively straightforward and largely provided by Linux's shmem layer, which provides memory to back each object. When mapped into the GTT or used in a command buffer, the backing pages for an object are flushed to memory and marked write combined so as to be coherent - with the GPU. Likewise, when the GPU finishes rendering to an object, - if the CPU accesses it, it must be made coherent with the CPU's view + with the GPU. Likewise, if the CPU accesses an object after the GPU + has finished rendering to the object, then the object must be made + coherent with the CPU's view of memory, usually involving GPU cache flushing of various kinds. - This core CPU<->GPU coherency management is provided by the GEM - set domain function, which evaluates an object's current domain and + This core CPU<->GPU coherency management is provided by a + device-specific ioctl, which evaluates an object's current domain and performs any necessary flushing or synchronization to put the object into the desired coherency domain (note that the object may be busy, - i.e. an active render target; in that case the set domain function - will block the client and wait for rendering to complete before + i.e. an active render target; in that case, setting the domain + blocks the client and waits for rendering to complete before performing any necessary flushing operations). </para> <para> Perhaps the most important GEM function is providing a command execution interface to clients. Client programs construct command - buffers containing references to previously allocated memory objects - and submit them to GEM. At that point, GEM will take care to bind + buffers containing references to previously allocated memory objects, + and then submit them to GEM. At that point, GEM takes care to bind all the objects into the GTT, execute the buffer, and provide necessary synchronization between clients accessing the same buffers. This often involves evicting some objects from the GTT and re-binding others (a fairly expensive operation), and providing relocation support which hides fixed GTT offsets from clients. Clients must take care not to submit command buffers that reference more objects - than can fit in the GTT or GEM will reject them and no rendering + than can fit in the GTT; otherwise, GEM will reject them and no rendering will occur. Similarly, if several objects in the buffer require fence registers to be allocated for correct rendering (e.g. 2D blits on pre-965 chips), care must be taken not to require more fence @@ -729,7 +759,7 @@ void intel_crt_init(struct drm_device *dev) <title>Output management</title> <para> At the core of the DRM output management code is a set of - structures representing CRTCs, encoders and connectors. + structures representing CRTCs, encoders, and connectors. </para> <para> A CRTC is an abstraction representing a part of the chip that @@ -765,21 +795,19 @@ void intel_crt_init(struct drm_device *dev) <sect1> <title>Framebuffer management</title> <para> - In order to set a mode on a given CRTC, encoder and connector - configuration, clients need to provide a framebuffer object which - will provide a source of pixels for the CRTC to deliver to the encoder(s) - and ultimately the connector(s) in the configuration. A framebuffer - is fundamentally a driver specific memory object, made into an opaque - handle by the DRM addfb function. Once an fb has been created this - way it can be passed to the KMS mode setting routines for use in - a configuration. + Clients need to provide a framebuffer object which provides a source + of pixels for a CRTC to deliver to the encoder(s) and ultimately the + connector(s). A framebuffer is fundamentally a driver-specific memory + object, made into an opaque handle by the DRM's addfb() function. + Once a framebuffer has been created this way, it may be passed to the + KMS mode setting routines for use in a completed configuration. </para> </sect1> <sect1> <title>Command submission & fencing</title> <para> - This should cover a few device specific command submission + This should cover a few device-specific command submission implementations. </para> </sect1> @@ -789,7 +817,7 @@ void intel_crt_init(struct drm_device *dev) <para> The DRM core provides some suspend/resume code, but drivers wanting full suspend/resume support should provide save() and - restore() functions. These will be called at suspend, + restore() functions. These are called at suspend, hibernate, or resume time, and should perform any state save or restore required by your device across suspend or hibernate states. @@ -812,8 +840,8 @@ void intel_crt_init(struct drm_device *dev) <para> The DRM core exports several interfaces to applications, generally intended to be used through corresponding libdrm - wrapper functions. In addition, drivers export device specific - interfaces for use by userspace drivers & device aware + wrapper functions. In addition, drivers export device-specific + interfaces for use by userspace drivers & device-aware applications through ioctls and sysfs files. </para> <para> @@ -822,8 +850,8 @@ void intel_crt_init(struct drm_device *dev) management, memory management, and output management. </para> <para> - Cover generic ioctls and sysfs layout here. Only need high - level info, since man pages will cover the rest. + Cover generic ioctls and sysfs layout here. We only need high-level + info, since man pages should cover the rest. </para> </chapter> diff --git a/Documentation/DocBook/uio-howto.tmpl b/Documentation/DocBook/uio-howto.tmpl index 54883de5d5f..ac3d0018140 100644 --- a/Documentation/DocBook/uio-howto.tmpl +++ b/Documentation/DocBook/uio-howto.tmpl @@ -521,6 +521,11 @@ Here's a description of the fields of <varname>struct uio_mem</varname>: <itemizedlist> <listitem><para> +<varname>const char *name</varname>: Optional. Set this to help identify +the memory region, it will show up in the corresponding sysfs node. +</para></listitem> + +<listitem><para> <varname>int memtype</varname>: Required if the mapping is used. Set this to <varname>UIO_MEM_PHYS</varname> if you you have physical memory on your card to be mapped. Use <varname>UIO_MEM_LOGICAL</varname> for logical @@ -553,7 +558,7 @@ instead to remember such an address. </itemizedlist> <para> -Please do not touch the <varname>kobj</varname> element of +Please do not touch the <varname>map</varname> element of <varname>struct uio_mem</varname>! It is used by the UIO framework to set up sysfs files for this mapping. Simply leave it alone. </para> diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 81bc1a9ab9d..f7ade3b3b40 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO @@ -275,8 +275,8 @@ versions. If no 2.6.x.y kernel is available, then the highest numbered 2.6.x kernel is the current stable kernel. -2.6.x.y are maintained by the "stable" team <stable@kernel.org>, and are -released as needs dictate. The normal release period is approximately +2.6.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and +are released as needs dictate. The normal release period is approximately two weeks, but it can be longer if there are no pressing problems. A security-related problem, instead, can cause a release to happen almost instantly. diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 0c134f8afc6..bff2d8be1e1 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -328,6 +328,12 @@ over a rather long period of time, but improvements are always welcome! RCU rather than SRCU, because RCU is almost always faster and easier to use than is SRCU. + If you need to enter your read-side critical section in a + hardirq or exception handler, and then exit that same read-side + critical section in the task that was interrupted, then you need + to srcu_read_lock_raw() and srcu_read_unlock_raw(), which avoid + the lockdep checking that would otherwise this practice illegal. + Also unlike other forms of RCU, explicit initialization and cleanup is required via init_srcu_struct() and cleanup_srcu_struct(). These are passed a "struct srcu_struct" diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt index 31852705b58..bf778332a28 100644 --- a/Documentation/RCU/rcu.txt +++ b/Documentation/RCU/rcu.txt @@ -38,11 +38,11 @@ o How can the updater tell when a grace period has completed Preemptible variants of RCU (CONFIG_TREE_PREEMPT_RCU) get the same effect, but require that the readers manipulate CPU-local - counters. These counters allow limited types of blocking - within RCU read-side critical sections. SRCU also uses - CPU-local counters, and permits general blocking within - RCU read-side critical sections. These two variants of - RCU detect grace periods by sampling these counters. + counters. These counters allow limited types of blocking within + RCU read-side critical sections. SRCU also uses CPU-local + counters, and permits general blocking within RCU read-side + critical sections. These variants of RCU detect grace periods + by sampling these counters. o If I am running on a uniprocessor kernel, which can only do one thing at a time, why should I wait for a grace period? diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 4e959208f73..083d88cbc08 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -101,6 +101,11 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning messages. +o A hardware or software issue shuts off the scheduler-clock + interrupt on a CPU that is not in dyntick-idle mode. This + problem really has happened, and seems to be most likely to + result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels. + o A bug in the RCU implementation. o A hardware failure. This is quite unlikely, but has occurred @@ -109,12 +114,11 @@ o A hardware failure. This is quite unlikely, but has occurred This resulted in a series of RCU CPU stall warnings, eventually leading the realization that the CPU had failed. -The RCU, RCU-sched, and RCU-bh implementations have CPU stall -warning. SRCU does not have its own CPU stall warnings, but its -calls to synchronize_sched() will result in RCU-sched detecting -RCU-sched-related CPU stalls. Please note that RCU only detects -CPU stalls when there is a grace period in progress. No grace period, -no CPU stall warnings. +The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning. +SRCU does not have its own CPU stall warnings, but its calls to +synchronize_sched() will result in RCU-sched detecting RCU-sched-related +CPU stalls. Please note that RCU only detects CPU stalls when there is +a grace period in progress. No grace period, no CPU stall warnings. To diagnose the cause of the stall, inspect the stack traces. The offending function will usually be near the top of the stack. diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index 783d6c134d3..d67068d0d2b 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt @@ -61,11 +61,24 @@ nreaders This is the number of RCU reading threads supported. To properly exercise RCU implementations with preemptible read-side critical sections. +onoff_interval + The number of seconds between each attempt to execute a + randomly selected CPU-hotplug operation. Defaults to + zero, which disables CPU hotplugging. In HOTPLUG_CPU=n + kernels, rcutorture will silently refuse to do any + CPU-hotplug operations regardless of what value is + specified for onoff_interval. + shuffle_interval The number of seconds to keep the test threads affinitied to a particular subset of the CPUs, defaults to 3 seconds. Used in conjunction with test_no_idle_hz. +shutdown_secs The number of seconds to run the test before terminating + the test and powering off the system. The default is + zero, which disables test termination and system shutdown. + This capability is useful for automated testing. + stat_interval The number of seconds between output of torture statistics (via printk()). Regardless of the interval, statistics are printed when the module is unloaded. diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index aaf65f6c6cd..49587abfc2f 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -105,14 +105,10 @@ o "dt" is the current value of the dyntick counter that is incremented or one greater than the interrupt-nesting depth otherwise. The number after the second "/" is the NMI nesting depth. - This field is displayed only for CONFIG_NO_HZ kernels. - o "df" is the number of times that some other CPU has forced a quiescent state on behalf of this CPU due to this CPU being in dynticks-idle state. - This field is displayed only for CONFIG_NO_HZ kernels. - o "of" is the number of times that some other CPU has forced a quiescent state on behalf of this CPU due to this CPU being offline. In a perfect world, this might never happen, but it diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 6ef692667e2..6bbe8dcdc3d 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -4,6 +4,7 @@ to start learning about RCU: 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/ 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/ 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/ +4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/ What is RCU? @@ -834,6 +835,8 @@ SRCU: Critical sections Grace period Barrier srcu_read_lock synchronize_srcu N/A srcu_read_unlock synchronize_srcu_expedited + srcu_read_lock_raw + srcu_read_unlock_raw srcu_dereference SRCU: Initialization/cleanup @@ -855,27 +858,33 @@ list can be helpful: a. Will readers need to block? If so, you need SRCU. -b. What about the -rt patchset? If readers would need to block +b. Is it necessary to start a read-side critical section in a + hardirq handler or exception handler, and then to complete + this read-side critical section in the task that was + interrupted? If so, you need SRCU's srcu_read_lock_raw() and + srcu_read_unlock_raw() primitives. + +c. What about the -rt patchset? If readers would need to block in an non-rt kernel, you need SRCU. If readers would block in a -rt kernel, but not in a non-rt kernel, SRCU is not necessary. -c. Do you need to treat NMI handlers, hardirq handlers, +d. Do you need to treat NMI handlers, hardirq handlers, and code segments with preemption disabled (whether via preempt_disable(), local_irq_save(), local_bh_disable(), or some other mechanism) as if they were explicit RCU readers? If so, you need RCU-sched. -d. Do you need RCU grace periods to complete even in the face +e. Do you need RCU grace periods to complete even in the face of softirq monopolization of one or more of the CPUs? For example, is your code subject to network-based denial-of-service attacks? If so, you need RCU-bh. -e. Is your workload too update-intensive for normal use of +f. Is your workload too update-intensive for normal use of RCU, but inappropriate for other synchronization mechanisms? If so, consider SLAB_DESTROY_BY_RCU. But please be careful! -f. Otherwise, use RCU. +g. Otherwise, use RCU. Of course, this all assumes that you have determined that RCU is in fact the right tool for your job. diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt index 771d48d3b33..208a2d465b9 100644 --- a/Documentation/arm/memory.txt +++ b/Documentation/arm/memory.txt @@ -51,15 +51,14 @@ ffc00000 ffefffff DMA memory mapping region. Memory returned ff000000 ffbfffff Reserved for future expansion of DMA mapping region. -VMALLOC_END feffffff Free for platform use, recommended. - VMALLOC_END must be aligned to a 2MB - boundary. - VMALLOC_START VMALLOC_END-1 vmalloc() / ioremap() space. Memory returned by vmalloc/ioremap will be dynamically placed in this region. - VMALLOC_START may be based upon the value - of the high_memory variable. + Machine specific static mappings are also + located here through iotable_init(). + VMALLOC_START is based upon the value + of the high_memory variable, and VMALLOC_END + is equal to 0xff000000. PAGE_OFFSET high_memory-1 Kernel direct-mapped RAM region. This maps the platforms RAM, and typically diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index 3bd585b4492..27f2b21a9d5 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt @@ -84,6 +84,93 @@ compiler optimizes the section accessing atomic_t variables. *** YOU HAVE BEEN WARNED! *** +Properly aligned pointers, longs, ints, and chars (and unsigned +equivalents) may be atomically loaded from and stored to in the same +sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE() +macro should be used to prevent the compiler from using optimizations +that might otherwise optimize accesses out of existence on the one hand, +or that might create unsolicited accesses on the other. + +For example consider the following code: + + while (a > 0) + do_something(); + +If the compiler can prove that do_something() does not store to the +variable a, then the compiler is within its rights transforming this to +the following: + + tmp = a; + if (a > 0) + for (;;) + do_something(); + +If you don't want the compiler to do this (and you probably don't), then +you should use something like the following: + + while (ACCESS_ONCE(a) < 0) + do_something(); + +Alternatively, you could place a barrier() call in the loop. + +For another example, consider the following code: + + tmp_a = a; + do_something_with(tmp_a); + do_something_else_with(tmp_a); + +If the compiler can prove that do_something_with() does not store to the +variable a, then the compiler is within its rights to manufacture an +additional load as follows: + + tmp_a = a; + do_something_with(tmp_a); + tmp_a = a; + do_something_else_with(tmp_a); + +This could fatally confuse your code if it expected the same value +to be passed to do_something_with() and do_something_else_with(). + +The compiler would be likely to manufacture this additional load if +do_something_with() was an inline function that made very heavy use +of registers: reloading from variable a could save a flush to the +stack and later reload. To prevent the compiler from attacking your +code in this manner, write the following: + + tmp_a = ACCESS_ONCE(a); + do_something_with(tmp_a); + do_something_else_with(tmp_a); + +For a final example, consider the following code, assuming that the +variable a is set at boot time before the second CPU is brought online +and never changed later, so that memory barriers are not needed: + + if (a) + b = 9; + else + b = 42; + +The compiler is within its rights to manufacture an additional store +by transforming the above code into the following: + + b = 42; + if (a) + b = 9; + +This could come as a fatal surprise to other code running concurrently +that expected b to never have the value 42 if a was zero. To prevent +the compiler from doing this, write something like: + + if (a) + ACCESS_ONCE(b) = 9; + else + ACCESS_ONCE(b) = 42; + +Don't even -think- about doing this without proper use of memory barriers, +locks, or atomic operations if variable a can change at runtime! + +*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! *** + Now, we move onto the atomic operation interfaces typically implemented with the help of assembly code. diff --git a/Documentation/blockdev/cciss.txt b/Documentation/blockdev/cciss.txt index 71464e09ec1..b79d0a13e7c 100644 --- a/Documentation/blockdev/cciss.txt +++ b/Documentation/blockdev/cciss.txt @@ -98,14 +98,12 @@ You must enable "SCSI tape drive support for Smart Array 5xxx" and "SCSI support" in your kernel configuration to be able to use SCSI tape drives with your Smart Array 5xxx controller. -Additionally, note that the driver will not engage the SCSI core at init -time. The driver must be directed to dynamically engage the SCSI core via -the /proc filesystem entry which the "block" side of the driver creates as -/proc/driver/cciss/cciss* at runtime. This is because at driver init time, -the SCSI core may not yet be initialized (because the driver is a block -driver) and attempting to register it with the SCSI core in such a case -would cause a hang. This is best done via an initialization script -(typically in /etc/init.d, but could vary depending on distribution). +Additionally, note that the driver will engage the SCSI core at init +time if any tape drives or medium changers are detected. The driver may +also be directed to dynamically engage the SCSI core via the /proc filesystem +entry which the "block" side of the driver creates as +/proc/driver/cciss/cciss* at runtime. This is best done via a script. + For example: for x in /proc/driver/cciss/cciss[0-9]* diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt index c21d77742a0..7e62de1e59f 100644 --- a/Documentation/cgroups/freezer-subsystem.txt +++ b/Documentation/cgroups/freezer-subsystem.txt @@ -33,9 +33,9 @@ demonstrate this problem using nested bash shells: From a second, unrelated bash shell: $ kill -SIGSTOP 16690 - $ kill -SIGCONT 16990 + $ kill -SIGCONT 16690 - <at this point 16990 exits and causes 16644 to exit too> + <at this point 16690 exits and causes 16644 to exit too> This happens because bash can observe both signals and choose how it responds to them. diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index cc0ebc5241b..4d8774f6f48 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -44,8 +44,8 @@ Features: - oom-killer disable knob and oom-notifier - Root cgroup has no limit controls. - Kernel memory and Hugepages are not under control yet. We just manage - pages on LRU. To add more controls, we have to take care of performance. + Kernel memory support is work in progress, and the current version provides + basically functionality. (See Section 2.7) Brief summary of control files. @@ -72,6 +72,9 @@ Brief summary of control files. memory.oom_control # set/show oom controls. memory.numa_stat # show the number of memory usage per numa node + memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory + memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation + 1. History The memory controller has a long history. A request for comments for the memory @@ -255,6 +258,27 @@ When oom event notifier is registered, event will be delivered. per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by zone->lru_lock, it has no lock of its own. +2.7 Kernel Memory Extension (CONFIG_CGROUP_MEM_RES_CTLR_KMEM) + +With the Kernel memory extension, the Memory Controller is able to limit +the amount of kernel memory used by the system. Kernel memory is fundamentally +different than user memory, since it can't be swapped out, which makes it +possible to DoS the system by consuming too much of this precious resource. + +Kernel memory limits are not imposed for the root cgroup. Usage for the root +cgroup may or may not be accounted. + +Currently no soft limit is implemented for kernel memory. It is future work +to trigger slab reclaim when those limits are reached. + +2.7.1 Current Kernel Memory resources accounted + +* sockets memory pressure: some sockets protocols have memory pressure +thresholds. The Memory Controller allows them to be controlled individually +per cgroup, instead of globally. + +* tcp memory pressure: sockets memory pressure for the tcp protocol. + 3. User Interface 0. Configuration diff --git a/Documentation/cgroups/net_prio.txt b/Documentation/cgroups/net_prio.txt new file mode 100644 index 00000000000..01b32263559 --- /dev/null +++ b/Documentation/cgroups/net_prio.txt @@ -0,0 +1,53 @@ +Network priority cgroup +------------------------- + +The Network priority cgroup provides an interface to allow an administrator to +dynamically set the priority of network traffic generated by various +applications + +Nominally, an application would set the priority of its traffic via the +SO_PRIORITY socket option. This however, is not always possible because: + +1) The application may not have been coded to set this value +2) The priority of application traffic is often a site-specific administrative + decision rather than an application defined one. + +This cgroup allows an administrator to assign a process to a group which defines +the priority of egress traffic on a given interface. Network priority groups can +be created by first mounting the cgroup filesystem. + +# mount -t cgroup -onet_prio none /sys/fs/cgroup/net_prio + +With the above step, the initial group acting as the parent accounting group +becomes visible at '/sys/fs/cgroup/net_prio'. This group includes all tasks in +the system. '/sys/fs/cgroup/net_prio/tasks' lists the tasks in this cgroup. + +Each net_prio cgroup contains two files that are subsystem specific + +net_prio.prioidx +This file is read-only, and is simply informative. It contains a unique integer +value that the kernel uses as an internal representation of this cgroup. + +net_prio.ifpriomap +This file contains a map of the priorities assigned to traffic originating from +processes in this group and egressing the system on various interfaces. It +contains a list of tuples in the form <ifname priority>. Contents of this file +can be modified by echoing a string into the file using the same tuple format. +for example: + +echo "eth0 5" > /sys/fs/cgroups/net_prio/iscsi/net_prio.ifpriomap + +This command would force any traffic originating from processes belonging to the +iscsi net_prio cgroup and egressing on interface eth0 to have the priority of +said traffic set to the value 5. The parent accounting group also has a +writeable 'net_prio.ifpriomap' file that can be used to set a system default +priority. + +Priorities are set immediately prior to queueing a frame to the device +queueing discipline (qdisc) so priorities will be assigned prior to the hardware +queue selection being made. + +One usage for the net_prio cgroup is with mqprio qdisc allowing application +traffic to be steered to hardware/driver based traffic classes. These mappings +can then be managed by administrators or other networking protocols such as +DCBX. diff --git a/Documentation/development-process/5.Posting b/Documentation/development-process/5.Posting index 903a2546f13..8a48c9b6286 100644 --- a/Documentation/development-process/5.Posting +++ b/Documentation/development-process/5.Posting @@ -271,10 +271,10 @@ copies should go to: the linux-kernel list. - If you are fixing a bug, think about whether the fix should go into the - next stable update. If so, stable@kernel.org should get a copy of the - patch. Also add a "Cc: stable@kernel.org" to the tags within the patch - itself; that will cause the stable team to get a notification when your - fix goes into the mainline. + next stable update. If so, stable@vger.kernel.org should get a copy of + the patch. Also add a "Cc: stable@vger.kernel.org" to the tags within + the patch itself; that will cause the stable team to get a notification + when your fix goes into the mainline. When selecting recipients for a patch, it is good to have an idea of who you think will eventually accept the patch and get it merged. While it diff --git a/Documentation/devicetree/bindings/arm/gic.txt b/Documentation/devicetree/bindings/arm/gic.txt index 52916b4aa1f..9b4b82a721b 100644 --- a/Documentation/devicetree/bindings/arm/gic.txt +++ b/Documentation/devicetree/bindings/arm/gic.txt @@ -42,6 +42,10 @@ Optional - interrupts : Interrupt source of the parent interrupt controller. Only present on secondary GICs. +- cpu-offset : per-cpu offset within the distributor and cpu interface + regions, used when the GIC doesn't have banked registers. The offset is + cpu-offset * cpu-nr. + Example: intc: interrupt-controller@fff11000 { diff --git a/Documentation/devicetree/bindings/arm/vic.txt b/Documentation/devicetree/bindings/arm/vic.txt new file mode 100644 index 00000000000..266716b2343 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/vic.txt @@ -0,0 +1,29 @@ +* ARM Vectored Interrupt Controller + +One or more Vectored Interrupt Controllers (VIC's) can be connected in an ARM +system for interrupt routing. For multiple controllers they can either be +nested or have the outputs wire-OR'd together. + +Required properties: + +- compatible : should be one of + "arm,pl190-vic" + "arm,pl192-vic" +- interrupt-controller : Identifies the node as an interrupt controller +- #interrupt-cells : The number of cells to define the interrupts. Must be 1 as + the VIC has no configuration options for interrupt sources. The cell is a u32 + and defines the interrupt number. +- reg : The register bank for the VIC. + +Optional properties: + +- interrupts : Interrupt source for parent controllers if the VIC is nested. + +Example: + + vic0: interrupt-controller@60000 { + compatible = "arm,pl192-vic"; + interrupt-controller; + #interrupt-cells = <1>; + reg = <0x60000 0x1000>; + }; diff --git a/Documentation/devicetree/bindings/i2c/i2c-designware.txt b/Documentation/devicetree/bindings/i2c/i2c-designware.txt new file mode 100644 index 00000000000..e42a2ee233e --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-designware.txt @@ -0,0 +1,22 @@ +* Synopsys DesignWare I2C + +Required properties : + + - compatible : should be "snps,designware-i2c" + - reg : Offset and length of the register set for the device + - interrupts : <IRQ> where IRQ is the interrupt number. + +Recommended properties : + + - clock-frequency : desired I2C bus clock frequency in Hz. + +Example : + + i2c@f0000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "snps,designware-i2c"; + reg = <0xf0000 0x1000>; + interrupts = <11>; + clock-frequency = <400000>; + }; diff --git a/Documentation/devicetree/bindings/i2c/trivial-devices.txt b/Documentation/devicetree/bindings/i2c/trivial-devices.txt new file mode 100644 index 00000000000..1a85f986961 --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/trivial-devices.txt @@ -0,0 +1,58 @@ +This is a list of trivial i2c devices that have simple device tree +bindings, consisting only of a compatible field, an address and +possibly an interrupt line. + +If a device needs more specific bindings, such as properties to +describe some aspect of it, there needs to be a specific binding +document for it just like any other devices. + + +Compatible Vendor / Chip +========== ============= +ad,ad7414 SMBus/I2C Digital Temperature Sensor in 6-Pin SOT with SMBus Alert and Over Temperature Pin +ad,adm9240 ADM9240: Complete System Hardware Monitor for uProcessor-Based Systems +adi,adt7461 +/-1C TDM Extended Temp Range I.C +adt7461 +/-1C TDM Extended Temp Range I.C +at,24c08 i2c serial eeprom (24cxx) +atmel,24c02 i2c serial eeprom (24cxx) +catalyst,24c32 i2c serial eeprom +dallas,ds1307 64 x 8, Serial, I2C Real-Time Clock +dallas,ds1338 I2C RTC with 56-Byte NV RAM +dallas,ds1339 I2C Serial Real-Time Clock +dallas,ds1340 I2C RTC with Trickle Charger +dallas,ds1374 I2C, 32-Bit Binary Counter Watchdog RTC with Trickle Charger and Reset Input/Output +dallas,ds1631 High-Precision Digital Thermometer +dallas,ds1682 Total-Elapsed-Time Recorder with Alarm +dallas,ds1775 Tiny Digital Thermometer and Thermostat +dallas,ds3232 Extremely Accurate I²C RTC with Integrated Crystal and SRAM +dallas,ds4510 CPU Supervisor with Nonvolatile Memory and Programmable I/O +dallas,ds75 Digital Thermometer and Thermostat +dialog,da9053 DA9053: flexible system level PMIC with multicore support +epson,rx8025 High-Stability. I2C-Bus INTERFACE REAL TIME CLOCK MODULE +epson,rx8581 I2C-BUS INTERFACE REAL TIME CLOCK MODULE +fsl,mag3110 MAG3110: Xtrinsic High Accuracy, 3D Magnetometer +fsl,mc13892 MC13892: Power Management Integrated Circuit (PMIC) for i.MX35/51 +fsl,mma8450 MMA8450Q: Xtrinsic Low-power, 3-axis Xtrinsic Accelerometer +fsl,mpr121 MPR121: Proximity Capacitive Touch Sensor Controller +fsl,sgtl5000 SGTL5000: Ultra Low-Power Audio Codec +maxim,ds1050 5 Bit Programmable, Pulse-Width Modulator +maxim,max1237 Low-Power, 4-/12-Channel, 2-Wire Serial, 12-Bit ADCs +maxim,max6625 9-Bit/12-Bit Temperature Sensors with I²C-Compatible Serial Interface +mc,rv3029c2 Real Time Clock Module with I2C-Bus +national,lm75 I2C TEMP SENSOR +national,lm80 Serial Interface ACPI-Compatible Microprocessor System Hardware Monitor +national,lm92 ±0.33°C Accurate, 12-Bit + Sign Temperature Sensor and Thermal Window Comparator with Two-Wire Interface +nxp,pca9556 Octal SMBus and I2C registered interface +nxp,pca9557 8-bit I2C-bus and SMBus I/O port with reset +nxp,pcf8563 Real-time clock/calendar +ovti,ov5642 OV5642: Color CMOS QSXGA (5-megapixel) Image Sensor with OmniBSI and Embedded TrueFocus +pericom,pt7c4338 Real-time Clock Module +plx,pex8648 48-Lane, 12-Port PCI Express Gen 2 (5.0 GT/s) Switch +ramtron,24c64 i2c serial eeprom (24cxx) +ricoh,rs5c372a I2C bus SERIAL INTERFACE REAL-TIME CLOCK IC +samsung,24ad0xd1 S524AD0XF1 (128K/256K-bit Serial EEPROM for Low Power) +st-micro,24c256 i2c serial eeprom (24cxx) +stm,m41t00 Serial Access TIMEKEEPER +stm,m41t62 Serial real-time clock (RTC) with alarm +stm,m41t80 M41T80 - SERIAL ACCESS RTC WITH ALARMS +ti,tsc2003 I2C Touch-Screen Controller diff --git a/Documentation/devicetree/bindings/net/calxeda-xgmac.txt b/Documentation/devicetree/bindings/net/calxeda-xgmac.txt new file mode 100644 index 00000000000..411727a3f82 --- /dev/null +++ b/Documentation/devicetree/bindings/net/calxeda-xgmac.txt @@ -0,0 +1,15 @@ +* Calxeda Highbank 10Gb XGMAC Ethernet + +Required properties: +- compatible : Should be "calxeda,hb-xgmac" +- reg : Address and length of the register set for the device +- interrupts : Should contain 3 xgmac interrupts. The 1st is main interrupt. + The 2nd is pwr mgt interrupt. The 3rd is low power state interrupt. + +Example: + +ethernet@fff50000 { + compatible = "calxeda,hb-xgmac"; + reg = <0xfff50000 0x1000>; + interrupts = <0 77 4 0 78 4 0 79 4>; +}; diff --git a/Documentation/devicetree/bindings/net/can/cc770.txt b/Documentation/devicetree/bindings/net/can/cc770.txt new file mode 100644 index 00000000000..77027bf6460 --- /dev/null +++ b/Documentation/devicetree/bindings/net/can/cc770.txt @@ -0,0 +1,53 @@ +Memory mapped Bosch CC770 and Intel AN82527 CAN controller + +Note: The CC770 is a CAN controller from Bosch, which is 100% +compatible with the old AN82527 from Intel, but with "bugs" being fixed. + +Required properties: + +- compatible : should be "bosch,cc770" for the CC770 and "intc,82527" + for the AN82527. + +- reg : should specify the chip select, address offset and size required + to map the registers of the controller. The size is usually 0x80. + +- interrupts : property with a value describing the interrupt source + (number and sensitivity) required for the controller. + +Optional properties: + +- bosch,external-clock-frequency : frequency of the external oscillator + clock in Hz. Note that the internal clock frequency used by the + controller is half of that value. If not specified, a default + value of 16000000 (16 MHz) is used. + +- bosch,clock-out-frequency : slock frequency in Hz on the CLKOUT pin. + If not specified or if the specified value is 0, the CLKOUT pin + will be disabled. + +- bosch,slew-rate : slew rate of the CLKOUT signal. If not specified, + a resonable value will be calculated. + +- bosch,disconnect-rx0-input : see data sheet. + +- bosch,disconnect-rx1-input : see data sheet. + +- bosch,disconnect-tx1-output : see data sheet. + +- bosch,polarity-dominant : see data sheet. + +- bosch,divide-memory-clock : see data sheet. + +- bosch,iso-low-speed-mux : see data sheet. + +For further information, please have a look to the CC770 or AN82527. + +Examples: + +can@3,100 { + compatible = "bosch,cc770"; + reg = <3 0x100 0x80>; + interrupts = <2 0>; + interrupt-parent = <&mpic>; + bosch,external-clock-frequency = <16000000>; +}; diff --git a/Documentation/devicetree/bindings/powerpc/fsl/srio-rmu.txt b/Documentation/devicetree/bindings/powerpc/fsl/srio-rmu.txt new file mode 100644 index 00000000000..b9a8a2bcfae --- /dev/null +++ b/Documentation/devicetree/bindings/powerpc/fsl/srio-rmu.txt @@ -0,0 +1,163 @@ +Message unit node: + +For SRIO controllers that implement the message unit as part of the controller +this node is required. For devices with RMAN this node should NOT exist. The +node is composed of three types of sub-nodes ("fsl-srio-msg-unit", +"fsl-srio-dbell-unit" and "fsl-srio-port-write-unit"). + +See srio.txt for more details about generic SRIO controller details. + + - compatible + Usage: required + Value type: <string> + Definition: Must include "fsl,srio-rmu-vX.Y", "fsl,srio-rmu". + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - fsl,liodn + Usage: optional-but-recommended (for devices with PAMU) + Value type: <prop-encoded-array> + Definition: The logical I/O device number for the PAMU (IOMMU) to be + correctly configured for SRIO accesses. The property should + not exist on devices that do not support PAMU. + + The LIODN value is associated with all RMU transactions + (msg-unit, doorbell, port-write). + +Sub-Nodes for RMU: The RMU node is composed of multiple sub-nodes that +correspond to the actual sub-controllers in the RMU. The manual for a given +SoC will detail which and how many of these sub-controllers are implemented. + +Message Unit: + + - compatible + Usage: required + Value type: <string> + Definition: Must include "fsl,srio-msg-unit-vX.Y", "fsl,srio-msg-unit". + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A pair of IRQs are specified in this property. The first + element is associated with the transmit (TX) interrupt and the + second element is associated with the receive (RX) interrupt. + +Doorbell Unit: + + - compatible + Usage: required + Value type: <string> + Definition: Must include: + "fsl,srio-dbell-unit-vX.Y", "fsl,srio-dbell-unit" + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A pair of IRQs are specified in this property. The first + element is associated with the transmit (TX) interrupt and the + second element is associated with the receive (RX) interrupt. + +Port-Write Unit: + + - compatible + Usage: required + Value type: <string> + Definition: Must include: + "fsl,srio-port-write-unit-vX.Y", "fsl,srio-port-write-unit" + + The version X.Y should match the general SRIO controller's IP Block + revision register's Major(X) and Minor (Y) value. + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers for message units + and doorbell units. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A single IRQ that handles port-write conditions is + specified by this property. (Typically shared with error). + + Note: All other standard properties (see the ePAPR) are allowed + but are optional. + +Example: + rmu: rmu@d3000 { + compatible = "fsl,srio-rmu"; + reg = <0xd3000 0x400>; + ranges = <0x0 0xd3000 0x400>; + fsl,liodn = <0xc8>; + + message-unit@0 { + compatible = "fsl,srio-msg-unit"; + reg = <0x0 0x100>; + interrupts = < + 60 2 0 0 /* msg1_tx_irq */ + 61 2 0 0>;/* msg1_rx_irq */ + }; + message-unit@100 { + compatible = "fsl,srio-msg-unit"; + reg = <0x100 0x100>; + interrupts = < + 62 2 0 0 /* msg2_tx_irq */ + 63 2 0 0>;/* msg2_rx_irq */ + }; + doorbell-unit@400 { + compatible = "fsl,srio-dbell-unit"; + reg = <0x400 0x80>; + interrupts = < + 56 2 0 0 /* bell_outb_irq */ + 57 2 0 0>;/* bell_inb_irq */ + }; + port-write-unit@4e0 { + compatible = "fsl,srio-port-write-unit"; + reg = <0x4e0 0x20>; + interrupts = <16 2 1 11>; + }; + }; diff --git a/Documentation/devicetree/bindings/powerpc/fsl/srio.txt b/Documentation/devicetree/bindings/powerpc/fsl/srio.txt new file mode 100644 index 00000000000..b039bcbee13 --- /dev/null +++ b/Documentation/devicetree/bindings/powerpc/fsl/srio.txt @@ -0,0 +1,103 @@ +* Freescale Serial RapidIO (SRIO) Controller + +RapidIO port node: +Properties: + - compatible + Usage: required + Value type: <string> + Definition: Must include "fsl,srio" for IP blocks with IP Block + Revision Register (SRIO IPBRR1) Major ID equal to 0x01c0. + + Optionally, a compatiable string of "fsl,srio-vX.Y" where X is Major + version in IP Block Revision Register and Y is Minor version. If this + compatiable is provided it should be ordered before "fsl,srio". + + - reg + Usage: required + Value type: <prop-encoded-array> + Definition: A standard property. Specifies the physical address and + length of the SRIO configuration registers. The size should + be set to 0x11000. + + - interrupts + Usage: required + Value type: <prop_encoded-array> + Definition: Specifies the interrupts generated by this device. The + value of the interrupts property consists of one interrupt + specifier. The format of the specifier is defined by the + binding document describing the node's interrupt parent. + + A single IRQ that handles error conditions is specified by this + property. (Typically shared with port-write). + + - fsl,srio-rmu-handle: + Usage: required if rmu node is defined + Value type: <phandle> + Definition: A single <phandle> value that points to the RMU. + (See srio-rmu.txt for more details on RMU node binding) + +Port Child Nodes: There should a port child node for each port that exists in +the controller. The ports are numbered starting at one (1) and should have +the following properties: + + - cell-index + Usage: required + Value type: <u32> + Definition: A standard property. Matches the port id. + + - ranges + Usage: required if local access windows preset + Value type: <prop-encoded-array> + Definition: A standard property. Utilized to describe the memory mapped + IO space utilized by the controller. This corresponds to the + setting of the local access windows that are targeted to this + SRIO port. + + - fsl,liodn + Usage: optional-but-recommended (for devices with PAMU) + Value type: <prop-encoded-array> + Definition: The logical I/O device number for the PAMU (IOMMU) to be + correctly configured for SRIO accesses. The property should + not exist on devices that do not support PAMU. + + For HW (ie, the P4080) that only supports a LIODN for both + memory and maintenance transactions then a single LIODN is + represented in the property for both transactions. + + For HW (ie, the P304x/P5020, etc) that supports an LIODN for + memory transactions and a unique LIODN for maintenance + transactions then a pair of LIODNs are represented in the + property. Within the pair, the first element represents the + LIODN associated with memory transactions and the second element + represents the LIODN associated with maintenance transactions + for the port. + +Note: All other standard properties (see ePAPR) are allowed but are optional. + +Example: + + rapidio: rapidio@ffe0c0000 { + #address-cells = <2>; + #size-cells = <2>; + reg = <0xf 0xfe0c0000 0 0x11000>; + compatible = "fsl,srio"; + interrupts = <16 2 1 11>; /* err_irq */ + fsl,srio-rmu-handle = <&rmu>; + ranges; + + port1 { + cell-index = <1>; + #address-cells = <2>; + #size-cells = <2>; + fsl,liodn = <34>; + ranges = <0 0 0xc 0x20000000 0 0x10000000>; + }; + + port2 { + cell-index = <2>; + #address-cells = <2>; + #size-cells = <2>; + fsl,liodn = <48>; + ranges = <0 0 0xc 0x30000000 0 0x10000000>; + }; + }; diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index e8552782b44..18626965159 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -8,7 +8,9 @@ amcc Applied Micro Circuits Corporation (APM, formally AMCC) apm Applied Micro Circuits Corporation (APM) arm ARM Ltd. atmel Atmel Corporation +cavium Cavium, Inc. chrp Common Hardware Reference Platform +cortina Cortina Systems, Inc. dallas Maxim Integrated Products (formerly Dallas Semiconductor) denx Denx Software Engineering epson Seiko Epson Corp. @@ -33,8 +35,10 @@ qcom Qualcomm, Inc. ramtron Ramtron International samsung Samsung Semiconductor schindler Schindler +sil Silicon Image simtek sirf SiRF Technology, Inc. +st STMicroelectronics stericsson ST-Ericsson ti Texas Instruments xlnx Xilinx diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt index d79aead9418..10c64c8a13d 100644 --- a/Documentation/driver-model/devres.txt +++ b/Documentation/driver-model/devres.txt @@ -262,6 +262,7 @@ IOMAP devm_ioremap() devm_ioremap_nocache() devm_iounmap() + devm_request_and_ioremap() : checks resource, requests region, ioremaps pcim_iomap() pcim_iounmap() pcim_iomap_table() : array of mapped addresses indexed by BAR diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 3d849122b5b..33f7327d045 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -263,8 +263,7 @@ Who: Ravikiran Thirumalai <kiran@scalex86.org> What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS (in net/core/net-sysfs.c) -When: After the only user (hal) has seen a release with the patches - for enough time, probably some time in 2010. +When: 3.5 Why: Over 1K .text/.data size reduction, data is available in other ways (ioctls) Who: Johannes Berg <johannes@sipsolutions.net> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index d819ba16a0c..4fca82e5276 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -37,15 +37,15 @@ d_manage: no no yes (ref-walk) maybe --------------------------- inode_operations --------------------------- prototypes: - int (*create) (struct inode *,struct dentry *,int, struct nameidata *); + int (*create) (struct inode *,struct dentry *,umode_t, struct nameidata *); struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameid ata *); int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); int (*symlink) (struct inode *,struct dentry *,const char *); - int (*mkdir) (struct inode *,struct dentry *,int); + int (*mkdir) (struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); - int (*mknod) (struct inode *,struct dentry *,int,dev_t); + int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t); int (*rename) (struct inode *, struct dentry *, struct inode *, struct dentry *); int (*readlink) (struct dentry *, char __user *,int); @@ -117,7 +117,7 @@ prototypes: int (*statfs) (struct dentry *, struct kstatfs *); int (*remount_fs) (struct super_block *, int *, char *); void (*umount_begin) (struct super_block *); - int (*show_options)(struct seq_file *, struct vfsmount *); + int (*show_options)(struct seq_file *, struct dentry *); ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t); diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt index 64087c34327..7671352216f 100644 --- a/Documentation/filesystems/btrfs.txt +++ b/Documentation/filesystems/btrfs.txt @@ -63,8 +63,8 @@ IRC network. Userspace tools for creating and manipulating Btrfs file systems are available from the git repository at the following location: - http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs-unstable.git - git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git + http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git + git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git These include the following tools: diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt index dd57bb6bb39..b40fec9d3f5 100644 --- a/Documentation/filesystems/configfs/configfs.txt +++ b/Documentation/filesystems/configfs/configfs.txt @@ -192,7 +192,7 @@ attribute value uses the store_attribute() method. struct configfs_attribute { char *ca_name; struct module *ca_owner; - mode_t ca_mode; + umode_t ca_mode; }; When a config_item wants an attribute to appear as a file in the item's diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt index 742cc06e138..6872c91bce3 100644 --- a/Documentation/filesystems/debugfs.txt +++ b/Documentation/filesystems/debugfs.txt @@ -35,7 +35,7 @@ described below will work. The most general way to create a file within a debugfs directory is with: - struct dentry *debugfs_create_file(const char *name, mode_t mode, + struct dentry *debugfs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops); @@ -53,13 +53,13 @@ actually necessary; the debugfs code provides a number of helper functions for simple situations. Files containing a single integer value can be created with any of: - struct dentry *debugfs_create_u8(const char *name, mode_t mode, + struct dentry *debugfs_create_u8(const char *name, umode_t mode, struct dentry *parent, u8 *value); - struct dentry *debugfs_create_u16(const char *name, mode_t mode, + struct dentry *debugfs_create_u16(const char *name, umode_t mode, struct dentry *parent, u16 *value); - struct dentry *debugfs_create_u32(const char *name, mode_t mode, + struct dentry *debugfs_create_u32(const char *name, umode_t mode, struct dentry *parent, u32 *value); - struct dentry *debugfs_create_u64(const char *name, mode_t mode, + struct dentry *debugfs_create_u64(const char *name, umode_t mode, struct dentry *parent, u64 *value); These files support both reading and writing the given value; if a specific @@ -67,13 +67,13 @@ file should not be written to, simply set the mode bits accordingly. The values in these files are in decimal; if hexadecimal is more appropriate, the following functions can be used instead: - struct dentry *debugfs_create_x8(const char *name, mode_t mode, + struct dentry *debugfs_create_x8(const char *name, umode_t mode, struct dentry *parent, u8 *value); - struct dentry *debugfs_create_x16(const char *name, mode_t mode, + struct dentry *debugfs_create_x16(const char *name, umode_t mode, struct dentry *parent, u16 *value); - struct dentry *debugfs_create_x32(const char *name, mode_t mode, + struct dentry *debugfs_create_x32(const char *name, umode_t mode, struct dentry *parent, u32 *value); - struct dentry *debugfs_create_x64(const char *name, mode_t mode, + struct dentry *debugfs_create_x64(const char *name, umode_t mode, struct dentry *parent, u64 *value); These functions are useful as long as the developer knows the size of the @@ -81,7 +81,7 @@ value to be exported. Some types can have different widths on different architectures, though, complicating the situation somewhat. There is a function meant to help out in one special case: - struct dentry *debugfs_create_size_t(const char *name, mode_t mode, + struct dentry *debugfs_create_size_t(const char *name, umode_t mode, struct dentry *parent, size_t *value); @@ -90,21 +90,22 @@ a variable of type size_t. Boolean values can be placed in debugfs with: - struct dentry *debugfs_create_bool(const char *name, mode_t mode, + struct dentry *debugfs_create_bool(const char *name, umode_t mode, struct dentry *parent, u32 *value); A read on the resulting file will yield either Y (for non-zero values) or N, followed by a newline. If written to, it will accept either upper- or lower-case values, or 1 or 0. Any other input will be silently ignored. -Finally, a block of arbitrary binary data can be exported with: +Another option is exporting a block of arbitrary binary data, with +this structure and function: struct debugfs_blob_wrapper { void *data; unsigned long size; }; - struct dentry *debugfs_create_blob(const char *name, mode_t mode, + struct dentry *debugfs_create_blob(const char *name, umode_t mode, struct dentry *parent, struct debugfs_blob_wrapper *blob); @@ -115,6 +116,35 @@ can be used to export binary information, but there does not appear to be any code which does so in the mainline. Note that all files created with debugfs_create_blob() are read-only. +If you want to dump a block of registers (something that happens quite +often during development, even if little such code reaches mainline. +Debugfs offers two functions: one to make a registers-only file, and +another to insert a register block in the middle of another sequential +file. + + struct debugfs_reg32 { + char *name; + unsigned long offset; + }; + + struct debugfs_regset32 { + struct debugfs_reg32 *regs; + int nregs; + void __iomem *base; + }; + + struct dentry *debugfs_create_regset32(const char *name, mode_t mode, + struct dentry *parent, + struct debugfs_regset32 *regset); + + int debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs, + int nregs, void __iomem *base, char *prefix); + +The "base" argument may be 0, but you may want to build the reg32 array +using __stringify, and a number of register names (macros) are actually +byte offsets over a base for the register block. + + There are a couple of other directory-oriented helper functions: struct dentry *debugfs_rename(struct dentry *old_dir, diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt index 07235caec22..a6619b7064b 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.txt @@ -70,7 +70,7 @@ An attribute definition is simply: struct attribute { char * name; struct module *owner; - mode_t mode; + umode_t mode; }; diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 43cbd082172..3d9393b845b 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -225,7 +225,7 @@ struct super_operations { void (*clear_inode) (struct inode *); void (*umount_begin) (struct super_block *); - int (*show_options)(struct seq_file *, struct vfsmount *); + int (*show_options)(struct seq_file *, struct dentry *); ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); @@ -341,14 +341,14 @@ This describes how the VFS can manipulate an inode in your filesystem. As of kernel 2.6.22, the following members are defined: struct inode_operations { - int (*create) (struct inode *,struct dentry *,int, struct nameidata *); + int (*create) (struct inode *,struct dentry *, umode_t, struct nameidata *); struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *); int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); int (*symlink) (struct inode *,struct dentry *,const char *); - int (*mkdir) (struct inode *,struct dentry *,int); + int (*mkdir) (struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); - int (*mknod) (struct inode *,struct dentry *,int,dev_t); + int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t); int (*rename) (struct inode *, struct dentry *, struct inode *, struct dentry *); int (*readlink) (struct dentry *, char __user *,int); diff --git a/Documentation/i2c/ten-bit-addresses b/Documentation/i2c/ten-bit-addresses index e9890709c50..cdfe13901b9 100644 --- a/Documentation/i2c/ten-bit-addresses +++ b/Documentation/i2c/ten-bit-addresses @@ -1,22 +1,24 @@ The I2C protocol knows about two kinds of device addresses: normal 7 bit addresses, and an extended set of 10 bit addresses. The sets of addresses do not intersect: the 7 bit address 0x10 is not the same as the 10 bit -address 0x10 (though a single device could respond to both of them). You -select a 10 bit address by adding an extra byte after the address -byte: - S Addr7 Rd/Wr .... -becomes - S 11110 Addr10 Rd/Wr -S is the start bit, Rd/Wr the read/write bit, and if you count the number -of bits, you will see the there are 8 after the S bit for 7 bit addresses, -and 16 after the S bit for 10 bit addresses. +address 0x10 (though a single device could respond to both of them). -WARNING! The current 10 bit address support is EXPERIMENTAL. There are -several places in the code that will cause SEVERE PROBLEMS with 10 bit -addresses, even though there is some basic handling and hooks. Also, -almost no supported adapter handles the 10 bit addresses correctly. +I2C messages to and from 10-bit address devices have a different format. +See the I2C specification for the details. -As soon as a real 10 bit address device is spotted 'in the wild', we -can and will add proper support. Right now, 10 bit address devices -are defined by the I2C protocol, but we have never seen a single device -which supports them. +The current 10 bit address support is minimal. It should work, however +you can expect some problems along the way: +* Not all bus drivers support 10-bit addresses. Some don't because the + hardware doesn't support them (SMBus doesn't require 10-bit address + support for example), some don't because nobody bothered adding the + code (or it's there but not working properly.) Software implementation + (i2c-algo-bit) is known to work. +* Some optional features do not support 10-bit addresses. This is the + case of automatic detection and instantiation of devices by their, + drivers, for example. +* Many user-space packages (for example i2c-tools) lack support for + 10-bit addresses. + +Note that 10-bit address devices are still pretty rare, so the limitations +listed above could stay for a long time, maybe even forever if nobody +needs them to be fixed. diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index a0c5c5f4fce..e229769606f 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -315,12 +315,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. CPU-intensive style benchmark, and it can vary highly in a microbenchmark depending on workload and compiler. - 1: only for 32-bit processes - 2: only for 64-bit processes + 32: only for 32-bit processes + 64: only for 64-bit processes on: enable for both 32- and 64-bit processes off: disable for both 32- and 64-bit processes - amd_iommu= [HW,X86-84] + amd_iommu= [HW,X86-64] Pass parameters to the AMD IOMMU driver in the system. Possible values are: fullflush - enable flushing of IO/TLB entries when @@ -1885,6 +1885,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. arch_perfmon: [X86] Force use of architectural perfmon on Intel CPUs instead of the CPU specific event set. + timer: [X86] Force use of architectural NMI + timer mode (see also oprofile.timer + for generic hr timer mode) + [s390] Force legacy basic mode sampling + (report cpu_type "timer") oops=panic Always panic on oopses. Default is to just kill the process, but there is a small probability of @@ -2750,11 +2755,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. functions are at fixed addresses, they make nice targets for exploits that can control RIP. - emulate Vsyscalls turn into traps and are emulated - reasonably safely. + emulate [default] Vsyscalls turn into traps and are + emulated reasonably safely. - native [default] Vsyscalls are native syscall - instructions. + native Vsyscalls are native syscall instructions. This is a little bit faster than trapping and makes a few dynamic recompilers work better than they would in emulation mode. diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt index abf768c681e..5dbc99c04f6 100644 --- a/Documentation/lockdep-design.txt +++ b/Documentation/lockdep-design.txt @@ -221,3 +221,66 @@ when the chain is validated for the first time, is then put into a hash table, which hash-table can be checked in a lockfree manner. If the locking chain occurs again later on, the hash table tells us that we dont have to validate the chain again. + +Troubleshooting: +---------------- + +The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes. +Exceeding this number will trigger the following lockdep warning: + + (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS)) + +By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical +desktop systems have less than 1,000 lock classes, so this warning +normally results from lock-class leakage or failure to properly +initialize locks. These two problems are illustrated below: + +1. Repeated module loading and unloading while running the validator + will result in lock-class leakage. The issue here is that each + load of the module will create a new set of lock classes for + that module's locks, but module unloading does not remove old + classes (see below discussion of reuse of lock classes for why). + Therefore, if that module is loaded and unloaded repeatedly, + the number of lock classes will eventually reach the maximum. + +2. Using structures such as arrays that have large numbers of + locks that are not explicitly initialized. For example, + a hash table with 8192 buckets where each bucket has its own + spinlock_t will consume 8192 lock classes -unless- each spinlock + is explicitly initialized at runtime, for example, using the + run-time spin_lock_init() as opposed to compile-time initializers + such as __SPIN_LOCK_UNLOCKED(). Failure to properly initialize + the per-bucket spinlocks would guarantee lock-class overflow. + In contrast, a loop that called spin_lock_init() on each lock + would place all 8192 locks into a single lock class. + + The moral of this story is that you should always explicitly + initialize your locks. + +One might argue that the validator should be modified to allow +lock classes to be reused. However, if you are tempted to make this +argument, first review the code and think through the changes that would +be required, keeping in mind that the lock classes to be removed are +likely to be linked into the lock-dependency graph. This turns out to +be harder to do than to say. + +Of course, if you do run out of lock classes, the next thing to do is +to find the offending lock classes. First, the following command gives +you the number of lock classes currently in use along with the maximum: + + grep "lock-classes" /proc/lockdep_stats + +This command produces the following output on a modest system: + + lock-classes: 748 [max: 8191] + +If the number allocated (748 above) increases continually over time, +then there is likely a leak. The following command can be used to +identify the leaking lock classes: + + grep "BD" /proc/lockdep + +Run the command and save the output, then compare against the output from +a later run of this command to identify the leakers. This same output +can also help you find situations where runtime lock initialization has +been omitted. diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index bbce1215434..9ad9ddeb384 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -144,6 +144,8 @@ nfc.txt - The Linux Near Field Communication (NFS) subsystem. olympic.txt - IBM PCI Pit/Pit-Phy/Olympic Token Ring driver info. +openvswitch.txt + - Open vSwitch developer documentation. operstates.txt - Overview of network interface operational states. packet_mmap.txt diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt index c86d03f18a5..221ad0cdf11 100644 --- a/Documentation/networking/batman-adv.txt +++ b/Documentation/networking/batman-adv.txt @@ -200,15 +200,16 @@ abled during run time. Following log_levels are defined: 0 - All debug output disabled 1 - Enable messages related to routing / flooding / broadcasting -2 - Enable route or tt entry added / changed / deleted -3 - Enable all messages +2 - Enable messages related to route added / changed / deleted +4 - Enable messages related to translation table operations +7 - Enable all messages The debug output can be changed at runtime using the file /sys/class/net/bat0/mesh/log_level. e.g. # echo 2 > /sys/class/net/bat0/mesh/log_level -will enable debug messages for when routes or TTs change. +will enable debug messages for when routes change. BATCTL diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 91df678fb7f..080ad26690a 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -196,6 +196,23 @@ or, for backwards compatibility, the option value. E.g., The parameters are as follows: +active_slave + + Specifies the new active slave for modes that support it + (active-backup, balance-alb and balance-tlb). Possible values + are the name of any currently enslaved interface, or an empty + string. If a name is given, the slave and its link must be up in order + to be selected as the new active slave. If an empty string is + specified, the current active slave is cleared, and a new active + slave is selected automatically. + + Note that this is only available through the sysfs interface. No module + parameter by this name exists. + + The normal value of this option is the name of the currently + active slave, or the empty string if there is no active slave or + the current mode does not use an active slave. + ad_select Specifies the 802.3ad aggregation selection logic to use. The diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt index f41ea240522..1dc1c24a754 100644 --- a/Documentation/networking/ieee802154.txt +++ b/Documentation/networking/ieee802154.txt @@ -78,3 +78,30 @@ in software. This is currently WIP. See header include/net/mac802154.h and several drivers in drivers/ieee802154/. +6LoWPAN Linux implementation +============================ + +The IEEE 802.15.4 standard specifies an MTU of 128 bytes, yielding about 80 +octets of actual MAC payload once security is turned on, on a wireless link +with a link throughput of 250 kbps or less. The 6LoWPAN adaptation format +[RFC4944] was specified to carry IPv6 datagrams over such constrained links, +taking into account limited bandwidth, memory, or energy resources that are +expected in applications such as wireless Sensor Networks. [RFC4944] defines +a Mesh Addressing header to support sub-IP forwarding, a Fragmentation header +to support the IPv6 minimum MTU requirement [RFC2460], and stateless header +compression for IPv6 datagrams (LOWPAN_HC1 and LOWPAN_HC2) to reduce the +relatively large IPv6 and UDP headers down to (in the best case) several bytes. + +In Semptember 2011 the standard update was published - [RFC6282]. +It deprecates HC1 and HC2 compression and defines IPHC encoding format which is +used in this Linux implementation. + +All the code related to 6lowpan you may find in files: net/ieee802154/6lowpan.* + +To setup 6lowpan interface you need (busybox release > 1.17.0): +1. Add IEEE802.15.4 interface and initialize PANid; +2. Add 6lowpan interface by command like: + # ip link add link wpan0 name lowpan0 type lowpan +3. Set MAC (if needs): + # ip link set lowpan0 address de:ad:be:ef:ca:fe:ba:be +4. Bring up 'lowpan0' interface diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c index 65968fbf1e4..ac5debb2f16 100644 --- a/Documentation/networking/ifenslave.c +++ b/Documentation/networking/ifenslave.c @@ -539,12 +539,14 @@ static int if_getconfig(char *ifname) metric = 0; } else metric = ifr.ifr_metric; + printf("The result of SIOCGIFMETRIC is %d\n", metric); strcpy(ifr.ifr_name, ifname); if (ioctl(skfd, SIOCGIFMTU, &ifr) < 0) mtu = 0; else mtu = ifr.ifr_mtu; + printf("The result of SIOCGIFMTU is %d\n", mtu); strcpy(ifr.ifr_name, ifname); if (ioctl(skfd, SIOCGIFDSTADDR, &ifr) < 0) { diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index cb7f3148035..ad3e80e17b4 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -20,7 +20,7 @@ ip_no_pmtu_disc - BOOLEAN default FALSE min_pmtu - INTEGER - default 562 - minimum discovered Path MTU + default 552 - minimum discovered Path MTU route/max_size - INTEGER Maximum number of routes allowed in the kernel. Increase @@ -31,6 +31,16 @@ neigh/default/gc_thresh3 - INTEGER when using large numbers of interfaces and when communicating with large numbers of directly-connected peers. +neigh/default/unres_qlen_bytes - INTEGER + The maximum number of bytes which may be used by packets + queued for each unresolved address by other network layers. + (added in linux 3.3) + +neigh/default/unres_qlen - INTEGER + The maximum number of packets which may be queued for each + unresolved address by other network layers. + (deprecated in linux 3.3) : use unres_qlen_bytes instead. + mtu_expires - INTEGER Time, in seconds, that cached PMTU information is kept. @@ -165,6 +175,9 @@ tcp_congestion_control - STRING connections. The algorithm "reno" is always available, but additional choices may be available based on kernel configuration. Default is set as part of kernel configuration. + For passive connections, the listener congestion control choice + is inherited. + [see setsockopt(listenfd, SOL_TCP, TCP_CONGESTION, "name" ...) ] tcp_cookie_size - INTEGER Default size of TCP Cookie Transactions (TCPCT) option, that may be @@ -282,11 +295,11 @@ tcp_max_ssthresh - INTEGER Default: 0 (off) tcp_max_syn_backlog - INTEGER - Maximal number of remembered connection requests, which are - still did not receive an acknowledgment from connecting client. - Default value is 1024 for systems with more than 128Mb of memory, - and 128 for low memory machines. If server suffers of overload, - try to increase this number. + Maximal number of remembered connection requests, which have not + received an acknowledgment from connecting client. + The minimal value is 128 for low memory machines, and it will + increase in proportion to the memory of machine. + If server suffers from overload, try increasing this number. tcp_max_tw_buckets - INTEGER Maximal number of timewait sockets held by system simultaneously. diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.txt new file mode 100644 index 00000000000..b8a048b8df3 --- /dev/null +++ b/Documentation/networking/openvswitch.txt @@ -0,0 +1,195 @@ +Open vSwitch datapath developer documentation +============================================= + +The Open vSwitch kernel module allows flexible userspace control over +flow-level packet processing on selected network devices. It can be +used to implement a plain Ethernet switch, network device bonding, +VLAN processing, network access control, flow-based network control, +and so on. + +The kernel module implements multiple "datapaths" (analogous to +bridges), each of which can have multiple "vports" (analogous to ports +within a bridge). Each datapath also has associated with it a "flow +table" that userspace populates with "flows" that map from keys based +on packet headers and metadata to sets of actions. The most common +action forwards the packet to another vport; other actions are also +implemented. + +When a packet arrives on a vport, the kernel module processes it by +extracting its flow key and looking it up in the flow table. If there +is a matching flow, it executes the associated actions. If there is +no match, it queues the packet to userspace for processing (as part of +its processing, userspace will likely set up a flow to handle further +packets of the same type entirely in-kernel). + + +Flow key compatibility +---------------------- + +Network protocols evolve over time. New protocols become important +and existing protocols lose their prominence. For the Open vSwitch +kernel module to remain relevant, it must be possible for newer +versions to parse additional protocols as part of the flow key. It +might even be desirable, someday, to drop support for parsing +protocols that have become obsolete. Therefore, the Netlink interface +to Open vSwitch is designed to allow carefully written userspace +applications to work with any version of the flow key, past or future. + +To support this forward and backward compatibility, whenever the +kernel module passes a packet to userspace, it also passes along the +flow key that it parsed from the packet. Userspace then extracts its +own notion of a flow key from the packet and compares it against the +kernel-provided version: + + - If userspace's notion of the flow key for the packet matches the + kernel's, then nothing special is necessary. + + - If the kernel's flow key includes more fields than the userspace + version of the flow key, for example if the kernel decoded IPv6 + headers but userspace stopped at the Ethernet type (because it + does not understand IPv6), then again nothing special is + necessary. Userspace can still set up a flow in the usual way, + as long as it uses the kernel-provided flow key to do it. + + - If the userspace flow key includes more fields than the + kernel's, for example if userspace decoded an IPv6 header but + the kernel stopped at the Ethernet type, then userspace can + forward the packet manually, without setting up a flow in the + kernel. This case is bad for performance because every packet + that the kernel considers part of the flow must go to userspace, + but the forwarding behavior is correct. (If userspace can + determine that the values of the extra fields would not affect + forwarding behavior, then it could set up a flow anyway.) + +How flow keys evolve over time is important to making this work, so +the following sections go into detail. + + +Flow key format +--------------- + +A flow key is passed over a Netlink socket as a sequence of Netlink +attributes. Some attributes represent packet metadata, defined as any +information about a packet that cannot be extracted from the packet +itself, e.g. the vport on which the packet was received. Most +attributes, however, are extracted from headers within the packet, +e.g. source and destination addresses from Ethernet, IP, or TCP +headers. + +The <linux/openvswitch.h> header file defines the exact format of the +flow key attributes. For informal explanatory purposes here, we write +them as comma-separated strings, with parentheses indicating arguments +and nesting. For example, the following could represent a flow key +corresponding to a TCP packet that arrived on vport 1: + + in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4), + eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=17, tos=0, + frag=no), tcp(src=49163, dst=80) + +Often we ellipsize arguments not important to the discussion, e.g.: + + in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...) + + +Basic rule for evolving flow keys +--------------------------------- + +Some care is needed to really maintain forward and backward +compatibility for applications that follow the rules listed under +"Flow key compatibility" above. + +The basic rule is obvious: + + ------------------------------------------------------------------ + New network protocol support must only supplement existing flow + key attributes. It must not change the meaning of already defined + flow key attributes. + ------------------------------------------------------------------ + +This rule does have less-obvious consequences so it is worth working +through a few examples. Suppose, for example, that the kernel module +did not already implement VLAN parsing. Instead, it just interpreted +the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the +packet. The flow key for any packet with an 802.1Q header would look +essentially like this, ignoring metadata: + + eth(...), eth_type(0x8100) + +Naively, to add VLAN support, it makes sense to add a new "vlan" flow +key attribute to contain the VLAN tag, then continue to decode the +encapsulated headers beyond the VLAN tag using the existing field +definitions. With this change, an TCP packet in VLAN 10 would have a +flow key much like this: + + eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...) + +But this change would negatively affect a userspace application that +has not been updated to understand the new "vlan" flow key attribute. +The application could, following the flow compatibility rules above, +ignore the "vlan" attribute that it does not understand and therefore +assume that the flow contained IP packets. This is a bad assumption +(the flow only contains IP packets if one parses and skips over the +802.1Q header) and it could cause the application's behavior to change +across kernel versions even though it follows the compatibility rules. + +The solution is to use a set of nested attributes. This is, for +example, why 802.1Q support uses nested attributes. A TCP packet in +VLAN 10 is actually expressed as: + + eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800), + ip(proto=6, ...), tcp(...))) + +Notice how the "eth_type", "ip", and "tcp" flow key attributes are +nested inside the "encap" attribute. Thus, an application that does +not understand the "vlan" key will not see either of those attributes +and therefore will not misinterpret them. (Also, the outer eth_type +is still 0x8100, not changed to 0x0800.) + +Handling malformed packets +-------------------------- + +Don't drop packets in the kernel for malformed protocol headers, bad +checksums, etc. This would prevent userspace from implementing a +simple Ethernet switch that forwards every packet. + +Instead, in such a case, include an attribute with "empty" content. +It doesn't matter if the empty content could be valid protocol values, +as long as those values are rarely seen in practice, because userspace +can always forward all packets with those values to userspace and +handle them individually. + +For example, consider a packet that contains an IP header that +indicates protocol 6 for TCP, but which is truncated just after the IP +header, so that the TCP header is missing. The flow key for this +packet would include a tcp attribute with all-zero src and dst, like +this: + + eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0) + +As another example, consider a packet with an Ethernet type of 0x8100, +indicating that a VLAN TCI should follow, but which is truncated just +after the Ethernet type. The flow key for this packet would include +an all-zero-bits vlan and an empty encap attribute, like this: + + eth(...), eth_type(0x8100), vlan(0), encap() + +Unlike a TCP packet with source and destination ports 0, an +all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka +VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan +attribute expressly to allow this situation to be distinguished. +Thus, the flow key in this second example unambiguously indicates a +missing or malformed VLAN TCI. + +Other rules +----------- + +The other rules for flow keys are much less subtle: + + - Duplicate attributes are not allowed at a given nesting level. + + - Ordering of attributes is not significant. + + - When the kernel sends a given flow key to userspace, it always + composes it the same way. This allows userspace to hash and + compare entire flow keys that it may not be able to fully + interpret. diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt index 4acea660372..1c08a4b0981 100644 --- a/Documentation/networking/packet_mmap.txt +++ b/Documentation/networking/packet_mmap.txt @@ -155,7 +155,7 @@ As capture, each frame contains two parts: /* fill sockaddr_ll struct to prepare binding */ my_addr.sll_family = AF_PACKET; - my_addr.sll_protocol = ETH_P_ALL; + my_addr.sll_protocol = htons(ETH_P_ALL); my_addr.sll_ifindex = s_ifr.ifr_ifindex; /* bind socket to eth0 */ diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt index a177de21d28..579994afbe0 100644 --- a/Documentation/networking/scaling.txt +++ b/Documentation/networking/scaling.txt @@ -208,7 +208,7 @@ The counter in rps_dev_flow_table values records the length of the current CPU's backlog when a packet in this flow was last enqueued. Each backlog queue has a head counter that is incremented on dequeue. A tail counter is computed as head counter + queue length. In other words, the counter -in rps_dev_flow_table[i] records the last element in flow i that has +in rps_dev_flow[i] records the last element in flow i that has been enqueued onto the currently designated CPU for flow i (of course, entry i is actually selected by hash and multiple flows may hash to the same entry i). @@ -224,7 +224,7 @@ following is true: - The current CPU's queue head counter >= the recorded tail counter value in rps_dev_flow[i] -- The current CPU is unset (equal to NR_CPUS) +- The current CPU is unset (equal to RPS_NO_CPU) - The current CPU is offline After this check, the packet is sent to the (possibly updated) current @@ -235,7 +235,7 @@ CPU. ==== RFS Configuration -RFS is only available if the kconfig symbol CONFIG_RFS is enabled (on +RFS is only available if the kconfig symbol CONFIG_RPS is enabled (on by default for SMP). The functionality remains disabled until explicitly configured. The number of entries in the global flow table is set through: @@ -258,7 +258,7 @@ For a single queue device, the rps_flow_cnt value for the single queue would normally be configured to the same value as rps_sock_flow_entries. For a multi-queue device, the rps_flow_cnt for each queue might be configured as rps_sock_flow_entries / N, where N is the number of -queues. So for instance, if rps_flow_entries is set to 32768 and there +queues. So for instance, if rps_sock_flow_entries is set to 32768 and there are 16 configured receive queues, rps_flow_cnt for each queue might be configured as 2048. diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index 8d67980fabe..d0aeeadd264 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -4,14 +4,16 @@ Copyright (C) 2007-2010 STMicroelectronics Ltd Author: Giuseppe Cavallaro <peppe.cavallaro@st.com> This is the driver for the MAC 10/100/1000 on-chip Ethernet controllers -(Synopsys IP blocks); it has been fully tested on STLinux platforms. +(Synopsys IP blocks). Currently this network device driver is for all STM embedded MAC/GMAC -(i.e. 7xxx/5xxx SoCs) and it's known working on other platforms i.e. ARM SPEAr. +(i.e. 7xxx/5xxx SoCs), SPEAr (arm), Loongson1B (mips) and XLINX XC2V3000 +FF1152AMT0221 D1215994A VIRTEX FPGA board. -DWC Ether MAC 10/100/1000 Universal version 3.41a and DWC Ether MAC 10/100 -Universal version 4.0 have been used for developing the first code -implementation. +DWC Ether MAC 10/100/1000 Universal version 3.60a (and older) and DWC Ether MAC 10/100 +Universal version 4.0 have been used for developing this driver. + +This driver supports both the platform bus and PCI. Please, for more information also visit: www.stlinux.com @@ -277,5 +279,5 @@ In fact, these can generate an huge amount of debug messages. 6) TODO: o XGMAC is not supported. - o Review the timer optimisation code to use an embedded device that will be - available in new chip generations. + o Add the EEE - Energy Efficient Ethernet + o Add the PTP - precision time protocol diff --git a/Documentation/networking/team.txt b/Documentation/networking/team.txt new file mode 100644 index 00000000000..5a013686b9e --- /dev/null +++ b/Documentation/networking/team.txt @@ -0,0 +1,2 @@ +Team devices are driven from userspace via libteam library which is here: + https://github.com/jpirko/libteam diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt index 646a89e0c07..3139fb505dc 100644 --- a/Documentation/power/devices.txt +++ b/Documentation/power/devices.txt @@ -123,9 +123,10 @@ please refer directly to the source code for more information about it. Subsystem-Level Methods ----------------------- The core methods to suspend and resume devices reside in struct dev_pm_ops -pointed to by the pm member of struct bus_type, struct device_type and -struct class. They are mostly of interest to the people writing infrastructure -for buses, like PCI or USB, or device type and device class drivers. +pointed to by the ops member of struct dev_pm_domain, or by the pm member of +struct bus_type, struct device_type and struct class. They are mostly of +interest to the people writing infrastructure for platforms and buses, like PCI +or USB, or device type and device class drivers. Bus drivers implement these methods as appropriate for the hardware and the drivers using it; PCI works differently from USB, and so on. Not many people @@ -139,41 +140,57 @@ sequencing in the driver model tree. /sys/devices/.../power/wakeup files ----------------------------------- -All devices in the driver model have two flags to control handling of wakeup -events (hardware signals that can force the device and/or system out of a low -power state). These flags are initialized by bus or device driver code using +All device objects in the driver model contain fields that control the handling +of system wakeup events (hardware signals that can force the system out of a +sleep state). These fields are initialized by bus or device driver code using device_set_wakeup_capable() and device_set_wakeup_enable(), defined in include/linux/pm_wakeup.h. -The "can_wakeup" flag just records whether the device (and its driver) can +The "power.can_wakeup" flag just records whether the device (and its driver) can physically support wakeup events. The device_set_wakeup_capable() routine -affects this flag. The "should_wakeup" flag controls whether the device should -try to use its wakeup mechanism. device_set_wakeup_enable() affects this flag; -for the most part drivers should not change its value. The initial value of -should_wakeup is supposed to be false for the majority of devices; the major -exceptions are power buttons, keyboards, and Ethernet adapters whose WoL -(wake-on-LAN) feature has been set up with ethtool. It should also default -to true for devices that don't generate wakeup requests on their own but merely -forward wakeup requests from one bus to another (like PCI bridges). +affects this flag. The "power.wakeup" field is a pointer to an object of type +struct wakeup_source used for controlling whether or not the device should use +its system wakeup mechanism and for notifying the PM core of system wakeup +events signaled by the device. This object is only present for wakeup-capable +devices (i.e. devices whose "can_wakeup" flags are set) and is created (or +removed) by device_set_wakeup_capable(). Whether or not a device is capable of issuing wakeup events is a hardware matter, and the kernel is responsible for keeping track of it. By contrast, whether or not a wakeup-capable device should issue wakeup events is a policy decision, and it is managed by user space through a sysfs attribute: the -power/wakeup file. User space can write the strings "enabled" or "disabled" to -set or clear the "should_wakeup" flag, respectively. This file is only present -for wakeup-capable devices (i.e. devices whose "can_wakeup" flags are set) -and is created (or removed) by device_set_wakeup_capable(). Reads from the -file will return the corresponding string. - -The device_may_wakeup() routine returns true only if both flags are set. +"power/wakeup" file. User space can write the strings "enabled" or "disabled" +to it to indicate whether or not, respectively, the device is supposed to signal +system wakeup. This file is only present if the "power.wakeup" object exists +for the given device and is created (or removed) along with that object, by +device_set_wakeup_capable(). Reads from the file will return the corresponding +string. + +The "power/wakeup" file is supposed to contain the "disabled" string initially +for the majority of devices; the major exceptions are power buttons, keyboards, +and Ethernet adapters whose WoL (wake-on-LAN) feature has been set up with +ethtool. It should also default to "enabled" for devices that don't generate +wakeup requests on their own but merely forward wakeup requests from one bus to +another (like PCI Express ports). + +The device_may_wakeup() routine returns true only if the "power.wakeup" object +exists and the corresponding "power/wakeup" file contains the string "enabled". This information is used by subsystems, like the PCI bus type code, to see whether or not to enable the devices' wakeup mechanisms. If device wakeup mechanisms are enabled or disabled directly by drivers, they also should use device_may_wakeup() to decide what to do during a system sleep transition. -However for runtime power management, wakeup events should be enabled whenever -the device and driver both support them, regardless of the should_wakeup flag. - +Device drivers, however, are not supposed to call device_set_wakeup_enable() +directly in any case. + +It ought to be noted that system wakeup is conceptually different from "remote +wakeup" used by runtime power management, although it may be supported by the +same physical mechanism. Remote wakeup is a feature allowing devices in +low-power states to trigger specific interrupts to signal conditions in which +they should be put into the full-power state. Those interrupts may or may not +be used to signal system wakeup events, depending on the hardware design. On +some systems it is impossible to trigger them from system sleep states. In any +case, remote wakeup should always be enabled for runtime power management for +all devices and drivers that support it. /sys/devices/.../power/control files ------------------------------------ @@ -249,20 +266,31 @@ for every device before the next phase begins. Not all busses or classes support all these callbacks and not all drivers use all the callbacks. The various phases always run after tasks have been frozen and before they are unfrozen. Furthermore, the *_noirq phases run at a time when IRQ handlers have -been disabled (except for those marked with the IRQ_WAKEUP flag). - -All phases use bus, type, or class callbacks (that is, methods defined in -dev->bus->pm, dev->type->pm, or dev->class->pm). These callbacks are mutually -exclusive, so if the device type provides a struct dev_pm_ops object pointed to -by its pm field (i.e. both dev->type and dev->type->pm are defined), the -callbacks included in that object (i.e. dev->type->pm) will be used. Otherwise, -if the class provides a struct dev_pm_ops object pointed to by its pm field -(i.e. both dev->class and dev->class->pm are defined), the PM core will use the -callbacks from that object (i.e. dev->class->pm). Finally, if the pm fields of -both the device type and class objects are NULL (or those objects do not exist), -the callbacks provided by the bus (that is, the callbacks from dev->bus->pm) -will be used (this allows device types to override callbacks provided by bus -types or classes if necessary). +been disabled (except for those marked with the IRQF_NO_SUSPEND flag). + +All phases use PM domain, bus, type, or class callbacks (that is, methods +defined in dev->pm_domain->ops, dev->bus->pm, dev->type->pm, or dev->class->pm). +These callbacks are regarded by the PM core as mutually exclusive. Moreover, +PM domain callbacks always take precedence over bus, type and class callbacks, +while type callbacks take precedence over bus and class callbacks, and class +callbacks take precedence over bus callbacks. To be precise, the following +rules are used to determine which callback to execute in the given phase: + + 1. If dev->pm_domain is present, the PM core will attempt to execute the + callback included in dev->pm_domain->ops. If that callback is not + present, no action will be carried out for the given device. + + 2. Otherwise, if both dev->type and dev->type->pm are present, the callback + included in dev->type->pm will be executed. + + 3. Otherwise, if both dev->class and dev->class->pm are present, the + callback included in dev->class->pm will be executed. + + 4. Otherwise, if both dev->bus and dev->bus->pm are present, the callback + included in dev->bus->pm will be executed. + +This allows PM domains and device types to override callbacks provided by bus +types or device classes if necessary. These callbacks may in turn invoke device- or driver-specific methods stored in dev->driver->pm, but they don't have to. @@ -283,9 +311,8 @@ When the system goes into the standby or memory sleep state, the phases are: After the prepare callback method returns, no new children may be registered below the device. The method may also prepare the device or - driver in some way for the upcoming system power transition (for - example, by allocating additional memory required for this purpose), but - it should not put the device into a low-power state. + driver in some way for the upcoming system power transition, but it + should not put the device into a low-power state. 2. The suspend methods should quiesce the device to stop it from performing I/O. They also may save the device registers and put it into the diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index 5336149f831..c2ae8bf77d4 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt @@ -44,25 +44,33 @@ struct dev_pm_ops { }; The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks -are executed by the PM core for either the power domain, or the device type -(if the device power domain's struct dev_pm_ops does not exist), or the class -(if the device power domain's and type's struct dev_pm_ops object does not -exist), or the bus type (if the device power domain's, type's and class' -struct dev_pm_ops objects do not exist) of the given device, so the priority -order of callbacks from high to low is that power domain callbacks, device -type callbacks, class callbacks and bus type callbacks, and the high priority -one will take precedence over low priority one. The bus type, device type and -class callbacks are referred to as subsystem-level callbacks in what follows, -and generally speaking, the power domain callbacks are used for representing -power domains within a SoC. +are executed by the PM core for the device's subsystem that may be either of +the following: + + 1. PM domain of the device, if the device's PM domain object, dev->pm_domain, + is present. + + 2. Device type of the device, if both dev->type and dev->type->pm are present. + + 3. Device class of the device, if both dev->class and dev->class->pm are + present. + + 4. Bus type of the device, if both dev->bus and dev->bus->pm are present. + +The PM core always checks which callback to use in the order given above, so the +priority order of callbacks from high to low is: PM domain, device type, class +and bus type. Moreover, the high-priority one will always take precedence over +a low-priority one. The PM domain, bus type, device type and class callbacks +are referred to as subsystem-level callbacks in what follows. By default, the callbacks are always invoked in process context with interrupts enabled. However, subsystems can use the pm_runtime_irq_safe() helper function -to tell the PM core that a device's ->runtime_suspend() and ->runtime_resume() -callbacks should be invoked in atomic context with interrupts disabled. -This implies that these callback routines must not block or sleep, but it also -means that the synchronous helper functions listed at the end of Section 4 can -be used within an interrupt handler or in an atomic context. +to tell the PM core that their ->runtime_suspend(), ->runtime_resume() and +->runtime_idle() callbacks may be invoked in atomic context with interrupts +disabled for a given device. This implies that the callback routines in +question must not block or sleep, but it also means that the synchronous helper +functions listed at the end of Section 4 may be used for that device within an +interrupt handler or generally in an atomic context. The subsystem-level suspend callback is _entirely_ _responsible_ for handling the suspend of the device as appropriate, which may, but need not include diff --git a/Documentation/serial/serial-rs485.txt b/Documentation/serial/serial-rs485.txt index 079cb3df62c..41c8378c0b2 100644 --- a/Documentation/serial/serial-rs485.txt +++ b/Documentation/serial/serial-rs485.txt @@ -97,15 +97,23 @@ struct serial_rs485 rs485conf; - /* Set RS485 mode: */ + /* Enable RS485 mode: */ rs485conf.flags |= SER_RS485_ENABLED; + /* Set logical level for RTS pin equal to 1 when sending: */ + rs485conf.flags |= SER_RS485_RTS_ON_SEND; + /* or, set logical level for RTS pin equal to 0 when sending: */ + rs485conf.flags &= ~(SER_RS485_RTS_ON_SEND); + + /* Set logical level for RTS pin equal to 1 after sending: */ + rs485conf.flags |= SER_RS485_RTS_AFTER_SEND; + /* or, set logical level for RTS pin equal to 0 after sending: */ + rs485conf.flags &= ~(SER_RS485_RTS_AFTER_SEND); + /* Set rts delay before send, if needed: */ - rs485conf.flags |= SER_RS485_RTS_BEFORE_SEND; rs485conf.delay_rts_before_send = ...; /* Set rts delay after send, if needed: */ - rs485conf.flags |= SER_RS485_RTS_AFTER_SEND; rs485conf.delay_rts_after_send = ...; /* Set this flag if you want to receive data even whilst sending data */ diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index 4f3443230d8..edad99abec2 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt @@ -349,6 +349,7 @@ STAC92HD83* ref Reference board mic-ref Reference board with power management for ports dell-s14 Dell laptop + dell-vostro-3500 Dell Vostro 3500 laptop hp HP laptops with (inverted) mute-LED hp-dv7-4000 HP dv-7 4000 auto BIOS setup (default) diff --git a/Documentation/sound/alsa/HD-Audio.txt b/Documentation/sound/alsa/HD-Audio.txt index 03e2771ddee..91fee3b45fb 100644 --- a/Documentation/sound/alsa/HD-Audio.txt +++ b/Documentation/sound/alsa/HD-Audio.txt @@ -579,7 +579,7 @@ Development Tree ~~~~~~~~~~~~~~~~ The latest development codes for HD-audio are found on sound git tree: -- git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6.git +- git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git The master branch or for-next branches can be used as the main development branches in general while the HD-audio specific patches @@ -594,7 +594,7 @@ is, installed via the usual spells: configure, make and make install(-modules). See INSTALL in the package. The snapshot tarballs are found at: -- ftp://ftp.kernel.org/pub/linux/kernel/people/tiwai/snapshot/ +- ftp://ftp.suse.com/pub/people/tiwai/snapshot/ Sending a Bug Report @@ -696,7 +696,7 @@ via hda-verb won't change the mixer value. The hda-verb program is found in the ftp directory: -- ftp://ftp.kernel.org/pub/linux/kernel/people/tiwai/misc/ +- ftp://ftp.suse.com/pub/people/tiwai/misc/ Also a git repository is available: @@ -764,7 +764,7 @@ operation, the jack plugging simulation, etc. The package is found in: -- ftp://ftp.kernel.org/pub/linux/kernel/people/tiwai/misc/ +- ftp://ftp.suse.com/pub/people/tiwai/misc/ A git repository is available: diff --git a/Documentation/sound/alsa/soc/machine.txt b/Documentation/sound/alsa/soc/machine.txt index 3e2ec9cbf39..d50c14df341 100644 --- a/Documentation/sound/alsa/soc/machine.txt +++ b/Documentation/sound/alsa/soc/machine.txt @@ -50,8 +50,7 @@ Machine DAI Configuration The machine DAI configuration glues all the codec and CPU DAIs together. It can also be used to set up the DAI system clock and for any machine related DAI initialisation e.g. the machine audio map can be connected to the codec audio -map, unconnected codec pins can be set as such. Please see corgi.c, spitz.c -for examples. +map, unconnected codec pins can be set as such. struct snd_soc_dai_link is used to set up each DAI in your machine. e.g. @@ -83,8 +82,7 @@ Machine Power Map The machine driver can optionally extend the codec power map and to become an audio power map of the audio subsystem. This allows for automatic power up/down of speaker/HP amplifiers, etc. Codec pins can be connected to the machines jack -sockets in the machine init function. See soc/pxa/spitz.c and dapm.txt for -details. +sockets in the machine init function. Machine Controls diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt index b510564aac7..bb24c2a0e87 100644 --- a/Documentation/trace/events.txt +++ b/Documentation/trace/events.txt @@ -191,8 +191,6 @@ And for string fields they are: Currently, only exact string matches are supported. -Currently, the maximum number of predicates in a filter is 16. - 5.2 Setting filters ------------------- diff --git a/Documentation/usb/linux-cdc-acm.inf b/Documentation/usb/linux-cdc-acm.inf index 37a02ce5484..f0ffc27d4c0 100644 --- a/Documentation/usb/linux-cdc-acm.inf +++ b/Documentation/usb/linux-cdc-acm.inf @@ -90,10 +90,10 @@ ServiceBinary=%12%\USBSER.sys [SourceDisksFiles] [SourceDisksNames] [DeviceList] -%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_1D6B&PID_0104&MI_02 +%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_1D6B&PID_0104&MI_02, USB\VID_1D6B&PID_0106&MI_00 [DeviceList.NTamd64] -%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_1D6B&PID_0104&MI_02 +%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_1D6B&PID_0104&MI_02, USB\VID_1D6B&PID_0106&MI_00 ;------------------------------------------------------------------------------ diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 7945b0bd35e..e2a4b528736 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1100,6 +1100,15 @@ emulate them efficiently. The fields in each entry are defined as follows: eax, ebx, ecx, edx: the values returned by the cpuid instruction for this function/index combination +The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned +as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC +support. Instead it is reported via + + ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER) + +if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the +feature in userspace, then you can enable the feature for KVM_SET_CPUID2. + 4.47 KVM_PPC_GET_PVINFO Capability: KVM_CAP_PPC_GET_PVINFO @@ -1151,6 +1160,13 @@ following flags are specified: /* Depends on KVM_CAP_IOMMU */ #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) +The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure +isolation of the device. Usages not specifying this flag are deprecated. + +Only PCI header type 0 devices with PCI BAR resources are supported by +device assignment. The user requesting this ioctl must have read/write +access to the PCI sysfs resource files associated with the device. + 4.49 KVM_DEASSIGN_PCI_DEVICE Capability: KVM_CAP_DEVICE_DEASSIGNMENT |