8 files changed, 350 insertions, 17 deletions
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 42d4b30b104..c2992bc54f2 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -511,7 +511,6 @@ prototypes:
 	void (*open)(struct vm_area_struct*);
 	void (*close)(struct vm_area_struct*);
 	int (*fault)(struct vm_area_struct*, struct vm_fault *);
-	struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *);
 	int (*page_mkwrite)(struct vm_area_struct *, struct page *);
 
 locking rules:
@@ -519,7 +518,6 @@ locking rules:
 open:		no	yes
 close:		no	yes
 fault:		no	yes
-nopage:		no	yes
 page_mkwrite:	no	yes		no
 
 	->page_mkwrite() is called when a previously read-only page is
@@ -537,4 +535,3 @@ NULL.
 
 ipc/shm.c::shm_delete() - may need BKL.
 ->read() and ->write() in many drivers are (probably) missing BKL.
-drivers/sgi/char/graphics.c::sgi_graphics_nopage() - may need BKL.
diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt
new file mode 100644
index 00000000000..d0ec45ae4e7
--- /dev/null
+++ b/Documentation/filesystems/nfs-rdma.txt
@@ -0,0 +1,256 @@
+################################################################################
+#									       #
+#				NFS/RDMA README				       #
+#									       #
+################################################################################
+
+ Author: NetApp and Open Grid Computing
+ Date: April 15, 2008
+
+Table of Contents
+~~~~~~~~~~~~~~~~~
+ - Overview
+ - Getting Help
+ - Installation
+ - Check RDMA and NFS Setup
+ - NFS/RDMA Setup
+
+Overview
+~~~~~~~~
+
+  This document describes how to install and setup the Linux NFS/RDMA client
+  and server software.
+
+  The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
+  was first included in the following release, Linux 2.6.25.
+
+  In our testing, we have obtained excellent performance results (full 10Gbit
+  wire bandwidth at minimal client CPU) under many workloads. The code passes
+  the full Connectathon test suite and operates over both Infiniband and iWARP
+  RDMA adapters.
+
+Getting Help
+~~~~~~~~~~~~
+
+  If you get stuck, you can ask questions on the
+
+                nfs-rdma-devel@lists.sourceforge.net
+
+  mailing list.
+
+Installation
+~~~~~~~~~~~~
+
+  These instructions are a step by step guide to building a machine for
+  use with NFS/RDMA.
+
+  - Install an RDMA device
+
+    Any device supported by the drivers in drivers/infiniband/hw is acceptable.
+
+    Testing has been performed using several Mellanox-based IB cards, the
+    Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.
+
+  - Install a Linux distribution and tools
+
+    The first kernel release to contain both the NFS/RDMA client and server was
+    Linux 2.6.25  Therefore, a distribution compatible with this and subsequent
+    Linux kernel release should be installed.
+
+    The procedures described in this document have been tested with
+    distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
+
+  - Install nfs-utils-1.1.1 or greater on the client
+
+    An NFS/RDMA mount point can only be obtained by using the mount.nfs
+    command in nfs-utils-1.1.1 or greater. To see which version of mount.nfs
+    you are using, type:
+
+    > /sbin/mount.nfs -V
+
+    If the version is less than 1.1.1 or the command does not exist,
+    then you will need to install the latest version of nfs-utils.
+
+    Download the latest package from:
+
+    http://www.kernel.org/pub/linux/utils/nfs
+
+    Uncompress the package and follow the installation instructions.
+
+    If you will not be using GSS and NFSv4, the installation process
+    can be simplified by disabling these features when running configure:
+
+    > ./configure --disable-gss --disable-nfsv4
+
+    For more information on this see the package's README and INSTALL files.
+
+    After building the nfs-utils package, there will be a mount.nfs binary in
+    the utils/mount directory. This binary can be used to initiate NFS v2, v3,
+    or v4 mounts. To initiate a v4 mount, the binary must be called mount.nfs4.
+    The standard technique is to create a symlink called mount.nfs4 to mount.nfs.
+
+    NOTE: mount.nfs and therefore nfs-utils-1.1.1 or greater is only needed
+    on the NFS client machine. You do not need this specific version of
+    nfs-utils on the server. Furthermore, only the mount.nfs command from
+    nfs-utils-1.1.1 is needed on the client.
+
+  - Install a Linux kernel with NFS/RDMA
+
+    The NFS/RDMA client and server are both included in the mainline Linux
+    kernel version 2.6.25 and later. This and other versions of the 2.6 Linux
+    kernel can be found at:
+
+    ftp://ftp.kernel.org/pub/linux/kernel/v2.6/
+
+    Download the sources and place them in an appropriate location.
+
+  - Configure the RDMA stack
+
+    Make sure your kernel configuration has RDMA support enabled. Under
+    Device Drivers -> InfiniBand support, update the kernel configuration
+    to enable InfiniBand support [NOTE: the option name is misleading. Enabling
+    InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].
+
+    Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
+    iWARP adapter support (amso, cxgb3, etc.).
+
+    If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.
+
+  - Configure the NFS client and server
+
+    Your kernel configuration must also have NFS file system support and/or
+    NFS server support enabled. These and other NFS related configuration
+    options can be found under File Systems -> Network File Systems.
+
+  - Build, install, reboot
+
+    The NFS/RDMA code will be enabled automatically if NFS and RDMA
+    are turned on. The NFS/RDMA client and server are configured via the hidden
+    SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
+    value of SUNRPC_XPRT_RDMA will be:
+
+     - N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client
+       and server will not be built
+     - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M,
+       in this case the NFS/RDMA client and server will be built as modules
+     - Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client
+       and server will be built into the kernel
+
+    Therefore, if you have followed the steps above and turned no NFS and RDMA,
+    the NFS/RDMA client and server will be built.
+
+    Build a new kernel, install it, boot it.
+
+Check RDMA and NFS Setup
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+    Before configuring the NFS/RDMA software, it is a good idea to test
+    your new kernel to ensure that the kernel is working correctly.
+    In particular, it is a good idea to verify that the RDMA stack
+    is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
+    is working properly.
+
+  - Check RDMA Setup
+
+    If you built the RDMA components as modules, load them at
+    this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
+    card:
+
+    > modprobe ib_mthca
+    > modprobe ib_ipoib
+
+    If you are using InfiniBand, make sure there is a Subnet Manager (SM)
+    running on the network. If your IB switch has an embedded SM, you can
+    use it. Otherwise, you will need to run an SM, such as OpenSM, on one
+    of your end nodes.
+
+    If an SM is running on your network, you should see the following:
+
+    > cat /sys/class/infiniband/driverX/ports/1/state
+    4: ACTIVE
+
+    where driverX is mthca0, ipath5, ehca3, etc.
+
+    To further test the InfiniBand software stack, use IPoIB (this
+    assumes you have two IB hosts named host1 and host2):
+
+    host1> ifconfig ib0 a.b.c.x
+    host2> ifconfig ib0 a.b.c.y
+    host1> ping a.b.c.y
+    host2> ping a.b.c.x
+
+    For other device types, follow the appropriate procedures.
+
+  - Check NFS Setup
+
+    For the NFS components enabled above (client and/or server),
+    test their functionality over standard Ethernet using TCP/IP or UDP/IP.
+
+NFS/RDMA Setup
+~~~~~~~~~~~~~~
+
+  We recommend that you use two machines, one to act as the client and
+  one to act as the server.
+
+  One time configuration:
+
+  - On the server system, configure the /etc/exports file and
+    start the NFS/RDMA server.
+
+    Exports entries with the following formats have been tested:
+
+    /vol0   192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
+    /vol0   192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
+
+    The IP address(es) is(are) the client's IPoIB address for an InfiniBand HCA or the
+    cleint's iWARP address(es) for an RNIC.
+
+    NOTE: The "insecure" option must be used because the NFS/RDMA client does not
+    use a reserved port.
+
+ Each time a machine boots:
+
+  - Load and configure the RDMA drivers
+
+    For InfiniBand using a Mellanox adapter:
+
+    > modprobe ib_mthca
+    > modprobe ib_ipoib
+    > ifconfig ib0 a.b.c.d
+
+    NOTE: use unique addresses for the client and server
+
+  - Start the NFS server
+
+    If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config),
+    load the RDMA transport module:
+
+    > modprobe svcrdma
+
+    Regardless of how the server was built (module or built-in), start the server:
+
+    > /etc/init.d/nfs start
+
+    or
+
+    > service nfs start
+
+    Instruct the server to listen on the RDMA transport:
+
+    > echo rdma 2050 > /proc/fs/nfsd/portlist
+
+  - On the client system
+
+    If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config),
+    load the RDMA client module:
+
+    > modprobe xprtrdma.ko
+
+    Regardless of how the client was built (module or built-in), issue the mount.nfs command:
+
+    > /path/to/your/mount.nfs <IPoIB-server-name-or-address>:/<export> /mnt -i -o rdma,port=2050
+
+    To verify that the mount is using RDMA, run "cat /proc/mounts" and check the
+    "proto" field for the given mount.
+
+  Congratulations! You're using NFS/RDMA!
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 518ebe609e2..2a99116edc4 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -43,6 +43,7 @@ Table of Contents
   2.13	/proc/<pid>/oom_score - Display current oom-killer score
   2.14	/proc/<pid>/io - Display the IO accounting fields
   2.15	/proc/<pid>/coredump_filter - Core dump filtering settings
+  2.16	/proc/<pid>/mountinfo - Information about mounts
 
 ------------------------------------------------------------------------------
 Preface
@@ -2348,4 +2349,41 @@ For example:
   $ echo 0x7 > /proc/self/coredump_filter
   $ ./some_program
 
+2.16	/proc/<pid>/mountinfo - Information about mounts
+--------------------------------------------------------
+
+This file contains lines of the form:
+
+36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
+(1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
+
+(1) mount ID:  unique identifier of the mount (may be reused after umount)
+(2) parent ID:  ID of parent (or of self for the top of the mount tree)
+(3) major:minor:  value of st_dev for files on filesystem
+(4) root:  root of the mount within the filesystem
+(5) mount point:  mount point relative to the process's root
+(6) mount options:  per mount options
+(7) optional fields:  zero or more fields of the form "tag[:value]"
+(8) separator:  marks the end of the optional fields
+(9) filesystem type:  name of filesystem of the form "type[.subtype]"
+(10) mount source:  filesystem specific information or "none"
+(11) super options:  per super block options
+
+Parsers should ignore all unrecognised optional fields.  Currently the
+possible optional fields are:
+
+shared:X  mount is shared in peer group X
+master:X  mount is slave to peer group X
+propagate_from:X  mount is slave and receives propagation from peer group X (*)
+unbindable  mount is unbindable
+
+(*) X is the closest dominant peer group under the process's root.  If
+X is the immediate master of the mount, or if there's no dominant peer
+group under the same root, then only the "master:X" field is present
+and not the "propagate_from:X" field.
+
+For more information on mount propagation see:
+
+  Documentation/filesystems/sharedsubtree.txt
+
 ------------------------------------------------------------------------------
diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt
index 7fb8e6dc62b..b843743aa0b 100644
--- a/Documentation/filesystems/seq_file.txt
+++ b/Documentation/filesystems/seq_file.txt
@@ -122,8 +122,7 @@ stop() is the place to free it.
 	}
 
 Finally, the show() function should format the object currently pointed to
-by the iterator for output. It should return zero, or an error code if
-something goes wrong. The example module's show() function is:
+by the iterator for output.  The example module's show() function is:
 
 	static int ct_seq_show(struct seq_file *s, void *v)
 	{
@@ -132,6 +131,12 @@ something goes wrong. The example module's show() function is:
 	        return 0;
 	}
 
+If all is well, the show() function should return zero.  A negative error
+code in the usual manner indicates that something went wrong; it will be
+passed back to user space.  This function can also return SEQ_SKIP, which
+causes the current item to be skipped; if the show() function has already
+generated output before returning SEQ_SKIP, that output will be dropped.
+
 We will look at seq_printf() in a moment. But first, the definition of the
 seq_file iterator is finished by creating a seq_operations structure with
 the four functions we have just defined:
@@ -182,12 +187,18 @@ The first two output a single character and a string, just like one would
 expect. seq_escape() is like seq_puts(), except that any character in s
 which is in the string esc will be represented in octal form in the output.
 
-There is also a function for printing filenames:
+There is also a pair of functions for printing filenames:
 
 	int seq_path(struct seq_file *m, struct path *path, char *esc);
+	int seq_path_root(struct seq_file *m, struct path *path,
+			  struct path *root, char *esc)
 
 Here, path indicates the file of interest, and esc is a set of characters
-which should be escaped in the output.
+which should be escaped in the output.  A call to seq_path() will output
+the path relative to the current process's filesystem root.  If a different
+root is desired, it can be used with seq_path_root().  Note that, if it
+turns out that path cannot be reached from root, the value of root will be
+changed in seq_file_root() to a root which *does* work.
 
 
 Making it all work
diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
index 4598ef7b622..7f27b8f840d 100644
--- a/Documentation/filesystems/sysfs.txt
+++ b/Documentation/filesystems/sysfs.txt
@@ -176,8 +176,10 @@ implementations:
   Recall that an attribute should only be exporting one value, or an
   array of similar values, so this shouldn't be that expensive. 
 
-  This allows userspace to do partial reads and seeks arbitrarily over
-  the entire file at will. 
+  This allows userspace to do partial reads and forward seeks
+  arbitrarily over the entire file at will. If userspace seeks back to
+  zero or does a pread(2) with an offset of '0' the show() method will
+  be called again, rearmed, to fill the buffer.
 
 - On write(2), sysfs expects the entire buffer to be passed during the
   first write. Sysfs then passes the entire buffer to the store()
@@ -192,6 +194,9 @@ implementations:
 
 Other notes:
 
+- Writing causes the show() method to be rearmed regardless of current
+  file position.
+
 - The buffer will always be PAGE_SIZE bytes in length. On i386, this
   is 4096. 
 
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
index 145e4408635..222437efd75 100644
--- a/Documentation/filesystems/tmpfs.txt
+++ b/Documentation/filesystems/tmpfs.txt
@@ -92,6 +92,18 @@ NodeList format is a comma-separated list of decimal numbers and ranges,
 a range being two hyphen-separated decimal numbers, the smallest and
 largest node numbers in the range.  For example, mpol=bind:0-3,5,7,9-15
 
+NUMA memory allocation policies have optional flags that can be used in
+conjunction with their modes.  These optional flags can be specified
+when tmpfs is mounted by appending them to the mode before the NodeList.
+See Documentation/vm/numa_memory_policy.txt for a list of all available
+memory allocation policy mode flags.
+
+	=static		is equivalent to	MPOL_F_STATIC_NODES
+	=relative	is equivalent to	MPOL_F_RELATIVE_NODES
+
+For example, mpol=bind=static:NodeList, is the equivalent of an
+allocation policy of MPOL_BIND | MPOL_F_STATIC_NODES.
+
 Note that trying to mount a tmpfs with an mpol option will fail if the
 running kernel does not support NUMA; and will fail if its nodelist
 specifies a node which is not online.  If your system relies on that
diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt
index fcc123ffa25..2d5e1e582e1 100644
--- a/Documentation/filesystems/vfat.txt
+++ b/Documentation/filesystems/vfat.txt
@@ -17,6 +17,21 @@ dmask=###     -- The permission mask for the directory.
 fmask=###     -- The permission mask for files.
                  The default is the umask of current process.
 
+allow_utime=### -- This option controls the permission check of mtime/atime.
+
+                  20 - If current process is in group of file's group ID,
+                       you can change timestamp.
+                   2 - Other users can change timestamp.
+
+                 The default is set from `dmask' option. (If the directory is
+                 writable, utime(2) is also allowed. I.e. ~dmask & 022)
+
+                 Normally utime(2) checks current process is owner of
+                 the file, or it has CAP_FOWNER capability.  But FAT
+                 filesystem doesn't have uid/gid on disk, so normal
+                 check is too unflexible. With this option you can
+                 relax it.
+
 codepage=###  -- Sets the codepage number for converting to shortname
 		 characters on FAT filesystem.
 		 By default, FAT_DEFAULT_CODEPAGE setting is used.
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 74aeb142ae5..0a1668ba260 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -52,16 +52,15 @@ When mounting an XFS filesystem, the following options are accepted.
 	and also gets the setgid bit set if it is a directory itself.
 
   ihashsize=value
-	Sets the number of hash buckets available for hashing the
-	in-memory inodes of the specified mount point.  If a value
-	of zero is used, the value selected by the default algorithm
-	will be displayed in /proc/mounts.
+	In memory inode hashes have been removed, so this option has
+	no function as of August 2007. Option is deprecated.
 
   ikeep/noikeep
-	When inode clusters are emptied of inodes, keep them around
-	on the disk (ikeep) - this is the traditional XFS behaviour
-	and is still the default for now.  Using the noikeep option,
-	inode clusters are returned to the free space pool.
+	When ikeep is specified, XFS does not delete empty inode clusters
+	and keeps them around on disk. ikeep is the traditional XFS
+	behaviour. When noikeep is specified, empty inode clusters
+	are returned to the free space pool. The default is noikeep for
+	non-DMAPI mounts, while ikeep is the default when DMAPI is in use.
 
   inode64
 	Indicates that XFS is allowed to create inodes at any location