summaryrefslogtreecommitdiffstats
path: root/Documentation/filesystems
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--Documentation/filesystems/9p.txt43
-rw-r--r--Documentation/filesystems/afs.txt26
-rw-r--r--Documentation/filesystems/ext4.txt24
-rw-r--r--Documentation/filesystems/gfs2-uevents.txt100
-rw-r--r--Documentation/filesystems/ncpfs.txt6
-rw-r--r--Documentation/filesystems/nfs.txt98
-rw-r--r--Documentation/filesystems/nfs41-server.txt54
-rw-r--r--Documentation/filesystems/nfsroot.txt2
-rw-r--r--Documentation/filesystems/proc.txt42
-rw-r--r--Documentation/filesystems/seq_file.txt2
-rw-r--r--Documentation/filesystems/sharedsubtree.txt220
-rw-r--r--Documentation/filesystems/vfs.txt7
12 files changed, 387 insertions, 237 deletions
diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt
index bf8080640eb..57e0b80a527 100644
--- a/Documentation/filesystems/9p.txt
+++ b/Documentation/filesystems/9p.txt
@@ -18,11 +18,11 @@ the 9p client is available in the form of a USENIX paper:
Other applications are described in the following papers:
* XCPU & Clustering
- http://www.xcpu.org/xcpu-talk.pdf
+ http://xcpu.org/papers/xcpu-talk.pdf
* KVMFS: control file system for KVM
- http://www.xcpu.org/kvmfs.pdf
- * CellFS: A New ProgrammingModel for the Cell BE
- http://www.xcpu.org/cellfs-talk.pdf
+ http://xcpu.org/papers/kvmfs.pdf
+ * CellFS: A New Programming Model for the Cell BE
+ http://xcpu.org/papers/cellfs-talk.pdf
* PROSE I/O: Using 9p to enable Application Partitions
http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf
@@ -48,6 +48,7 @@ OPTIONS
(see rfdno and wfdno)
virtio - connect to the next virtio channel available
(from lguest or KVM with trans_virtio module)
+ rdma - connect to a specified RDMA channel
uname=name user name to attempt mount as on the remote server. The
server may override or ignore this value. Certain user
@@ -59,16 +60,22 @@ OPTIONS
cache=mode specifies a caching policy. By default, no caches are used.
loose = no attempts are made at consistency,
intended for exclusive, read-only mounts
+ fscache = use FS-Cache for a persistent, read-only
+ cache backend.
debug=n specifies debug level. The debug level is a bitmask.
- 0x01 = display verbose error messages
- 0x02 = developer debug (DEBUG_CURRENT)
- 0x04 = display 9p trace
- 0x08 = display VFS trace
- 0x10 = display Marshalling debug
- 0x20 = display RPC debug
- 0x40 = display transport debug
- 0x80 = display allocation debug
+ 0x01 = display verbose error messages
+ 0x02 = developer debug (DEBUG_CURRENT)
+ 0x04 = display 9p trace
+ 0x08 = display VFS trace
+ 0x10 = display Marshalling debug
+ 0x20 = display RPC debug
+ 0x40 = display transport debug
+ 0x80 = display allocation debug
+ 0x100 = display protocol message debug
+ 0x200 = display Fid debug
+ 0x400 = display packet debug
+ 0x800 = display fscache tracing debug
rfdno=n the file descriptor for reading with trans=fd
@@ -100,6 +107,10 @@ OPTIONS
any = v9fs does single attach and performs all
operations as one user
+ cachetag cache tag to use the specified persistent cache.
+ cache tags for existing cache sessions can be listed at
+ /sys/fs/9p/caches. (applies only to cache=fscache)
+
RESOURCES
=========
@@ -118,12 +129,16 @@ and export.
A Linux version of the 9p server is now maintained under the npfs project
on sourceforge (http://sourceforge.net/projects/npfs). The currently
maintained version is the single-threaded version of the server (named spfs)
-available from the same CVS repository.
+available from the same SVN repository.
There are user and developer mailing lists available through the v9fs project
on sourceforge (http://sourceforge.net/projects/v9fs).
-News and other information is maintained on SWiK (http://swik.net/v9fs).
+A stand-alone version of the module (which should build for any 2.6 kernel)
+is available via (http://github.com/ericvh/9p-sac/tree/master)
+
+News and other information is maintained on SWiK (http://swik.net/v9fs)
+and the Wiki (http://sf.net/apps/mediawiki/v9fs/index.php).
Bug reports may be issued through the kernel.org bugzilla
(http://bugzilla.kernel.org)
diff --git a/Documentation/filesystems/afs.txt b/Documentation/filesystems/afs.txt
index 12ad6c7f4e5..ffef91c4e0d 100644
--- a/Documentation/filesystems/afs.txt
+++ b/Documentation/filesystems/afs.txt
@@ -23,15 +23,13 @@ it does support include:
(*) Security (currently only AFS kaserver and KerberosIV tickets).
- (*) File reading.
+ (*) File reading and writing.
(*) Automounting.
-It does not yet support the following AFS features:
-
- (*) Write support.
+ (*) Local caching (via fscache).
- (*) Local caching.
+It does not yet support the following AFS features:
(*) pioctl() system call.
@@ -56,7 +54,7 @@ They permit the debugging messages to be turned on dynamically by manipulating
the masks in the following files:
/sys/module/af_rxrpc/parameters/debug
- /sys/module/afs/parameters/debug
+ /sys/module/kafs/parameters/debug
=====
@@ -66,9 +64,9 @@ USAGE
When inserting the driver modules the root cell must be specified along with a
list of volume location server IP addresses:
- insmod af_rxrpc.o
- insmod rxkad.o
- insmod kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
+ modprobe af_rxrpc
+ modprobe rxkad
+ modprobe kafs rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
The first module is the AF_RXRPC network protocol driver. This provides the
RxRPC remote operation protocol and may also be accessed from userspace. See:
@@ -81,7 +79,7 @@ is the actual filesystem driver for the AFS filesystem.
Once the module has been loaded, more modules can be added by the following
procedure:
- echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells
+ echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
Where the parameters to the "add" command are the name of a cell and a list of
volume location servers within that cell, with the latter separated by colons.
@@ -101,7 +99,7 @@ The name of the volume can be suffixes with ".backup" or ".readonly" to
specify connection to only volumes of those types.
The name of the cell is optional, and if not given during a mount, then the
-named volume will be looked up in the cell specified during insmod.
+named volume will be looked up in the cell specified during modprobe.
Additional cells can be added through /proc (see later section).
@@ -163,14 +161,14 @@ THE CELL DATABASE
The filesystem maintains an internal database of all the cells it knows and the
IP addresses of the volume location servers for those cells. The cell to which
-the system belongs is added to the database when insmod is performed by the
+the system belongs is added to the database when modprobe is performed by the
"rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
the kernel command line.
Further cells can be added by commands similar to the following:
echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
- echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells
+ echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
No other cell database operations are available at this time.
@@ -233,7 +231,7 @@ insmod /tmp/kafs.o rootcell=cambridge.redhat.com:172.16.18.91
mount -t afs \%root.afs. /afs
mount -t afs \%cambridge.redhat.com:root.cell. /afs/cambridge.redhat.com/
-echo add grand.central.org 18.7.14.88:128.2.191.224 > /proc/fs/afs/cells
+echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 > /proc/fs/afs/cells
mount -t afs "#grand.central.org:root.cell." /afs/grand.central.org/
mount -t afs "#grand.central.org:root.archive." /afs/grand.central.org/archive
mount -t afs "#grand.central.org:root.contrib." /afs/grand.central.org/contrib
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 7be02ac5fa3..18b5ec8cea4 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -134,15 +134,9 @@ ro Mount filesystem read only. Note that ext4 will
mount options "ro,noload" can be used to prevent
writes to the filesystem.
-journal_checksum Enable checksumming of the journal transactions.
- This will allow the recovery code in e2fsck and the
- kernel to detect corruption in the kernel. It is a
- compatible change and will be ignored by older kernels.
-
journal_async_commit Commit block can be written to disk without waiting
for descriptor blocks. If enabled older kernels cannot
- mount the device. This will enable 'journal_checksum'
- internally.
+ mount the device.
journal=update Update the ext4 file system's journal to the current
format.
@@ -263,10 +257,18 @@ resuid=n The user ID which may use the reserved blocks.
sb=n Use alternate superblock at this location.
-quota
-noquota
-grpquota
-usrquota
+quota These options are ignored by the filesystem. They
+noquota are used only by quota tools to recognize volumes
+grpquota where quota should be turned on. See documentation
+usrquota in the quota-tools package for more details
+ (http://sourceforge.net/projects/linuxquota).
+
+jqfmt=<quota type> These options tell filesystem details about quota
+usrjquota=<file> so that quota information can be properly updated
+grpjquota=<file> during journal replay. They replace the above
+ quota options. See documentation in the quota-tools
+ package for more details
+ (http://sourceforge.net/projects/linuxquota).
bh (*) ext4 associates buffer heads to data pages to
nobh (a) cache disk block mapping information
diff --git a/Documentation/filesystems/gfs2-uevents.txt b/Documentation/filesystems/gfs2-uevents.txt
new file mode 100644
index 00000000000..fd966dc9979
--- /dev/null
+++ b/Documentation/filesystems/gfs2-uevents.txt
@@ -0,0 +1,100 @@
+ uevents and GFS2
+ ==================
+
+During the lifetime of a GFS2 mount, a number of uevents are generated.
+This document explains what the events are and what they are used
+for (by gfs_controld in gfs2-utils).
+
+A list of GFS2 uevents
+-----------------------
+
+1. ADD
+
+The ADD event occurs at mount time. It will always be the first
+uevent generated by the newly created filesystem. If the mount
+is successful, an ONLINE uevent will follow. If it is not successful
+then a REMOVE uevent will follow.
+
+The ADD uevent has two environment variables: SPECTATOR=[0|1]
+and RDONLY=[0|1] that specify the spectator status (a read-only mount
+with no journal assigned), and read-only (with journal assigned) status
+of the filesystem respectively.
+
+2. ONLINE
+
+The ONLINE uevent is generated after a successful mount or remount. It
+has the same environment variables as the ADD uevent. The ONLINE
+uevent, along with the two environment variables for spectator and
+RDONLY are a relatively recent addition (2.6.32-rc+) and will not
+be generated by older kernels.
+
+3. CHANGE
+
+The CHANGE uevent is used in two places. One is when reporting the
+successful mount of the filesystem by the first node (FIRSTMOUNT=Done).
+This is used as a signal by gfs_controld that it is then ok for other
+nodes in the cluster to mount the filesystem.
+
+The other CHANGE uevent is used to inform of the completion
+of journal recovery for one of the filesystems journals. It has
+two environment variables, JID= which specifies the journal id which
+has just been recovered, and RECOVERY=[Done|Failed] to indicate the
+success (or otherwise) of the operation. These uevents are generated
+for every journal recovered, whether it is during the initial mount
+process or as the result of gfs_controld requesting a specific journal
+recovery via the /sys/fs/gfs2/<fsname>/lock_module/recovery file.
+
+Because the CHANGE uevent was used (in early versions of gfs_controld)
+without checking the environment variables to discover the state, we
+cannot add any more functions to it without running the risk of
+someone using an older version of the user tools and breaking their
+cluster. For this reason the ONLINE uevent was used when adding a new
+uevent for a successful mount or remount.
+
+4. OFFLINE
+
+The OFFLINE uevent is only generated due to filesystem errors and is used
+as part of the "withdraw" mechanism. Currently this doesn't give any
+information about what the error is, which is something that needs to
+be fixed.
+
+5. REMOVE
+
+The REMOVE uevent is generated at the end of an unsuccessful mount
+or at the end of a umount of the filesystem. All REMOVE uevents will
+have been preceeded by at least an ADD uevent for the same fileystem,
+and unlike the other uevents is generated automatically by the kernel's
+kobject subsystem.
+
+
+Information common to all GFS2 uevents (uevent environment variables)
+----------------------------------------------------------------------
+
+1. LOCKTABLE=
+
+The LOCKTABLE is a string, as supplied on the mount command
+line (locktable=) or via fstab. It is used as a filesystem label
+as well as providing the information for a lock_dlm mount to be
+able to join the cluster.
+
+2. LOCKPROTO=
+
+The LOCKPROTO is a string, and its value depends on what is set
+on the mount command line, or via fstab. It will be either
+lock_nolock or lock_dlm. In the future other lock managers
+may be supported.
+
+3. JOURNALID=
+
+If a journal is in use by the filesystem (journals are not
+assigned for spectator mounts) then this will give the
+numeric journal id in all GFS2 uevents.
+
+4. UUID=
+
+With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID
+into the filesystem superblock. If it exists, this will
+be included in every uevent relating to the filesystem.
+
+
+
diff --git a/Documentation/filesystems/ncpfs.txt b/Documentation/filesystems/ncpfs.txt
index f12c30c93f2..5af164f4b37 100644
--- a/Documentation/filesystems/ncpfs.txt
+++ b/Documentation/filesystems/ncpfs.txt
@@ -7,6 +7,6 @@ ftp.gwdg.de/pub/linux/misc/ncpfs, but sunsite and its many mirrors
will have it as well.
Related products are linware and mars_nwe, which will give Linux partial
-NetWare server functionality. Linware's home site is
-klokan.sh.cvut.cz/pub/linux/linware; mars_nwe can be found on
-ftp.gwdg.de/pub/linux/misc/ncpfs.
+NetWare server functionality.
+
+mars_nwe can be found on ftp.gwdg.de/pub/linux/misc/ncpfs.
diff --git a/Documentation/filesystems/nfs.txt b/Documentation/filesystems/nfs.txt
new file mode 100644
index 00000000000..f50f26ce6cd
--- /dev/null
+++ b/Documentation/filesystems/nfs.txt
@@ -0,0 +1,98 @@
+
+The NFS client
+==============
+
+The NFS version 2 protocol was first documented in RFC1094 (March 1989).
+Since then two more major releases of NFS have been published, with NFSv3
+being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April
+2003).
+
+The Linux NFS client currently supports all the above published versions,
+and work is in progress on adding support for minor version 1 of the NFSv4
+protocol.
+
+The purpose of this document is to provide information on some of the
+upcall interfaces that are used in order to provide the NFS client with
+some of the information that it requires in order to fully comply with
+the NFS spec.
+
+The DNS resolver
+================
+
+NFSv4 allows for one server to refer the NFS client to data that has been
+migrated onto another server by means of the special "fs_locations"
+attribute. See
+ http://tools.ietf.org/html/rfc3530#section-6
+and
+ http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
+
+The fs_locations information can take the form of either an ip address and
+a path, or a DNS hostname and a path. The latter requires the NFS client to
+do a DNS lookup in order to mount the new volume, and hence the need for an
+upcall to allow userland to provide this service.
+
+Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual
+/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps:
+
+ (1) The process checks the dns_resolve cache to see if it contains a
+ valid entry. If so, it returns that entry and exits.
+
+ (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent'
+ (may be changed using the 'nfs.cache_getent' kernel boot parameter)
+ is run, with two arguments:
+ - the cache name, "dns_resolve"
+ - the hostname to resolve
+
+ (3) After looking up the corresponding ip address, the helper script
+ writes the result into the rpc_pipefs pseudo-file
+ '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel'
+ in the following (text) format:
+
+ "<ip address> <hostname> <ttl>\n"
+
+ Where <ip address> is in the usual IPv4 (123.456.78.90) or IPv6
+ (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format.
+ <hostname> is identical to the second argument of the helper
+ script, and <ttl> is the 'time to live' of this cache entry (in
+ units of seconds).
+
+ Note: If <ip address> is invalid, say the string "0", then a negative
+ entry is created, which will cause the kernel to treat the hostname
+ as having no valid DNS translation.
+
+
+
+
+A basic sample /sbin/nfs_cache_getent
+=====================================
+
+#!/bin/bash
+#
+ttl=600
+#
+cut=/usr/bin/cut
+getent=/usr/bin/getent
+rpc_pipefs=/var/lib/nfs/rpc_pipefs
+#
+die()
+{
+ echo "Usage: $0 cache_name entry_name"
+ exit 1
+}
+
+[ $# -lt 2 ] && die
+cachename="$1"
+cache_path=${rpc_pipefs}/cache/${cachename}/channel
+
+case "${cachename}" in
+ dns_resolve)
+ name="$2"
+ result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )"
+ [ -z "${result}" ] && result="0"
+ ;;
+ *)
+ die
+ ;;
+esac
+echo "${result} ${name} ${ttl}" >${cache_path}
+
diff --git a/Documentation/filesystems/nfs41-server.txt b/Documentation/filesystems/nfs41-server.txt
index 05d81cbcb2e..5920fe26e6f 100644
--- a/Documentation/filesystems/nfs41-server.txt
+++ b/Documentation/filesystems/nfs41-server.txt
@@ -11,6 +11,11 @@ the /proc/fs/nfsd/versions control file. Note that to write this
control file, the nfsd service must be taken down. Use your user-mode
nfs-utils to set this up; see rpc.nfsd(8)
+(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and
+"-4", respectively. Therefore, code meant to work on both new and old
+kernels must turn 4.1 on or off *before* turning support for version 4
+on or off; rpc.nfsd does this correctly.)
+
The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
on the latest NFSv4.1 Internet Draft:
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29
@@ -25,6 +30,49 @@ are still under development out of tree.
See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design
for more information.
+The current implementation is intended for developers only: while it
+does support ordinary file operations on clients we have tested against
+(including the linux client), it is incomplete in ways which may limit
+features unexpectedly, cause known bugs in rare cases, or cause
+interoperability problems with future clients. Known issues:
+
+ - gss support is questionable: currently mounts with kerberos
+ from a linux client are possible, but we aren't really
+ conformant with the spec (for example, we don't use kerberos
+ on the backchannel correctly).
+ - no trunking support: no clients currently take advantage of
+ trunking, but this is a mandatory failure, and its use is
+ recommended to clients in a number of places. (E.g. to ensure
+ timely renewal in case an existing connection's retry timeouts
+ have gotten too long; see section 8.3 of the draft.)
+ Therefore, lack of this feature may cause future clients to
+ fail.
+ - Incomplete backchannel support: incomplete backchannel gss
+ support and no support for BACKCHANNEL_CTL mean that
+ callbacks (hence delegations and layouts) may not be
+ available and clients confused by the incomplete
+ implementation may fail.
+ - Server reboot recovery is unsupported; if the server reboots,
+ clients may fail.
+ - We do not support SSV, which provides security for shared
+ client-server state (thus preventing unauthorized tampering
+ with locks and opens, for example). It is mandatory for
+ servers to support this, though no clients use it yet.
+ - Mandatory operations which we do not support, such as
+ DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and
+ TEST_STATEID, are not currently used by clients, but will be
+ (and the spec recommends their uses in common cases), and
+ clients should not be expected to know how to recover from the
+ case where they are not supported. This will eventually cause
+ interoperability failures.
+
+In addition, some limitations are inherited from the current NFSv4
+implementation:
+
+ - Incomplete delegation enforcement: if a file is renamed or
+ unlinked, a client holding a delegation may continue to
+ indefinitely allow opens of the file under the old name.
+
The table below, taken from the NFSv4.1 document, lists
the operations that are mandatory to implement (REQ), optional
(OPT), and NFSv4.0 operations that are required not to implement (MNI)
@@ -142,6 +190,12 @@ NS*| CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 |
Implementation notes:
+DELEGPURGE:
+* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or
+ CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that
+ persist across client reboots). Thus we need not implement this for
+ now.
+
EXCHANGE_ID:
* only SP4_NONE state protection supported
* implementation ids are ignored
diff --git a/Documentation/filesystems/nfsroot.txt b/Documentation/filesystems/nfsroot.txt
index 68baddf3c3e..3ba0b945aaf 100644
--- a/Documentation/filesystems/nfsroot.txt
+++ b/Documentation/filesystems/nfsroot.txt
@@ -105,7 +105,7 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
the client address and this parameter is NOT empty only
replies from the specified server are accepted.
- Only required for for NFS root. That is autoconfiguration
+ Only required for NFS root. That is autoconfiguration
will not be triggered if it is missing and NFS root is not
in operation.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index fad18f9456e..b5aee7838a0 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -176,6 +176,7 @@ read the file /proc/PID/status:
CapBnd: ffffffffffffffff
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 1
+ Stack usage: 12 kB
This shows you nearly the same information you would get if you viewed it with
the ps command. In fact, ps uses the proc file system to obtain its
@@ -229,6 +230,7 @@ Table 1-2: Contents of the statm files (as of 2.6.30-rc7)
Mems_allowed_list Same as previous, but in "list format"
voluntary_ctxt_switches number of voluntary context switches
nonvoluntary_ctxt_switches number of non voluntary context switches
+ Stack usage: stack usage high water mark (round up to page size)
..............................................................................
Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
@@ -307,7 +309,7 @@ address perms offset dev inode pathname
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
a7cb1000-a7cb2000 ---p 00000000 00:00 0
-a7cb2000-a7eb2000 rw-p 00000000 00:00 0
+a7cb2000-a7eb2000 rw-p 00000000 00:00 0 [threadstack:001ff4b4]
a7eb2000-a7eb3000 ---p 00000000 00:00 0
a7eb3000-a7ed5000 rw-p 00000000 00:00 0
a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6
@@ -343,6 +345,7 @@ is not associated with a file:
[stack] = the stack of the main process
[vdso] = the "virtual dynamic shared object",
the kernel system call handler
+ [threadstack:xxxxxxxx] = the stack of the thread, xxxxxxxx is the stack size
or if empty, the mapping is anonymous.
@@ -375,6 +378,19 @@ of memory currently marked as referenced or accessed.
This file is only present if the CONFIG_MMU kernel configuration option is
enabled.
+The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG
+bits on both physical and virtual pages associated with a process.
+To clear the bits for all the pages associated with the process
+ > echo 1 > /proc/PID/clear_refs
+
+To clear the bits for the anonymous pages associated with the process
+ > echo 2 > /proc/PID/clear_refs
+
+To clear the bits for the file mapped pages associated with the process
+ > echo 3 > /proc/PID/clear_refs
+Any other value written to /proc/PID/clear_refs will have no effect.
+
+
1.2 Kernel data
---------------
@@ -1032,9 +1048,9 @@ Various pieces of information about kernel activity are available in the
since the system first booted. For a quick look, simply cat the file:
> cat /proc/stat
- cpu 2255 34 2290 22625563 6290 127 456 0
- cpu0 1132 34 1441 11311718 3675 127 438 0
- cpu1 1123 0 849 11313845 2614 0 18 0
+ cpu 2255 34 2290 22625563 6290 127 456 0 0
+ cpu0 1132 34 1441 11311718 3675 127 438 0 0
+ cpu1 1123 0 849 11313845 2614 0 18 0 0
intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...]
ctxt 1990473
btime 1062191376
@@ -1056,6 +1072,7 @@ second). The meanings of the columns are as follows, from left to right:
- irq: servicing interrupts
- softirq: servicing softirqs
- steal: involuntary wait
+- guest: running a guest
The "intr" line gives counts of interrupts serviced since boot time, for each
of the possible system interrupts. The first column is the total of all
@@ -1167,13 +1184,11 @@ CHAPTER 3: PER-PROCESS PARAMETERS
3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
------------------------------------------------------
-This file can be used to adjust the score used to select which processes should
-be killed in an out-of-memory situation. The oom_adj value is a characteristic
-of the task's mm, so all threads that share an mm with pid will have the same
-oom_adj value. A high value will increase the likelihood of this process being
-killed by the oom-killer. Valid values are in the range -16 to +15 as
-explained below and a special value of -17, which disables oom-killing
-altogether for threads sharing pid's mm.
+This file can be used to adjust the score used to select which processes
+should be killed in an out-of-memory situation. Giving it a high score will
+increase the likelihood of this process being killed by the oom-killer. Valid
+values are in the range -16 to +15, plus the special value -17, which disables
+oom-killing altogether for this process.
The process to be killed in an out-of-memory situation is selected among all others
based on its badness score. This value equals the original memory size of the process
@@ -1187,16 +1202,13 @@ the parent's score if they do not share the same memory. Thus forking servers
are the prime candidates to be killed. Having only one 'hungry' child will make
parent less preferable than the child.
-/proc/<pid>/oom_adj cannot be changed for kthreads since they are immune from
-oom-killing already.
-
/proc/<pid>/oom_score shows process' current badness score.
The following heuristics are then applied:
* if the task was reniced, its score doubles
* superuser or direct hardware access tasks (CAP_SYS_ADMIN, CAP_SYS_RESOURCE
or CAP_SYS_RAWIO) have their score divided by 4
- * if oom condition happened in one cpuset and checked task does not belong
+ * if oom condition happened in one cpuset and checked process does not belong
to it, its score is divided by 8
* the resulting score is multiplied by two to the power of oom_adj, i.e.
points <<= oom_adj when it is positive and
diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt
index b843743aa0b..0d15ebccf5b 100644
--- a/Documentation/filesystems/seq_file.txt
+++ b/Documentation/filesystems/seq_file.txt
@@ -46,7 +46,7 @@ better to do. The file is seekable, in that one can do something like the
following:
dd if=/proc/sequence of=out1 count=1
- dd if=/proc/sequence skip=1 out=out2 count=1
+ dd if=/proc/sequence skip=1 of=out2 count=1
Then concatenate the output files out1 and out2 and get the right
result. Yes, it is a thoroughly useless module, but the point is to show
diff --git a/Documentation/filesystems/sharedsubtree.txt b/Documentation/filesystems/sharedsubtree.txt
index 736540045dc..23a181074f9 100644
--- a/Documentation/filesystems/sharedsubtree.txt
+++ b/Documentation/filesystems/sharedsubtree.txt
@@ -4,7 +4,7 @@ Shared Subtrees
Contents:
1) Overview
2) Features
- 3) smount command
+ 3) Setting mount states
4) Use-case
5) Detailed semantics
6) Quiz
@@ -41,14 +41,14 @@ replicas continue to be exactly same.
Here is an example:
- Lets say /mnt has a mount that is shared.
+ Let's say /mnt has a mount that is shared.
mount --make-shared /mnt
- note: mount command does not yet support the --make-shared flag.
- I have included a small C program which does the same by executing
- 'smount /mnt shared'
+ Note: mount(8) command now supports the --make-shared flag,
+ so the sample 'smount' program is no longer needed and has been
+ removed.
- #mount --bind /mnt /tmp
+ # mount --bind /mnt /tmp
The above command replicates the mount at /mnt to the mountpoint /tmp
and the contents of both the mounts remain identical.
@@ -58,8 +58,8 @@ replicas continue to be exactly same.
#ls /tmp
a b c
- Now lets say we mount a device at /tmp/a
- #mount /dev/sd0 /tmp/a
+ Now let's say we mount a device at /tmp/a
+ # mount /dev/sd0 /tmp/a
#ls /tmp/a
t1 t2 t2
@@ -80,21 +80,20 @@ replicas continue to be exactly same.
Here is an example:
- Lets say /mnt has a mount which is shared.
- #mount --make-shared /mnt
+ Let's say /mnt has a mount which is shared.
+ # mount --make-shared /mnt
- Lets bind mount /mnt to /tmp
- #mount --bind /mnt /tmp
+ Let's bind mount /mnt to /tmp
+ # mount --bind /mnt /tmp
the new mount at /tmp becomes a shared mount and it is a replica of
the mount at /mnt.
- Now lets make the mount at /tmp; a slave of /mnt
- #mount --make-slave /tmp
- [or smount /tmp slave]
+ Now let's make the mount at /tmp; a slave of /mnt
+ # mount --make-slave /tmp
- lets mount /dev/sd0 on /mnt/a
- #mount /dev/sd0 /mnt/a
+ let's mount /dev/sd0 on /mnt/a
+ # mount /dev/sd0 /mnt/a
#ls /mnt/a
t1 t2 t3
@@ -104,9 +103,9 @@ replicas continue to be exactly same.
Note the mount event has propagated to the mount at /tmp
- However lets see what happens if we mount something on the mount at /tmp
+ However let's see what happens if we mount something on the mount at /tmp
- #mount /dev/sd1 /tmp/b
+ # mount /dev/sd1 /tmp/b
#ls /tmp/b
s1 s2 s3
@@ -124,12 +123,11 @@ replicas continue to be exactly same.
2d) A unbindable mount is a unbindable private mount
- lets say we have a mount at /mnt and we make is unbindable
+ let's say we have a mount at /mnt and we make is unbindable
- #mount --make-unbindable /mnt
- [ smount /mnt unbindable ]
+ # mount --make-unbindable /mnt
- Lets try to bind mount this mount somewhere else.
+ Let's try to bind mount this mount somewhere else.
# mount --bind /mnt /tmp
mount: wrong fs type, bad option, bad superblock on /mnt,
or too many mounted file systems
@@ -137,149 +135,15 @@ replicas continue to be exactly same.
Binding a unbindable mount is a invalid operation.
-3) smount command
+3) Setting mount states
- Currently the mount command is not aware of shared subtree features.
- Work is in progress to add the support in mount ( util-linux package ).
- Till then use the following program.
+ The mount command (util-linux package) can be used to set mount
+ states:
- ------------------------------------------------------------------------
- //
- //this code was developed my Miklos Szeredi <miklos@szeredi.hu>
- //and modified by Ram Pai <linuxram@us.ibm.com>
- // sample usage:
- // smount /tmp shared
- //
- #include <stdio.h>
- #include <stdlib.h>
- #include <unistd.h>
- #include <string.h>
- #include <sys/mount.h>
- #include <sys/fsuid.h>
-
- #ifndef MS_REC
- #define MS_REC 0x4000 /* 16384: Recursive loopback */
- #endif
-
- #ifndef MS_SHARED
- #define MS_SHARED 1<<20 /* Shared */
- #endif
-
- #ifndef MS_PRIVATE
- #define MS_PRIVATE 1<<18 /* Private */
- #endif
-
- #ifndef MS_SLAVE
- #define MS_SLAVE 1<<19 /* Slave */
- #endif
-
- #ifndef MS_UNBINDABLE
- #define MS_UNBINDABLE 1<<17 /* Unbindable */
- #endif
-
- int main(int argc, char *argv[])
- {
- int type;
- if(argc != 3) {
- fprintf(stderr, "usage: %s dir "
- "<rshared|rslave|rprivate|runbindable|shared|slave"
- "|private|unbindable>\n" , argv[0]);
- return 1;
- }
-
- fprintf(stdout, "%s %s %s\n", argv[0], argv[1], argv[2]);
-
- if (strcmp(argv[2],"rshared")==0)
- type=(MS_SHARED|MS_REC);
- else if (strcmp(argv[2],"rslave")==0)
- type=(MS_SLAVE|MS_REC);
- else if (strcmp(argv[2],"rprivate")==0)
- type=(MS_PRIVATE|MS_REC);
- else if (strcmp(argv[2],"runbindable")==0)
- type=(MS_UNBINDABLE|MS_REC);
- else if (strcmp(argv[2],"shared")==0)
- type=MS_SHARED;
- else if (strcmp(argv[2],"slave")==0)
- type=MS_SLAVE;
- else if (strcmp(argv[2],"private")==0)
- type=MS_PRIVATE;
- else if (strcmp(argv[2],"unbindable")==0)
- type=MS_UNBINDABLE;
- else {
- fprintf(stderr, "invalid operation: %s\n", argv[2]);
- return 1;
- }
- setfsuid(getuid());
-
- if(mount("", argv[1], "dontcare", type, "") == -1) {
- perror("mount");
- return 1;
- }
- return 0;
- }
- -----------------------------------------------------------------------
-
- Copy the above code snippet into smount.c
- gcc -o smount smount.c
-
-
- (i) To mark all the mounts under /mnt as shared execute the following
- command:
-
- smount /mnt rshared
- the corresponding syntax planned for mount command is
- mount --make-rshared /mnt
-
- just to mark a mount /mnt as shared, execute the following
- command:
- smount /mnt shared
- the corresponding syntax planned for mount command is
- mount --make-shared /mnt
-
- (ii) To mark all the shared mounts under /mnt as slave execute the
- following
-
- command:
- smount /mnt rslave
- the corresponding syntax planned for mount command is
- mount --make-rslave /mnt
-
- just to mark a mount /mnt as slave, execute the following
- command:
- smount /mnt slave
- the corresponding syntax planned for mount command is
- mount --make-slave /mnt
-
- (iii) To mark all the mounts under /mnt as private execute the
- following command:
-
- smount /mnt rprivate
- the corresponding syntax planned for mount command is
- mount --make-rprivate /mnt
-
- just to mark a mount /mnt as private, execute the following
- command:
- smount /mnt private
- the corresponding syntax planned for mount command is
- mount --make-private /mnt
-
- NOTE: by default all the mounts are created as private. But if
- you want to change some shared/slave/unbindable mount as
- private at a later point in time, this command can help.
-
- (iv) To mark all the mounts under /mnt as unbindable execute the
- following
-
- command:
- smount /mnt runbindable
- the corresponding syntax planned for mount command is
- mount --make-runbindable /mnt
-
- just to mark a mount /mnt as unbindable, execute the following
- command:
- smount /mnt unbindable
- the corresponding syntax planned for mount command is
- mount --make-unbindable /mnt
+ mount --make-shared mountpoint
+ mount --make-slave mountpoint
+ mount --make-private mountpoint
+ mount --make-unbindable mountpoint
4) Use cases
@@ -350,7 +214,7 @@ replicas continue to be exactly same.
mount --rbind / /view/v3
mount --rbind / /view/v4
- and if /usr has a versioning filesystem mounted, than that
+ and if /usr has a versioning filesystem mounted, then that
mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
/view/v4/usr too
@@ -390,7 +254,7 @@ replicas continue to be exactly same.
For example:
mount --make-shared /mnt
- mount --bin /mnt /tmp
+ mount --bind /mnt /tmp
The mount at /mnt and that at /tmp are both shared and belong
to the same peer group. Anything mounted or unmounted under
@@ -558,7 +422,7 @@ replicas continue to be exactly same.
then the subtree under the unbindable mount is pruned in the new
location.
- eg: lets say we have the following mount tree.
+ eg: let's say we have the following mount tree.
A
/ \
@@ -566,7 +430,7 @@ replicas continue to be exactly same.
/ \ / \
D E F G
- Lets say all the mount except the mount C in the tree are
+ Let's say all the mount except the mount C in the tree are
of a type other than unbindable.
If this tree is rbound to say Z
@@ -683,13 +547,13 @@ replicas continue to be exactly same.
'b' on mounts that receive propagation from mount 'B' and does not have
sub-mounts within them are unmounted.
- Example: Lets say 'B1', 'B2', 'B3' are shared mounts that propagate to
+ Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
each other.
- lets say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
+ let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
'B1', 'B2' and 'B3' respectively.
- lets say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
+ let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
mount 'B1', 'B2' and 'B3' respectively.
if 'C1' is unmounted, all the mounts that are most-recently-mounted on
@@ -710,7 +574,7 @@ replicas continue to be exactly same.
A cloned namespace contains all the mounts as that of the parent
namespace.
- Lets say 'A' and 'B' are the corresponding mounts in the parent and the
+ Let's say 'A' and 'B' are the corresponding mounts in the parent and the
child namespace.
If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
@@ -759,11 +623,11 @@ replicas continue to be exactly same.
mount --make-slave /mnt
At this point we have the first mount at /tmp and
- its root dentry is 1. Lets call this mount 'A'
+ its root dentry is 1. Let's call this mount 'A'
And then we have a second mount at /tmp1 with root
- dentry 2. Lets call this mount 'B'
+ dentry 2. Let's call this mount 'B'
Next we have a third mount at /mnt with root dentry
- mnt. Lets call this mount 'C'
+ mnt. Let's call this mount 'C'
'B' is the slave of 'A' and 'C' is a slave of 'B'
A -> B -> C
@@ -794,7 +658,7 @@ replicas continue to be exactly same.
Q3 Why is unbindable mount needed?
- Lets say we want to replicate the mount tree at multiple
+ Let's say we want to replicate the mount tree at multiple
locations within the same subtree.
if one rbind mounts a tree within the same subtree 'n' times
@@ -803,7 +667,7 @@ replicas continue to be exactly same.
mounts. Here is a example.
step 1:
- lets say the root tree has just two directories with
+ let's say the root tree has just two directories with
one vfsmount.
root
/ \
@@ -875,7 +739,7 @@ replicas continue to be exactly same.
Unclonable mounts come in handy here.
step 1:
- lets say the root tree has just two directories with
+ let's say the root tree has just two directories with
one vfsmount.
root
/ \
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index f49eecf2e57..623f094c9d8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -536,6 +536,7 @@ struct address_space_operations {
/* migrate the contents of a page to the specified target */
int (*migratepage) (struct page *, struct page *);
int (*launder_page) (struct page *);
+ int (*error_remove_page) (struct mapping *mapping, struct page *page);
};
writepage: called by the VM to write a dirty page to backing store.
@@ -694,6 +695,12 @@ struct address_space_operations {
prevent redirtying the page, it is kept locked during the whole
operation.
+ error_remove_page: normally set to generic_error_remove_page if truncation
+ is ok for this address space. Used for memory failure handling.
+ Setting this implies you deal with pages going away under you,
+ unless you have them locked or reference counts increased.
+
+
The File Object
===============