diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/Locking | 2 | ||||
-rw-r--r-- | Documentation/filesystems/adfs.txt | 18 | ||||
-rw-r--r-- | Documentation/filesystems/autofs4-mount-control.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/caching/netfs-api.txt | 18 | ||||
-rw-r--r-- | Documentation/filesystems/configfs/configfs.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/exofs.txt | 10 | ||||
-rw-r--r-- | Documentation/filesystems/ext4.txt | 211 | ||||
-rw-r--r-- | Documentation/filesystems/gfs2-uevents.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/gfs2.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/ntfs.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/ocfs2.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/path-lookup.txt | 4 | ||||
-rw-r--r-- | Documentation/filesystems/pohmelfs/network_protocol.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/porting | 16 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 6 | ||||
-rw-r--r-- | Documentation/filesystems/squashfs.txt | 30 | ||||
-rw-r--r-- | Documentation/filesystems/sysfs.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/vfs.txt | 4 | ||||
-rw-r--r-- | Documentation/filesystems/xfs-delayed-logging-design.txt | 15 |
19 files changed, 295 insertions, 55 deletions
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 2e994efe12c..61b31acb917 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -128,7 +128,7 @@ alloc_inode: destroy_inode: dirty_inode: (must not sleep) write_inode: -drop_inode: !!!inode_lock!!! +drop_inode: !!!inode->i_lock!!! evict_inode: put_super: write write_super: read diff --git a/Documentation/filesystems/adfs.txt b/Documentation/filesystems/adfs.txt index 9e8811f92b8..5949766353f 100644 --- a/Documentation/filesystems/adfs.txt +++ b/Documentation/filesystems/adfs.txt @@ -9,6 +9,9 @@ Mount options for ADFS will be nnn. Default 0700. othmask=nnn The permission mask for ADFS 'other' permissions will be nnn. Default 0077. + ftsuffix=n When ftsuffix=0, no file type suffix will be applied. + When ftsuffix=1, a hexadecimal suffix corresponding to + the RISC OS file type will be added. Default 0. Mapping of ADFS permissions to Linux permissions ------------------------------------------------ @@ -55,3 +58,18 @@ Mapping of ADFS permissions to Linux permissions You can therefore tailor the permission translation to whatever you desire the permissions should be under Linux. + +RISC OS file type suffix +------------------------ + + RISC OS file types are stored in bits 19..8 of the file load address. + + To enable non-RISC OS systems to be used to store files without losing + file type information, a file naming convention was devised (initially + for use with NFS) such that a hexadecimal suffix of the form ,xyz + denoted the file type: e.g. BasicFile,ffb is a BASIC (0xffb) file. This + naming convention is now also used by RISC OS emulators such as RPCEmu. + + Mounting an ADFS disc with option ftsuffix=1 will cause appropriate file + type suffixes to be appended to file names read from a directory. If the + ftsuffix option is zero or omitted, no file type suffixes will be added. diff --git a/Documentation/filesystems/autofs4-mount-control.txt b/Documentation/filesystems/autofs4-mount-control.txt index 51986bf08a4..4c95935cbcf 100644 --- a/Documentation/filesystems/autofs4-mount-control.txt +++ b/Documentation/filesystems/autofs4-mount-control.txt @@ -309,7 +309,7 @@ ioctlfd field set to the descriptor obtained from the open call. AUTOFS_DEV_IOCTL_TIMEOUT_CMD ---------------------------- -Set the expire timeout for mounts withing an autofs mount point. +Set the expire timeout for mounts within an autofs mount point. The call requires an initialized struct autofs_dev_ioctl with the ioctlfd field set to the descriptor obtained from the open call. diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt index 1902c57b72e..a167ab876c3 100644 --- a/Documentation/filesystems/caching/netfs-api.txt +++ b/Documentation/filesystems/caching/netfs-api.txt @@ -95,7 +95,7 @@ restraints as possible on how an index is structured and where it is placed in the tree. The netfs can even mix indices and data files at the same level, but it's not recommended. -Each index entry consists of a key of indeterminate length plus some auxilliary +Each index entry consists of a key of indeterminate length plus some auxiliary data, also of indeterminate length. There are some limits on indices: @@ -203,23 +203,23 @@ This has the following fields: If the function is absent, a file size of 0 is assumed. - (6) A function to retrieve auxilliary data from the netfs [optional]. + (6) A function to retrieve auxiliary data from the netfs [optional]. This function will be called with the netfs data that was passed to the - cookie acquisition function and the maximum length of auxilliary data that - it may provide. It should write the auxilliary data into the given buffer + cookie acquisition function and the maximum length of auxiliary data that + it may provide. It should write the auxiliary data into the given buffer and return the quantity it wrote. - If this function is absent, the auxilliary data length will be set to 0. + If this function is absent, the auxiliary data length will be set to 0. - The length of the auxilliary data buffer may be dependent on the key + The length of the auxiliary data buffer may be dependent on the key length. A netfs mustn't rely on being able to provide more than 400 bytes for both. - (7) A function to check the auxilliary data [optional]. + (7) A function to check the auxiliary data [optional]. This function will be called to check that a match found in the cache for - this object is valid. For instance with AFS it could check the auxilliary + this object is valid. For instance with AFS it could check the auxiliary data against the data version number returned by the server to determine whether the index entry in a cache is still valid. @@ -232,7 +232,7 @@ This has the following fields: (*) FSCACHE_CHECKAUX_NEEDS_UPDATE - the entry requires update (*) FSCACHE_CHECKAUX_OBSOLETE - the entry should be deleted - This function can also be used to extract data from the auxilliary data in + This function can also be used to extract data from the auxiliary data in the cache and copy it into the netfs's structures. (8) A pair of functions to manage contexts for the completion callback diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt index fabcb0e00f2..dd57bb6bb39 100644 --- a/Documentation/filesystems/configfs/configfs.txt +++ b/Documentation/filesystems/configfs/configfs.txt @@ -409,7 +409,7 @@ As a consequence of this, default_groups cannot be removed directly via rmdir(2). They also are not considered when rmdir(2) on the parent group is checking for children. -[Dependant Subsystems] +[Dependent Subsystems] Sometimes other drivers depend on particular configfs items. For example, ocfs2 mounts depend on a heartbeat region item. If that diff --git a/Documentation/filesystems/exofs.txt b/Documentation/filesystems/exofs.txt index abd2a9b5b78..23583a13697 100644 --- a/Documentation/filesystems/exofs.txt +++ b/Documentation/filesystems/exofs.txt @@ -104,7 +104,15 @@ Where: exofs specific options: Options are separated by commas (,) pid=<integer> - The partition number to mount/create as container of the filesystem. - This option is mandatory. + This option is mandatory. integer can be + Hex by pre-pending an 0x to the number. + osdname=<id> - Mount by a device's osdname. + osdname is usually a 36 character uuid of the + form "d2683732-c906-4ee1-9dbd-c10c27bb40df". + It is one of the device's uuid specified in the + mkfs.exofs format command. + If this option is specified then the /dev/osdX + above can be empty and is ignored. to=<integer> - Timeout in ticks for a single command. default is (60 * HZ) [for debugging only] diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 6ab9442d7ee..c79ec58fd7f 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -97,7 +97,7 @@ Note: More extensive information for getting started with ext4 can be * Inode allocation using large virtual block groups via flex_bg * delayed allocation * large block (up to pagesize) support -* efficent new ordered mode in JBD2 and ext4(avoid using buffer head to force +* efficient new ordered mode in JBD2 and ext4(avoid using buffer head to force the ordering) [1] Filesystems with a block size of 1k may see a limit imposed by the @@ -106,7 +106,7 @@ directory hash tree having a maximum depth of two. 2.2 Candidate features for future inclusion * Online defrag (patches available but not well tested) -* reduced mke2fs time via lazy itable initialization in conjuction with +* reduced mke2fs time via lazy itable initialization in conjunction with the uninit_bg feature (capability to do this is available in e2fsprogs but a kernel thread to do lazy zeroing of unused inode table blocks after filesystem is first mounted is required for safety) @@ -367,12 +367,47 @@ init_itable=n The lazy itable init code will wait n times the minimizes the impact on the systme performance while file system's inode table is being initialized. -discard Controls whether ext4 should issue discard/TRIM +discard Controls whether ext4 should issue discard/TRIM nodiscard(*) commands to the underlying block device when blocks are freed. This is useful for SSD devices and sparse/thinly-provisioned LUNs, but it is off by default until sufficient testing has been done. +nouid32 Disables 32-bit UIDs and GIDs. This is for + interoperability with older kernels which only + store and expect 16-bit values. + +resize Allows to resize filesystem to the end of the last + existing block group, further resize has to be done + with resize2fs either online, or offline. It can be + used only with conjunction with remount. + +block_validity This options allows to enables/disables the in-kernel +noblock_validity facility for tracking filesystem metadata blocks + within internal data structures. This allows multi- + block allocator and other routines to quickly locate + extents which might overlap with filesystem metadata + blocks. This option is intended for debugging + purposes and since it negatively affects the + performance, it is off by default. + +dioread_lock Controls whether or not ext4 should use the DIO read +dioread_nolock locking. If the dioread_nolock option is specified + ext4 will allocate uninitialized extent before buffer + write and convert the extent to initialized after IO + completes. This approach allows ext4 code to avoid + using inode mutex, which improves scalability on high + speed storages. However this does not work with nobh + option and the mount will fail. Nor does it work with + data journaling and dioread_nolock option will be + ignored with kernel warning. Note that dioread_nolock + code path is only used for extent-based files. + Because of the restrictions this options comprises + it is off by default (e.g. dioread_lock). + +i_version Enable 64-bit inode version support. This option is + off by default. + Data Mode ========= There are 3 different data modes: @@ -400,6 +435,176 @@ needs to be read from and written to disk at the same time where it outperforms all others modes. Currently ext4 does not have delayed allocation support if this data journalling mode is selected. +/proc entries +============= + +Information about mounted ext4 file systems can be found in +/proc/fs/ext4. Each mounted filesystem will have a directory in +/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or +/proc/fs/ext4/dm-0). The files in each per-device directory are shown +in table below. + +Files in /proc/fs/ext4/<devname> +.............................................................................. + File Content + mb_groups details of multiblock allocator buddy cache of free blocks +.............................................................................. + +/sys entries +============ + +Information about mounted ext4 file systems can be found in +/sys/fs/ext4. Each mounted filesystem will have a directory in +/sys/fs/ext4 based on its device name (i.e., /sys/fs/ext4/hdc or +/sys/fs/ext4/dm-0). The files in each per-device directory are shown +in table below. + +Files in /sys/fs/ext4/<devname> +(see also Documentation/ABI/testing/sysfs-fs-ext4) +.............................................................................. + File Content + + delayed_allocation_blocks This file is read-only and shows the number of + blocks that are dirty in the page cache, but + which do not have their location in the + filesystem allocated yet. + + inode_goal Tuning parameter which (if non-zero) controls + the goal inode used by the inode allocator in + preference to all other allocation heuristics. + This is intended for debugging use only, and + should be 0 on production systems. + + inode_readahead_blks Tuning parameter which controls the maximum + number of inode table blocks that ext4's inode + table readahead algorithm will pre-read into + the buffer cache + + lifetime_write_kbytes This file is read-only and shows the number of + kilobytes of data that have been written to this + filesystem since it was created. + + max_writeback_mb_bump The maximum number of megabytes the writeback + code will try to write out before move on to + another inode. + + mb_group_prealloc The multiblock allocator will round up allocation + requests to a multiple of this tuning parameter if + the stripe size is not set in the ext4 superblock + + mb_max_to_scan The maximum number of extents the multiblock + allocator will search to find the best extent + + mb_min_to_scan The minimum number of extents the multiblock + allocator will search to find the best extent + + mb_order2_req Tuning parameter which controls the minimum size + for requests (as a power of 2) where the buddy + cache is used + + mb_stats Controls whether the multiblock allocator should + collect statistics, which are shown during the + unmount. 1 means to collect statistics, 0 means + not to collect statistics + + mb_stream_req Files which have fewer blocks than this tunable + parameter will have their blocks allocated out + of a block group specific preallocation pool, so + that small files are packed closely together. + Each large file will have its blocks allocated + out of its own unique preallocation pool. + + session_write_kbytes This file is read-only and shows the number of + kilobytes of data that have been written to this + filesystem since it was mounted. +.............................................................................. + +Ioctls +====== + +There is some Ext4 specific functionality which can be accessed by applications +through the system call interfaces. The list of all Ext4 specific ioctls are +shown in the table below. + +Table of Ext4 specific ioctls +.............................................................................. + Ioctl Description + EXT4_IOC_GETFLAGS Get additional attributes associated with inode. + The ioctl argument is an integer bitfield, with + bit values described in ext4.h. This ioctl is an + alias for FS_IOC_GETFLAGS. + + EXT4_IOC_SETFLAGS Set additional attributes associated with inode. + The ioctl argument is an integer bitfield, with + bit values described in ext4.h. This ioctl is an + alias for FS_IOC_SETFLAGS. + + EXT4_IOC_GETVERSION + EXT4_IOC_GETVERSION_OLD + Get the inode i_generation number stored for + each inode. The i_generation number is normally + changed only when new inode is created and it is + particularly useful for network filesystems. The + '_OLD' version of this ioctl is an alias for + FS_IOC_GETVERSION. + + EXT4_IOC_SETVERSION + EXT4_IOC_SETVERSION_OLD + Set the inode i_generation number stored for + each inode. The '_OLD' version of this ioctl + is an alias for FS_IOC_SETVERSION. + + EXT4_IOC_GROUP_EXTEND This ioctl has the same purpose as the resize + mount option. It allows to resize filesystem + to the end of the last existing block group, + further resize has to be done with resize2fs, + either online, or offline. The argument points + to the unsigned logn number representing the + filesystem new block count. + + EXT4_IOC_MOVE_EXT Move the block extents from orig_fd (the one + this ioctl is pointing to) to the donor_fd (the + one specified in move_extent structure passed + as an argument to this ioctl). Then, exchange + inode metadata between orig_fd and donor_fd. + This is especially useful for online + defragmentation, because the allocator has the + opportunity to allocate moved blocks better, + ideally into one contiguous extent. + + EXT4_IOC_GROUP_ADD Add a new group descriptor to an existing or + new group descriptor block. The new group + descriptor is described by ext4_new_group_input + structure, which is passed as an argument to + this ioctl. This is especially useful in + conjunction with EXT4_IOC_GROUP_EXTEND, + which allows online resize of the filesystem + to the end of the last existing block group. + Those two ioctls combined is used in userspace + online resize tool (e.g. resize2fs). + + EXT4_IOC_MIGRATE This ioctl operates on the filesystem itself. + It converts (migrates) ext3 indirect block mapped + inode to ext4 extent mapped inode by walking + through indirect block mapping of the original + inode and converting contiguous block ranges + into ext4 extents of the temporary inode. Then, + inodes are swapped. This ioctl might help, when + migrating from ext3 to ext4 filesystem, however + suggestion is to create fresh ext4 filesystem + and copy data from the backup. Note, that + filesystem has to support extents for this ioctl + to work. + + EXT4_IOC_ALLOC_DA_BLKS Force all of the delay allocated blocks to be + allocated to preserve application-expected ext3 + behaviour. Note that this will also start + triggering a write of the data blocks, but this + behaviour may change in the future as it is + not necessary and has been done this way only + for sake of simplicity. +.............................................................................. + References ========== diff --git a/Documentation/filesystems/gfs2-uevents.txt b/Documentation/filesystems/gfs2-uevents.txt index fd966dc9979..d8188966929 100644 --- a/Documentation/filesystems/gfs2-uevents.txt +++ b/Documentation/filesystems/gfs2-uevents.txt @@ -62,7 +62,7 @@ be fixed. The REMOVE uevent is generated at the end of an unsuccessful mount or at the end of a umount of the filesystem. All REMOVE uevents will -have been preceeded by at least an ADD uevent for the same fileystem, +have been preceded by at least an ADD uevent for the same fileystem, and unlike the other uevents is generated automatically by the kernel's kobject subsystem. diff --git a/Documentation/filesystems/gfs2.txt b/Documentation/filesystems/gfs2.txt index 0b59c020091..4cda926628a 100644 --- a/Documentation/filesystems/gfs2.txt +++ b/Documentation/filesystems/gfs2.txt @@ -11,7 +11,7 @@ their I/O so file system consistency is maintained. One of the nifty features of GFS is perfect consistency -- changes made to the file system on one machine show up immediately on all other machines in the cluster. -GFS uses interchangable inter-node locking mechanisms, the currently +GFS uses interchangeable inter-node locking mechanisms, the currently supported mechanisms are: lock_nolock -- allows gfs to be used as a local file system diff --git a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt index 933bc66ccff..791af8dac06 100644 --- a/Documentation/filesystems/ntfs.txt +++ b/Documentation/filesystems/ntfs.txt @@ -350,7 +350,7 @@ Note the "Should sync?" parameter "nosync" means that the two mirrors are already in sync which will be the case on a clean shutdown of Windows. If the mirrors are not clean, you can specify the "sync" option instead of "nosync" and the Device-Mapper driver will then copy the entirety of the "Source Device" -to the "Target Device" or if you specified multipled target devices to all of +to the "Target Device" or if you specified multiple target devices to all of them. Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1), diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt index 5393e661169..9ed920a8cd7 100644 --- a/Documentation/filesystems/ocfs2.txt +++ b/Documentation/filesystems/ocfs2.txt @@ -80,7 +80,7 @@ user_xattr (*) Enables Extended User Attributes. nouser_xattr Disables Extended User Attributes. acl Enables POSIX Access Control Lists support. noacl (*) Disables POSIX Access Control Lists support. -resv_level=2 (*) Set how agressive allocation reservations will be. +resv_level=2 (*) Set how aggressive allocation reservations will be. Valid values are between 0 (reservations off) to 8 (maximum space for reservations). dir_resv_level= (*) By default, directory reservations will scale with file diff --git a/Documentation/filesystems/path-lookup.txt b/Documentation/filesystems/path-lookup.txt index eb59c8b44be..3571667c710 100644 --- a/Documentation/filesystems/path-lookup.txt +++ b/Documentation/filesystems/path-lookup.txt @@ -42,7 +42,7 @@ Path walking overview A name string specifies a start (root directory, cwd, fd-relative) and a sequence of elements (directory entry names), which together refer to a path in the namespace. A path is represented as a (dentry, vfsmount) tuple. The name -elements are sub-strings, seperated by '/'. +elements are sub-strings, separated by '/'. Name lookups will want to find a particular path that a name string refers to (usually the final element, or parent of final element). This is done by taking @@ -354,7 +354,7 @@ vfstest 24185492 4945 708725(2.9%) 1076136(4.4%) 0 2651 What this shows is that failed rcu-walk lookups, ie. ones that are restarted entirely with ref-walk, are quite rare. Even the "vfstest" case which -specifically has concurrent renames/mkdir/rmdir/ creat/unlink/etc to excercise +specifically has concurrent renames/mkdir/rmdir/ creat/unlink/etc to exercise such races is not showing a huge amount of restarts. Dropping from rcu-walk to ref-walk mean that we have encountered a dentry where diff --git a/Documentation/filesystems/pohmelfs/network_protocol.txt b/Documentation/filesystems/pohmelfs/network_protocol.txt index 40ea6c295af..65e03dd4482 100644 --- a/Documentation/filesystems/pohmelfs/network_protocol.txt +++ b/Documentation/filesystems/pohmelfs/network_protocol.txt @@ -20,7 +20,7 @@ Commands can be embedded into transaction command (which in turn has own command so one can extend protocol as needed without breaking backward compatibility as long as old commands are supported. All string lengths include tail 0 byte. -All commans are transfered over the network in big-endian. CPU endianess is used at the end peers. +All commands are transferred over the network in big-endian. CPU endianess is used at the end peers. @cmd - command number, which specifies command to be processed. Following commands are used currently: diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 0c986c9e851..6e29954851a 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -298,11 +298,14 @@ be used instead. It gets called whenever the inode is evicted, whether it has remaining links or not. Caller does *not* evict the pagecache or inode-associated metadata buffers; getting rid of those is responsibility of method, as it had been for ->delete_inode(). - ->drop_inode() returns int now; it's called on final iput() with inode_lock -held and it returns true if filesystems wants the inode to be dropped. As before, -generic_drop_inode() is still the default and it's been updated appropriately. -generic_delete_inode() is also alive and it consists simply of return 1. Note that -all actual eviction work is done by caller after ->drop_inode() returns. + + ->drop_inode() returns int now; it's called on final iput() with +inode->i_lock held and it returns true if filesystems wants the inode to be +dropped. As before, generic_drop_inode() is still the default and it's been +updated appropriately. generic_delete_inode() is also alive and it consists +simply of return 1. Note that all actual eviction work is done by caller after +->drop_inode() returns. + clear_inode() is gone; use end_writeback() instead. As before, it must be called exactly once on each call of ->evict_inode() (as it used to be for each call of ->delete_inode()). Unlike before, if you are using inode-associated @@ -397,6 +400,9 @@ a file off. -- [mandatory] + +-- +[mandatory] ->get_sb() is gone. Switch to use of ->mount(). Typically it's just a matter of switching from calling get_sb_... to mount_... and changing the function type. If you were doing it manually, just switch from setting ->mnt_root diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 23cae6548d3..b0b814d75ca 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -543,7 +543,7 @@ just those considered 'most important'. The new vectors are: their statistics are used by kernel developers and interested users to determine the occurrence of interrupts of the given type. -The above IRQ vectors are displayed only when relevent. For example, +The above IRQ vectors are displayed only when relevant. For example, the threshold vector does not exist on x86_64 platforms. Others are suppressed when the system is a uniprocessor. As of this writing, only i386 and x86_64 platforms support the new IRQ vector displays. @@ -1202,7 +1202,7 @@ The columns are: W = can do write operations U = can do unblank flags E = it is enabled - C = it is prefered console + C = it is preferred console B = it is primary boot console p = it is used for printk buffer b = it is not a TTY but a Braille device @@ -1331,7 +1331,7 @@ NOTICE: /proc/<pid>/oom_adj is deprecated and will be removed, please see Documentation/feature-removal-schedule.txt. Caveat: when a parent task is selected, the oom killer will sacrifice any first -generation children with seperate address spaces instead, if possible. This +generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. diff --git a/Documentation/filesystems/squashfs.txt b/Documentation/filesystems/squashfs.txt index 66699afd66c..d4d41465a0b 100644 --- a/Documentation/filesystems/squashfs.txt +++ b/Documentation/filesystems/squashfs.txt @@ -59,12 +59,15 @@ obtained from this site also. 3. SQUASHFS FILESYSTEM DESIGN ----------------------------- -A squashfs filesystem consists of a maximum of eight parts, packed together on a byte -alignment: +A squashfs filesystem consists of a maximum of nine parts, packed together on a +byte alignment: --------------- | superblock | |---------------| + | compression | + | options | + |---------------| | datablocks | | & fragments | |---------------| @@ -91,7 +94,14 @@ the source directory, and checked for duplicates. Once all file data has been written the completed inode, directory, fragment, export and uid/gid lookup tables are written. -3.1 Inodes +3.1 Compression options +----------------------- + +Compressors can optionally support compression specific options (e.g. +dictionary size). If non-default compression options have been used, then +these are stored here. + +3.2 Inodes ---------- Metadata (inodes and directories) are compressed in 8Kbyte blocks. Each @@ -114,7 +124,7 @@ directory inode are defined: inodes optimised for frequently occurring regular files and directories, and extended types where extra information has to be stored. -3.2 Directories +3.3 Directories --------------- Like inodes, directories are packed into compressed metadata blocks, stored @@ -144,7 +154,7 @@ decompressed to do a lookup irrespective of the length of the directory. This scheme has the advantage that it doesn't require extra memory overhead and doesn't require much extra storage on disk. -3.3 File data +3.4 File data ------------- Regular files consist of a sequence of contiguous compressed blocks, and/or a @@ -163,7 +173,7 @@ Larger files use multiple slots, with 1.75 TiB files using all 8 slots. The index cache is designed to be memory efficient, and by default uses 16 KiB. -3.4 Fragment lookup table +3.5 Fragment lookup table ------------------------- Regular files can contain a fragment index which is mapped to a fragment @@ -173,7 +183,7 @@ A second index table is used to locate these. This second index table for speed of access (and because it is small) is read at mount time and cached in memory. -3.5 Uid/gid lookup table +3.6 Uid/gid lookup table ------------------------ For space efficiency regular files store uid and gid indexes, which are @@ -182,7 +192,7 @@ stored compressed into metadata blocks. A second index table is used to locate these. This second index table for speed of access (and because it is small) is read at mount time and cached in memory. -3.6 Export table +3.7 Export table ---------------- To enable Squashfs filesystems to be exportable (via NFS etc.) filesystems @@ -196,7 +206,7 @@ This table is stored compressed into metadata blocks. A second index table is used to locate these. This second index table for speed of access (and because it is small) is read at mount time and cached in memory. -3.7 Xattr table +3.8 Xattr table --------------- The xattr table contains extended attributes for each inode. The xattrs @@ -209,7 +219,7 @@ or if it is stored out of line (in which case the value field stores a reference to where the actual value is stored). This allows large values to be stored out of line improving scanning and lookup performance and it also allows values to be de-duplicated, the value being stored once, and -all other occurences holding an out of line reference to that value. +all other occurrences holding an out of line reference to that value. The xattr lists are packed into compressed 8K metadata blocks. To reduce overhead in inodes, rather than storing the on-disk diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt index f806e50aaa6..597f728e7b4 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.txt @@ -62,7 +62,7 @@ values of the same type. Mixing types, expressing multiple lines of data, and doing fancy formatting of data is heavily frowned upon. Doing these things may get -you publically humiliated and your code rewritten without notice. +you publicly humiliated and your code rewritten without notice. An attribute definition is simply: diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 306f0ae8df0..21a7dc467bb 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -97,7 +97,7 @@ functions: The passed struct file_system_type describes your filesystem. When a request is made to mount a filesystem onto a directory in your namespace, the VFS will call the appropriate mount() method for the specific -filesystem. New vfsmount refering to the tree returned by ->mount() +filesystem. New vfsmount referring to the tree returned by ->mount() will be attached to the mountpoint, so that when pathname resolution reaches the mountpoint it will jump into the root of that vfsmount. @@ -254,7 +254,7 @@ or bottom half). should be synchronous or not, not all filesystems check this flag. drop_inode: called when the last access to the inode is dropped, - with the inode_lock spinlock held. + with the inode->i_lock spinlock held. This method should be either NULL (normal UNIX filesystem semantics) or "generic_delete_inode" (for filesystems that do not diff --git a/Documentation/filesystems/xfs-delayed-logging-design.txt b/Documentation/filesystems/xfs-delayed-logging-design.txt index 7445bf335da..2ce36439c09 100644 --- a/Documentation/filesystems/xfs-delayed-logging-design.txt +++ b/Documentation/filesystems/xfs-delayed-logging-design.txt @@ -42,7 +42,7 @@ the aggregation of all the previous changes currently held only in the log. This relogging technique also allows objects to be moved forward in the log so that an object being relogged does not prevent the tail of the log from ever moving forward. This can be seen in the table above by the changing -(increasing) LSN of each subsquent transaction - the LSN is effectively a +(increasing) LSN of each subsequent transaction - the LSN is effectively a direct encoding of the location in the log of the transaction. This relogging is also used to implement long-running, multiple-commit @@ -338,7 +338,7 @@ the same time another transaction modifies the item and inserts the log item into the new CIL, then checkpoint transaction commit code cannot use log items to store the list of log vectors that need to be written into the transaction. Hence log vectors need to be able to be chained together to allow them to be -detatched from the log items. That is, when the CIL is flushed the memory +detached from the log items. That is, when the CIL is flushed the memory buffer and log vector attached to each log item needs to be attached to the checkpoint context so that the log item can be released. In diagrammatic form, the CIL would look like this before the flush: @@ -577,7 +577,7 @@ only becomes unpinned when all the transactions complete and there are no pending transactions. Thus the pinning and unpinning of a log item is symmetric as there is a 1:1 relationship with transaction commit and log item completion. -For delayed logging, however, we have an assymetric transaction commit to +For delayed logging, however, we have an asymmetric transaction commit to completion relationship. Every time an object is relogged in the CIL it goes through the commit process without a corresponding completion being registered. That is, we now have a many-to-one relationship between transaction commit and @@ -780,7 +780,7 @@ With delayed logging, there are new steps inserted into the life cycle: From this, it can be seen that the only life cycle differences between the two logging methods are in the middle of the life cycle - they still have the same beginning and end and execution constraints. The only differences are in the -commiting of the log items to the log itself and the completion processing. +committing of the log items to the log itself and the completion processing. Hence delayed logging should not introduce any constraints on log item behaviour, allocation or freeing that don't already exist. @@ -791,10 +791,3 @@ mount option. Fundamentally, there is no reason why the log manager would not be able to swap methods automatically and transparently depending on load characteristics, but this should not be necessary if delayed logging works as designed. - -Roadmap: - -2.6.39 Switch default mount option to use delayed logging - => should be roughly 12 months after initial merge - => enough time to shake out remaining problems before next round of - enterprise distro kernel rebases |