diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/configfs/configfs.txt | 57 | ||||
-rw-r--r-- | Documentation/filesystems/configfs/configfs_example.c | 2 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 65 | ||||
-rw-r--r-- | Documentation/filesystems/vfs.txt | 51 |
4 files changed, 136 insertions, 39 deletions
diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt index b34cdb50eab..d1b98257d00 100644 --- a/Documentation/filesystems/configfs/configfs.txt +++ b/Documentation/filesystems/configfs/configfs.txt @@ -238,6 +238,8 @@ config_item_type. struct config_group *(*make_group)(struct config_group *group, const char *name); int (*commit_item)(struct config_item *item); + void (*disconnect_notify)(struct config_group *group, + struct config_item *item); void (*drop_item)(struct config_group *group, struct config_item *item); }; @@ -268,6 +270,16 @@ the item in other threads, the memory is safe. It may take some time for the item to actually disappear from the subsystem's usage. But it is gone from configfs. +When drop_item() is called, the item's linkage has already been torn +down. It no longer has a reference on its parent and has no place in +the item hierarchy. If a client needs to do some cleanup before this +teardown happens, the subsystem can implement the +ct_group_ops->disconnect_notify() method. The method is called after +configfs has removed the item from the filesystem view but before the +item is removed from its parent group. Like drop_item(), +disconnect_notify() is void and cannot fail. Client subsystems should +not drop any references here, as they still must do it in drop_item(). + A config_group cannot be removed while it still has child items. This is implemented in the configfs rmdir(2) code. ->drop_item() will not be called, as the item has not been dropped. rmdir(2) will fail, as the @@ -280,18 +292,18 @@ tells configfs to make the subsystem appear in the file tree. struct configfs_subsystem { struct config_group su_group; - struct semaphore su_sem; + struct mutex su_mutex; }; int configfs_register_subsystem(struct configfs_subsystem *subsys); void configfs_unregister_subsystem(struct configfs_subsystem *subsys); - A subsystem consists of a toplevel config_group and a semaphore. + A subsystem consists of a toplevel config_group and a mutex. The group is where child config_items are created. For a subsystem, this group is usually defined statically. Before calling configfs_register_subsystem(), the subsystem must have initialized the group via the usual group _init() functions, and it must also have -initialized the semaphore. +initialized the mutex. When the register call returns, the subsystem is live, and it will be visible via configfs. At that point, mkdir(2) can be called and the subsystem must be ready for it. @@ -303,7 +315,7 @@ subsystem/group and the simple_child item in configfs_example.c It shows a trivial object displaying and storing an attribute, and a simple group creating and destroying these children. -[Hierarchy Navigation and the Subsystem Semaphore] +[Hierarchy Navigation and the Subsystem Mutex] There is an extra bonus that configfs provides. The config_groups and config_items are arranged in a hierarchy due to the fact that they @@ -314,19 +326,19 @@ and config_item->ci_parent structure members. A subsystem can navigate the cg_children list and the ci_parent pointer to see the tree created by the subsystem. This can race with configfs' -management of the hierarchy, so configfs uses the subsystem semaphore to +management of the hierarchy, so configfs uses the subsystem mutex to protect modifications. Whenever a subsystem wants to navigate the hierarchy, it must do so under the protection of the subsystem -semaphore. +mutex. -A subsystem will be prevented from acquiring the semaphore while a newly +A subsystem will be prevented from acquiring the mutex while a newly allocated item has not been linked into this hierarchy. Similarly, it -will not be able to acquire the semaphore while a dropping item has not +will not be able to acquire the mutex while a dropping item has not yet been unlinked. This means that an item's ci_parent pointer will never be NULL while the item is in configfs, and that an item will only be in its parent's cg_children list for the same duration. This allows a subsystem to trust ci_parent and cg_children while they hold the -semaphore. +mutex. [Item Aggregation Via symlink(2)] @@ -386,6 +398,33 @@ As a consequence of this, default_groups cannot be removed directly via rmdir(2). They also are not considered when rmdir(2) on the parent group is checking for children. +[Dependant Subsystems] + +Sometimes other drivers depend on particular configfs items. For +example, ocfs2 mounts depend on a heartbeat region item. If that +region item is removed with rmdir(2), the ocfs2 mount must BUG or go +readonly. Not happy. + +configfs provides two additional API calls: configfs_depend_item() and +configfs_undepend_item(). A client driver can call +configfs_depend_item() on an existing item to tell configfs that it is +depended on. configfs will then return -EBUSY from rmdir(2) for that +item. When the item is no longer depended on, the client driver calls +configfs_undepend_item() on it. + +These API cannot be called underneath any configfs callbacks, as +they will conflict. They can block and allocate. A client driver +probably shouldn't calling them of its own gumption. Rather it should +be providing an API that external subsystems call. + +How does this work? Imagine the ocfs2 mount process. When it mounts, +it asks for a heartbeat region item. This is done via a call into the +heartbeat code. Inside the heartbeat code, the region item is looked +up. Here, the heartbeat code calls configfs_depend_item(). If it +succeeds, then heartbeat knows the region is safe to give to ocfs2. +If it fails, it was being torn down anyway, and heartbeat can gracefully +pass up an error. + [Committable Items] NOTE: Committable items are currently unimplemented. diff --git a/Documentation/filesystems/configfs/configfs_example.c b/Documentation/filesystems/configfs/configfs_example.c index 2d6a14a463e..e56d49264b3 100644 --- a/Documentation/filesystems/configfs/configfs_example.c +++ b/Documentation/filesystems/configfs/configfs_example.c @@ -453,7 +453,7 @@ static int __init configfs_example_init(void) subsys = example_subsys[i]; config_group_init(&subsys->su_group); - init_MUTEX(&subsys->su_sem); + mutex_init(&subsys->su_mutex); ret = configfs_register_subsystem(subsys); if (ret) { printk(KERN_ERR "Error %d while registering subsystem %s\n", diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 8756a07f4dc..460b892d089 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -171,7 +171,9 @@ read the file /proc/PID/status: This shows you nearly the same information you would get if you viewed it with the ps command. In fact, ps uses the proc file system to obtain its information. The statm file contains more detailed information about the -process memory usage. Its seven fields are explained in Table 1-2. +process memory usage. Its seven fields are explained in Table 1-2. The stat +file contains details information about the process itself. Its fields are +explained in Table 1-3. Table 1-2: Contents of the statm files (as of 2.6.8-rc3) @@ -188,16 +190,65 @@ Table 1-2: Contents of the statm files (as of 2.6.8-rc3) dt number of dirty pages (always 0 on 2.6) .............................................................................. + +Table 1-3: Contents of the stat files (as of 2.6.22-rc3) +.............................................................................. + Field Content + pid process id + tcomm filename of the executable + state state (R is running, S is sleeping, D is sleeping in an + uninterruptible wait, Z is zombie, T is traced or stopped) + ppid process id of the parent process + pgrp pgrp of the process + sid session id + tty_nr tty the process uses + tty_pgrp pgrp of the tty + flags task flags + min_flt number of minor faults + cmin_flt number of minor faults with child's + maj_flt number of major faults + cmaj_flt number of major faults with child's + utime user mode jiffies + stime kernel mode jiffies + cutime user mode jiffies with child's + cstime kernel mode jiffies with child's + priority priority level + nice nice level + num_threads number of threads + start_time time the process started after system boot + vsize virtual memory size + rss resident set memory size + rsslim current limit in bytes on the rss + start_code address above which program text can run + end_code address below which program text can run + start_stack address of the start of the stack + esp current value of ESP + eip current value of EIP + pending bitmap of pending signals (obsolete) + blocked bitmap of blocked signals (obsolete) + sigign bitmap of ignored signals (obsolete) + sigcatch bitmap of catched signals (obsolete) + wchan address where process went to sleep + 0 (place holder) + 0 (place holder) + exit_signal signal to send to parent thread on exit + task_cpu which CPU the task is scheduled on + rt_priority realtime priority + policy scheduling policy (man sched_setscheduler) + blkio_ticks time spent waiting for block IO +.............................................................................. + + 1.2 Kernel data --------------- Similar to the process entries, the kernel data files give information about the running kernel. The files used to obtain this information are contained in -/proc and are listed in Table 1-3. Not all of these will be present in your +/proc and are listed in Table 1-4. Not all of these will be present in your system. It depends on the kernel configuration and the loaded modules, which files are there, and which are missing. -Table 1-3: Kernel info in /proc +Table 1-4: Kernel info in /proc .............................................................................. File Content apm Advanced power management info @@ -473,10 +524,10 @@ IDE devices: More detailed information can be found in the controller specific subdirectories. These are named ide0, ide1 and so on. Each of these -directories contains the files shown in table 1-4. +directories contains the files shown in table 1-5. -Table 1-4: IDE controller info in /proc/ide/ide? +Table 1-5: IDE controller info in /proc/ide/ide? .............................................................................. File Content channel IDE channel (0 or 1) @@ -486,11 +537,11 @@ Table 1-4: IDE controller info in /proc/ide/ide? .............................................................................. Each device connected to a controller has a separate subdirectory in the -controllers directory. The files listed in table 1-5 are contained in these +controllers directory. The files listed in table 1-6 are contained in these directories. -Table 1-5: IDE device information +Table 1-6: IDE device information .............................................................................. File Content cache The cache diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index a47cc819f37..045f3e055a2 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -3,7 +3,7 @@ Original author: Richard Gooch <rgooch@atnf.csiro.au> - Last updated on October 28, 2005 + Last updated on June 24, 2007. Copyright (C) 1999 Richard Gooch Copyright (C) 2005 Pekka Enberg @@ -107,7 +107,7 @@ file /proc/filesystems. struct file_system_type ----------------------- -This describes the filesystem. As of kernel 2.6.13, the following +This describes the filesystem. As of kernel 2.6.22, the following members are defined: struct file_system_type { @@ -119,6 +119,8 @@ struct file_system_type { struct module *owner; struct file_system_type * next; struct list_head fs_supers; + struct lock_class_key s_lock_key; + struct lock_class_key s_umount_key; }; name: the name of the filesystem type, such as "ext2", "iso9660", @@ -137,11 +139,12 @@ struct file_system_type { next: for internal VFS use: you should initialize this to NULL + s_lock_key, s_umount_key: lockdep-specific + The get_sb() method has the following arguments: - struct super_block *sb: the superblock structure. This is partially - initialized by the VFS and the rest must be initialized by the - get_sb() method + struct file_system_type *fs_type: decribes the filesystem, partly initialized + by the specific filesystem code int flags: mount flags @@ -150,12 +153,13 @@ The get_sb() method has the following arguments: void *data: arbitrary mount options, usually comes as an ASCII string - int silent: whether or not to be silent on error + struct vfsmount *mnt: a vfs-internal representation of a mount point The get_sb() method must determine if the block device specified -in the superblock contains a filesystem of the type the method -supports. On success the method returns the superblock pointer, on -failure it returns NULL. +in the dev_name and fs_type contains a filesystem of the type the method +supports. If it succeeds in opening the named block device, it initializes a +struct super_block descriptor for the filesystem contained by the block device. +On failure it returns an error. The most interesting member of the superblock structure that the get_sb() method fills in is the "s_op" field. This is a pointer to @@ -193,7 +197,7 @@ struct super_operations ----------------------- This describes how the VFS can manipulate the superblock of your -filesystem. As of kernel 2.6.13, the following members are defined: +filesystem. As of kernel 2.6.22, the following members are defined: struct super_operations { struct inode *(*alloc_inode)(struct super_block *sb); @@ -216,8 +220,6 @@ struct super_operations { void (*clear_inode) (struct inode *); void (*umount_begin) (struct super_block *); - void (*sync_inodes) (struct super_block *sb, - struct writeback_control *wbc); int (*show_options)(struct seq_file *, struct vfsmount *); ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); @@ -300,9 +302,6 @@ or bottom half). umount_begin: called when the VFS is unmounting a filesystem. - sync_inodes: called when the VFS is writing out dirty data associated with - a superblock. - show_options: called by the VFS to show mount options for /proc/<pid>/mounts. quota_read: called by the VFS to read from filesystem quota file. @@ -324,7 +323,7 @@ struct inode_operations ----------------------- This describes how the VFS can manipulate an inode in your -filesystem. As of kernel 2.6.13, the following members are defined: +filesystem. As of kernel 2.6.22, the following members are defined: struct inode_operations { int (*create) (struct inode *,struct dentry *,int, struct nameidata *); @@ -348,6 +347,7 @@ struct inode_operations { ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t); ssize_t (*listxattr) (struct dentry *, char *, size_t); int (*removexattr) (struct dentry *, const char *); + void (*truncate_range)(struct inode *, loff_t, loff_t); }; Again, all methods are called without any locks being held, unless @@ -444,6 +444,9 @@ otherwise noted. removexattr: called by the VFS to remove an extended attribute from a file. This method is called by removexattr(2) system call. + truncate_range: a method provided by the underlying filesystem to truncate a + range of blocks , i.e. punch a hole somewhere in a file. + The Address Space Object ======================== @@ -522,7 +525,7 @@ struct address_space_operations ------------------------------- This describes how the VFS can manipulate mapping of a file to page cache in -your filesystem. As of kernel 2.6.16, the following members are defined: +your filesystem. As of kernel 2.6.22, the following members are defined: struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); @@ -543,6 +546,7 @@ struct address_space_operations { int); /* migrate the contents of a page to the specified target */ int (*migratepage) (struct page *, struct page *); + int (*launder_page) (struct page *); }; writepage: called by the VM to write a dirty page to backing store. @@ -689,6 +693,10 @@ struct address_space_operations { transfer any private data across and update any references that it has to the page. + launder_page: Called before freeing a page - it writes back the dirty page. To + prevent redirtying the page, it is kept locked during the whole + operation. + The File Object =============== @@ -699,9 +707,10 @@ struct file_operations ---------------------- This describes how the VFS can manipulate an open file. As of kernel -2.6.17, the following members are defined: +2.6.22, the following members are defined: struct file_operations { + struct module *owner; loff_t (*llseek) (struct file *, loff_t, int); ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); @@ -728,10 +737,8 @@ struct file_operations { int (*check_flags)(int); int (*dir_notify)(struct file *filp, unsigned long arg); int (*flock) (struct file *, int, struct file_lock *); - ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, size_t, unsigned -int); - ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned -int); + ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, size_t, unsigned int); + ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned int); }; Again, all methods are called without any locks being held, unless |