From 88393161210493e317ae391696ee8ef463cb3c23 Mon Sep 17 00:00:00 2001
From: Thomas Weber <swirl@gmx.li>
Date: Tue, 16 Mar 2010 11:47:56 +0100
Subject: Fix typos in comments

[Ss]ytem => [Ss]ystem
udpate => update
paramters => parameters
orginal => original

Signed-off-by: Thomas Weber <swirl@gmx.li>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 Documentation/cgroups/cgroups.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index fd588ff0e29..5eb279a48fa 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -573,7 +573,7 @@ void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 
 Called when a task attach operation has failed after can_attach() has succeeded.
 A subsystem whose can_attach() has some side-effects should provide this
-function, so that the subsytem can implement a rollback. If not, not necessary.
+function, so that the subsystem can implement a rollback. If not, not necessary.
 This will be called only about subsystems whose can_attach() operation have
 succeeded.
 
-- 
cgit v1.2.3-70-g09d2


From ab5097b11f6fcb587b65cd08bd7241661ade77db Mon Sep 17 00:00:00 2001
From: Greg Thelen <gthelen@google.com>
Date: Thu, 18 Mar 2010 15:04:39 -0700
Subject: memcg: fix typo in memcg documentation

Updated memory.txt to be more consistent: s/swapiness/swappiness/

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 Documentation/cgroups/memory.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index f8bc802d70b..3a6aecd078b 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -340,7 +340,7 @@ Note:
 5.3 swappiness
   Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
 
-  Following cgroups' swapiness can't be changed.
+  Following cgroups' swappiness can't be changed.
   - root cgroup (uses /proc/sys/vm/swappiness).
   - a cgroup which uses hierarchy and it has child cgroup.
   - a cgroup which uses hierarchy and not the root of hierarchy.
-- 
cgit v1.2.3-70-g09d2


From 5239c4ff4ae9e810ba761518ad71b463f0ccbf3c Mon Sep 17 00:00:00 2001
From: Greg Thelen <gthelen@google.com>
Date: Wed, 24 Mar 2010 14:48:30 -0700
Subject: cpuset: Fix documentation punctuation

Fix cpusets.txt documentation punctuation.

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 Documentation/cgroups/cpusets.txt | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index 4160df82b3f..51682ab2dd1 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -42,7 +42,7 @@ Nodes to a set of tasks.   In this document "Memory Node" refers to
 an on-line node that contains memory.
 
 Cpusets constrain the CPU and Memory placement of tasks to only
-the resources within a tasks current cpuset.  They form a nested
+the resources within a task's current cpuset.  They form a nested
 hierarchy visible in a virtual file system.  These are the essential
 hooks, beyond what is already present, required to manage dynamic
 job placement on large systems.
@@ -53,11 +53,11 @@ Documentation/cgroups/cgroups.txt.
 Requests by a task, using the sched_setaffinity(2) system call to
 include CPUs in its CPU affinity mask, and using the mbind(2) and
 set_mempolicy(2) system calls to include Memory Nodes in its memory
-policy, are both filtered through that tasks cpuset, filtering out any
+policy, are both filtered through that task's cpuset, filtering out any
 CPUs or Memory Nodes not in that cpuset.  The scheduler will not
 schedule a task on a CPU that is not allowed in its cpus_allowed
 vector, and the kernel page allocator will not allocate a page on a
-node that is not allowed in the requesting tasks mems_allowed vector.
+node that is not allowed in the requesting task's mems_allowed vector.
 
 User level code may create and destroy cpusets by name in the cgroup
 virtual file system, manage the attributes and permissions of these
@@ -121,9 +121,9 @@ Cpusets extends these two mechanisms as follows:
  - Each task in the system is attached to a cpuset, via a pointer
    in the task structure to a reference counted cgroup structure.
  - Calls to sched_setaffinity are filtered to just those CPUs
-   allowed in that tasks cpuset.
+   allowed in that task's cpuset.
  - Calls to mbind and set_mempolicy are filtered to just
-   those Memory Nodes allowed in that tasks cpuset.
+   those Memory Nodes allowed in that task's cpuset.
  - The root cpuset contains all the systems CPUs and Memory
    Nodes.
  - For any cpuset, one can define child cpusets containing a subset
@@ -141,11 +141,11 @@ into the rest of the kernel, none in performance critical paths:
  - in init/main.c, to initialize the root cpuset at system boot.
  - in fork and exit, to attach and detach a task from its cpuset.
  - in sched_setaffinity, to mask the requested CPUs by what's
-   allowed in that tasks cpuset.
+   allowed in that task's cpuset.
  - in sched.c migrate_live_tasks(), to keep migrating tasks within
    the CPUs allowed by their cpuset, if possible.
  - in the mbind and set_mempolicy system calls, to mask the requested
-   Memory Nodes by what's allowed in that tasks cpuset.
+   Memory Nodes by what's allowed in that task's cpuset.
  - in page_alloc.c, to restrict memory to allowed nodes.
  - in vmscan.c, to restrict page recovery to the current cpuset.
 
@@ -155,7 +155,7 @@ new system calls are added for cpusets - all support for querying and
 modifying cpusets is via this cpuset file system.
 
 The /proc/<pid>/status file for each task has four added lines,
-displaying the tasks cpus_allowed (on which CPUs it may be scheduled)
+displaying the task's cpus_allowed (on which CPUs it may be scheduled)
 and mems_allowed (on which Memory Nodes it may obtain memory),
 in the two formats seen in the following example:
 
@@ -323,17 +323,17 @@ stack segment pages of a task.
 
 By default, both kinds of memory spreading are off, and memory
 pages are allocated on the node local to where the task is running,
-except perhaps as modified by the tasks NUMA mempolicy or cpuset
+except perhaps as modified by the task's NUMA mempolicy or cpuset
 configuration, so long as sufficient free memory pages are available.
 
 When new cpusets are created, they inherit the memory spread settings
 of their parent.
 
 Setting memory spreading causes allocations for the affected page
-or slab caches to ignore the tasks NUMA mempolicy and be spread
+or slab caches to ignore the task's NUMA mempolicy and be spread
 instead.    Tasks using mbind() or set_mempolicy() calls to set NUMA
 mempolicies will not notice any change in these calls as a result of
-their containing tasks memory spread settings.  If memory spreading
+their containing task's memory spread settings.  If memory spreading
 is turned off, then the currently specified NUMA mempolicy once again
 applies to memory page allocations.
 
@@ -357,7 +357,7 @@ pages from the node returned by cpuset_mem_spread_node().
 
 The cpuset_mem_spread_node() routine is also simple.  It uses the
 value of a per-task rotor cpuset_mem_spread_rotor to select the next
-node in the current tasks mems_allowed to prefer for the allocation.
+node in the current task's mems_allowed to prefer for the allocation.
 
 This memory placement policy is also known (in other contexts) as
 round-robin or interleave.
@@ -594,7 +594,7 @@ is attached, is subtle.
 If a cpuset has its Memory Nodes modified, then for each task attached
 to that cpuset, the next time that the kernel attempts to allocate
 a page of memory for that task, the kernel will notice the change
-in the tasks cpuset, and update its per-task memory placement to
+in the task's cpuset, and update its per-task memory placement to
 remain within the new cpusets memory placement.  If the task was using
 mempolicy MPOL_BIND, and the nodes to which it was bound overlap with
 its new cpuset, then the task will continue to use whatever subset
@@ -603,13 +603,13 @@ was using MPOL_BIND and now none of its MPOL_BIND nodes are allowed
 in the new cpuset, then the task will be essentially treated as if it
 was MPOL_BIND bound to the new cpuset (even though its NUMA placement,
 as queried by get_mempolicy(), doesn't change).  If a task is moved
-from one cpuset to another, then the kernel will adjust the tasks
+from one cpuset to another, then the kernel will adjust the task's
 memory placement, as above, the next time that the kernel attempts
 to allocate a page of memory for that task.
 
 If a cpuset has its 'cpuset.cpus' modified, then each task in that cpuset
 will have its allowed CPU placement changed immediately.  Similarly,
-if a tasks pid is written to another cpusets 'cpuset.tasks' file, then its
+if a task's pid is written to another cpusets 'cpuset.tasks' file, then its
 allowed CPU placement is changed immediately.  If such a task had been
 bound to some subset of its cpuset using the sched_setaffinity() call,
 the task will be allowed to run on any CPU allowed in its new cpuset,
@@ -626,16 +626,16 @@ cpusets memory placement policy 'cpuset.mems' subsequently changes.
 If the cpuset flag file 'cpuset.memory_migrate' is set true, then when
 tasks are attached to that cpuset, any pages that task had
 allocated to it on nodes in its previous cpuset are migrated
-to the tasks new cpuset. The relative placement of the page within
+to the task's new cpuset. The relative placement of the page within
 the cpuset is preserved during these migration operations if possible.
 For example if the page was on the second valid node of the prior cpuset
 then the page will be placed on the second valid node of the new cpuset.
 
-Also if 'cpuset.memory_migrate' is set true, then if that cpusets
+Also if 'cpuset.memory_migrate' is set true, then if that cpuset's
 'cpuset.mems' file is modified, pages allocated to tasks in that
 cpuset, that were on nodes in the previous setting of 'cpuset.mems',
 will be moved to nodes in the new setting of 'mems.'
-Pages that were not in the tasks prior cpuset, or in the cpusets
+Pages that were not in the task's prior cpuset, or in the cpuset's
 prior 'cpuset.mems' setting, will not be moved.
 
 There is an exception to the above.  If hotplug functionality is used
@@ -655,7 +655,7 @@ There is a second exception to the above.  GFP_ATOMIC requests are
 kernel internal allocations that must be satisfied, immediately.
 The kernel may drop some request, in rare cases even panic, if a
 GFP_ATOMIC alloc fails.  If the request cannot be satisfied within
-the current tasks cpuset, then we relax the cpuset, and look for
+the current task's cpuset, then we relax the cpuset, and look for
 memory anywhere we can find it.  It's better to violate the cpuset
 than stress the kernel.
 
-- 
cgit v1.2.3-70-g09d2


From 84c124da9ff50bd71fab9c939ee5b7cd8bef2bd9 Mon Sep 17 00:00:00 2001
From: Divyesh Shah <dpshah@google.com>
Date: Fri, 9 Apr 2010 08:31:19 +0200
Subject: blkio: Changes to IO controller additional stats patches

that include some minor fixes and addresses all comments.

Changelog: (most based on Vivek Goyal's comments)
o renamed blkiocg_reset_write to blkiocg_reset_stats
o more clarification in the documentation on io_service_time and io_wait_time
o Initialize blkg->stats_lock
o rename io_add_stat to blkio_add_stat and declare it static
o use bool for direction and sync
o derive direction and sync info from existing rq methods
o use 12 for major:minor string length
o define io_service_time better to cover the NCQ case
o add a separate reset_stats interface
o make the indexed stats a 2d array to simplify macro and function pointer code
o blkio.time now exports in jiffies as before
o Added stats description in patch description and
  Documentation/cgroup/blkio-controller.txt
o Prefix all stats functions with blkio and make them static as applicable
o replace IO_TYPE_MAX with IO_TYPE_TOTAL
o Moved #define constant to top of blk-cgroup.c
o Pass dev_t around instead of char *
o Add note to documentation file about resetting stats
o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef
  statements
o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has
  rq_direction() and rq_sync() functions which are used by CFQ and when using
  io-controller at a higher level, bio_* functions can be added.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 Documentation/cgroups/blkio-controller.txt |  48 +++++++-
 block/blk-cgroup.c                         | 190 +++++++++++++----------------
 block/blk-cgroup.h                         |  64 ++++++----
 block/cfq-iosched.c                        |   8 +-
 include/linux/blkdev.h                     |  18 +++
 5 files changed, 198 insertions(+), 130 deletions(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 630879cd9a4..ed04fe9cce1 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -77,7 +77,6 @@ Details of cgroup files
 =======================
 - blkio.weight
 	- Specifies per cgroup weight.
-
 	  Currently allowed range of weights is from 100 to 1000.
 
 - blkio.time
@@ -92,6 +91,49 @@ Details of cgroup files
 	  third field specifies the number of sectors transferred by the
 	  group to/from the device.
 
+- blkio.io_service_bytes
+	- Number of bytes transferred to/from the disk by the group. These
+	  are further divided by the type of operation - read or write, sync
+	  or async. First two fields specify the major and minor number of the
+	  device, third field specifies the operation type and the fourth field
+	  specifies the number of bytes.
+
+- blkio.io_serviced
+	- Number of IOs completed to/from the disk by the group. These
+	  are further divided by the type of operation - read or write, sync
+	  or async. First two fields specify the major and minor number of the
+	  device, third field specifies the operation type and the fourth field
+	  specifies the number of IOs.
+
+- blkio.io_service_time
+	- Total amount of time between request dispatch and request completion
+	  for the IOs done by this cgroup. This is in nanoseconds to make it
+	  meaningful for flash devices too. For devices with queue depth of 1,
+	  this time represents the actual service time. When queue_depth > 1,
+	  that is no longer true as requests may be served out of order. This
+	  may cause the service time for a given IO to include the service time
+	  of multiple IOs when served out of order which may result in total
+	  io_service_time > actual time elapsed. This time is further divided by
+	  the type of operation - read or write, sync or async. First two fields
+	  specify the major and minor number of the device, third field
+	  specifies the operation type and the fourth field specifies the
+	  io_service_time in ns.
+
+- blkio.io_wait_time
+	- Total amount of time the IOs for this cgroup spent waiting in the
+	  scheduler queues for service. This can be greater than the total time
+	  elapsed since it is cumulative io_wait_time for all IOs. It is not a
+	  measure of total time the cgroup spent waiting but rather a measure of
+	  the wait_time for its individual IOs. For devices with queue_depth > 1
+	  this metric does not include the time spent waiting for service once
+	  the IO is dispatched to the device but till it actually gets serviced
+	  (there might be a time lag here due to re-ordering of requests by the
+	  device). This is in nanoseconds to make it meaningful for flash
+	  devices too. This time is further divided by the type of operation -
+	  read or write, sync or async. First two fields specify the major and
+	  minor number of the device, third field specifies the operation type
+	  and the fourth field specifies the io_wait_time in ns.
+
 - blkio.dequeue
 	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This
 	  gives the statistics about how many a times a group was dequeued
@@ -99,6 +141,10 @@ Details of cgroup files
 	  and minor number of the device and third field specifies the number
 	  of times a group was dequeued from a particular device.
 
+- blkio.reset_stats
+	- Writing an int to this file will result in resetting all the stats
+	  for that cgroup.
+
 CFQ sysfs tunable
 =================
 /sys/block/<disk>/queue/iosched/group_isolation
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 9af7257f429..6797df50882 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -18,6 +18,8 @@
 #include <linux/blkdev.h>
 #include "blk-cgroup.h"
 
+#define MAX_KEY_LEN 100
+
 static DEFINE_SPINLOCK(blkio_list_lock);
 static LIST_HEAD(blkio_list);
 
@@ -56,24 +58,27 @@ struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup)
 }
 EXPORT_SYMBOL_GPL(cgroup_to_blkio_cgroup);
 
+void blkio_group_init(struct blkio_group *blkg)
+{
+	spin_lock_init(&blkg->stats_lock);
+}
+EXPORT_SYMBOL_GPL(blkio_group_init);
+
 /*
  * Add to the appropriate stat variable depending on the request type.
  * This should be called with the blkg->stats_lock held.
  */
-void io_add_stat(uint64_t *stat, uint64_t add, unsigned int flags)
+static void blkio_add_stat(uint64_t *stat, uint64_t add, bool direction,
+				bool sync)
 {
-	if (flags & REQ_RW)
-		stat[IO_WRITE] += add;
+	if (direction)
+		stat[BLKIO_STAT_WRITE] += add;
 	else
-		stat[IO_READ] += add;
-	/*
-	 * Everywhere in the block layer, an IO is treated as sync if it is a
-	 * read or a SYNC write. We follow the same norm.
-	 */
-	if (!(flags & REQ_RW) || flags & REQ_RW_SYNC)
-		stat[IO_SYNC] += add;
+		stat[BLKIO_STAT_READ] += add;
+	if (sync)
+		stat[BLKIO_STAT_SYNC] += add;
 	else
-		stat[IO_ASYNC] += add;
+		stat[BLKIO_STAT_ASYNC] += add;
 }
 
 void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time)
@@ -86,23 +91,25 @@ void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time)
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used);
 
-void blkiocg_update_request_dispatch_stats(struct blkio_group *blkg,
-						struct request *rq)
+void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
+				uint64_t bytes, bool direction, bool sync)
 {
 	struct blkio_group_stats *stats;
 	unsigned long flags;
 
 	spin_lock_irqsave(&blkg->stats_lock, flags);
 	stats = &blkg->stats;
-	stats->sectors += blk_rq_sectors(rq);
-	io_add_stat(stats->io_serviced, 1, rq->cmd_flags);
-	io_add_stat(stats->io_service_bytes, blk_rq_sectors(rq) << 9,
-			rq->cmd_flags);
+	stats->sectors += bytes >> 9;
+	blkio_add_stat(stats->stat_arr[BLKIO_STAT_SERVICED], 1, direction,
+			sync);
+	blkio_add_stat(stats->stat_arr[BLKIO_STAT_SERVICE_BYTES], bytes,
+			direction, sync);
 	spin_unlock_irqrestore(&blkg->stats_lock, flags);
 }
+EXPORT_SYMBOL_GPL(blkiocg_update_dispatch_stats);
 
-void blkiocg_update_request_completion_stats(struct blkio_group *blkg,
-						struct request *rq)
+void blkiocg_update_completion_stats(struct blkio_group *blkg,
+	uint64_t start_time, uint64_t io_start_time, bool direction, bool sync)
 {
 	struct blkio_group_stats *stats;
 	unsigned long flags;
@@ -110,16 +117,15 @@ void blkiocg_update_request_completion_stats(struct blkio_group *blkg,
 
 	spin_lock_irqsave(&blkg->stats_lock, flags);
 	stats = &blkg->stats;
-	if (time_after64(now, rq->io_start_time_ns))
-		io_add_stat(stats->io_service_time, now - rq->io_start_time_ns,
-				rq->cmd_flags);
-	if (time_after64(rq->io_start_time_ns, rq->start_time_ns))
-		io_add_stat(stats->io_wait_time,
-				rq->io_start_time_ns - rq->start_time_ns,
-				rq->cmd_flags);
+	if (time_after64(now, io_start_time))
+		blkio_add_stat(stats->stat_arr[BLKIO_STAT_SERVICE_TIME],
+				now - io_start_time, direction, sync);
+	if (time_after64(io_start_time, start_time))
+		blkio_add_stat(stats->stat_arr[BLKIO_STAT_WAIT_TIME],
+				io_start_time - start_time, direction, sync);
 	spin_unlock_irqrestore(&blkg->stats_lock, flags);
 }
-EXPORT_SYMBOL_GPL(blkiocg_update_request_completion_stats);
+EXPORT_SYMBOL_GPL(blkiocg_update_completion_stats);
 
 void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
 			struct blkio_group *blkg, void *key, dev_t dev)
@@ -230,7 +236,7 @@ blkiocg_weight_write(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 }
 
 static int
-blkiocg_reset_write(struct cgroup *cgroup, struct cftype *cftype, u64 val)
+blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 {
 	struct blkio_cgroup *blkcg;
 	struct blkio_group *blkg;
@@ -249,29 +255,32 @@ blkiocg_reset_write(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 	return 0;
 }
 
-void get_key_name(int type, char *disk_id, char *str, int chars_left)
+static void blkio_get_key_name(enum stat_sub_type type, dev_t dev, char *str,
+				int chars_left, bool diskname_only)
 {
-	strlcpy(str, disk_id, chars_left);
+	snprintf(str, chars_left, "%d:%d", MAJOR(dev), MINOR(dev));
 	chars_left -= strlen(str);
 	if (chars_left <= 0) {
 		printk(KERN_WARNING
 			"Possibly incorrect cgroup stat display format");
 		return;
 	}
+	if (diskname_only)
+		return;
 	switch (type) {
-	case IO_READ:
+	case BLKIO_STAT_READ:
 		strlcat(str, " Read", chars_left);
 		break;
-	case IO_WRITE:
+	case BLKIO_STAT_WRITE:
 		strlcat(str, " Write", chars_left);
 		break;
-	case IO_SYNC:
+	case BLKIO_STAT_SYNC:
 		strlcat(str, " Sync", chars_left);
 		break;
-	case IO_ASYNC:
+	case BLKIO_STAT_ASYNC:
 		strlcat(str, " Async", chars_left);
 		break;
-	case IO_TYPE_MAX:
+	case BLKIO_STAT_TOTAL:
 		strlcat(str, " Total", chars_left);
 		break;
 	default:
@@ -279,63 +288,47 @@ void get_key_name(int type, char *disk_id, char *str, int chars_left)
 	}
 }
 
-typedef uint64_t (get_var) (struct blkio_group *, int);
+static uint64_t blkio_fill_stat(char *str, int chars_left, uint64_t val,
+				struct cgroup_map_cb *cb, dev_t dev)
+{
+	blkio_get_key_name(0, dev, str, chars_left, true);
+	cb->fill(cb, str, val);
+	return val;
+}
 
-#define MAX_KEY_LEN 100
-uint64_t get_typed_stat(struct blkio_group *blkg, struct cgroup_map_cb *cb,
-		get_var *getvar, char *disk_id)
+/* This should be called with blkg->stats_lock held */
+static uint64_t blkio_get_stat(struct blkio_group *blkg,
+		struct cgroup_map_cb *cb, dev_t dev, enum stat_type type)
 {
 	uint64_t disk_total;
 	char key_str[MAX_KEY_LEN];
-	int type;
+	enum stat_sub_type sub_type;
+
+	if (type == BLKIO_STAT_TIME)
+		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+					blkg->stats.time, cb, dev);
+	if (type == BLKIO_STAT_SECTORS)
+		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+					blkg->stats.sectors, cb, dev);
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+	if (type == BLKIO_STAT_DEQUEUE)
+		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+					blkg->stats.dequeue, cb, dev);
+#endif
 
-	for (type = 0; type < IO_TYPE_MAX; type++) {
-		get_key_name(type, disk_id, key_str, MAX_KEY_LEN);
-		cb->fill(cb, key_str, getvar(blkg, type));
+	for (sub_type = BLKIO_STAT_READ; sub_type < BLKIO_STAT_TOTAL;
+			sub_type++) {
+		blkio_get_key_name(sub_type, dev, key_str, MAX_KEY_LEN, false);
+		cb->fill(cb, key_str, blkg->stats.stat_arr[type][sub_type]);
 	}
-	disk_total = getvar(blkg, IO_READ) + getvar(blkg, IO_WRITE);
-	get_key_name(IO_TYPE_MAX, disk_id, key_str, MAX_KEY_LEN);
+	disk_total = blkg->stats.stat_arr[type][BLKIO_STAT_READ] +
+			blkg->stats.stat_arr[type][BLKIO_STAT_WRITE];
+	blkio_get_key_name(BLKIO_STAT_TOTAL, dev, key_str, MAX_KEY_LEN, false);
 	cb->fill(cb, key_str, disk_total);
 	return disk_total;
 }
 
-uint64_t get_stat(struct blkio_group *blkg, struct cgroup_map_cb *cb,
-		get_var *getvar, char *disk_id)
-{
-	uint64_t var = getvar(blkg, 0);
-	cb->fill(cb, disk_id, var);
-	return var;
-}
-
-#define GET_STAT_INDEXED(__VAR)						\
-uint64_t get_##__VAR##_stat(struct blkio_group *blkg, int type)		\
-{									\
-	return blkg->stats.__VAR[type];					\
-}									\
-
-GET_STAT_INDEXED(io_service_bytes);
-GET_STAT_INDEXED(io_serviced);
-GET_STAT_INDEXED(io_service_time);
-GET_STAT_INDEXED(io_wait_time);
-#undef GET_STAT_INDEXED
-
-#define GET_STAT(__VAR, __CONV)						\
-uint64_t get_##__VAR##_stat(struct blkio_group *blkg, int dummy)	\
-{									\
-	uint64_t data = blkg->stats.__VAR;				\
-	if (__CONV)							\
-		data = (uint64_t)jiffies_to_msecs(data) * NSEC_PER_MSEC;\
-	return data;							\
-}
-
-GET_STAT(time, 1);
-GET_STAT(sectors, 0);
-#ifdef CONFIG_DEBUG_BLK_CGROUP
-GET_STAT(dequeue, 0);
-#endif
-#undef GET_STAT
-
-#define SHOW_FUNCTION_PER_GROUP(__VAR, get_stats, getvar, show_total)	\
+#define SHOW_FUNCTION_PER_GROUP(__VAR, type, show_total)		\
 static int blkiocg_##__VAR##_read(struct cgroup *cgroup,		\
 		struct cftype *cftype, struct cgroup_map_cb *cb)	\
 {									\
@@ -343,7 +336,6 @@ static int blkiocg_##__VAR##_read(struct cgroup *cgroup,		\
 	struct blkio_group *blkg;					\
 	struct hlist_node *n;						\
 	uint64_t cgroup_total = 0;					\
-	char disk_id[10];						\
 									\
 	if (!cgroup_lock_live_group(cgroup))				\
 		return -ENODEV;						\
@@ -353,10 +345,8 @@ static int blkiocg_##__VAR##_read(struct cgroup *cgroup,		\
 	hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node) {\
 		if (blkg->dev) {					\
 			spin_lock_irq(&blkg->stats_lock);		\
-			snprintf(disk_id, 10, "%u:%u", MAJOR(blkg->dev),\
-					MINOR(blkg->dev));		\
-			cgroup_total += get_stats(blkg, cb, getvar,	\
-						disk_id);		\
+			cgroup_total += blkio_get_stat(blkg, cb,	\
+						blkg->dev, type);	\
 			spin_unlock_irq(&blkg->stats_lock);		\
 		}							\
 	}								\
@@ -367,16 +357,14 @@ static int blkiocg_##__VAR##_read(struct cgroup *cgroup,		\
 	return 0;							\
 }
 
-SHOW_FUNCTION_PER_GROUP(time, get_stat, get_time_stat, 0);
-SHOW_FUNCTION_PER_GROUP(sectors, get_stat, get_sectors_stat, 0);
-SHOW_FUNCTION_PER_GROUP(io_service_bytes, get_typed_stat,
-			get_io_service_bytes_stat, 1);
-SHOW_FUNCTION_PER_GROUP(io_serviced, get_typed_stat, get_io_serviced_stat, 1);
-SHOW_FUNCTION_PER_GROUP(io_service_time, get_typed_stat,
-			get_io_service_time_stat, 1);
-SHOW_FUNCTION_PER_GROUP(io_wait_time, get_typed_stat, get_io_wait_time_stat, 1);
+SHOW_FUNCTION_PER_GROUP(time, BLKIO_STAT_TIME, 0);
+SHOW_FUNCTION_PER_GROUP(sectors, BLKIO_STAT_SECTORS, 0);
+SHOW_FUNCTION_PER_GROUP(io_service_bytes, BLKIO_STAT_SERVICE_BYTES, 1);
+SHOW_FUNCTION_PER_GROUP(io_serviced, BLKIO_STAT_SERVICED, 1);
+SHOW_FUNCTION_PER_GROUP(io_service_time, BLKIO_STAT_SERVICE_TIME, 1);
+SHOW_FUNCTION_PER_GROUP(io_wait_time, BLKIO_STAT_WAIT_TIME, 1);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
-SHOW_FUNCTION_PER_GROUP(dequeue, get_stat, get_dequeue_stat, 0);
+SHOW_FUNCTION_PER_GROUP(dequeue, BLKIO_STAT_DEQUEUE, 0);
 #endif
 #undef SHOW_FUNCTION_PER_GROUP
 
@@ -398,32 +386,30 @@ struct cftype blkio_files[] = {
 	{
 		.name = "time",
 		.read_map = blkiocg_time_read,
-		.write_u64 = blkiocg_reset_write,
 	},
 	{
 		.name = "sectors",
 		.read_map = blkiocg_sectors_read,
-		.write_u64 = blkiocg_reset_write,
 	},
 	{
 		.name = "io_service_bytes",
 		.read_map = blkiocg_io_service_bytes_read,
-		.write_u64 = blkiocg_reset_write,
 	},
 	{
 		.name = "io_serviced",
 		.read_map = blkiocg_io_serviced_read,
-		.write_u64 = blkiocg_reset_write,
 	},
 	{
 		.name = "io_service_time",
 		.read_map = blkiocg_io_service_time_read,
-		.write_u64 = blkiocg_reset_write,
 	},
 	{
 		.name = "io_wait_time",
 		.read_map = blkiocg_io_wait_time_read,
-		.write_u64 = blkiocg_reset_write,
+	},
+	{
+		.name = "reset_stats",
+		.write_u64 = blkiocg_reset_stats,
 	},
 #ifdef CONFIG_DEBUG_BLK_CGROUP
        {
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 80010ef64ab..b22e55390a4 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -23,12 +23,31 @@ extern struct cgroup_subsys blkio_subsys;
 #define blkio_subsys_id blkio_subsys.subsys_id
 #endif
 
-enum io_type {
-	IO_READ = 0,
-	IO_WRITE,
-	IO_SYNC,
-	IO_ASYNC,
-	IO_TYPE_MAX
+enum stat_type {
+	/* Total time spent (in ns) between request dispatch to the driver and
+	 * request completion for IOs doen by this cgroup. This may not be
+	 * accurate when NCQ is turned on. */
+	BLKIO_STAT_SERVICE_TIME = 0,
+	/* Total bytes transferred */
+	BLKIO_STAT_SERVICE_BYTES,
+	/* Total IOs serviced, post merge */
+	BLKIO_STAT_SERVICED,
+	/* Total time spent waiting in scheduler queue in ns */
+	BLKIO_STAT_WAIT_TIME,
+	/* All the single valued stats go below this */
+	BLKIO_STAT_TIME,
+	BLKIO_STAT_SECTORS,
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+	BLKIO_STAT_DEQUEUE
+#endif
+};
+
+enum stat_sub_type {
+	BLKIO_STAT_READ = 0,
+	BLKIO_STAT_WRITE,
+	BLKIO_STAT_SYNC,
+	BLKIO_STAT_ASYNC,
+	BLKIO_STAT_TOTAL
 };
 
 struct blkio_cgroup {
@@ -42,13 +61,7 @@ struct blkio_group_stats {
 	/* total disk time and nr sectors dispatched by this group */
 	uint64_t time;
 	uint64_t sectors;
-	/* Total disk time used by IOs in ns */
-	uint64_t io_service_time[IO_TYPE_MAX];
-	uint64_t io_service_bytes[IO_TYPE_MAX]; /* Total bytes transferred */
-	/* Total IOs serviced, post merge */
-	uint64_t io_serviced[IO_TYPE_MAX];
-	/* Total time spent waiting in scheduler queue in ns */
-	uint64_t io_wait_time[IO_TYPE_MAX];
+	uint64_t stat_arr[BLKIO_STAT_WAIT_TIME + 1][BLKIO_STAT_TOTAL];
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 	/* How many times this group has been removed from service tree */
 	unsigned long dequeue;
@@ -65,7 +78,7 @@ struct blkio_group {
 	char path[128];
 #endif
 	/* The device MKDEV(major, minor), this group has been created for */
-	dev_t   dev;
+	dev_t dev;
 
 	/* Need to serialize the stats in the case of reset/update */
 	spinlock_t stats_lock;
@@ -128,21 +141,21 @@ extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
 extern int blkiocg_del_blkio_group(struct blkio_group *blkg);
 extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
 						void *key);
+void blkio_group_init(struct blkio_group *blkg);
 void blkiocg_update_timeslice_used(struct blkio_group *blkg,
 					unsigned long time);
-void blkiocg_update_request_dispatch_stats(struct blkio_group *blkg,
-						struct request *rq);
-void blkiocg_update_request_completion_stats(struct blkio_group *blkg,
-						struct request *rq);
+void blkiocg_update_dispatch_stats(struct blkio_group *blkg, uint64_t bytes,
+						bool direction, bool sync);
+void blkiocg_update_completion_stats(struct blkio_group *blkg,
+	uint64_t start_time, uint64_t io_start_time, bool direction, bool sync);
 #else
 struct cgroup;
 static inline struct blkio_cgroup *
 cgroup_to_blkio_cgroup(struct cgroup *cgroup) { return NULL; }
 
+static inline void blkio_group_init(struct blkio_group *blkg) {}
 static inline void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-			struct blkio_group *blkg, void *key, dev_t dev)
-{
-}
+			struct blkio_group *blkg, void *key, dev_t dev) {}
 
 static inline int
 blkiocg_del_blkio_group(struct blkio_group *blkg) { return 0; }
@@ -151,9 +164,10 @@ static inline struct blkio_group *
 blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; }
 static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg,
 						unsigned long time) {}
-static inline void blkiocg_update_request_dispatch_stats(
-		struct blkio_group *blkg, struct request *rq) {}
-static inline void blkiocg_update_request_completion_stats(
-		struct blkio_group *blkg, struct request *rq) {}
+static inline void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
+				uint64_t bytes, bool direction, bool sync) {}
+static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
+		uint64_t start_time, uint64_t io_start_time, bool direction,
+		bool sync) {}
 #endif
 #endif /* _BLK_CGROUP_H */
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 42028e7128a..5617ae030b1 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -955,6 +955,7 @@ cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct cgroup *cgroup, int create)
 	for_each_cfqg_st(cfqg, i, j, st)
 		*st = CFQ_RB_ROOT;
 	RB_CLEAR_NODE(&cfqg->rb_node);
+	blkio_group_init(&cfqg->blkg);
 
 	/*
 	 * Take the initial reference that will be released on destroy
@@ -1865,7 +1866,8 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
 	elv_dispatch_sort(q, rq);
 
 	cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++;
-	blkiocg_update_request_dispatch_stats(&cfqq->cfqg->blkg, rq);
+	blkiocg_update_dispatch_stats(&cfqq->cfqg->blkg, blk_rq_bytes(rq),
+					rq_data_dir(rq), rq_is_sync(rq));
 }
 
 /*
@@ -3286,7 +3288,9 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
 	WARN_ON(!cfqq->dispatched);
 	cfqd->rq_in_driver--;
 	cfqq->dispatched--;
-	blkiocg_update_request_completion_stats(&cfqq->cfqg->blkg, rq);
+	blkiocg_update_completion_stats(&cfqq->cfqg->blkg, rq_start_time_ns(rq),
+			rq_io_start_time_ns(rq), rq_data_dir(rq),
+			rq_is_sync(rq));
 
 	cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]--;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f3fff8bf85e..d483c494672 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1209,9 +1209,27 @@ static inline void set_io_start_time_ns(struct request *req)
 {
 	req->io_start_time_ns = sched_clock();
 }
+
+static inline uint64_t rq_start_time_ns(struct request *req)
+{
+        return req->start_time_ns;
+}
+
+static inline uint64_t rq_io_start_time_ns(struct request *req)
+{
+        return req->io_start_time_ns;
+}
 #else
 static inline void set_start_time_ns(struct request *req) {}
 static inline void set_io_start_time_ns(struct request *req) {}
+static inline uint64_t rq_start_time_ns(struct request *req)
+{
+	return 0;
+}
+static inline uint64_t rq_io_start_time_ns(struct request *req)
+{
+	return 0;
+}
 #endif
 
 #define MODULE_ALIAS_BLOCKDEV(major,minor) \
-- 
cgit v1.2.3-70-g09d2


From 812d402648f4fc1ab1091b2172a46fc1b367c724 Mon Sep 17 00:00:00 2001
From: Divyesh Shah <dpshah@google.com>
Date: Thu, 8 Apr 2010 21:14:23 -0700
Subject: blkio: Add io_merged stat

This includes both the number of bios merged into requests belonging to this
cgroup as well as the number of requests merged together.
In the past, we've observed different merging behavior across upstream kernels,
some by design some actual bugs. This stat helps a lot in debugging such
problems when applications report decreased throughput with a new kernel
version.

This needed adding an extra elevator function to capture bios being merged as I
did not want to pollute elevator code with blkiocg knowledge and hence needed
the accounting invocation to come from CFQ.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 Documentation/cgroups/blkio-controller.txt |  5 +++++
 block/blk-cgroup.c                         | 17 +++++++++++++++++
 block/blk-cgroup.h                         |  8 +++++++-
 block/blk-core.c                           |  2 ++
 block/cfq-iosched.c                        | 11 +++++++++++
 block/elevator.c                           |  9 +++++++++
 include/linux/elevator.h                   |  6 ++++++
 7 files changed, 57 insertions(+), 1 deletion(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index ed04fe9cce1..810e30171a5 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -134,6 +134,11 @@ Details of cgroup files
 	  minor number of the device, third field specifies the operation type
 	  and the fourth field specifies the io_wait_time in ns.
 
+- blkio.io_merged
+	- Total number of bios/requests merged into requests belonging to this
+	  cgroup. This is further divided by the type of operation - read or
+	  write, sync or async.
+
 - blkio.dequeue
 	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This
 	  gives the statistics about how many a times a group was dequeued
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 6797df50882..d23b538858c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -127,6 +127,18 @@ void blkiocg_update_completion_stats(struct blkio_group *blkg,
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_completion_stats);
 
+void blkiocg_update_io_merged_stats(struct blkio_group *blkg, bool direction,
+					bool sync)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&blkg->stats_lock, flags);
+	blkio_add_stat(blkg->stats.stat_arr[BLKIO_STAT_MERGED], 1, direction,
+			sync);
+	spin_unlock_irqrestore(&blkg->stats_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blkiocg_update_io_merged_stats);
+
 void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
 			struct blkio_group *blkg, void *key, dev_t dev)
 {
@@ -363,6 +375,7 @@ SHOW_FUNCTION_PER_GROUP(io_service_bytes, BLKIO_STAT_SERVICE_BYTES, 1);
 SHOW_FUNCTION_PER_GROUP(io_serviced, BLKIO_STAT_SERVICED, 1);
 SHOW_FUNCTION_PER_GROUP(io_service_time, BLKIO_STAT_SERVICE_TIME, 1);
 SHOW_FUNCTION_PER_GROUP(io_wait_time, BLKIO_STAT_WAIT_TIME, 1);
+SHOW_FUNCTION_PER_GROUP(io_merged, BLKIO_STAT_MERGED, 1);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 SHOW_FUNCTION_PER_GROUP(dequeue, BLKIO_STAT_DEQUEUE, 0);
 #endif
@@ -407,6 +420,10 @@ struct cftype blkio_files[] = {
 		.name = "io_wait_time",
 		.read_map = blkiocg_io_wait_time_read,
 	},
+	{
+		.name = "io_merged",
+		.read_map = blkiocg_io_merged_read,
+	},
 	{
 		.name = "reset_stats",
 		.write_u64 = blkiocg_reset_stats,
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index b22e55390a4..470a29db6be 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -34,6 +34,8 @@ enum stat_type {
 	BLKIO_STAT_SERVICED,
 	/* Total time spent waiting in scheduler queue in ns */
 	BLKIO_STAT_WAIT_TIME,
+	/* Number of IOs merged */
+	BLKIO_STAT_MERGED,
 	/* All the single valued stats go below this */
 	BLKIO_STAT_TIME,
 	BLKIO_STAT_SECTORS,
@@ -61,7 +63,7 @@ struct blkio_group_stats {
 	/* total disk time and nr sectors dispatched by this group */
 	uint64_t time;
 	uint64_t sectors;
-	uint64_t stat_arr[BLKIO_STAT_WAIT_TIME + 1][BLKIO_STAT_TOTAL];
+	uint64_t stat_arr[BLKIO_STAT_MERGED + 1][BLKIO_STAT_TOTAL];
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 	/* How many times this group has been removed from service tree */
 	unsigned long dequeue;
@@ -148,6 +150,8 @@ void blkiocg_update_dispatch_stats(struct blkio_group *blkg, uint64_t bytes,
 						bool direction, bool sync);
 void blkiocg_update_completion_stats(struct blkio_group *blkg,
 	uint64_t start_time, uint64_t io_start_time, bool direction, bool sync);
+void blkiocg_update_io_merged_stats(struct blkio_group *blkg, bool direction,
+					bool sync);
 #else
 struct cgroup;
 static inline struct blkio_cgroup *
@@ -169,5 +173,7 @@ static inline void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
 static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
 		uint64_t start_time, uint64_t io_start_time, bool direction,
 		bool sync) {}
+static inline void blkiocg_update_io_merged_stats(struct blkio_group *blkg,
+						bool direction, bool sync) {}
 #endif
 #endif /* _BLK_CGROUP_H */
diff --git a/block/blk-core.c b/block/blk-core.c
index 4b1b29ef2cb..e9a5ae25db8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1202,6 +1202,7 @@ static int __make_request(struct request_queue *q, struct bio *bio)
 		if (!blk_rq_cpu_valid(req))
 			req->cpu = bio->bi_comp_cpu;
 		drive_stat_acct(req, 0);
+		elv_bio_merged(q, req, bio);
 		if (!attempt_back_merge(q, req))
 			elv_merged_request(q, req, el_ret);
 		goto out;
@@ -1235,6 +1236,7 @@ static int __make_request(struct request_queue *q, struct bio *bio)
 		if (!blk_rq_cpu_valid(req))
 			req->cpu = bio->bi_comp_cpu;
 		drive_stat_acct(req, 0);
+		elv_bio_merged(q, req, bio);
 		if (!attempt_front_merge(q, req))
 			elv_merged_request(q, req, el_ret);
 		goto out;
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 5617ae030b1..4eb1906cf6c 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1467,6 +1467,14 @@ static void cfq_merged_request(struct request_queue *q, struct request *req,
 	}
 }
 
+static void cfq_bio_merged(struct request_queue *q, struct request *req,
+				struct bio *bio)
+{
+	struct cfq_queue *cfqq = RQ_CFQQ(req);
+	blkiocg_update_io_merged_stats(&cfqq->cfqg->blkg, bio_data_dir(bio),
+					cfq_bio_sync(bio));
+}
+
 static void
 cfq_merged_requests(struct request_queue *q, struct request *rq,
 		    struct request *next)
@@ -1484,6 +1492,8 @@ cfq_merged_requests(struct request_queue *q, struct request *rq,
 	if (cfqq->next_rq == next)
 		cfqq->next_rq = rq;
 	cfq_remove_request(next);
+	blkiocg_update_io_merged_stats(&cfqq->cfqg->blkg, rq_data_dir(next),
+					rq_is_sync(next));
 }
 
 static int cfq_allow_merge(struct request_queue *q, struct request *rq,
@@ -3861,6 +3871,7 @@ static struct elevator_type iosched_cfq = {
 		.elevator_merged_fn =		cfq_merged_request,
 		.elevator_merge_req_fn =	cfq_merged_requests,
 		.elevator_allow_merge_fn =	cfq_allow_merge,
+		.elevator_bio_merged_fn =	cfq_bio_merged,
 		.elevator_dispatch_fn =		cfq_dispatch_requests,
 		.elevator_add_req_fn =		cfq_insert_request,
 		.elevator_activate_req_fn =	cfq_activate_request,
diff --git a/block/elevator.c b/block/elevator.c
index 76e3702d538..5e734592bb4 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -539,6 +539,15 @@ void elv_merge_requests(struct request_queue *q, struct request *rq,
 	q->last_merge = rq;
 }
 
+void elv_bio_merged(struct request_queue *q, struct request *rq,
+			struct bio *bio)
+{
+	struct elevator_queue *e = q->elevator;
+
+	if (e->ops->elevator_bio_merged_fn)
+		e->ops->elevator_bio_merged_fn(q, rq, bio);
+}
+
 void elv_requeue_request(struct request_queue *q, struct request *rq)
 {
 	/*
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 1cb3372e65d..2c958f4fce1 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -14,6 +14,9 @@ typedef void (elevator_merged_fn) (struct request_queue *, struct request *, int
 
 typedef int (elevator_allow_merge_fn) (struct request_queue *, struct request *, struct bio *);
 
+typedef void (elevator_bio_merged_fn) (struct request_queue *,
+						struct request *, struct bio *);
+
 typedef int (elevator_dispatch_fn) (struct request_queue *, int);
 
 typedef void (elevator_add_req_fn) (struct request_queue *, struct request *);
@@ -36,6 +39,7 @@ struct elevator_ops
 	elevator_merged_fn *elevator_merged_fn;
 	elevator_merge_req_fn *elevator_merge_req_fn;
 	elevator_allow_merge_fn *elevator_allow_merge_fn;
+	elevator_bio_merged_fn *elevator_bio_merged_fn;
 
 	elevator_dispatch_fn *elevator_dispatch_fn;
 	elevator_add_req_fn *elevator_add_req_fn;
@@ -103,6 +107,8 @@ extern int elv_merge(struct request_queue *, struct request **, struct bio *);
 extern void elv_merge_requests(struct request_queue *, struct request *,
 			       struct request *);
 extern void elv_merged_request(struct request_queue *, struct request *, int);
+extern void elv_bio_merged(struct request_queue *q, struct request *,
+				struct bio *);
 extern void elv_requeue_request(struct request_queue *, struct request *);
 extern int elv_queue_empty(struct request_queue *);
 extern struct request *elv_former_request(struct request_queue *, struct request *);
-- 
cgit v1.2.3-70-g09d2


From cdc1184cf4a7bd99f5473a91244197accc49146b Mon Sep 17 00:00:00 2001
From: Divyesh Shah <dpshah@google.com>
Date: Thu, 8 Apr 2010 21:15:10 -0700
Subject: blkio: Add io_queued and avg_queue_size stats

These stats are useful for getting a feel for the queue depth of the cgroup,
i.e., how filled up its queues are at a given instant and over the existence of
the cgroup. This ability is useful when debugging problems in the wild as it
helps understand the application's IO pattern w/o having to read through the
userspace code (coz its tedious or just not available) or w/o the ability
to run blktrace (since you may not have root access and/or not want to disturb
performance).

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 Documentation/cgroups/blkio-controller.txt | 11 ++++
 block/blk-cgroup.c                         | 98 ++++++++++++++++++++++++++++--
 block/blk-cgroup.h                         | 20 +++++-
 block/cfq-iosched.c                        | 11 ++++
 4 files changed, 134 insertions(+), 6 deletions(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 810e30171a5..6e52e7c512a 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -139,6 +139,17 @@ Details of cgroup files
 	  cgroup. This is further divided by the type of operation - read or
 	  write, sync or async.
 
+- blkio.io_queued
+	- Total number of requests queued up at any given instant for this
+	  cgroup. This is further divided by the type of operation - read or
+	  write, sync or async.
+
+- blkio.avg_queue_size
+	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y.
+	  The average queue size for this cgroup over the entire time of this
+	  cgroup's existence. Queue size samples are taken each time one of the
+	  queues of this cgroup gets a timeslice.
+
 - blkio.dequeue
 	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This
 	  gives the statistics about how many a times a group was dequeued
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index d23b538858c..1e0c4970b35 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -81,6 +81,71 @@ static void blkio_add_stat(uint64_t *stat, uint64_t add, bool direction,
 		stat[BLKIO_STAT_ASYNC] += add;
 }
 
+/*
+ * Decrements the appropriate stat variable if non-zero depending on the
+ * request type. Panics on value being zero.
+ * This should be called with the blkg->stats_lock held.
+ */
+static void blkio_check_and_dec_stat(uint64_t *stat, bool direction, bool sync)
+{
+	if (direction) {
+		BUG_ON(stat[BLKIO_STAT_WRITE] == 0);
+		stat[BLKIO_STAT_WRITE]--;
+	} else {
+		BUG_ON(stat[BLKIO_STAT_READ] == 0);
+		stat[BLKIO_STAT_READ]--;
+	}
+	if (sync) {
+		BUG_ON(stat[BLKIO_STAT_SYNC] == 0);
+		stat[BLKIO_STAT_SYNC]--;
+	} else {
+		BUG_ON(stat[BLKIO_STAT_ASYNC] == 0);
+		stat[BLKIO_STAT_ASYNC]--;
+	}
+}
+
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+void blkiocg_update_set_active_queue_stats(struct blkio_group *blkg)
+{
+	unsigned long flags;
+	struct blkio_group_stats *stats;
+
+	spin_lock_irqsave(&blkg->stats_lock, flags);
+	stats = &blkg->stats;
+	stats->avg_queue_size_sum +=
+			stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_READ] +
+			stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_WRITE];
+	stats->avg_queue_size_samples++;
+	spin_unlock_irqrestore(&blkg->stats_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blkiocg_update_set_active_queue_stats);
+#endif
+
+void blkiocg_update_request_add_stats(struct blkio_group *blkg,
+			struct blkio_group *curr_blkg, bool direction,
+			bool sync)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&blkg->stats_lock, flags);
+	blkio_add_stat(blkg->stats.stat_arr[BLKIO_STAT_QUEUED], 1, direction,
+			sync);
+	spin_unlock_irqrestore(&blkg->stats_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blkiocg_update_request_add_stats);
+
+void blkiocg_update_request_remove_stats(struct blkio_group *blkg,
+						bool direction, bool sync)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&blkg->stats_lock, flags);
+	blkio_check_and_dec_stat(blkg->stats.stat_arr[BLKIO_STAT_QUEUED],
+					direction, sync);
+	spin_unlock_irqrestore(&blkg->stats_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blkiocg_update_request_remove_stats);
+
 void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time)
 {
 	unsigned long flags;
@@ -253,14 +318,18 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 	struct blkio_cgroup *blkcg;
 	struct blkio_group *blkg;
 	struct hlist_node *n;
-	struct blkio_group_stats *stats;
+	uint64_t queued[BLKIO_STAT_TOTAL];
+	int i;
 
 	blkcg = cgroup_to_blkio_cgroup(cgroup);
 	spin_lock_irq(&blkcg->lock);
 	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
 		spin_lock(&blkg->stats_lock);
-		stats = &blkg->stats;
-		memset(stats, 0, sizeof(struct blkio_group_stats));
+		for (i = 0; i < BLKIO_STAT_TOTAL; i++)
+			queued[i] = blkg->stats.stat_arr[BLKIO_STAT_QUEUED][i];
+		memset(&blkg->stats, 0, sizeof(struct blkio_group_stats));
+		for (i = 0; i < BLKIO_STAT_TOTAL; i++)
+			blkg->stats.stat_arr[BLKIO_STAT_QUEUED][i] = queued[i];
 		spin_unlock(&blkg->stats_lock);
 	}
 	spin_unlock_irq(&blkcg->lock);
@@ -323,6 +392,15 @@ static uint64_t blkio_get_stat(struct blkio_group *blkg,
 		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
 					blkg->stats.sectors, cb, dev);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
+	if (type == BLKIO_STAT_AVG_QUEUE_SIZE) {
+		uint64_t sum = blkg->stats.avg_queue_size_sum;
+		uint64_t samples = blkg->stats.avg_queue_size_samples;
+		if (samples)
+			do_div(sum, samples);
+		else
+			sum = 0;
+		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1, sum, cb, dev);
+	}
 	if (type == BLKIO_STAT_DEQUEUE)
 		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
 					blkg->stats.dequeue, cb, dev);
@@ -376,8 +454,10 @@ SHOW_FUNCTION_PER_GROUP(io_serviced, BLKIO_STAT_SERVICED, 1);
 SHOW_FUNCTION_PER_GROUP(io_service_time, BLKIO_STAT_SERVICE_TIME, 1);
 SHOW_FUNCTION_PER_GROUP(io_wait_time, BLKIO_STAT_WAIT_TIME, 1);
 SHOW_FUNCTION_PER_GROUP(io_merged, BLKIO_STAT_MERGED, 1);
+SHOW_FUNCTION_PER_GROUP(io_queued, BLKIO_STAT_QUEUED, 1);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 SHOW_FUNCTION_PER_GROUP(dequeue, BLKIO_STAT_DEQUEUE, 0);
+SHOW_FUNCTION_PER_GROUP(avg_queue_size, BLKIO_STAT_AVG_QUEUE_SIZE, 0);
 #endif
 #undef SHOW_FUNCTION_PER_GROUP
 
@@ -424,15 +504,23 @@ struct cftype blkio_files[] = {
 		.name = "io_merged",
 		.read_map = blkiocg_io_merged_read,
 	},
+	{
+		.name = "io_queued",
+		.read_map = blkiocg_io_queued_read,
+	},
 	{
 		.name = "reset_stats",
 		.write_u64 = blkiocg_reset_stats,
 	},
 #ifdef CONFIG_DEBUG_BLK_CGROUP
-       {
+	{
+		.name = "avg_queue_size",
+		.read_map = blkiocg_avg_queue_size_read,
+	},
+	{
 		.name = "dequeue",
 		.read_map = blkiocg_dequeue_read,
-       },
+	},
 #endif
 };
 
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 470a29db6be..bea7f3b9a88 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -36,10 +36,13 @@ enum stat_type {
 	BLKIO_STAT_WAIT_TIME,
 	/* Number of IOs merged */
 	BLKIO_STAT_MERGED,
+	/* Number of IOs queued up */
+	BLKIO_STAT_QUEUED,
 	/* All the single valued stats go below this */
 	BLKIO_STAT_TIME,
 	BLKIO_STAT_SECTORS,
 #ifdef CONFIG_DEBUG_BLK_CGROUP
+	BLKIO_STAT_AVG_QUEUE_SIZE,
 	BLKIO_STAT_DEQUEUE
 #endif
 };
@@ -63,8 +66,12 @@ struct blkio_group_stats {
 	/* total disk time and nr sectors dispatched by this group */
 	uint64_t time;
 	uint64_t sectors;
-	uint64_t stat_arr[BLKIO_STAT_MERGED + 1][BLKIO_STAT_TOTAL];
+	uint64_t stat_arr[BLKIO_STAT_QUEUED + 1][BLKIO_STAT_TOTAL];
 #ifdef CONFIG_DEBUG_BLK_CGROUP
+	/* Sum of number of IOs queued across all samples */
+	uint64_t avg_queue_size_sum;
+	/* Count of samples taken for average */
+	uint64_t avg_queue_size_samples;
 	/* How many times this group has been removed from service tree */
 	unsigned long dequeue;
 #endif
@@ -127,10 +134,13 @@ static inline char *blkg_path(struct blkio_group *blkg)
 {
 	return blkg->path;
 }
+void blkiocg_update_set_active_queue_stats(struct blkio_group *blkg);
 void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
 				unsigned long dequeue);
 #else
 static inline char *blkg_path(struct blkio_group *blkg) { return NULL; }
+static inline void blkiocg_update_set_active_queue_stats(
+						struct blkio_group *blkg) {}
 static inline void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
 						unsigned long dequeue) {}
 #endif
@@ -152,6 +162,10 @@ void blkiocg_update_completion_stats(struct blkio_group *blkg,
 	uint64_t start_time, uint64_t io_start_time, bool direction, bool sync);
 void blkiocg_update_io_merged_stats(struct blkio_group *blkg, bool direction,
 					bool sync);
+void blkiocg_update_request_add_stats(struct blkio_group *blkg,
+		struct blkio_group *curr_blkg, bool direction, bool sync);
+void blkiocg_update_request_remove_stats(struct blkio_group *blkg,
+					bool direction, bool sync);
 #else
 struct cgroup;
 static inline struct blkio_cgroup *
@@ -175,5 +189,9 @@ static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
 		bool sync) {}
 static inline void blkiocg_update_io_merged_stats(struct blkio_group *blkg,
 						bool direction, bool sync) {}
+static inline void blkiocg_update_request_add_stats(struct blkio_group *blkg,
+		struct blkio_group *curr_blkg, bool direction, bool sync) {}
+static inline void blkiocg_update_request_remove_stats(struct blkio_group *blkg,
+						bool direction, bool sync) {}
 #endif
 #endif /* _BLK_CGROUP_H */
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 4eb1906cf6c..8e0b86a9111 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1380,7 +1380,12 @@ static void cfq_reposition_rq_rb(struct cfq_queue *cfqq, struct request *rq)
 {
 	elv_rb_del(&cfqq->sort_list, rq);
 	cfqq->queued[rq_is_sync(rq)]--;
+	blkiocg_update_request_remove_stats(&cfqq->cfqg->blkg, rq_data_dir(rq),
+						rq_is_sync(rq));
 	cfq_add_rq_rb(rq);
+	blkiocg_update_request_add_stats(
+			&cfqq->cfqg->blkg, &cfqq->cfqd->serving_group->blkg,
+			rq_data_dir(rq), rq_is_sync(rq));
 }
 
 static struct request *
@@ -1436,6 +1441,8 @@ static void cfq_remove_request(struct request *rq)
 	cfq_del_rq_rb(rq);
 
 	cfqq->cfqd->rq_queued--;
+	blkiocg_update_request_remove_stats(&cfqq->cfqg->blkg, rq_data_dir(rq),
+						rq_is_sync(rq));
 	if (rq_is_meta(rq)) {
 		WARN_ON(!cfqq->meta_pending);
 		cfqq->meta_pending--;
@@ -1527,6 +1534,7 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd,
 	if (cfqq) {
 		cfq_log_cfqq(cfqd, cfqq, "set_active wl_prio:%d wl_type:%d",
 				cfqd->serving_prio, cfqd->serving_type);
+		blkiocg_update_set_active_queue_stats(&cfqq->cfqg->blkg);
 		cfqq->slice_start = 0;
 		cfqq->dispatch_start = jiffies;
 		cfqq->allocated_slice = 0;
@@ -3213,6 +3221,9 @@ static void cfq_insert_request(struct request_queue *q, struct request *rq)
 	list_add_tail(&rq->queuelist, &cfqq->fifo);
 	cfq_add_rq_rb(rq);
 
+	blkiocg_update_request_add_stats(&cfqq->cfqg->blkg,
+			&cfqd->serving_group->blkg, rq_data_dir(rq),
+			rq_is_sync(rq));
 	cfq_rq_enqueued(cfqd, cfqq, rq);
 }
 
-- 
cgit v1.2.3-70-g09d2


From 812df48d127365ffd0869aa139738f572a86759c Mon Sep 17 00:00:00 2001
From: Divyesh Shah <dpshah@google.com>
Date: Thu, 8 Apr 2010 21:15:35 -0700
Subject: blkio: Add more debug-only per-cgroup stats

1) group_wait_time - This is the amount of time the cgroup had to wait to get a
  timeslice for one of its queues from when it became busy, i.e., went from 0
  to 1 request queued. This is different from the io_wait_time which is the
  cumulative total of the amount of time spent by each IO in that cgroup waiting
  in the scheduler queue. This stat is a great way to find out any jobs in the
  fleet that are being starved or waiting for longer than what is expected (due
  to an IO controller bug or any other issue).
2) empty_time - This is the amount of time a cgroup spends w/o any pending
   requests. This stat is useful when a job does not seem to be able to use its
   assigned disk share by helping check if that is happening due to an IO
   controller bug or because the job is not submitting enough IOs.
3) idle_time - This is the amount of time spent by the IO scheduler idling
   for a given cgroup in anticipation of a better request than the exising ones
   from other queues/cgroups.

All these stats are recorded using start and stop events. When reading these
stats, we do not add the delta between the current time and the last start time
if we're between the start and stop events. We avoid doing this to make sure
that these numbers are always monotonically increasing when read. Since we're
using sched_clock() which may use the tsc as its source, it may induce some
inconsistency (due to tsc resync across cpus) if we included the current delta.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 Documentation/cgroups/blkio-controller.txt |  29 ++++++
 block/blk-cgroup.c                         | 159 ++++++++++++++++++++++++++++-
 block/blk-cgroup.h                         |  54 ++++++++++
 block/cfq-iosched.c                        |  50 +++++----
 4 files changed, 271 insertions(+), 21 deletions(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 6e52e7c512a..db054ea3e7f 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -150,6 +150,35 @@ Details of cgroup files
 	  cgroup's existence. Queue size samples are taken each time one of the
 	  queues of this cgroup gets a timeslice.
 
+- blkio.group_wait_time
+	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y.
+	  This is the amount of time the cgroup had to wait since it became busy
+	  (i.e., went from 0 to 1 request queued) to get a timeslice for one of
+	  its queues. This is different from the io_wait_time which is the
+	  cumulative total of the amount of time spent by each IO in that cgroup
+	  waiting in the scheduler queue. This is in nanoseconds. If this is
+	  read when the cgroup is in a waiting (for timeslice) state, the stat
+	  will only report the group_wait_time accumulated till the last time it
+	  got a timeslice and will not include the current delta.
+
+- blkio.empty_time
+	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y.
+	  This is the amount of time a cgroup spends without any pending
+	  requests when not being served, i.e., it does not include any time
+	  spent idling for one of the queues of the cgroup. This is in
+	  nanoseconds. If this is read when the cgroup is in an empty state,
+	  the stat will only report the empty_time accumulated till the last
+	  time it had a pending request and will not include the current delta.
+
+- blkio.idle_time
+	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y.
+	  This is the amount of time spent by the IO scheduler idling for a
+	  given cgroup in anticipation of a better request than the exising ones
+	  from other queues/cgroups. This is in nanoseconds. If this is read
+	  when the cgroup is in an idling state, the stat will only report the
+	  idle_time accumulated till the last idle period and will not include
+	  the current delta.
+
 - blkio.dequeue
 	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This
 	  gives the statistics about how many a times a group was dequeued
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1e0c4970b35..1ecff7a39f2 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -105,6 +105,76 @@ static void blkio_check_and_dec_stat(uint64_t *stat, bool direction, bool sync)
 }
 
 #ifdef CONFIG_DEBUG_BLK_CGROUP
+/* This should be called with the blkg->stats_lock held. */
+static void blkio_set_start_group_wait_time(struct blkio_group *blkg,
+						struct blkio_group *curr_blkg)
+{
+	if (blkio_blkg_waiting(&blkg->stats))
+		return;
+	if (blkg == curr_blkg)
+		return;
+	blkg->stats.start_group_wait_time = sched_clock();
+	blkio_mark_blkg_waiting(&blkg->stats);
+}
+
+/* This should be called with the blkg->stats_lock held. */
+static void blkio_update_group_wait_time(struct blkio_group_stats *stats)
+{
+	unsigned long long now;
+
+	if (!blkio_blkg_waiting(stats))
+		return;
+
+	now = sched_clock();
+	if (time_after64(now, stats->start_group_wait_time))
+		stats->group_wait_time += now - stats->start_group_wait_time;
+	blkio_clear_blkg_waiting(stats);
+}
+
+/* This should be called with the blkg->stats_lock held. */
+static void blkio_end_empty_time(struct blkio_group_stats *stats)
+{
+	unsigned long long now;
+
+	if (!blkio_blkg_empty(stats))
+		return;
+
+	now = sched_clock();
+	if (time_after64(now, stats->start_empty_time))
+		stats->empty_time += now - stats->start_empty_time;
+	blkio_clear_blkg_empty(stats);
+}
+
+void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&blkg->stats_lock, flags);
+	BUG_ON(blkio_blkg_idling(&blkg->stats));
+	blkg->stats.start_idle_time = sched_clock();
+	blkio_mark_blkg_idling(&blkg->stats);
+	spin_unlock_irqrestore(&blkg->stats_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blkiocg_update_set_idle_time_stats);
+
+void blkiocg_update_idle_time_stats(struct blkio_group *blkg)
+{
+	unsigned long flags;
+	unsigned long long now;
+	struct blkio_group_stats *stats;
+
+	spin_lock_irqsave(&blkg->stats_lock, flags);
+	stats = &blkg->stats;
+	if (blkio_blkg_idling(stats)) {
+		now = sched_clock();
+		if (time_after64(now, stats->start_idle_time))
+			stats->idle_time += now - stats->start_idle_time;
+		blkio_clear_blkg_idling(stats);
+	}
+	spin_unlock_irqrestore(&blkg->stats_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blkiocg_update_idle_time_stats);
+
 void blkiocg_update_set_active_queue_stats(struct blkio_group *blkg)
 {
 	unsigned long flags;
@@ -116,9 +186,14 @@ void blkiocg_update_set_active_queue_stats(struct blkio_group *blkg)
 			stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_READ] +
 			stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_WRITE];
 	stats->avg_queue_size_samples++;
+	blkio_update_group_wait_time(stats);
 	spin_unlock_irqrestore(&blkg->stats_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_set_active_queue_stats);
+#else
+static inline void blkio_set_start_group_wait_time(struct blkio_group *blkg,
+					struct blkio_group *curr_blkg) {}
+static inline void blkio_end_empty_time(struct blkio_group_stats *stats) {}
 #endif
 
 void blkiocg_update_request_add_stats(struct blkio_group *blkg,
@@ -130,6 +205,8 @@ void blkiocg_update_request_add_stats(struct blkio_group *blkg,
 	spin_lock_irqsave(&blkg->stats_lock, flags);
 	blkio_add_stat(blkg->stats.stat_arr[BLKIO_STAT_QUEUED], 1, direction,
 			sync);
+	blkio_end_empty_time(&blkg->stats);
+	blkio_set_start_group_wait_time(blkg, curr_blkg);
 	spin_unlock_irqrestore(&blkg->stats_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_request_add_stats);
@@ -156,6 +233,33 @@ void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time)
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used);
 
+void blkiocg_set_start_empty_time(struct blkio_group *blkg, bool ignore)
+{
+	unsigned long flags;
+	struct blkio_group_stats *stats;
+
+	spin_lock_irqsave(&blkg->stats_lock, flags);
+	stats = &blkg->stats;
+
+	if (stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_READ] ||
+			stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_WRITE]) {
+		spin_unlock_irqrestore(&blkg->stats_lock, flags);
+		return;
+	}
+
+	/*
+	 * If ignore is set, we do not panic on the empty flag being set
+	 * already. This is to avoid cases where there are superfluous timeslice
+	 * complete events (for eg., forced_dispatch in CFQ) when no IOs are
+	 * served which could result in triggering the empty check incorrectly.
+	 */
+	BUG_ON(!ignore && blkio_blkg_empty(stats));
+	stats->start_empty_time = sched_clock();
+	blkio_mark_blkg_empty(stats);
+	spin_unlock_irqrestore(&blkg->stats_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blkiocg_set_start_empty_time);
+
 void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
 				uint64_t bytes, bool direction, bool sync)
 {
@@ -317,19 +421,44 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
 {
 	struct blkio_cgroup *blkcg;
 	struct blkio_group *blkg;
+	struct blkio_group_stats *stats;
 	struct hlist_node *n;
 	uint64_t queued[BLKIO_STAT_TOTAL];
 	int i;
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+	bool idling, waiting, empty;
+	unsigned long long now = sched_clock();
+#endif
 
 	blkcg = cgroup_to_blkio_cgroup(cgroup);
 	spin_lock_irq(&blkcg->lock);
 	hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
 		spin_lock(&blkg->stats_lock);
+		stats = &blkg->stats;
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+		idling = blkio_blkg_idling(stats);
+		waiting = blkio_blkg_waiting(stats);
+		empty = blkio_blkg_empty(stats);
+#endif
 		for (i = 0; i < BLKIO_STAT_TOTAL; i++)
-			queued[i] = blkg->stats.stat_arr[BLKIO_STAT_QUEUED][i];
-		memset(&blkg->stats, 0, sizeof(struct blkio_group_stats));
+			queued[i] = stats->stat_arr[BLKIO_STAT_QUEUED][i];
+		memset(stats, 0, sizeof(struct blkio_group_stats));
 		for (i = 0; i < BLKIO_STAT_TOTAL; i++)
-			blkg->stats.stat_arr[BLKIO_STAT_QUEUED][i] = queued[i];
+			stats->stat_arr[BLKIO_STAT_QUEUED][i] = queued[i];
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+		if (idling) {
+			blkio_mark_blkg_idling(stats);
+			stats->start_idle_time = now;
+		}
+		if (waiting) {
+			blkio_mark_blkg_waiting(stats);
+			stats->start_group_wait_time = now;
+		}
+		if (empty) {
+			blkio_mark_blkg_empty(stats);
+			stats->start_empty_time = now;
+		}
+#endif
 		spin_unlock(&blkg->stats_lock);
 	}
 	spin_unlock_irq(&blkcg->lock);
@@ -401,6 +530,15 @@ static uint64_t blkio_get_stat(struct blkio_group *blkg,
 			sum = 0;
 		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1, sum, cb, dev);
 	}
+	if (type == BLKIO_STAT_GROUP_WAIT_TIME)
+		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+					blkg->stats.group_wait_time, cb, dev);
+	if (type == BLKIO_STAT_IDLE_TIME)
+		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+					blkg->stats.idle_time, cb, dev);
+	if (type == BLKIO_STAT_EMPTY_TIME)
+		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+					blkg->stats.empty_time, cb, dev);
 	if (type == BLKIO_STAT_DEQUEUE)
 		return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
 					blkg->stats.dequeue, cb, dev);
@@ -458,6 +596,9 @@ SHOW_FUNCTION_PER_GROUP(io_queued, BLKIO_STAT_QUEUED, 1);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 SHOW_FUNCTION_PER_GROUP(dequeue, BLKIO_STAT_DEQUEUE, 0);
 SHOW_FUNCTION_PER_GROUP(avg_queue_size, BLKIO_STAT_AVG_QUEUE_SIZE, 0);
+SHOW_FUNCTION_PER_GROUP(group_wait_time, BLKIO_STAT_GROUP_WAIT_TIME, 0);
+SHOW_FUNCTION_PER_GROUP(idle_time, BLKIO_STAT_IDLE_TIME, 0);
+SHOW_FUNCTION_PER_GROUP(empty_time, BLKIO_STAT_EMPTY_TIME, 0);
 #endif
 #undef SHOW_FUNCTION_PER_GROUP
 
@@ -517,6 +658,18 @@ struct cftype blkio_files[] = {
 		.name = "avg_queue_size",
 		.read_map = blkiocg_avg_queue_size_read,
 	},
+	{
+		.name = "group_wait_time",
+		.read_map = blkiocg_group_wait_time_read,
+	},
+	{
+		.name = "idle_time",
+		.read_map = blkiocg_idle_time_read,
+	},
+	{
+		.name = "empty_time",
+		.read_map = blkiocg_empty_time_read,
+	},
 	{
 		.name = "dequeue",
 		.read_map = blkiocg_dequeue_read,
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index bea7f3b9a88..bfce085b196 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -43,6 +43,9 @@ enum stat_type {
 	BLKIO_STAT_SECTORS,
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 	BLKIO_STAT_AVG_QUEUE_SIZE,
+	BLKIO_STAT_IDLE_TIME,
+	BLKIO_STAT_EMPTY_TIME,
+	BLKIO_STAT_GROUP_WAIT_TIME,
 	BLKIO_STAT_DEQUEUE
 #endif
 };
@@ -55,6 +58,13 @@ enum stat_sub_type {
 	BLKIO_STAT_TOTAL
 };
 
+/* blkg state flags */
+enum blkg_state_flags {
+	BLKG_waiting = 0,
+	BLKG_idling,
+	BLKG_empty,
+};
+
 struct blkio_cgroup {
 	struct cgroup_subsys_state css;
 	unsigned int weight;
@@ -74,6 +84,21 @@ struct blkio_group_stats {
 	uint64_t avg_queue_size_samples;
 	/* How many times this group has been removed from service tree */
 	unsigned long dequeue;
+
+	/* Total time spent waiting for it to be assigned a timeslice. */
+	uint64_t group_wait_time;
+	uint64_t start_group_wait_time;
+
+	/* Time spent idling for this blkio_group */
+	uint64_t idle_time;
+	uint64_t start_idle_time;
+	/*
+	 * Total time when we have requests queued and do not contain the
+	 * current active queue.
+	 */
+	uint64_t empty_time;
+	uint64_t start_empty_time;
+	uint16_t flags;
 #endif
 };
 
@@ -137,12 +162,41 @@ static inline char *blkg_path(struct blkio_group *blkg)
 void blkiocg_update_set_active_queue_stats(struct blkio_group *blkg);
 void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
 				unsigned long dequeue);
+void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg);
+void blkiocg_update_idle_time_stats(struct blkio_group *blkg);
+void blkiocg_set_start_empty_time(struct blkio_group *blkg, bool ignore);
+
+#define BLKG_FLAG_FNS(name)						\
+static inline void blkio_mark_blkg_##name(				\
+		struct blkio_group_stats *stats)			\
+{									\
+	stats->flags |= (1 << BLKG_##name);				\
+}									\
+static inline void blkio_clear_blkg_##name(				\
+		struct blkio_group_stats *stats)			\
+{									\
+	stats->flags &= ~(1 << BLKG_##name);				\
+}									\
+static inline int blkio_blkg_##name(struct blkio_group_stats *stats)	\
+{									\
+	return (stats->flags & (1 << BLKG_##name)) != 0;		\
+}									\
+
+BLKG_FLAG_FNS(waiting)
+BLKG_FLAG_FNS(idling)
+BLKG_FLAG_FNS(empty)
+#undef BLKG_FLAG_FNS
 #else
 static inline char *blkg_path(struct blkio_group *blkg) { return NULL; }
 static inline void blkiocg_update_set_active_queue_stats(
 						struct blkio_group *blkg) {}
 static inline void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
 						unsigned long dequeue) {}
+static inline void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg)
+{}
+static inline void blkiocg_update_idle_time_stats(struct blkio_group *blkg) {}
+static inline void blkiocg_set_start_empty_time(struct blkio_group *blkg,
+						bool ignore) {}
 #endif
 
 #if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 8e0b86a9111..b6e095c7ef5 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -886,7 +886,7 @@ static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
 }
 
 static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
-				struct cfq_queue *cfqq)
+				struct cfq_queue *cfqq, bool forced)
 {
 	struct cfq_rb_root *st = &cfqd->grp_service_tree;
 	unsigned int used_sl, charge_sl;
@@ -916,6 +916,7 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
 	cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
 					st->min_vdisktime);
 	blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
+	blkiocg_set_start_empty_time(&cfqg->blkg, forced);
 }
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
@@ -1528,6 +1529,12 @@ static int cfq_allow_merge(struct request_queue *q, struct request *rq,
 	return cfqq == RQ_CFQQ(rq);
 }
 
+static inline void cfq_del_timer(struct cfq_data *cfqd, struct cfq_queue *cfqq)
+{
+	del_timer(&cfqd->idle_slice_timer);
+	blkiocg_update_idle_time_stats(&cfqq->cfqg->blkg);
+}
+
 static void __cfq_set_active_queue(struct cfq_data *cfqd,
 				   struct cfq_queue *cfqq)
 {
@@ -1547,7 +1554,7 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd,
 		cfq_clear_cfqq_fifo_expire(cfqq);
 		cfq_mark_cfqq_slice_new(cfqq);
 
-		del_timer(&cfqd->idle_slice_timer);
+		cfq_del_timer(cfqd, cfqq);
 	}
 
 	cfqd->active_queue = cfqq;
@@ -1558,12 +1565,12 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd,
  */
 static void
 __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,
-		    bool timed_out)
+		    bool timed_out, bool forced)
 {
 	cfq_log_cfqq(cfqd, cfqq, "slice expired t=%d", timed_out);
 
 	if (cfq_cfqq_wait_request(cfqq))
-		del_timer(&cfqd->idle_slice_timer);
+		cfq_del_timer(cfqd, cfqq);
 
 	cfq_clear_cfqq_wait_request(cfqq);
 	cfq_clear_cfqq_wait_busy(cfqq);
@@ -1585,7 +1592,7 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 		cfq_log_cfqq(cfqd, cfqq, "resid=%ld", cfqq->slice_resid);
 	}
 
-	cfq_group_served(cfqd, cfqq->cfqg, cfqq);
+	cfq_group_served(cfqd, cfqq->cfqg, cfqq, forced);
 
 	if (cfq_cfqq_on_rr(cfqq) && RB_EMPTY_ROOT(&cfqq->sort_list))
 		cfq_del_cfqq_rr(cfqd, cfqq);
@@ -1604,12 +1611,13 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 	}
 }
 
-static inline void cfq_slice_expired(struct cfq_data *cfqd, bool timed_out)
+static inline void cfq_slice_expired(struct cfq_data *cfqd, bool timed_out,
+					bool forced)
 {
 	struct cfq_queue *cfqq = cfqd->active_queue;
 
 	if (cfqq)
-		__cfq_slice_expired(cfqd, cfqq, timed_out);
+		__cfq_slice_expired(cfqd, cfqq, timed_out, forced);
 }
 
 /*
@@ -1865,6 +1873,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
 	sl = cfqd->cfq_slice_idle;
 
 	mod_timer(&cfqd->idle_slice_timer, jiffies + sl);
+	blkiocg_update_set_idle_time_stats(&cfqq->cfqg->blkg);
 	cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu", sl);
 }
 
@@ -2176,7 +2185,7 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
 	}
 
 expire:
-	cfq_slice_expired(cfqd, 0);
+	cfq_slice_expired(cfqd, 0, false);
 new_queue:
 	/*
 	 * Current queue expired. Check if we have to switch to a new
@@ -2202,7 +2211,7 @@ static int __cfq_forced_dispatch_cfqq(struct cfq_queue *cfqq)
 	BUG_ON(!list_empty(&cfqq->fifo));
 
 	/* By default cfqq is not expired if it is empty. Do it explicitly */
-	__cfq_slice_expired(cfqq->cfqd, cfqq, 0);
+	__cfq_slice_expired(cfqq->cfqd, cfqq, 0, true);
 	return dispatched;
 }
 
@@ -2218,7 +2227,7 @@ static int cfq_forced_dispatch(struct cfq_data *cfqd)
 	while ((cfqq = cfq_get_next_queue_forced(cfqd)) != NULL)
 		dispatched += __cfq_forced_dispatch_cfqq(cfqq);
 
-	cfq_slice_expired(cfqd, 0);
+	cfq_slice_expired(cfqd, 0, true);
 	BUG_ON(cfqd->busy_queues);
 
 	cfq_log(cfqd, "forced_dispatch=%d", dispatched);
@@ -2382,10 +2391,15 @@ static int cfq_dispatch_requests(struct request_queue *q, int force)
 	    cfqq->slice_dispatch >= cfq_prio_to_maxrq(cfqd, cfqq)) ||
 	    cfq_class_idle(cfqq))) {
 		cfqq->slice_end = jiffies + 1;
-		cfq_slice_expired(cfqd, 0);
+		cfq_slice_expired(cfqd, 0, false);
 	}
 
 	cfq_log_cfqq(cfqd, cfqq, "dispatched a request");
+	/*
+	 * This is needed since we don't exactly match the mod_timer() and
+	 * del_timer() calls in CFQ.
+	 */
+	blkiocg_update_idle_time_stats(&cfqq->cfqg->blkg);
 	return 1;
 }
 
@@ -2413,7 +2427,7 @@ static void cfq_put_queue(struct cfq_queue *cfqq)
 	orig_cfqg = cfqq->orig_cfqg;
 
 	if (unlikely(cfqd->active_queue == cfqq)) {
-		__cfq_slice_expired(cfqd, cfqq, 0);
+		__cfq_slice_expired(cfqd, cfqq, 0, false);
 		cfq_schedule_dispatch(cfqd);
 	}
 
@@ -2514,7 +2528,7 @@ static void cfq_exit_cfqq(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 	struct cfq_queue *__cfqq, *next;
 
 	if (unlikely(cfqq == cfqd->active_queue)) {
-		__cfq_slice_expired(cfqd, cfqq, 0);
+		__cfq_slice_expired(cfqd, cfqq, 0, false);
 		cfq_schedule_dispatch(cfqd);
 	}
 
@@ -3143,7 +3157,7 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
 static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
 	cfq_log_cfqq(cfqd, cfqq, "preempt");
-	cfq_slice_expired(cfqd, 1);
+	cfq_slice_expired(cfqd, 1, false);
 
 	/*
 	 * Put the new queue at the front of the of the current list,
@@ -3191,7 +3205,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 		if (cfq_cfqq_wait_request(cfqq)) {
 			if (blk_rq_bytes(rq) > PAGE_CACHE_SIZE ||
 			    cfqd->busy_queues > 1) {
-				del_timer(&cfqd->idle_slice_timer);
+				cfq_del_timer(cfqd, cfqq);
 				cfq_clear_cfqq_wait_request(cfqq);
 				__blk_run_queue(cfqd->queue);
 			} else
@@ -3352,7 +3366,7 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
 		 * - when there is a close cooperator
 		 */
 		if (cfq_slice_used(cfqq) || cfq_class_idle(cfqq))
-			cfq_slice_expired(cfqd, 1);
+			cfq_slice_expired(cfqd, 1, false);
 		else if (sync && cfqq_empty &&
 			 !cfq_close_cooperator(cfqd, cfqq)) {
 			cfqd->noidle_tree_requires_idle |= !rq_noidle(rq);
@@ -3612,7 +3626,7 @@ static void cfq_idle_slice_timer(unsigned long data)
 		cfq_clear_cfqq_deep(cfqq);
 	}
 expire:
-	cfq_slice_expired(cfqd, timed_out);
+	cfq_slice_expired(cfqd, timed_out, false);
 out_kick:
 	cfq_schedule_dispatch(cfqd);
 out_cont:
@@ -3655,7 +3669,7 @@ static void cfq_exit_queue(struct elevator_queue *e)
 	spin_lock_irq(q->queue_lock);
 
 	if (cfqd->active_queue)
-		__cfq_slice_expired(cfqd, cfqd->active_queue, 0);
+		__cfq_slice_expired(cfqd, cfqd->active_queue, 0, false);
 
 	while (!list_empty(&cfqd->cic_list)) {
 		struct cfq_io_context *cic = list_entry(cfqd->cic_list.next,
-- 
cgit v1.2.3-70-g09d2


From da69da184c06f365b335a0e013dc6360a82abe85 Mon Sep 17 00:00:00 2001
From: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Date: Tue, 13 Apr 2010 16:07:50 +0800
Subject: io-controller: Document for blkio.weight_device

Here is the document for blkio.weight_device

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 Documentation/cgroups/blkio-controller.txt | 31 +++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index db054ea3e7f..d422b410a99 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -76,9 +76,38 @@ CONFIG_DEBUG_BLK_CGROUP
 Details of cgroup files
 =======================
 - blkio.weight
-	- Specifies per cgroup weight.
+	- Specifies per cgroup weight. This is default weight of the group
+	  on all the devices until and unless overridden by per device rule.
+	  (See blkio.weight_device).
 	  Currently allowed range of weights is from 100 to 1000.
 
+- blkio.weight_device
+	- One can specify per cgroup per device rules using this interface.
+	  These rules override the default value of group weight as specified
+	  by blkio.weight.
+
+	  Following is the format.
+
+	  #echo dev_maj:dev_minor weight > /path/to/cgroup/blkio.weight_device
+	  Configure weight=300 on /dev/sdb (8:16) in this cgroup
+	  # echo 8:16 300 > blkio.weight_device
+	  # cat blkio.weight_device
+	  dev     weight
+	  8:16    300
+
+	  Configure weight=500 on /dev/sda (8:0) in this cgroup
+	  # echo 8:0 500 > blkio.weight_device
+	  # cat blkio.weight_device
+	  dev     weight
+	  8:0     500
+	  8:16    300
+
+	  Remove specific weight for /dev/sda in this cgroup
+	  # echo 8:0 0 > blkio.weight_device
+	  # cat blkio.weight_device
+	  dev     weight
+	  8:16    300
+
 - blkio.time
 	- disk time allocated to cgroup per device in milliseconds. First
 	  two fields specify the major and minor number of the device and
-- 
cgit v1.2.3-70-g09d2


From a33f32244d8550da8b4a26e277ce07d5c6d158b5 Mon Sep 17 00:00:00 2001
From: Francis Galiegue <fgaliegue@gmail.com>
Date: Fri, 23 Apr 2010 00:08:02 +0200
Subject: Documentation/: it's -> its where appropriate

Fix obvious cases of "it's" being used when "its" was meant.

Signed-off-by: Francis Galiegue <fgaliegue@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 Documentation/ABI/testing/sysfs-devices-memory      |  2 +-
 Documentation/DMA-API-HOWTO.txt                     |  2 +-
 Documentation/DocBook/libata.tmpl                   |  2 +-
 Documentation/PCI/pci-error-recovery.txt            |  4 ++--
 Documentation/Smack.txt                             |  2 +-
 Documentation/arm/SA1100/ADSBitsy                   |  2 +-
 Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen      |  2 +-
 Documentation/atomic_ops.txt                        |  2 +-
 Documentation/blackfin/bfin-gpio-notes.txt          |  2 +-
 Documentation/cachetlb.txt                          |  6 +++---
 Documentation/cgroups/memcg_test.txt                |  2 +-
 Documentation/cgroups/memory.txt                    |  2 +-
 Documentation/connector/connector.txt               |  2 +-
 Documentation/dvb/ci.txt                            |  2 +-
 Documentation/dvb/contributors.txt                  |  2 +-
 Documentation/filesystems/autofs4-mount-control.txt |  2 +-
 Documentation/filesystems/ceph.txt                  |  2 +-
 Documentation/filesystems/dlmfs.txt                 |  2 +-
 Documentation/filesystems/fiemap.txt                | 12 ++++++------
 Documentation/filesystems/fuse.txt                  |  4 ++--
 Documentation/filesystems/hpfs.txt                  |  2 +-
 Documentation/filesystems/nfs/rpc-cache.txt         |  2 +-
 Documentation/filesystems/proc.txt                  |  4 ++--
 Documentation/filesystems/smbfs.txt                 |  2 +-
 Documentation/filesystems/vfs.txt                   |  2 +-
 Documentation/hwmon/lm85                            |  2 +-
 Documentation/input/joystick.txt                    |  2 +-
 Documentation/intel_txt.txt                         |  2 +-
 Documentation/kbuild/kconfig-language.txt           |  2 +-
 Documentation/kernel-docs.txt                       | 10 +++++-----
 Documentation/kprobes.txt                           |  2 +-
 Documentation/laptops/laptop-mode.txt               |  2 +-
 Documentation/lguest/lguest.c                       |  2 +-
 Documentation/md.txt                                |  2 +-
 Documentation/netlabel/lsm_interface.txt            |  2 +-
 Documentation/networking/ifenslave.c                |  2 +-
 Documentation/networking/packet_mmap.txt            |  4 ++--
 Documentation/power/regulator/consumer.txt          | 10 +++++-----
 Documentation/power/regulator/machine.txt           |  2 +-
 Documentation/power/regulator/overview.txt          |  6 +++---
 Documentation/powerpc/booting-without-of.txt        |  2 +-
 Documentation/powerpc/phyp-assisted-dump.txt        |  2 +-
 Documentation/rt-mutex-design.txt                   |  2 +-
 Documentation/scsi/ChangeLog.lpfc                   |  4 ++--
 Documentation/scsi/FlashPoint.txt                   |  2 +-
 Documentation/scsi/dtc3x80.txt                      |  2 +-
 Documentation/scsi/ncr53c8xx.txt                    |  2 +-
 Documentation/scsi/osst.txt                         |  2 +-
 Documentation/scsi/scsi_fc_transport.txt            |  4 ++--
 Documentation/scsi/sym53c8xx_2.txt                  |  2 +-
 Documentation/sound/alsa/soc/dapm.txt               |  4 ++--
 Documentation/sound/alsa/soc/machine.txt            |  2 +-
 Documentation/sound/alsa/soc/overview.txt           |  2 +-
 Documentation/usb/WUSB-Design-overview.txt          |  2 +-
 Documentation/vm/numa_memory_policy.txt             |  4 ++--
 Documentation/w1/w1.generic                         |  2 +-
 56 files changed, 81 insertions(+), 81 deletions(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
index bf1627b02a0..aba7d989208 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -43,7 +43,7 @@ Date:		September 2008
 Contact:	Badari Pulavarty <pbadari@us.ibm.com>
 Description:
 		The file /sys/devices/system/memory/memoryX/state
-		is read-write.  When read, it's contents show the
+		is read-write.  When read, its contents show the
 		online/offline state of the memory section.  When written,
 		root can toggle the the online/offline state of a removable
 		memory section (see removable file description above)
diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 52618ab069a..2e435adfbd6 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -742,7 +742,7 @@ failure can be determined by:
 
 			   Closing
 
-This document, and the API itself, would not be in it's current
+This document, and the API itself, would not be in its current
 form without the feedback and suggestions from numerous individuals.
 We would like to specifically mention, in no particular order, the
 following people:
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl
index ba997577150..261b57bc6f0 100644
--- a/Documentation/DocBook/libata.tmpl
+++ b/Documentation/DocBook/libata.tmpl
@@ -490,7 +490,7 @@ void (*host_stop) (struct ata_host_set *host_set);
 	allocates space for a legacy IDE PRD table and returns.
 	</para>
 	<para>
-	->port_stop() is called after ->host_stop().  It's sole function
+	->port_stop() is called after ->host_stop().  Its sole function
 	is to release DMA/memory resources, now that they are no longer
 	actively being used.  Many drivers also free driver-private
 	data from port at this time.
diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt
index e83f2ea7641..898ded24510 100644
--- a/Documentation/PCI/pci-error-recovery.txt
+++ b/Documentation/PCI/pci-error-recovery.txt
@@ -216,7 +216,7 @@ The driver should return one of the following result codes:
 
 		- PCI_ERS_RESULT_NEED_RESET
 		  Driver returns this if it thinks the device is not
-		  recoverable in it's current state and it needs a slot
+		  recoverable in its current state and it needs a slot
 		  reset to proceed.
 
 		- PCI_ERS_RESULT_DISCONNECT
@@ -241,7 +241,7 @@ in working condition.
 
 The driver is not supposed to restart normal driver I/O operations
 at this point.  It should limit itself to "probing" the device to
-check it's recoverability status. If all is right, then the platform
+check its recoverability status. If all is right, then the platform
 will call resume() once all drivers have ack'd link_reset().
 
 	Result codes:
diff --git a/Documentation/Smack.txt b/Documentation/Smack.txt
index 34614b4c708..e9dab41c0fe 100644
--- a/Documentation/Smack.txt
+++ b/Documentation/Smack.txt
@@ -73,7 +73,7 @@ NOTE: Smack labels are limited to 23 characters. The attr command
 If you don't do anything special all users will get the floor ("_")
 label when they log in. If you do want to log in via the hacked ssh
 at other labels use the attr command to set the smack value on the
-home directory and it's contents.
+home directory and its contents.
 
 You can add access rules in /etc/smack/accesses. They take the form:
 
diff --git a/Documentation/arm/SA1100/ADSBitsy b/Documentation/arm/SA1100/ADSBitsy
index 7197a9e958e..f9f62e8c071 100644
--- a/Documentation/arm/SA1100/ADSBitsy
+++ b/Documentation/arm/SA1100/ADSBitsy
@@ -32,7 +32,7 @@ Notes:
 
 - The flash on board is divided into 3 partitions.
   You should be careful to use flash on board.
-  It's partition is different from GraphicsClient Plus and GraphicsMaster
+  Its partition is different from GraphicsClient Plus and GraphicsMaster
 
 - 16bpp mode requires a different cable than what ships with the board.
   Contact ADS or look through the manual to wire your own. Currently,
diff --git a/Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen b/Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen
index 1e6a23fdf2f..dc460f05564 100644
--- a/Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen
+++ b/Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen
@@ -7,7 +7,7 @@ The driver only implements a four-wire touch panel protocol.
 
 The touchscreen driver is maintenance free except for the pen-down or
 touch threshold.  Some resistive displays and board combinations may
-require tuning of this threshold.  The driver exposes some of it's
+require tuning of this threshold.  The driver exposes some of its
 internal state in the sys filesystem.  If the kernel is configured
 with it, CONFIG_SYSFS, and sysfs is mounted at /sys, there will be a
 directory
diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
index 396bec3b74e..ac4d4718712 100644
--- a/Documentation/atomic_ops.txt
+++ b/Documentation/atomic_ops.txt
@@ -320,7 +320,7 @@ counter decrement would not become globally visible until the
 obj->active update does.
 
 As a historical note, 32-bit Sparc used to only allow usage of
-24-bits of it's atomic_t type.  This was because it used 8 bits
+24-bits of its atomic_t type.  This was because it used 8 bits
 as a spinlock for SMP safety.  Sparc32 lacked a "compare and swap"
 type instruction.  However, 32-bit Sparc has since been moved over
 to a "hash table of spinlocks" scheme, that allows the full 32-bit
diff --git a/Documentation/blackfin/bfin-gpio-notes.txt b/Documentation/blackfin/bfin-gpio-notes.txt
index 9898c7ded7d..f731c1e5647 100644
--- a/Documentation/blackfin/bfin-gpio-notes.txt
+++ b/Documentation/blackfin/bfin-gpio-notes.txt
@@ -43,7 +43,7 @@
 	void bfin_gpio_irq_free(unsigned gpio);
 
     The request functions will record the function state for a certain pin,
-    the free functions will clear it's function state.
+    the free functions will clear its function state.
     Once a pin is requested, it can't be requested again before it is freed by
     previous caller, otherwise kernel will dump stacks, and the request
     function fail.
diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index 2b5f823abd0..9164ae3b83b 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -5,7 +5,7 @@
 
 This document describes the cache/tlb flushing interfaces called
 by the Linux VM subsystem.  It enumerates over each interface,
-describes it's intended purpose, and what side effect is expected
+describes its intended purpose, and what side effect is expected
 after the interface is invoked.
 
 The side effects described below are stated for a uniprocessor
@@ -231,7 +231,7 @@ require a whole different set of interfaces to handle properly.
 The biggest problem is that of virtual aliasing in the data cache
 of a processor.
 
-Is your port susceptible to virtual aliasing in it's D-cache?
+Is your port susceptible to virtual aliasing in its D-cache?
 Well, if your D-cache is virtually indexed, is larger in size than
 PAGE_SIZE, and does not prevent multiple cache lines for the same
 physical address from existing at once, you have this problem.
@@ -249,7 +249,7 @@ one way to solve this (in particular SPARC_FLAG_MMAPSHARED).
 Next, you have to solve the D-cache aliasing issue for all
 other cases.  Please keep in mind that fact that, for a given page
 mapped into some user address space, there is always at least one more
-mapping, that of the kernel in it's linear mapping starting at
+mapping, that of the kernel in its linear mapping starting at
 PAGE_OFFSET.  So immediately, once the first user maps a given
 physical page into its address space, by implication the D-cache
 aliasing problem has the potential to exist since the kernel already
diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt
index f7f68b2ac19..b7eececfb19 100644
--- a/Documentation/cgroups/memcg_test.txt
+++ b/Documentation/cgroups/memcg_test.txt
@@ -244,7 +244,7 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	  we have to check if OLDPAGE/NEWPAGE is a valid page after commit().
 
 8. LRU
-        Each memcg has its own private LRU. Now, it's handling is under global
+        Each memcg has its own private LRU. Now, its handling is under global
 	VM's control (means that it's handled under global zone->lru_lock).
 	Almost all routines around memcg's LRU is called by global LRU's
 	list management functions under zone->lru_lock().
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 3a6aecd078b..6cab1f29da4 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -263,7 +263,7 @@ some of the pages cached in the cgroup (page cache pages).
 
 4.2 Task migration
 
-When a task migrates from one cgroup to another, it's charge is not
+When a task migrates from one cgroup to another, its charge is not
 carried forward by default. The pages allocated from the original cgroup still
 remain charged to it, the charge is dropped when the page is freed or
 reclaimed.
diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt
index 78c9466a9aa..e5c5f5e6ab7 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -88,7 +88,7 @@ int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
  int gfp_mask			- GFP mask.
 
  Note: When registering new callback user, connector core assigns
- netlink group to the user which is equal to it's id.idx.
+ netlink group to the user which is equal to its id.idx.
 
 /*****************************************/
 Protocol description.
diff --git a/Documentation/dvb/ci.txt b/Documentation/dvb/ci.txt
index 2ecd834585e..4a0c2b56e69 100644
--- a/Documentation/dvb/ci.txt
+++ b/Documentation/dvb/ci.txt
@@ -41,7 +41,7 @@ This application requires the following to function properly as of now.
 
 * Cards that fall in this category
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-At present the cards that fall in this category are the Twinhan and it's
+At present the cards that fall in this category are the Twinhan and its
 clones, these cards are available as VVMER, Tomato, Hercules, Orange and
 so on.
 
diff --git a/Documentation/dvb/contributors.txt b/Documentation/dvb/contributors.txt
index 4865addebe1..47c30098dab 100644
--- a/Documentation/dvb/contributors.txt
+++ b/Documentation/dvb/contributors.txt
@@ -1,7 +1,7 @@
 Thanks go to the following people for patches and contributions:
 
 Michael Hunold <m.hunold@gmx.de>
-  for the initial saa7146 driver and it's recent overhaul
+  for the initial saa7146 driver and its recent overhaul
 
 Christian Theiss
   for his work on the initial Linux DVB driver
diff --git a/Documentation/filesystems/autofs4-mount-control.txt b/Documentation/filesystems/autofs4-mount-control.txt
index 8f78ded4b64..51986bf08a4 100644
--- a/Documentation/filesystems/autofs4-mount-control.txt
+++ b/Documentation/filesystems/autofs4-mount-control.txt
@@ -146,7 +146,7 @@ found to be inadequate, in this case. The Generic Netlink system was
 used for this as raw Netlink would lead to a significant increase in
 complexity. There's no question that the Generic Netlink system is an
 elegant solution for common case ioctl functions but it's not a complete
-replacement probably because it's primary purpose in life is to be a
+replacement probably because its primary purpose in life is to be a
 message bus implementation rather than specifically an ioctl replacement.
 While it would be possible to work around this there is one concern
 that lead to the decision to not use it. This is that the autofs
diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph.txt
index 0660c9f5dee..763d8ebbbeb 100644
--- a/Documentation/filesystems/ceph.txt
+++ b/Documentation/filesystems/ceph.txt
@@ -90,7 +90,7 @@ Mount Options
 	Specify the IP and/or port the client should bind to locally.
 	There is normally not much reason to do this.  If the IP is not
 	specified, the client's IP address is determined by looking at the
-	address it's connection to the monitor originates from.
+	address its connection to the monitor originates from.
 
   wsize=X
 	Specify the maximum write size in bytes.  By default there is no
diff --git a/Documentation/filesystems/dlmfs.txt b/Documentation/filesystems/dlmfs.txt
index c50bbb2d52b..1b528b2ad80 100644
--- a/Documentation/filesystems/dlmfs.txt
+++ b/Documentation/filesystems/dlmfs.txt
@@ -47,7 +47,7 @@ You'll want to start heartbeating on a volume which all the nodes in
 your lockspace can access. The easiest way to do this is via
 ocfs2_hb_ctl (distributed with ocfs2-tools). Right now it requires
 that an OCFS2 file system be in place so that it can automatically
-find it's heartbeat area, though it will eventually support heartbeat
+find its heartbeat area, though it will eventually support heartbeat
 against raw disks.
 
 Please see the ocfs2_hb_ctl and mkfs.ocfs2 manual pages distributed
diff --git a/Documentation/filesystems/fiemap.txt b/Documentation/filesystems/fiemap.txt
index 606233cd461..1b805a0efbb 100644
--- a/Documentation/filesystems/fiemap.txt
+++ b/Documentation/filesystems/fiemap.txt
@@ -38,7 +38,7 @@ flags, it will return EBADR and the contents of fm_flags will contain
 the set of flags which caused the error. If the kernel is compatible
 with all flags passed, the contents of fm_flags will be unmodified.
 It is up to userspace to determine whether rejection of a particular
-flag is fatal to it's operation. This scheme is intended to allow the
+flag is fatal to its operation. This scheme is intended to allow the
 fiemap interface to grow in the future but without losing
 compatibility with old software.
 
@@ -56,7 +56,7 @@ If this flag is set, the kernel will sync the file before mapping extents.
 
 * FIEMAP_FLAG_XATTR
 If this flag is set, the extents returned will describe the inodes
-extended attribute lookup tree, instead of it's data tree.
+extended attribute lookup tree, instead of its data tree.
 
 
 Extent Mapping
@@ -89,7 +89,7 @@ struct fiemap_extent {
 };
 
 All offsets and lengths are in bytes and mirror those on disk.  It is valid
-for an extents logical offset to start before the request or it's logical
+for an extents logical offset to start before the request or its logical
 length to extend past the request.  Unless FIEMAP_EXTENT_NOT_ALIGNED is
 returned, fe_logical, fe_physical, and fe_length will be aligned to the
 block size of the file system.  With the exception of extents flagged as
@@ -125,7 +125,7 @@ been allocated for the file yet.
 
 * FIEMAP_EXTENT_DELALLOC
   - This will also set FIEMAP_EXTENT_UNKNOWN.
-Delayed allocation - while there is data for this extent, it's
+Delayed allocation - while there is data for this extent, its
 physical location has not been allocated yet.
 
 * FIEMAP_EXTENT_ENCODED
@@ -159,7 +159,7 @@ Data is located within a meta data block.
 Data is packed into a block with data from other files.
 
 * FIEMAP_EXTENT_UNWRITTEN
-Unwritten extent - the extent is allocated but it's data has not been
+Unwritten extent - the extent is allocated but its data has not been
 initialized.  This indicates the extent's data will be all zero if read
 through the filesystem but the contents are undefined if read directly from
 the device.
@@ -176,7 +176,7 @@ VFS -> File System Implementation
 
 File systems wishing to support fiemap must implement a ->fiemap callback on
 their inode_operations structure. The fs ->fiemap call is responsible for
-defining it's set of supported fiemap flags, and calling a helper function on
+defining its set of supported fiemap flags, and calling a helper function on
 each discovered extent:
 
 struct inode_operations {
diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt
index 397a41adb4c..13af4a49e7d 100644
--- a/Documentation/filesystems/fuse.txt
+++ b/Documentation/filesystems/fuse.txt
@@ -91,7 +91,7 @@ Mount options
 'default_permissions'
 
   By default FUSE doesn't check file access permissions, the
-  filesystem is free to implement it's access policy or leave it to
+  filesystem is free to implement its access policy or leave it to
   the underlying file access mechanism (e.g. in case of network
   filesystems).  This option enables permission checking, restricting
   access based on file mode.  It is usually useful together with the
@@ -171,7 +171,7 @@ or may honor them by sending a reply to the _original_ request, with
 the error set to EINTR.
 
 It is also possible that there's a race between processing the
-original request and it's INTERRUPT request.  There are two possibilities:
+original request and its INTERRUPT request.  There are two possibilities:
 
   1) The INTERRUPT request is processed before the original request is
      processed
diff --git a/Documentation/filesystems/hpfs.txt b/Documentation/filesystems/hpfs.txt
index fa45c3baed9..74630bd504f 100644
--- a/Documentation/filesystems/hpfs.txt
+++ b/Documentation/filesystems/hpfs.txt
@@ -103,7 +103,7 @@ to analyze or change OS2SYS.INI.
 Codepages
 
 HPFS can contain several uppercasing tables for several codepages and each
-file has a pointer to codepage it's name is in. However OS/2 was created in
+file has a pointer to codepage its name is in. However OS/2 was created in
 America where people don't care much about codepages and so multiple codepages
 support is quite buggy. I have Czech OS/2 working in codepage 852 on my disk.
 Once I booted English OS/2 working in cp 850 and I created a file on my 852
diff --git a/Documentation/filesystems/nfs/rpc-cache.txt b/Documentation/filesystems/nfs/rpc-cache.txt
index 8a382bea680..ebcaaee2161 100644
--- a/Documentation/filesystems/nfs/rpc-cache.txt
+++ b/Documentation/filesystems/nfs/rpc-cache.txt
@@ -185,7 +185,7 @@ failed lookup meant a definite 'no'.
 request/response format
 -----------------------
 
-While each cache is free to use it's own format for requests
+While each cache is free to use its own format for requests
 and responses over channel, the following is recommended as
 appropriate and support routines are available to help:
 Each request or response record should be printable ASCII
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 770700317c2..f6b1b5fca1d 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -965,7 +965,7 @@ your system and how much traffic was routed over those devices:
   ...] 1375103    17405    0    0    0     0       0          0 
   ...] 1703981     5535    0    0    0     3       0          0 
 
-In addition, each Channel Bond interface has it's own directory.  For
+In addition, each Channel Bond interface has its own directory.  For
 example, the bond0 device will have a directory called /proc/net/bond0/.
 It will contain information that is specific to that bond, such as the
 current slaves of the bond, the link status of the slaves, and how
@@ -1362,7 +1362,7 @@ been accounted as having caused 1MB of write.
 In other words: The number of bytes which this process caused to not happen,
 by truncating pagecache. A task can cause "negative" IO too. If this task
 truncates some dirty pagecache, some IO which another task has been accounted
-for (in it's write_bytes) will not be happening. We _could_ just subtract that
+for (in its write_bytes) will not be happening. We _could_ just subtract that
 from the truncating task's write_bytes, but there is information loss in doing
 that.
 
diff --git a/Documentation/filesystems/smbfs.txt b/Documentation/filesystems/smbfs.txt
index f673ef0de0f..194fb0decd2 100644
--- a/Documentation/filesystems/smbfs.txt
+++ b/Documentation/filesystems/smbfs.txt
@@ -3,6 +3,6 @@ protocol used by Windows for Workgroups, Windows 95 and Windows NT.
 Smbfs was inspired by Samba, the program written by Andrew Tridgell
 that turns any Unix host into a file server for DOS or Windows clients.
 
-Smbfs is a SMB client, but uses parts of samba for it's operation. For
+Smbfs is a SMB client, but uses parts of samba for its operation. For
 more info on samba, including documentation, please go to
 http://www.samba.org/ and then on to your nearest mirror.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 3de2f32edd9..b66858538df 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -72,7 +72,7 @@ structure (this is the kernel-side implementation of file
 descriptors). The freshly allocated file structure is initialized with
 a pointer to the dentry and a set of file operation member functions.
 These are taken from the inode data. The open() file method is then
-called so the specific filesystem implementation can do it's work. You
+called so the specific filesystem implementation can do its work. You
 can see that this is another switch performed by the VFS. The file
 structure is placed into the file descriptor table for the process.
 
diff --git a/Documentation/hwmon/lm85 b/Documentation/hwmon/lm85
index a13680871bc..a76aefeeb68 100644
--- a/Documentation/hwmon/lm85
+++ b/Documentation/hwmon/lm85
@@ -157,7 +157,7 @@ temperature configuration points:
 
 There are three PWM outputs. The LM85 datasheet suggests that the
 pwm3 output control both fan3 and fan4. Each PWM can be individually
-configured and assigned to a zone for it's control value. Each PWM can be
+configured and assigned to a zone for its control value. Each PWM can be
 configured individually according to the following options.
 
 * pwm#_auto_pwm_min - this specifies the PWM value for temp#_auto_temp_off
diff --git a/Documentation/input/joystick.txt b/Documentation/input/joystick.txt
index 154d767b2ac..8007b7ca87b 100644
--- a/Documentation/input/joystick.txt
+++ b/Documentation/input/joystick.txt
@@ -402,7 +402,7 @@ for the port of the SoundFusion is supported by the cs461x.c module.
 ~~~~~~~~~~~~~~~~~~~~~~~~
   The Live! has a special PCI gameport, which, although it doesn't provide
 any "Enhanced" stuff like 4DWave and friends, is quite a bit faster than
-it's ISA counterparts. It also requires special support, hence the
+its ISA counterparts. It also requires special support, hence the
 emu10k1-gp.c module for it instead of the normal ns558.c one.
 
 3.15 SoundBlaster 64 and 128 - ES1370 and ES1371, ESS Solo1 and S3 SonicVibes
diff --git a/Documentation/intel_txt.txt b/Documentation/intel_txt.txt
index f40a1f03001..1423bcc7c50 100644
--- a/Documentation/intel_txt.txt
+++ b/Documentation/intel_txt.txt
@@ -126,7 +126,7 @@ o  Tboot then applies an (optional) user-defined launch policy to
 o  Tboot adjusts the e820 table provided by the bootloader to reserve
    its own location in memory as well as to reserve certain other
    TXT-related regions.
-o  As part of it's launch, tboot DMA protects all of RAM (using the
+o  As part of its launch, tboot DMA protects all of RAM (using the
    VT-d PMRs).  Thus, the kernel must be booted with 'intel_iommu=on'
    in order to remove this blanket protection and use VT-d's
    page-level protection.
diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt
index c412c245848..b472e4e0ba6 100644
--- a/Documentation/kbuild/kconfig-language.txt
+++ b/Documentation/kbuild/kconfig-language.txt
@@ -181,7 +181,7 @@ Expressions are listed in decreasing order of precedence.
 (7) Returns the result of max(/expr/, /expr/).
 
 An expression can have a value of 'n', 'm' or 'y' (or 0, 1, 2
-respectively for calculations). A menu entry becomes visible when it's
+respectively for calculations). A menu entry becomes visible when its
 expression evaluates to 'm' or 'y'.
 
 There are two types of symbols: constant and non-constant symbols.
diff --git a/Documentation/kernel-docs.txt b/Documentation/kernel-docs.txt
index 28cdc2af213..ec8d31ee12e 100644
--- a/Documentation/kernel-docs.txt
+++ b/Documentation/kernel-docs.txt
@@ -116,7 +116,7 @@
        Author: Ingo Molnar, Gadi Oxman and Miguel de Icaza.
        URL: http://www.linuxjournal.com/article.php?sid=2391
        Keywords: RAID, MD driver.
-       Description: Linux Journal Kernel Korner article. Here is it's
+       Description: Linux Journal Kernel Korner article. Here is its
        abstract: "A description of the implementation of the RAID-1,
        RAID-4 and RAID-5 personalities of the MD device driver in the
        Linux kernel, providing users with high performance and reliable,
@@ -127,7 +127,7 @@
        URL: http://www.linuxjournal.com/article.php?sid=1219
        Keywords: device driver, module, loading/unloading modules,
        allocating resources.
-       Description: Linux Journal Kernel Korner article. Here is it's
+       Description: Linux Journal Kernel Korner article. Here is its
        abstract: "This is the first of a series of four articles
        co-authored by Alessandro Rubini and Georg Zezchwitz which present
        a practical approach to writing Linux device drivers as kernel
@@ -141,7 +141,7 @@
        Keywords: character driver, init_module, clean_up module,
        autodetection, mayor number, minor number, file operations,
        open(), close().
-       Description: Linux Journal Kernel Korner article. Here is it's
+       Description: Linux Journal Kernel Korner article. Here is its
        abstract: "This article, the second of four, introduces part of
        the actual code to create custom module implementing a character
        device driver. It describes the code for module initialization and
@@ -152,7 +152,7 @@
        URL: http://www.linuxjournal.com/article.php?sid=1221
        Keywords: read(), write(), select(), ioctl(), blocking/non
        blocking mode, interrupt handler.
-       Description: Linux Journal Kernel Korner article. Here is it's
+       Description: Linux Journal Kernel Korner article. Here is its
        abstract: "This article, the third of four on writing character
        device drivers, introduces concepts of reading, writing, and using
        ioctl-calls".
@@ -161,7 +161,7 @@
        Author: Alessandro Rubini and Georg v. Zezschwitz.
        URL: http://www.linuxjournal.com/article.php?sid=1222
        Keywords: interrupts, irqs, DMA, bottom halves, task queues.
-       Description: Linux Journal Kernel Korner article. Here is it's
+       Description: Linux Journal Kernel Korner article. Here is its
        abstract: "This is the fourth in a series of articles about
        writing character device drivers as loadable kernel modules. This
        month, we further investigate the field of interrupt handling.
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 2f9115c0ae6..51ec634ac04 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -332,7 +332,7 @@ occurs during execution of kp->pre_handler or kp->post_handler,
 or during single-stepping of the probed instruction, Kprobes calls
 kp->fault_handler.  Any or all handlers can be NULL. If kp->flags
 is set KPROBE_FLAG_DISABLED, that kp will be registered but disabled,
-so, it's handlers aren't hit until calling enable_kprobe(kp).
+so, its handlers aren't hit until calling enable_kprobe(kp).
 
 NOTE:
 1. With the introduction of the "symbol_name" field to struct kprobe,
diff --git a/Documentation/laptops/laptop-mode.txt b/Documentation/laptops/laptop-mode.txt
index 2c3c3509302..0bf25eebce9 100644
--- a/Documentation/laptops/laptop-mode.txt
+++ b/Documentation/laptops/laptop-mode.txt
@@ -207,7 +207,7 @@ Tips & Tricks
 * Drew Scott Daniels observed: "I don't know why, but when I decrease the number
   of colours that my display uses it consumes less battery power. I've seen
   this on powerbooks too. I hope that this is a piece of information that
-  might be useful to the Laptop Mode patch or it's users."
+  might be useful to the Laptop Mode patch or its users."
 
 * In syslog.conf, you can prefix entries with a dash ``-'' to omit syncing the
   file after every logging. When you're using laptop-mode and your disk doesn't
diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index 3119f5db75b..e9ce3c55451 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -263,7 +263,7 @@ static u8 *get_feature_bits(struct device *dev)
  * Launcher virtual with an offset.
  *
  * This can be tough to get your head around, but usually it just means that we
- * use these trivial conversion functions when the Guest gives us it's
+ * use these trivial conversion functions when the Guest gives us its
  * "physical" addresses:
  */
 static void *from_guest_phys(unsigned long addr)
diff --git a/Documentation/md.txt b/Documentation/md.txt
index 188f4768f1d..e4e893ef3e0 100644
--- a/Documentation/md.txt
+++ b/Documentation/md.txt
@@ -136,7 +136,7 @@ raid_disks != 0.
 
 Then uninitialized devices can be added with ADD_NEW_DISK.  The
 structure passed to ADD_NEW_DISK must specify the state of the device
-and it's role in the array.
+and its role in the array.
 
 Once started with RUN_ARRAY, uninitialized spares can be added with
 HOT_ADD_DISK.
diff --git a/Documentation/netlabel/lsm_interface.txt b/Documentation/netlabel/lsm_interface.txt
index 98dd9f7430f..638c74f7de7 100644
--- a/Documentation/netlabel/lsm_interface.txt
+++ b/Documentation/netlabel/lsm_interface.txt
@@ -38,7 +38,7 @@ Depending on the exact configuration, translation between the network packet
 label and the internal LSM security identifier can be time consuming.  The
 NetLabel label mapping cache is a caching mechanism which can be used to
 sidestep much of this overhead once a mapping has been established.  Once the
-LSM has received a packet, used NetLabel to decode it's security attributes,
+LSM has received a packet, used NetLabel to decode its security attributes,
 and translated the security attributes into a LSM internal identifier the LSM
 can use the NetLabel caching functions to associate the LSM internal
 identifier with the network packet's label.  This means that in the future
diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c
index 1b96ccda383..2bac9618c34 100644
--- a/Documentation/networking/ifenslave.c
+++ b/Documentation/networking/ifenslave.c
@@ -756,7 +756,7 @@ static int enslave(char *master_ifname, char *slave_ifname)
 		 */
 		if (abi_ver < 1) {
 			/* For old ABI, the master needs to be
-			 * down before setting it's hwaddr
+			 * down before setting its hwaddr
 			 */
 			res = set_if_down(master_ifname, master_flags.ifr_flags);
 			if (res) {
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index 09ab0d29032..98f71a5cef0 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -100,7 +100,7 @@ by the kernel.
 The destruction of the socket and all associated resources
 is done by a simple call to close(fd).
 
-Next I will describe PACKET_MMAP settings and it's constraints,
+Next I will describe PACKET_MMAP settings and its constraints,
 also the mapping of the circular buffer in the user process and 
 the use of this buffer.
 
@@ -432,7 +432,7 @@ TP_STATUS_LOSING      : indicates there were packet drops from last time
                         the PACKET_STATISTICS option.
 
 TP_STATUS_CSUMNOTREADY: currently it's used for outgoing IP packets which 
-                        it's checksum will be done in hardware. So while 
+                        its checksum will be done in hardware. So while
                         reading the packet we should not try to check the 
                         checksum. 
 
diff --git a/Documentation/power/regulator/consumer.txt b/Documentation/power/regulator/consumer.txt
index cdebb5145c2..55c4175d809 100644
--- a/Documentation/power/regulator/consumer.txt
+++ b/Documentation/power/regulator/consumer.txt
@@ -8,11 +8,11 @@ Please see overview.txt for a description of the terms used in this text.
 1. Consumer Regulator Access (static & dynamic drivers)
 =======================================================
 
-A consumer driver can get access to it's supply regulator by calling :-
+A consumer driver can get access to its supply regulator by calling :-
 
 regulator = regulator_get(dev, "Vcc");
 
-The consumer passes in it's struct device pointer and power supply ID. The core
+The consumer passes in its struct device pointer and power supply ID. The core
 then finds the correct regulator by consulting a machine specific lookup table.
 If the lookup is successful then this call will return a pointer to the struct
 regulator that supplies this consumer.
@@ -34,7 +34,7 @@ usually be called in your device drivers probe() and remove() respectively.
 2. Regulator Output Enable & Disable (static & dynamic drivers)
 ====================================================================
 
-A consumer can enable it's power supply by calling:-
+A consumer can enable its power supply by calling:-
 
 int regulator_enable(regulator);
 
@@ -49,7 +49,7 @@ int regulator_is_enabled(regulator);
 This will return > zero when the regulator is enabled.
 
 
-A consumer can disable it's supply when no longer needed by calling :-
+A consumer can disable its supply when no longer needed by calling :-
 
 int regulator_disable(regulator);
 
@@ -140,7 +140,7 @@ by calling :-
 int regulator_set_optimum_mode(struct regulator *regulator, int load_uA);
 
 This will cause the core to recalculate the total load on the regulator (based
-on all it's consumers) and change operating mode (if necessary and permitted)
+on all its consumers) and change operating mode (if necessary and permitted)
 to best match the current operating load.
 
 The load_uA value can be determined from the consumers datasheet. e.g.most
diff --git a/Documentation/power/regulator/machine.txt b/Documentation/power/regulator/machine.txt
index 63728fed620..bdec39b9bd7 100644
--- a/Documentation/power/regulator/machine.txt
+++ b/Documentation/power/regulator/machine.txt
@@ -52,7 +52,7 @@ static struct regulator_init_data regulator1_data = {
 };
 
 Regulator-1 supplies power to Regulator-2. This relationship must be registered
-with the core so that Regulator-1 is also enabled when Consumer A enables it's
+with the core so that Regulator-1 is also enabled when Consumer A enables its
 supply (Regulator-2). The supply regulator is set by the supply_regulator_dev
 field below:-
 
diff --git a/Documentation/power/regulator/overview.txt b/Documentation/power/regulator/overview.txt
index ffd185bb605..9363e056188 100644
--- a/Documentation/power/regulator/overview.txt
+++ b/Documentation/power/regulator/overview.txt
@@ -35,16 +35,16 @@ Some terms used in this document:-
   o Consumer     - Electronic device that is supplied power by a regulator.
                    Consumers can be classified into two types:-
 
-                   Static: consumer does not change it's supply voltage or
+                   Static: consumer does not change its supply voltage or
                    current limit. It only needs to enable or disable it's
-                   power supply. It's supply voltage is set by the hardware,
+                   power supply. Its supply voltage is set by the hardware,
                    bootloader, firmware or kernel board initialisation code.
 
                    Dynamic: consumer needs to change it's supply voltage or
                    current limit to meet operation demands.
 
 
-  o Power Domain - Electronic circuit that is supplied it's input power by the
+  o Power Domain - Electronic circuit that is supplied its input power by the
                    output power of a regulator, switch or by another power
                    domain.
 
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
index 79f533f38c6..46d22105aa0 100644
--- a/Documentation/powerpc/booting-without-of.txt
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -1289,7 +1289,7 @@ link between a device node and its interrupt parent in
 the interrupt tree.  The value of interrupt-parent is the
 phandle of the parent node.
 
-If the interrupt-parent property is not defined for a node, it's
+If the interrupt-parent property is not defined for a node, its
 interrupt parent is assumed to be an ancestor in the node's
 _device tree_ hierarchy.
 
diff --git a/Documentation/powerpc/phyp-assisted-dump.txt b/Documentation/powerpc/phyp-assisted-dump.txt
index c4682b982a2..ad340205d96 100644
--- a/Documentation/powerpc/phyp-assisted-dump.txt
+++ b/Documentation/powerpc/phyp-assisted-dump.txt
@@ -19,7 +19,7 @@ dump offers several strong, practical advantages:
    immediately available to the system for normal use.
 -- After the dump is completed, no further reboots are
    required; the system will be fully usable, and running
-   in it's normal, production mode on it normal kernel.
+   in its normal, production mode on its normal kernel.
 
 The above can only be accomplished by coordination with,
 and assistance from the hypervisor. The procedure is
diff --git a/Documentation/rt-mutex-design.txt b/Documentation/rt-mutex-design.txt
index 4b736d24da7..8df0b782c4d 100644
--- a/Documentation/rt-mutex-design.txt
+++ b/Documentation/rt-mutex-design.txt
@@ -657,7 +657,7 @@ here.
 
 The waiter structure has a "task" field that points to the task that is blocked
 on the mutex.  This field can be NULL the first time it goes through the loop
-or if the task is a pending owner and had it's mutex stolen.  If the "task"
+or if the task is a pending owner and had its mutex stolen.  If the "task"
 field is NULL then we need to set up the accounting for it.
 
 Task blocks on mutex
diff --git a/Documentation/scsi/ChangeLog.lpfc b/Documentation/scsi/ChangeLog.lpfc
index 2ffc1148eb9..e759e92e286 100644
--- a/Documentation/scsi/ChangeLog.lpfc
+++ b/Documentation/scsi/ChangeLog.lpfc
@@ -707,7 +707,7 @@ Changes from 20040920 to 20041018
 	* Integrate patches from Christoph Hellwig: two new helpers common
 	  to lpfc_sli_resume_iocb and lpfc_sli_issue_iocb - singificant
 	  cleanup of those two functions - the unused SLI_IOCB_USE_TXQ is
-	  gone - lpfc_sli_issue_iocb_wait loses it's flags argument
+	  gone - lpfc_sli_issue_iocb_wait loses its flags argument
 	  totally.
 	* Fix in lpfc_sli.c: we can not store a 5 bit value in a 4-bit
 	  field.
@@ -1028,7 +1028,7 @@ Changes from 20040614 to 20040709
 	* Remove the need for buf_tmo.
 	* Changed ULP_BDE64 to struct ulp_bde64.
 	* Changed ULP_BDE to struct ulp_bde.
-	* Cleanup lpfc_os_return_scsi_cmd() and it's call path.
+	* Cleanup lpfc_os_return_scsi_cmd() and its call path.
 	* Removed lpfc_no_device_delay.
 	* Consolidating lpfc_hba_put_event() into lpfc_put_event().
 	* Removed following attributes and their functionality:
diff --git a/Documentation/scsi/FlashPoint.txt b/Documentation/scsi/FlashPoint.txt
index d5acaa300a4..1540a92f6d2 100644
--- a/Documentation/scsi/FlashPoint.txt
+++ b/Documentation/scsi/FlashPoint.txt
@@ -71,7 +71,7 @@ peters@mylex.com
 
 Ever since its introduction last October, the BusLogic FlashPoint LT has
 been problematic for members of the Linux community, in that no Linux
-drivers have been available for this new Ultra SCSI product.  Despite it's
+drivers have been available for this new Ultra SCSI product.  Despite its
 officially being positioned as a desktop workstation product, and not being
 particularly well suited for a high performance multitasking operating
 system like Linux, the FlashPoint LT has been touted by computer system
diff --git a/Documentation/scsi/dtc3x80.txt b/Documentation/scsi/dtc3x80.txt
index e8ae6230ab3..1d7af9f9a8e 100644
--- a/Documentation/scsi/dtc3x80.txt
+++ b/Documentation/scsi/dtc3x80.txt
@@ -12,7 +12,7 @@ The 3180 does not.  Otherwise, they are identical.
 The DTC3x80 does not support DMA but it does have Pseudo-DMA which is
 supported by the driver.
 
-It's DTC406 scsi chip is supposedly compatible with the NCR 53C400.
+Its DTC406 scsi chip is supposedly compatible with the NCR 53C400.
 It is memory mapped, uses an IRQ, but no dma or io-port.  There is
 internal DMA, between SCSI bus and an on-chip 128-byte buffer.  Double
 buffering is done automagically by the chip.  Data is transferred
diff --git a/Documentation/scsi/ncr53c8xx.txt b/Documentation/scsi/ncr53c8xx.txt
index 08e2b4d04aa..cda5f8fa2c6 100644
--- a/Documentation/scsi/ncr53c8xx.txt
+++ b/Documentation/scsi/ncr53c8xx.txt
@@ -1479,7 +1479,7 @@ Wide16 SCSI.
 Enabling serial NVRAM support enables detection of the serial NVRAM included
 on Symbios and some Symbios compatible host adaptors, and Tekram boards. The 
 serial NVRAM is used by Symbios and Tekram to hold set up parameters for the 
-host adaptor and it's attached drives.
+host adaptor and its attached drives.
 
 The Symbios NVRAM also holds data on the boot order of host adaptors in a
 system with more than one host adaptor. This enables the order of scanning
diff --git a/Documentation/scsi/osst.txt b/Documentation/scsi/osst.txt
index f536907e241..2b21890bc98 100644
--- a/Documentation/scsi/osst.txt
+++ b/Documentation/scsi/osst.txt
@@ -40,7 +40,7 @@ behavior looks very much the same as st to the userspace applications.
 
 History
 -------
-In the first place, osst shared it's identity very much with st. That meant
+In the first place, osst shared its identity very much with st. That meant
 that it used the same kernel structures and the same device node as st.
 So you could only have either of them being present in the kernel. This has
 been fixed by registering an own device, now.
diff --git a/Documentation/scsi/scsi_fc_transport.txt b/Documentation/scsi/scsi_fc_transport.txt
index aec6549ab09..e00192de4d1 100644
--- a/Documentation/scsi/scsi_fc_transport.txt
+++ b/Documentation/scsi/scsi_fc_transport.txt
@@ -70,7 +70,7 @@ Overview:
     up to an administrative entity controlling the vport. For example,
     if vports are to be associated with virtual machines, a XEN mgmt
     utility would be responsible for creating wwpn/wwnn's for the vport,
-    using it's own naming authority and OUI. (Note: it already does this
+    using its own naming authority and OUI. (Note: it already does this
     for virtual MAC addresses).
 
 
@@ -81,7 +81,7 @@ Device Trees and Vport Objects:
   with rports and scsi target objects underneath it. Currently the FC
   transport creates the vport object and places it under the scsi_host
   object corresponding to the physical adapter.  The LLDD will allocate
-  a new scsi_host for the vport and link it's object under the vport.
+  a new scsi_host for the vport and link its object under the vport.
   The remainder of the tree under the vports scsi_host is the same
   as the non-NPIV case. The transport is written currently to easily
   allow the parent of the vport to be something other than the scsi_host.
diff --git a/Documentation/scsi/sym53c8xx_2.txt b/Documentation/scsi/sym53c8xx_2.txt
index eb9a7b905b6..6f63b798967 100644
--- a/Documentation/scsi/sym53c8xx_2.txt
+++ b/Documentation/scsi/sym53c8xx_2.txt
@@ -687,7 +687,7 @@ maintain the driver code.
 Enabling serial NVRAM support enables detection of the serial NVRAM included
 on Symbios and some Symbios compatible host adaptors, and Tekram boards. The 
 serial NVRAM is used by Symbios and Tekram to hold set up parameters for the 
-host adaptor and it's attached drives.
+host adaptor and its attached drives.
 
 The Symbios NVRAM also holds data on the boot order of host adaptors in a
 system with more than one host adaptor.  This information is no longer used
diff --git a/Documentation/sound/alsa/soc/dapm.txt b/Documentation/sound/alsa/soc/dapm.txt
index 9ac842be9b4..05bf5a0eee4 100644
--- a/Documentation/sound/alsa/soc/dapm.txt
+++ b/Documentation/sound/alsa/soc/dapm.txt
@@ -188,8 +188,8 @@ The WM8731 output mixer has 3 inputs (sources)
  3. Mic Sidetone Input
 
 Each input in this example has a kcontrol associated with it (defined in example
-above) and is connected to the output mixer via it's kcontrol name. We can now
-connect the destination widget (wrt audio signal) with it's source widgets.
+above) and is connected to the output mixer via its kcontrol name. We can now
+connect the destination widget (wrt audio signal) with its source widgets.
 
 	/* output mixer */
 	{"Output Mixer", "Line Bypass Switch", "Line Input"},
diff --git a/Documentation/sound/alsa/soc/machine.txt b/Documentation/sound/alsa/soc/machine.txt
index bab7711ce96..2524c75557d 100644
--- a/Documentation/sound/alsa/soc/machine.txt
+++ b/Documentation/sound/alsa/soc/machine.txt
@@ -67,7 +67,7 @@ static struct snd_soc_dai_link corgi_dai = {
 	.ops = &corgi_ops,
 };
 
-struct snd_soc_card then sets up the machine with it's DAIs. e.g.
+struct snd_soc_card then sets up the machine with its DAIs. e.g.
 
 /* corgi audio machine driver */
 static struct snd_soc_card snd_soc_corgi = {
diff --git a/Documentation/sound/alsa/soc/overview.txt b/Documentation/sound/alsa/soc/overview.txt
index 1e4c6d3655f..138ac88c146 100644
--- a/Documentation/sound/alsa/soc/overview.txt
+++ b/Documentation/sound/alsa/soc/overview.txt
@@ -33,7 +33,7 @@ features :-
     and machines.
 
   * Easy I2S/PCM audio interface setup between codec and SoC. Each SoC
-    interface and codec registers it's audio interface capabilities with the
+    interface and codec registers its audio interface capabilities with the
     core and are subsequently matched and configured when the application
     hardware parameters are known.
 
diff --git a/Documentation/usb/WUSB-Design-overview.txt b/Documentation/usb/WUSB-Design-overview.txt
index c480e9c32db..4c5e3793934 100644
--- a/Documentation/usb/WUSB-Design-overview.txt
+++ b/Documentation/usb/WUSB-Design-overview.txt
@@ -381,7 +381,7 @@ descriptor that gives us the status of the transfer, its identification
 we issue another URB to read into the destination buffer the chunk of
 data coming out of the remote endpoint. Done, wait for the next guy. The
 callbacks for the URBs issued from here are the ones that will declare
-the xfer complete at some point and call it's callback.
+the xfer complete at some point and call its callback.
 
 Seems simple, but the implementation is not trivial.
 
diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt
index be45dbb9d7f..6690fc34ef6 100644
--- a/Documentation/vm/numa_memory_policy.txt
+++ b/Documentation/vm/numa_memory_policy.txt
@@ -45,7 +45,7 @@ most general to most specific:
 	to establish the task policy for a child task exec()'d from an
 	executable image that has no awareness of memory policy.  See the
 	MEMORY POLICY APIS section, below, for an overview of the system call
-	that a task may use to set/change it's task/process policy.
+	that a task may use to set/change its task/process policy.
 
 	In a multi-threaded task, task policies apply only to the thread
 	[Linux kernel task] that installs the policy and any threads
@@ -301,7 +301,7 @@ decrement this reference count, respectively.  mpol_put() will only free
 the structure back to the mempolicy kmem cache when the reference count
 goes to zero.
 
-When a new memory policy is allocated, it's reference count is initialized
+When a new memory policy is allocated, its reference count is initialized
 to '1', representing the reference held by the task that is installing the
 new policy.  When a pointer to a memory policy structure is stored in another
 structure, another reference is added, as the task's reference will be dropped
diff --git a/Documentation/w1/w1.generic b/Documentation/w1/w1.generic
index e3333eec432..212f4ac31c0 100644
--- a/Documentation/w1/w1.generic
+++ b/Documentation/w1/w1.generic
@@ -25,7 +25,7 @@ When a w1 master driver registers with the w1 subsystem, the following occurs:
  - sysfs entries for that w1 master are created
  - the w1 bus is periodically searched for new slave devices
 
-When a device is found on the bus, w1 core checks if driver for it's family is
+When a device is found on the bus, w1 core checks if driver for its family is
 loaded. If so, the family driver is attached to the slave.
 If there is no driver for the family, default one is assigned, which allows to perform
 almost any kind of operations. Each logical operation is a transaction
-- 
cgit v1.2.3-70-g09d2


From afc24d49c1e5dbeef745c1c1246f5ae6ebd97c71 Mon Sep 17 00:00:00 2001
From: Vivek Goyal <vgoyal@redhat.com>
Date: Mon, 26 Apr 2010 19:27:56 +0200
Subject: blk-cgroup: config options re-arrangement

This patch fixes few usability and configurability issues.

o All the cgroup based controller options are configurable from
  "Genral Setup/Control Group Support/" menu. blkio is the only exception.
  Hence make this option visible in above menu and make it configurable from
  there to bring it inline with rest of the cgroup based controllers.

o Get rid of CONFIG_DEBUG_CFQ_IOSCHED.

  This option currently does two things.

  - Enable printing of cgroup paths in blktrace
  - Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional stat
    files in cgroup.

  If we are using group scheduling, blktrace data is of not really much use
  if cgroup information is not present. To get this data, currently one has to
  also enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn brings the overhead of
  all the additional debug stat files which is not desired.

  Hence, this patch moves printing of cgroup paths under
  CONFIG_CFQ_GROUP_IOSCHED.

  This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely. Now all
  the debug stat files are controlled only by CONFIG_DEBUG_BLK_CGROUP which
  can be enabled through config menu.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Divyesh Shah <dpshah@google.com>
Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 Documentation/cgroups/blkio-controller.txt | 35 +++++++++++++-----------------
 block/Kconfig                              | 23 --------------------
 block/Kconfig.iosched                      | 16 +++++---------
 block/blk-cgroup.c                         |  2 --
 block/blk-cgroup.h                         | 14 ++++++------
 block/cfq-iosched.c                        |  2 +-
 init/Kconfig                               | 27 +++++++++++++++++++++++
 7 files changed, 55 insertions(+), 64 deletions(-)

(limited to 'Documentation/cgroups')

diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index d422b410a99..48e0b21b005 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -17,6 +17,9 @@ HOWTO
 You can do a very simple testing of running two dd threads in two different
 cgroups. Here is what you can do.
 
+- Enable Block IO controller
+	CONFIG_BLK_CGROUP=y
+
 - Enable group scheduling in CFQ
 	CONFIG_CFQ_GROUP_IOSCHED=y
 
@@ -54,24 +57,16 @@ cgroups. Here is what you can do.
 
 Various user visible config options
 ===================================
-CONFIG_CFQ_GROUP_IOSCHED
-	- Enables group scheduling in CFQ. Currently only 1 level of group
-	  creation is allowed.
-
-CONFIG_DEBUG_CFQ_IOSCHED
-	- Enables some debugging messages in blktrace. Also creates extra
-	  cgroup file blkio.dequeue.
-
-Config options selected automatically
-=====================================
-These config options are not user visible and are selected/deselected
-automatically based on IO scheduler configuration.
-
 CONFIG_BLK_CGROUP
-	- Block IO controller. Selected by CONFIG_CFQ_GROUP_IOSCHED.
+	- Block IO controller.
 
 CONFIG_DEBUG_BLK_CGROUP
-	- Debug help. Selected by CONFIG_DEBUG_CFQ_IOSCHED.
+	- Debug help. Right now some additional stats file show up in cgroup
+	  if this option is enabled.
+
+CONFIG_CFQ_GROUP_IOSCHED
+	- Enables group scheduling in CFQ. Currently only 1 level of group
+	  creation is allowed.
 
 Details of cgroup files
 =======================
@@ -174,13 +169,13 @@ Details of cgroup files
 	  write, sync or async.
 
 - blkio.avg_queue_size
-	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y.
+	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
 	  The average queue size for this cgroup over the entire time of this
 	  cgroup's existence. Queue size samples are taken each time one of the
 	  queues of this cgroup gets a timeslice.
 
 - blkio.group_wait_time
-	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y.
+	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
 	  This is the amount of time the cgroup had to wait since it became busy
 	  (i.e., went from 0 to 1 request queued) to get a timeslice for one of
 	  its queues. This is different from the io_wait_time which is the
@@ -191,7 +186,7 @@ Details of cgroup files
 	  got a timeslice and will not include the current delta.
 
 - blkio.empty_time
-	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y.
+	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
 	  This is the amount of time a cgroup spends without any pending
 	  requests when not being served, i.e., it does not include any time
 	  spent idling for one of the queues of the cgroup. This is in
@@ -200,7 +195,7 @@ Details of cgroup files
 	  time it had a pending request and will not include the current delta.
 
 - blkio.idle_time
-	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y.
+	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
 	  This is the amount of time spent by the IO scheduler idling for a
 	  given cgroup in anticipation of a better request than the exising ones
 	  from other queues/cgroups. This is in nanoseconds. If this is read
@@ -209,7 +204,7 @@ Details of cgroup files
 	  the current delta.
 
 - blkio.dequeue
-	- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This
+	- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. This
 	  gives the statistics about how many a times a group was dequeued
 	  from service tree of the device. First two fields specify the major
 	  and minor number of the device and third field specifies the number
diff --git a/block/Kconfig b/block/Kconfig
index f9e89f4d94b..9be0b56eaee 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -77,29 +77,6 @@ config BLK_DEV_INTEGRITY
 	T10/SCSI Data Integrity Field or the T13/ATA External Path
 	Protection.  If in doubt, say N.
 
-config BLK_CGROUP
-	tristate "Block cgroup support"
-	depends on CGROUPS
-	depends on CFQ_GROUP_IOSCHED
-	default n
-	---help---
-	Generic block IO controller cgroup interface. This is the common
-	cgroup interface which should be used by various IO controlling
-	policies.
-
-	Currently, CFQ IO scheduler uses it to recognize task groups and
-	control disk bandwidth allocation (proportional time slice allocation)
-	to such task groups.
-
-config DEBUG_BLK_CGROUP
-	bool
-	depends on BLK_CGROUP
-	default n
-	---help---
-	Enable some debugging help. Currently it stores the cgroup path
-	in the blk group which can be used by cfq for tracing various
-	group related activity.
-
 endif # BLOCK
 
 config BLOCK_COMPAT
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index fc71cf071fb..3199b76f795 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -23,7 +23,8 @@ config IOSCHED_DEADLINE
 
 config IOSCHED_CFQ
 	tristate "CFQ I/O scheduler"
-	select BLK_CGROUP if CFQ_GROUP_IOSCHED
+	# If BLK_CGROUP is a module, CFQ has to be built as module.
+	depends on (BLK_CGROUP=m && m) || !BLK_CGROUP || BLK_CGROUP=y
 	default y
 	---help---
 	  The CFQ I/O scheduler tries to distribute bandwidth equally
@@ -33,22 +34,15 @@ config IOSCHED_CFQ
 
 	  This is the default I/O scheduler.
 
+	  Note: If BLK_CGROUP=m, then CFQ can be built only as module.
+
 config CFQ_GROUP_IOSCHED
 	bool "CFQ Group Scheduling support"
-	depends on IOSCHED_CFQ && CGROUPS
+	depends on IOSCHED_CFQ && BLK_CGROUP
 	default n
 	---help---
 	  Enable group IO scheduling in CFQ.
 
-config DEBUG_CFQ_IOSCHED
-	bool "Debug CFQ Scheduling"
-	depends on CFQ_GROUP_IOSCHED
-	select DEBUG_BLK_CGROUP
-	default n
-	---help---
-	  Enable CFQ IO scheduling debugging in CFQ. Currently it makes
-	  blktrace output more verbose.
-
 choice
 	prompt "Default I/O scheduler"
 	default DEFAULT_CFQ
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index af42efbb0c1..d02bbf88de1 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -351,10 +351,8 @@ void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
 	blkg->blkcg_id = css_id(&blkcg->css);
 	hlist_add_head_rcu(&blkg->blkcg_node, &blkcg->blkg_list);
 	spin_unlock_irqrestore(&blkcg->lock, flags);
-#ifdef CONFIG_DEBUG_BLK_CGROUP
 	/* Need to take css reference ? */
 	cgroup_path(blkcg->css.cgroup, blkg->path, sizeof(blkg->path));
-#endif
 	blkg->dev = dev;
 }
 EXPORT_SYMBOL_GPL(blkiocg_add_blkio_group);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index a491a6d56ec..2b866ec1dce 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -108,10 +108,8 @@ struct blkio_group {
 	void *key;
 	struct hlist_node blkcg_node;
 	unsigned short blkcg_id;
-#ifdef CONFIG_DEBUG_BLK_CGROUP
 	/* Store cgroup path */
 	char path[128];
-#endif
 	/* The device MKDEV(major, minor), this group has been created for */
 	dev_t dev;
 
@@ -147,6 +145,11 @@ struct blkio_policy_type {
 extern void blkio_policy_register(struct blkio_policy_type *);
 extern void blkio_policy_unregister(struct blkio_policy_type *);
 
+static inline char *blkg_path(struct blkio_group *blkg)
+{
+	return blkg->path;
+}
+
 #else
 
 struct blkio_group {
@@ -158,6 +161,8 @@ struct blkio_policy_type {
 static inline void blkio_policy_register(struct blkio_policy_type *blkiop) { }
 static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { }
 
+static inline char *blkg_path(struct blkio_group *blkg) { return NULL; }
+
 #endif
 
 #define BLKIO_WEIGHT_MIN	100
@@ -165,10 +170,6 @@ static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { }
 #define BLKIO_WEIGHT_DEFAULT	500
 
 #ifdef CONFIG_DEBUG_BLK_CGROUP
-static inline char *blkg_path(struct blkio_group *blkg)
-{
-	return blkg->path;
-}
 void blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg);
 void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
 				unsigned long dequeue);
@@ -197,7 +198,6 @@ BLKG_FLAG_FNS(idling)
 BLKG_FLAG_FNS(empty)
 #undef BLKG_FLAG_FNS
 #else
-static inline char *blkg_path(struct blkio_group *blkg) { return NULL; }
 static inline void blkiocg_update_avg_queue_size_stats(
 						struct blkio_group *blkg) {}
 static inline void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 002a5b62165..286008cf889 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -345,7 +345,7 @@ CFQ_CFQQ_FNS(deep);
 CFQ_CFQQ_FNS(wait_busy);
 #undef CFQ_CFQQ_FNS
 
-#ifdef CONFIG_DEBUG_CFQ_IOSCHED
+#ifdef CONFIG_CFQ_GROUP_IOSCHED
 #define cfq_log_cfqq(cfqd, cfqq, fmt, args...)	\
 	blk_add_trace_msg((cfqd)->queue, "cfq%d%c %s " fmt, (cfqq)->pid, \
 			cfq_cfqq_sync((cfqq)) ? 'S' : 'A', \
diff --git a/init/Kconfig b/init/Kconfig
index eb77e8ccde1..087c14f3c59 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -612,6 +612,33 @@ config RT_GROUP_SCHED
 
 endif #CGROUP_SCHED
 
+config BLK_CGROUP
+	tristate "Block IO controller"
+	depends on CGROUPS && BLOCK
+	default n
+	---help---
+	Generic block IO controller cgroup interface. This is the common
+	cgroup interface which should be used by various IO controlling
+	policies.
+
+	Currently, CFQ IO scheduler uses it to recognize task groups and
+	control disk bandwidth allocation (proportional time slice allocation)
+	to such task groups.
+
+	This option only enables generic Block IO controller infrastructure.
+	One needs to also enable actual IO controlling logic in CFQ for it
+	to take effect. (CONFIG_CFQ_GROUP_IOSCHED=y).
+
+	See Documentation/cgroups/blkio-controller.txt for more information.
+
+config DEBUG_BLK_CGROUP
+	bool "Enable Block IO controller debugging"
+	depends on BLK_CGROUP
+	default n
+	---help---
+	Enable some debugging help. Currently it exports additional stat
+	files in a cgroup which can be useful for debugging.
+
 endif # CGROUPS
 
 config MM_OWNER
-- 
cgit v1.2.3-70-g09d2