asmadeus/linux.git - The linux kernel

Age	Commit message (Collapse)	Author
2006-06-26	spelling fixes	Andreas Mohr
	acquired (aquired) contiguous (contigious) successful (succesful, succesfull) surprise (suprise) whether (weather) some other misspellings Signed-off-by: Andreas Mohr <andi@lisas.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-23	[BLOCK] Fix bounce limit address check	Andi Kleen
	Do a safer check for when to enable DMA. Currently we enable ISA DMA for cases that do not need it, resulting in OOM conditions when ZONE_DMA runs out of space. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23	[PATCH] rbtree: support functions used by the io schedulers	Jens Axboe
	They all duplicate macros to check for empty root and/or node, and clearing a node. So put those in rbtree.h. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23	[PATCH] cfq-iosched: rq update fixes	Jens Axboe
	- Remember to set ->last_sector so that the cfq_choose_req() logic works correctly. - Remove redundant call to cfq_choose_req() Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23	[PATCH] cfq-iosched: many performance fixes	Jens Axboe
	This is a collection of patches that greatly improve CFQ performance in some circumstances. - Change the idling logic to only kick in after a request is done and we are deciding what to do. Before the idling included the request service time, so it was hard to adjust. Now it's true think/idle time. - Take advantage of TCQ/NCQ/queueing for seeky sync workloads, but keep it in control for sync and sequential (or close to) workloads. - Expire queues immediately and move on to other busy queues, if we are not going to idle after the current one finishes. - Don't rearm idle timer if there are no busy queues. Just leave the system idle. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23	[PATCH] cfq-iosched: correctly set ioprio on both targets	Jens Axboe
	Patch originally from Vasily Tarasov <vtaras@sw.ru> If you set io-priority of process 1 using sys_ioprio_set system call by another process 2 (like ionice do), then cfq_init_prio_data() function sets priority of process 2 (current) on queue of process 1 and clears the flag, that designates change of ioprio. So the process 1 will work like with priority of process 2. I propose not to call cfq_init_prio_data() on io-priority change, but only mark queue as queue with changed prority. Every time when new request comes cfq-scheduler checks for this flag and atomaticaly changes priority of queue to new value. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23	[PATCH] Make CFQ the default IO scheduler	Jens Axboe
	Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23	[PATCH] Kill PF_SYNCWRITE flag	Jens Axboe
	A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it. Part of the io will go out as async, and the other part as sync. This causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this will cause us lost merges and suboptimal behaviour in scheduling. Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path just directly indicate that the writes are sync by using WRITE_SYNC instead. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23	[PATCH] cfq-iosched: Don't set the queue batching limits	Jens Axboe
	We cannot update them if the user changes nr_requests, so don't set it in the first place. The gains are pretty questionable as well. The batching loss has been shown to decrease throughput. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23	[PATCH] remove dead code from elevator switching	Dave Jones
	We already drop the refcount in elevator_exit(), and as we're setting 'e' to NULL, we'll never take that branch anyway. Finally, as 'e' is a local var that isn't referenced afterwards, setting it to NULL is pointless. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23	[PATCH] blk_start_queue() must be called with irq disabled - add warning	Paolo 'Blaisorblade' Giarrusso
	The queue lock can be taken from interrupts so it must always be taken with irq disabling primitives. Some primitives already verify this. blk_start_queue() is called under this lock, so interrupts must be disabled. Also document this requirement clearly in blk_init_queue(), where the queue spinlock is set. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23	[PATCH] iosched: use hlist for request hashtable	Akinobu Mita
	Use hlist instead of list_head for request hashtable in deadline-iosched and as-iosched. It also can remove the flag to know hashed or unhashed. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Jens Axboe <axboe@suse.de> block/as-iosched.c \| 45 +++++++++++++++++++-------------------------- block/deadline-iosched.c \| 39 ++++++++++++++++----------------------- 2 files changed, 35 insertions(+), 49 deletions(-)
2006-06-23	[PATCH] list: use list_replace_init() instead of list_splice_init()	Oleg Nesterov
	list_splice_init(list, head) does unneeded job if it is known that list_empty(head) == 1. We can use list_replace_init() instead. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-21	[PATCH] Driver core: add generic "subsystem" link to all devices	Kay Sievers
	Like the SUBSYTEM= key we find in the environment of the uevent, this creates a generic "subsystem" link in sysfs for every device. Userspace usually doesn't care at all if its a "class" or a "bus" device. This provides an unified way to determine the subsytem of a device, regardless of the way the driver core has created it. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-20	Fix up CFQ scheduler for recent rbtree node shrinkage	Linus Torvalds
	The color is now in the low bits of the parent pointer, and initializing it to 0 happens as part of the whole memset above, so just remove the unnecessary RB_CLEAR_COLOR. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-20	Merge git://git.infradead.org/~dwmw2/rbtree-2.6	Linus Torvalds
	* git://git.infradead.org/~dwmw2/rbtree-2.6: [RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency Update UML kernel/physmem.c to use rb_parent() accessor macro [RBTREE] Update hrtimers to use rb_parent() accessor macro. [RBTREE] Add explicit alignment to sizeof(long) for struct rb_node. [RBTREE] Merge colour and parent fields of struct rb_node. [RBTREE] Remove dead code in rb_erase() [RBTREE] Update JFFS2 to use rb_parent() accessor macro. [RBTREE] Update eventpoll.c to use rb_parent() accessor macro. [RBTREE] Update key.c to use rb_parent() accessor macro. [RBTREE] Update ext3 to use rb_parent() accessor macro. [RBTREE] Change rbtree off-tree marking in I/O schedulers. [RBTREE] Add accessor macros for colour and parent fields of rb_node
2006-06-14	[PATCH] cfq-iosched: fix crash in do_div()	Jens Axboe
	We don't clear the seek stat values in cfq_alloc_io_context(), and if ->seek_mean is unlucky enough to be set to -36 by chance, the first invocation of cfq_update_io_seektime() will oops with a divide by zero in do_div(). Just memset the entire cic instead of filling invididual values independently. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-08	[PATCH] elevator switching race	Jens Axboe
	There's a race between shutting down one io scheduler and firing up the next, in which a new io could enter and cause the io scheduler to be invoked with bad or NULL data. To fix this, we need to maintain the queue lock for a bit longer. Unfortunately we cannot do that, since the elevator init requires to be run without the lock held. This isn't easily fixable, without also changing the mempool API. So split the initialization into two parts, and alloc-init operation and an attach operation. Then we can preallocate the io scheduler and related structures, and run the attach inside the lock after we detach the old one. This patch has survived 30 minutes of 1 second io scheduler switching with a very busy io load. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-01	[PATCH] cfq-iosched: busy_rr fairness fix	Jens Axboe
	Now that we select busy_rr for possible service, insert entries at the back of that list instead of at the front. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-01	[PATCH] cfq-iosched: fix bug in timer handling for the idle class	Jens Axboe
	There's a small window from when the timer is entered and we grab the queue lock, where cfq_set_active_queue() could be rearming the timer for us. Seen in the wild on a 12-way ppc box. Fix this by just using mod_timer(), which will do the right thing for us. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-01	[PATCH] cfq-iosched: Detect hardware queueing	Jens Axboe
	If the hardware is doing real queueing, decide that it's worthless to idle the hardware. It does reasonable simultaneous io in that case anyways, and the idling hurts some work loads. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-01	[PATCH] cfq-iosched: Detect idle process issuing async request	Jens Axboe
	If we are anticipating a sync request from this process and we are waiting for that and see an async request come in, expire that slice and move on. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-01	[PATCH] cfq-iosched: check busy queues before deciding we are idle	Jens Axboe
	For just one busy queue (like async write out), we often overlooked that we could queue more io and decided we were idle instead. This causes us quite a bit of performance loss. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-05-30	[PATCH] cfq-iosched: fixup locking and ->queue_list list management	Jens Axboe
	- Drop cic from the list when seen as dead. - Fixup the locking, just use a simple spinlock. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-23	[PATCH] blk: fix gendisk->in_flight accounting during barrier sequence	Jens Axboe
	While executing barrrier sequence, the bar_rq which carries actual write was accounted as normal IO on completion, while it wasn't on queueing. This caused gendisk->in_flight to be decremented by 1 after each barrier thus messed up statistics. This patch makes bar_rq not accounted as normal IO. As the containing barrier request as a whole is accounted, part of it shouldn't be. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-12	Revert "[BLOCK] Fix oops on removal of SD/MMC card"	Linus Torvalds
	This reverts commit 56cf6504fc1c0c221b82cebc16a444b684140fb7. Both Erik Mouw and Andrew Vasquez independently pinpointed this commit as causing problems, where the slab cache for a driver is never released (most obviously causing problems when immediately re-loading that driver, resulting in a "kmem_cache_create: duplicate cache <xyz>" message, but it can also cause other trouble). James Bottomley dug into it, and reports: "OK, here's the scoop. The problem patch adds a get of driverfs_dev in add_disk(), but doesn't put it again until disk_release() (which occurs on final put_disk() of the gendisk). However, in SCSI, the driverfs_dev is the sdev_gendev. That means there's a reference held on sdev_gendev until final disk put. Unfortunately, we use the driver model driver_remove to trigger del_gendisk (which removes the gendisk from visibility and decrements the refcount), so we've introduced an unbreakable deadlock in the reference counting with this. I suggest simply reversing this patch at the moment. If Russell and Jens can tell me what they're trying to do I'll see if there's another way to do it." so hereby the patch gets reverted, waiting for a better fix. Cc: Jens Axboe <axboe@suse.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: James Bottomley <James.Bottomley@SteelEye.com> Cc: Erik Mouw <erik@harddisk-recovery.com> Cc: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-11	[BLOCK] limit request_fn recursion	Jens Axboe
	Don't recurse back into the driver even if the unplug threshold is met, when the driver asks for a requeue. This is both silly from a logical point of view (requeues typically happen due to driver/hardware shortage), and also dangerous since we could hit an endless request_fn -> requeue -> unplug -> request_fn loop and crash on stack overrun. Also limit blk_run_queue() to one level of recursion, similar to how blk_start_queue() works. This patch fixed a real problem with SLES10 and lpfc, and it could hit any SCSI lld that returns non-zero from it's ->queuecommand() handler. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-05	[BLOCK] Fix oops on removal of SD/MMC card	Russell King
	The block layer keeps a reference (driverfs_dev) to the struct device associated with the block device, and uses it internally for generating uevents in block_uevent. Block device uevents include umounting the partition, which can occur after the backing device has been removed. Unfortunately, this reference is not counted. This means that if the struct device is removed from the device tree, the block layers reference will become stale. Guard against this by holding a reference to the struct device in add_disk(), and only drop the reference when we're releasing the gendisk kobject - in other words when we can be sure that no further uevents will be generated for this block device. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Jens Axboe <axboe@suse.de>
2006-04-26	[PATCH] Remove __devinitdata from notifier block definitions	Chandra Seetharaman
	Few of the notifier_chain_register() callers use __devinitdata in the definition of notifier_block data structure. It is incorrect as the data structure should be available after the initializations (they do not unregister them during initializations). This was leading to an oops when notifier_chain_register() call is invoked for those callback chains after initialization. This patch fixes all such usages to _not_ have the notifier_block data structure in the init data section. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-21	[RBTREE] Change rbtree off-tree marking in I/O schedulers.	David Woodhouse
	They were abusing the rb_color field to mark nodes which weren't currently on the tree. Fix that to use the same method as eventpoll did -- setting the parent pointer to point back to itself. And use the appropriate accessor macros for setting and reading the parent. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-20	[PATCH] block/elevator.c: remove unused exports	Adrian Bunk
	This patch removes the following unused EXPORT_SYMBOL's: - elv_requeue_request - elv_completed_request They are only used by the block core, hence they need not be exported. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-20	[patch] cleanup: use blk_queue_stopped	Coywolf Qi Hunt
	This cleanup the source to use blk_queue_stopped. Signed-off-by: Coywolf Qi Hunt <qiyong@freeforge.net> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-18	[PATCH] cfq: Further rbtree traversal and cfq_exit_queue() race fix	OGAWA Hirofumi
	In current code, we are re-reading cic->key after dead cic->key check. So, in theory, it may really re-read after cfq_exit_queue() seted NULL. To avoid race, we copy it to stack, then use it. With this change, I guess gcc will assign cic->key to a register or stack, and it wouldn't be re-readed. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-18	[PATCH 2/2] cfq: fix cic's rbtree traversal	OGAWA Hirofumi
	When queue dies, we set cic->key=NULL as dead mark. So, when we traverse a rbtree, we must check whether it's still valid key. if it was invalidated, drop it, then restart the traversal from top. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-18	[PATCH 1/2] iosched: fix typo and barrier()	OGAWA Hirofumi
	On rmmod path, cfq/as waits to make sure all io-contexts was freed. However, it's using complete(), not wait_for_completion(). I think barrier() is not enough in here. To avoid the following case, this patch replaces barrier() with smb_wmb(). cpu0 visibility cpu1 [ioc_gnone=NULL,ioc_count=1] ioc_gnone = &all_gone NULL,ioc_count=1 atomic_read(&ioc_count) NULL,ioc_count=1 wait_for_completion() NULL,ioc_count=0 atomic_sub_and_test() NULL,ioc_count=0 if ( && ioc_gone) [ioc_gone==NULL, so doesn't call complete()] &all_gone,ioc_count=0 Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-13	[SCSI] unify SCSI_IOCTL_SEND_COMMAND implementations	Christoph Hellwig
	We currently have two implementations of this obsolete ioctl, one in the block layer and one in the scsi code. Both of them have drawbacks. This patch kills the scsi layer version after updating the block version with the missing bits: - argument checking - use scatterlist I/O - set number of retries based on the submitted command This is the last user of non-S/G I/O except for the gdth driver, so getting this in ASAP and through the scsi tree would be nie to kill the non-S/G I/O path. Jens, what do you think about adding a check for non-S/G I/O in the midlayer? Thanks to Or Gerlitz for testing this patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-04-02	Documentation: fix minor kernel-doc warnings	Martin Waitz
	This patch updates the comments to match the actual code. Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-31	[PATCH] config: fix CONFIG_LFS option	Trond Myklebust
	The help text says that if you select CONFIG_LBD, then it will automatically select CONFIG_LFS. That isn't currently the case, so update the text. - Get rid of the cruft in the help text mentioning CONFIG_LBD - Tell unsure users to select CONFIG_LFS. - Remove the `default n'. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] Don't pass boot parameters to argv_init[]	OGAWA Hirofumi
	The boot cmdline is parsed in parse_early_param() and parse_args(,unknown_bootoption). And __setup() is used in obsolete_checksetup(). start_kernel() -> parse_args() -> unknown_bootoption() -> obsolete_checksetup() If __setup()'s callback (->setup_func()) returns 1 in obsolete_checksetup(), obsolete_checksetup() thinks a parameter was handled. If ->setup_func() returns 0, obsolete_checksetup() tries other ->setup_func(). If all ->setup_func() that matched a parameter returns 0, a parameter is seted to argv_init[]. Then, when runing /sbin/init or init=app, argv_init[] is passed to the app. If the app doesn't ignore those arguments, it will warning and exit. This patch fixes a wrong usage of it, however fixes obvious one only. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31	[PATCH] Simplify proc/devices and fix early termination regression	Joe Korty
	Make baby-simple the code for /proc/devices. Based on the proven design for /proc/interrupts. This also fixes the early-termination regression 2.6.16 introduced, as demonstrated by: # dd if=/proc/devices bs=1 Character devices: 1 mem 27+0 records in 27+0 records out This should also work (but is untested) when /proc/devices >4096 bytes, which I believe is what the original 2.6.16 rewrite fixed. [akpm@osdl.org: cleanups, simplifications] Signed-off-by: Joe Korty <joe.korty@ccur.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	Merge branch 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block	Linus Torvalds
	* 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block: [BLOCK] cfq-iosched: seek and async performance fixes [PATCH] ll_rw_blk: fix 80-col offender in put_io_context() [PATCH] cfq-iosched: small cfq_choose_req() optimization [PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
2006-03-28	[PATCH] for_each_possible_cpu: fixes for generic part	KAMEZAWA Hiroyuki
	replaces for_each_cpu with for_each_possible_cpu(). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28	[BLOCK] cfq-iosched: seek and async performance fixes	Jens Axboe
	Detect whether a given process is seeky and if so disable (mostly) the idle window if it is. We still allow just a little idle time, just enough to allow that process to submit a new request. That is needed to maintain fairness across priority groups. In some cases, we could setup several async queues. This is not optimal from a performance POV, since we want all async io in one queue to perform good sorting on it. It also impacted sync queues, as async io got too much slice time. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28	[PATCH] ll_rw_blk: fix 80-col offender in put_io_context()	Jens Axboe
	This makes akpm more happy. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28	[PATCH] cfq-iosched: small cfq_choose_req() optimization	Andreas Mohr
	this is a small optimization to cfq_choose_req() in the CFQ I/O scheduler (this function is a semi-often invoked candidate in an oprofile log): by using a bit mask variable, we can use a simple switch() to check the various cases instead of having to query two variables for each check. Benefit: 251 vs. 285 bytes footprint of cfq_choose_req(). Also, common case 0 (no request wrapping) is now checked first in code. Signed-off-by: Andreas Mohr <andi@lisas.de> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28	[PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree	Jens Axboe
	On setups with many disks, we spend a considerable amount of time looking up the process-disk mapping on each queue of io. Testing with a NULL based block driver, this costs 40-50% reduction in throughput for 1000 disks. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27	Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block	Linus Torvalds
	* 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block: [PATCH] Don't make debugfs depend on DEBUG_KERNEL [PATCH] Fix blktrace compile with sysfs not defined [PATCH] unused label in drivers/block/cciss. [BLOCK] increase size of disk stat counters [PATCH] blk_execute_rq_nowait-speedup [PATCH] ide-cd: quiet down GPCMD_READ_CDVD_CAPACITY failure [BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion [PATCH] kzalloc() conversion in drivers/block [PATCH] update max_sectors documentation
2006-03-27	[PATCH] md: Make sure QUEUE_FLAG_CLUSTER is set properly for md.	NeilBrown
	This flag should be set for a virtual device iff it is set for all underlying devices. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] Fix blktrace compile with sysfs not defined	Jens Axboe
	debugfs depends on sysfs, so make blktrace kconfig option depend on that. Reported by Adrian Bunk. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27	[BLOCK] increase size of disk stat counters	Ben Woodard
	The kernel's representation of the disk statistics uses the type unsigned which is 32b on both 32b and 64b platforms. Unfortunately, most system tools that work with these numbers that are exported in /proc/diskstats including iostat read these numbers into unsigned longs. This works fine on 32b platforms and when the number of IO transactions are small on 64b platforms. However, when the numbers wrap on 64b platforms & you read the numbers into unsigned longs, and compare the numbers to previous readings, then you get an unsigned representation of a negative number. This looks like a very large 64b number & gives you bizarre readouts in iostat: ilc4: Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util ilc4: sda 5.50 0.00 143.96 0.00 307496983987862656.00 0.00 153748491993931328.00 0.00 2136028725038430.00 7.94 55.12 5.59 80.42 Though fixing iostat in user space is possible, and a quick survey indicates that several other similar tools also use unsigned longs when processing /proc/diskstats. Therefore, it seems like a better approach would be to extend the length of the disk_stats structure on 64b architectures to 64b. The following patch does that. It should not affect the operation on 32b platforms. Signed-off-by: Ben Woodard <woodard@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@suse.de>