summaryrefslogtreecommitdiffstats
path: root/drivers/block
AgeCommit message (Collapse)Author
2014-01-12block: null_blk: fix queue leak inside removing deviceMing Lei
When queue_mode is NULL_Q_MQ and null_blk is being removed, blk_cleanup_queue() isn't called to cleanup queue, so the queue allocated won't be freed. This patch calls blk_cleanup_queue() for MQ to drain all pending requests first and release the reference counter of queue kobject, then blk_mq_free_queue() will be called in queue kobject's release handler when queue kobject's reference counter drops to zero. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-08Merge branch 'for-3.14/core' into for-3.14/driversJens Axboe
We need the updated code to make bcache easier to merge.
2014-01-03xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4).Konrad Rzeszutek Wilk
The user has the option of disabling the platform driver: 00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01) which is used to unplug the emulated drivers (IDE, Realtek 8169, etc) and allow the PV drivers to take over. If the user wishes to disable that they can set: xen_platform_pci=0 (in the guest config file) or xen_emul_unplug=never (on the Linux command line) except it does not work properly. The PV drivers still try to load and since the Xen platform driver is not run - and it has not initialized the grant tables, most of the PV drivers stumble upon: input: Xen Virtual Keyboard as /devices/virtual/input/input5 input: Xen Virtual Pointer as /devices/virtual/input/input6M ------------[ cut here ]------------ kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206! invalid opcode: 0000 [#1] SMP Modules linked in: xen_kbdfront(+) xenfs xen_privcmd CPU: 6 PID: 1389 Comm: modprobe Not tainted 3.13.0-rc1upstream-00021-ga6c892b-dirty #1 Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013 RIP: 0010:[<ffffffff813ddc40>] [<ffffffff813ddc40>] get_free_entries+0x2e0/0x300 Call Trace: [<ffffffff8150d9a3>] ? evdev_connect+0x1e3/0x240 [<ffffffff813ddd0e>] gnttab_grant_foreign_access+0x2e/0x70 [<ffffffffa0010081>] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront] [<ffffffffa0010a12>] xenkbd_probe+0x2f2/0x324 [xen_kbdfront] [<ffffffff813e5757>] xenbus_dev_probe+0x77/0x130 [<ffffffff813e7217>] xenbus_frontend_dev_probe+0x47/0x50 [<ffffffff8145e9a9>] driver_probe_device+0x89/0x230 [<ffffffff8145ebeb>] __driver_attach+0x9b/0xa0 [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230 [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230 [<ffffffff8145cf1c>] bus_for_each_dev+0x8c/0xb0 [<ffffffff8145e7d9>] driver_attach+0x19/0x20 [<ffffffff8145e260>] bus_add_driver+0x1a0/0x220 [<ffffffff8145f1ff>] driver_register+0x5f/0xf0 [<ffffffff813e55c5>] xenbus_register_driver_common+0x15/0x20 [<ffffffff813e76b3>] xenbus_register_frontend+0x23/0x40 [<ffffffffa0015000>] ? 0xffffffffa0014fff [<ffffffffa001502b>] xenkbd_init+0x2b/0x1000 [xen_kbdfront] [<ffffffff81002049>] do_one_initcall+0x49/0x170 .. snip.. which is hardly nice. This patch fixes this by having each PV driver check for: - if running in PV, then it is fine to execute (as that is their native environment). - if running in HVM, check if user wanted 'xen_emul_unplug=never', in which case bail out and don't load any PV drivers. - if running in HVM, and if PCI device 5853:0001 (xen_platform_pci) does not exist, then bail out and not load PV drivers. - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks', then bail out for all PV devices _except_ the block one. Ditto for the network one ('nics'). - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary' then load block PV driver, and also setup the legacy IDE paths. In (v3) make it actually load PV drivers. Reported-by: Sander Eikelenboom <linux@eikelenboom.it Reported-by: Anthony PERARD <anthony.perard@citrix.com> Reported-and-Tested-by: Fabio Fantoni <fabio.fantoni@m2r.biz> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: Add extra logic to handle the myrid ways 'xen_emul_unplug' can be used per Ian and Stefano suggestion] [v3: Make the unnecessary case work properly] [v4: s/disks/ide-disks/ spotted by Fabio] Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> [for PCI parts] CC: stable@vger.kernel.org
2014-01-03pktcdvd: fix error return codeJulia Lawall
Set the return variable to an error code as done elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> ( if@p1 (\(ret < 0\|ret != 0\)) { ... return ret; } | ret@p1 = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-12-31rbd: tear down watch request if rbd_dev_device_setup() failsIlya Dryomov
Tear down watch request if rbd_dev_device_setup() fails. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31rbd: introduce rbd_dev_header_unwatch_sync() and switch to itIlya Dryomov
Rename rbd_dev_header_watch_sync() to __rbd_dev_header_watch_sync() and introduce two helpers: rbd_dev_header_{,un}watch_sync() to make it more clear what is going on. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31rbd: enable extended devt in single-major modeIlya Dryomov
If single-major device number allocation scheme is turned on, instead of reserving 256 minors per device, which imposes a limit of 4096 images mapped at once, reserve 16 minors per device and enable extended devt feature. This results in a theoretical limit of 65536 images mapped at once, and still allows to have more than 15 partititions: partitions starting with 16th are mapped under major 259 (Block Extended Major): $ rbd showmapped id pool image snap device 0 rbd b5 - /dev/rbd0 # no partitions 1 rbd b2 - /dev/rbd1 # 40 partitions 2 rbd b3 - /dev/rbd2 # 2 partitions $ cat /proc/partitions 251 0 1024 rbd0 251 16 1024 rbd1 251 17 0 rbd1p1 251 18 0 rbd1p2 ... 251 30 0 rbd1p14 251 31 0 rbd1p15 259 0 0 rbd1p16 259 1 0 rbd1p17 ... 259 23 0 rbd1p39 259 24 0 rbd1p40 251 32 1024 rbd2 251 33 0 rbd2p1 251 34 0 rbd2p2 (major 251 was assigned dynamically at module load time) Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31rbd: add support for single-major device number allocation schemeIlya Dryomov
Currently each rbd device is allocated its own major number, which leads to a hard limit of 230-250 images mapped at once. This commit adds support for a new single-major device number allocation scheme, which is hidden behind a new single_major boolean module parameter and is disabled by default for backwards compatibility reasons. (Old userspace cannot correctly unmap images mapped under single-major scheme and would essentially just unmap a random image, if that.) $ rbd showmapped id pool image snap device 0 rbd b100 - /dev/rbd0 1 rbd b101 - /dev/rbd1 2 rbd b102 - /dev/rbd2 3 rbd b103 - /dev/rbd3 Old scheme (modprobe rbd): $ ls -l /dev/rbd* brw-rw---- 1 root disk 253, 0 Dec 10 12:24 /dev/rbd0 brw-rw---- 1 root disk 252, 0 Dec 10 12:28 /dev/rbd1 brw-rw---- 1 root disk 252, 1 Dec 10 12:28 /dev/rbd1p1 brw-rw---- 1 root disk 252, 2 Dec 10 12:28 /dev/rbd1p2 brw-rw---- 1 root disk 252, 3 Dec 10 12:28 /dev/rbd1p3 brw-rw---- 1 root disk 251, 0 Dec 10 12:28 /dev/rbd2 brw-rw---- 1 root disk 251, 1 Dec 10 12:28 /dev/rbd2p1 brw-rw---- 1 root disk 250, 0 Dec 10 12:24 /dev/rbd3 New scheme (modprobe rbd single_major=Y): $ ls -l /dev/rbd* brw-rw---- 1 root disk 253, 0 Dec 10 12:30 /dev/rbd0 brw-rw---- 1 root disk 253, 256 Dec 10 12:30 /dev/rbd1 brw-rw---- 1 root disk 253, 257 Dec 10 12:30 /dev/rbd1p1 brw-rw---- 1 root disk 253, 258 Dec 10 12:30 /dev/rbd1p2 brw-rw---- 1 root disk 253, 259 Dec 10 12:30 /dev/rbd1p3 brw-rw---- 1 root disk 253, 512 Dec 10 12:30 /dev/rbd2 brw-rw---- 1 root disk 253, 513 Dec 10 12:30 /dev/rbd2p1 brw-rw---- 1 root disk 253, 768 Dec 10 12:30 /dev/rbd3 (major 253 was assigned dynamically at module load time) The new limit is 4096 images mapped at once, and it comes from the fact that, as before, 256 minor numbers are reserved for each mapping. (A follow-up commit changes the number of minors reserved and the way we deal with partitions over that number.) If single_major is set to true, two new sysfs interfaces show up: /sys/bus/rbd/{add,remove}_single_major. These are to be used instead of /sys/bus/rbd/{add,remove}, which are disabled for backwards compatibility reasons outlined above. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31rbd: wire up is_visible() sysfs callback for rbd busIlya Dryomov
In preparation for single-major device number allocation scheme, wire up attribute_group::is_visible() callback for rbd bus. This allows us to make the new single-major attributes conditional. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31rbd: add 'minor' sysfs rbd device attributeIlya Dryomov
Introduce /sys/bus/rbd/devices/<id>/minor sysfs attribute for exporting rbd whole disk minor numbers. This is a step towards single-major device number allocation scheme, but also a good thing on its own. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31rbd: switch to ida for rbd id assignmentsIlya Dryomov
Currently rbd ids are allocated using an atomic variable that keeps track of the highest id currently in use and each new id is simply one more than the value of that variable. That's nice and cheap, but it does mean that rbd ids are allowed to grow boundlessly, and, more importantly, it's completely unpredictable. So, in preparation for single-major device number allocation scheme, which is going to establish and rely on a constant mapping between rbd ids and device numbers, switch to ida for rbd id assignments. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31rbd: refactor rbd_init() a bitIlya Dryomov
Refactor rbd_init() a bit to make it more clear what's going on. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31rbd: tweak "loaded" message and module descriptionIlya Dryomov
Tweak "loaded" message, so that it looks like [ 30.184235] rbd: loaded instead of [ 38.056564] rbd: loaded rbd (rados block device) Also move (and slightly tweak) MODULE_DESCRIPTION so that all authors are next to each other in modinfo output. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31rbd: rbd_device::dev_id is an int, format it as suchIlya Dryomov
rbd_device::dev_id is an int, format it as such. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-12-31Merge tag 'v3.13-rc6' into for-3.14/coreJens Axboe
Needed to bring blk-mq uptodate, since changes have been going in since for-3.14/core was established. Fixup merge issues related to the immutable biovec changes. Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-flush.c fs/btrfs/check-integrity.c fs/btrfs/extent_io.c fs/btrfs/scrub.c fs/logfs/dev_bdev.c
2013-12-24Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - fix for a memory leak on certain unplug events - a collection of bcache fixes from Kent and Nicolas - a few null_blk fixes and updates form Matias - a marking of static of functions in the stec pci-e driver * 'for-linus' of git://git.kernel.dk/linux-block: null_blk: support submit_queues on use_per_node_hctx null_blk: set use_per_node_hctx param to false null_blk: corrections to documentation null_blk: warning on ignored submit_queues param null_blk: refactor init and init errors code paths null_blk: documentation null_blk: mem garbage on NUMA systems during init drivers: block: Mark the functions as static in skd_main.c bcache: New writeback PD controller bcache: bugfix for race between moving_gc and bucket_invalidate bcache: fix for gc and writeback race bcache: bugfix - moving_gc now moves only correct buckets bcache: fix for gc crashing when no sectors are used bcache: Fix heap_peek() macro bcache: Fix for can_attach_cache() bcache: Fix dirty_data accounting bcache: Use uninterruptible sleep in writeback bcache: kthread don't set writeback task to INTERUPTIBLE block: fix memory leaks on unplugging block device bcache: fix sparse non static symbol warning
2013-12-21null_blk: support submit_queues on use_per_node_hctxMatias Bjørling
In the case of both the submit_queues param and use_per_node_hctx param are used. We limit the number af submit_queues to the number of online nodes. If the submit_queues is a multiple of nr_online_nodes, its trivial. Simply map them to the nodes. For example: 8 submit queues are mapped as node0[0,1], node1[2,3], ... If uneven, we are left with an uneven number of submit_queues that must be mapped. These are mapped toward the first node and onward. E.g. 5 submit queues mapped onto 4 nodes are mapped as node0[0,1], node1[2], ... Signed-off-by: Matias Bjorling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-21null_blk: set use_per_node_hctx param to falseMatias Bjørling
The defaults for the module is to instantiate itself with blk-mq and a submit queue for each CPU node in the system. To save resources, initialize instead with a single submit queue. Signed-off-by: Matias Bjorling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-19null_blk: warning on ignored submit_queues paramMatias Bjorling
Let the user know when the number of submission queues are being ignored. Signed-off-by: Matias Bjorling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-19null_blk: refactor init and init errors code pathsMatias Bjorling
Simplify the initialization logic of the three block-layers. - The queue initialization is split into two parts. This allows reuse of code when initializing the sq-, bio- and mq-based layers. - Set submit_queues default value to 0 and always set it at init time. - Simplify the init error code paths. Signed-off-by: Matias Bjorling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-19null_blk: mem garbage on NUMA systems during initMatias Bjorling
For NUMA systems, initializing the blk-mq layer and using per node hctx. We initialize submit queues to 1, while blk-mq nr_hw_queues is initialized to the number of NUMA nodes. This makes the null_init_hctx function overwrite memory outside of what it allocated. In my case it lead to writing garbage into struct request_queue's mq_map. Signed-off-by: Matias Bjorling <m@bjorling.me> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19drivers: block: Mark the functions as static in skd_main.cRashika Kheria
Mark functions skd_skmsg_state_to_str() and skd_skreq_state_to_str() as static in skd_main.c because they are not used outside this file. This eliminates the following warnings in skd_main.c: drivers/block/skd_main.c:5272:13: warning: no previous prototype for ‘skd_skmsg_state_to_str’ [-Wmissing-prototypes] drivers/block/skd_main.c:5284:13: warning: no previous prototype for ‘skd_skreq_state_to_str’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-19Merge branch 'master' into for-nextJiri Kosina
Sync with Linus' tree to be able to apply fixes on top of newer things in tree (efi-stub). Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-12-16NVMe: Device resume error handlingKeith Busch
Adds controller error handling on resume power management. If the device fails to initialize, the device is queued for a reset. If the reset fails, a thread is spawned to remove the pci device. If the device resumes as "busy", the device is responding to admin commands but will not create IO queues. In this case, we need to remove the gendisks and free the IO queues since they can't be used and may be holding bios in their lists. From testing, the dma pools require a pci device so this had to change the pci driver 'remove' to release the dma resources in line with that call instead of after all references to the device are released. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-12-16NVMe: Cache dev->pci_dev in a local pointerMatthew Wilcox
Helps with line-length issues Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-12-16NVMe: Fix lockdep warningsMatthew Wilcox
During the initialisation path, the queue lock is taken without interrupt protection. It's perfectly safe to do so, because the interrupt handler can't run at this point, but it confuses lockdep. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-12-16NVMe: compat SG_IO ioctlKeith Busch
For 32-bit versions of sg3-utils running on a 64-bit system. This is mostly a copy from the relevent portions of fs/compat_ioctl.c, with slight modifications for going through block_device_operations. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com> [fixed up CONFIG_COMPAT=n build problems] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-12-15null_blk: mem garbage on NUMA systems during initMatias Bjorling
For NUMA systems, initializing the blk-mq layer and using per node hctx. We initialize submit queues to 1, while blk-mq nr_hw_queues is initialized to the number of NUMA nodes. This makes the null_init_hctx function overwrite memory outside of what it allocated. In my case it lead to writing garbage into struct request_queue's mq_map. Signed-off-by: Matias Bjorling <m@bjorling.me> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-05Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "A small collection of fixes for the current series. It contains: - A fix for a use-after-free of a request in blk-mq. From Ming Lei - A fix for a blk-mq bug that could attempt to dereference a NULL rq if allocation failed - Two xen-blkfront small fixes - Cleanup of submit_bio_wait() type uses in the kernel, unifying that. From Kent - A fix for 32-bit blkg_rwstat reading. I apologize for this one looking mangled in the shortlog, it's entirely my fault for missing an empty line between the description and body of the text" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: fix use-after-free of request blk-mq: fix dereference of rq->mq_ctx if allocation fails block: xen-blkfront: Fix possible NULL ptr dereference xen-blkfront: Silence pfn maybe-uninitialized warning block: submit_bio_wait() conversions Update of blkg_stat and blkg_rwstat may happen in bh context
2013-12-03vio: remove dangly makefile bitsAlan
The drivers are long gone but some config escaped the prune Signed-off-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=57221 Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-11-27Merge branch 'stable/for-jens-3.13-take-two' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into for-linus
2013-11-26block: xen-blkfront: Fix possible NULL ptr dereferenceFelipe Pena
In the blkif_release function the bdget_disk() call might returns a NULL ptr which might be dereferenced on bdev->bd_openers checking Signed-off-by: Felipe Pena <felipensp@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: Added WARN per Roger's suggestion]
2013-11-26xen-blkfront: Silence pfn maybe-uninitialized warningTim Gardner
pfn cannot actually be used unless (!info->feature_persistent), nor is pfn accessed in get_grant() unless (!info->feature_persistent), but silence this warning anyway. gcc-4.8 drivers/block/xen-blkfront.c: In function 'do_blkif_request': drivers/block/xen-blkfront.c:508:20: warning: 'pfn' may be used uninitialized in this function [-Wmaybe-uninitialized] gnt_list_entry = get_grant(&gref_head, pfn, info); ^ drivers/block/xen-blkfront.c:492:19: note: 'pfn' was declared here unsigned long pfn; Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
2013-11-26block/z2ram: Remove duplicate external declarationsGeert Uytterhoeven
Remove the external declarations for m68k_realnum_memory and m68k_memory, which are already provided by <asm/setup.h>, to avoid conflicts later. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2013-11-26zorro: ZTWO_VADDR() should return "void __iomem *"Geert Uytterhoeven
ZTWO_VADDR() converts from physical to virtual I/O addresses, so it should return "void __iomem *" instead of "unsigned long". This allows to drop several casts, but requires adding a few casts to accomodate legacy driver frameworks that store "unsigned long" I/O addresses. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2013-11-23block: Introduce new bio_split()Kent Overstreet
The new bio_split() can split arbitrary bios - it's not restricted to single page bios, like the old bio_split() (previously renamed to bio_pair_split()). It also has different semantics - it doesn't allocate a struct bio_pair, leaving it up to the caller to handle completions. Then convert the existing bio_pair_split() users to the new bio_split() - and also nvme, which was open coding bio splitting. (We have to take that BUG_ON() out of bio_integrity_trim() because this bio_split() needs to use it, and there's no reason it has to be used on bios marked as cloned; BIO_CLONED doesn't seem to have clearly documented semantics anyways.) Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Neil Brown <neilb@suse.de>
2013-11-23block: Rename bio_split() -> bio_pair_split()Kent Overstreet
This is prep work for introducing a more general bio_split(). Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: NeilBrown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Sage Weil <sage@inktank.com>
2013-11-23rbd: Refactor bio cloningKent Overstreet
Now that we've got drivers converted to the new immutable bvec primitives, bio splitting becomes much easier - this is how the new bio_split() will work. (Someone more familiar with the ceph code could probably use bio_clone_fast() instead of bio_clone() here). Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org
2013-11-23aoe: Convert to immutable biovecsKent Overstreet
Now that we've got a mechanism for immutable biovecs - bi_iter.bi_bvec_done - we need to convert drivers to use primitives that respect it instead of using the bvec array directly. The aoe code no longer has to manually iterate over partial bvecs, so some struct members go away - other struct members are effectively renamed: buf->resid -> buf->iter.bi_size buf->sector -> buf->iter.bi_sector f->bcnt -> f->iter.bi_size f->lba -> f->iter.bi_sector Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Ed L. Cashin" <ecashin@coraid.com>
2013-11-23block: Convert drivers to immutable biovecsKent Overstreet
Now that we've got a mechanism for immutable biovecs - bi_iter.bi_bvec_done - we need to convert drivers to use primitives that respect it instead of using the bvec array directly. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: NeilBrown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: dm-devel@redhat.com
2013-11-23block: Kill bio_segments()/bi_vcnt usageKent Overstreet
When we start sharing biovecs, keeping bi_vcnt accurate for splits is going to be error prone - and unnecessary, if we refactor some code. So bio_segments() has to go - but most of the existing users just needed to know if the bio had multiple segments, which is easier - add a bio_multiple_segments() for them. (Two of the current uses of bio_segments() are going to go away in a couple patches, but the current implementation of bio_segments() is unsafe as soon as we start doing driver conversions for immutable biovecs - so implement a dumb version for bisectability, it'll go away in a couple patches) Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Neil Brown <neilb@suse.de> Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com> Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
2013-11-23block: Immutable bio vecsKent Overstreet
This adds a mechanism by which we can advance a bio by an arbitrary number of bytes without modifying the biovec: bio->bi_iter.bi_bvec_done indicates the number of bytes completed in the current bvec. Various driver code still needs to be updated to not refer to the bvec directly before we can use this for interesting things, like efficient bio splitting. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Paul Clements <Paul.Clements@steeleye.com> Cc: drbd-user@lists.linbit.com Cc: nbd-general@lists.sourceforge.net
2013-11-23block: Convert bio_for_each_segment() to bvec_iterKent Overstreet
More prep work for immutable biovecs - with immutable bvecs drivers won't be able to use the biovec directly, they'll need to use helpers that take into account bio->bi_iter.bi_bvec_done. This updates callers for the new usage without changing the implementation yet. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Paul Clements <Paul.Clements@steeleye.com> Cc: Jim Paris <jim@jtan.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com> Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com> Cc: support@lsi.com Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Cc: linux-m68k@lists.linux-m68k.org Cc: linuxppc-dev@lists.ozlabs.org Cc: drbd-user@lists.linbit.com Cc: nbd-general@lists.sourceforge.net Cc: cbe-oss-dev@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Cc: virtualization@lists.linux-foundation.org Cc: linux-raid@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: DL-MPTFusionLinux@lsi.com Cc: linux-scsi@vger.kernel.org Cc: devel@driverdev.osuosl.org Cc: linux-fsdevel@vger.kernel.org Cc: cluster-devel@redhat.com Cc: linux-mm@kvack.org Acked-by: Geoff Levand <geoff@infradead.org>
2013-11-23block: Convert bio_iovec() to bvec_iterKent Overstreet
For immutable biovecs, we'll be introducing a new bio_iovec() that uses our new bvec iterator to construct a biovec, taking into account bvec_iter->bi_bvec_done - this patch updates existing users for the new usage. Some of the existing users really do need a pointer into the bvec array - those uses are all going to be removed, but we'll need the functionality from immutable to remove them - so for now rename the existing bio_iovec() -> __bio_iovec(), and it'll be removed in a couple patches. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
2013-11-23block: Abstract out bvec iteratorKent Overstreet
Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
2013-11-21kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS cleanlyYuanhan Liu
Remove CONFIG_USE_GENERIC_SMP_HELPERS left by commit 0a06ff068f12 ("kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS"). Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-19virtio-blk: virtqueue_kick() must be ordered with other virtqueue operationsShaohua Li
It isn't safe to call it without holding the vblk->vq_lock. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Shaohua Li <shli@fusionio.com> Fixed another condition of virtqueue_kick() not holding the lock. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-18NVMe: remove deprecated IRQF_DISABLEDMichael Opdenacker
This patch proposes to remove the use of the IRQF_DISABLED flag It's a NOOP since 2.6.35 and it will be removed one day. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-11-18NVMe: Avoid shift operation when writing cq head doorbellHaiyan Hu
Changes the type of dev->db_stride to unsigned and changes the value stored there to be 1 << the current value. Then there is less calculation to be done at completion time. Signed-off-by: Haiyan Hu <huhaiyan@huawei.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-11-15Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull second round of block driver updates from Jens Axboe: "As mentioned in the original pull request, the bcache bits were pulled because of their dependency on the immutable bio vecs. Kent re-did this part and resubmitted it, so here's the 2nd round of (mostly) driver updates for 3.13. It contains: - The bcache work from Kent. - Conversion of virtio-blk to blk-mq. This removes the bio and request path, and substitutes with the blk-mq path instead. The end result almost 200 deleted lines. Patch is acked by Asias and Christoph, who both did a bunch of testing. - A removal of bootmem.h include from Grygorii Strashko, part of a larger series of his killing the dependency on that header file. - Removal of __cpuinit from blk-mq from Paul Gortmaker" * 'for-linus' of git://git.kernel.dk/linux-block: (56 commits) virtio_blk: blk-mq support blk-mq: remove newly added instances of __cpuinit bcache: defensively handle format strings bcache: Bypass torture test bcache: Delete some slower inline asm bcache: Use ida for bcache block dev minor bcache: Fix sysfs splat on shutdown with flash only devs bcache: Better full stripe scanning bcache: Have btree_split() insert into parent directly bcache: Move spinlock into struct time_stats bcache: Kill sequential_merge option bcache: Kill bch_next_recurse_key() bcache: Avoid deadlocking in garbage collection bcache: Incremental gc bcache: Add make_btree_freeing_key() bcache: Add btree_node_write_sync() bcache: PRECEDING_KEY() bcache: bch_(btree|extent)_ptr_invalid() bcache: Don't bother with bucket refcount for btree node allocations bcache: Debug code improvements ...