Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
"This is a big batch. From Ilya we have:
- rbd support for more than ~250 mapped devices (now uses same scheme
that SCSI does for device major/minor numbering)
- crush updates for new mapping behaviors (will be needed for coming
erasure coding support, among other things)
- preliminary support for tiered storage pools
There is also a big series fixing a pile cephfs bugs with clustered
MDSs from Yan Zheng, ACL support for cephfs from Guangliang Zhao, ceph
fscache improvements from Li Wang, improved behavior when we get
ENOSPC from Josh Durgin, some readv/writev improvements from
Majianpeng, and the usual mix of small cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (76 commits)
ceph: cast PAGE_SIZE to size_t in ceph_sync_write()
ceph: fix dout() compile warnings in ceph_filemap_fault()
libceph: support CEPH_FEATURE_OSD_CACHEPOOL feature
libceph: follow redirect replies from osds
libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}
libceph: follow {read,write}_tier fields on osd request submission
libceph: add ceph_pg_pool_by_id()
libceph: CEPH_OSD_FLAG_* enum update
libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()
libceph: introduce and start using oid abstraction
libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN
libceph: move ceph_file_layout helpers to ceph_fs.h
libceph: start using oloc abstraction
libceph: dout() is missing a newline
libceph: add ceph_kv{malloc,free}() and switch to them
libceph: support CEPH_FEATURE_EXPORT_PEER
ceph: add imported caps when handling cap export message
ceph: add open export target session helper
ceph: remove exported caps when handling cap import message
ceph: handle session flush message
...
|
|
Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before
introducing r_target_{oloc,oid} needed for redirects.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
In preparation for tiering support, which would require having two
(base and target) object names for each osd request and also copying
those names around, introduce struct ceph_object_id (oid) and a couple
helpers to facilitate those copies and encapsulate the fact that object
name is not necessarily a NUL-terminated string.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
In preparation for adding oid abstraction, rename MAX_OBJ_NAME_SIZE to
CEPH_MAX_OID_NAME_LEN.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Instead of relying on pool fields in ceph_file_layout (for mapping) and
ceph_pg (for enconding), start using ceph_object_locator (oloc)
abstraction. Note that userspace oloc currently consists of pool, key,
nspace and hash fields, while this one contains only a pool. This is
OK, because at this point we only send (i.e. encode) olocs and never
have to receive (i.e. decode) them.
This makes keeping a copy of ceph_file_layout in every osd request
unnecessary, so ceph_osd_request::r_file_layout field is nuked.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
RAM block device support module name changed to brd.ko some years ago
with an "rd" alias to match previous module implementation. This patch
updates its Kconfig definition.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen updates from Konrad Rzeszutek Wilk:
"Two major features that Xen community is excited about:
The first is event channel scalability by David Vrabel - we switch
over from an two-level per-cpu bitmap of events (IRQs) - to an FIFO
queue with priorities. This lets us be able to handle more events,
have lower latency, and better scalability. Good stuff.
The other is PVH by Mukesh Rathor. In short, PV is a mode where the
kernel lets the hypervisor program page-tables, segments, etc. With
EPT/NPT capabilities in current processors, the overhead of doing this
in an HVM (Hardware Virtual Machine) container is much lower than the
hypervisor doing it for us.
In short we let a PV guest run without doing page-table, segment,
syscall, etc updates through the hypervisor - instead it is all done
within the guest container. It is a "hybrid" PV - hence the 'PVH'
name - a PV guest within an HVM container.
The major benefits are less code to deal with - for example we only
use one function from the the pv_mmu_ops (which has 39 function
calls); faster performance for syscall (no context switches into the
hypervisor); less traps on various operations; etc.
It is still being baked - the ABI is not yet set in stone. But it is
pretty awesome and we are excited about it.
Lastly, there are some changes to ARM code - you should get a simple
conflict which has been resolved in #linux-next.
In short, this pull has awesome features.
Features:
- FIFO event channels. Key advantages: support for over 100,000
events (2^17), 16 different event priorities, improved fairness in
event latency through the use of FIFOs.
- Xen PVH support. "It’s a fully PV kernel mode, running with
paravirtualized disk and network, paravirtualized interrupts and
timers, no emulated devices of any kind (and thus no qemu), no BIOS
or legacy boot — but instead of requiring PV MMU, it uses the HVM
hardware extensions to virtualize the pagetables, as well as system
calls and other privileged operations." (from "The
Paravirtualization Spectrum, Part 2: From poles to a spectrum")
Bug-fixes:
- Fixes in balloon driver (refactor and make it work under ARM)
- Allow xenfb to be used in HVM guests.
- Allow xen_platform_pci=0 to work properly.
- Refactors in event channels"
* tag 'stable/for-linus-3.14-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (52 commits)
xen/pvh: Set X86_CR0_WP and others in CR0 (v2)
MAINTAINERS: add git repository for Xen
xen/pvh: Use 'depend' instead of 'select'.
xen: delete new instances of __cpuinit usage
xen/fb: allow xenfb initialization for hvm guests
xen/evtchn_fifo: fix error return code in evtchn_fifo_setup()
xen-platform: fix error return code in platform_pci_init()
xen/pvh: remove duplicated include from enlighten.c
xen/pvh: Fix compile issues with xen_pvh_domain()
xen: Use dev_is_pci() to check whether it is pci device
xen/grant-table: Force to use v1 of grants.
xen/pvh: Support ParaVirtualized Hardware extensions (v3).
xen/pvh: Piggyback on PVHVM XenBus.
xen/pvh: Piggyback on PVHVM for grant driver (v4)
xen/grant: Implement an grant frame array struct (v3).
xen/grant-table: Refactor gnttab_init
xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init.
xen/pvh: Piggyback on PVHVM for event channels (v2)
xen/pvh: Update E820 to work with PVH (v2)
xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
"Usual rocket science stuff from trivial.git"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
neighbour.h: fix comment
sched: Fix warning on make htmldocs caused by wait.h
slab: struct kmem_cache is protected by slab_mutex
doc: Fix typo in USB Gadget Documentation
of/Kconfig: Spelling s/one/once/
mkregtable: Fix sscanf handling
lp5523, lp8501: comment improvements
thermal: rcar: comment spelling
treewide: fix comments and printk msgs
IXP4xx: remove '1 &&' from a condition check in ixp4xx_restart()
Documentation: update /proc/uptime field description
Documentation: Fix size parameter for snprintf
arm: fix comment header and macro name
asm-generic: uaccess: Spelling s/a ny/any/
mtd: onenand: fix comment header
doc: driver-model/platform.txt: fix a typo
drivers: fix typo in DEVTMPFS_MOUNT Kconfig help text
doc: Fix typo (acces_process_vm -> access_process_vm)
treewide: Fix typos in printk
drivers/gpu/drm/qxl/Kconfig: reformat the help text
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- Zorro bus cleanups and UAPI revival
- Bootinfo cleanups and UAPI revival
- Kexec support
- Memory size reductions and bug fixes for multi-platform kernels
- Polled interrupt support for Atari EtherNAT, EtherNEC and NetUSBee
- Machine-specific random_get_entropy()
- Defconfig updates and cleanups
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: (46 commits)
m68k/mac: Make SCC reset work more reliably
m68k/irq - Use polled IRQ flag for MFP timer cascaded interrupts
m68k: Update defconfigs for v3.13-rc1
m68k/defconfig: Enable EARLY_PRINTK
m68k/mm: kmap spelling/grammar fixes
m68k: Convert arch/m68k/kernel/traps.c to pr_*()
m68k: Convert arch/m68k/mm/fault.c to pr_*()
m68k/mm: Check for mm != NULL in do_page_fault() debug code
m68k/defconfig: Disable /sbin/hotplug fork-bomb by default
m68k/atari: Hide RTC_PORT() macro from rtc-cmos
m68k/amiga,atari: Fix specifying multiple debug= parameters
m68k/defconfig: Use ext4 for ext2/ext3 file systems
m68k: Add support to export bootinfo in procfs
m68k: Add kexec support
m68k/mac: Mark Mac IIsi ADB driver BROKEN
m68k/amiga: Provide mach_random_get_entropy()
m68k: Add infrastructure for machine-specific random_get_entropy()
m68k/atari: Call paging_init() before nf_init()
m68k: Remove superfluous inclusions of <asm/bootinfo.h>
m68k/UAPI: Use proper types (endianness/size) in <asm/bootinfo*.h>
...
|
|
When queue_mode is NULL_Q_MQ and null_blk is being removed,
blk_cleanup_queue() isn't called to cleanup queue, so the queue
allocated won't be freed.
This patch calls blk_cleanup_queue() for MQ to drain all pending
requests first and release the reference counter of queue kobject, then
blk_mq_free_queue() will be called in queue kobject's release handler
when queue kobject's reference counter drops to zero.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The user has the option of disabling the platform driver:
00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
which is used to unplug the emulated drivers (IDE, Realtek 8169, etc)
and allow the PV drivers to take over. If the user wishes
to disable that they can set:
xen_platform_pci=0
(in the guest config file)
or
xen_emul_unplug=never
(on the Linux command line)
except it does not work properly. The PV drivers still try to
load and since the Xen platform driver is not run - and it
has not initialized the grant tables, most of the PV drivers
stumble upon:
input: Xen Virtual Keyboard as /devices/virtual/input/input5
input: Xen Virtual Pointer as /devices/virtual/input/input6M
------------[ cut here ]------------
kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206!
invalid opcode: 0000 [#1] SMP
Modules linked in: xen_kbdfront(+) xenfs xen_privcmd
CPU: 6 PID: 1389 Comm: modprobe Not tainted 3.13.0-rc1upstream-00021-ga6c892b-dirty #1
Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013
RIP: 0010:[<ffffffff813ddc40>] [<ffffffff813ddc40>] get_free_entries+0x2e0/0x300
Call Trace:
[<ffffffff8150d9a3>] ? evdev_connect+0x1e3/0x240
[<ffffffff813ddd0e>] gnttab_grant_foreign_access+0x2e/0x70
[<ffffffffa0010081>] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront]
[<ffffffffa0010a12>] xenkbd_probe+0x2f2/0x324 [xen_kbdfront]
[<ffffffff813e5757>] xenbus_dev_probe+0x77/0x130
[<ffffffff813e7217>] xenbus_frontend_dev_probe+0x47/0x50
[<ffffffff8145e9a9>] driver_probe_device+0x89/0x230
[<ffffffff8145ebeb>] __driver_attach+0x9b/0xa0
[<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
[<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
[<ffffffff8145cf1c>] bus_for_each_dev+0x8c/0xb0
[<ffffffff8145e7d9>] driver_attach+0x19/0x20
[<ffffffff8145e260>] bus_add_driver+0x1a0/0x220
[<ffffffff8145f1ff>] driver_register+0x5f/0xf0
[<ffffffff813e55c5>] xenbus_register_driver_common+0x15/0x20
[<ffffffff813e76b3>] xenbus_register_frontend+0x23/0x40
[<ffffffffa0015000>] ? 0xffffffffa0014fff
[<ffffffffa001502b>] xenkbd_init+0x2b/0x1000 [xen_kbdfront]
[<ffffffff81002049>] do_one_initcall+0x49/0x170
.. snip..
which is hardly nice. This patch fixes this by having each
PV driver check for:
- if running in PV, then it is fine to execute (as that is their
native environment).
- if running in HVM, check if user wanted 'xen_emul_unplug=never',
in which case bail out and don't load any PV drivers.
- if running in HVM, and if PCI device 5853:0001 (xen_platform_pci)
does not exist, then bail out and not load PV drivers.
- (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks',
then bail out for all PV devices _except_ the block one.
Ditto for the network one ('nics').
- (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary'
then load block PV driver, and also setup the legacy IDE paths.
In (v3) make it actually load PV drivers.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it
Reported-by: Anthony PERARD <anthony.perard@citrix.com>
Reported-and-Tested-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Add extra logic to handle the myrid ways 'xen_emul_unplug'
can be used per Ian and Stefano suggestion]
[v3: Make the unnecessary case work properly]
[v4: s/disks/ide-disks/ spotted by Fabio]
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> [for PCI parts]
CC: stable@vger.kernel.org
|
|
Tear down watch request if rbd_dev_device_setup() fails.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Rename rbd_dev_header_watch_sync() to __rbd_dev_header_watch_sync() and
introduce two helpers: rbd_dev_header_{,un}watch_sync() to make it more
clear what is going on.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
|
If single-major device number allocation scheme is turned on, instead
of reserving 256 minors per device, which imposes a limit of 4096
images mapped at once, reserve 16 minors per device and enable extended
devt feature. This results in a theoretical limit of 65536 images
mapped at once, and still allows to have more than 15 partititions:
partitions starting with 16th are mapped under major 259 (Block
Extended Major):
$ rbd showmapped
id pool image snap device
0 rbd b5 - /dev/rbd0 # no partitions
1 rbd b2 - /dev/rbd1 # 40 partitions
2 rbd b3 - /dev/rbd2 # 2 partitions
$ cat /proc/partitions
251 0 1024 rbd0
251 16 1024 rbd1
251 17 0 rbd1p1
251 18 0 rbd1p2
...
251 30 0 rbd1p14
251 31 0 rbd1p15
259 0 0 rbd1p16
259 1 0 rbd1p17
...
259 23 0 rbd1p39
259 24 0 rbd1p40
251 32 1024 rbd2
251 33 0 rbd2p1
251 34 0 rbd2p2
(major 251 was assigned dynamically at module load time)
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Currently each rbd device is allocated its own major number, which
leads to a hard limit of 230-250 images mapped at once. This commit
adds support for a new single-major device number allocation scheme,
which is hidden behind a new single_major boolean module parameter and
is disabled by default for backwards compatibility reasons. (Old
userspace cannot correctly unmap images mapped under single-major
scheme and would essentially just unmap a random image, if that.)
$ rbd showmapped
id pool image snap device
0 rbd b100 - /dev/rbd0
1 rbd b101 - /dev/rbd1
2 rbd b102 - /dev/rbd2
3 rbd b103 - /dev/rbd3
Old scheme (modprobe rbd):
$ ls -l /dev/rbd*
brw-rw---- 1 root disk 253, 0 Dec 10 12:24 /dev/rbd0
brw-rw---- 1 root disk 252, 0 Dec 10 12:28 /dev/rbd1
brw-rw---- 1 root disk 252, 1 Dec 10 12:28 /dev/rbd1p1
brw-rw---- 1 root disk 252, 2 Dec 10 12:28 /dev/rbd1p2
brw-rw---- 1 root disk 252, 3 Dec 10 12:28 /dev/rbd1p3
brw-rw---- 1 root disk 251, 0 Dec 10 12:28 /dev/rbd2
brw-rw---- 1 root disk 251, 1 Dec 10 12:28 /dev/rbd2p1
brw-rw---- 1 root disk 250, 0 Dec 10 12:24 /dev/rbd3
New scheme (modprobe rbd single_major=Y):
$ ls -l /dev/rbd*
brw-rw---- 1 root disk 253, 0 Dec 10 12:30 /dev/rbd0
brw-rw---- 1 root disk 253, 256 Dec 10 12:30 /dev/rbd1
brw-rw---- 1 root disk 253, 257 Dec 10 12:30 /dev/rbd1p1
brw-rw---- 1 root disk 253, 258 Dec 10 12:30 /dev/rbd1p2
brw-rw---- 1 root disk 253, 259 Dec 10 12:30 /dev/rbd1p3
brw-rw---- 1 root disk 253, 512 Dec 10 12:30 /dev/rbd2
brw-rw---- 1 root disk 253, 513 Dec 10 12:30 /dev/rbd2p1
brw-rw---- 1 root disk 253, 768 Dec 10 12:30 /dev/rbd3
(major 253 was assigned dynamically at module load time)
The new limit is 4096 images mapped at once, and it comes from the fact
that, as before, 256 minor numbers are reserved for each mapping.
(A follow-up commit changes the number of minors reserved and the way
we deal with partitions over that number.)
If single_major is set to true, two new sysfs interfaces show up:
/sys/bus/rbd/{add,remove}_single_major. These are to be used instead
of /sys/bus/rbd/{add,remove}, which are disabled for backwards
compatibility reasons outlined above.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
|
In preparation for single-major device number allocation scheme, wire
up attribute_group::is_visible() callback for rbd bus. This allows us
to make the new single-major attributes conditional.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Introduce /sys/bus/rbd/devices/<id>/minor sysfs attribute for exporting
rbd whole disk minor numbers. This is a step towards single-major
device number allocation scheme, but also a good thing on its own.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Currently rbd ids are allocated using an atomic variable that keeps
track of the highest id currently in use and each new id is simply one
more than the value of that variable. That's nice and cheap, but it
does mean that rbd ids are allowed to grow boundlessly, and, more
importantly, it's completely unpredictable. So, in preparation for
single-major device number allocation scheme, which is going to
establish and rely on a constant mapping between rbd ids and device
numbers, switch to ida for rbd id assignments.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Refactor rbd_init() a bit to make it more clear what's going on.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Tweak "loaded" message, so that it looks like
[ 30.184235] rbd: loaded
instead of
[ 38.056564] rbd: loaded rbd (rados block device)
Also move (and slightly tweak) MODULE_DESCRIPTION so that all authors
are next to each other in modinfo output.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
|
rbd_device::dev_id is an int, format it as such.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
|
Pull block fixes from Jens Axboe:
- fix for a memory leak on certain unplug events
- a collection of bcache fixes from Kent and Nicolas
- a few null_blk fixes and updates form Matias
- a marking of static of functions in the stec pci-e driver
* 'for-linus' of git://git.kernel.dk/linux-block:
null_blk: support submit_queues on use_per_node_hctx
null_blk: set use_per_node_hctx param to false
null_blk: corrections to documentation
null_blk: warning on ignored submit_queues param
null_blk: refactor init and init errors code paths
null_blk: documentation
null_blk: mem garbage on NUMA systems during init
drivers: block: Mark the functions as static in skd_main.c
bcache: New writeback PD controller
bcache: bugfix for race between moving_gc and bucket_invalidate
bcache: fix for gc and writeback race
bcache: bugfix - moving_gc now moves only correct buckets
bcache: fix for gc crashing when no sectors are used
bcache: Fix heap_peek() macro
bcache: Fix for can_attach_cache()
bcache: Fix dirty_data accounting
bcache: Use uninterruptible sleep in writeback
bcache: kthread don't set writeback task to INTERUPTIBLE
block: fix memory leaks on unplugging block device
bcache: fix sparse non static symbol warning
|
|
In the case of both the submit_queues param and use_per_node_hctx param
are used. We limit the number af submit_queues to the number of online
nodes.
If the submit_queues is a multiple of nr_online_nodes, its trivial. Simply map
them to the nodes. For example: 8 submit queues are mapped as node0[0,1],
node1[2,3], ...
If uneven, we are left with an uneven number of submit_queues that must be
mapped. These are mapped toward the first node and onward. E.g. 5
submit queues mapped onto 4 nodes are mapped as node0[0,1], node1[2], ...
Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The defaults for the module is to instantiate itself with blk-mq and a
submit queue for each CPU node in the system.
To save resources, initialize instead with a single submit queue.
Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Let the user know when the number of submission queues are being
ignored.
Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Simplify the initialization logic of the three block-layers.
- The queue initialization is split into two parts. This allows reuse of
code when initializing the sq-, bio- and mq-based layers.
- Set submit_queues default value to 0 and always set it at init time.
- Simplify the init error code paths.
Signed-off-by: Matias Bjorling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
For NUMA systems, initializing the blk-mq layer and using per node hctx.
We initialize submit queues to 1, while blk-mq nr_hw_queues is
initialized to the number of NUMA nodes.
This makes the null_init_hctx function overwrite memory outside of what
it allocated. In my case it lead to writing garbage into struct
request_queue's mq_map.
Signed-off-by: Matias Bjorling <m@bjorling.me>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Mark functions skd_skmsg_state_to_str() and skd_skreq_state_to_str() as
static in skd_main.c because they are not used outside this file.
This eliminates the following warnings in skd_main.c:
drivers/block/skd_main.c:5272:13: warning: no previous prototype for ‘skd_skmsg_state_to_str’ [-Wmissing-prototypes]
drivers/block/skd_main.c:5284:13: warning: no previous prototype for ‘skd_skreq_state_to_str’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Sync with Linus' tree to be able to apply fixes on top of newer things
in tree (efi-stub).
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
For NUMA systems, initializing the blk-mq layer and using per node hctx.
We initialize submit queues to 1, while blk-mq nr_hw_queues is
initialized to the number of NUMA nodes.
This makes the null_init_hctx function overwrite memory outside of what
it allocated. In my case it lead to writing garbage into struct
request_queue's mq_map.
Signed-off-by: Matias Bjorling <m@bjorling.me>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull block layer fixes from Jens Axboe:
"A small collection of fixes for the current series. It contains:
- A fix for a use-after-free of a request in blk-mq. From Ming Lei
- A fix for a blk-mq bug that could attempt to dereference a NULL rq
if allocation failed
- Two xen-blkfront small fixes
- Cleanup of submit_bio_wait() type uses in the kernel, unifying
that. From Kent
- A fix for 32-bit blkg_rwstat reading. I apologize for this one
looking mangled in the shortlog, it's entirely my fault for missing
an empty line between the description and body of the text"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: fix use-after-free of request
blk-mq: fix dereference of rq->mq_ctx if allocation fails
block: xen-blkfront: Fix possible NULL ptr dereference
xen-blkfront: Silence pfn maybe-uninitialized warning
block: submit_bio_wait() conversions
Update of blkg_stat and blkg_rwstat may happen in bh context
|
|
The drivers are long gone but some config escaped the prune
Signed-off-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=57221
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into for-linus
|
|
In the blkif_release function the bdget_disk() call might returns
a NULL ptr which might be dereferenced on bdev->bd_openers checking
Signed-off-by: Felipe Pena <felipensp@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Added WARN per Roger's suggestion]
|
|
pfn cannot actually be used unless (!info->feature_persistent), nor is
pfn accessed in get_grant() unless (!info->feature_persistent), but silence
this warning anyway. gcc-4.8
drivers/block/xen-blkfront.c: In function 'do_blkif_request':
drivers/block/xen-blkfront.c:508:20: warning: 'pfn' may be used uninitialized in this function [-Wmaybe-uninitialized]
gnt_list_entry = get_grant(&gref_head, pfn, info);
^
drivers/block/xen-blkfront.c:492:19: note: 'pfn' was declared here
unsigned long pfn;
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
|
|
Remove the external declarations for m68k_realnum_memory and m68k_memory,
which are already provided by <asm/setup.h>, to avoid conflicts later.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
ZTWO_VADDR() converts from physical to virtual I/O addresses, so it should
return "void __iomem *" instead of "unsigned long".
This allows to drop several casts, but requires adding a few casts to
accomodate legacy driver frameworks that store "unsigned long" I/O
addresses.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
Remove CONFIG_USE_GENERIC_SMP_HELPERS left by commit 0a06ff068f12
("kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS").
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It isn't safe to call it without holding the vblk->vq_lock.
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Fixed another condition of virtqueue_kick() not holding the lock.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull second round of block driver updates from Jens Axboe:
"As mentioned in the original pull request, the bcache bits were pulled
because of their dependency on the immutable bio vecs. Kent re-did
this part and resubmitted it, so here's the 2nd round of (mostly)
driver updates for 3.13. It contains:
- The bcache work from Kent.
- Conversion of virtio-blk to blk-mq. This removes the bio and request
path, and substitutes with the blk-mq path instead. The end result
almost 200 deleted lines. Patch is acked by Asias and Christoph, who
both did a bunch of testing.
- A removal of bootmem.h include from Grygorii Strashko, part of a
larger series of his killing the dependency on that header file.
- Removal of __cpuinit from blk-mq from Paul Gortmaker"
* 'for-linus' of git://git.kernel.dk/linux-block: (56 commits)
virtio_blk: blk-mq support
blk-mq: remove newly added instances of __cpuinit
bcache: defensively handle format strings
bcache: Bypass torture test
bcache: Delete some slower inline asm
bcache: Use ida for bcache block dev minor
bcache: Fix sysfs splat on shutdown with flash only devs
bcache: Better full stripe scanning
bcache: Have btree_split() insert into parent directly
bcache: Move spinlock into struct time_stats
bcache: Kill sequential_merge option
bcache: Kill bch_next_recurse_key()
bcache: Avoid deadlocking in garbage collection
bcache: Incremental gc
bcache: Add make_btree_freeing_key()
bcache: Add btree_node_write_sync()
bcache: PRECEDING_KEY()
bcache: bch_(btree|extent)_ptr_invalid()
bcache: Don't bother with bucket refcount for btree node allocations
bcache: Debug code improvements
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"Nothing really exciting: some groundwork for changing virtio endian,
and some robustness fixes for broken virtio devices, plus minor
tweaks"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
virtio_scsi: verify if queue is broken after virtqueue_get_buf()
x86, asmlinkage, lguest: Pass in globals into assembler statement
virtio: mmio: fix signature checking for BE guests
virtio_ring: adapt to notify() returning bool
virtio_net: verify if queue is broken after virtqueue_get_buf()
virtio_console: verify if queue is broken after virtqueue_get_buf()
virtio_blk: verify if queue is broken after virtqueue_get_buf()
virtio_ring: add new function virtqueue_is_broken()
virtio_test: verify if virtqueue_kick() succeeded
virtio_net: verify if virtqueue_kick() succeeded
virtio_ring: let virtqueue_{kick()/notify()} return a bool
virtio_ring: change host notification API
virtio_config: remove virtio_config_val
virtio: use size-based config accessors.
virtio_config: introduce size-based accessors.
virtio_ring: plug kmemleak false positive.
virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
|
|
Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.
[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Switch virtio-blk from the dual support for old-style requests and bios
to use the block-multiqueue.
Acked-by: Asias He <asias@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
Pull block driver updates from Jens Axboe:
"This is the block driver pull request for 3.13. As with the core pull
request just sent out, this was rebased on top of the core branch
again after the immutable series was pulled. This also means that
bcache gets to sit the initial pull over. I will send a second driver
pull request in the merge window to get those fixes in, once they have
been rebased and tested on top of the non-immutable stack.
This pull request contains:
- Add support for the sTec Kronos pci-e flash card from sTec. Also
has various cleanups for this driver, from myself, Bart, Mike
Snizter, and Wei Yongjun.
- Add surprise removal support for the micron mtip32xx driver from
Micron.
- Floppy documentation fix from Ben Harris.
- debugfs bug fix for pktcdvd from Dan Carpenter.
- Fix for the mtip32xx driver stack usage in the debugfs path,
dynamically allocating those buffers instead. From David Milburn.
- Disable cpqarray in Kconfig. The plan is to remove it on request
of HP, but lets disable it for a few revisions just to see if
anyone yells.
- drbd fixes from Lars Ellenberg and Philipp Reisner.
- Elevator switch fix for the s390 block driver from Heiko Carstens.
- loop crash fix on IO to unassigned device from Mikulas Patocka.
- A series of bug fixes for the IBM rsxx pci-e flash driver from
Philip J Kelleher.
- cciss probe fix from Stephen Cameron.
- Xen block front/back fixes from Roger Pau Monne and Vegard Nossum"
* 'for-3.13/drivers' of git://git.kernel.dk/linux-block: (41 commits)
floppy: Correct documentation of driver options when used as a module.
pktcdvd: debugfs functions return NULL on error
xen-blkfront: restore the non-persistent data path
skd: fix formatting in skd_s1120.h
skd: reorder construct/destruct code
skd: cleanup skd_do_inq_page_da()
skd: remove SKD_OMIT_FROM_SRC_DIST ifdefs
skd: remove redundant skdev->pdev assignment from skd_pci_probe()
skd: use <asm/unaligned.h>
skd: remove SCSI subsystem specific includes
skd: register block device only if some devices are present
skd: fix error messages in skd_init()
skd: fix error paths in skd_init()
skd: fix unregister_blkdev() placement
skd: more removal of bio-based code
skd: cleanup the skd_*() function block wrapping
skd: rip out bio path
skd: fix error return code in skd_pci_probe()
s390/dasd: hold request queue sysfs lock when calling elevator_init()
cciss: return 0 from driver probe function on success, not 1
...
|
|
Pull block IO core updates from Jens Axboe:
"This is the pull request for the core changes in the block layer for
3.13. It contains:
- The new blk-mq request interface.
This is a new and more scalable queueing model that marries the
best part of the request based interface we currently have (which
is fully featured, but scales poorly) and the bio based "interface"
which the new drivers for high IOPS devices end up using because
it's much faster than the request based one.
The bio interface has no block layer support, since it taps into
the stack much earlier. This means that drivers end up having to
implement a lot of functionality on their own, like tagging,
timeout handling, requeue, etc. The blk-mq interface provides all
these. Some drivers even provide a switch to select bio or rq and
has code to handle both, since things like merging only works in
the rq model and hence is faster for some workloads. This is a
huge mess. Conversion of these drivers nets us a substantial code
reduction. Initial results on converting SCSI to this model even
shows an 8x improvement on single queue devices. So while the
model was intended to work on the newer multiqueue devices, it has
substantial improvements for "classic" hardware as well. This code
has gone through extensive testing and development, it's now ready
to go. A pull request is coming to convert virtio-blk to this
model will be will be coming as well, with more drivers scheduled
for 3.14 conversion.
- Two blktrace fixes from Jan and Chen Gang.
- A plug merge fix from Alireza Haghdoost.
- Conversion of __get_cpu_var() from Christoph Lameter.
- Fix for sector_div() with 64-bit divider from Geert Uytterhoeven.
- A fix for a race between request completion and the timeout
handling from Jeff Moyer. This is what caused the merge conflict
with blk-mq/core, in case you are looking at that.
- A dm stacking fix from Mike Snitzer.
- A code consolidation fix and duplicated code removal from Kent
Overstreet.
- A handful of block bug fixes from Mikulas Patocka, fixing a loop
crash and memory corruption on blk cg.
- Elevator switch bug fix from Tomoki Sekiyama.
A heads-up that I had to rebase this branch. Initially the immutable
bio_vecs had been queued up for inclusion, but a week later, it became
clear that it wasn't fully cooked yet. So the decision was made to
pull this out and postpone it until 3.14. It was a straight forward
rebase, just pruning out the immutable series and the later fixes of
problems with it. The rest of the patches applied directly and no
further changes were made"
* 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits)
block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
block: Do not call sector_div() with a 64-bit divisor
kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()
block: Consolidate duplicated bio_trim() implementations
block: Use rw_copy_check_uvector()
block: Enable sysfs nomerge control for I/O requests in the plug list
block: properly stack underlying max_segment_size to DM device
elevator: acquire q->sysfs_lock in elevator_change()
elevator: Fix a race in elevator switching and md device initialization
block: Replace __get_cpu_var uses
bdi: test bdi_init failure
block: fix a probe argument to blk_register_region
loop: fix crash if blk_alloc_queue fails
blk-core: Fix memory corruption if blkcg_init_queue fails
block: fix race between request completion and timeout handling
blktrace: Send BLK_TN_PROCESS events to all running traces
blk-mq: don't disallow request merges for req->special being set
blk-mq: mq plug list breakage
blk-mq: fix for flush deadlock
...
|
|
Pull DMA mask updates from Russell King:
"This series cleans up the handling of DMA masks in a lot of drivers,
fixing some bugs as we go.
Some of the more serious errors include:
- drivers which only set their coherent DMA mask if the attempt to
set the streaming mask fails.
- drivers which test for a NULL dma mask pointer, and then set the
dma mask pointer to a location in their module .data section -
which will cause problems if the module is reloaded.
To counter these, I have introduced two helper functions:
- dma_set_mask_and_coherent() takes care of setting both the
streaming and coherent masks at the same time, with the correct
error handling as specified by the API.
- dma_coerce_mask_and_coherent() which resolves the problem of
drivers forcefully setting DMA masks. This is more a marker for
future work to further clean these locations up - the code which
creates the devices really should be initialising these, but to fix
that in one go along with this change could potentially be very
disruptive.
The last thing this series does is prise away some of Linux's addition
to "DMA addresses are physical addresses and RAM always starts at
zero". We have ARM LPAE systems where all system memory is above 4GB
physical, hence having DMA masks interpreted by (eg) the block layers
as describing physical addresses in the range 0..DMAMASK fails on
these platforms. Santosh Shilimkar addresses this in this series; the
patches were copied to the appropriate people multiple times but were
ignored.
Fixing this also gets rid of some ARM weirdness in the setup of the
max*pfn variables, and brings ARM into line with every other Linux
architecture as far as those go"
* 'for-linus-dma-masks' of git://git.linaro.org/people/rmk/linux-arm: (52 commits)
ARM: 7805/1: mm: change max*pfn to include the physical offset of memory
ARM: 7797/1: mmc: Use dma_max_pfn(dev) helper for bounce_limit calculations
ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations
ARM: 7795/1: mm: dma-mapping: Add dma_max_pfn(dev) helper function
ARM: 7794/1: block: Rename parameter dma_mask to max_addr for blk_queue_bounce_limit()
ARM: DMA-API: better handing of DMA masks for coherent allocations
ARM: 7857/1: dma: imx-sdma: setup dma mask
DMA-API: firmware/google/gsmi.c: avoid direct access to DMA masks
DMA-API: dcdbas: update DMA mask handing
DMA-API: dma: edma.c: no need to explicitly initialize DMA masks
DMA-API: usb: musb: use platform_device_register_full() to avoid directly messing with dma masks
DMA-API: crypto: remove last references to 'static struct device *dev'
DMA-API: crypto: fix ixp4xx crypto platform device support
DMA-API: others: use dma_set_coherent_mask()
DMA-API: staging: use dma_set_coherent_mask()
DMA-API: usb: use new dma_coerce_mask_and_coherent()
DMA-API: usb: use dma_set_coherent_mask()
DMA-API: parport: parport_pc.c: use dma_coerce_mask_and_coherent()
DMA-API: net: octeon: use dma_coerce_mask_and_coherent()
DMA-API: net: nxp/lpc_eth: use dma_coerce_mask_and_coherent()
...
|
|
My static checker complains correctly that this is potential NULL
dereference because debugfs functions return NULL on error. They return
an ERR_PTR if they are configured out.
We don't need to check for ERR_PTR because if debugfs is stubbed out the
dummy functions won't complain about that. We don't need to check the
values before calling debugfs_remove() because that accepts ERR_PTRs and
NULL pointers.
We don't need to set pkt->dfs_f_info to NULL in pkt_debugfs_dev_new()
because it was initialized with kzalloc() so I have removed that.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
When persistent grants were added they were always used, even if the
backend doesn't have this feature (there's no harm in always using the
same set of pages). This restores the old data path when the backend
doesn't have persistent grants, removing the burden of doing a memcpy
when it is not actually needed.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Felipe Franciosi <felipe.franciosi@citrix.com>
Cc: Felipe Franciosi <felipe.franciosi@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Fix up whitespace issues]
|
|
Cc: Akhil Bhansali <abhansali@stec-inc.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|