summaryrefslogtreecommitdiffstats
path: root/drivers/virtio
AgeCommit message (Collapse)Author
2008-05-30virtio: force callback on empty.Rusty Russell
virtio allows drivers to suppress callbacks (ie. interrupts) for efficiency (no locking, it's just an optimization). There's a similar mechanism for the host to suppress notifications coming from the guest: in that case, we ignore the suppression if the ring is completely full. It turns out that life is simpler if the host similarly ignores callback suppression when the ring is completely empty: the network driver wants to free up old packets in a timely manner, and otherwise has to use a timer to poll. We have to remove the code which ignores interrupts when the driver has disabled them (again, it had no locking and hence was unreliable anyway). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-30virtio_net: another race with virtio_net and enable_cbChristian Borntraeger
Hello Rusty, seems that we still have a problem with virtio_net and the enable_cb callback. During a long running network stress tests with virtio and got the following oops: ------------[ cut here ]------------ kernel BUG at drivers/virtio/virtio_ring.c:230! illegal operation: 0001 [#1] SMP Modules linked in: CPU: 0 Not tainted 2.6.26-rc2-kvm-00436-gc94c08b-dirty #34 Process netserver (pid: 2582, task: 000000000fbc4c68, ksp: 000000000f42b990) Krnl PSW : 0704c00180000000 00000000002d0ec8 (vring_enable_cb+0x1c/0x60) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3 Krnl GPRS: 0000000000000000 0000000000000000 000000000ef3d000 0000000010009800 0000000000000000 0000000000419ce0 0000000000000080 000000000000007b 000000000adb5538 000000000ef40900 000000000ef40000 000000000ef40920 0000000000000000 0000000000000005 000000000029c1b0 000000000fea7d18 Krnl Code: 00000000002d0ebc: a7110001 tmll %r1,1 00000000002d0ec0: a7740004 brc 7,2d0ec8 00000000002d0ec4: a7f40001 brc 15,2d0ec6 >00000000002d0ec8: a517fffe nill %r1,65534 00000000002d0ecc: 40103000 sth %r1,0(%r3) 00000000002d0ed0: 07f0 bcr 15,%r0 00000000002d0ed2: e31020380004 lg %r1,56(%r2) 00000000002d0ed8: a7480000 lhi %r4,0 Call Trace: ([<000000000029c0fc>] virtnet_poll+0x290/0x3b8) [<0000000000333fb8>] net_rx_action+0x9c/0x1b8 [<00000000001394bc>] __do_softirq+0x74/0x108 [<000000000010d16a>] do_softirq+0x92/0xac [<0000000000139826>] irq_exit+0x72/0xc8 [<000000000010a7b6>] do_extint+0xe2/0x104 [<0000000000110508>] ext_no_vtime+0x16/0x1a Last Breaking-Event-Address: [<00000000002d0ec4>] vring_enable_cb+0x18/0x60 I looked into the virtio_net code for some time and I think the following scenario happened. Please look at virtnet_poll: [...] /* Out of packets? */ if (received < budget) { netif_rx_complete(vi->dev, napi); if (unlikely(!vi->rvq->vq_ops->enable_cb(vi->rvq)) && napi_schedule_prep(napi)) { vi->rvq->vq_ops->disable_cb(vi->rvq); __netif_rx_schedule(vi->dev, napi); goto again; } } If an interrupt arrives after netif_rx_complete, a second poll routine can run on a different cpu. The second check for napi_schedule_prep would prevent any harm in the network stack, but we have called enable_cb possibly after the disable_cb in skb_recv_done. static void skb_recv_done(struct virtqueue *rvq) { struct virtnet_info *vi = rvq->vdev->priv; /* Schedule NAPI, Suppress further interrupts if successful. */ if (netif_rx_schedule_prep(vi->dev, &vi->napi)) { rvq->vq_ops->disable_cb(rvq); __netif_rx_schedule(vi->dev, &vi->napi); } } That means that the second poll routine runs with interrupts enabled, which is ok, since we can handle additional interrupts. The problem is now that the second poll routine might also call enable_cb, triggering the BUG. The only solution I can come up with, is to remove the BUG statement in enable_cb - similar to disable_cb. Opinions or better ideas where the oops could come from? Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-30virtio: set device index in common code.Rusty Russell
Anthony Liguori points out that three different transports use the virtio code, but each one keeps its own counter to set the virtio_device's index field. In theory (though not in current practice) this means that names could be duplicated, and that risk grows as more transports are created. So we move the selection of the unique virtio_device.index into the common code in virtio.c, which has the side-benefit of removing duplicate code. The only complexity is that lguest and S/390 use the index to uniquely identify the device in case of catastrophic failure before register_virtio_device() is called: now we use the offset within the descriptor page as a unique identifier for the printks. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Anthony Liguori <anthony@codemonkey.ws>
2008-05-30virtio: virtio_pci should not set bus_id.Rusty Russell
The common virtio code sets the bus_id, overriding anything virtio_pci sets anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Anthony Liguori <anthony@codemonkey.ws>
2008-05-30virtio: bus_id for devices should contain 'virtio'Rusty Russell
Chris Lalancette <clalance@redhat.com> points out that virtio.c sets all device names to '0', '1', etc, which looks silly in /proc/interrupts. We change this from '%d' to 'virtio%d'. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Anthony Liguori <anthony@codemonkey.ws>
2008-05-02virtio: explicit advertisement of driver featuresRusty Russell
A recent proposed feature addition to the virtio block driver revealed some flaws in the API: in particular, we assume that feature negotiation is complete once a driver's probe function returns. There is nothing in the API to require this, however, and even I didn't notice when it was violated. So instead, we require the driver to specify what features it supports in a table, we can then move the feature negotiation into the virtio core. The intersection of device and driver features are presented in a new 'features' bitmap in the struct virtio_device. Note that this highlights the difference between Linux unsigned-long bitmaps where each unsigned long is in native endian, and a straight-forward little-endian array of bytes. Drivers can still remove feature bits in their probe routine if they really have to. API changes: - dev->config->feature() no longer gets and acks a feature. - drivers should advertise their features in the 'feature_table' field - use virtio_has_feature() for extra sanity when checking feature bits Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02virtio: change config to guest endian.Rusty Russell
A recent proposed feature addition to the virtio block driver revealed some flaws in the API, in particular how easy it is to break big endian machines. The virtio config space was originally chosen to be little-endian, because we thought the config might be part of the PCI config space for virtio_pci. It's actually a separate mmio region, so that argument holds little water; as only x86 is currently using the virtio mechanism, we can change this (but must do so now, before the impending s390 merge). API changes: - __virtio_config_val() just becomes a striaght vdev->config_get() call. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02virtio: fix sparse return void-valued expression warningsHarvey Harrison
drivers/virtio/virtio_pci.c:148:2: warning: returning void-valued expression drivers/virtio/virtio_pci.c:155:2: warning: returning void-valued expression Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02virtio: ignore corrupted virtqueues rather than spinning.Rusty Russell
A corrupt virtqueue (caused by the other end screwing up) can have strange results such as a driver spinning: just bail when we try to get a buffer from a known-broken queue. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-04-07virtio: remove overzealous BUG_ON.Rusty Russell
The 'disable_cb' callback is designed as an optimization to tell the host we don't need callbacks now. As it is not reliable, the debug check is overzealous: it can happen on two CPUs at the same time. Document this. Even if it were reliable, the virtio_net driver doesn't disable callbacks on transmit so the START_USE/END_USE debugging reentrance protection can be easily tripped even on UP. Thanks to Balaji Rao for the bug report and testing. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> CC: Balaji Rao <balajirrao@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30virtio_pci iomem annotationsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28virtio_pci: unregister virtio device at device removeAnthony Liguori
Make sure to call unregister_virtio_device() when a virtio device is removed. Otherwise, virtio_pci.ko cannot be rmmod'd. This was spotted by Marcelo Tosatti. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-17virtio: fix race in enable_cbChristian Borntraeger
There is a race in virtio_net, dealing with disabling/enabling the callback. I saw the following oops: kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:218! illegal operation: 0001 [#1] SMP Modules linked in: sunrpc dm_mod CPU: 2 Not tainted 2.6.25-rc1zlive-host-10623-gd358142-dirty #99 Process swapper (pid: 0, task: 000000000f85a610, ksp: 000000000f873c60) Krnl PSW : 0404300180000000 00000000002b81a6 (vring_disable_cb+0x16/0x20) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3 Krnl GPRS: 0000000000000001 0000000000000001 0000000010005800 0000000000000001 000000000f3a0900 000000000f85a610 0000000000000000 0000000000000000 0000000000000000 000000000f870000 0000000000000000 0000000000001237 000000000f3a0920 000000000010ff74 00000000002846f6 000000000fa0bcd8 Krnl Code: 00000000002b819a: a7110001 tmll %r1,1 00000000002b819e: a7840004 brc 8,2b81a6 00000000002b81a2: a7f40001 brc 15,2b81a4 >00000000002b81a6: a51b0001 oill %r1,1 00000000002b81aa: 40102000 sth %r1,0(%r2) 00000000002b81ae: 07fe bcr 15,%r14 00000000002b81b0: eb7ff0380024 stmg %r7,%r15,56(%r15) 00000000002b81b6: a7f13e00 tmll %r15,15872 Call Trace: ([<000000000fa0bcd0>] 0xfa0bcd0) [<00000000002b8350>] vring_interrupt+0x5c/0x6c [<000000000010ab08>] do_extint+0xb8/0xf0 [<0000000000110716>] ext_no_vtime+0x16/0x1a [<0000000000107e72>] cpu_idle+0x1c2/0x1e0 The problem can be triggered with a high amount of host->guest traffic. I think its the following race: poll says netif_rx_complete poll calls enable_cb enable_cb opens the interrupt mask a new packet comes, an interrupt is triggered----\ enable_cb sees that there is more work | enable_cb disables the interrupt | . V . interrupt is delivered . skb_recv_done does atomic napi test, ok some waiting disable_cb is called->check fails->bang! . poll would do napi check poll would do disable_cb The fix is to let enable_cb not disable the interrupt again, but expect the caller to do the cleanup if it returns false. In that case, the interrupt is only disabled, if the napi test_set_bit was successful. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned up doco)
2008-03-17virtio: handle > 2 billion page balloon targetsRusty Russell
If the host asks for a huge target towards_target() can overflow, and we up oops as we try to release more pages than we have. The simple fix is to use a 64-bit value. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-17virtio: Use spin_lock_irqsave/restore for virtio-pciAnthony Liguori
virtio-pci acquires its spin lock in an interrupt context so it's necessary to use spin_lock_irqsave/restore variants. This patch fixes guest SMP when using virtio devices in KVM. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-06virtio: add missing #include <linux/delay.h>Johann Felix Soden
Include linux/delay.h to fix compiler error: drivers/virtio/virtio_balloon.c: In function 'fill_balloon': drivers/virtio/virtio_balloon.c:98: error: implicit declaration of function 'msleep' Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-04virtio: balloon driverRusty Russell
After discussions with Anthony Liguori, it seems that the virtio balloon can be made even simpler. Here's my attempt. The device configuration tells the driver how much memory it should take from the guest (ie. balloon size). The guest feeds the page numbers it has taken via one virtqueue. A second virtqueue feeds the page numbers the driver wants back: if the device has the VIRTIO_BALLOON_F_MUST_TELL_HOST bit, then this queue is compulsory, otherwise it's advisory (and the guest can simply fault the pages back in). This driver can be enhanced later to deflate the balloon via a shrinker, oom callback or we could even go for a complete set of in-guest regulators. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04virtio: Use PCI revision field to indicate virtio PCI ABI versionAnthony Liguori
As Avi pointed out, as we continue to massage the virtio PCI ABI, we can make things a little more friendly to users by utilizing the PCI revision field to indicate which version of the ABI we're using. This is a hard ABI version and incrementing it will cause the guest driver to break. This is the necessary changes to virtio_pci to support this. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04virtio: PCI deviceAnthony Liguori
This is a PCI device that implements a transport for virtio. It allows virtio devices to be used by QEMU based VMMs like KVM or Xen. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04virtio: Allow virtio to be modular and used by modulesRusty Russell
This is needed for the virtio PCI device to be compiled as a module. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04virtio: Use the sg_phys convenience function.Rusty Russell
Simple cleanup. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04virtio: handle interrupts after callbacks turned offRusty Russell
Anthony Liguori found double interrupt suppression in the virtio_net driver, triggered by two skb_recv_done's in a row. This is because virtio_ring's interrupt suppression is a best-effort optimization: it contains no synchronization so the host can miss it and still send interrupts. But it's certainly nicer for virtio users if calling disable_cb actually disables callbacks, so we check for the race in the interrupt routine. Note: SMP guests might require syncronization here, but since disable_cb is actually called from interrupt context, there has to be some form of synchronization before the next same interrupt handler is called (Linux guarantees that the same device's irq handler will never run simultanously on multiple CPUs). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04virtio: reset functionRusty Russell
A reset function solves three problems: 1) It allows us to renegotiate features, eg. if we want to upgrade a guest driver without rebooting the guest. 2) It gives us a clean way of shutting down virtqueues: after a reset, we know that the buffers won't be used by the host, and 3) It helps the guest recover from messed-up drivers. So we remove the ->shutdown hook, and the only way we now remove feature bits is via reset. We leave it to the driver to do the reset before it deletes queues: the balloon driver, for example, needs to chat to the host in its remove function. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04virtio: clarify NO_NOTIFY flag usageRusty Russell
The other side (host) can set the NO_NOTIFY flag as an optimization, to say "no need to kick me when you add things". Make it clear that this is advisory only; especially that we should always notify when the ring is full. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04virtio: explicit enable_cb/disable_cb rather than callback return.Rusty Russell
It seems that virtio_net wants to disable callbacks (interrupts) before calling netif_rx_schedule(), so we can't use the return value to do so. Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback now returns void, rather than a boolean. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04virtio: simplify config mechanism.Rusty Russell
Previously we used a type/len pair within the config space, but this seems overkill. We now simply define a structure which represents the layout in the config space: the config space can now only be extended at the end. The main driver-visible changes: 1) We indicate what fields are present with an explicit feature bit. 2) Virtqueues are explicitly numbered, and not in the config space. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-19virtio: fix module/device unloadingRusty Russell
The virtio code never hooked through the ->remove callback. Although noone supports device removal at the moment, this code is already needed for module unloading. This of course also revealed bugs in virtio_blk, virtio_net and lguest unloading paths. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-12virtio: Force use of power-of-two for descriptor ring sizesRusty Russell
The virtio descriptor rings of size N-1 were nicely set up to be aligned to an N-byte boundary. But as Anthony Liguori points out, the free-running indices used by virtio require that the sizes be a power of 2, otherwise we get problems on wrap (demonstrated with lguest). So we replace the clever "2^n-1" scheme with a simple "align to page boundary" scheme: this means that all virtio rings take at least two pages, but it's safer than guessing cache alignment. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-12virtio: Fix used_idx wrap-aroundAnthony Liguori
The more_used() function compares the vq->vring.used->idx with last_used_idx. Since vq->vring.used->idx is a 16-bit integer, and last_used_idx is an unsigned int, this results in unpredictable behavior when vq->vring.used->idx wraps around. This patch corrects this by changing last_used_idx to the correct type. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Virtio helper routines for a descriptor ringbuffer implementationRusty Russell
These helper routines supply most of the virtqueue_ops for hypervisors which want to use a ring for virtio. Unlike the previous lguest implementation: 1) The rings are variable sized (2^n-1 elements). 2) They have an unfortunate limit of 65535 bytes per sg element. 3) The page numbers are always 64 bit (PAE anyone?) 4) They no longer place used[] on a separate page, just a separate cacheline. 5) We do a modulo on a variable. We could be tricky if we cared. 6) Interrupts and notifies are suppressed using flags within the rings. Users need only get the ring pages and provide a notify hook (KVM wants the guest to allocate the rings, lguest does it sanely). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dor Laor <dor.laor@qumranet.com>
2007-10-23Module autoprobing support for virtio drivers.Rusty Russell
This adds the logic to convert the virtio ids into module aliases, and includes a modalias entry in sysfs and the env var to make probing work. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23Virtio interfaceRusty Russell
This attempts to implement a "virtual I/O" layer which should allow common drivers to be efficiently used across most virtual I/O mechanisms. It will no-doubt need further enhancement. The virtio drivers add buffers to virtio queues; as the buffers are consumed the driver "interrupt" callbacks are invoked. There is also a generic implementation of config space which drivers can query to get setup information from the host. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dor Laor <dor.laor@qumranet.com> Cc: Arnd Bergmann <arnd@arndb.de>