summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/Changes15
-rw-r--r--Documentation/SubmittingPatches76
-rw-r--r--Documentation/arm/Samsung-S3C24XX/GPIO.txt10
-rw-r--r--Documentation/development-process/5.Posting31
-rw-r--r--Documentation/feature-removal-schedule.txt10
-rw-r--r--Documentation/filesystems/debugfs.txt158
-rw-r--r--Documentation/i2c/busses/i2c-ocores17
-rw-r--r--Documentation/ide/ide.txt2
-rw-r--r--Documentation/kernel-parameters.txt7
-rw-r--r--Documentation/lguest/Makefile3
-rw-r--r--Documentation/lguest/lguest.c1008
-rw-r--r--Documentation/lguest/lguest.txt1
-rw-r--r--Documentation/power/devices.txt34
-rw-r--r--Documentation/sound/alsa/ALSA-Configuration.txt36
-rw-r--r--Documentation/sound/alsa/HD-Audio-Models.txt18
-rw-r--r--Documentation/sound/alsa/Procfile.txt36
-rw-r--r--Documentation/sound/alsa/README.maya44163
-rw-r--r--Documentation/sound/alsa/soc/dapm.txt1
-rw-r--r--Documentation/x86/x86_64/boot-options.txt44
-rw-r--r--Documentation/x86/x86_64/machinecheck8
20 files changed, 955 insertions, 723 deletions
diff --git a/Documentation/Changes b/Documentation/Changes
index b95082be4d5..d21b3b5aa54 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -48,6 +48,7 @@ o procps 3.2.0 # ps --version
o oprofile 0.9 # oprofiled --version
o udev 081 # udevinfo -V
o grub 0.93 # grub --version
+o mcelog 0.6
Kernel compilation
==================
@@ -276,6 +277,16 @@ before running exportfs or mountd. It is recommended that all NFS
services be protected from the internet-at-large by a firewall where
that is possible.
+mcelog
+------
+
+In Linux 2.6.31+ the i386 kernel needs to run the mcelog utility
+as a regular cronjob similar to the x86-64 kernel to process and log
+machine check events when CONFIG_X86_NEW_MCE is enabled. Machine check
+events are errors reported by the CPU. Processing them is strongly encouraged.
+All x86-64 kernels since 2.6.4 require the mcelog utility to
+process machine checks.
+
Getting updated software
========================
@@ -365,6 +376,10 @@ FUSE
----
o <http://sourceforge.net/projects/fuse>
+mcelog
+------
+o <ftp://ftp.kernel.org/pub/linux/utils/cpu/mce/mcelog/>
+
Networking
**********
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index f309d3c6221..6c456835c1f 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -91,6 +91,10 @@ Be as specific as possible. The WORST descriptions possible include
things like "update driver X", "bug fix for driver X", or "this patch
includes updates for subsystem X. Please apply."
+The maintainer will thank you if you write your patch description in a
+form which can be easily pulled into Linux's source code management
+system, git, as a "commit log". See #15, below.
+
If your description starts to get long, that's a sign that you probably
need to split up your patch. See #3, next.
@@ -405,7 +409,14 @@ person it names. This tag documents that potentially interested parties
have been included in the discussion
-14) Using Tested-by: and Reviewed-by:
+14) Using Reported-by:, Tested-by: and Reviewed-by:
+
+If this patch fixes a problem reported by somebody else, consider adding a
+Reported-by: tag to credit the reporter for their contribution. Please
+note that this tag should not be added without the reporter's permission,
+especially if the problem was not reported in a public forum. That said,
+if we diligently credit our bug reporters, they will, hopefully, be
+inspired to help us again in the future.
A Tested-by: tag indicates that the patch has been successfully tested (in
some environment) by the person named. This tag informs maintainers that
@@ -444,7 +455,7 @@ offer a Reviewed-by tag for a patch. This tag serves to give credit to
reviewers and to inform maintainers of the degree of review which has been
done on the patch. Reviewed-by: tags, when supplied by reviewers known to
understand the subject area and to perform thorough reviews, will normally
-increase the liklihood of your patch getting into the kernel.
+increase the likelihood of your patch getting into the kernel.
15) The canonical patch format
@@ -485,12 +496,33 @@ phrase" should not be a filename. Do not use the same "summary
phrase" for every patch in a whole patch series (where a "patch
series" is an ordered sequence of multiple, related patches).
-Bear in mind that the "summary phrase" of your email becomes
-a globally-unique identifier for that patch. It propagates
-all the way into the git changelog. The "summary phrase" may
-later be used in developer discussions which refer to the patch.
-People will want to google for the "summary phrase" to read
-discussion regarding that patch.
+Bear in mind that the "summary phrase" of your email becomes a
+globally-unique identifier for that patch. It propagates all the way
+into the git changelog. The "summary phrase" may later be used in
+developer discussions which refer to the patch. People will want to
+google for the "summary phrase" to read discussion regarding that
+patch. It will also be the only thing that people may quickly see
+when, two or three months later, they are going through perhaps
+thousands of patches using tools such as "gitk" or "git log
+--oneline".
+
+For these reasons, the "summary" must be no more than 70-75
+characters, and it must describe both what the patch changes, as well
+as why the patch might be necessary. It is challenging to be both
+succinct and descriptive, but that is what a well-written summary
+should do.
+
+The "summary phrase" may be prefixed by tags enclosed in square
+brackets: "Subject: [PATCH tag] <summary phrase>". The tags are not
+considered part of the summary phrase, but describe how the patch
+should be treated. Common tags might include a version descriptor if
+the multiple versions of the patch have been sent out in response to
+comments (i.e., "v1, v2, v3"), or "RFC" to indicate a request for
+comments. If there are four patches in a patch series the individual
+patches may be numbered like this: 1/4, 2/4, 3/4, 4/4. This assures
+that developers understand the order in which the patches should be
+applied and that they have reviewed or applied all of the patches in
+the patch series.
A couple of example Subjects:
@@ -510,19 +542,31 @@ the patch author in the changelog.
The explanation body will be committed to the permanent source
changelog, so should make sense to a competent reader who has long
since forgotten the immediate details of the discussion that might
-have led to this patch.
+have led to this patch. Including symptoms of the failure which the
+patch addresses (kernel log messages, oops messages, etc.) is
+especially useful for people who might be searching the commit logs
+looking for the applicable patch. If a patch fixes a compile failure,
+it may not be necessary to include _all_ of the compile failures; just
+enough that it is likely that someone searching for the patch can find
+it. As in the "summary phrase", it is important to be both succinct as
+well as descriptive.
The "---" marker line serves the essential purpose of marking for patch
handling tools where the changelog message ends.
One good use for the additional comments after the "---" marker is for
-a diffstat, to show what files have changed, and the number of inserted
-and deleted lines per file. A diffstat is especially useful on bigger
-patches. Other comments relevant only to the moment or the maintainer,
-not suitable for the permanent changelog, should also go here.
-Use diffstat options "-p 1 -w 70" so that filenames are listed from the
-top of the kernel source tree and don't use too much horizontal space
-(easily fit in 80 columns, maybe with some indentation).
+a diffstat, to show what files have changed, and the number of
+inserted and deleted lines per file. A diffstat is especially useful
+on bigger patches. Other comments relevant only to the moment or the
+maintainer, not suitable for the permanent changelog, should also go
+here. A good example of such comments might be "patch changelogs"
+which describe what has changed between the v1 and v2 version of the
+patch.
+
+If you are going to include a diffstat after the "---" marker, please
+use diffstat options "-p 1 -w 70" so that filenames are listed from
+the top of the kernel source tree and don't use too much horizontal
+space (easily fit in 80 columns, maybe with some indentation).
See more details on the proper patch format in the following
references.
diff --git a/Documentation/arm/Samsung-S3C24XX/GPIO.txt b/Documentation/arm/Samsung-S3C24XX/GPIO.txt
index ea7ccfc4b27..948c8718d96 100644
--- a/Documentation/arm/Samsung-S3C24XX/GPIO.txt
+++ b/Documentation/arm/Samsung-S3C24XX/GPIO.txt
@@ -51,7 +51,7 @@ PIN Numbers
-----------
Each pin has an unique number associated with it in regs-gpio.h,
- eg S3C2410_GPA0 or S3C2410_GPF1. These defines are used to tell
+ eg S3C2410_GPA(0) or S3C2410_GPF(1). These defines are used to tell
the GPIO functions which pin is to be used.
@@ -65,11 +65,11 @@ Configuring a pin
Eg:
- s3c2410_gpio_cfgpin(S3C2410_GPA0, S3C2410_GPA0_ADDR0);
- s3c2410_gpio_cfgpin(S3C2410_GPE8, S3C2410_GPE8_SDDAT1);
+ s3c2410_gpio_cfgpin(S3C2410_GPA(0), S3C2410_GPA0_ADDR0);
+ s3c2410_gpio_cfgpin(S3C2410_GPE(8), S3C2410_GPE8_SDDAT1);
- which would turn GPA0 into the lowest Address line A0, and set
- GPE8 to be connected to the SDIO/MMC controller's SDDAT1 line.
+ which would turn GPA(0) into the lowest Address line A0, and set
+ GPE(8) to be connected to the SDIO/MMC controller's SDDAT1 line.
Reading the current configuration
diff --git a/Documentation/development-process/5.Posting b/Documentation/development-process/5.Posting
index dd48132a74d..f622c1e9f0f 100644
--- a/Documentation/development-process/5.Posting
+++ b/Documentation/development-process/5.Posting
@@ -119,7 +119,7 @@ which takes quite a bit of time and thought after the "real work" has been
done. When done properly, though, it is time well spent.
-5.4: PATCH FORMATTING
+5.4: PATCH FORMATTING AND CHANGELOGS
So now you have a perfect series of patches for posting, but the work is
not done quite yet. Each patch needs to be formatted into a message which
@@ -146,8 +146,33 @@ that end, each patch will be composed of the following:
- One or more tag lines, with, at a minimum, one Signed-off-by: line from
the author of the patch. Tags will be described in more detail below.
-The above three items should, normally, be the text used when committing
-the change to a revision control system. They are followed by:
+The items above, together, form the changelog for the patch. Writing good
+changelogs is a crucial but often-neglected art; it's worth spending
+another moment discussing this issue. When writing a changelog, you should
+bear in mind that a number of different people will be reading your words.
+These include subsystem maintainers and reviewers who need to decide
+whether the patch should be included, distributors and other maintainers
+trying to decide whether a patch should be backported to other kernels, bug
+hunters wondering whether the patch is responsible for a problem they are
+chasing, users who want to know how the kernel has changed, and more. A
+good changelog conveys the needed information to all of these people in the
+most direct and concise way possible.
+
+To that end, the summary line should describe the effects of and motivation
+for the change as well as possible given the one-line constraint. The
+detailed description can then amplify on those topics and provide any
+needed additional information. If the patch fixes a bug, cite the commit
+which introduced the bug if possible. If a problem is associated with
+specific log or compiler output, include that output to help others
+searching for a solution to the same problem. If the change is meant to
+support other changes coming in later patch, say so. If internal APIs are
+changed, detail those changes and how other developers should respond. In
+general, the more you can put yourself into the shoes of everybody who will
+be reading your changelog, the better that changelog (and the kernel as a
+whole) will be.
+
+Needless to say, the changelog should be the text used when committing the
+change to a revision control system. It will be followed by:
- The patch itself, in the unified ("-u") patch format. Using the "-p"
option to diff will associate function names with changes, making the
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index de491a3e231..ec9ef5d0d7b 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -437,3 +437,13 @@ Why: Superseded by tdfxfb. I2C/DDC support used to live in a separate
driver but this caused driver conflicts.
Who: Jean Delvare <khali@linux-fr.org>
Krzysztof Helt <krzysztof.h1@wp.pl>
+
+----------------------------
+
+What: CONFIG_X86_OLD_MCE
+When: 2.6.32
+Why: Remove the old legacy 32bit machine check code. This has been
+ superseded by the newer machine check code from the 64bit port,
+ but the old version has been kept around for easier testing. Note this
+ doesn't impact the old P5 and WinChip machine check handlers.
+Who: Andi Kleen <andi@firstfloor.org>
diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt
new file mode 100644
index 00000000000..ed52af60c2d
--- /dev/null
+++ b/Documentation/filesystems/debugfs.txt
@@ -0,0 +1,158 @@
+Copyright 2009 Jonathan Corbet <corbet@lwn.net>
+
+Debugfs exists as a simple way for kernel developers to make information
+available to user space. Unlike /proc, which is only meant for information
+about a process, or sysfs, which has strict one-value-per-file rules,
+debugfs has no rules at all. Developers can put any information they want
+there. The debugfs filesystem is also intended to not serve as a stable
+ABI to user space; in theory, there are no stability constraints placed on
+files exported there. The real world is not always so simple, though [1];
+even debugfs interfaces are best designed with the idea that they will need
+to be maintained forever.
+
+Debugfs is typically mounted with a command like:
+
+ mount -t debugfs none /sys/kernel/debug
+
+(Or an equivalent /etc/fstab line).
+
+Note that the debugfs API is exported GPL-only to modules.
+
+Code using debugfs should include <linux/debugfs.h>. Then, the first order
+of business will be to create at least one directory to hold a set of
+debugfs files:
+
+ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
+
+This call, if successful, will make a directory called name underneath the
+indicated parent directory. If parent is NULL, the directory will be
+created in the debugfs root. On success, the return value is a struct
+dentry pointer which can be used to create files in the directory (and to
+clean it up at the end). A NULL return value indicates that something went
+wrong. If ERR_PTR(-ENODEV) is returned, that is an indication that the
+kernel has been built without debugfs support and none of the functions
+described below will work.
+
+The most general way to create a file within a debugfs directory is with:
+
+ struct dentry *debugfs_create_file(const char *name, mode_t mode,
+ struct dentry *parent, void *data,
+ const struct file_operations *fops);
+
+Here, name is the name of the file to create, mode describes the access
+permissions the file should have, parent indicates the directory which
+should hold the file, data will be stored in the i_private field of the
+resulting inode structure, and fops is a set of file operations which
+implement the file's behavior. At a minimum, the read() and/or write()
+operations should be provided; others can be included as needed. Again,
+the return value will be a dentry pointer to the created file, NULL for
+error, or ERR_PTR(-ENODEV) if debugfs support is missing.
+
+In a number of cases, the creation of a set of file operations is not
+actually necessary; the debugfs code provides a number of helper functions
+for simple situations. Files containing a single integer value can be
+created with any of:
+
+ struct dentry *debugfs_create_u8(const char *name, mode_t mode,
+ struct dentry *parent, u8 *value);
+ struct dentry *debugfs_create_u16(const char *name, mode_t mode,
+ struct dentry *parent, u16 *value);
+ struct dentry *debugfs_create_u32(const char *name, mode_t mode,
+ struct dentry *parent, u32 *value);
+ struct dentry *debugfs_create_u64(const char *name, mode_t mode,
+ struct dentry *parent, u64 *value);
+
+These files support both reading and writing the given value; if a specific
+file should not be written to, simply set the mode bits accordingly. The
+values in these files are in decimal; if hexadecimal is more appropriate,
+the following functions can be used instead:
+
+ struct dentry *debugfs_create_x8(const char *name, mode_t mode,
+ struct dentry *parent, u8 *value);
+ struct dentry *debugfs_create_x16(const char *name, mode_t mode,
+ struct dentry *parent, u16 *value);
+ struct dentry *debugfs_create_x32(const char *name, mode_t mode,
+ struct dentry *parent, u32 *value);
+
+Note that there is no debugfs_create_x64().
+
+These functions are useful as long as the developer knows the size of the
+value to be exported. Some types can have different widths on different
+architectures, though, complicating the situation somewhat. There is a
+function meant to help out in one special case:
+
+ struct dentry *debugfs_create_size_t(const char *name, mode_t mode,
+ struct dentry *parent,
+ size_t *value);
+
+As might be expected, this function will create a debugfs file to represent
+a variable of type size_t.
+
+Boolean values can be placed in debugfs with:
+
+ struct dentry *debugfs_create_bool(const char *name, mode_t mode,
+ struct dentry *parent, u32 *value);
+
+A read on the resulting file will yield either Y (for non-zero values) or
+N, followed by a newline. If written to, it will accept either upper- or
+lower-case values, or 1 or 0. Any other input will be silently ignored.
+
+Finally, a block of arbitrary binary data can be exported with:
+
+ struct debugfs_blob_wrapper {
+ void *data;
+ unsigned long size;
+ };
+
+ struct dentry *debugfs_create_blob(const char *name, mode_t mode,
+ struct dentry *parent,
+ struct debugfs_blob_wrapper *blob);
+
+A read of this file will return the data pointed to by the
+debugfs_blob_wrapper structure. Some drivers use "blobs" as a simple way
+to return several lines of (static) formatted text output. This function
+can be used to export binary information, but there does not appear to be
+any code which does so in the mainline. Note that all files created with
+debugfs_create_blob() are read-only.
+
+There are a couple of other directory-oriented helper functions:
+
+ struct dentry *debugfs_rename(struct dentry *old_dir,
+ struct dentry *old_dentry,
+ struct dentry *new_dir,
+ const char *new_name);
+
+ struct dentry *debugfs_create_symlink(const char *name,
+ struct dentry *parent,
+ const char *target);
+
+A call to debugfs_rename() will give a new name to an existing debugfs
+file, possibly in a different directory. The new_name must not exist prior
+to the call; the return value is old_dentry with updated information.
+Symbolic links can be created with debugfs_create_symlink().
+
+There is one important thing that all debugfs users must take into account:
+there is no automatic cleanup of any directories created in debugfs. If a
+module is unloaded without explicitly removing debugfs entries, the result
+will be a lot of stale pointers and no end of highly antisocial behavior.
+So all debugfs users - at least those which can be built as modules - must
+be prepared to remove all files and directories they create there. A file
+can be removed with:
+
+ void debugfs_remove(struct dentry *dentry);
+
+The dentry value can be NULL, in which case nothing will be removed.
+
+Once upon a time, debugfs users were required to remember the dentry
+pointer for every debugfs file they created so that all files could be
+cleaned up. We live in more civilized times now, though, and debugfs users
+can call:
+
+ void debugfs_remove_recursive(struct dentry *dentry);
+
+If this function is passed a pointer for the dentry corresponding to the
+top-level directory, the entire hierarchy below that directory will be
+removed.
+
+Notes:
+ [1] http://lwn.net/Articles/309298/
diff --git a/Documentation/i2c/busses/i2c-ocores b/Documentation/i2c/busses/i2c-ocores
index cfcebb10d14..c269aaa2f26 100644
--- a/Documentation/i2c/busses/i2c-ocores
+++ b/Documentation/i2c/busses/i2c-ocores
@@ -20,6 +20,8 @@ platform_device with the base address and interrupt number. The
dev.platform_data of the device should also point to a struct
ocores_i2c_platform_data (see linux/i2c-ocores.h) describing the
distance between registers and the input clock speed.
+There is also a possibility to attach a list of i2c_board_info which
+the i2c-ocores driver will add to the bus upon creation.
E.G. something like:
@@ -36,9 +38,24 @@ static struct resource ocores_resources[] = {
},
};
+/* optional board info */
+struct i2c_board_info ocores_i2c_board_info[] = {
+ {
+ I2C_BOARD_INFO("tsc2003", 0x48),
+ .platform_data = &tsc2003_platform_data,
+ .irq = TSC_IRQ
+ },
+ {
+ I2C_BOARD_INFO("adv7180", 0x42 >> 1),
+ .irq = ADV_IRQ
+ }
+};
+
static struct ocores_i2c_platform_data myi2c_data = {
.regstep = 2, /* two bytes between registers */
.clock_khz = 50000, /* input clock of 50MHz */
+ .devices = ocores_i2c_board_info, /* optional table of devices */
+ .num_devices = ARRAY_SIZE(ocores_i2c_board_info), /* table size */
};
static struct platform_device myi2c = {
diff --git a/Documentation/ide/ide.txt b/Documentation/ide/ide.txt
index 0c78f4b1d9d..e77bebfa7b0 100644
--- a/Documentation/ide/ide.txt
+++ b/Documentation/ide/ide.txt
@@ -216,6 +216,8 @@ Other kernel parameters for ide_core are:
* "noflush=[interface_number.device_number]" to disable flush requests
+* "nohpa=[interface_number.device_number]" to disable Host Protected Area
+
* "noprobe=[interface_number.device_number]" to skip probing
* "nowerr=[interface_number.device_number]" to ignore the WRERR_STAT bit
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7bcdebffdab..0bf8a882ee9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -887,11 +887,8 @@ and is between 256 and 4096 characters. It is defined in the file
ide-core.nodma= [HW] (E)IDE subsystem
Format: =0.0 to prevent dma on hda, =0.1 hdb =1.0 hdc
- .vlb_clock .pci_clock .noflush .noprobe .nowerr .cdrom
- .chs .ignore_cable are additional options
- See Documentation/ide/ide.txt.
-
- idebus= [HW] (E)IDE subsystem - VLB/PCI bus speed
+ .vlb_clock .pci_clock .noflush .nohpa .noprobe .nowerr
+ .cdrom .chs .ignore_cable are additional options
See Documentation/ide/ide.txt.
ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem
diff --git a/Documentation/lguest/Makefile b/Documentation/lguest/Makefile
index 1f4f9e888bd..28c8cdfcafd 100644
--- a/Documentation/lguest/Makefile
+++ b/Documentation/lguest/Makefile
@@ -1,6 +1,5 @@
# This creates the demonstration utility "lguest" which runs a Linux guest.
-CFLAGS:=-Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include -I../../arch/x86/include -U_FORTIFY_SOURCE
-LDLIBS:=-lz
+CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include -I../../arch/x86/include -U_FORTIFY_SOURCE
all: lguest
diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index d36fcc0f271..9ebcd6ef361 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -16,6 +16,7 @@
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
+#include <sys/eventfd.h>
#include <fcntl.h>
#include <stdbool.h>
#include <errno.h>
@@ -59,7 +60,6 @@ typedef uint8_t u8;
/*:*/
#define PAGE_PRESENT 0x7 /* Present, RW, Execute */
-#define NET_PEERNUM 1
#define BRIDGE_PFX "bridge:"
#ifndef SIOCBRADDIF
#define SIOCBRADDIF 0x89a2 /* add interface to bridge */
@@ -76,19 +76,12 @@ static bool verbose;
do { if (verbose) printf(args); } while(0)
/*:*/
-/* File descriptors for the Waker. */
-struct {
- int pipe[2];
- int lguest_fd;
-} waker_fds;
-
/* The pointer to the start of guest memory. */
static void *guest_base;
/* The maximum guest physical address allowed, and maximum possible. */
static unsigned long guest_limit, guest_max;
-/* The pipe for signal hander to write to. */
-static int timeoutpipe[2];
-static unsigned int timeout_usec = 500;
+/* The /dev/lguest file descriptor. */
+static int lguest_fd;
/* a per-cpu variable indicating whose vcpu is currently running */
static unsigned int __thread cpu_id;
@@ -96,11 +89,6 @@ static unsigned int __thread cpu_id;
/* This is our list of devices. */
struct device_list
{
- /* Summary information about the devices in our list: ready to pass to
- * select() to ask which need servicing.*/
- fd_set infds;
- int max_infd;
-
/* Counter to assign interrupt numbers. */
unsigned int next_irq;
@@ -126,22 +114,21 @@ struct device
/* The linked-list pointer. */
struct device *next;
- /* The this device's descriptor, as mapped into the Guest. */
+ /* The device's descriptor, as mapped into the Guest. */
struct lguest_device_desc *desc;
+ /* We can't trust desc values once Guest has booted: we use these. */
+ unsigned int feature_len;
+ unsigned int num_vq;
+
/* The name of this device, for --verbose. */
const char *name;
- /* If handle_input is set, it wants to be called when this file
- * descriptor is ready. */
- int fd;
- bool (*handle_input)(int fd, struct device *me);
-
/* Any queues attached to this device */
struct virtqueue *vq;
- /* Handle status being finalized (ie. feature bits stable). */
- void (*ready)(struct device *me);
+ /* Is it operational */
+ bool running;
/* Device-specific data. */
void *priv;
@@ -164,22 +151,28 @@ struct virtqueue
/* Last available index we saw. */
u16 last_avail_idx;
- /* The routine to call when the Guest pings us, or timeout. */
- void (*handle_output)(int fd, struct virtqueue *me, bool timeout);
+ /* How many are used since we sent last irq? */
+ unsigned int pending_used;
- /* Outstanding buffers */
- unsigned int inflight;
+ /* Eventfd where Guest notifications arrive. */
+ int eventfd;
- /* Is this blocked awaiting a timer? */
- bool blocked;
+ /* Function for the thread which is servicing this virtqueue. */
+ void (*service)(struct virtqueue *vq);
+ pid_t thread;
};
/* Remember the arguments to the program so we can "reboot" */
static char **main_args;
-/* Since guest is UP and we don't run at the same time, we don't need barriers.
- * But I include them in the code in case others copy it. */
-#define wmb()
+/* The original tty settings to restore on exit. */
+static struct termios orig_term;
+
+/* We have to be careful with barriers: our devices are all run in separate
+ * threads and so we need to make sure that changes visible to the Guest happen
+ * in precise order. */
+#define wmb() __asm__ __volatile__("" : : : "memory")
+#define mb() __asm__ __volatile__("" : : : "memory")
/* Convert an iovec element to the given type.
*
@@ -245,7 +238,7 @@ static void iov_consume(struct iovec iov[], unsigned num_iov, unsigned len)
static u8 *get_feature_bits(struct device *dev)
{
return (u8 *)(dev->desc + 1)
- + dev->desc->num_vq * sizeof(struct lguest_vqconfig);
+ + dev->num_vq * sizeof(struct lguest_vqconfig);
}
/*L:100 The Launcher code itself takes us out into userspace, that scary place
@@ -505,99 +498,19 @@ static void concat(char *dst, char *args[])
* saw the arguments it expects when we looked at initialize() in lguest_user.c:
* the base of Guest "physical" memory, the top physical page to allow and the
* entry point for the Guest. */
-static int tell_kernel(unsigned long start)
+static void tell_kernel(unsigned long start)
{
unsigned long args[] = { LHREQ_INITIALIZE,
(unsigned long)guest_base,
guest_limit / getpagesize(), start };
- int fd;
-
verbose("Guest: %p - %p (%#lx)\n",
guest_base, guest_base + guest_limit, guest_limit);
- fd = open_or_die("/dev/lguest", O_RDWR);
- if (write(fd, args, sizeof(args)) < 0)
+ lguest_fd = open_or_die("/dev/lguest", O_RDWR);
+ if (write(lguest_fd, args, sizeof(args)) < 0)
err(1, "Writing to /dev/lguest");
-
- /* We return the /dev/lguest file descriptor to control this Guest */
- return fd;
}
/*:*/
-static void add_device_fd(int fd)
-{
- FD_SET(fd, &devices.infds);
- if (fd > devices.max_infd)
- devices.max_infd = fd;
-}
-
-/*L:200
- * The Waker.
- *
- * With console, block and network devices, we can have lots of input which we
- * need to process. We could try to tell the kernel what file descriptors to
- * watch, but handing a file descriptor mask through to the kernel is fairly
- * icky.
- *
- * Instead, we clone off a thread which watches the file descriptors and writes
- * the LHREQ_BREAK command to the /dev/lguest file descriptor to tell the Host
- * stop running the Guest. This causes the Launcher to return from the
- * /dev/lguest read with -EAGAIN, where it will write to /dev/lguest to reset
- * the LHREQ_BREAK and wake us up again.
- *
- * This, of course, is merely a different *kind* of icky.
- *
- * Given my well-known antipathy to threads, I'd prefer to use processes. But
- * it's easier to share Guest memory with threads, and trivial to share the
- * devices.infds as the Launcher changes it.
- */
-static int waker(void *unused)
-{
- /* Close the write end of the pipe: only the Launcher has it open. */
- close(waker_fds.pipe[1]);
-
- for (;;) {
- fd_set rfds = devices.infds;
- unsigned long args[] = { LHREQ_BREAK, 1 };
- unsigned int maxfd = devices.max_infd;
-
- /* We also listen to the pipe from the Launcher. */
- FD_SET(waker_fds.pipe[0], &rfds);
- if (waker_fds.pipe[0] > maxfd)
- maxfd = waker_fds.pipe[0];
-
- /* Wait until input is ready from one of the devices. */
- select(maxfd+1, &rfds, NULL, NULL, NULL);
-
- /* Message from Launcher? */
- if (FD_ISSET(waker_fds.pipe[0], &rfds)) {
- char c;
- /* If this fails, then assume Launcher has exited.
- * Don't do anything on exit: we're just a thread! */
- if (read(waker_fds.pipe[0], &c, 1) != 1)
- _exit(0);
- continue;
- }
-
- /* Send LHREQ_BREAK command to snap the Launcher out of it. */
- pwrite(waker_fds.lguest_fd, args, sizeof(args), cpu_id);
- }
- return 0;
-}
-
-/* This routine just sets up a pipe to the Waker process. */
-static void setup_waker(int lguest_fd)
-{
- /* This pipe is closed when Launcher dies, telling Waker. */
- if (pipe(waker_fds.pipe) != 0)
- err(1, "Creating pipe for Waker");
-
- /* Waker also needs to know the lguest fd */
- waker_fds.lguest_fd = lguest_fd;
-
- if (clone(waker, malloc(4096) + 4096, CLONE_VM | SIGCHLD, NULL) == -1)
- err(1, "Creating Waker");
-}
-
/*
* Device Handling.
*
@@ -623,49 +536,90 @@ static void *_check_pointer(unsigned long addr, unsigned int size,
/* Each buffer in the virtqueues is actually a chain of descriptors. This
* function returns the next descriptor in the chain, or vq->vring.num if we're
* at the end. */
-static unsigned next_desc(struct virtqueue *vq, unsigned int i)
+static unsigned next_desc(struct vring_desc *desc,
+ unsigned int i, unsigned int max)
{
unsigned int next;
/* If this descriptor says it doesn't chain, we're done. */
- if (!(vq->vring.desc[i].flags & VRING_DESC_F_NEXT))
- return vq->vring.num;
+ if (!(desc[i].flags & VRING_DESC_F_NEXT))
+ return max;
/* Check they're not leading us off end of descriptors. */
- next = vq->vring.desc[i].next;
+ next = desc[i].next;
/* Make sure compiler knows to grab that: we don't want it changing! */
wmb();
- if (next >= vq->vring.num)
+ if (next >= max)
errx(1, "Desc next is %u", next);
return next;
}
+/* This actually sends the interrupt for this virtqueue */
+static void trigger_irq(struct virtqueue *vq)
+{
+ unsigned long buf[] = { LHREQ_IRQ, vq->config.irq };
+
+ /* Don't inform them if nothing used. */
+ if (!vq->pending_used)
+ return;
+ vq->pending_used = 0;
+
+ /* If they don't want an interrupt, don't send one, unless empty. */
+ if ((vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
+ && lg_last_avail(vq) != vq->vring.avail->idx)
+ return;
+
+ /* Send the Guest an interrupt tell them we used something up. */
+ if (write(lguest_fd, buf, sizeof(buf)) != 0)
+ err(1, "Triggering irq %i", vq->config.irq);
+}
+
/* This looks in the virtqueue and for the first available buffer, and converts
* it to an iovec for convenient access. Since descriptors consist of some
* number of output then some number of input descriptors, it's actually two
* iovecs, but we pack them into one and note how many of each there were.
*
- * This function returns the descriptor number found, or vq->vring.num (which
- * is never a valid descriptor number) if none was found. */
-static unsigned get_vq_desc(struct virtqueue *vq,
- struct iovec iov[],
- unsigned int *out_num, unsigned int *in_num)
+ * This function returns the descriptor number found. */
+static unsigned wait_for_vq_desc(struct virtqueue *vq,
+ struct iovec iov[],
+ unsigned int *out_num, unsigned int *in_num)
{
- unsigned int i, head;
- u16 last_avail;
+ unsigned int i, head, max;
+ struct vring_desc *desc;
+ u16 last_avail = lg_last_avail(vq);
+
+ while (last_avail == vq->vring.avail->idx) {
+ u64 event;
+
+ /* OK, tell Guest about progress up to now. */
+ trigger_irq(vq);
+
+ /* OK, now we need to know about added descriptors. */
+ vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
+
+ /* They could have slipped one in as we were doing that: make
+ * sure it's written, then check again. */
+ mb();
+ if (last_avail != vq->vring.avail->idx) {
+ vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
+ break;
+ }
+
+ /* Nothing new? Wait for eventfd to tell us they refilled. */
+ if (read(vq->eventfd, &event, sizeof(event)) != sizeof(event))
+ errx(1, "Event read failed?");
+
+ /* We don't need to be notified again. */
+ vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
+ }
/* Check it isn't doing very strange things with descriptor numbers. */
- last_avail = lg_last_avail(vq);
if ((u16)(vq->vring.avail->idx - last_avail) > vq->vring.num)
errx(1, "Guest moved used index from %u to %u",
last_avail, vq->vring.avail->idx);
- /* If there's nothing new since last we looked, return invalid. */
- if (vq->vring.avail->idx == last_avail)
- return vq->vring.num;
-
/* Grab the next descriptor number they're advertising, and increment
* the index we've seen. */
head = vq->vring.avail->ring[last_avail % vq->vring.num];
@@ -678,15 +632,28 @@ static unsigned get_vq_desc(struct virtqueue *vq,
/* When we start there are none of either input nor output. */
*out_num = *in_num = 0;
+ max = vq->vring.num;
+ desc = vq->vring.desc;
i = head;
+
+ /* If this is an indirect entry, then this buffer contains a descriptor
+ * table which we handle as if it's any normal descriptor chain. */
+ if (desc[i].flags & VRING_DESC_F_INDIRECT) {
+ if (desc[i].len % sizeof(struct vring_desc))
+ errx(1, "Invalid size for indirect buffer table");
+
+ max = desc[i].len / sizeof(struct vring_desc);
+ desc = check_pointer(desc[i].addr, desc[i].len);
+ i = 0;
+ }
+
do {
/* Grab the first descriptor, and check it's OK. */
- iov[*out_num + *in_num].iov_len = vq->vring.desc[i].len;
+ iov[*out_num + *in_num].iov_len = desc[i].len;
iov[*out_num + *in_num].iov_base
- = check_pointer(vq->vring.desc[i].addr,
- vq->vring.desc[i].len);
+ = check_pointer(desc[i].addr, desc[i].len);
/* If this is an input descriptor, increment that count. */
- if (vq->vring.desc[i].flags & VRING_DESC_F_WRITE)
+ if (desc[i].flags & VRING_DESC_F_WRITE)
(*in_num)++;
else {
/* If it's an output descriptor, they're all supposed
@@ -697,11 +664,10 @@ static unsigned get_vq_desc(struct virtqueue *vq,
}
/* If we've got too many, that implies a descriptor loop. */
- if (*out_num + *in_num > vq->vring.num)
+ if (*out_num + *in_num > max)
errx(1, "Looped descriptor");
- } while ((i = next_desc(vq, i)) != vq->vring.num);
+ } while ((i = next_desc(desc, i, max)) != max);
- vq->inflight++;
return head;
}
@@ -719,44 +685,20 @@ static void add_used(struct virtqueue *vq, unsigned int head, int len)
/* Make sure buffer is written before we update index. */
wmb();
vq->vring.used->idx++;
- vq->inflight--;
-}
-
-/* This actually sends the interrupt for this virtqueue */
-static void trigger_irq(int fd, struct virtqueue *vq)
-{
- unsigned long buf[] = { LHREQ_IRQ, vq->config.irq };
-
- /* If they don't want an interrupt, don't send one, unless empty. */
- if ((vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
- && vq->inflight)
- return;
-
- /* Send the Guest an interrupt tell them we used something up. */
- if (write(fd, buf, sizeof(buf)) != 0)
- err(1, "Triggering irq %i", vq->config.irq);
+ vq->pending_used++;
}
/* And here's the combo meal deal. Supersize me! */
-static void add_used_and_trigger(int fd, struct virtqueue *vq,
- unsigned int head, int len)
+static void add_used_and_trigger(struct virtqueue *vq, unsigned head, int len)
{
add_used(vq, head, len);
- trigger_irq(fd, vq);
+ trigger_irq(vq);
}
/*
* The Console
*
- * Here is the input terminal setting we save, and the routine to restore them
- * on exit so the user gets their terminal back. */
-static struct termios orig_term;
-static void restore_term(void)
-{
- tcsetattr(STDIN_FILENO, TCSANOW, &orig_term);
-}
-
-/* We associate some data with the console for our exit hack. */
+ * We associate some data with the console for our exit hack. */
struct console_abort
{
/* How many times have they hit ^C? */
@@ -766,276 +708,275 @@ struct console_abort
};
/* This is the routine which handles console input (ie. stdin). */
-static bool handle_console_input(int fd, struct device *dev)
+static void console_input(struct virtqueue *vq)
{
int len;
unsigned int head, in_num, out_num;
- struct iovec iov[dev->vq->vring.num];
- struct console_abort *abort = dev->priv;
-
- /* First we need a console buffer from the Guests's input virtqueue. */
- head = get_vq_desc(dev->vq, iov, &out_num, &in_num);
-
- /* If they're not ready for input, stop listening to this file
- * descriptor. We'll start again once they add an input buffer. */
- if (head == dev->vq->vring.num)
- return false;
+ struct console_abort *abort = vq->dev->priv;
+ struct iovec iov[vq->vring.num];
+ /* Make sure there's a descriptor waiting. */
+ head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
if (out_num)
errx(1, "Output buffers in console in queue?");
- /* This is why we convert to iovecs: the readv() call uses them, and so
- * it reads straight into the Guest's buffer. */
- len = readv(dev->fd, iov, in_num);
+ /* Read it in. */
+ len = readv(STDIN_FILENO, iov, in_num);
if (len <= 0) {
- /* This implies that the console is closed, is /dev/null, or
- * something went terribly wrong. */
+ /* Ran out of input? */
warnx("Failed to get console input, ignoring console.");
- /* Put the input terminal back. */
- restore_term();
- /* Remove callback from input vq, so it doesn't restart us. */
- dev->vq->handle_output = NULL;
- /* Stop listening to this fd: don't call us again. */
- return false;
+ /* For simplicity, dying threads kill the whole Launcher. So
+ * just nap here. */
+ for (;;)
+ pause();
}
- /* Tell the Guest about the new input. */
- add_used_and_trigger(fd, dev->vq, head, len);
+ add_used_and_trigger(vq, head, len);
/* Three ^C within one second? Exit.
*
- * This is such a hack, but works surprisingly well. Each ^C has to be
- * in a buffer by itself, so they can't be too fast. But we check that
- * we get three within about a second, so they can't be too slow. */
- if (len == 1 && ((char *)iov[0].iov_base)[0] == 3) {
- if (!abort->count++)
- gettimeofday(&abort->start, NULL);
- else if (abort->count == 3) {
- struct timeval now;
- gettimeofday(&now, NULL);
- if (now.tv_sec <= abort->start.tv_sec+1) {
- unsigned long args[] = { LHREQ_BREAK, 0 };
- /* Close the fd so Waker will know it has to
- * exit. */
- close(waker_fds.pipe[1]);
- /* Just in case Waker is blocked in BREAK, send
- * unbreak now. */
- write(fd, args, sizeof(args));
- exit(2);
- }
- abort->count = 0;
- }
- } else
- /* Any other key resets the abort counter. */
+ * This is such a hack, but works surprisingly well. Each ^C has to
+ * be in a buffer by itself, so they can't be too fast. But we check
+ * that we get three within about a second, so they can't be too
+ * slow. */
+ if (len != 1 || ((char *)iov[0].iov_base)[0] != 3) {
abort->count = 0;
+ return;
+ }
- /* Everything went OK! */
- return true;
+ abort->count++;
+ if (abort->count == 1)
+ gettimeofday(&abort->start, NULL);
+ else if (abort->count == 3) {
+ struct timeval now;
+ gettimeofday(&now, NULL);
+ /* Kill all Launcher processes with SIGINT, like normal ^C */
+ if (now.tv_sec <= abort->start.tv_sec+1)
+ kill(0, SIGINT);
+ abort->count = 0;
+ }
}
-/* Handling output for console is simple: we just get all the output buffers
- * and write them to stdout. */
-static void handle_console_output(int fd, struct virtqueue *vq, bool timeout)
+/* This is the routine which handles console output (ie. stdout). */
+static void console_output(struct virtqueue *vq)
{
unsigned int head, out, in;
- int len;
struct iovec iov[vq->vring.num];
- /* Keep getting output buffers from the Guest until we run out. */
- while ((head = get_vq_desc(vq, iov, &out, &in)) != vq->vring.num) {
- if (in)
- errx(1, "Input buffers in output queue?");
- len = writev(STDOUT_FILENO, iov, out);
- add_used_and_trigger(fd, vq, head, len);
+ head = wait_for_vq_desc(vq, iov, &out, &in);
+ if (in)
+ errx(1, "Input buffers in console output queue?");
+ while (!iov_empty(iov, out)) {
+ int len = writev(STDOUT_FILENO, iov, out);
+ if (len <= 0)
+ err(1, "Write to stdout gave %i", len);
+ iov_consume(iov, out, len);
}
-}
-
-/* This is called when we no longer want to hear about Guest changes to a
- * virtqueue. This is more efficient in high-traffic cases, but it means we
- * have to set a timer to check if any more changes have occurred. */
-static void block_vq(struct virtqueue *vq)
-{
- struct itimerval itm;
-
- vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
- vq->blocked = true;
-
- itm.it_interval.tv_sec = 0;
- itm.it_interval.tv_usec = 0;
- itm.it_value.tv_sec = 0;
- itm.it_value.tv_usec = timeout_usec;
-
- setitimer(ITIMER_REAL, &itm, NULL);
+ add_used(vq, head, 0);
}
/*
* The Network
*
* Handling output for network is also simple: we get all the output buffers
- * and write them (ignoring the first element) to this device's file descriptor
- * (/dev/net/tun).
+ * and write them to /dev/net/tun.
*/
-static void handle_net_output(int fd, struct virtqueue *vq, bool timeout)
+struct net_info {
+ int tunfd;
+};
+
+static void net_output(struct virtqueue *vq)
{
- unsigned int head, out, in, num = 0;
- int len;
+ struct net_info *net_info = vq->dev->priv;
+ unsigned int head, out, in;
struct iovec iov[vq->vring.num];
- static int last_timeout_num;
-
- /* Keep getting output buffers from the Guest until we run out. */
- while ((head = get_vq_desc(vq, iov, &out, &in)) != vq->vring.num) {
- if (in)
- errx(1, "Input buffers in output queue?");
- len = writev(vq->dev->fd, iov, out);
- if (len < 0)
- err(1, "Writing network packet to tun");
- add_used_and_trigger(fd, vq, head, len);
- num++;
- }
- /* Block further kicks and set up a timer if we saw anything. */
- if (!timeout && num)
- block_vq(vq);
-
- /* We never quite know how long should we wait before we check the
- * queue again for more packets. We start at 500 microseconds, and if
- * we get fewer packets than last time, we assume we made the timeout
- * too small and increase it by 10 microseconds. Otherwise, we drop it
- * by one microsecond every time. It seems to work well enough. */
- if (timeout) {
- if (num < last_timeout_num)
- timeout_usec += 10;
- else if (timeout_usec > 1)
- timeout_usec--;
- last_timeout_num = num;
- }
+ head = wait_for_vq_desc(vq, iov, &out, &in);
+ if (in)
+ errx(1, "Input buffers in net output queue?");
+ if (writev(net_info->tunfd, iov, out) < 0)
+ errx(1, "Write to tun failed?");
+ add_used(vq, head, 0);
+}
+
+/* Will reading from this file descriptor block? */
+static bool will_block(int fd)
+{
+ fd_set fdset;
+ struct timeval zero = { 0, 0 };
+ FD_ZERO(&fdset);
+ FD_SET(fd, &fdset);
+ return select(fd+1, &fdset, NULL, NULL, &zero) != 1;
}
-/* This is where we handle a packet coming in from the tun device to our
+/* This is where we handle packets coming in from the tun device to our
* Guest. */
-static bool handle_tun_input(int fd, struct device *dev)
+static void net_input(struct virtqueue *vq)
{
- unsigned int head, in_num, out_num;
int len;
- struct iovec iov[dev->vq->vring.num];
-
- /* First we need a network buffer from the Guests's recv virtqueue. */
- head = get_vq_desc(dev->vq, iov, &out_num, &in_num);
- if (head == dev->vq->vring.num) {
- /* Now, it's expected that if we try to send a packet too
- * early, the Guest won't be ready yet. Wait until the device
- * status says it's ready. */
- /* FIXME: Actually want DRIVER_ACTIVE here. */
-
- /* Now tell it we want to know if new things appear. */
- dev->vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
- wmb();
-
- /* We'll turn this back on if input buffers are registered. */
- return false;
- } else if (out_num)
- errx(1, "Output buffers in network recv queue?");
-
- /* Read the packet from the device directly into the Guest's buffer. */
- len = readv(dev->fd, iov, in_num);
- if (len <= 0)
- err(1, "reading network");
+ unsigned int head, out, in;
+ struct iovec iov[vq->vring.num];
+ struct net_info *net_info = vq->dev->priv;
- /* Tell the Guest about the new packet. */
- add_used_and_trigger(fd, dev->vq, head, len);
+ head = wait_for_vq_desc(vq, iov, &out, &in);
+ if (out)
+ errx(1, "Output buffers in net input queue?");
- verbose("tun input packet len %i [%02x %02x] (%s)\n", len,
- ((u8 *)iov[1].iov_base)[0], ((u8 *)iov[1].iov_base)[1],
- head != dev->vq->vring.num ? "sent" : "discarded");
+ /* Deliver interrupt now, since we're about to sleep. */
+ if (vq->pending_used && will_block(net_info->tunfd))
+ trigger_irq(vq);
- /* All good. */
- return true;
+ len = readv(net_info->tunfd, iov, in);
+ if (len <= 0)
+ err(1, "Failed to read from tun.");
+ add_used(vq, head, len);
}
-/*L:215 This is the callback attached to the network and console input
- * virtqueues: it ensures we try again, in case we stopped console or net
- * delivery because Guest didn't have any buffers. */
-static void enable_fd(int fd, struct virtqueue *vq, bool timeout)
+/* This is the helper to create threads. */
+static int do_thread(void *_vq)
{
- add_device_fd(vq->dev->fd);
- /* Snap the Waker out of its select loop. */
- write(waker_fds.pipe[1], "", 1);
+ struct virtqueue *vq = _vq;
+
+ for (;;)
+ vq->service(vq);
+ return 0;
}
-static void net_enable_fd(int fd, struct virtqueue *vq, bool timeout)
+/* When a child dies, we kill our entire process group with SIGTERM. This
+ * also has the side effect that the shell restores the console for us! */
+static void kill_launcher(int signal)
{
- /* We don't need to know again when Guest refills receive buffer. */
- vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
- enable_fd(fd, vq, timeout);
+ kill(0, SIGTERM);
}
-/* When the Guest tells us they updated the status field, we handle it. */
-static void update_device_status(struct device *dev)
+static void reset_device(struct device *dev)
{
struct virtqueue *vq;
- /* This is a reset. */
- if (dev->desc->status == 0) {
- verbose("Resetting device %s\n", dev->name);
+ verbose("Resetting device %s\n", dev->name);
- /* Clear any features they've acked. */
- memset(get_feature_bits(dev) + dev->desc->feature_len, 0,
- dev->desc->feature_len);
+ /* Clear any features they've acked. */
+ memset(get_feature_bits(dev) + dev->feature_len, 0, dev->feature_len);
- /* Zero out the virtqueues. */
- for (vq = dev->vq; vq; vq = vq->next) {
- memset(vq->vring.desc, 0,
- vring_size(vq->config.num, LGUEST_VRING_ALIGN));
- lg_last_avail(vq) = 0;
+ /* We're going to be explicitly killing threads, so ignore them. */
+ signal(SIGCHLD, SIG_IGN);
+
+ /* Zero out the virtqueues, get rid of their threads */
+ for (vq = dev->vq; vq; vq = vq->next) {
+ if (vq->thread != (pid_t)-1) {
+ kill(vq->thread, SIGTERM);
+ waitpid(vq->thread, NULL, 0);
+ vq->thread = (pid_t)-1;
}
- } else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) {
+ memset(vq->vring.desc, 0,
+ vring_size(vq->config.num, LGUEST_VRING_ALIGN));
+ lg_last_avail(vq) = 0;
+ }
+ dev->running = false;
+
+ /* Now we care if threads die. */
+ signal(SIGCHLD, (void *)kill_launcher);
+}
+
+static void create_thread(struct virtqueue *vq)
+{
+ /* Create stack for thread and run it. Since stack grows
+ * upwards, we point the stack pointer to the end of this
+ * region. */
+ char *stack = malloc(32768);
+ unsigned long args[] = { LHREQ_EVENTFD,
+ vq->config.pfn*getpagesize(), 0 };
+
+ /* Create a zero-initialized eventfd. */
+ vq->eventfd = eventfd(0, 0);
+ if (vq->eventfd < 0)
+ err(1, "Creating eventfd");
+ args[2] = vq->eventfd;
+
+ /* Attach an eventfd to this virtqueue: it will go off
+ * when the Guest does an LHCALL_NOTIFY for this vq. */
+ if (write(lguest_fd, &args, sizeof(args)) != 0)
+ err(1, "Attaching eventfd");
+
+ /* CLONE_VM: because it has to access the Guest memory, and
+ * SIGCHLD so we get a signal if it dies. */
+ vq->thread = clone(do_thread, stack + 32768, CLONE_VM | SIGCHLD, vq);
+ if (vq->thread == (pid_t)-1)
+ err(1, "Creating clone");
+ /* We close our local copy, now the child has it. */
+ close(vq->eventfd);
+}
+
+static void start_device(struct device *dev)
+{
+ unsigned int i;
+ struct virtqueue *vq;
+
+ verbose("Device %s OK: offered", dev->name);
+ for (i = 0; i < dev->feature_len; i++)
+ verbose(" %02x", get_feature_bits(dev)[i]);
+ verbose(", accepted");
+ for (i = 0; i < dev->feature_len; i++)
+ verbose(" %02x", get_feature_bits(dev)
+ [dev->feature_len+i]);
+
+ for (vq = dev->vq; vq; vq = vq->next) {
+ if (vq->service)
+ create_thread(vq);
+ }
+ dev->running = true;
+}
+
+static void cleanup_devices(void)
+{
+ struct device *dev;
+
+ for (dev = devices.dev; dev; dev = dev->next)
+ reset_device(dev);
+
+ /* If we saved off the original terminal settings, restore them now. */
+ if (orig_term.c_lflag & (ISIG|ICANON|ECHO))
+ tcsetattr(STDIN_FILENO, TCSANOW, &orig_term);
+}
+
+/* When the Guest tells us they updated the status field, we handle it. */
+static void update_device_status(struct device *dev)
+{
+ /* A zero status is a reset, otherwise it's a set of flags. */
+ if (dev->desc->status == 0)
+ reset_device(dev);
+ else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) {
warnx("Device %s configuration FAILED", dev->name);
+ if (dev->running)
+ reset_device(dev);
} else if (dev->desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
- unsigned int i;
-
- verbose("Device %s OK: offered", dev->name);
- for (i = 0; i < dev->desc->feature_len; i++)
- verbose(" %02x", get_feature_bits(dev)[i]);
- verbose(", accepted");
- for (i = 0; i < dev->desc->feature_len; i++)
- verbose(" %02x", get_feature_bits(dev)
- [dev->desc->feature_len+i]);
-
- if (dev->ready)
- dev->ready(dev);
+ if (!dev->running)
+ start_device(dev);
}
}
/* This is the generic routine we call when the Guest uses LHCALL_NOTIFY. */
-static void handle_output(int fd, unsigned long addr)
+static void handle_output(unsigned long addr)
{
struct device *i;
- struct virtqueue *vq;
- /* Check each device and virtqueue. */
+ /* Check each device. */
for (i = devices.dev; i; i = i->next) {
+ struct virtqueue *vq;
+
/* Notifications to device descriptors update device status. */
if (from_guest_phys(addr) == i->desc) {
update_device_status(i);
return;
}
- /* Notifications to virtqueues mean output has occurred. */
+ /* Devices *can* be used before status is set to DRIVER_OK. */
for (vq = i->vq; vq; vq = vq->next) {
- if (vq->config.pfn != addr/getpagesize())
+ if (addr != vq->config.pfn*getpagesize())
continue;
-
- /* Guest should acknowledge (and set features!) before
- * using the device. */
- if (i->desc->status == 0) {
- warnx("%s gave early output", i->name);
- return;
- }
-
- if (strcmp(vq->dev->name, "console") != 0)
- verbose("Output to %s\n", vq->dev->name);
- if (vq->handle_output)
- vq->handle_output(fd, vq, false);
+ if (i->running)
+ errx(1, "Notification on running %s", i->name);
+ start_device(i);
return;
}
}
@@ -1049,71 +990,6 @@ static void handle_output(int fd, unsigned long addr)
strnlen(from_guest_phys(addr), guest_limit - addr));
}
-static void handle_timeout(int fd)
-{
- char buf[32];
- struct device *i;
- struct virtqueue *vq;
-
- /* Clear the pipe */
- read(timeoutpipe[0], buf, sizeof(buf));
-
- /* Check each device and virtqueue: flush blocked ones. */
- for (i = devices.dev; i; i = i->next) {
- for (vq = i->vq; vq; vq = vq->next) {
- if (!vq->blocked)
- continue;
-
- vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
- vq->blocked = false;
- if (vq->handle_output)
- vq->handle_output(fd, vq, true);
- }
- }
-}
-
-/* This is called when the Waker wakes us up: check for incoming file
- * descriptors. */
-static void handle_input(int fd)
-{
- /* select() wants a zeroed timeval to mean "don't wait". */
- struct timeval poll = { .tv_sec = 0, .tv_usec = 0 };
-
- for (;;) {
- struct device *i;
- fd_set fds = devices.infds;
- int num;
-
- num = select(devices.max_infd+1, &fds, NULL, NULL, &poll);
- /* Could get interrupted */
- if (num < 0)
- continue;
- /* If nothing is ready, we're done. */
- if (num == 0)
- break;
-
- /* Otherwise, call the device(s) which have readable file
- * descriptors and a method of handling them. */
- for (i = devices.dev; i; i = i->next) {
- if (i->handle_input && FD_ISSET(i->fd, &fds)) {
- if (i->handle_input(fd, i))
- continue;
-
- /* If handle_input() returns false, it means we
- * should no longer service it. Networking and
- * console do this when there's no input
- * buffers to deliver into. Console also uses
- * it when it discovers that stdin is closed. */
- FD_CLR(i->fd, &devices.infds);
- }
- }
-
- /* Is this the timeout fd? */
- if (FD_ISSET(timeoutpipe[0], &fds))
- handle_timeout(fd);
- }
-}
-
/*L:190
* Device Setup
*
@@ -1129,8 +1005,8 @@ static void handle_input(int fd)
static u8 *device_config(const struct device *dev)
{
return (void *)(dev->desc + 1)
- + dev->desc->num_vq * sizeof(struct lguest_vqconfig)
- + dev->desc->feature_len * 2;
+ + dev->num_vq * sizeof(struct lguest_vqconfig)
+ + dev->feature_len * 2;
}
/* This routine allocates a new "struct lguest_device_desc" from descriptor
@@ -1159,7 +1035,7 @@ static struct lguest_device_desc *new_dev_desc(u16 type)
/* Each device descriptor is followed by the description of its virtqueues. We
* specify how many descriptors the virtqueue is to have. */
static void add_virtqueue(struct device *dev, unsigned int num_descs,
- void (*handle_output)(int, struct virtqueue *, bool))
+ void (*service)(struct virtqueue *))
{
unsigned int pages;
struct virtqueue **i, *vq = malloc(sizeof(*vq));
@@ -1174,8 +1050,8 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs,
vq->next = NULL;
vq->last_avail_idx = 0;
vq->dev = dev;
- vq->inflight = 0;
- vq->blocked = false;
+ vq->service = service;
+ vq->thread = (pid_t)-1;
/* Initialize the configuration. */
vq->config.num = num_descs;
@@ -1191,6 +1067,7 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs,
* yet, otherwise we'd be overwriting them. */
assert(dev->desc->config_len == 0 && dev->desc->feature_len == 0);
memcpy(device_config(dev), &vq->config, sizeof(vq->config));
+ dev->num_vq++;
dev->desc->num_vq++;
verbose("Virtqueue page %#lx\n", to_guest_phys(p));
@@ -1199,15 +1076,6 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs,
* second. */
for (i = &dev->vq; *i; i = &(*i)->next);
*i = vq;
-
- /* Set the routine to call when the Guest does something to this
- * virtqueue. */
- vq->handle_output = handle_output;
-
- /* As an optimization, set the advisory "Don't Notify Me" flag if we
- * don't have a handler */
- if (!handle_output)
- vq->vring.used->flags = VRING_USED_F_NO_NOTIFY;
}
/* The first half of the feature bitmask is for us to advertise features. The
@@ -1219,7 +1087,7 @@ static void add_feature(struct device *dev, unsigned bit)
/* We can't extend the feature bits once we've added config bytes */
if (dev->desc->feature_len <= bit / CHAR_BIT) {
assert(dev->desc->config_len == 0);
- dev->desc->feature_len = (bit / CHAR_BIT) + 1;
+ dev->feature_len = dev->desc->feature_len = (bit/CHAR_BIT) + 1;
}
features[bit / CHAR_BIT] |= (1 << (bit % CHAR_BIT));
@@ -1243,22 +1111,17 @@ static void set_config(struct device *dev, unsigned len, const void *conf)
* calling new_dev_desc() to allocate the descriptor and device memory.
*
* See what I mean about userspace being boring? */
-static struct device *new_device(const char *name, u16 type, int fd,
- bool (*handle_input)(int, struct device *))
+static struct device *new_device(const char *name, u16 type)
{
struct device *dev = malloc(sizeof(*dev));
/* Now we populate the fields one at a time. */
- dev->fd = fd;
- /* If we have an input handler for this file descriptor, then we add it
- * to the device_list's fdset and maxfd. */
- if (handle_input)
- add_device_fd(dev->fd);
dev->desc = new_dev_desc(type);
- dev->handle_input = handle_input;
dev->name = name;
dev->vq = NULL;
- dev->ready = NULL;
+ dev->feature_len = 0;
+ dev->num_vq = 0;
+ dev->running = false;
/* Append to device list. Prepending to a single-linked list is
* easier, but the user expects the devices to be arranged on the bus
@@ -1286,13 +1149,10 @@ static void setup_console(void)
* raw input stream to the Guest. */
term.c_lflag &= ~(ISIG|ICANON|ECHO);
tcsetattr(STDIN_FILENO, TCSANOW, &term);
- /* If we exit gracefully, the original settings will be
- * restored so the user can see what they're typing. */
- atexit(restore_term);
}
- dev = new_device("console", VIRTIO_ID_CONSOLE,
- STDIN_FILENO, handle_console_input);
+ dev = new_device("console", VIRTIO_ID_CONSOLE);
+
/* We store the console state in dev->priv, and initialize it. */
dev->priv = malloc(sizeof(struct console_abort));
((struct console_abort *)dev->priv)->count = 0;
@@ -1301,31 +1161,13 @@ static void setup_console(void)
* they put something the input queue, we make sure we're listening to
* stdin. When they put something in the output queue, we write it to
* stdout. */
- add_virtqueue(dev, VIRTQUEUE_NUM, enable_fd);
- add_virtqueue(dev, VIRTQUEUE_NUM, handle_console_output);
+ add_virtqueue(dev, VIRTQUEUE_NUM, console_input);
+ add_virtqueue(dev, VIRTQUEUE_NUM, console_output);
- verbose("device %u: console\n", devices.device_num++);
+ verbose("device %u: console\n", ++devices.device_num);
}
/*:*/
-static void timeout_alarm(int sig)
-{
- write(timeoutpipe[1], "", 1);
-}
-
-static void setup_timeout(void)
-{
- if (pipe(timeoutpipe) != 0)
- err(1, "Creating timeout pipe");
-
- if (fcntl(timeoutpipe[1], F_SETFL,
- fcntl(timeoutpipe[1], F_GETFL) | O_NONBLOCK) != 0)
- err(1, "Making timeout pipe nonblocking");
-
- add_device_fd(timeoutpipe[0]);
- signal(SIGALRM, timeout_alarm);
-}
-
/*M:010 Inter-guest networking is an interesting area. Simplest is to have a
* --sharenet=<name> option which opens or creates a named pipe. This can be
* used to send packets to another guest in a 1:1 manner.
@@ -1447,21 +1289,23 @@ static int get_tun_device(char tapif[IFNAMSIZ])
static void setup_tun_net(char *arg)
{
struct device *dev;
- int netfd, ipfd;
+ struct net_info *net_info = malloc(sizeof(*net_info));
+ int ipfd;
u32 ip = INADDR_ANY;
bool bridging = false;
char tapif[IFNAMSIZ], *p;
struct virtio_net_config conf;
- netfd = get_tun_device(tapif);
+ net_info->tunfd = get_tun_device(tapif);
/* First we create a new network device. */
- dev = new_device("net", VIRTIO_ID_NET, netfd, handle_tun_input);
+ dev = new_device("net", VIRTIO_ID_NET);
+ dev->priv = net_info;
/* Network devices need a receive and a send queue, just like
* console. */
- add_virtqueue(dev, VIRTQUEUE_NUM, net_enable_fd);
- add_virtqueue(dev, VIRTQUEUE_NUM, handle_net_output);
+ add_virtqueue(dev, VIRTQUEUE_NUM, net_input);
+ add_virtqueue(dev, VIRTQUEUE_NUM, net_output);
/* We need a socket to perform the magic network ioctls to bring up the
* tap interface, connect to the bridge etc. Any socket will do! */
@@ -1502,6 +1346,8 @@ static void setup_tun_net(char *arg)
add_feature(dev, VIRTIO_NET_F_HOST_TSO4);
add_feature(dev, VIRTIO_NET_F_HOST_TSO6);
add_feature(dev, VIRTIO_NET_F_HOST_ECN);
+ /* We handle indirect ring entries */
+ add_feature(dev, VIRTIO_RING_F_INDIRECT_DESC);
set_config(dev, sizeof(conf), &conf);
/* We don't need the socket any more; setup is done. */
@@ -1550,20 +1396,18 @@ struct vblk_info
* Remember that the block device is handled by a separate I/O thread. We head
* straight into the core of that thread here:
*/
-static bool service_io(struct device *dev)
+static void blk_request(struct virtqueue *vq)
{
- struct vblk_info *vblk = dev->priv;
+ struct vblk_info *vblk = vq->dev->priv;
unsigned int head, out_num, in_num, wlen;
int ret;
u8 *in;
struct virtio_blk_outhdr *out;
- struct iovec iov[dev->vq->vring.num];
+ struct iovec iov[vq->vring.num];
off64_t off;
- /* See if there's a request waiting. If not, nothing to do. */
- head = get_vq_desc(dev->vq, iov, &out_num, &in_num);
- if (head == dev->vq->vring.num)
- return false;
+ /* Get the next request. */
+ head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
/* Every block request should contain at least one output buffer
* (detailing the location on disk and the type of request) and one
@@ -1637,83 +1481,21 @@ static bool service_io(struct device *dev)
if (out->type & VIRTIO_BLK_T_BARRIER)
fdatasync(vblk->fd);
- /* We can't trigger an IRQ, because we're not the Launcher. It does
- * that when we tell it we're done. */
- add_used(dev->vq, head, wlen);
- return true;
-}
-
-/* This is the thread which actually services the I/O. */
-static int io_thread(void *_dev)
-{
- struct device *dev = _dev;
- struct vblk_info *vblk = dev->priv;
- char c;
-
- /* Close other side of workpipe so we get 0 read when main dies. */
- close(vblk->workpipe[1]);
- /* Close the other side of the done_fd pipe. */
- close(dev->fd);
-
- /* When this read fails, it means Launcher died, so we follow. */
- while (read(vblk->workpipe[0], &c, 1) == 1) {
- /* We acknowledge each request immediately to reduce latency,
- * rather than waiting until we've done them all. I haven't
- * measured to see if it makes any difference.
- *
- * That would be an interesting test, wouldn't it? You could
- * also try having more than one I/O thread. */
- while (service_io(dev))
- write(vblk->done_fd, &c, 1);
- }
- return 0;
-}
-
-/* Now we've seen the I/O thread, we return to the Launcher to see what happens
- * when that thread tells us it's completed some I/O. */
-static bool handle_io_finish(int fd, struct device *dev)
-{
- char c;
-
- /* If the I/O thread died, presumably it printed the error, so we
- * simply exit. */
- if (read(dev->fd, &c, 1) != 1)
- exit(1);
-
- /* It did some work, so trigger the irq. */
- trigger_irq(fd, dev->vq);
- return true;
-}
-
-/* When the Guest submits some I/O, we just need to wake the I/O thread. */
-static void handle_virtblk_output(int fd, struct virtqueue *vq, bool timeout)
-{
- struct vblk_info *vblk = vq->dev->priv;
- char c = 0;
-
- /* Wake up I/O thread and tell it to go to work! */
- if (write(vblk->workpipe[1], &c, 1) != 1)
- /* Presumably it indicated why it died. */
- exit(1);
+ add_used(vq, head, wlen);
}
/*L:198 This actually sets up a virtual block device. */
static void setup_block_file(const char *filename)
{
- int p[2];
struct device *dev;
struct vblk_info *vblk;
- void *stack;
struct virtio_blk_config conf;
- /* This is the pipe the I/O thread will use to tell us I/O is done. */
- pipe(p);
-
/* The device responds to return from I/O thread. */
- dev = new_device("block", VIRTIO_ID_BLOCK, p[0], handle_io_finish);
+ dev = new_device("block", VIRTIO_ID_BLOCK);
/* The device has one virtqueue, where the Guest places requests. */
- add_virtqueue(dev, VIRTQUEUE_NUM, handle_virtblk_output);
+ add_virtqueue(dev, VIRTQUEUE_NUM, blk_request);
/* Allocate the room for our own bookkeeping */
vblk = dev->priv = malloc(sizeof(*vblk));
@@ -1735,49 +1517,29 @@ static void setup_block_file(const char *filename)
set_config(dev, sizeof(conf), &conf);
- /* The I/O thread writes to this end of the pipe when done. */
- vblk->done_fd = p[1];
-
- /* This is the second pipe, which is how we tell the I/O thread about
- * more work. */
- pipe(vblk->workpipe);
-
- /* Create stack for thread and run it. Since stack grows upwards, we
- * point the stack pointer to the end of this region. */
- stack = malloc(32768);
- /* SIGCHLD - We dont "wait" for our cloned thread, so prevent it from
- * becoming a zombie. */
- if (clone(io_thread, stack + 32768, CLONE_VM | SIGCHLD, dev) == -1)
- err(1, "Creating clone");
-
- /* We don't need to keep the I/O thread's end of the pipes open. */
- close(vblk->done_fd);
- close(vblk->workpipe[0]);
-
verbose("device %u: virtblock %llu sectors\n",
- devices.device_num, le64_to_cpu(conf.capacity));
+ ++devices.device_num, le64_to_cpu(conf.capacity));
}
+struct rng_info {
+ int rfd;
+};
+
/* Our random number generator device reads from /dev/random into the Guest's
* input buffers. The usual case is that the Guest doesn't want random numbers
* and so has no buffers although /dev/random is still readable, whereas
* console is the reverse.
*
* The same logic applies, however. */
-static bool handle_rng_input(int fd, struct device *dev)
+static void rng_input(struct virtqueue *vq)
{
int len;
unsigned int head, in_num, out_num, totlen = 0;
- struct iovec iov[dev->vq->vring.num];
+ struct rng_info *rng_info = vq->dev->priv;
+ struct iovec iov[vq->vring.num];
/* First we need a buffer from the Guests's virtqueue. */
- head = get_vq_desc(dev->vq, iov, &out_num, &in_num);
-
- /* If they're not ready for input, stop listening to this file
- * descriptor. We'll start again once they add an input buffer. */
- if (head == dev->vq->vring.num)
- return false;
-
+ head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
if (out_num)
errx(1, "Output buffers in rng?");
@@ -1785,7 +1547,7 @@ static bool handle_rng_input(int fd, struct device *dev)
* it reads straight into the Guest's buffer. We loop to make sure we
* fill it. */
while (!iov_empty(iov, in_num)) {
- len = readv(dev->fd, iov, in_num);
+ len = readv(rng_info->rfd, iov, in_num);
if (len <= 0)
err(1, "Read from /dev/random gave %i", len);
iov_consume(iov, in_num, len);
@@ -1793,25 +1555,23 @@ static bool handle_rng_input(int fd, struct device *dev)
}
/* Tell the Guest about the new input. */
- add_used_and_trigger(fd, dev->vq, head, totlen);
-
- /* Everything went OK! */
- return true;
+ add_used(vq, head, totlen);
}
/* And this creates a "hardware" random number device for the Guest. */
static void setup_rng(void)
{
struct device *dev;
- int fd;
+ struct rng_info *rng_info = malloc(sizeof(*rng_info));
- fd = open_or_die("/dev/random", O_RDONLY);
+ rng_info->rfd = open_or_die("/dev/random", O_RDONLY);
/* The device responds to return from I/O thread. */
- dev = new_device("rng", VIRTIO_ID_RNG, fd, handle_rng_input);
+ dev = new_device("rng", VIRTIO_ID_RNG);
+ dev->priv = rng_info;
/* The device has one virtqueue, where the Guest places inbufs. */
- add_virtqueue(dev, VIRTQUEUE_NUM, enable_fd);
+ add_virtqueue(dev, VIRTQUEUE_NUM, rng_input);
verbose("device %u: rng\n", devices.device_num++);
}
@@ -1827,17 +1587,18 @@ static void __attribute__((noreturn)) restart_guest(void)
for (i = 3; i < FD_SETSIZE; i++)
close(i);
- /* The exec automatically gets rid of the I/O and Waker threads. */
+ /* Reset all the devices (kills all threads). */
+ cleanup_devices();
+
execv(main_args[0], main_args);
err(1, "Could not exec %s", main_args[0]);
}
/*L:220 Finally we reach the core of the Launcher which runs the Guest, serves
* its input and output, and finally, lays it to rest. */
-static void __attribute__((noreturn)) run_guest(int lguest_fd)
+static void __attribute__((noreturn)) run_guest(void)
{
for (;;) {
- unsigned long args[] = { LHREQ_BREAK, 0 };
unsigned long notify_addr;
int readval;
@@ -1848,8 +1609,7 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
/* One unsigned long means the Guest did HCALL_NOTIFY */
if (readval == sizeof(notify_addr)) {
verbose("Notify on address %#lx\n", notify_addr);
- handle_output(lguest_fd, notify_addr);
- continue;
+ handle_output(notify_addr);
/* ENOENT means the Guest died. Reading tells us why. */
} else if (errno == ENOENT) {
char reason[1024] = { 0 };
@@ -1858,19 +1618,9 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
/* ERESTART means that we need to reboot the guest */
} else if (errno == ERESTART) {
restart_guest();
- /* EAGAIN means a signal (timeout).
- * Anything else means a bug or incompatible change. */
- } else if (errno != EAGAIN)
+ /* Anything else means a bug or incompatible change. */
+ } else
err(1, "Running guest failed");
-
- /* Only service input on thread for CPU 0. */
- if (cpu_id != 0)
- continue;
-
- /* Service input, then unset the BREAK to release the Waker. */
- handle_input(lguest_fd);
- if (pwrite(lguest_fd, args, sizeof(args), cpu_id) < 0)
- err(1, "Resetting break");
}
}
/*L:240
@@ -1904,8 +1654,8 @@ int main(int argc, char *argv[])
/* Memory, top-level pagetable, code startpoint and size of the
* (optional) initrd. */
unsigned long mem = 0, start, initrd_size = 0;
- /* Two temporaries and the /dev/lguest file descriptor. */
- int i, c, lguest_fd;
+ /* Two temporaries. */
+ int i, c;
/* The boot information for the Guest. */
struct boot_params *boot;
/* If they specify an initrd file to load. */
@@ -1913,18 +1663,10 @@ int main(int argc, char *argv[])
/* Save the args: we "reboot" by execing ourselves again. */
main_args = argv;
- /* We don't "wait" for the children, so prevent them from becoming
- * zombies. */
- signal(SIGCHLD, SIG_IGN);
- /* First we initialize the device list. Since console and network
- * device receive input from a file descriptor, we keep an fdset
- * (infds) and the maximum fd number (max_infd) with the head of the
- * list. We also keep a pointer to the last device. Finally, we keep
- * the next interrupt number to use for devices (1: remember that 0 is
- * used by the timer). */
- FD_ZERO(&devices.infds);
- devices.max_infd = -1;
+ /* First we initialize the device list. We keep a pointer to the last
+ * device, and the next interrupt number to use for devices (1:
+ * remember that 0 is used by the timer). */
devices.lastdev = NULL;
devices.next_irq = 1;
@@ -1982,9 +1724,6 @@ int main(int argc, char *argv[])
/* We always have a console device */
setup_console();
- /* We can timeout waiting for Guest network transmit. */
- setup_timeout();
-
/* Now we load the kernel */
start = load_kernel(open_or_die(argv[optind+1], O_RDONLY));
@@ -2023,15 +1762,16 @@ int main(int argc, char *argv[])
/* We tell the kernel to initialize the Guest: this returns the open
* /dev/lguest file descriptor. */
- lguest_fd = tell_kernel(start);
+ tell_kernel(start);
+
+ /* Ensure that we terminate if a child dies. */
+ signal(SIGCHLD, kill_launcher);
- /* We clone off a thread, which wakes the Launcher whenever one of the
- * input file descriptors needs attention. We call this the Waker, and
- * we'll cover it in a moment. */
- setup_waker(lguest_fd);
+ /* If we exit via err(), this kills all the threads, restores tty. */
+ atexit(cleanup_devices);
/* Finally, run the Guest. This doesn't return. */
- run_guest(lguest_fd);
+ run_guest();
}
/*:*/
diff --git a/Documentation/lguest/lguest.txt b/Documentation/lguest/lguest.txt
index 28c747362f9..efb3a6a045a 100644
--- a/Documentation/lguest/lguest.txt
+++ b/Documentation/lguest/lguest.txt
@@ -37,7 +37,6 @@ Running Lguest:
"Paravirtualized guest support" = Y
"Lguest guest support" = Y
"High Memory Support" = off/4GB
- "PAE (Physical Address Extension) Support" = N
"Alignment value to which kernel should be aligned" = 0x100000
(CONFIG_PARAVIRT=y, CONFIG_LGUEST_GUEST=y, CONFIG_HIGHMEM64G=n and
CONFIG_PHYSICAL_ALIGN=0x100000)
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index 421e7d00ffd..c9abbd86bc1 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -75,9 +75,6 @@ may need to apply in domain-specific ways to their devices:
struct bus_type {
...
int (*suspend)(struct device *dev, pm_message_t state);
- int (*suspend_late)(struct device *dev, pm_message_t state);
-
- int (*resume_early)(struct device *dev);
int (*resume)(struct device *dev);
};
@@ -226,20 +223,7 @@ The phases are seen by driver notifications issued in this order:
This call should handle parts of device suspend logic that require
sleeping. It probably does work to quiesce the device which hasn't
- been abstracted into class.suspend() or bus.suspend_late().
-
- 3 bus.suspend_late(dev, message) is called with IRQs disabled, and
- with only one CPU active. Until the bus.resume_early() phase
- completes (see later), IRQs are not enabled again. This method
- won't be exposed by all busses; for message based busses like USB,
- I2C, or SPI, device interactions normally require IRQs. This bus
- call may be morphed into a driver call with bus-specific parameters.
-
- This call might save low level hardware state that might otherwise
- be lost in the upcoming low power state, and actually put the
- device into a low power state ... so that in some cases the device
- may stay partly usable until this late. This "late" call may also
- help when coping with hardware that behaves badly.
+ been abstracted into class.suspend().
The pm_message_t parameter is currently used to refine those semantics
(described later).
@@ -351,19 +335,11 @@ devices processing each phase's calls before the next phase begins.
The phases are seen by driver notifications issued in this order:
- 1 bus.resume_early(dev) is called with IRQs disabled, and with
- only one CPU active. As with bus.suspend_late(), this method
- won't be supported on busses that require IRQs in order to
- interact with devices.
-
- This reverses the effects of bus.suspend_late().
-
- 2 bus.resume(dev) is called next. This may be morphed into a device
- driver call with bus-specific parameters; implementations may sleep.
-
- This reverses the effects of bus.suspend().
+ 1 bus.resume(dev) reverses the effects of bus.suspend(). This may
+ be morphed into a device driver call with bus-specific parameters;
+ implementations may sleep.
- 3 class.resume(dev) is called for devices associated with a class
+ 2 class.resume(dev) is called for devices associated with a class
that has such a method. Implementations may sleep.
This reverses the effects of class.suspend(), and would usually
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 012858d2b11..5c08d96f407 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -460,6 +460,25 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
The power-management is supported.
+ Module snd-ctxfi
+ ----------------
+
+ Module for Creative Sound Blaster X-Fi boards (20k1 / 20k2 chips)
+ * Creative Sound Blaster X-Fi Titanium Fatal1ty Champion Series
+ * Creative Sound Blaster X-Fi Titanium Fatal1ty Professional Series
+ * Creative Sound Blaster X-Fi Titanium Professional Audio
+ * Creative Sound Blaster X-Fi Titanium
+ * Creative Sound Blaster X-Fi Elite Pro
+ * Creative Sound Blaster X-Fi Platinum
+ * Creative Sound Blaster X-Fi Fatal1ty
+ * Creative Sound Blaster X-Fi XtremeGamer
+ * Creative Sound Blaster X-Fi XtremeMusic
+
+ reference_rate - reference sample rate, 44100 or 48000 (default)
+ multiple - multiple to ref. sample rate, 1 or 2 (default)
+
+ This module supports multiple cards.
+
Module snd-darla20
------------------
@@ -925,6 +944,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
* Onkyo SE-90PCI
* Onkyo SE-200PCI
* ESI Juli@
+ * ESI Maya44
* Hercules Fortissimo IV
* EGO-SYS WaveTerminal 192M
@@ -933,7 +953,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
prodigy71xt, prodigy71hifi, prodigyhd2, prodigy192,
juli, aureon51, aureon71, universe, ap192, k8x800,
phase22, phase28, ms300, av710, se200pci, se90pci,
- fortissimo4, sn25p, WT192M
+ fortissimo4, sn25p, WT192M, maya44
This module supports multiple cards and autoprobe.
@@ -1093,6 +1113,13 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
This module supports multiple cards.
The driver requires the firmware loader support on kernel.
+ Module snd-lx6464es
+ -------------------
+
+ Module for Digigram LX6464ES boards
+
+ This module supports multiple cards.
+
Module snd-maestro3
-------------------
@@ -1543,13 +1570,15 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Module snd-sc6000
-----------------
- Module for Gallant SC-6000 soundcard.
+ Module for Gallant SC-6000 soundcard and later models: SC-6600
+ and SC-7000.
port - Port # (0x220 or 0x240)
mss_port - MSS Port # (0x530 or 0xe80)
irq - IRQ # (5,7,9,10,11)
mpu_irq - MPU-401 IRQ # (5,7,9,10) ,0 - no MPU-401 irq
dma - DMA # (1,3,0)
+ joystick - Enable gameport - 0 = disable (default), 1 = enable
This module supports multiple cards.
@@ -1859,7 +1888,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
-------------------
Module for sound cards based on the Asus AV100/AV200 chips,
- i.e., Xonar D1, DX, D2, D2X, HDAV1.3 (Deluxe), and Essence STX.
+ i.e., Xonar D1, DX, D2, D2X, HDAV1.3 (Deluxe), Essence ST
+ (Deluxe) and Essence STX.
This module supports autoprobe and multiple cards.
diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt
index 322869fc8a9..de8e10a9410 100644
--- a/Documentation/sound/alsa/HD-Audio-Models.txt
+++ b/Documentation/sound/alsa/HD-Audio-Models.txt
@@ -36,6 +36,7 @@ ALC260
acer Acer TravelMate
will Will laptops (PB V7900)
replacer Replacer 672V
+ favorit100 Maxdata Favorit 100XS
basic fixed pin assignment (old default model)
test for testing/debugging purpose, almost all controls can
adjusted. Appearing only when compiled with
@@ -85,10 +86,11 @@ ALC269
eeepc-p703 ASUS Eeepc P703 P900A
eeepc-p901 ASUS Eeepc P901 S101
fujitsu FSC Amilo
+ lifebook Fujitsu Lifebook S6420
auto auto-config reading BIOS (default)
-ALC662/663
-==========
+ALC662/663/272
+==============
3stack-dig 3-stack (2-channel) with SPDIF
3stack-6ch 3-stack (6-channel)
3stack-6ch-dig 3-stack (6-channel) with SPDIF
@@ -107,6 +109,9 @@ ALC662/663
asus-mode4 ASUS
asus-mode5 ASUS
asus-mode6 ASUS
+ dell Dell with ALC272
+ dell-zm1 Dell ZM1 with ALC272
+ samsung-nc10 Samsung NC10 mini notebook
auto auto-config reading BIOS (default)
ALC882/885
@@ -118,6 +123,7 @@ ALC882/885
asus-a7j ASUS A7J
asus-a7m ASUS A7M
macpro MacPro support
+ mb5 Macbook 5,1
mbp3 Macbook Pro rev3
imac24 iMac 24'' with jack detection
w2jc ASUS W2JC
@@ -133,10 +139,12 @@ ALC883/888
acer Acer laptops (Travelmate 3012WTMi, Aspire 5600, etc)
acer-aspire Acer Aspire 9810
acer-aspire-4930g Acer Aspire 4930G
+ acer-aspire-8930g Acer Aspire 8930G
medion Medion Laptops
medion-md2 Medion MD2
targa-dig Targa/MSI
- targa-2ch-dig Targs/MSI with 2-channel
+ targa-2ch-dig Targa/MSI with 2-channel
+ targa-8ch-dig Targa/MSI with 8-channel (MSI GX620)
laptop-eapd 3-jack with SPDIF I/O and EAPD (Clevo M540JE, M550JE)
lenovo-101e Lenovo 101E
lenovo-nb0763 Lenovo NB0763
@@ -150,6 +158,9 @@ ALC883/888
fujitsu-pi2515 Fujitsu AMILO Pi2515
fujitsu-xa3530 Fujitsu AMILO XA3530
3stack-6ch-intel Intel DG33* boards
+ asus-p5q ASUS P5Q-EM boards
+ mb31 MacBook 3,1
+ sony-vaio-tt Sony VAIO TT
auto auto-config reading BIOS (default)
ALC861/660
@@ -348,6 +359,7 @@ STAC92HD71B*
hp-m4 HP mini 1000
hp-dv5 HP dv series
hp-hdx HP HDX series
+ hp-dv4-1222nr HP dv4-1222nr (with LED support)
auto BIOS setup (default)
STAC92HD73*
diff --git a/Documentation/sound/alsa/Procfile.txt b/Documentation/sound/alsa/Procfile.txt
index cfac20cf9e3..381908d8ca4 100644
--- a/Documentation/sound/alsa/Procfile.txt
+++ b/Documentation/sound/alsa/Procfile.txt
@@ -88,26 +88,34 @@ card*/pcm*/info
substreams, etc.
card*/pcm*/xrun_debug
- This file appears when CONFIG_SND_DEBUG=y.
- This shows the status of xrun (= buffer overrun/xrun) debug of
- ALSA PCM middle layer, as an integer from 0 to 2. The value
- can be changed by writing to this file, such as
-
- # cat 2 > /proc/asound/card0/pcm0p/xrun_debug
-
- When this value is greater than 0, the driver will show the
- messages to kernel log when an xrun is detected. The debug
- message is shown also when the invalid H/W pointer is detected
- at the update of periods (usually called from the interrupt
+ This file appears when CONFIG_SND_DEBUG=y and
+ CONFIG_PCM_XRUN_DEBUG=y.
+ This shows the status of xrun (= buffer overrun/xrun) and
+ invalid PCM position debug/check of ALSA PCM middle layer.
+ It takes an integer value, can be changed by writing to this
+ file, such as
+
+ # cat 5 > /proc/asound/card0/pcm0p/xrun_debug
+
+ The value consists of the following bit flags:
+ bit 0 = Enable XRUN/jiffies debug messages
+ bit 1 = Show stack trace at XRUN / jiffies check
+ bit 2 = Enable additional jiffies check
+
+ When the bit 0 is set, the driver will show the messages to
+ kernel log when an xrun is detected. The debug message is
+ shown also when the invalid H/W pointer is detected at the
+ update of periods (usually called from the interrupt
handler).
- When this value is greater than 1, the driver will show the
- stack trace additionally. This may help the debugging.
+ When the bit 1 is set, the driver will show the stack trace
+ additionally. This may help the debugging.
- Since 2.6.30, this option also enables the hwptr check using
+ Since 2.6.30, this option can enable the hwptr check using
jiffies. This detects spontaneous invalid pointer callback
values, but can be lead to too much corrections for a (mostly
buggy) hardware that doesn't give smooth pointer updates.
+ This feature is enabled via the bit 2.
card*/pcm*/sub*/info
The general information of this PCM sub-stream.
diff --git a/Documentation/sound/alsa/README.maya44 b/Documentation/sound/alsa/README.maya44
new file mode 100644
index 00000000000..0e41576fa13
--- /dev/null
+++ b/Documentation/sound/alsa/README.maya44
@@ -0,0 +1,163 @@
+NOTE: The following is the original document of Rainer's patch that the
+current maya44 code based on. Some contents might be obsoleted, but I
+keep here as reference -- tiwai
+
+----------------------------------------------------------------
+
+STATE OF DEVELOPMENT:
+
+This driver is being developed on the initiative of Piotr Makowski (oponek@gmail.com) and financed by Lars Bergmann.
+Development is carried out by Rainer Zimmermann (mail@lightshed.de).
+
+ESI provided a sample Maya44 card for the development work.
+
+However, unfortunately it has turned out difficult to get detailed programming information, so I (Rainer Zimmermann) had to find out some card-specific information by experiment and conjecture. Some information (in particular, several GPIO bits) is still missing.
+
+This is the first testing version of the Maya44 driver released to the alsa-devel mailing list (Feb 5, 2008).
+
+
+The following functions work, as tested by Rainer Zimmermann and Piotr Makowski:
+
+- playback and capture at all sampling rates
+- input/output level
+- crossmixing
+- line/mic switch
+- phantom power switch
+- analogue monitor a.k.a bypass
+
+
+The following functions *should* work, but are not fully tested:
+
+- Channel 3+4 analogue - S/PDIF input switching
+- S/PDIF output
+- all inputs/outputs on the M/IO/DIO extension card
+- internal/external clock selection
+
+
+*In particular, we would appreciate testing of these functions by anyone who has access to an M/IO/DIO extension card.*
+
+
+Things that do not seem to work:
+
+- The level meters ("multi track") in 'alsamixer' do not seem to react to signals in (if this is a bug, it would probably be in the existing ICE1724 code).
+
+- Ardour 2.1 seems to work only via JACK, not using ALSA directly or via OSS. This still needs to be tracked down.
+
+
+DRIVER DETAILS:
+
+the following files were added:
+
+pci/ice1724/maya44.c - Maya44 specific code
+pci/ice1724/maya44.h
+pci/ice1724/ice1724.patch
+pci/ice1724/ice1724.h.patch - PROPOSED patch to ice1724.h (see SAMPLING RATES)
+i2c/other/wm8776.c - low-level access routines for Wolfson WM8776 codecs
+include/wm8776.h
+
+
+Note that the wm8776.c code is meant to be card-independent and does not actually register the codec with the ALSA infrastructure.
+This is done in maya44.c, mainly because some of the WM8776 controls are used in Maya44-specific ways, and should be named appropriately.
+
+
+the following files were created in pci/ice1724, simply #including the corresponding file from the alsa-kernel tree:
+
+wtm.h
+vt1720_mobo.h
+revo.h
+prodigy192.h
+pontis.h
+phase.h
+maya44.h
+juli.h
+aureon.h
+amp.h
+envy24ht.h
+se.h
+prodigy_hifi.h
+
+
+*I hope this is the correct way to do things.*
+
+
+SAMPLING RATES:
+
+The Maya44 card (or more exactly, the Wolfson WM8776 codecs) allow a maximum sampling rate of 192 kHz for playback and 92 kHz for capture.
+
+As the ICE1724 chip only allows one global sampling rate, this is handled as follows:
+
+* setting the sampling rate on any open PCM device on the maya44 card will always set the *global* sampling rate for all playback and capture channels.
+
+* In the current state of the driver, setting rates of up to 192 kHz is permitted even for capture devices.
+
+*AVOID CAPTURING AT RATES ABOVE 96kHz*, even though it may appear to work. The codec cannot actually capture at such rates, meaning poor quality.
+
+
+I propose some additional code for limiting the sampling rate when setting on a capture pcm device. However because of the global sampling rate, this logic would be somewhat problematic.
+
+The proposed code (currently deactivated) is in ice1712.h.patch, ice1724.c and maya44.c (in pci/ice1712).
+
+
+SOUND DEVICES:
+
+PCM devices correspond to inputs/outputs as follows (assuming Maya44 is card #0):
+
+hw:0,0 input - stereo, analog input 1+2
+hw:0,0 output - stereo, analog output 1+2
+hw:0,1 input - stereo, analog input 3+4 OR S/PDIF input
+hw:0,1 output - stereo, analog output 3+4 (and SPDIF out)
+
+
+NAMING OF MIXER CONTROLS:
+
+(for more information about the signal flow, please refer to the block diagram on p.24 of the ESI Maya44 manual, or in the ESI windows software).
+
+
+PCM: (digital) output level for channel 1+2
+PCM 1: same for channel 3+4
+
+Mic Phantom+48V: switch for +48V phantom power for electrostatic microphones on input 1/2.
+ Make sure this is not turned on while any other source is connected to input 1/2.
+ It might damage the source and/or the maya44 card.
+
+Mic/Line input: if switch is is on, input jack 1/2 is microphone input (mono), otherwise line input (stereo).
+
+Bypass: analogue bypass from ADC input to output for channel 1+2. Same as "Monitor" in the windows driver.
+Bypass 1: same for channel 3+4.
+
+Crossmix: cross-mixer from channels 1+2 to channels 3+4
+Crossmix 1: cross-mixer from channels 3+4 to channels 1+2
+
+IEC958 Output: switch for S/PDIF output.
+ This is not supported by the ESI windows driver.
+ S/PDIF should output the same signal as channel 3+4. [untested!]
+
+
+Digitial output selectors:
+
+ These switches allow a direct digital routing from the ADCs to the DACs.
+ Each switch determines where the digital input data to one of the DACs comes from.
+ They are not supported by the ESI windows driver.
+ For normal operation, they should all be set to "PCM out".
+
+H/W: Output source channel 1
+H/W 1: Output source channel 2
+H/W 2: Output source channel 3
+H/W 3: Output source channel 4
+
+H/W 4 ... H/W 9: unknown function, left in to enable testing.
+ Possibly some of these control S/PDIF output(s).
+ If these turn out to be unused, they will go away in later driver versions.
+
+Selectable values for each of the digital output selectors are:
+ "PCM out" -> DAC output of the corresponding channel (default setting)
+ "Input 1"...
+ "Input 4" -> direct routing from ADC output of the selected input channel
+
+
+--------
+
+Feb 14, 2008
+Rainer Zimmermann
+mail@lightshed.de
+
diff --git a/Documentation/sound/alsa/soc/dapm.txt b/Documentation/sound/alsa/soc/dapm.txt
index 9e6763264a2..9ac842be9b4 100644
--- a/Documentation/sound/alsa/soc/dapm.txt
+++ b/Documentation/sound/alsa/soc/dapm.txt
@@ -62,6 +62,7 @@ Audio DAPM widgets fall into a number of types:-
o Mic - Mic (and optional Jack)
o Line - Line Input/Output (and optional Jack)
o Speaker - Speaker
+ o Supply - Power or clock supply widget used by other widgets.
o Pre - Special PRE widget (exec before all others)
o Post - Special POST widget (exec after all others)
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 2db5893d6c9..29a6ff8bc7d 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -5,21 +5,51 @@ only the AMD64 specific ones are listed here.
Machine check
- mce=off disable machine check
- mce=bootlog Enable logging of machine checks left over from booting.
- Disabled by default on AMD because some BIOS leave bogus ones.
- If your BIOS doesn't do that it's a good idea to enable though
- to make sure you log even machine check events that result
- in a reboot. On Intel systems it is enabled by default.
+ Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables.
+
+ mce=off
+ Disable machine check
+ mce=no_cmci
+ Disable CMCI(Corrected Machine Check Interrupt) that
+ Intel processor supports. Usually this disablement is
+ not recommended, but it might be handy if your hardware
+ is misbehaving.
+ Note that you'll get more problems without CMCI than with
+ due to the shared banks, i.e. you might get duplicated
+ error logs.
+ mce=dont_log_ce
+ Don't make logs for corrected errors. All events reported
+ as corrected are silently cleared by OS.
+ This option will be useful if you have no interest in any
+ of corrected errors.
+ mce=ignore_ce
+ Disable features for corrected errors, e.g. polling timer
+ and CMCI. All events reported as corrected are not cleared
+ by OS and remained in its error banks.
+ Usually this disablement is not recommended, however if
+ there is an agent checking/clearing corrected errors
+ (e.g. BIOS or hardware monitoring applications), conflicting
+ with OS's error handling, and you cannot deactivate the agent,
+ then this option will be a help.
+ mce=bootlog
+ Enable logging of machine checks left over from booting.
+ Disabled by default on AMD because some BIOS leave bogus ones.
+ If your BIOS doesn't do that it's a good idea to enable though
+ to make sure you log even machine check events that result
+ in a reboot. On Intel systems it is enabled by default.
mce=nobootlog
Disable boot machine check logging.
- mce=tolerancelevel (number)
+ mce=tolerancelevel[,monarchtimeout] (number,number)
+ tolerance levels:
0: always panic on uncorrected errors, log corrected errors
1: panic or SIGBUS on uncorrected errors, log corrected errors
2: SIGBUS or log uncorrected errors, log corrected errors
3: never panic or SIGBUS, log all errors (for testing only)
Default is 1
Can be also set using sysfs which is preferable.
+ monarchtimeout:
+ Sets the time in us to wait for other CPUs on machine checks. 0
+ to disable.
nomce (for compatibility with i386): same as mce=off
diff --git a/Documentation/x86/x86_64/machinecheck b/Documentation/x86/x86_64/machinecheck
index a05e58e7b15..b1fb3027328 100644
--- a/Documentation/x86/x86_64/machinecheck
+++ b/Documentation/x86/x86_64/machinecheck
@@ -41,7 +41,9 @@ check_interval
the polling interval. When the poller stops finding MCEs, it
triggers an exponential backoff (poll less often) on the polling
interval. The check_interval variable is both the initial and
- maximum polling interval.
+ maximum polling interval. 0 means no polling for corrected machine
+ check errors (but some corrected errors might be still reported
+ in other ways)
tolerant
Tolerance level. When a machine check exception occurs for a non
@@ -67,6 +69,10 @@ trigger
Program to run when a machine check event is detected.
This is an alternative to running mcelog regularly from cron
and allows to detect events faster.
+monarch_timeout
+ How long to wait for the other CPUs to machine check too on a
+ exception. 0 to disable waiting for other CPUs.
+ Unit: us
TBD document entries for AMD threshold interrupt configuration