diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-03-28 12:55:04 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-03-28 12:55:04 -0700 |
commit | 89e5d6f0d979f6e7dc2bbb1ebd9e239217e2e952 (patch) | |
tree | 1126044004b73df905a6183430376f1d97c3b6c9 /Documentation | |
parent | 516e77977085c9c50703fabb5dc61bd57a8cc1d0 (diff) | |
parent | a4ffc152198efba2ed9e6eac0eb97f17bfebce85 (diff) |
Merge tag 'dm-3.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull device-mapper changes for 3.4 from Alasdair Kergon:
- Update thin provisioning to support read-only external snapshot
origins and discards.
- A new target, dm verity, for device content validation.
- Mark dm uevent and dm raid as no-longer-experimental.
- Miscellaneous other fixes and clean-ups.
* tag 'dm-3.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (27 commits)
dm: add verity target
dm bufio: prefetch
dm thin: add pool target flags to control discard
dm thin: support discards
dm thin: prepare to support discard
dm thin: use dm_target_offset
dm thin: support read only external snapshot origins
dm thin: relax hard limit on the maximum size of a metadata device
dm persistent data: remove space map ref_count entries if redundant
dm thin: commit outstanding data every second
dm: reject trailing characters in sccanf input
dm raid: handle failed devices during start up
dm thin metadata: pass correct space map to dm_sm_root_size
dm persistent data: remove redundant value_size arg from value_ptr
dm mpath: detect invalid map_context
dm: clear bi_end_io on remapping failure
dm table: simplify call to free_devices
dm thin: correct comments
dm raid: no longer experimental
dm uevent: no longer experimental
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/sysfs-block-dm | 25 | ||||
-rw-r--r-- | Documentation/device-mapper/thin-provisioning.txt | 65 | ||||
-rw-r--r-- | Documentation/device-mapper/verity.txt | 194 |
3 files changed, 269 insertions, 15 deletions
diff --git a/Documentation/ABI/testing/sysfs-block-dm b/Documentation/ABI/testing/sysfs-block-dm new file mode 100644 index 00000000000..87ca5691e29 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-block-dm @@ -0,0 +1,25 @@ +What: /sys/block/dm-<num>/dm/name +Date: January 2009 +KernelVersion: 2.6.29 +Contact: dm-devel@redhat.com +Description: Device-mapper device name. + Read-only string containing mapped device name. +Users: util-linux, device-mapper udev rules + +What: /sys/block/dm-<num>/dm/uuid +Date: January 2009 +KernelVersion: 2.6.29 +Contact: dm-devel@redhat.com +Description: Device-mapper device UUID. + Read-only string containing DM-UUID or empty string + if DM-UUID is not set. +Users: util-linux, device-mapper udev rules + +What: /sys/block/dm-<num>/dm/suspended +Date: June 2009 +KernelVersion: 2.6.31 +Contact: dm-devel@redhat.com +Description: Device-mapper device suspend state. + Contains the value 1 while the device is suspended. + Otherwise it contains 0. Read-only attribute. +Users: util-linux, device-mapper udev rules diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt index 1ff044d87ca..3370bc4d7b9 100644 --- a/Documentation/device-mapper/thin-provisioning.txt +++ b/Documentation/device-mapper/thin-provisioning.txt @@ -75,10 +75,12 @@ less sharing than average you'll need a larger-than-average metadata device. As a guide, we suggest you calculate the number of bytes to use in the metadata device as 48 * $data_dev_size / $data_block_size but round it up -to 2MB if the answer is smaller. The largest size supported is 16GB. +to 2MB if the answer is smaller. If you're creating large numbers of +snapshots which are recording large amounts of change, you may find you +need to increase this. -If you're creating large numbers of snapshots which are recording large -amounts of change, you may need find you need to increase this. +The largest size supported is 16GB: If the device is larger, +a warning will be issued and the excess space will not be used. Reloading a pool table ---------------------- @@ -167,6 +169,38 @@ ii) Using an internal snapshot. dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1" +External snapshots +------------------ + +You can use an external _read only_ device as an origin for a +thinly-provisioned volume. Any read to an unprovisioned area of the +thin device will be passed through to the origin. Writes trigger +the allocation of new blocks as usual. + +One use case for this is VM hosts that want to run guests on +thinly-provisioned volumes but have the base image on another device +(possibly shared between many VMs). + +You must not write to the origin device if you use this technique! +Of course, you may write to the thin device and take internal snapshots +of the thin volume. + +i) Creating a snapshot of an external device + + This is the same as creating a thin device. + You don't mention the origin at this stage. + + dmsetup message /dev/mapper/pool 0 "create_thin 0" + +ii) Using a snapshot of an external device. + + Append an extra parameter to the thin target specifying the origin: + + dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image" + + N.B. All descendants (internal snapshots) of this snapshot require the + same extra origin parameter. + Deactivation ------------ @@ -189,7 +223,13 @@ i) Constructor <low water mark (blocks)> [<number of feature args> [<arg>]*] Optional feature arguments: - - 'skip_block_zeroing': skips the zeroing of newly-provisioned blocks. + + skip_block_zeroing: Skip the zeroing of newly-provisioned blocks. + + ignore_discard: Disable discard support. + + no_discard_passdown: Don't pass discards down to the underlying + data device, but just remove the mapping. Data block size must be between 64KB (128 sectors) and 1GB (2097152 sectors) inclusive. @@ -237,16 +277,6 @@ iii) Messages Deletes a thin device. Irreversible. - trim <dev id> <new size in sectors> - - Delete mappings from the end of a thin device. Irreversible. - You might want to use this if you're reducing the size of - your thinly-provisioned device. In many cases, due to the - sharing of blocks between devices, it is not possible to - determine in advance how much space 'trim' will release. (In - future a userspace tool might be able to perform this - calculation.) - set_transaction_id <current id> <new id> Userland volume managers, such as LVM, need a way to @@ -262,7 +292,7 @@ iii) Messages i) Constructor - thin <pool dev> <dev id> + thin <pool dev> <dev id> [<external origin dev>] pool dev: the thin-pool device, e.g. /dev/mapper/my_pool or 253:0 @@ -271,6 +301,11 @@ i) Constructor the internal device identifier of the device to be activated. + external origin dev: + an optional block device outside the pool to be treated as a + read-only snapshot origin: reads to unprovisioned areas of the + thin target will be mapped to this device. + The pool doesn't store any size against the thin devices. If you load a thin target that is smaller than you've been using previously, then you'll have no access to blocks mapped beyond the end. If you diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt new file mode 100644 index 00000000000..32e48797a14 --- /dev/null +++ b/Documentation/device-mapper/verity.txt @@ -0,0 +1,194 @@ +dm-verity +========== + +Device-Mapper's "verity" target provides transparent integrity checking of +block devices using a cryptographic digest provided by the kernel crypto API. +This target is read-only. + +Construction Parameters +======================= + <version> <dev> <hash_dev> <hash_start> + <data_block_size> <hash_block_size> + <num_data_blocks> <hash_start_block> + <algorithm> <digest> <salt> + +<version> + This is the version number of the on-disk format. + + 0 is the original format used in the Chromium OS. + The salt is appended when hashing, digests are stored continuously and + the rest of the block is padded with zeros. + + 1 is the current format that should be used for new devices. + The salt is prepended when hashing and each digest is + padded with zeros to the power of two. + +<dev> + This is the device containing the data the integrity of which needs to be + checked. It may be specified as a path, like /dev/sdaX, or a device number, + <major>:<minor>. + +<hash_dev> + This is the device that that supplies the hash tree data. It may be + specified similarly to the device path and may be the same device. If the + same device is used, the hash_start should be outside of the dm-verity + configured device size. + +<data_block_size> + The block size on a data device. Each block corresponds to one digest on + the hash device. + +<hash_block_size> + The size of a hash block. + +<num_data_blocks> + The number of data blocks on the data device. Additional blocks are + inaccessible. You can place hashes to the same partition as data, in this + case hashes are placed after <num_data_blocks>. + +<hash_start_block> + This is the offset, in <hash_block_size>-blocks, from the start of hash_dev + to the root block of the hash tree. + +<algorithm> + The cryptographic hash algorithm used for this device. This should + be the name of the algorithm, like "sha1". + +<digest> + The hexadecimal encoding of the cryptographic hash of the root hash block + and the salt. This hash should be trusted as there is no other authenticity + beyond this point. + +<salt> + The hexadecimal encoding of the salt value. + +Theory of operation +=================== + +dm-verity is meant to be setup as part of a verified boot path. This +may be anything ranging from a boot using tboot or trustedgrub to just +booting from a known-good device (like a USB drive or CD). + +When a dm-verity device is configured, it is expected that the caller +has been authenticated in some way (cryptographic signatures, etc). +After instantiation, all hashes will be verified on-demand during +disk access. If they cannot be verified up to the root node of the +tree, the root hash, then the I/O will fail. This should identify +tampering with any data on the device and the hash data. + +Cryptographic hashes are used to assert the integrity of the device on a +per-block basis. This allows for a lightweight hash computation on first read +into the page cache. Block hashes are stored linearly-aligned to the nearest +block the size of a page. + +Hash Tree +--------- + +Each node in the tree is a cryptographic hash. If it is a leaf node, the hash +is of some block data on disk. If it is an intermediary node, then the hash is +of a number of child nodes. + +Each entry in the tree is a collection of neighboring nodes that fit in one +block. The number is determined based on block_size and the size of the +selected cryptographic digest algorithm. The hashes are linearly-ordered in +this entry and any unaligned trailing space is ignored but included when +calculating the parent node. + +The tree looks something like: + +alg = sha256, num_blocks = 32768, block_size = 4096 + + [ root ] + / . . . \ + [entry_0] [entry_1] + / . . . \ . . . \ + [entry_0_0] . . . [entry_0_127] . . . . [entry_1_127] + / ... \ / . . . \ / \ + blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767 + + +On-disk format +============== + +Below is the recommended on-disk format. The verity kernel code does not +read the on-disk header. It only reads the hash blocks which directly +follow the header. It is expected that a user-space tool will verify the +integrity of the verity_header and then call dmsetup with the correct +parameters. Alternatively, the header can be omitted and the dmsetup +parameters can be passed via the kernel command-line in a rooted chain +of trust where the command-line is verified. + +The on-disk format is especially useful in cases where the hash blocks +are on a separate partition. The magic number allows easy identification +of the partition contents. Alternatively, the hash blocks can be stored +in the same partition as the data to be verified. In such a configuration +the filesystem on the partition would be sized a little smaller than +the full-partition, leaving room for the hash blocks. + +struct superblock { + uint8_t signature[8] + "verity\0\0"; + + uint8_t version; + 1 - current format + + uint8_t data_block_bits; + log2(data block size) + + uint8_t hash_block_bits; + log2(hash block size) + + uint8_t pad1[1]; + zero padding + + uint16_t salt_size; + big-endian salt size + + uint8_t pad2[2]; + zero padding + + uint32_t data_blocks_hi; + big-endian high 32 bits of the 64-bit number of data blocks + + uint32_t data_blocks_lo; + big-endian low 32 bits of the 64-bit number of data blocks + + uint8_t algorithm[16]; + cryptographic algorithm + + uint8_t salt[384]; + salt (the salt size is specified above) + + uint8_t pad3[88]; + zero padding to 512-byte boundary +} + +Directly following the header (and with sector number padded to the next hash +block boundary) are the hash blocks which are stored a depth at a time +(starting from the root), sorted in order of increasing index. + +Status +====== +V (for Valid) is returned if every check performed so far was valid. +If any check failed, C (for Corruption) is returned. + +Example +======= + +Setup a device: + dmsetup create vroot --table \ + "0 2097152 "\ + "verity 1 /dev/sda1 /dev/sda2 4096 4096 2097152 1 "\ + "4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\ + "1234000000000000000000000000000000000000000000000000000000000000" + +A command line tool veritysetup is available to compute or verify +the hash tree or activate the kernel driver. This is available from +the LVM2 upstream repository and may be supplied as a package called +device-mapper-verity-tools: + git://sources.redhat.com/git/lvm2 + http://sourceware.org/git/?p=lvm2.git + http://sourceware.org/cgi-bin/cvsweb.cgi/LVM2/verity?cvsroot=lvm2 + +veritysetup -a vroot /dev/sda1 /dev/sda2 \ + 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 |