Skip to content

Commit

Permalink
docs: convert docs to ReST and rename to *.rst
Browse files Browse the repository at this point in the history
The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
  • Loading branch information
Mauro Carvalho Chehab authored and Jonathan Corbet committed Jun 14, 2019
1 parent 8ea6188 commit f0ba437
Show file tree
Hide file tree
Showing 36 changed files with 1,142 additions and 814 deletions.
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
=============================
Guidance for writing policies
=============================

Expand Down Expand Up @@ -30,7 +31,7 @@ multiqueue (mq)

This policy is now an alias for smq (see below).

The following tunables are accepted, but have no effect:
The following tunables are accepted, but have no effect::

'sequential_threshold <#nr_sequential_ios>'
'random_threshold <#nr_random_ios>'
Expand All @@ -56,7 +57,9 @@ mq policy's hints to be dropped. Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.

Memory usage:
Memory usage
^^^^^^^^^^^^

The mq policy used a lot of memory; 88 bytes per cache block on a 64
bit machine.

Expand All @@ -69,7 +72,9 @@ cache block).
All this means smq uses ~25bytes per cache block. Still a lot of
memory, but a substantial improvement nontheless.

Level balancing:
Level balancing
^^^^^^^^^^^^^^^

mq placed entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)). This meant the bottom
levels generally had the most entries, and the top ones had very
Expand All @@ -94,7 +99,9 @@ is used to decide which blocks to promote. If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels. This lets it adapt to new IO patterns very quickly.

Performance:
Performance
^^^^^^^^^^^

Testing smq shows substantially better performance than mq.

cleaner
Expand All @@ -105,16 +112,19 @@ The cleaner writes back all dirty blocks in a cache to decommission it.
Examples
========

The syntax for a table is:
The syntax for a table is::

cache <metadata dev> <cache dev> <origin dev> <block size>
<#feature_args> [<feature arg>]*
<policy> <#policy_args> [<policy arg>]*

The syntax to send a message using the dmsetup command is:
The syntax to send a message using the dmsetup command is::

dmsetup message <mapped device> 0 sequential_threshold 1024
dmsetup message <mapped device> 0 random_threshold 8

Using dmsetup:
Using dmsetup::

dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
/dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
creates a 128GB large mapped device named 'blah' with the
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,7 @@
=====
Cache
=====

Introduction
============

Expand All @@ -24,10 +28,13 @@ scenarios (eg. a vm image server).
Glossary
========

Migration - Movement of the primary copy of a logical block from one
Migration
Movement of the primary copy of a logical block from one
device to the other.
Promotion - Migration from slow device to fast device.
Demotion - Migration from fast device to slow device.
Promotion
Migration from slow device to fast device.
Demotion
Migration from fast device to slow device.

The origin device always contains a copy of the logical block, which
may be out of date or kept in sync with the copy on the cache device
Expand Down Expand Up @@ -169,45 +176,53 @@ Target interface
Constructor
-----------

cache <metadata dev> <cache dev> <origin dev> <block size>
<#feature args> [<feature arg>]*
<policy> <#policy args> [policy args]*
::

cache <metadata dev> <cache dev> <origin dev> <block size>
<#feature args> [<feature arg>]*
<policy> <#policy args> [policy args]*

metadata dev : fast device holding the persistent metadata
cache dev : fast device holding cached data blocks
origin dev : slow device holding original data blocks
block size : cache unit size in sectors
================ =======================================================
metadata dev fast device holding the persistent metadata
cache dev fast device holding cached data blocks
origin dev slow device holding original data blocks
block size cache unit size in sectors

#feature args : number of feature arguments passed
feature args : writethrough or passthrough (The default is writeback.)
#feature args number of feature arguments passed
feature args writethrough or passthrough (The default is writeback.)

policy : the replacement policy to use
#policy args : an even number of arguments corresponding to
key/value pairs passed to the policy
policy args : key/value pairs passed to the policy
E.g. 'sequential_threshold 1024'
See cache-policies.txt for details.
policy the replacement policy to use
#policy args an even number of arguments corresponding to
key/value pairs passed to the policy
policy args key/value pairs passed to the policy
E.g. 'sequential_threshold 1024'
See cache-policies.txt for details.
================ =======================================================

Optional feature arguments are:
writethrough : write through caching that prohibits cache block
content from being different from origin block content.
Without this argument, the default behaviour is to write
back cache block contents later for performance reasons,
so they may differ from the corresponding origin blocks.

passthrough : a degraded mode useful for various cache coherency
situations (e.g., rolling back snapshots of
underlying storage). Reads and writes always go to
the origin. If a write goes to a cached origin
block, then the cache block is invalidated.
To enable passthrough mode the cache must be clean.

metadata2 : use version 2 of the metadata. This stores the dirty bits
in a separate btree, which improves speed of shutting
down the cache.

no_discard_passdown : disable passing down discards from the cache
to the origin's data device.


==================== ========================================================
writethrough write through caching that prohibits cache block
content from being different from origin block content.
Without this argument, the default behaviour is to write
back cache block contents later for performance reasons,
so they may differ from the corresponding origin blocks.

passthrough a degraded mode useful for various cache coherency
situations (e.g., rolling back snapshots of
underlying storage). Reads and writes always go to
the origin. If a write goes to a cached origin
block, then the cache block is invalidated.
To enable passthrough mode the cache must be clean.

metadata2 use version 2 of the metadata. This stores the dirty
bits in a separate btree, which improves speed of
shutting down the cache.

no_discard_passdown disable passing down discards from the cache
to the origin's data device.
==================== ========================================================

A policy called 'default' is always registered. This is an alias for
the policy we currently think is giving best all round performance.
Expand All @@ -218,54 +233,61 @@ the characteristics of a specific policy, always request it by name.
Status
------

<metadata block size> <#used metadata blocks>/<#total metadata blocks>
<cache block size> <#used cache blocks>/<#total cache blocks>
<#read hits> <#read misses> <#write hits> <#write misses>
<#demotions> <#promotions> <#dirty> <#features> <features>*
<#core args> <core args>* <policy name> <#policy args> <policy args>*
<cache metadata mode>

metadata block size : Fixed block size for each metadata block in
sectors
#used metadata blocks : Number of metadata blocks used
#total metadata blocks : Total number of metadata blocks
cache block size : Configurable block size for the cache device
in sectors
#used cache blocks : Number of blocks resident in the cache
#total cache blocks : Total number of cache blocks
#read hits : Number of times a READ bio has been mapped
to the cache
#read misses : Number of times a READ bio has been mapped
to the origin
#write hits : Number of times a WRITE bio has been mapped
to the cache
#write misses : Number of times a WRITE bio has been
mapped to the origin
#demotions : Number of times a block has been removed
from the cache
#promotions : Number of times a block has been moved to
the cache
#dirty : Number of blocks in the cache that differ
from the origin
#feature args : Number of feature args to follow
feature args : 'writethrough' (optional)
#core args : Number of core arguments (must be even)
core args : Key/value pairs for tuning the core
e.g. migration_threshold
policy name : Name of the policy
#policy args : Number of policy arguments to follow (must be even)
policy args : Key/value pairs e.g. sequential_threshold
cache metadata mode : ro if read-only, rw if read-write
In serious cases where even a read-only mode is deemed unsafe
no further I/O will be permitted and the status will just
contain the string 'Fail'. The userspace recovery tools
should then be used.
needs_check : 'needs_check' if set, '-' if not set
A metadata operation has failed, resulting in the needs_check
flag being set in the metadata's superblock. The metadata
device must be deactivated and checked/repaired before the
cache can be made fully operational again. '-' indicates
needs_check is not set.
::

<metadata block size> <#used metadata blocks>/<#total metadata blocks>
<cache block size> <#used cache blocks>/<#total cache blocks>
<#read hits> <#read misses> <#write hits> <#write misses>
<#demotions> <#promotions> <#dirty> <#features> <features>*
<#core args> <core args>* <policy name> <#policy args> <policy args>*
<cache metadata mode>


========================= =====================================================
metadata block size Fixed block size for each metadata block in
sectors
#used metadata blocks Number of metadata blocks used
#total metadata blocks Total number of metadata blocks
cache block size Configurable block size for the cache device
in sectors
#used cache blocks Number of blocks resident in the cache
#total cache blocks Total number of cache blocks
#read hits Number of times a READ bio has been mapped
to the cache
#read misses Number of times a READ bio has been mapped
to the origin
#write hits Number of times a WRITE bio has been mapped
to the cache
#write misses Number of times a WRITE bio has been
mapped to the origin
#demotions Number of times a block has been removed
from the cache
#promotions Number of times a block has been moved to
the cache
#dirty Number of blocks in the cache that differ
from the origin
#feature args Number of feature args to follow
feature args 'writethrough' (optional)
#core args Number of core arguments (must be even)
core args Key/value pairs for tuning the core
e.g. migration_threshold
policy name Name of the policy
#policy args Number of policy arguments to follow (must be even)
policy args Key/value pairs e.g. sequential_threshold
cache metadata mode ro if read-only, rw if read-write

In serious cases where even a read-only mode is
deemed unsafe no further I/O will be permitted and
the status will just contain the string 'Fail'.
The userspace recovery tools should then be used.
needs_check 'needs_check' if set, '-' if not set
A metadata operation has failed, resulting in the
needs_check flag being set in the metadata's
superblock. The metadata device must be
deactivated and checked/repaired before the
cache can be made fully operational again.
'-' indicates needs_check is not set.
========================= =====================================================

Messages
--------
Expand All @@ -274,11 +296,12 @@ Policies will have different tunables, specific to each one, so we
need a generic way of getting and setting these. Device-mapper
messages are used. (A sysfs interface would also be possible.)

The message format is:
The message format is::

<key> <value>

E.g.
E.g.::

dmsetup message my_cache 0 sequential_threshold 1024


Expand All @@ -290,11 +313,12 @@ of values from 5 to 9. Each cblock must be expressed as a decimal
value, in the future a variant message that takes cblock ranges
expressed in hexadecimal may be needed to better support efficient
invalidation of larger caches. The cache must be in passthrough mode
when invalidate_cblocks is used.
when invalidate_cblocks is used::

invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*

E.g.
E.g.::

dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789

Examples
Expand All @@ -304,8 +328,10 @@ The test suite can be found here:

https://github.com/jthornber/device-mapper-test-suite

dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
/dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
/dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
mq 4 sequential_threshold 1024 random_threshold 8'
::

dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
/dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
/dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
mq 4 sequential_threshold 1024 random_threshold 8'
Original file line number Diff line number Diff line change
@@ -1,10 +1,12 @@
========
dm-delay
========

Device-Mapper's "delay" target delays reads and/or writes
and maps them to different devices.

Parameters:
Parameters::

<device> <offset> <delay> [<write_device> <write_offset> <write_delay>
[<flush_device> <flush_offset> <flush_delay>]]

Expand All @@ -14,15 +16,16 @@ Delays are specified in milliseconds.

Example scripts
===============
[[
#!/bin/sh
# Create device delaying rw operation for 500ms
echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
]]

[[
#!/bin/sh
# Create device delaying only write operation for 500ms and
# splitting reads and writes to different devices $1 $2
echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
]]

::

#!/bin/sh
# Create device delaying rw operation for 500ms
echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed

::

#!/bin/sh
# Create device delaying only write operation for 500ms and
# splitting reads and writes to different devices $1 $2
echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
Loading

0 comments on commit f0ba437

Please sign in to comment.