Merge tag 'dm-3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device-mapper changes from Mike Snitzer:
 "A lot of attention was paid to improving the thin-provisioning
  target's handling of metadata operation failures and running out of
  space.  A new 'error_if_no_space' feature was added to allow users to
  error IOs rather than queue them when either the data or metadata
  space is exhausted.

  Additional fixes/features include:
   - a few fixes to properly support thin metadata device resizing
   - a solution for reliably waiting for a DM device's embedded kobject
     to be released before destroying the device
   - old dm-snapshot is updated to use the dm-bufio interface to take
     advantage of readahead capabilities that improve snapshot
     activation
   - new dm-cache target tunables to control how quickly data is
     promoted to the cache (fast) device
   - improved write efficiency of cluster mirror target by combining
     userspace flush and mark requests"

* tag 'dm-3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits)
  dm log userspace: allow mark requests to piggyback on flush requests
  dm space map metadata: fix bug in resizing of thin metadata
  dm cache: add policy name to status output
  dm thin: fix pool feature parsing
  dm sysfs: fix a module unload race
  dm snapshot: use dm-bufio prefetch
  dm snapshot: use dm-bufio
  dm snapshot: prepare for switch to using dm-bufio
  dm snapshot: use GFP_KERNEL when initializing exceptions
  dm cache: add block sizes and total cache blocks to status output
  dm btree: add dm_btree_find_lowest_key
  dm space map metadata: fix extending the space map
  dm space map common: make sure new space is used during extend
  dm: wait until embedded kobject is released before destroying a device
  dm: remove pointless kobject comparison in dm_get_from_kobject
  dm snapshot: call destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
  dm cache policy mq: introduce three promotion threshold tunables
  dm cache policy mq: use list_del_init instead of list_del + INIT_LIST_HEAD
  dm thin: fix set_pool_mode exposed pool operation races
  dm thin: eliminate the no_free_space flag
  ...
Linus Torvalds committed Jan 23, 2014
2 parents 194e57f + 5066a4d commit fe41c2c
Showing 29 changed files with 767 additions and 321 deletions.
16 changes: 14 additions & 2 deletions Documentation/device-mapper/cache-policies.txt
@@ -40,8 +40,11 @@ on hit count on entry. The policy aims to take different cache miss
costs into account and to adjust to varying load patterns automatically.

Message and constructor argument pairs are:
'sequential_threshold <#nr_sequential_ios>' and
'random_threshold <#nr_random_ios>'.
'sequential_threshold <#nr_sequential_ios>'
'random_threshold <#nr_random_ios>'
'read_promote_adjustment <value>'
'write_promote_adjustment <value>'
'discard_promote_adjustment <value>'

The sequential threshold indicates the number of contiguous I/Os
required before a stream is treated as sequential. The random threshold
@@ -55,6 +58,15 @@ since spindles tend to have good bandwidth. The io_tracker counts
contiguous I/Os to try to spot when the io is in one of these sequential
modes.

Internally the mq policy maintains a promotion threshold variable. If
the hit count of a block not in the cache goes above this threshold it
gets promoted to the cache. The read, write and discard promote adjustment
tunables allow you to tweak the promotion threshold by adding a small
value based on the io type. They default to 4, 8 and 1 respectively.
If you're trying to quickly warm a new cache device you may wish to
reduce these to encourage promotion. Remember to switch them back to
their defaults after the cache fills though.
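
The adjustment rule described above can be sketched in userspace C. This is a simplified model built only from the documentation text, not the mq policy's actual code: the type and function names (`promote_tunables`, `adjusted_threshold`, `should_promote`) and the base threshold passed in by the caller are hypothetical, and the strict "greater than" comparison is an assumption taken from the phrase "goes above this threshold".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* I/O types that the three adjustment tunables distinguish. */
enum io_type { IO_READ, IO_WRITE, IO_DISCARD };

/* Hypothetical container for the tunables; not the kernel's struct. */
struct promote_tunables {
	unsigned read_promote_adjustment;
	unsigned write_promote_adjustment;
	unsigned discard_promote_adjustment;
};

/* Defaults quoted in the documentation: 4, 8 and 1 respectively. */
static const struct promote_tunables default_tunables = { 4, 8, 1 };

/* Return the base threshold plus the adjustment for this I/O type. */
static unsigned adjusted_threshold(const struct promote_tunables *t,
				   unsigned base_threshold, enum io_type type)
{
	assert(t != NULL);

	switch (type) {
	case IO_READ:
		return base_threshold + t->read_promote_adjustment;
	case IO_WRITE:
		return base_threshold + t->write_promote_adjustment;
	case IO_DISCARD:
		return base_threshold + t->discard_promote_adjustment;
	}
	return base_threshold;
}

/* Assumed rule: promote once the hit count goes above the threshold. */
static bool should_promote(const struct promote_tunables *t,
			   unsigned base_threshold, enum io_type type,
			   unsigned hit_count)
{
	return hit_count > adjusted_threshold(t, base_threshold, type);
}
```

With all three adjustments set to 0, a block is promoted as soon as its hit count exceeds the base threshold, which is the cache-warming case the text mentions.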

cleaner
-------

51 changes: 29 additions & 22 deletions Documentation/device-mapper/cache.txt
@@ -217,36 +217,43 @@ the characteristics of a specific policy, always request it by name.
Status
------

<#used metadata blocks>/<#total metadata blocks> <#read hits> <#read misses>
<#write hits> <#write misses> <#demotions> <#promotions> <#blocks in cache>
<#dirty> <#features> <features>* <#core args> <core args>* <#policy args>
<policy args>*

#used metadata blocks : Number of metadata blocks used
#total metadata blocks : Total number of metadata blocks
#read hits : Number of times a READ bio has been mapped
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
<cache block size> <#used cache blocks>/<#total cache blocks>
<#read hits> <#read misses> <#write hits> <#write misses>
<#demotions> <#promotions> <#dirty> <#features> <features>*
<#core args> <core args>* <policy name> <#policy args> <policy args>*

metadata block size : Fixed block size for each metadata block in
sectors
#used metadata blocks : Number of metadata blocks used
#total metadata blocks : Total number of metadata blocks
cache block size : Configurable block size for the cache device
in sectors
#used cache blocks : Number of blocks resident in the cache
#total cache blocks : Total number of cache blocks
#read hits : Number of times a READ bio has been mapped
to the cache
#read misses : Number of times a READ bio has been mapped
#read misses : Number of times a READ bio has been mapped
to the origin
#write hits : Number of times a WRITE bio has been mapped
#write hits : Number of times a WRITE bio has been mapped
to the cache
#write misses : Number of times a WRITE bio has been
#write misses : Number of times a WRITE bio has been
mapped to the origin
#demotions : Number of times a block has been removed
#demotions : Number of times a block has been removed
from the cache
#promotions : Number of times a block has been moved to
#promotions : Number of times a block has been moved to
the cache
#blocks in cache : Number of blocks resident in the cache
#dirty : Number of blocks in the cache that differ
#dirty : Number of blocks in the cache that differ
from the origin
#feature args : Number of feature args to follow
feature args : 'writethrough' (optional)
#core args : Number of core arguments (must be even)
core args : Key/value pairs for tuning the core
#feature args : Number of feature args to follow
feature args : 'writethrough' (optional)
#core args : Number of core arguments (must be even)
core args : Key/value pairs for tuning the core
e.g. migration_threshold
#policy args : Number of policy arguments to follow (must be even)
policy args : Key/value pairs
e.g. 'sequential_threshold 1024
policy name : Name of the policy
#policy args : Number of policy arguments to follow (must be even)
policy args : Key/value pairs
e.g. sequential_threshold
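
As a rough illustration of the new status layout, the leading fixed fields can be read with a single `sscanf` call. This is a userspace sketch under assumptions: `struct cache_status` and `parse_cache_status` are hypothetical names, the input is expected to be only the target-specific part of the status line, and the variable-length feature, core and policy arguments that follow `<#features>` are left unparsed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical holder for the fixed-position fields of the new format. */
struct cache_status {
	unsigned long long metadata_block_size;
	unsigned long long used_metadata_blocks, total_metadata_blocks;
	unsigned long long cache_block_size;
	unsigned long long used_cache_blocks, total_cache_blocks;
	unsigned long long read_hits, read_misses;
	unsigned long long write_hits, write_misses;
	unsigned long long demotions, promotions, dirty;
	unsigned nr_features;
};

/*
 * Parse the fields up to and including <#features>.
 * Returns 0 on success, -1 if the line does not match the layout.
 */
static int parse_cache_status(const char *line, struct cache_status *s)
{
	int n;

	assert(line != NULL && s != NULL);

	n = sscanf(line,
		   "%llu %llu/%llu %llu %llu/%llu "
		   "%llu %llu %llu %llu %llu %llu %llu %u",
		   &s->metadata_block_size,
		   &s->used_metadata_blocks, &s->total_metadata_blocks,
		   &s->cache_block_size,
		   &s->used_cache_blocks, &s->total_cache_blocks,
		   &s->read_hits, &s->read_misses,
		   &s->write_hits, &s->write_misses,
		   &s->demotions, &s->promotions, &s->dirty,
		   &s->nr_features);

	return n == 14 ? 0 : -1;
}
```

A line in the old layout, which starts with `<#used metadata blocks>/<#total metadata blocks>`, fails this parse at the second field, so a tool can use the return value to tell the two layouts apart.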

Messages
--------
7 changes: 7 additions & 0 deletions Documentation/device-mapper/thin-provisioning.txt
@@ -235,6 +235,8 @@ i) Constructor
read_only: Don't allow any changes to be made to the pool
metadata.

error_if_no_space: Error IOs, instead of queueing, if no space.

Data block size must be between 64KB (128 sectors) and 1GB
(2097152 sectors) inclusive.

@@ -276,6 +278,11 @@ ii) Status
contain the string 'Fail'. The userspace recovery tools
should then be used.

error_if_no_space|queue_if_no_space
If the pool runs out of data or metadata space, the pool will
either queue or error the IO destined to the data device. The
default is to queue the IO until more space is added.
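
The difference between the two modes can be modelled with a toy pool in userspace C. This is only a sketch of the behaviour described above, not dm-thin code: `model_pool`, `model_pool_write` and `model_pool_add_space` are hypothetical, one write consumes one block, and the result is a plain enum because the documentation does not say which error code an errored IO carries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of one write in the model. */
enum io_result { IO_MAPPED, IO_QUEUED, IO_ERRORED };

/* Toy pool: a free-block counter, a queue length and the feature flag. */
struct model_pool {
	unsigned free_blocks;
	unsigned queued_ios;
	bool error_if_no_space;
};

/* Map the write if space is left; otherwise error or queue it. */
static enum io_result model_pool_write(struct model_pool *p)
{
	assert(p != NULL);

	if (p->free_blocks > 0) {
		p->free_blocks--;
		return IO_MAPPED;
	}
	if (p->error_if_no_space)
		return IO_ERRORED;

	p->queued_ios++;	/* default: hold the IO until space is added */
	return IO_QUEUED;
}

/* Add space and service queued IOs; returns how many were serviced. */
static unsigned model_pool_add_space(struct model_pool *p, unsigned blocks)
{
	unsigned serviced = 0;

	assert(p != NULL);

	p->free_blocks += blocks;
	while (p->queued_ios > 0 && p->free_blocks > 0) {
		p->queued_ios--;
		p->free_blocks--;
		serviced++;
	}
	return serviced;
}
```

In the queueing mode a write issued against a full pool completes only after `model_pool_add_space` runs, whereas with `error_if_no_space` set the caller sees the failure immediately and nothing is held back.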

iii) Messages

create_thin <dev id>
11 changes: 8 additions & 3 deletions drivers/md/Kconfig
@@ -176,8 +176,12 @@ config MD_FAULTY

source "drivers/md/bcache/Kconfig"

config BLK_DEV_DM_BUILTIN
boolean

config BLK_DEV_DM
tristate "Device mapper support"
select BLK_DEV_DM_BUILTIN
---help---
Device-mapper is a low level volume manager. It works by allowing
people to specify mappings for ranges of logical sectors. Various
@@ -238,6 +242,7 @@ config DM_CRYPT
config DM_SNAPSHOT
tristate "Snapshot target"
depends on BLK_DEV_DM
select DM_BUFIO
---help---
Allow volume managers to take writable snapshots of a device.

@@ -250,12 +255,12 @@ config DM_THIN_PROVISIONING
Provides thin provisioning and snapshots that share a data store.

config DM_DEBUG_BLOCK_STACK_TRACING
boolean "Keep stack trace of thin provisioning block lock holders"
depends on STACKTRACE_SUPPORT && DM_THIN_PROVISIONING
boolean "Keep stack trace of persistent data block lock holders"
depends on STACKTRACE_SUPPORT && DM_PERSISTENT_DATA
select STACKTRACE
---help---
Enable this for messages that may help debug problems with the
block manager locking used by thin provisioning.
block manager locking used by thin provisioning and caching.

If unsure, say N.

1 change: 1 addition & 0 deletions drivers/md/Makefile
@@ -32,6 +32,7 @@ obj-$(CONFIG_MD_FAULTY) += faulty.o
obj-$(CONFIG_BCACHE) += bcache/
obj-$(CONFIG_BLK_DEV_MD) += md-mod.o
obj-$(CONFIG_BLK_DEV_DM) += dm-mod.o
obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
obj-$(CONFIG_DM_BIO_PRISON) += dm-bio-prison.o
obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
36 changes: 34 additions & 2 deletions drivers/md/dm-bufio.c
@@ -104,6 +104,8 @@ struct dm_bufio_client {
struct list_head reserved_buffers;
unsigned need_reserved_buffers;

unsigned minimum_buffers;

struct hlist_head *cache_hash;
wait_queue_head_t free_buffer_wait;

@@ -861,8 +863,8 @@ static void __get_memory_limit(struct dm_bufio_client *c,
buffers = dm_bufio_cache_size_per_client >>
(c->sectors_per_block_bits + SECTOR_SHIFT);

if (buffers < DM_BUFIO_MIN_BUFFERS)
buffers = DM_BUFIO_MIN_BUFFERS;
if (buffers < c->minimum_buffers)
buffers = c->minimum_buffers;

*limit_buffers = buffers;
*threshold_buffers = buffers * DM_BUFIO_WRITEBACK_PERCENT / 100;
@@ -1350,6 +1352,34 @@ void dm_bufio_release_move(struct dm_buffer *b, sector_t new_block)
}
EXPORT_SYMBOL_GPL(dm_bufio_release_move);

/*
* Free the given buffer.
*
* This is just a hint, if the buffer is in use or dirty, this function
* does nothing.
*/
void dm_bufio_forget(struct dm_bufio_client *c, sector_t block)
{
struct dm_buffer *b;

dm_bufio_lock(c);

b = __find(c, block);
if (b && likely(!b->hold_count) && likely(!b->state)) {
__unlink_buffer(b);
__free_buffer_wake(b);
}

dm_bufio_unlock(c);
}
EXPORT_SYMBOL(dm_bufio_forget);

void dm_bufio_set_minimum_buffers(struct dm_bufio_client *c, unsigned n)
{
c->minimum_buffers = n;
}
EXPORT_SYMBOL(dm_bufio_set_minimum_buffers);

unsigned dm_bufio_get_block_size(struct dm_bufio_client *c)
{
return c->block_size;
@@ -1546,6 +1576,8 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
INIT_LIST_HEAD(&c->reserved_buffers);
c->need_reserved_buffers = reserved_buffers;

c->minimum_buffers = DM_BUFIO_MIN_BUFFERS;

init_waitqueue_head(&c->free_buffer_wait);
c->async_write_error = 0;

12 changes: 12 additions & 0 deletions drivers/md/dm-bufio.h
@@ -108,6 +108,18 @@ int dm_bufio_issue_flush(struct dm_bufio_client *c);
*/
void dm_bufio_release_move(struct dm_buffer *b, sector_t new_block);

/*
* Free the given buffer.
* This is just a hint, if the buffer is in use or dirty, this function
* does nothing.
*/
void dm_bufio_forget(struct dm_bufio_client *c, sector_t block);

/*
* Set the minimum number of buffers before cleanup happens.
*/
void dm_bufio_set_minimum_buffers(struct dm_bufio_client *c, unsigned n);

unsigned dm_bufio_get_block_size(struct dm_bufio_client *c);
sector_t dm_bufio_get_device_size(struct dm_bufio_client *c);
sector_t dm_bufio_get_block_number(struct dm_buffer *b);
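
The effect of `dm_bufio_set_minimum_buffers()` on the limit computed in `__get_memory_limit()` can be modelled outside the kernel. The sketch below mirrors the arithmetic in the dm-bufio.c hunk: shift the per-client cache size down to a buffer count, clamp it to the client's minimum, then take a percentage as the writeback threshold. The names `buffer_limits` and `model_get_memory_limit` are hypothetical, and the writeback percentage of 75 is an assumption, since the diff does not show the value of `DM_BUFIO_WRITEBACK_PERCENT`.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SECTOR_SHIFT	9	/* 512-byte sectors */
#define MODEL_WRITEBACK_PERCENT	75	/* assumed value, not shown in the diff */

/* Hypothetical result pair standing in for the two output pointers. */
struct buffer_limits {
	unsigned long limit_buffers;
	unsigned long threshold_buffers;
};

/* Userspace model of the per-client buffer limit calculation. */
static void model_get_memory_limit(unsigned long cache_size_per_client,
				   unsigned sectors_per_block_bits,
				   unsigned long minimum_buffers,
				   struct buffer_limits *out)
{
	unsigned long buffers;

	assert(out != NULL);

	/* Bytes available per client divided by the block size in bytes. */
	buffers = cache_size_per_client >>
		  (sectors_per_block_bits + MODEL_SECTOR_SHIFT);

	/* Never go below the minimum configured for this client. */
	if (buffers < minimum_buffers)
		buffers = minimum_buffers;

	out->limit_buffers = buffers;
	out->threshold_buffers = buffers * MODEL_WRITEBACK_PERCENT / 100;
}
```

Raising the minimum only matters when the cache size divided by the block size falls below it, so a client with large blocks and a small cache budget is the case where the new setter changes the outcome.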
48 changes: 48 additions & 0 deletions drivers/md/dm-builtin.c
@@ -0,0 +1,48 @@
#include "dm.h"

/*
* The kobject release method must not be placed in the module itself,
* otherwise we are subject to module unload races.
*
* The release method is called when the last reference to the kobject is
* dropped. It may be called by any other kernel code that drops the last
* reference.
*
* The release method suffers from module unload race. We may prevent the
* module from being unloaded at the start of the release method (using
* increased module reference count or synchronizing against the release
* method), however there is no way to prevent the module from being
* unloaded at the end of the release method.
*
* If this code were placed in the dm module, the following race may
* happen:
* 1. Some other process takes a reference to dm kobject
* 2. The user issues ioctl function to unload the dm device
* 3. dm_sysfs_exit calls kobject_put, however the object is not released
* because of the other reference taken at step 1
* 4. dm_sysfs_exit waits on the completion
* 5. The other process that took the reference in step 1 drops it,
* dm_kobject_release is called from this process
* 6. dm_kobject_release calls complete()
* 7. a reschedule happens before dm_kobject_release returns
* 8. dm_sysfs_exit continues, the dm device is unloaded, module reference
* count is decremented
* 9. The user unloads the dm module
* 10. The other process that was rescheduled in step 7 continues to run,
* it is now executing code in unloaded module, so it crashes
*
* Note that if the process that takes the foreign reference to dm kobject
* has a low priority and the system is sufficiently loaded with
* higher-priority processes that prevent the low-priority process from
* being scheduled long enough, this bug may really happen.
*
* In order to fix this module unload race, we place the release method
* into a helper code that is compiled directly into the kernel.
*/

void dm_kobject_release(struct kobject *kobj)
{
complete(dm_get_completion_from_kobject(kobj));
}

EXPORT_SYMBOL(dm_kobject_release);
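
Steps 1 to 6 of the scenario in the comment above can be traced with a single-threaded userspace model. It is a sketch only: `model_kobject`, `model_device` and the helper functions are hypothetical stand-ins, and a boolean flag replaces the completion that the real release method signals. The model shows that the release method runs in whichever context drops the last reference. It cannot reproduce the module unload race itself, which depends on scheduling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal reference-counted object with a release callback. */
struct model_kobject {
	int refcount;
	void (*release)(struct model_kobject *kobj);
};

/* Device embedding the kobject; 'released' stands in for a completion. */
struct model_device {
	struct model_kobject kobj;
	bool released;
};

/* Release method: recover the device and signal that it may be freed. */
static void model_kobject_release(struct model_kobject *kobj)
{
	struct model_device *dev = (struct model_device *)
		((char *)kobj - offsetof(struct model_device, kobj));

	dev->released = true;
}

static void model_device_init(struct model_device *dev)
{
	dev->kobj.refcount = 1;		/* reference held by the device */
	dev->kobj.release = model_kobject_release;
	dev->released = false;
}

static void model_kobject_get(struct model_kobject *kobj)
{
	kobj->refcount++;
}

/* Drop a reference; the last put calls release in the caller's context. */
static void model_kobject_put(struct model_kobject *kobj)
{
	assert(kobj->refcount > 0);
	if (--kobj->refcount == 0)
		kobj->release(kobj);
}
```

If a foreign reference is taken first, the put issued during device teardown leaves `released` false, and it becomes true only when the foreign holder calls `model_kobject_put`, which is why the teardown path has to wait.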