Btrfs: rework outstanding_extents
Right now we jump through a lot of weird hoops around outstanding_extents
in order to keep the extent count consistent.  This is because we
logically transfer the outstanding_extents count from the initial
reservation through set_delalloc_bits.  This makes it pretty difficult
to get a handle on how and when we need to mess with outstanding_extents.

Fix this by revamping the rules of how we deal with outstanding_extents.
Now instead everybody that is holding on to a delalloc extent is
required to increase the outstanding_extents count for itself.  This
means we'll have something like this:

btrfs_delalloc_reserve_metadata	- outstanding_extents = 1
 btrfs_set_extent_delalloc	- outstanding_extents = 2
btrfs_delalloc_release_extents	- outstanding_extents = 1

for an initial file write.  Now take the append write, where we extend
an existing delalloc range while still staying under the maximum extent
size:

btrfs_delalloc_reserve_metadata - outstanding_extents = 2
  btrfs_set_extent_delalloc
    btrfs_set_bit_hook		- outstanding_extents = 3
    btrfs_merge_extent_hook	- outstanding_extents = 2
btrfs_delalloc_release_extents	- outstanding_extents = 1

In order to make the ordered extent transition we of course must now
make ordered extents carry their own outstanding_extents reservation,
so for cow_file_range we end up with:

btrfs_add_ordered_extent	- outstanding_extents = 2
clear_extent_bit		- outstanding_extents = 1
btrfs_remove_ordered_extent	- outstanding_extents = 0

This makes all manipulations of outstanding_extents much more explicit.
Every successful call to btrfs_delalloc_reserve_metadata _must_ now be
paired with a call to btrfs_delalloc_release_extents, even in the error
case, as that is the only function that drops the outstanding_extents
count the reservation took.

The drawback to this is that we are now much more likely to have
transient cases where outstanding_extents is larger than it actually
should be.  This could happen before, as we manipulated the delalloc
bits, but now it happens on basically every write.  This may put more
pressure on the ENOSPC flushing code, but I think making this code
simpler is worth the cost.  I have another change coming to mitigate
this side effect somewhat.

I also added tracepoints for the counter manipulation.  These were used
by a bpf script I wrote to help track down leak issues.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik authored and David Sterba committed Nov 1, 2017
1 parent b115e3b commit 8b62f87
Showing 10 changed files with 185 additions and 156 deletions.
17 changes: 17 additions & 0 deletions fs/btrfs/btrfs_inode.h
@@ -267,6 +267,23 @@ static inline bool btrfs_is_free_space_inode(struct btrfs_inode *inode)
return false;
}

static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode,
int mod)
{
lockdep_assert_held(&inode->lock);
inode->outstanding_extents += mod;
if (btrfs_is_free_space_inode(inode))
return;
}

static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode, int mod)
{
lockdep_assert_held(&inode->lock);
inode->reserved_extents += mod;
if (btrfs_is_free_space_inode(inode))
return;
}

static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
{
int ret = 0;
2 changes: 2 additions & 0 deletions fs/btrfs/ctree.h
@@ -2748,6 +2748,8 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
u64 *qgroup_reserved, bool use_global_rsv);
void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *rsv);
void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes);

int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes);
void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes);
int btrfs_delalloc_reserve_space(struct inode *inode,
139 changes: 92 additions & 47 deletions fs/btrfs/extent-tree.c
@@ -5954,42 +5954,31 @@ void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
}

/**
* drop_outstanding_extent - drop an outstanding extent
* drop_over_reserved_extents - drop our extra extent reservations
* @inode: the inode we're dropping the extent for
* @num_bytes: the number of bytes we're releasing.
*
* This is called when we are freeing up an outstanding extent, either called
* after an error or after an extent is written. This will return the number of
* reserved extents that need to be freed. This must be called with
* BTRFS_I(inode)->lock held.
* We reserve extents we may use, but they may have been merged with other
* extents and we may not need the extra reservation.
*
* We also call this when we've completed io to an extent or had an error and
* cleared the outstanding extent, in either case we no longer need our
* reservation and can drop the excess.
*/
static unsigned drop_outstanding_extent(struct btrfs_inode *inode,
u64 num_bytes)
static unsigned drop_over_reserved_extents(struct btrfs_inode *inode)
{
unsigned drop_inode_space = 0;
unsigned dropped_extents = 0;
unsigned num_extents;
unsigned num_extents = 0;

num_extents = count_max_extents(num_bytes);
ASSERT(num_extents);
ASSERT(inode->outstanding_extents >= num_extents);
inode->outstanding_extents -= num_extents;
if (inode->reserved_extents > inode->outstanding_extents) {
num_extents = inode->reserved_extents -
inode->outstanding_extents;
btrfs_mod_reserved_extents(inode, -num_extents);
}

if (inode->outstanding_extents == 0 &&
test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED,
&inode->runtime_flags))
drop_inode_space = 1;

/*
* If we have more or the same amount of outstanding extents than we have
* reserved then we need to leave the reserved extents count alone.
*/
if (inode->outstanding_extents >= inode->reserved_extents)
return drop_inode_space;

dropped_extents = inode->reserved_extents - inode->outstanding_extents;
inode->reserved_extents -= dropped_extents;
return dropped_extents + drop_inode_space;
num_extents++;
return num_extents;
}

/**
@@ -6044,13 +6033,15 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
struct btrfs_block_rsv *block_rsv = &fs_info->delalloc_block_rsv;
u64 to_reserve = 0;
u64 csum_bytes;
unsigned nr_extents;
unsigned nr_extents, reserve_extents;
enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_FLUSH_ALL;
int ret = 0;
bool delalloc_lock = true;
u64 to_free = 0;
unsigned dropped;
bool release_extra = false;
bool underflow = false;
bool did_retry = false;

/* If we are a free space inode we need to not flush since we will be in
* the middle of a transaction commit. We also don't need the delalloc
@@ -6075,18 +6066,31 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
mutex_lock(&inode->delalloc_mutex);

num_bytes = ALIGN(num_bytes, fs_info->sectorsize);

retry:
spin_lock(&inode->lock);
nr_extents = count_max_extents(num_bytes);
inode->outstanding_extents += nr_extents;
reserve_extents = nr_extents = count_max_extents(num_bytes);
btrfs_mod_outstanding_extents(inode, nr_extents);

nr_extents = 0;
if (inode->outstanding_extents > inode->reserved_extents)
nr_extents += inode->outstanding_extents -
/*
* Because we add an outstanding extent for ordered before we clear
* delalloc we will double count our outstanding extents slightly. This
* could mean that we transiently over-reserve, which could result in an
* early ENOSPC if our timing is unlucky. Keep track of the case that
* we had a reservation underflow so we can retry if we fail.
*
* Keep in mind we can legitimately have more outstanding extents than
* reserved because of fragmentation, so only allow a retry once.
*/
if (inode->outstanding_extents >
inode->reserved_extents + nr_extents) {
reserve_extents = inode->outstanding_extents -
inode->reserved_extents;
underflow = true;
}

/* We always want to reserve a slot for updating the inode. */
to_reserve = btrfs_calc_trans_metadata_size(fs_info, nr_extents + 1);
to_reserve = btrfs_calc_trans_metadata_size(fs_info,
reserve_extents + 1);
to_reserve += calc_csum_metadata_size(inode, num_bytes, 1);
csum_bytes = inode->csum_bytes;
spin_unlock(&inode->lock);
@@ -6111,7 +6115,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
to_reserve -= btrfs_calc_trans_metadata_size(fs_info, 1);
release_extra = true;
}
inode->reserved_extents += nr_extents;
btrfs_mod_reserved_extents(inode, reserve_extents);
spin_unlock(&inode->lock);

if (delalloc_lock)
@@ -6127,7 +6131,10 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)

out_fail:
spin_lock(&inode->lock);
dropped = drop_outstanding_extent(inode, num_bytes);
nr_extents = count_max_extents(num_bytes);
btrfs_mod_outstanding_extents(inode, -nr_extents);

dropped = drop_over_reserved_extents(inode);
/*
* If the inodes csum_bytes is the same as the original
* csum_bytes then we know we haven't raced with any free()ers
@@ -6184,19 +6191,24 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
trace_btrfs_space_reservation(fs_info, "delalloc",
btrfs_ino(inode), to_free, 0);
}
if (underflow && !did_retry) {
did_retry = true;
underflow = false;
goto retry;
}
if (delalloc_lock)
mutex_unlock(&inode->delalloc_mutex);
return ret;
}

/**
* btrfs_delalloc_release_metadata - release a metadata reservation for an inode
* @inode: the inode to release the reservation for
* @num_bytes: the number of bytes we're releasing
* @inode: the inode to release the reservation for.
* @num_bytes: the number of bytes we are releasing.
*
* This will release the metadata reservation for an inode. This can be called
* once we complete IO for a given set of bytes to release their metadata
* reservations.
* reservations, or on error for the same reason.
*/
void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
{
@@ -6206,8 +6218,7 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)

num_bytes = ALIGN(num_bytes, fs_info->sectorsize);
spin_lock(&inode->lock);
dropped = drop_outstanding_extent(inode, num_bytes);

dropped = drop_over_reserved_extents(inode);
if (num_bytes)
to_free = calc_csum_metadata_size(inode, num_bytes, 0);
spin_unlock(&inode->lock);
@@ -6223,6 +6234,42 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
btrfs_block_rsv_release(fs_info, &fs_info->delalloc_block_rsv, to_free);
}

/**
* btrfs_delalloc_release_extents - release our outstanding_extents
* @inode: the inode to balance the reservation for.
* @num_bytes: the number of bytes we originally reserved with
*
* When we reserve space we increase outstanding_extents for the extents we may
* add. Once we've set the range as delalloc or created our ordered extents we
* have outstanding_extents to track the real usage, so we use this to free our
* temporarily tracked outstanding_extents. This _must_ be used in conjunction
* with btrfs_delalloc_reserve_metadata.
*/
void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
{
struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
unsigned num_extents;
u64 to_free;
unsigned dropped;

spin_lock(&inode->lock);
num_extents = count_max_extents(num_bytes);
btrfs_mod_outstanding_extents(inode, -num_extents);
dropped = drop_over_reserved_extents(inode);
spin_unlock(&inode->lock);

if (!dropped)
return;

if (btrfs_is_testing(fs_info))
return;

to_free = btrfs_calc_trans_metadata_size(fs_info, dropped);
trace_btrfs_space_reservation(fs_info, "delalloc", btrfs_ino(inode),
to_free, 0);
btrfs_block_rsv_release(fs_info, &fs_info->delalloc_block_rsv, to_free);
}

/**
* btrfs_delalloc_reserve_space - reserve data and metadata space for
* delalloc
@@ -6267,18 +6314,16 @@ int btrfs_delalloc_reserve_space(struct inode *inode,
* @inode: inode we're releasing space for
* @start: start position of the space already reserved
* @len: the len of the space already reserved
*
* This must be matched with a call to btrfs_delalloc_reserve_space. This is
* called in the case that we don't need the metadata AND data reservations
* anymore. So if there is an error or we insert an inline extent.
* @release_bytes: the len of the space we consumed or didn't use
*
* This function will release the metadata space that was not used and will
* decrement ->delalloc_bytes and remove it from the fs_info delalloc_inodes
* list if there are no delalloc bytes left.
* Also it will handle the qgroup reserved space.
*/
void btrfs_delalloc_release_space(struct inode *inode,
struct extent_changeset *reserved, u64 start, u64 len)
struct extent_changeset *reserved,
u64 start, u64 len)
{
btrfs_delalloc_release_metadata(BTRFS_I(inode), len);
btrfs_free_reserved_data_space(inode, reserved, start, len);
22 changes: 8 additions & 14 deletions fs/btrfs/file.c
@@ -1656,6 +1656,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
}
}

WARN_ON(reserve_bytes == 0);
ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
reserve_bytes);
if (ret) {
@@ -1678,8 +1679,11 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
ret = prepare_pages(inode, pages, num_pages,
pos, write_bytes,
force_page_uptodate);
if (ret)
if (ret) {
btrfs_delalloc_release_extents(BTRFS_I(inode),
reserve_bytes);
break;
}

extents_locked = lock_and_cleanup_extent_if_need(
BTRFS_I(inode), pages,
@@ -1688,6 +1692,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
if (extents_locked < 0) {
if (extents_locked == -EAGAIN)
goto again;
btrfs_delalloc_release_extents(BTRFS_I(inode),
reserve_bytes);
ret = extents_locked;
break;
}
@@ -1716,23 +1722,10 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
PAGE_SIZE);
}

/*
* If we had a short copy we need to release the excess delaloc
* bytes we reserved. We need to increment outstanding_extents
* because btrfs_delalloc_release_space and
* btrfs_delalloc_release_metadata will decrement it, but
* we still have an outstanding extent for the chunk we actually
* managed to copy.
*/
if (num_sectors > dirty_sectors) {
/* release everything except the sectors we dirtied */
release_bytes -= dirty_sectors <<
fs_info->sb->s_blocksize_bits;
if (copied > 0) {
spin_lock(&BTRFS_I(inode)->lock);
BTRFS_I(inode)->outstanding_extents++;
spin_unlock(&BTRFS_I(inode)->lock);
}
if (only_release_metadata) {
btrfs_delalloc_release_metadata(BTRFS_I(inode),
release_bytes);
@@ -1758,6 +1751,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
unlock_extent_cached(&BTRFS_I(inode)->io_tree,
lockstart, lockend, &cached_state,
GFP_NOFS);
btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
if (ret) {
btrfs_drop_pages(pages, num_pages);
break;
3 changes: 2 additions & 1 deletion fs/btrfs/inode-map.c
@@ -500,11 +500,12 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
prealloc, prealloc, &alloc_hint);
if (ret) {
btrfs_delalloc_release_metadata(BTRFS_I(inode), prealloc);
btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc);
goto out_put;
}

ret = btrfs_write_out_ino_cache(root, trans, path, inode);
btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc);
out_put:
iput(inode);
out_release:
