Skip to content

Commit

Permalink
btrfs: re-check reclaim condition in reclaim worker
Browse files Browse the repository at this point in the history
I have observed the following case play out and lead to unnecessary
relocations:

1. write a file across multiple block groups
2. delete the file
3. several block groups fall below the reclaim threshold
4. reclaim the first, moving extents into the others
5. reclaim the others which are now actually very full, leading to poor
   reclaim behavior with lots of writing, allocating new block groups,
   etc.

I believe the risk of missing some reasonable reclaims is worth it
when traded off against the savings of avoiding overfull reclaims.

Going forward, it could be interesting to make the check more advanced
(zoned aware, fragmentation aware, etc...) so that it can be a really
strong signal both at extent delete and reclaim time.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
  • Loading branch information
Boris Burkov authored and David Sterba committed Dec 5, 2022
1 parent cc4804b commit 8153122
Showing 1 changed file with 40 additions and 25 deletions.
65 changes: 40 additions & 25 deletions fs/btrfs/block-group.c
Original file line number Diff line number Diff line change
Expand Up @@ -1539,6 +1539,30 @@ static inline bool btrfs_should_reclaim(struct btrfs_fs_info *fs_info)
return true;
}

static bool should_reclaim_block_group(struct btrfs_block_group *bg, u64 bytes_freed)
{
const struct btrfs_space_info *space_info = bg->space_info;
const int reclaim_thresh = READ_ONCE(space_info->bg_reclaim_threshold);
const u64 new_val = bg->used;
const u64 old_val = new_val + bytes_freed;
u64 thresh;

if (reclaim_thresh == 0)
return false;

thresh = div_factor_fine(bg->length, reclaim_thresh);

/*
* If we were below the threshold before don't reclaim, we are likely a
* brand new block group and we don't want to relocate new block groups.
*/
if (old_val < thresh)
return false;
if (new_val >= thresh)
return false;
return true;
}

void btrfs_reclaim_bgs_work(struct work_struct *work)
{
struct btrfs_fs_info *fs_info =
Expand Down Expand Up @@ -1623,6 +1647,22 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
spin_unlock(&bg->lock);
up_write(&space_info->groups_sem);
goto next;

}
/*
* The block group might no longer meet the reclaim condition by
* the time we get around to reclaiming it, so to avoid
* reclaiming overly full block_groups, skip reclaiming them.
*
* Since the decision making process also depends on the amount
* being freed, pass in a fake giant value to skip that extra
* check, which is more meaningful when adding to the list in
* the first place.
*/
if (!should_reclaim_block_group(bg, bg->length)) {
spin_unlock(&bg->lock);
up_write(&space_info->groups_sem);
goto next;
}
spin_unlock(&bg->lock);

Expand Down Expand Up @@ -3241,31 +3281,6 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
return ret;
}

static inline bool should_reclaim_block_group(struct btrfs_block_group *bg,
u64 bytes_freed)
{
const struct btrfs_space_info *space_info = bg->space_info;
const int reclaim_thresh = READ_ONCE(space_info->bg_reclaim_threshold);
const u64 new_val = bg->used;
const u64 old_val = new_val + bytes_freed;
u64 thresh;

if (reclaim_thresh == 0)
return false;

thresh = div_factor_fine(bg->length, reclaim_thresh);

/*
* If we were below the threshold before don't reclaim, we are likely a
* brand new block group and we don't want to relocate new block groups.
*/
if (old_val < thresh)
return false;
if (new_val >= thresh)
return false;
return true;
}

int btrfs_update_block_group(struct btrfs_trans_handle *trans,
u64 bytenr, u64 num_bytes, bool alloc)
{
Expand Down

0 comments on commit 8153122

Please sign in to comment.