Skip to content

Commit

Permalink
Merge tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub…
Browse files Browse the repository at this point in the history
…/scm/linux/kernel/git/dgc/linux-xfs

    < XFS has gained super CoW powers! >
     ----------------------------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

Pull XFS support for shared data extents from Dave Chinner:
 "This is the second part of the XFS updates for this merge cycle.  This
  pullreq contains the new shared data extents feature for XFS.

  Given the complexity and size of this change I am expecting - like the
  addition of reverse mapping last cycle - that there will be some
  follow-up bug fixes and cleanups around the -rc3 stage for issues that
  I'm sure will show up once the code hits a wider userbase.

  What it is:

  At the most basic level we are simply adding shared data extents to
  XFS - i.e. a single extent on disk can now have multiple owners. To do
  this we have to add new on-disk features to both track the shared
  extents and the number of times they've been shared. This is done by
  the new "refcount" btree that sits in every allocation group. When we
  share or unshare an extent, this tree gets updated.

  Along with this new tree, the reverse mapping tree needs to be updated
  to track each owner or a shared extent. This also needs to be updated
  ever share/unshare operation. These interactions at extent allocation
  and freeing time have complex ordering and recovery constraints, so
  there's a significant amount of new intent-based transaction code to
  ensure that operations are performed atomically from both the runtime
  and integrity/crash recovery perspectives.

  We also need to break sharing when writes hit a shared extent - this
  is where the new copy-on-write implementation comes in. We allocate
  new storage and copy the original data along with the overwrite data
  into the new location. We only do this for data as we don't share
  metadata at all - each inode has it's own metadata that tracks the
  shared data extents, the extents undergoing CoW and it's own private
  extents.

  Of course, being XFS, nothing is simple - we use delayed allocation
  for CoW similar to how we use it for normal writes. ENOSPC is a
  significant issue here - we build on the reservation code added in
  4.8-rc1 with the reverse mapping feature to ensure we don't get
  spurious ENOSPC issues part way through a CoW operation. These
  mechanisms also help minimise fragmentation due to repeated CoW
  operations. To further reduce fragmentation overhead, we've also
  introduced a CoW extent size hint, which indicates how large a region
  we should allocate when we execute a CoW operation.

  With all this functionality in place, we can hook up .copy_file_range,
  .clone_file_range and .dedupe_file_range and we gain all the
  capabilities of reflink and other vfs provided functionality that
  enable manipulation to shared extents. We also added a fallocate mode
  that explicitly unshares a range of a file, which we implemented as an
  explicit CoW of all the shared extents in a file.

  As such, it's a huge chunk of new functionality with new on-disk
  format features and internal infrastructure. It warns at mount time as
  an experimental feature and that it may eat data (as we do with all
  new on-disk features until they stabilise). We have not released
  userspace suport for it yet - userspace support currently requires
  download from Darrick's xfsprogs repo and build from source, so the
  access to this feature is really developer/tester only at this point.
  Initial userspace support will be released at the same time the kernel
  with this code in it is released.

  The new code causes 5-6 new failures with xfstests - these aren't
  serious functional failures but things the output of tests changing
  slightly due to perturbations in layouts, space usage, etc. OTOH,
  we've added 150+ new tests to xfstests that specifically exercise this
  new functionality so it's got far better test coverage than any
  functionality we've previously added to XFS.

  Darrick has done a pretty amazing job getting us to this stage, and
  special mention also needs to go to Christoph (review, testing,
  improvements and bug fixes) and Brian (caught several intricate bugs
  during review) for the effort they've also put in.

  Summary:

   - unshare range (FALLOC_FL_UNSHARE) support for fallocate

   - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
     interface

   - shared extent support for XFS

   - copy-on-write support for shared extents

   - copy_file_range support

   - clone_file_range support (implements reflink)

   - dedupe_file_range support

   - defrag support for reverse mapping enabled filesystems"

* tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
  xfs: convert COW blocks to real blocks before unwritten extent conversion
  xfs: rework refcount cow recovery error handling
  xfs: clear reflink flag if setting realtime flag
  xfs: fix error initialization
  xfs: fix label inaccuracies
  xfs: remove isize check from unshare operation
  xfs: reduce stack usage of _reflink_clear_inode_flag
  xfs: check inode reflink flag before calling reflink functions
  xfs: implement swapext for rmap filesystems
  xfs: refactor swapext code
  xfs: various swapext cleanups
  xfs: recognize the reflink feature bit
  xfs: simulate per-AG reservations being critically low
  xfs: don't mix reflink and DAX mode for now
  xfs: check for invalid inode reflink flags
  xfs: set a default CoW extent size of 32 blocks
  xfs: convert unwritten status of reverse mappings for shared files
  xfs: use interval query for rmap alloc operations on shared files
  xfs: add shared rmap map/unmap/convert log item types
  xfs: increase log reservations for reflink
  ...
  • Loading branch information
Linus Torvalds committed Oct 14, 2016
2 parents 40bd3a5 + feac470 commit 35a891b
Show file tree
Hide file tree
Showing 76 changed files with 10,683 additions and 413 deletions.
5 changes: 5 additions & 0 deletions fs/open.c
Original file line number Diff line number Diff line change
Expand Up @@ -267,6 +267,11 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
(mode & ~FALLOC_FL_INSERT_RANGE))
return -EINVAL;

/* Unshare range should only be used with allocate mode. */
if ((mode & FALLOC_FL_UNSHARE_RANGE) &&
(mode & ~(FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE)))
return -EINVAL;

if (!(file->f_mode & FMODE_WRITE))
return -EBADF;

Expand Down
7 changes: 7 additions & 0 deletions fs/xfs/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,8 @@ xfs-y += $(addprefix libxfs/, \
xfs_ag_resv.o \
xfs_rmap.o \
xfs_rmap_btree.o \
xfs_refcount.o \
xfs_refcount_btree.o \
xfs_sb.o \
xfs_symlink_remote.o \
xfs_trans_resv.o \
Expand Down Expand Up @@ -88,6 +90,7 @@ xfs-y += xfs_aops.o \
xfs_message.o \
xfs_mount.o \
xfs_mru_cache.o \
xfs_reflink.o \
xfs_stats.o \
xfs_super.o \
xfs_symlink.o \
Expand All @@ -100,16 +103,20 @@ xfs-y += xfs_aops.o \
# low-level transaction/log code
xfs-y += xfs_log.o \
xfs_log_cil.o \
xfs_bmap_item.o \
xfs_buf_item.o \
xfs_extfree_item.o \
xfs_icreate_item.o \
xfs_inode_item.o \
xfs_refcount_item.o \
xfs_rmap_item.o \
xfs_log_recover.o \
xfs_trans_ail.o \
xfs_trans_bmap.o \
xfs_trans_buf.o \
xfs_trans_extfree.o \
xfs_trans_inode.o \
xfs_trans_refcount.o \
xfs_trans_rmap.o \

# optional features
Expand Down
15 changes: 14 additions & 1 deletion fs/xfs/libxfs/xfs_ag_resv.c
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@
#include "xfs_trans_space.h"
#include "xfs_rmap_btree.h"
#include "xfs_btree.h"
#include "xfs_refcount_btree.h"

/*
* Per-AG Block Reservations
Expand Down Expand Up @@ -108,7 +109,9 @@ xfs_ag_resv_critical(
trace_xfs_ag_resv_critical(pag, type, avail);

/* Critically low if less than 10% or max btree height remains. */
return avail < orig / 10 || avail < XFS_BTREE_MAXLEVELS;
return XFS_TEST_ERROR(avail < orig / 10 || avail < XFS_BTREE_MAXLEVELS,
pag->pag_mount, XFS_ERRTAG_AG_RESV_CRITICAL,
XFS_RANDOM_AG_RESV_CRITICAL);
}

/*
Expand Down Expand Up @@ -228,6 +231,11 @@ xfs_ag_resv_init(
if (pag->pag_meta_resv.ar_asked == 0) {
ask = used = 0;

error = xfs_refcountbt_calc_reserves(pag->pag_mount,
pag->pag_agno, &ask, &used);
if (error)
goto out;

error = __xfs_ag_resv_init(pag, XFS_AG_RESV_METADATA,
ask, used);
if (error)
Expand All @@ -238,6 +246,11 @@ xfs_ag_resv_init(
if (pag->pag_agfl_resv.ar_asked == 0) {
ask = used = 0;

error = xfs_rmapbt_calc_reserves(pag->pag_mount, pag->pag_agno,
&ask, &used);
if (error)
goto out;

error = __xfs_ag_resv_init(pag, XFS_AG_RESV_AGFL, ask, used);
if (error)
goto out;
Expand Down
23 changes: 23 additions & 0 deletions fs/xfs/libxfs/xfs_alloc.c
Original file line number Diff line number Diff line change
Expand Up @@ -52,10 +52,23 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);

unsigned int
xfs_refc_block(
struct xfs_mount *mp)
{
if (xfs_sb_version_hasrmapbt(&mp->m_sb))
return XFS_RMAP_BLOCK(mp) + 1;
if (xfs_sb_version_hasfinobt(&mp->m_sb))
return XFS_FIBT_BLOCK(mp) + 1;
return XFS_IBT_BLOCK(mp) + 1;
}

xfs_extlen_t
xfs_prealloc_blocks(
struct xfs_mount *mp)
{
if (xfs_sb_version_hasreflink(&mp->m_sb))
return xfs_refc_block(mp) + 1;
if (xfs_sb_version_hasrmapbt(&mp->m_sb))
return XFS_RMAP_BLOCK(mp) + 1;
if (xfs_sb_version_hasfinobt(&mp->m_sb))
Expand Down Expand Up @@ -115,6 +128,8 @@ xfs_alloc_ag_max_usable(
blocks++; /* finobt root block */
if (xfs_sb_version_hasrmapbt(&mp->m_sb))
blocks++; /* rmap root block */
if (xfs_sb_version_hasreflink(&mp->m_sb))
blocks++; /* refcount root block */

return mp->m_sb.sb_agblocks - blocks;
}
Expand Down Expand Up @@ -2321,6 +2336,9 @@ xfs_alloc_log_agf(
offsetof(xfs_agf_t, agf_btreeblks),
offsetof(xfs_agf_t, agf_uuid),
offsetof(xfs_agf_t, agf_rmap_blocks),
offsetof(xfs_agf_t, agf_refcount_blocks),
offsetof(xfs_agf_t, agf_refcount_root),
offsetof(xfs_agf_t, agf_refcount_level),
/* needed so that we don't log the whole rest of the structure: */
offsetof(xfs_agf_t, agf_spare64),
sizeof(xfs_agf_t)
Expand Down Expand Up @@ -2458,6 +2476,10 @@ xfs_agf_verify(
be32_to_cpu(agf->agf_btreeblks) > be32_to_cpu(agf->agf_length))
return false;

if (xfs_sb_version_hasreflink(&mp->m_sb) &&
be32_to_cpu(agf->agf_refcount_level) > XFS_BTREE_MAXLEVELS)
return false;

return true;;

}
Expand Down Expand Up @@ -2578,6 +2600,7 @@ xfs_alloc_read_agf(
be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
pag->pagf_levels[XFS_BTNUM_RMAPi] =
be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
spin_lock_init(&pag->pagb_lock);
pag->pagb_count = 0;
pag->pagb_tree = RB_ROOT;
Expand Down
Loading

0 comments on commit 35a891b

Please sign in to comment.