Skip to content

Commit

Permalink
Merge tag 'f2fs-for-4.17' of git://git.kernel.org/pub/scm/linux/kerne…
Browse files Browse the repository at this point in the history
…l/git/jaegeuk/f2fs

Pull f2fs update from Jaegeuk Kim:
 "In this round, we've mainly focused on performance tuning and critical
  bug fixes occurred in low-end devices. Sheng Yong introduced
  lost_found feature to keep missing files during recovery instead of
  thrashing them. We're preparing coming fsverity implementation. And,
  we've got more features to communicate with users for better
  performance. In low-end devices, some memory-related issues were
  fixed, and subtle race condtions and corner cases were addressed as
  well.

  Enhancements:
   - large nat bitmaps for more free node ids
   - add three block allocation policies to pass down write hints given by user
   - expose extension list to user and introduce hot file extension
   - tune small devices seamlessly for low-end devices
   - set readdir_ra by default
   - give more resources under gc_urgent mode regarding to discard and cleaning
   - introduce fsync_mode to enforce posix or not
   - nowait aio support
   - add lost_found feature to keep dangling inodes
   - reserve bits for future fsverity feature
   - add test_dummy_encryption for FBE

  Bug fixes:
   - don't use highmem for dentry pages
   - align memory boundary for bitops
   - truncate preallocated blocks in write errors
   - guarantee i_times on fsync call
   - clear CP_TRIMMED_FLAG correctly
   - prevent node chain loop during recovery
   - avoid data race between atomic write and background cleaning
   - avoid unnecessary selinux violation warnings on resgid option
   - GFP_NOFS to avoid deadlock in quota and read paths
   - fix f2fs_skip_inode_update to allow i_size recovery

  In addition to the above, there are several minor bug fixes and clean-ups"

* tag 'f2fs-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (50 commits)
  f2fs: remain written times to update inode during fsync
  f2fs: make assignment of t->dentry_bitmap more readable
  f2fs: truncate preallocated blocks in error case
  f2fs: fix a wrong condition in f2fs_skip_inode_update
  f2fs: reserve bits for fs-verity
  f2fs: Add a segment type check in inplace write
  f2fs: no need to initialize zero value for GFP_F2FS_ZERO
  f2fs: don't track new nat entry in nat set
  f2fs: clean up with F2FS_BLK_ALIGN
  f2fs: check blkaddr more accuratly before issue a bio
  f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read
  f2fs: introduce a new mount option test_dummy_encryption
  f2fs: introduce F2FS_FEATURE_LOST_FOUND feature
  f2fs: release locks before return in f2fs_ioc_gc_range()
  f2fs: align memory boundary for bitops
  f2fs: remove unneeded set_cold_node()
  f2fs: add nowait aio support
  f2fs: wrap all options with f2fs_sb_info.mount_opt
  f2fs: Don't overwrite all types of node to keep node chain
  f2fs: introduce mount option for fsync mode
  ...
  • Loading branch information
Linus Torvalds committed Apr 6, 2018
2 parents 052c220 + 214c246 commit 274c0e7
Show file tree
Hide file tree
Showing 20 changed files with 1,081 additions and 382 deletions.
11 changes: 11 additions & 0 deletions Documentation/ABI/testing/sysfs-fs-f2fs
Original file line number Diff line number Diff line change
Expand Up @@ -192,3 +192,14 @@ Date: November 2017
Contact: "Sheng Yong" <shengyong1@huawei.com>
Description:
Controls readahead inode block in readdir.

What: /sys/fs/f2fs/<disk>/extension_list
Date: Feburary 2018
Contact: "Chao Yu" <yuchao0@huawei.com>
Description:
Used to control configure extension list:
- Query: cat /sys/fs/f2fs/<disk>/extension_list
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension
77 changes: 77 additions & 0 deletions Documentation/filesystems/f2fs.txt
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,23 @@ offgrpjquota Turn off group journelled quota.
offprjjquota Turn off project journelled quota.
quota Enable plain user disk quota accounting.
noquota Disable all plain disk quota option.
whint_mode=%s Control which write hints are passed down to block
layer. This supports "off", "user-based", and
"fs-based". In "off" mode (default), f2fs does not pass
down hints. In "user-based" mode, f2fs tries to pass
down hints given by users. And in "fs-based" mode, f2fs
passes down hints with its policy.
alloc_mode=%s Adjust block allocation policy, which supports "reuse"
and "default".
fsync_mode=%s Control the policy of fsync. Currently supports "posix"
and "strict". In "posix" mode, which is default, fsync
will follow POSIX semantics and does a light operation
to improve the filesystem performance. In "strict" mode,
fsync will be heavy and behaves in line with xfs, ext4
and btrfs, where xfstest generic/342 will pass, but the
performance will regress.
test_dummy_encryption Enable dummy encryption, which provides a fake fscrypt
context. The fake fscrypt context is used by xfstests.

================================================================================
DEBUGFS ENTRIES
Expand Down Expand Up @@ -611,3 +628,63 @@ algorithm.
In order to identify whether the data in the victim segment are valid or not,
F2FS manages a bitmap. Each bit represents the validity of a block, and the
bitmap is composed of a bit stream covering whole blocks in main area.

Write-hint Policy
-----------------

1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.

2) whint_mode=user-based. F2FS tries to pass down hints given by
users.

User F2FS Block
---- ---- -----
META WRITE_LIFE_NOT_SET
HOT_NODE "
WARM_NODE "
COLD_NODE "
*ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
*extension list " "

-- buffered io
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE " "
WRITE_LIFE_MEDIUM " "
WRITE_LIFE_LONG " "

-- direct io
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE " WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG " WRITE_LIFE_LONG

3) whint_mode=fs-based. F2FS passes down hints with its policy.

User F2FS Block
---- ---- -----
META WRITE_LIFE_MEDIUM;
HOT_NODE WRITE_LIFE_NOT_SET
WARM_NODE "
COLD_NODE WRITE_LIFE_NONE
ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
extension list " "

-- buffered io
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_LONG
WRITE_LIFE_NONE " "
WRITE_LIFE_MEDIUM " "
WRITE_LIFE_LONG " "

-- direct io
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE " WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG " WRITE_LIFE_LONG
101 changes: 66 additions & 35 deletions fs/f2fs/checkpoint.c
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@ static struct page *__get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index,
.old_blkaddr = index,
.new_blkaddr = index,
.encrypted_page = NULL,
.is_meta = is_meta,
};

if (unlikely(!is_meta))
Expand Down Expand Up @@ -162,6 +163,7 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages,
.op_flags = sync ? (REQ_META | REQ_PRIO) : REQ_RAHEAD,
.encrypted_page = NULL,
.in_list = false,
.is_meta = (type != META_POR),
};
struct blk_plug plug;

Expand Down Expand Up @@ -569,13 +571,8 @@ static int recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
struct node_info ni;
int err = acquire_orphan_inode(sbi);

if (err) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
f2fs_msg(sbi->sb, KERN_WARNING,
"%s: orphan failed (ino=%x), run fsck to fix.",
__func__, ino);
return err;
}
if (err)
goto err_out;

__add_ino_entry(sbi, ino, 0, ORPHAN_INO);

Expand All @@ -589,6 +586,11 @@ static int recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
return PTR_ERR(inode);
}

err = dquot_initialize(inode);
if (err)
goto err_out;

dquot_initialize(inode);
clear_nlink(inode);

/* truncate all the data during iput */
Expand All @@ -598,14 +600,18 @@ static int recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)

/* ENOMEM was fully retried in f2fs_evict_inode. */
if (ni.blk_addr != NULL_ADDR) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
f2fs_msg(sbi->sb, KERN_WARNING,
"%s: orphan failed (ino=%x) by kernel, retry mount.",
__func__, ino);
return -EIO;
err = -EIO;
goto err_out;
}
__remove_ino_entry(sbi, ino, ORPHAN_INO);
return 0;

err_out:
set_sbi_flag(sbi, SBI_NEED_FSCK);
f2fs_msg(sbi->sb, KERN_WARNING,
"%s: orphan failed (ino=%x), run fsck to fix.",
__func__, ino);
return err;
}

int recover_orphan_inodes(struct f2fs_sb_info *sbi)
Expand Down Expand Up @@ -1136,6 +1142,8 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, struct cp_control *cpc)

if (cpc->reason & CP_TRIMMED)
__set_ckpt_flags(ckpt, CP_TRIMMED_FLAG);
else
__clear_ckpt_flags(ckpt, CP_TRIMMED_FLAG);

if (cpc->reason & CP_UMOUNT)
__set_ckpt_flags(ckpt, CP_UMOUNT_FLAG);
Expand All @@ -1162,6 +1170,39 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, struct cp_control *cpc)
spin_unlock_irqrestore(&sbi->cp_lock, flags);
}

static void commit_checkpoint(struct f2fs_sb_info *sbi,
void *src, block_t blk_addr)
{
struct writeback_control wbc = {
.for_reclaim = 0,
};

/*
* pagevec_lookup_tag and lock_page again will take
* some extra time. Therefore, update_meta_pages and
* sync_meta_pages are combined in this function.
*/
struct page *page = grab_meta_page(sbi, blk_addr);
int err;

memcpy(page_address(page), src, PAGE_SIZE);
set_page_dirty(page);

f2fs_wait_on_page_writeback(page, META, true);
f2fs_bug_on(sbi, PageWriteback(page));
if (unlikely(!clear_page_dirty_for_io(page)))
f2fs_bug_on(sbi, 1);

/* writeout cp pack 2 page */
err = __f2fs_write_meta_page(page, &wbc, FS_CP_META_IO);
f2fs_bug_on(sbi, err);

f2fs_put_page(page, 0);

/* submit checkpoint (with barrier if NOBARRIER is not set) */
f2fs_submit_merged_write(sbi, META_FLUSH);
}

static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
{
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
Expand Down Expand Up @@ -1264,16 +1305,6 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
}
}

/* need to wait for end_io results */
wait_on_all_pages_writeback(sbi);
if (unlikely(f2fs_cp_error(sbi)))
return -EIO;

/* flush all device cache */
err = f2fs_flush_device_cache(sbi);
if (err)
return err;

/* write out checkpoint buffer at block 0 */
update_meta_page(sbi, ckpt, start_blk++);

Expand Down Expand Up @@ -1301,26 +1332,26 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
start_blk += NR_CURSEG_NODE_TYPE;
}

/* writeout checkpoint block */
update_meta_page(sbi, ckpt, start_blk);
/* update user_block_counts */
sbi->last_valid_block_count = sbi->total_valid_block_count;
percpu_counter_set(&sbi->alloc_valid_block_count, 0);

/* Here, we have one bio having CP pack except cp pack 2 page */
sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);

/* wait for previous submitted node/meta pages writeback */
/* wait for previous submitted meta pages writeback */
wait_on_all_pages_writeback(sbi);

if (unlikely(f2fs_cp_error(sbi)))
return -EIO;

filemap_fdatawait_range(NODE_MAPPING(sbi), 0, LLONG_MAX);
filemap_fdatawait_range(META_MAPPING(sbi), 0, LLONG_MAX);

/* update user_block_counts */
sbi->last_valid_block_count = sbi->total_valid_block_count;
percpu_counter_set(&sbi->alloc_valid_block_count, 0);

/* Here, we only have one bio having CP pack */
sync_meta_pages(sbi, META_FLUSH, LONG_MAX, FS_CP_META_IO);
/* flush all device cache */
err = f2fs_flush_device_cache(sbi);
if (err)
return err;

/* wait for previous submitted meta pages writeback */
/* barrier and flush checkpoint cp pack 2 page if it can */
commit_checkpoint(sbi, ckpt, start_blk);
wait_on_all_pages_writeback(sbi);

release_ino_entry(sbi, false);
Expand Down
Loading

0 comments on commit 274c0e7

Please sign in to comment.