Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "In addition to some ext4 bug fixes and cleanups, this cycle we add the
  orphan_file feature, which eliminates bottlenecks when doing a large
  number of parallel truncates and file deletions, and move the discard
  operation out of the jbd2 commit thread when using the discard mount
  option, to better support devices with slow discard operations"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
  ext4: make the updating inode data procedure atomic
  ext4: remove an unnecessary if statement in __ext4_get_inode_loc()
  ext4: move inode eio simulation behind io completeion
  ext4: Improve scalability of ext4 orphan file handling
  ext4: Orphan file documentation
  ext4: Speedup ext4 orphan inode handling
  ext4: Move orphan inode handling into a separate file
  ext4: Support for checksumming from journal triggers
  ext4: fix race writing to an inline_data file while its xattrs are changing
  jbd2: add sparse annotations for add_transaction_credits()
  ext4: fix sparse warnings
  ext4: Make sure quota files are not grabbed accidentally
  ext4: fix e2fsprogs checksum failure for mounted filesystem
  ext4: if zeroout fails fall back to splitting the extent node
  ext4: reduce arguments of ext4_fc_add_dentry_tlv
  ext4: flush background discard kwork when retry allocation
  ext4: get discard out of jbd2 commit kthread contex
  ext4: remove the repeated comment of ext4_trim_all_free
  ext4: add new helper interface ext4_try_to_trim_range()
  ext4: remove the 'group' parameter of ext4_trim_extent
  ...
Linus Torvalds committed Sep 2, 2021
2 parents 815409a + baaae97 commit 111c1aa
Showing 27 changed files with 1,443 additions and 731 deletions.
1 change: 1 addition & 0 deletions Documentation/filesystems/ext4/globals.rst
@@ -11,3 +11,4 @@ have static metadata at fixed locations.
.. include:: bitmaps.rst
.. include:: mmp.rst
.. include:: journal.rst
.. include:: orphan.rst
10 changes: 5 additions & 5 deletions Documentation/filesystems/ext4/inodes.rst
@@ -498,11 +498,11 @@ structure -- inode change time (ctime), access time (atime), data
modification time (mtime), and deletion time (dtime). The four fields
are 32-bit signed integers that represent seconds since the Unix epoch
(1970-01-01 00:00:00 GMT), which means that the fields will overflow in
-January 2038. For inodes that are not linked from any directory but are
-still open (orphan inodes), the dtime field is overloaded for use with
-the orphan list. The superblock field ``s_last_orphan`` points to the
-first inode in the orphan list; dtime is then the number of the next
-orphaned inode, or zero if there are no more orphans.
+January 2038. If the filesystem does not have the orphan_file feature, inodes
+that are not linked from any directory but are still open (orphan inodes) have
+the dtime field overloaded for use with the orphan list. The superblock field
+``s_last_orphan`` points to the first inode in the orphan list; dtime is then
+the number of the next orphaned inode, or zero if there are no more orphans.
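
The legacy list described in this hunk can be pictured with a small user-space sketch. It is not kernel code: `struct toy_inode`, `walk_orphan_list` and the in-memory table indexed by inode number are hypothetical stand-ins, assumed only to show how `s_last_orphan` and the overloaded dtime field chain the orphans together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an on-disk inode: only the overloaded
 * dtime field matters here. The array index is the inode number. */
struct toy_inode {
	uint32_t i_dtime;	/* next orphan inode number, 0 ends the list */
};

/*
 * Follow the list starting at s_last_orphan and store each inode number
 * in out[]. Stops at 0, at an out-of-range number, or after max entries
 * (the last condition guards against a corrupted, cyclic list).
 * Returns the number of entries stored.
 */
static size_t walk_orphan_list(const struct toy_inode *table, size_t ninodes,
			       uint32_t s_last_orphan,
			       uint32_t *out, size_t max)
{
	size_t n = 0;
	uint32_t ino = s_last_orphan;

	assert(table != NULL && out != NULL);
	while (ino != 0 && ino < ninodes && n < max) {
		out[n++] = ino;
		ino = table[ino].i_dtime;
	}
	return n;
}
```

Every insertion or removal has to update the single list head in the superblock, which is why this structure serializes workloads that orphan many inodes in parallel.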

If the inode structure size ``sb->s_inode_size`` is larger than 128
bytes and the ``i_inode_extra`` field is large enough to encompass the
52 changes: 52 additions & 0 deletions Documentation/filesystems/ext4/orphan.rst
@@ -0,0 +1,52 @@
.. SPDX-License-Identifier: GPL-2.0

Orphan file
-----------

In Unix there can be inodes that are unlinked from the directory hierarchy but
that are still alive because they are open. In case of a crash the filesystem
has to clean up these inodes, as otherwise they (and the blocks referenced from
them) would leak. Similarly, if we truncate or extend a file, we may not be able
to perform the operation in a single journalling transaction. In such a case we
track the inode as an orphan so that in case of a crash extra blocks allocated
to the file get truncated.

Traditionally ext4 tracks orphan inodes in the form of a singly linked list
where the superblock contains the inode number of the last orphan inode
(s\_last\_orphan field) and each inode then contains the inode number of the
previously orphaned inode (we overload the i\_dtime inode field for this).
However, this filesystem-global singly linked list is a scalability bottleneck
for workloads that create orphan inodes heavily. When the orphan file feature
(COMPAT\_ORPHAN\_FILE) is enabled, the filesystem has a special inode
(referenced from the superblock through s\_orphan\_file\_inum) with several
blocks. Each of these blocks has the following structure:

.. list-table::
   :widths: 8 8 24 40
   :header-rows: 1

   * - Offset
     - Type
     - Name
     - Description
   * - 0x0
     - Array of \_\_le32 entries
     - Orphan inode entries
     - Each \_\_le32 entry is either empty (0) or it contains inode number of
       an orphan inode.
   * - blocksize - 8
     - \_\_le32
     - ob\_magic
     - Magic value stored in orphan block tail (0x0b10ca04)
   * - blocksize - 4
     - \_\_le32
     - ob\_checksum
     - Checksum of the orphan block.

When a filesystem with the orphan file feature is mounted writable, we set the
RO\_COMPAT\_ORPHAN\_PRESENT feature in the superblock to indicate there may
be valid orphan entries. If we see this feature when mounting the
filesystem, we read the whole orphan file and process all orphan inodes found
there as usual. When cleanly unmounting the filesystem we remove the
RO\_COMPAT\_ORPHAN\_PRESENT feature to avoid unnecessary scanning of the orphan
file and to make the filesystem fully compatible with older kernels.
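
As an illustration of the block layout documented in orphan.rst, the following user-space sketch scans one orphan block held in a byte buffer. The helper names (`get_le32`, `orphan_slots_per_block`, `scan_orphan_block`) are hypothetical and are not taken from fs/ext4/orphan.c. The sketch checks only the magic value and deliberately skips checksum verification, because the checksum algorithm is not described in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ORPHAN_BLOCK_MAGIC	0x0b10ca04u	/* ob_magic value from the table */
#define ORPHAN_TAIL_SIZE	8u		/* ob_magic + ob_checksum */

/* Decode a little-endian 32-bit value stored at p. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Number of __le32 entry slots that fit before the 8-byte tail. */
static size_t orphan_slots_per_block(size_t blocksize)
{
	assert(blocksize > ORPHAN_TAIL_SIZE);
	return (blocksize - ORPHAN_TAIL_SIZE) / 4;
}

/*
 * Copy the non-zero entries of one orphan block into out[] (at most max).
 * Returns the number of entries copied, or -1 if the magic at
 * blocksize - 8 does not match. The checksum is NOT verified here.
 */
static long scan_orphan_block(const uint8_t *block, size_t blocksize,
			      uint32_t *out, size_t max)
{
	size_t slots = orphan_slots_per_block(blocksize);
	size_t n = 0;

	if (get_le32(block + blocksize - 8) != ORPHAN_BLOCK_MAGIC)
		return -1;
	for (size_t i = 0; i < slots && n < max; i++) {
		uint32_t ino = get_le32(block + 4 * i);

		if (ino != 0)	/* 0 marks an empty slot */
			out[n++] = ino;
	}
	return (long)n;
}
```

With a 1024-byte block this leaves (1024 - 8) / 4 = 254 entry slots. Because every slot is an independent entry, concurrent tasks can claim different slots or different blocks instead of contending on one list head.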
17 changes: 17 additions & 0 deletions Documentation/filesystems/ext4/special_inodes.rst
@@ -36,3 +36,20 @@ ext4 reserves some inode for special features, as follows:
* - 11
- Traditional first non-reserved inode. Usually this is the lost+found directory. See s\_first\_ino in the superblock.

Note that there are also some inodes allocated from non-reserved inode numbers
for other filesystem features which are not referenced from the standard
directory hierarchy. These are generally referenced from the superblock. They are:

.. list-table::
   :widths: 20 50
   :header-rows: 1

   * - Superblock field
     - Description

   * - s\_lpf\_ino
     - Inode number of lost+found directory.
   * - s\_prj\_quota\_inum
     - Inode number of quota file tracking project quotas
   * - s\_orphan\_file\_inum
     - Inode number of file tracking orphan inodes.
15 changes: 14 additions & 1 deletion Documentation/filesystems/ext4/super.rst
@@ -479,7 +479,11 @@ The ext4 superblock is laid out as follows in
      - Filename charset encoding flags.
    * - 0x280
      - \_\_le32
-     - s\_reserved[95]
+     - s\_orphan\_file\_inum
+     - Orphan file inode number.
+   * - 0x284
+     - \_\_le32
+     - s\_reserved[94]
      - Padding to the end of the block.
    * - 0x3FC
      - \_\_le32
@@ -603,6 +607,11 @@ following:
the journal, JBD2 incompat feature
(JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) gets
set (COMPAT\_FAST\_COMMIT).
* - 0x1000
- Orphan file allocated. This is the special file for more efficient
tracking of unlinked but still open inodes. When there may be any
entries in the file, we additionally set proper rocompat feature
(RO\_COMPAT\_ORPHAN\_PRESENT).

.. _super_incompat:

@@ -713,6 +722,10 @@ the following:
- Filesystem tracks project quotas. (RO\_COMPAT\_PROJECT)
* - 0x8000
- Verity inodes may be present on the filesystem. (RO\_COMPAT\_VERITY)
* - 0x10000
- Indicates orphan file may have valid orphan entries and thus we need
to clean them up when mounting the filesystem
(RO\_COMPAT\_ORPHAN\_PRESENT).
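
The interplay of the two feature bits documented in super.rst can be summarized in a short sketch. The bit values 0x1000 and 0x10000 come from the tables in this diff. The enum, the function names, and passing the compat and ro_compat words as plain arguments are assumptions made for illustration and do not mirror the kernel's own helpers.

```c
#include <assert.h>
#include <stdint.h>

#define COMPAT_ORPHAN_FILE		0x1000u		/* compat feature bit */
#define RO_COMPAT_ORPHAN_PRESENT	0x10000u	/* ro_compat feature bit */

enum orphan_state {
	ORPHAN_LEGACY_LIST,	/* no orphan file: use s_last_orphan list */
	ORPHAN_FILE_CLEAN,	/* orphan file exists, no scan needed */
	ORPHAN_FILE_SCAN,	/* orphan file may hold entries: scan it */
};

/* Decide at mount time how orphans are tracked and whether to scan. */
static enum orphan_state orphan_state(uint32_t compat, uint32_t ro_compat)
{
	if (!(compat & COMPAT_ORPHAN_FILE))
		return ORPHAN_LEGACY_LIST;
	return (ro_compat & RO_COMPAT_ORPHAN_PRESENT) ?
		ORPHAN_FILE_SCAN : ORPHAN_FILE_CLEAN;
}

/* Writable mount: flag that the orphan file may now contain entries. */
static uint32_t mark_orphan_present(uint32_t compat, uint32_t ro_compat)
{
	assert(compat & COMPAT_ORPHAN_FILE);
	return ro_compat | RO_COMPAT_ORPHAN_PRESENT;
}

/* Clean unmount: drop the flag so that later mounts skip the scan. */
static uint32_t clear_orphan_present(uint32_t ro_compat)
{
	return ro_compat & ~RO_COMPAT_ORPHAN_PRESENT;
}
```

Keeping the "may contain entries" state in an ro_compat bit means a kernel that does not understand the orphan file refuses to mount the filesystem writable only while the bit is set, and the bit is cleared again on a clean unmount.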

.. _super_def_hash:

2 changes: 1 addition & 1 deletion fs/ext4/Makefile
@@ -10,7 +10,7 @@ ext4-y := balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \
indirect.o inline.o inode.o ioctl.o mballoc.o migrate.o \
mmp.o move_extent.o namei.o page-io.o readpage.o resize.o \
super.o symlink.o sysfs.o xattr.o xattr_hurd.o xattr_trusted.o \
-xattr_user.o fast_commit.o
+xattr_user.o fast_commit.o orphan.o

ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o
8 changes: 7 additions & 1 deletion fs/ext4/balloc.c
@@ -652,8 +652,14 @@ int ext4_should_retry_alloc(struct super_block *sb, int *retries)
* possible we just missed a transaction commit that did so
*/
smp_mb();
-	if (sbi->s_mb_free_pending == 0)
+	if (sbi->s_mb_free_pending == 0) {
+		if (test_opt(sb, DISCARD)) {
+			atomic_inc(&sbi->s_retry_alloc_pending);
+			flush_work(&sbi->s_discard_work);
+			atomic_dec(&sbi->s_retry_alloc_pending);
+		}
 		return ext4_has_free_clusters(sbi, 1, 0);
+	}

/*
* it's possible we've just missed a transaction commit here,