Merge tag 'erofs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
 "In this cycle, large folios are now enabled in the iomap/fscache mode
  for uncompressed files first. In order to do that, we've also cleaned
  up better interfaces between erofs and fscache, which are acked by
  fscache/netfs folks and included in this pull request.

  Other than that, there are random fixes around erofs over fscache and
  crafted images by syzbot, minor cleanups and documentation updates.

  Summary:

   - Enable large folios for iomap/fscache mode

   - Avoid sysfs warning due to mounting twice with the same fsid and
     domain_id in fscache mode

   - Refine fscache interface among erofs, fscache, and cachefiles

   - Use kmap_local_page() only for metabuf

   - Fixes around crafted images found by syzbot

   - Minor cleanups and documentation updates"

* tag 'erofs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: validate the extent length for uncompressed pclusters
  erofs: fix missing unmap if z_erofs_get_extent_compressedlen() fails
  erofs: Fix pcluster memleak when its block address is zero
  erofs: use kmap_local_page() only for erofs_bread()
  erofs: enable large folios for fscache mode
  erofs: support large folios for fscache mode
  erofs: switch to prepare_ondemand_read() in fscache mode
  fscache,cachefiles: add prepare_ondemand_read() callback
  erofs: clean up cached I/O strategies
  erofs: update documentation
  erofs: check the uniqueness of fsid in shared domain in advance
  erofs: enable large folios for iomap mode
Linus Torvalds committed Dec 13, 2022
2 parents ad0d9da + c505feb commit 4a6bff1
Showing 12 changed files with 344 additions and 344 deletions.
38 changes: 25 additions & 13 deletions Documentation/filesystems/erofs.rst
@@ -30,12 +30,18 @@ It is implemented to be a better choice for the following scenarios:
especially for those embedded devices with limited memory and high-density
hosts with numerous containers.

Here is the main features of EROFS:
Here are the main features of EROFS:

- Little endian on-disk design;

- 4KiB block size and 32-bit block addresses, therefore 16TiB address space
at most for now;
- Block-based distribution and file-based distribution over fscache are
supported;

- Support multiple devices to refer to external blobs, which can be used
for container images;

- 4KiB block size and 32-bit block addresses for each device, therefore
16TiB address space at most for now;

- Two inode layouts for different requirements:

@@ -50,28 +56,31 @@ Here is the main features of EROFS:
Metadata reserved 8 bytes 18 bytes
===================== ============ ======================================

- Metadata and data could be mixed as an option;

- Support extended attributes (xattrs) as an option;
- Support extended attributes as an option;

- Support tailpacking data and xattr inline compared to byte-addressed
unaligned metadata or smaller block size alternatives;

- Support POSIX.1e ACLs by using xattrs;
- Support POSIX.1e ACLs by using extended attributes;

- Support transparent data compression as an option:
LZ4 and MicroLZMA algorithms can be used on a per-file basis; In addition,
inplace decompression is also supported to avoid bounce compressed buffers
and page cache thrashing.

- Support chunk-based data deduplication and rolling-hash compressed data
deduplication;

- Support tailpacking inline compared to byte-addressed unaligned metadata
or smaller block size alternatives;

- Support merging tail-end data into a special inode as fragments.

- Support large folios for uncompressed files.

- Support direct I/O on uncompressed files to avoid double caching for loop
devices;

- Support FSDAX on uncompressed images for secure containers and ramdisks in
order to get rid of unnecessary page cache.

- Support multiple devices for multi blob container images;

- Support file-based on-demand loading with the Fscache infrastructure.

The following git tree provides the file system user-space tools under
@@ -259,7 +268,7 @@ By the way, chunk-based files are all uncompressed for now.

Data compression
----------------
EROFS implements LZ4 fixed-sized output compression which generates fixed-sized
EROFS implements fixed-sized output compression which generates fixed-sized
compressed data blocks from variable-sized input in contrast to other existing
fixed-sized input solutions. Relatively higher compression ratios can be gotten
by using fixed-sized output compression since nowadays popular data compression
@@ -314,3 +323,6 @@ to understand its delta0 is constantly 1, as illustrated below::

If another HEAD follows a HEAD lcluster, there is no room to record CBLKCNT,
but it's easy to know the size of such pcluster is 1 lcluster as well.

Since Linux v6.1, each pcluster can be used for multiple variable-sized extents,
therefore it can be used for compressed data deduplication.
77 changes: 50 additions & 27 deletions fs/cachefiles/io.c
@@ -385,38 +385,35 @@ static int cachefiles_write(struct netfs_cache_resources *cres,
term_func, term_func_priv);
}

/*
* Prepare a read operation, shortening it to a cached/uncached
* boundary as appropriate.
*/
static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *subreq,
loff_t i_size)
static inline enum netfs_io_source
cachefiles_do_prepare_read(struct netfs_cache_resources *cres,
loff_t start, size_t *_len, loff_t i_size,
unsigned long *_flags, ino_t netfs_ino)
{
enum cachefiles_prepare_read_trace why;
struct netfs_io_request *rreq = subreq->rreq;
struct netfs_cache_resources *cres = &rreq->cache_resources;
struct cachefiles_object *object;
struct cachefiles_object *object = NULL;
struct cachefiles_cache *cache;
struct fscache_cookie *cookie = fscache_cres_cookie(cres);
const struct cred *saved_cred;
struct file *file = cachefiles_cres_file(cres);
enum netfs_io_source ret = NETFS_DOWNLOAD_FROM_SERVER;
size_t len = *_len;
loff_t off, to;
ino_t ino = file ? file_inode(file)->i_ino : 0;
int rc;

_enter("%zx @%llx/%llx", subreq->len, subreq->start, i_size);
_enter("%zx @%llx/%llx", len, start, i_size);

if (subreq->start >= i_size) {
if (start >= i_size) {
ret = NETFS_FILL_WITH_ZEROES;
why = cachefiles_trace_read_after_eof;
goto out_no_object;
}

if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) {
__set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
__set_bit(NETFS_SREQ_COPY_TO_CACHE, _flags);
why = cachefiles_trace_read_no_data;
if (!test_bit(NETFS_SREQ_ONDEMAND, &subreq->flags))
if (!test_bit(NETFS_SREQ_ONDEMAND, _flags))
goto out_no_object;
}

@@ -437,7 +434,7 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
retry:
off = cachefiles_inject_read_error();
if (off == 0)
off = vfs_llseek(file, subreq->start, SEEK_DATA);
off = vfs_llseek(file, start, SEEK_DATA);
if (off < 0 && off >= (loff_t)-MAX_ERRNO) {
if (off == (loff_t)-ENXIO) {
why = cachefiles_trace_read_seek_nxio;
@@ -449,58 +446,83 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
goto out;
}

if (off >= subreq->start + subreq->len) {
if (off >= start + len) {
why = cachefiles_trace_read_found_hole;
goto download_and_store;
}

if (off > subreq->start) {
if (off > start) {
off = round_up(off, cache->bsize);
subreq->len = off - subreq->start;
len = off - start;
*_len = len;
why = cachefiles_trace_read_found_part;
goto download_and_store;
}

to = cachefiles_inject_read_error();
if (to == 0)
to = vfs_llseek(file, subreq->start, SEEK_HOLE);
to = vfs_llseek(file, start, SEEK_HOLE);
if (to < 0 && to >= (loff_t)-MAX_ERRNO) {
trace_cachefiles_io_error(object, file_inode(file), to,
cachefiles_trace_seek_error);
why = cachefiles_trace_read_seek_error;
goto out;
}

if (to < subreq->start + subreq->len) {
if (subreq->start + subreq->len >= i_size)
if (to < start + len) {
if (start + len >= i_size)
to = round_up(to, cache->bsize);
else
to = round_down(to, cache->bsize);
subreq->len = to - subreq->start;
len = to - start;
*_len = len;
}

why = cachefiles_trace_read_have_data;
ret = NETFS_READ_FROM_CACHE;
goto out;

download_and_store:
__set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
if (test_bit(NETFS_SREQ_ONDEMAND, &subreq->flags)) {
rc = cachefiles_ondemand_read(object, subreq->start,
subreq->len);
__set_bit(NETFS_SREQ_COPY_TO_CACHE, _flags);
if (test_bit(NETFS_SREQ_ONDEMAND, _flags)) {
rc = cachefiles_ondemand_read(object, start, len);
if (!rc) {
__clear_bit(NETFS_SREQ_ONDEMAND, &subreq->flags);
__clear_bit(NETFS_SREQ_ONDEMAND, _flags);
goto retry;
}
ret = NETFS_INVALID_READ;
}
out:
cachefiles_end_secure(cache, saved_cred);
out_no_object:
trace_cachefiles_prep_read(subreq, ret, why, ino);
trace_cachefiles_prep_read(object, start, len, *_flags, ret, why, ino, netfs_ino);
return ret;
}

/*
* Prepare a read operation, shortening it to a cached/uncached
* boundary as appropriate.
*/
static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *subreq,
loff_t i_size)
{
return cachefiles_do_prepare_read(&subreq->rreq->cache_resources,
subreq->start, &subreq->len, i_size,
&subreq->flags, subreq->rreq->inode->i_ino);
}

/*
* Prepare an on-demand read operation, shortening it to a cached/uncached
* boundary as appropriate.
*/
static enum netfs_io_source
cachefiles_prepare_ondemand_read(struct netfs_cache_resources *cres,
loff_t start, size_t *_len, loff_t i_size,
unsigned long *_flags, ino_t ino)
{
return cachefiles_do_prepare_read(cres, start, _len, i_size, _flags, ino);
}

/*
* Prepare for a write to occur.
*/
@@ -621,6 +643,7 @@ static const struct netfs_cache_ops cachefiles_netfs_cache_ops = {
.write = cachefiles_write,
.prepare_read = cachefiles_prepare_read,
.prepare_write = cachefiles_prepare_write,
.prepare_ondemand_read = cachefiles_prepare_ondemand_read,
.query_occupancy = cachefiles_query_occupancy,
};

10 changes: 4 additions & 6 deletions fs/erofs/data.c
@@ -13,9 +13,7 @@
void erofs_unmap_metabuf(struct erofs_buf *buf)
{
if (buf->kmap_type == EROFS_KMAP)
kunmap(buf->page);
else if (buf->kmap_type == EROFS_KMAP_ATOMIC)
kunmap_atomic(buf->base);
kunmap_local(buf->base);
buf->base = NULL;
buf->kmap_type = EROFS_NO_KMAP;
}
@@ -54,9 +52,7 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode,
}
if (buf->kmap_type == EROFS_NO_KMAP) {
if (type == EROFS_KMAP)
buf->base = kmap(page);
else if (type == EROFS_KMAP_ATOMIC)
buf->base = kmap_atomic(page);
buf->base = kmap_local_page(page);
buf->kmap_type = type;
} else if (buf->kmap_type != type) {
DBG_BUGON(1);
@@ -403,6 +399,8 @@ const struct address_space_operations erofs_raw_access_aops = {
.readahead = erofs_readahead,
.bmap = erofs_bmap,
.direct_IO = noop_direct_IO,
.release_folio = iomap_release_folio,
.invalidate_folio = iomap_invalidate_folio,
};

#ifdef CONFIG_FS_DAX
