Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Remove automatic enabling of the HUGE_FILE feature flag
  ext4: Replace hackish ext4_mb_poll_new_transaction with commit callback
  ext4: Update Documentation/filesystems/ext4.txt
  ext4: Remove unused mount options: nomballoc, mballoc, nocheck
  ext4: Remove compile warnings when building w/o CONFIG_PROC_FS
  ext4: Add missing newlines to printk messages
  ext4: Fix file fragmentation during large file write.
  vfs: Add no_nrwrite_index_update writeback control flag
  vfs: Remove the range_cont writeback mode.
  ext4: Use tag dirty lookup during mpage_da_submit_io
  ext4: let the block device know when unused blocks can be discarded
  ext4: Don't reuse released data blocks until transaction commits
  ext4: Use an rbtree for tracking blocks freed during transaction.
  ext4: Do mballoc init before doing filesystem recovery
  ext4: Free ext4_prealloc_space using kmem_cache_free
  ext4: Fix Kconfig typo for ext4dev
  ext4: Remove an old reference to ext4dev in Makefile comment
Linus Torvalds committed Oct 17, 2008
2 parents 26e9a39 + f287a1a commit 58617d5
Showing 15 changed files with 320 additions and 336 deletions.
32 changes: 15 additions & 17 deletions Documentation/filesystems/ext4.txt
@@ -2,19 +2,24 @@
Ext4 Filesystem
===============

This is a development version of the ext4 filesystem, an advanced level
of the ext3 filesystem which incorporates scalability and reliability
enhancements for supporting large filesystems (64 bit) in keeping with
increasing disk capacities and state-of-the-art feature requirements.
Ext4 is an advanced level of the ext3 filesystem which incorporates
scalability and reliability enhancements for supporting large filesystems
(64 bit) in keeping with increasing disk capacities and state-of-the-art
feature requirements.

Mailing list: linux-ext4@vger.kernel.org
Mailing list: linux-ext4@vger.kernel.org
Web site: http://ext4.wiki.kernel.org


1. Quick usage instructions:
===========================

Note: More extensive information for getting started with ext4 can be
found at the ext4 wiki site at the URL:
http://ext4.wiki.kernel.org/index.php/Ext4_Howto

- Compile and install the latest version of e2fsprogs (as of this
writing version 1.41) from:
writing version 1.41.3) from:

http://sourceforge.net/project/showfiles.php?group_id=2406

@@ -36,11 +41,9 @@ Mailing list: linux-ext4@vger.kernel.org

# mke2fs -t ext4 /dev/hda1

Or configure an existing ext3 filesystem to support extents and set
the test_fs flag to indicate that it's ok for an in-development
filesystem to touch this filesystem:
Or to configure an existing ext3 filesystem to support extents:

# tune2fs -O extents -E test_fs /dev/hda1
# tune2fs -O extents /dev/hda1

If the filesystem was created with 128 byte inodes, it can be
converted to use 256 byte for greater efficiency via:
@@ -104,8 +107,8 @@ exist yet so I'm not sure they're in the near-term roadmap.
The big performance win will come with mballoc, delalloc and flex_bg
grouping of bitmaps and inode tables. Some test results available here:

- http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
- http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html
- http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-write-2.6.27-rc1.html
- http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-readwrite-2.6.27-rc1.html

3. Options
==========
@@ -214,9 +217,6 @@ noreservation
bsddf (*) Make 'df' act like BSD.
minixdf Make 'df' act like Minix.

check=none Don't do extra checking of bitmaps on mount.
nocheck

debug Extra debugging information is sent to syslog.

errors=remount-ro(*) Remount the filesystem read-only on an error.
@@ -253,8 +253,6 @@ nobh (a) cache disk block mapping information
"nobh" option tries to avoid associating buffer
heads (supported only for "writeback" mode).

mballoc (*) Use the multiple block allocator for block allocation
nomballoc disabled multiple block allocator for block allocation.
stripe=n Number of filesystem blocks that mballoc will try
to use for allocation size and alignment. For RAID5/6
systems this should be the number of data
2 changes: 1 addition & 1 deletion fs/Kconfig
@@ -160,7 +160,7 @@ config EXT4_FS
filesystem initially.

To compile this file system support as a module, choose M here. The
module will be called ext4dev.
module will be called ext4.

If unsure, say N.

2 changes: 1 addition & 1 deletion fs/Makefile
@@ -71,7 +71,7 @@ obj-$(CONFIG_DLM) += dlm/
# Do not add any filesystems before this line
obj-$(CONFIG_REISERFS_FS) += reiserfs/
obj-$(CONFIG_EXT3_FS) += ext3/ # Before ext2 so root fs can be ext3
obj-$(CONFIG_EXT4_FS) += ext4/ # Before ext2 so root fs can be ext4dev
obj-$(CONFIG_EXT4_FS) += ext4/ # Before ext2 so root fs can be ext4
obj-$(CONFIG_JBD) += jbd/
obj-$(CONFIG_JBD2) += jbd2/
obj-$(CONFIG_EXT2_FS) += ext2/
12 changes: 10 additions & 2 deletions fs/ext4/balloc.c
@@ -568,8 +568,16 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,

/* this isn't the right place to decide whether block is metadata
* inode.c/extents.c knows better, but for safety ... */
if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode) ||
ext4_should_journal_data(inode))
if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
metadata = 1;

/* We need to make sure we don't reuse
* blocks released until the transaction commits.
* Writeback mode has weak data consistency, so
* don't force data as metadata when freeing blocks
* for writeback mode.
*/
if (metadata == 0 && !ext4_should_writeback_data(inode))
metadata = 1;

sb = inode->i_sb;
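
The decision made by the new version of this hunk can be sketched as a small standalone model. This is an illustration only, not kernel code: `struct inode_model`, `enum journal_mode`, and `free_as_metadata` are hypothetical names, and the real `ext4_should_writeback_data()` checks more than the mount mode, so the `MODE_WRITEBACK` comparison is a simplifying assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the inode properties tested in the hunk. */
enum journal_mode { MODE_JOURNAL, MODE_ORDERED, MODE_WRITEBACK };

struct inode_model {
    bool is_dir;            /* models S_ISDIR(inode->i_mode) */
    bool is_symlink;        /* models S_ISLNK(inode->i_mode) */
    enum journal_mode mode; /* assumption: stands in for ext4_should_writeback_data() */
};

/*
 * Returns true when freed blocks are handled as metadata, meaning they
 * are not reused until the running transaction commits.
 */
static bool free_as_metadata(const struct inode_model *inode)
{
    /* Directories and symlinks are always treated as metadata. */
    bool metadata = inode->is_dir || inode->is_symlink;

    /* Data blocks are also held back unless the inode is in writeback mode. */
    if (!metadata && inode->mode != MODE_WRITEBACK)
        metadata = true;

    return metadata;
}
```

Under this model, only data blocks of a writeback-mode inode are released for immediate reuse; every other combination is held until commit.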
1 change: 0 additions & 1 deletion fs/ext4/ext4.h
@@ -511,7 +511,6 @@ do { \
/*
* Mount flags
*/
#define EXT4_MOUNT_CHECK 0x00001 /* Do mount-time checks */
#define EXT4_MOUNT_OLDALLOC 0x00002 /* Don't use the new Orlov allocator */
#define EXT4_MOUNT_GRPID 0x00004 /* Create files with directory's group */
#define EXT4_MOUNT_DEBUG 0x00008 /* Some debugging messages */
3 changes: 0 additions & 3 deletions fs/ext4/ext4_sb.h
@@ -99,9 +99,6 @@ struct ext4_sb_info {
struct inode *s_buddy_cache;
long s_blocks_reserved;
spinlock_t s_reserve_lock;
struct list_head s_active_transaction;
struct list_head s_closed_transaction;
struct list_head s_committed_transaction;
spinlock_t s_md_lock;
tid_t s_last_transaction;
unsigned short *s_mb_offsets, *s_mb_maxs;
143 changes: 76 additions & 67 deletions fs/ext4/inode.c
@@ -1648,27 +1648,38 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd)
int ret = 0, err, nr_pages, i;
unsigned long index, end;
struct pagevec pvec;
long pages_skipped;

BUG_ON(mpd->next_page <= mpd->first_page);
pagevec_init(&pvec, 0);
index = mpd->first_page;
end = mpd->next_page - 1;

while (index <= end) {
/* XXX: optimize tail */
nr_pages = pagevec_lookup(&pvec, mapping, index, PAGEVEC_SIZE);
/*
* We can use PAGECACHE_TAG_DIRTY lookup here because
* even though we have cleared the dirty flag on the page
* We still keep the page in the radix tree with tag
* PAGECACHE_TAG_DIRTY. See clear_page_dirty_for_io.
* The PAGECACHE_TAG_DIRTY is cleared in set_page_writeback
* which is called via the below writepage callback.
*/
nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
PAGECACHE_TAG_DIRTY,
min(end - index,
(pgoff_t)PAGEVEC_SIZE-1) + 1);
if (nr_pages == 0)
break;
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];

index = page->index;
if (index > end)
break;
index++;

pages_skipped = mpd->wbc->pages_skipped;
err = mapping->a_ops->writepage(page, mpd->wbc);
if (!err)
if (!err && (pages_skipped == mpd->wbc->pages_skipped))
/*
* have successfully written the page
* without skipping the same
*/
mpd->pages_written++;
/*
* In error case, we have to continue because
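
The batch-size expression passed to `pagevec_lookup_tag()` in the hunk above, `min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1`, can be checked in isolation. The sketch below is a userspace model: `lookup_batch` and `PAGEVEC_SIZE_MODEL` are hypothetical names, and the value 14 is an assumption about the pagevec capacity in kernels of this era. One property of this form is that it never computes `end - index + 1` directly, so it stays correct even when `end` is the largest representable index.

```c
#include <assert.h>

/* Assumption: models the kernel's PAGEVEC_SIZE; the value is illustrative. */
#define PAGEVEC_SIZE_MODEL 14UL

/*
 * Number of pages to request for the inclusive range [index, end],
 * capped at the pagevec capacity.
 */
static unsigned long lookup_batch(unsigned long index, unsigned long end)
{
    assert(index <= end);

    unsigned long remaining = end - index;       /* pages after 'index' */
    unsigned long cap = PAGEVEC_SIZE_MODEL - 1;  /* capacity minus the first page */

    return (remaining < cap ? remaining : cap) + 1;
}
```

For example, `lookup_batch(5, 9)` asks for the five pages 5 through 9, while any range longer than the capacity is clamped to 14.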
@@ -2104,7 +2115,6 @@ static int mpage_da_writepages(struct address_space *mapping,
struct writeback_control *wbc,
struct mpage_da_data *mpd)
{
long to_write;
int ret;

if (!mpd->get_block)
@@ -2119,19 +2129,18 @@ static int mpage_da_writepages(struct address_space *mapping,
mpd->pages_written = 0;
mpd->retval = 0;

to_write = wbc->nr_to_write;

ret = write_cache_pages(mapping, wbc, __mpage_da_writepage, mpd);

/*
* Handle last extent of pages
*/
if (!mpd->io_done && mpd->next_page != mpd->first_page) {
if (mpage_da_map_blocks(mpd) == 0)
mpage_da_submit_io(mpd);
}

wbc->nr_to_write = to_write - mpd->pages_written;
mpd->io_done = 1;
ret = MPAGE_DA_EXTENT_TAIL;
}
wbc->nr_to_write -= mpd->pages_written;
return ret;
}

@@ -2360,12 +2369,14 @@ static int ext4_da_writepages_trans_blocks(struct inode *inode)
static int ext4_da_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
pgoff_t index;
int range_whole = 0;
handle_t *handle = NULL;
loff_t range_start = 0;
struct mpage_da_data mpd;
struct inode *inode = mapping->host;
int no_nrwrite_index_update;
long pages_written = 0, pages_skipped;
int needed_blocks, ret = 0, nr_to_writebump = 0;
long to_write, pages_skipped = 0;
struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);

/*
@@ -2385,23 +2396,26 @@ static int ext4_da_writepages(struct address_space *mapping,
nr_to_writebump = sbi->s_mb_stream_request - wbc->nr_to_write;
wbc->nr_to_write = sbi->s_mb_stream_request;
}
if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
range_whole = 1;

if (!wbc->range_cyclic)
/*
* If range_cyclic is not set force range_cont
* and save the old writeback_index
*/
wbc->range_cont = 1;

range_start = wbc->range_start;
pages_skipped = wbc->pages_skipped;
if (wbc->range_cyclic)
index = mapping->writeback_index;
else
index = wbc->range_start >> PAGE_CACHE_SHIFT;

mpd.wbc = wbc;
mpd.inode = mapping->host;

restart_loop:
to_write = wbc->nr_to_write;
while (!ret && to_write > 0) {
/*
* we don't want write_cache_pages to update
* nr_to_write and writeback_index
*/
no_nrwrite_index_update = wbc->no_nrwrite_index_update;
wbc->no_nrwrite_index_update = 1;
pages_skipped = wbc->pages_skipped;

while (!ret && wbc->nr_to_write > 0) {

/*
* we insert one extent at a time. So we need
@@ -2422,48 +2436,53 @@ static int ext4_da_writepages(struct address_space *mapping,
dump_stack();
goto out_writepages;
}
to_write -= wbc->nr_to_write;

mpd.get_block = ext4_da_get_block_write;
ret = mpage_da_writepages(mapping, wbc, &mpd);

ext4_journal_stop(handle);

if (mpd.retval == -ENOSPC)
if (mpd.retval == -ENOSPC) {
/* commit the transaction which would
* free blocks released in the transaction
* and try again
*/
jbd2_journal_force_commit_nested(sbi->s_journal);

/* reset the retry count */
if (ret == MPAGE_DA_EXTENT_TAIL) {
wbc->pages_skipped = pages_skipped;
ret = 0;
} else if (ret == MPAGE_DA_EXTENT_TAIL) {
/*
* got one extent now try with
* rest of the pages
*/
to_write += wbc->nr_to_write;
pages_written += mpd.pages_written;
wbc->pages_skipped = pages_skipped;
ret = 0;
} else if (wbc->nr_to_write) {
} else if (wbc->nr_to_write)
/*
* There is no more writeout needed
* or we requested for a noblocking writeout
* and we found the device congested
*/
to_write += wbc->nr_to_write;
break;
}
wbc->nr_to_write = to_write;
}

if (wbc->range_cont && (pages_skipped != wbc->pages_skipped)) {
/* We skipped pages in this loop */
wbc->range_start = range_start;
wbc->nr_to_write = to_write +
wbc->pages_skipped - pages_skipped;
wbc->pages_skipped = pages_skipped;
goto restart_loop;
}
if (pages_skipped != wbc->pages_skipped)
printk(KERN_EMERG "This should not happen leaving %s "
"with nr_to_write = %ld ret = %d\n",
__func__, wbc->nr_to_write, ret);

/* Update index */
index += pages_written;
if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
/*
* set the writeback_index so that range_cyclic
* mode will write it back later
*/
mapping->writeback_index = index;

out_writepages:
wbc->nr_to_write = to_write - nr_to_writebump;
wbc->range_start = range_start;
if (!no_nrwrite_index_update)
wbc->no_nrwrite_index_update = 0;
wbc->nr_to_write -= nr_to_writebump;
return ret;
}
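
The index bookkeeping that the new `ext4_da_writepages()` does by hand (choose a start index, advance it by the pages written, then decide whether to store it back in `mapping->writeback_index`) can be modelled as below. This is a sketch under stated assumptions: `struct wbc_model`, `start_index`, `finish_index`, and the 4 KiB page shift are hypothetical, and the loop that actually writes pages is left out.

```c
#include <limits.h>
#include <stdbool.h>

#define PAGE_SHIFT_MODEL 12 /* assumption: 4 KiB pages */

/* Hypothetical subset of struct writeback_control used by the hunk. */
struct wbc_model {
    bool range_cyclic;
    long long range_start;
    long long range_end;
    long nr_to_write; /* budget left after the write loop */
};

/* Start where the last cyclic pass stopped, or at the requested offset. */
static unsigned long start_index(const struct wbc_model *wbc,
                                 unsigned long writeback_index)
{
    if (wbc->range_cyclic)
        return writeback_index;
    return (unsigned long)(wbc->range_start >> PAGE_SHIFT_MODEL);
}

/* Returns the value writeback_index holds once the function exits. */
static unsigned long finish_index(const struct wbc_model *wbc,
                                  unsigned long writeback_index,
                                  long pages_written)
{
    bool range_whole = wbc->range_start == 0 && wbc->range_end == LLONG_MAX;
    unsigned long index = start_index(wbc, writeback_index) + pages_written;

    /* Only cyclic writeback, or a whole-file pass with budget left, saves it. */
    if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
        return index;
    return writeback_index;
}
```

In this model a partial-range request never disturbs the saved index, which matches the condition guarding the assignment in the hunk.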

@@ -4175,7 +4194,6 @@ static int ext4_inode_blocks_set(handle_t *handle,
struct inode *inode = &(ei->vfs_inode);
u64 i_blocks = inode->i_blocks;
struct super_block *sb = inode->i_sb;
int err = 0;

if (i_blocks <= ~0U) {
/*
@@ -4185,36 +4203,27 @@ static int ext4_inode_blocks_set(handle_t *handle,
raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
raw_inode->i_blocks_high = 0;
ei->i_flags &= ~EXT4_HUGE_FILE_FL;
} else if (i_blocks <= 0xffffffffffffULL) {
return 0;
}
if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_HUGE_FILE))
return -EFBIG;

if (i_blocks <= 0xffffffffffffULL) {
/*
* i_blocks can be represented in a 48 bit variable
* as multiple of 512 bytes
*/
err = ext4_update_rocompat_feature(handle, sb,
EXT4_FEATURE_RO_COMPAT_HUGE_FILE);
if (err)
goto err_out;
/* i_block is stored in the split 48 bit fields */
raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
ei->i_flags &= ~EXT4_HUGE_FILE_FL;
} else {
/*
* i_blocks should be represented in a 48 bit variable
* as multiple of file system block size
*/
err = ext4_update_rocompat_feature(handle, sb,
EXT4_FEATURE_RO_COMPAT_HUGE_FILE);
if (err)
goto err_out;
ei->i_flags |= EXT4_HUGE_FILE_FL;
/* i_block is stored in file system block size */
i_blocks = i_blocks >> (inode->i_blkbits - 9);
raw_inode->i_blocks_lo = cpu_to_le32(i_blocks);
raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
}
err_out:
return err;
return 0;
}

/*
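
The three encodings chosen by the new `ext4_inode_blocks_set()` can be exercised with a userspace model. The sketch below is not the kernel function: `struct raw_blocks` and `encode_i_blocks` are hypothetical names, the `has_huge_file` parameter stands in for the `EXT4_HAS_RO_COMPAT_FEATURE()` check, and the little-endian conversions (`cpu_to_le32`, `cpu_to_le16`) are omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the on-disk fields touched by the hunk. */
struct raw_blocks {
    uint32_t lo;         /* models raw_inode->i_blocks_lo */
    uint16_t high;       /* models raw_inode->i_blocks_high */
    bool huge_file_flag; /* models EXT4_HUGE_FILE_FL in ei->i_flags */
};

/*
 * i_blocks is in 512-byte units; blkbits is log2 of the filesystem block
 * size. Returns 0 on success or -EFBIG when the count needs more than
 * 32 bits and the HUGE_FILE feature is not enabled.
 */
static int encode_i_blocks(uint64_t i_blocks, unsigned int blkbits,
                           bool has_huge_file, struct raw_blocks *out)
{
    if (i_blocks <= 0xffffffffULL) {
        /* Fits in 32 bits: no feature flag required. */
        out->lo = (uint32_t)i_blocks;
        out->high = 0;
        out->huge_file_flag = false;
        return 0;
    }
    if (!has_huge_file)
        return -EFBIG;

    if (i_blocks <= 0xffffffffffffULL) {
        /* Fits in 48 bits as a count of 512-byte units. */
        out->huge_file_flag = false;
    } else {
        /* Store a count of filesystem blocks instead. */
        out->huge_file_flag = true;
        i_blocks >>= (blkbits - 9);
    }
    out->lo = (uint32_t)i_blocks;
    out->high = (uint16_t)(i_blocks >> 32);
    return 0;
}
```

With 4 KiB blocks (`blkbits` of 12), a count of 2^48 sectors is stored as 2^45 filesystem blocks with the huge-file flag set, while the same call without the feature enabled fails with `-EFBIG` rather than setting the feature automatically.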