Skip to content

Commit

Permalink
---
Browse files Browse the repository at this point in the history
yaml
---
r: 101133
b: refs/heads/master
c: 1dc60c5
h: refs/heads/master
i:
  101131: 0dd5674
v: v3
  • Loading branch information
Linus Torvalds committed Jul 15, 2008
1 parent f1ff18c commit b7ecc23
Show file tree
Hide file tree
Showing 167 changed files with 4,499 additions and 2,058 deletions.
2 changes: 1 addition & 1 deletion [refs]
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
---
refs/heads/master: 3f1c38723eb467d34d704d0ee6e7b796ba4981ee
refs/heads/master: 1dc60c53d36b08f361e1a2767c41196acce96d08
125 changes: 75 additions & 50 deletions trunk/Documentation/filesystems/ext4.txt
Original file line number Diff line number Diff line change
Expand Up @@ -13,72 +13,93 @@ Mailing list: linux-ext4@vger.kernel.org
1. Quick usage instructions:
===========================

- Grab updated e2fsprogs from
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim/
This is a patchset on top of e2fsprogs-1.39, which can be found at
- Compile and install the latest version of e2fsprogs (as of this
writing version 1.41) from:

http://sourceforge.net/project/showfiles.php?group_id=2406

or

ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

- It's still mke2fs -j /dev/hda1
or grab the latest git repository from:

git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

- Create a new filesystem using the ext4dev filesystem type:

# mke2fs -t ext4dev /dev/hda1

Or configure an existing ext3 filesystem to support extents and set
the test_fs flag to indicate that it's ok for an in-development
filesystem to touch this filesystem:

- mount /dev/hda1 /wherever -t ext4dev
# tune2fs -O extents -E test_fs /dev/hda1

- To enable extents,
If the filesystem was created with 128 byte inodes, it can be
converted to use 256 byte for greater efficiency via:

mount /dev/hda1 /wherever -t ext4dev -o extents
# tune2fs -I 256 /dev/hda1

- The filesystem is compatible with the ext3 driver until you add a file
which has extents (ie: `mount -o extents', then create a file).
(Note: we currently do not have tools to convert an ext4dev
filesystem back to ext3; so please do not do try this on production
filesystems.)

NOTE: The "extents" mount flag is temporary. It will soon go away and
extents will be enabled by the "-o extents" flag to mke2fs or tune2fs
- Mounting:

# mount -t ext4dev /dev/hda1 /wherever

- When comparing performance with other filesystems, remember that
ext3/4 by default offers higher data integrity guarantees than most. So
when comparing with a metadata-only journalling filesystem, use `mount -o
data=writeback'. And you might as well use `mount -o nobh' too along
with it. Making the journal larger than the mke2fs default often helps
performance with metadata-intensive workloads.
ext3/4 by default offers higher data integrity guarantees than most.
So when comparing with a metadata-only journalling filesystem, such
as ext3, use `mount -o data=writeback'. And you might as well use
`mount -o nobh' too along with it. Making the journal larger than
the mke2fs default often helps performance with metadata-intensive
workloads.

2. Features
===========

2.1 Currently available

* ability to use filesystems > 16TB
* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in face of on-disk corruption due to magics,
* internal redunancy in tree

2.1 Previously available, soon to be enabled by default by "mkefs.ext4":

* dir_index and resize inode will be on by default
* large inodes will be used by default for fast EAs, nsec timestamps, etc
* improved file allocation (multi-block alloc)
* fix 32000 subdirectory limit
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* journal checksumming for robustness, performance
* persistent file preallocation (e.g for streaming media, databases)
* ability to pack bitmaps and inode tables into larger virtual groups via the
flex_bg feature
* large file support
* Inode allocation using large virtual block groups via flex_bg
* delayed allocation
* large block (up to pagesize) support
* efficent new ordered mode in JBD2 and ext4(avoid using buffer head to force
the ordering)

2.2 Candidate features for future inclusion

There are several under discussion, whether they all make it in is
partly a function of how much time everyone has to work on them:
* Online defrag (patches available but not well tested)
* reduced mke2fs time via lazy itable initialization in conjuction with
the uninit_bg feature (capability to do this is available in e2fsprogs
but a kernel thread to do lazy zeroing of unused inode table blocks
after filesystem is first mounted is required for safety)

* improved file allocation (multi-block alloc, delayed alloc; basically done)
* fix 32000 subdirectory limit (patch exists, needs some e2fsck work)
* nsec timestamps for mtime, atime, ctime, create time (patch exists,
needs some e2fsck work)
* inode version field on disk (NFSv4, Lustre; prototype exists)
* reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
* journal checksumming for robustness, performance (prototype exists)
* persistent file preallocation (e.g for streaming media, databases)
There are several others under discussion, whether they all make it in is
partly a function of how much time everyone has to work on them. Features like
metadata checksumming have been discussed and planned for a bit but no patches
exist yet so I'm not sure they're in the near-term roadmap.

Features like metadata checksumming have been discussed and planned for
a bit but no patches exist yet so I'm not sure they're in the near-term
roadmap.
The big performance win will come with mballoc, delalloc and flex_bg
grouping of bitmaps and inode tables. Some test results available here:

The big performance win will come with mballoc and delalloc. CFS has
been using mballoc for a few years already with Lustre, and IBM + Bull
did a lot of benchmarking on it. The reason it isn't in the first set of
patches is partly a manageability issue, and partly because it doesn't
directly affect the on-disk format (outside of much better allocation)
so it isn't critical to get into the first round of changes. I believe
Alex is working on a new set of patches right now.
- http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
- http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html

3. Options
==========
Expand Down Expand Up @@ -222,9 +243,11 @@ stripe=n Number of filesystem blocks that mballoc will try
to use for allocation size and alignment. For RAID5/6
systems this should be the number of data
disks * RAID chunk size in file system blocks.

delalloc (*) Deferring block allocation until write-out time.
nodelalloc Disable delayed allocation. Blocks are allocation
when data is copied from user to page cache.
Data Mode
---------
=========
There are 3 different data modes:

* writeback mode
Expand All @@ -236,18 +259,19 @@ typically provide the best ext4 performance.

* ordered mode
In data=ordered mode, ext4 only officially journals metadata, but it logically
groups metadata and data blocks into a single unit called a transaction. When
it's time to write the new metadata out to disk, the associated data blocks
are written first. In general, this mode performs slightly slower than
writeback but significantly faster than journal mode.
groups metadata information related to data changes with the data blocks into a
single unit called a transaction. When it's time to write the new metadata
out to disk, the associated data blocks are written first. In general,
this mode performs slightly slower than writeback but significantly faster than journal mode.

* journal mode
data=journal mode provides full data and metadata journaling. All new data is
written to the journal first, and then to its final location.
In the event of a crash, the journal can be replayed, bringing both data and
metadata into a consistent state. This mode is the slowest except when data
needs to be read from and written to disk at the same time where it
outperforms all others modes.
outperforms all others modes. Curently ext4 does not have delayed
allocation support if this data journalling mode is selected.

References
==========
Expand All @@ -256,7 +280,8 @@ kernel source: <file:fs/ext4/>
<file:fs/jbd2/>

programs: http://e2fsprogs.sourceforge.net/
http://ext2resize.sourceforge.net

useful links: http://fedoraproject.org/wiki/ext3-devel
http://www.bullopensource.org/ext4/
http://ext4.wiki.kernel.org/index.php/Main_Page
http://fedoraproject.org/wiki/Features/Ext4
5 changes: 5 additions & 0 deletions trunk/drivers/gpu/drm/drm_memory.c
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,11 @@ void drm_core_ioremap(struct drm_map *map, struct drm_device *dev)
}
EXPORT_SYMBOL(drm_core_ioremap);

void drm_core_ioremap_wc(struct drm_map *map, struct drm_device *dev)
{
map->handle = ioremap_wc(map->offset, map->size);
}
EXPORT_SYMBOL(drm_core_ioremap_wc);
void drm_core_ioremapfree(struct drm_map *map, struct drm_device *dev)
{
if (!map->handle || !map->size)
Expand Down
2 changes: 1 addition & 1 deletion trunk/drivers/gpu/drm/radeon/radeon_cp.c
Original file line number Diff line number Diff line change
Expand Up @@ -1154,7 +1154,7 @@ static int radeon_do_init_cp(struct drm_device * dev, drm_radeon_init_t * init)
dev_priv->gart_info.mapping.size =
dev_priv->gart_info.table_size;

drm_core_ioremap(&dev_priv->gart_info.mapping, dev);
drm_core_ioremap_wc(&dev_priv->gart_info.mapping, dev);
dev_priv->gart_info.addr =
dev_priv->gart_info.mapping.handle;

Expand Down
42 changes: 24 additions & 18 deletions trunk/drivers/infiniband/core/addr.c
Original file line number Diff line number Diff line change
Expand Up @@ -4,28 +4,33 @@
* Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved.
* Copyright (c) 2005 Intel Corporation. All rights reserved.
*
* This Software is licensed under one of the following licenses:
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* 1) under the terms of the "Common Public License 1.0" a copy of which is
* available from the Open Source Initiative, see
* http://www.opensource.org/licenses/cpl.php.
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* 2) under the terms of the "The BSD License" a copy of which is
* available from the Open Source Initiative, see
* http://www.opensource.org/licenses/bsd-license.php.
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* 3) under the terms of the "GNU General Public License (GPL) Version 2" a
* copy of which is available from the Open Source Initiative, see
* http://www.opensource.org/licenses/gpl-license.php.
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* Licensee has the right to choose one of the above licenses.
*
* Redistributions of source code must retain the above copyright
* notice and one of the license notices.
*
* Redistributions in binary form must reproduce both the above copyright
* notice, one of the license notices in the documentation
* and/or other materials provided with the distribution.
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/

#include <linux/mutex.h>
Expand Down Expand Up @@ -100,6 +105,7 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
memcpy(dev_addr->broadcast, dev->broadcast, MAX_ADDR_LEN);
if (dst_dev_addr)
memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN);
dev_addr->src_dev = dev;
return 0;
}
EXPORT_SYMBOL(rdma_copy_addr);
Expand Down
2 changes: 0 additions & 2 deletions trunk/drivers/infiniband/core/agent.h
Original file line number Diff line number Diff line change
Expand Up @@ -32,8 +32,6 @@
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* $Id: agent.h 1389 2004-12-27 22:56:47Z roland $
*/

#ifndef __AGENT_H_
Expand Down
2 changes: 0 additions & 2 deletions trunk/drivers/infiniband/core/cache.c
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,6 @@
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* $Id: cache.c 1349 2004-12-16 21:09:43Z roland $
*/

#include <linux/module.h>
Expand Down
2 changes: 0 additions & 2 deletions trunk/drivers/infiniband/core/cm.c
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,6 @@
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* $Id: cm.c 4311 2005-12-05 18:42:01Z sean.hefty $
*/

#include <linux/completion.h>
Expand Down
Loading

0 comments on commit b7ecc23

Please sign in to comment.