Skip to content

Commit

Permalink
---
Browse files Browse the repository at this point in the history
yaml
---
r: 242726
b: refs/heads/master
c: cf55bb2
h: refs/heads/master
v: v3
  • Loading branch information
Roland Dreier committed Mar 25, 2011
1 parent 9f15145 commit 88b1e82
Show file tree
Hide file tree
Showing 644 changed files with 12,673 additions and 24,405 deletions.
2 changes: 1 addition & 1 deletion [refs]
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
---
refs/heads/master: 9f34217c846a96dea03f4418e2f27423658d3542
refs/heads/master: cf55bb2439d2a7080fae6edf84919fd81f891574
13 changes: 3 additions & 10 deletions trunk/Documentation/ABI/testing/sysfs-fs-ext4
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ Description:
will have its blocks allocated out of its own unique
preallocation pool.

What: /sys/fs/ext4/<disk>/inode_readahead_blks
What: /sys/fs/ext4/<disk>/inode_readahead
Date: March 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
Expand Down Expand Up @@ -85,14 +85,7 @@ Date: June 2008
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
Tuning parameter which (if non-zero) controls the goal
inode used by the inode allocator in preference to
all other allocation heuristics. This is intended for
inode used by the inode allocator in p0reference to
all other allocation hueristics. This is intended for
debugging use only, and should be 0 on production
systems.

What: /sys/fs/ext4/<disk>/max_writeback_mb_bump
Date: September 2009
Contact: "Theodore Ts'o" <tytso@mit.edu>
Description:
The maximum number of megabytes the writeback code will
try to write out before move on to another inode.
17 changes: 0 additions & 17 deletions trunk/Documentation/device-mapper/dm-flakey.txt

This file was deleted.

2 changes: 1 addition & 1 deletion trunk/Documentation/filesystems/Locking
Original file line number Diff line number Diff line change
Expand Up @@ -128,7 +128,7 @@ alloc_inode:
destroy_inode:
dirty_inode: (must not sleep)
write_inode:
drop_inode: !!!inode->i_lock!!!
drop_inode: !!!inode_lock!!!
evict_inode:
put_super: write
write_super: read
Expand Down
207 changes: 1 addition & 206 deletions trunk/Documentation/filesystems/ext4.txt
Original file line number Diff line number Diff line change
Expand Up @@ -367,47 +367,12 @@ init_itable=n The lazy itable init code will wait n times the
minimizes the impact on the systme performance
while file system's inode table is being initialized.

discard Controls whether ext4 should issue discard/TRIM
discard Controls whether ext4 should issue discard/TRIM
nodiscard(*) commands to the underlying block device when
blocks are freed. This is useful for SSD devices
and sparse/thinly-provisioned LUNs, but it is off
by default until sufficient testing has been done.

nouid32 Disables 32-bit UIDs and GIDs. This is for
interoperability with older kernels which only
store and expect 16-bit values.

resize Allows to resize filesystem to the end of the last
existing block group, further resize has to be done
with resize2fs either online, or offline. It can be
used only with conjunction with remount.

block_validity This options allows to enables/disables the in-kernel
noblock_validity facility for tracking filesystem metadata blocks
within internal data structures. This allows multi-
block allocator and other routines to quickly locate
extents which might overlap with filesystem metadata
blocks. This option is intended for debugging
purposes and since it negatively affects the
performance, it is off by default.

dioread_lock Controls whether or not ext4 should use the DIO read
dioread_nolock locking. If the dioread_nolock option is specified
ext4 will allocate uninitialized extent before buffer
write and convert the extent to initialized after IO
completes. This approach allows ext4 code to avoid
using inode mutex, which improves scalability on high
speed storages. However this does not work with nobh
option and the mount will fail. Nor does it work with
data journaling and dioread_nolock option will be
ignored with kernel warning. Note that dioread_nolock
code path is only used for extent-based files.
Because of the restrictions this options comprises
it is off by default (e.g. dioread_lock).

i_version Enable 64-bit inode version support. This option is
off by default.

Data Mode
=========
There are 3 different data modes:
Expand Down Expand Up @@ -435,176 +400,6 @@ needs to be read from and written to disk at the same time where it
outperforms all others modes. Currently ext4 does not have delayed
allocation support if this data journalling mode is selected.

/proc entries
=============

Information about mounted ext4 file systems can be found in
/proc/fs/ext4. Each mounted filesystem will have a directory in
/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or
/proc/fs/ext4/dm-0). The files in each per-device directory are shown
in table below.

Files in /proc/fs/ext4/<devname>
..............................................................................
File Content
mb_groups details of multiblock allocator buddy cache of free blocks
..............................................................................

/sys entries
============

Information about mounted ext4 file systems can be found in
/sys/fs/ext4. Each mounted filesystem will have a directory in
/sys/fs/ext4 based on its device name (i.e., /sys/fs/ext4/hdc or
/sys/fs/ext4/dm-0). The files in each per-device directory are shown
in table below.

Files in /sys/fs/ext4/<devname>
(see also Documentation/ABI/testing/sysfs-fs-ext4)
..............................................................................
File Content

delayed_allocation_blocks This file is read-only and shows the number of
blocks that are dirty in the page cache, but
which do not have their location in the
filesystem allocated yet.

inode_goal Tuning parameter which (if non-zero) controls
the goal inode used by the inode allocator in
preference to all other allocation heuristics.
This is intended for debugging use only, and
should be 0 on production systems.

inode_readahead_blks Tuning parameter which controls the maximum
number of inode table blocks that ext4's inode
table readahead algorithm will pre-read into
the buffer cache

lifetime_write_kbytes This file is read-only and shows the number of
kilobytes of data that have been written to this
filesystem since it was created.

max_writeback_mb_bump The maximum number of megabytes the writeback
code will try to write out before move on to
another inode.

mb_group_prealloc The multiblock allocator will round up allocation
requests to a multiple of this tuning parameter if
the stripe size is not set in the ext4 superblock

mb_max_to_scan The maximum number of extents the multiblock
allocator will search to find the best extent

mb_min_to_scan The minimum number of extents the multiblock
allocator will search to find the best extent

mb_order2_req Tuning parameter which controls the minimum size
for requests (as a power of 2) where the buddy
cache is used

mb_stats Controls whether the multiblock allocator should
collect statistics, which are shown during the
unmount. 1 means to collect statistics, 0 means
not to collect statistics

mb_stream_req Files which have fewer blocks than this tunable
parameter will have their blocks allocated out
of a block group specific preallocation pool, so
that small files are packed closely together.
Each large file will have its blocks allocated
out of its own unique preallocation pool.

session_write_kbytes This file is read-only and shows the number of
kilobytes of data that have been written to this
filesystem since it was mounted.
..............................................................................

Ioctls
======

There is some Ext4 specific functionality which can be accessed by applications
through the system call interfaces. The list of all Ext4 specific ioctls are
shown in the table below.

Table of Ext4 specific ioctls
..............................................................................
Ioctl Description
EXT4_IOC_GETFLAGS Get additional attributes associated with inode.
The ioctl argument is an integer bitfield, with
bit values described in ext4.h. This ioctl is an
alias for FS_IOC_GETFLAGS.

EXT4_IOC_SETFLAGS Set additional attributes associated with inode.
The ioctl argument is an integer bitfield, with
bit values described in ext4.h. This ioctl is an
alias for FS_IOC_SETFLAGS.

EXT4_IOC_GETVERSION
EXT4_IOC_GETVERSION_OLD
Get the inode i_generation number stored for
each inode. The i_generation number is normally
changed only when new inode is created and it is
particularly useful for network filesystems. The
'_OLD' version of this ioctl is an alias for
FS_IOC_GETVERSION.

EXT4_IOC_SETVERSION
EXT4_IOC_SETVERSION_OLD
Set the inode i_generation number stored for
each inode. The '_OLD' version of this ioctl
is an alias for FS_IOC_SETVERSION.

EXT4_IOC_GROUP_EXTEND This ioctl has the same purpose as the resize
mount option. It allows to resize filesystem
to the end of the last existing block group,
further resize has to be done with resize2fs,
either online, or offline. The argument points
to the unsigned logn number representing the
filesystem new block count.

EXT4_IOC_MOVE_EXT Move the block extents from orig_fd (the one
this ioctl is pointing to) to the donor_fd (the
one specified in move_extent structure passed
as an argument to this ioctl). Then, exchange
inode metadata between orig_fd and donor_fd.
This is especially useful for online
defragmentation, because the allocator has the
opportunity to allocate moved blocks better,
ideally into one contiguous extent.

EXT4_IOC_GROUP_ADD Add a new group descriptor to an existing or
new group descriptor block. The new group
descriptor is described by ext4_new_group_input
structure, which is passed as an argument to
this ioctl. This is especially useful in
conjunction with EXT4_IOC_GROUP_EXTEND,
which allows online resize of the filesystem
to the end of the last existing block group.
Those two ioctls combined is used in userspace
online resize tool (e.g. resize2fs).

EXT4_IOC_MIGRATE This ioctl operates on the filesystem itself.
It converts (migrates) ext3 indirect block mapped
inode to ext4 extent mapped inode by walking
through indirect block mapping of the original
inode and converting contiguous block ranges
into ext4 extents of the temporary inode. Then,
inodes are swapped. This ioctl might help, when
migrating from ext3 to ext4 filesystem, however
suggestion is to create fresh ext4 filesystem
and copy data from the backup. Note, that
filesystem has to support extents for this ioctl
to work.

EXT4_IOC_ALLOC_DA_BLKS Force all of the delay allocated blocks to be
allocated to preserve application-expected ext3
behaviour. Note that this will also start
triggering a write of the data blocks, but this
behaviour may change in the future as it is
not necessary and has been done this way only
for sake of simplicity.
..............................................................................

References
==========

Expand Down
16 changes: 5 additions & 11 deletions trunk/Documentation/filesystems/porting
Original file line number Diff line number Diff line change
Expand Up @@ -298,14 +298,11 @@ be used instead. It gets called whenever the inode is evicted, whether it has
remaining links or not. Caller does *not* evict the pagecache or inode-associated
metadata buffers; getting rid of those is responsibility of method, as it had
been for ->delete_inode().

->drop_inode() returns int now; it's called on final iput() with
inode->i_lock held and it returns true if filesystems wants the inode to be
dropped. As before, generic_drop_inode() is still the default and it's been
updated appropriately. generic_delete_inode() is also alive and it consists
simply of return 1. Note that all actual eviction work is done by caller after
->drop_inode() returns.

->drop_inode() returns int now; it's called on final iput() with inode_lock
held and it returns true if filesystems wants the inode to be dropped. As before,
generic_drop_inode() is still the default and it's been updated appropriately.
generic_delete_inode() is also alive and it consists simply of return 1. Note that
all actual eviction work is done by caller after ->drop_inode() returns.
clear_inode() is gone; use end_writeback() instead. As before, it must
be called exactly once on each call of ->evict_inode() (as it used to be for
each call of ->delete_inode()). Unlike before, if you are using inode-associated
Expand Down Expand Up @@ -398,9 +395,6 @@ Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
so the i_size should not change when hole punching, even when puching the end of
a file off.

--
[mandatory]

--
[mandatory]
->get_sb() is gone. Switch to use of ->mount(). Typically it's just
Expand Down
2 changes: 1 addition & 1 deletion trunk/Documentation/filesystems/vfs.txt
Original file line number Diff line number Diff line change
Expand Up @@ -254,7 +254,7 @@ or bottom half).
should be synchronous or not, not all filesystems check this flag.

drop_inode: called when the last access to the inode is dropped,
with the inode->i_lock spinlock held.
with the inode_lock spinlock held.

This method should be either NULL (normal UNIX filesystem
semantics) or "generic_delete_inode" (for filesystems that do not
Expand Down
7 changes: 6 additions & 1 deletion trunk/Documentation/scheduler/sched-design-CFS.txt
Original file line number Diff line number Diff line change
Expand Up @@ -164,7 +164,7 @@ This is the (partial) list of the hooks:
It puts the scheduling entity (task) into the red-black tree and
increments the nr_running variable.

- dequeue_task(...)
- dequeue_tree(...)

When a task is no longer runnable, this function is called to keep the
corresponding scheduling entity out of the red-black tree. It decrements
Expand Down Expand Up @@ -195,6 +195,11 @@ This is the (partial) list of the hooks:
This function is mostly called from time tick functions; it might lead to
process switch. This drives the running preemption.

- task_new(...)

The core scheduler gives the scheduling module an opportunity to manage new
task startup. The CFS scheduling module uses it for group scheduling, while
the scheduling module for a real-time task does not use it.



Expand Down
Loading

0 comments on commit 88b1e82

Please sign in to comment.