exofs: RAID0 support
We now support striping over mirror devices, including a variable-sized
stripe_unit.

Some limits:
* stripe_unit must be a multiple of PAGE_SIZE
* stripe_unit * stripe_count must fit in 32 bits (at most 4GB)
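
The two limits above can be sketched as a small check. This is only an
illustration, not code from this commit: sketch_layout_is_valid() and
SKETCH_PAGE_SIZE are hypothetical names, and a 4 KiB page is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses the architecture's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical helper: true when both limits listed above hold. */
static bool sketch_layout_is_valid(uint64_t stripe_unit, uint64_t stripe_count)
{
	if (stripe_unit == 0 || stripe_count == 0)
		return false;

	/* Limit 1: stripe_unit must be a multiple of the page size. */
	if (stripe_unit % SKETCH_PAGE_SIZE != 0)
		return false;

	/*
	 * Limit 2: stripe_unit * stripe_count must fit in 32 bits.
	 * Divide instead of multiplying so the check cannot overflow.
	 */
	if (stripe_unit > UINT32_MAX / stripe_count)
		return false;

	return true;
}
```

For example, a 1MB stripe_unit passes with 4095 stripe units but fails
with 4096, because the product then reaches exactly 2^32.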

Tested RAID0 over mirrors, RAID0 only, and mirrors only. All check out.

Design notes:
* I'm not using a vectored raid-engine mechanism yet. Following the
  pnfs-objects-layout data-map structure, "Mirror" is just a special
  case of "group_width" == 1, and RAID0 is a special case of
  "Mirrors" == 1. The performance loss of the general case compared
  with a dedicated special-case optimization is negligible, especially
  considering the extra code size such an optimization would add.

* In general I added a prepare_stripes() stage that divides the
  to-be-io pages among the participating devices. The previous
  exofs_ios_write/read now becomes _write/read_mirrors, and a new
  write/read upper layer loops over all devices, calling
  _write/read_mirrors. Effectively, the prepare_stripes stage is the
  whole secret.
  Truncate also needed fixing to accommodate striping.

* In a RAID0 arrangement, in a regular usage scenario, if all inode
  layouts started at the same device, small files would fill up the
  first device while the later devices stayed empty: the farther the
  device, the emptier it would be.

  To fix that, each inode starts at a different stripe_unit,
  according to its obj_id modulo the number of stripe units. It
  then spans all stripe units in the same incrementing order,
  wrapping back to the beginning of the device table. We call it
  a stripe-units moving window.

  Special consideration was taken to keep all devices in a mirror
  arrangement identical, so a broken osd-device can simply be cloned
  from one of its mirrors and no FS scrubbing is needed. (We do that
  by rotating a stripe unit at a time and not a single device at a time.)
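
The striping and moving-window notes above can be sketched roughly as
follows. This is a minimal illustration under stated assumptions, not the
code of this commit: struct sketch_layout, struct sketch_target and
sketch_map_offset() are hypothetical names, the device table is assumed to
be laid out as group_width consecutive mirror sets of mirrors_p1 devices
each, and group_width is taken as the number of stripe units.

```c
#include <stdint.h>

/* Hypothetical stand-in for the striping fields of struct exofs_layout. */
struct sketch_layout {
	unsigned stripe_unit;	/* bytes per stripe unit */
	unsigned mirrors_p1;	/* devices per mirror set (mirrors plus one) */
	unsigned group_width;	/* mirror sets striped across */
};

/* Where one file byte lands (illustrative names only). */
struct sketch_target {
	unsigned first_dev;	/* index of the first device in the mirror set */
	uint64_t obj_offset;	/* byte offset inside that device's object */
};

static struct sketch_target sketch_map_offset(const struct sketch_layout *lay,
					      uint64_t obj_id,
					      uint64_t file_offset)
{
	uint64_t stripe_bytes = (uint64_t)lay->stripe_unit * lay->group_width;
	uint64_t stripe_no = file_offset / stripe_bytes;
	uint64_t in_stripe = file_offset % stripe_bytes;
	unsigned unit = (unsigned)(in_stripe / lay->stripe_unit);
	unsigned unit_off = (unsigned)(in_stripe % lay->stripe_unit);

	/* Moving window: each inode starts at its own stripe unit. */
	unsigned first_unit = (unsigned)(obj_id % lay->group_width);
	unsigned rotated = (first_unit + unit) % lay->group_width;

	struct sketch_target t;

	/* Rotate by whole mirror sets so all mirrors stay identical. */
	t.first_dev = rotated * lay->mirrors_p1;
	t.obj_offset = stripe_no * lay->stripe_unit + unit_off;
	return t;
}
```

With group_width == 1 the mapping degenerates to plain mirroring
(obj_offset equals file_offset), and with mirrors_p1 == 1 it is plain
RAID0, which matches the "special case" remark in the first note.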

TODO:
 We no longer verify object_length == inode->i_size in exofs_iget
 (since i_size is now striped over multiple objects).
 I should introduce multiple-device attribute reading and use
 it in exofs_iget.
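
One way such a check could look, once the per-object lengths have been
read, is sketched below. This is speculative and not part of this commit:
sketch_logical_size() is a hypothetical helper, obj_len[] is assumed to
already hold the logical length of each stripe unit's object (indexed from
the inode's first stripe unit), and sparse files are ignored.

```c
#include <stdint.h>

/*
 * Hypothetical helper. obj_len[k] is the length of the object holding the
 * k-th stripe unit, counted from the inode's first stripe unit.
 * Returns the file size implied by the farthest byte stored in any object.
 */
static uint64_t sketch_logical_size(unsigned stripe_unit, unsigned group_width,
				    const uint64_t *obj_len)
{
	uint64_t stripe_bytes = (uint64_t)stripe_unit * group_width;
	uint64_t size = 0;
	unsigned k;

	for (k = 0; k < group_width; k++) {
		uint64_t last, end;

		if (obj_len[k] == 0)
			continue;

		/* Map the object's last byte back to a file offset. */
		last = obj_len[k] - 1;
		end = (last / stripe_unit) * stripe_bytes
			+ (uint64_t)k * stripe_unit
			+ last % stripe_unit + 1;
		if (end > size)
			size = end;
	}
	return size;
}
```

The result could then be compared against inode->i_size, in the same
spirit as the single-object check that this commit removes.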

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Boaz Harrosh committed Feb 28, 2010
1 parent d9c740d commit 5d952b8
Showing 4 changed files with 333 additions and 83 deletions.
11 changes: 11 additions & 0 deletions fs/exofs/exofs.h
@@ -58,6 +58,14 @@
 struct exofs_layout {
 	osd_id s_pid; /* partition ID of file system*/

+	/* Our way of looking at the data_map */
+	unsigned stripe_unit;
+	unsigned mirrors_p1;
+
+	unsigned group_width;
+
+	enum exofs_inode_layout_gen_functions lay_func;
+
 	unsigned s_numdevs; /* Num of devices in array */
 	struct osd_dev *s_ods[0]; /* Variable length */
 };
@@ -133,6 +141,9 @@ struct exofs_io_state {
 	struct exofs_per_dev_state {
 		struct osd_request *or;
 		struct bio *bio;
+		loff_t offset;
+		unsigned length;
+		unsigned dev;
 	} per_dev[];
 };

26 changes: 4 additions & 22 deletions fs/exofs/inode.c
@@ -869,18 +869,17 @@ static const struct osd_attr g_attr_inode_dir_layout = ATTR_DEF(
 	0);

 /*
- * Read an inode from the OSD, and return it as is. We also return the size
- * attribute in the 'obj_size' argument.
+ * Read the Linux inode info from the OSD, and return it as is. In exofs the
+ * inode info is in an application specific page/attribute of the osd-object.
  */
 static int exofs_get_inode(struct super_block *sb, struct exofs_i_info *oi,
-		    struct exofs_fcb *inode, uint64_t *obj_size)
+		    struct exofs_fcb *inode)
 {
 	struct exofs_sb_info *sbi = sb->s_fs_info;
 	struct osd_attr attrs[] = {
 		[0] = g_attr_inode_data,
 		[1] = g_attr_inode_file_layout,
 		[2] = g_attr_inode_dir_layout,
-		[3] = g_attr_logical_length,
 	};
 	struct exofs_io_state *ios;
 	struct exofs_on_disk_inode_layout *layout;
@@ -944,15 +943,6 @@ static int exofs_get_inode(struct super_block *sb, struct exofs_i_info *oi,
 		}
 	}

-	*obj_size = ~0;
-	ret = extract_attr_from_ios(ios, &attrs[3]);
-	if (ret) {
-		EXOFS_ERR("%s: extract_attr of logical_length failed\n",
-			  __func__);
-		goto out;
-	}
-	*obj_size = get_unaligned_be64(attrs[3].val_ptr);
-
 out:
 	exofs_put_io_state(ios);
 	return ret;
@@ -971,7 +961,6 @@ struct inode *exofs_iget(struct super_block *sb, unsigned long ino)
 	struct exofs_i_info *oi;
 	struct exofs_fcb fcb;
 	struct inode *inode;
-	uint64_t obj_size;
 	int ret;

 	inode = iget_locked(sb, ino);
@@ -983,7 +972,7 @@ struct inode *exofs_iget(struct super_block *sb, unsigned long ino)
 	__oi_init(oi);

 	/* read the inode from the osd */
-	ret = exofs_get_inode(sb, oi, &fcb, &obj_size);
+	ret = exofs_get_inode(sb, oi, &fcb);
 	if (ret)
 		goto bad_inode;

@@ -1004,13 +993,6 @@ struct inode *exofs_iget(struct super_block *sb, unsigned long ino)
 	inode->i_blkbits = EXOFS_BLKSHIFT;
 	inode->i_generation = le32_to_cpu(fcb.i_generation);

-	if ((inode->i_size != obj_size) &&
-	    (!exofs_inode_is_fast_symlink(inode))) {
-		EXOFS_ERR("WARNING: Size of inode=%llu != object=%llu\n",
-			  inode->i_size, _LLU(obj_size));
-		/* FIXME: call exofs_inode_recovery() */
-	}
-
 	oi->i_dir_start_lookup = 0;

 	if ((inode->i_nlink == 0) && (inode->i_mode == 0)) {