Merge branch 'nd/split-index'
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only a small part of the working tree
changes.

* nd/split-index: (32 commits)
  t1700: new tests for split-index mode
  t2104: make sure split index mode is off for the version test
  read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  read-tree: note about dropping split-index mode or index version
  read-tree: force split-index mode off on --index-output
  rev-parse: add --shared-index-path to get shared index path
  update-index --split-index: do not split if $GIT_DIR is read only
  update-index: new options to enable/disable split index mode
  split-index: strip pathname of on-disk replaced entries
  split-index: do not invalidate cache-tree at read time
  split-index: the reading part
  split-index: the writing part
  read-cache: mark updated entries for split index
  read-cache: save deleted entries in split index
  read-cache: mark new entries for split index
  read-cache: split-index mode
  read-cache: save index SHA-1 after reading
  entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  cache-tree: mark istate->cache_changed on prime_cache_tree()
  cache-tree: mark istate->cache_changed on cache tree update
  ...
Junio C Hamano committed Jul 16, 2014
2 parents e0a064a + 3e52f70 commit 788cef8
Showing 41 changed files with 1,088 additions and 193 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -181,6 +181,7 @@
/test-date
/test-delta
/test-dump-cache-tree
/test-dump-split-index
/test-scrap-cache-tree
/test-genrandom
/test-hashmap
4 changes: 4 additions & 0 deletions Documentation/git-rev-parse.txt
@@ -245,6 +245,10 @@ print a message to stderr and exit with nonzero status.
--show-toplevel::
Show the absolute path of the top-level directory.

--shared-index-path::
Show the path to the shared index file in split index mode, or
empty if not in split-index mode.

Other Options
~~~~~~~~~~~~~

11 changes: 11 additions & 0 deletions Documentation/git-update-index.txt
@@ -161,6 +161,17 @@ may not support it yet.
Only meaningful with `--stdin` or `--index-info`; paths are
separated with NUL character instead of LF.

--split-index::
--no-split-index::
Enable or disable split index mode. If enabled, the index is
split into two files, $GIT_DIR/index and $GIT_DIR/sharedindex.<SHA-1>.
Changes are accumulated in $GIT_DIR/index while the shared
index file contains all index entries and stays unchanged. If
split-index mode is already enabled and `--split-index` is
given again, all changes in $GIT_DIR/index are pushed back to
the shared index file. This mode is designed for very large
indexes that take a significant amount of time to read or write.

\--::
Do not interpret any more arguments as options.

4 changes: 4 additions & 0 deletions Documentation/gitrepository-layout.txt
@@ -155,6 +155,10 @@ index::
The current index file for the repository. It is
usually not found in a bare repository.

sharedindex.<SHA-1>::
The shared index part, to be referenced by $GIT_DIR/index and
other temporary index files. Only valid in split index mode.

info::
Additional information about the repository is recorded
in this directory.
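
The shared index file name is derived from the SHA-1 that the split index records. The following is a minimal sketch of that naming rule, not git's own code: `toy_shared_index_path` is a hypothetical helper, and the lowercase hex encoding of the 20-byte SHA-1 is an assumption of this example. It also reflects the rule from the index format description that an all-zero SHA-1 means no shared index file is needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<git_dir>/sharedindex.<hex SHA-1>" in buf.
 * Returns 1 when a path was written, 0 when the SHA-1 is all zero
 * (no shared index file is required), and -1 when buf is too small.
 */
static int toy_shared_index_path(const char *git_dir,
				 const unsigned char sha1[20],
				 char *buf, size_t len)
{
	char hex[41];
	int all_zero = 1;
	int n;
	size_t i;

	for (i = 0; i < 20; i++) {
		if (sha1[i])
			all_zero = 0;
		/* Two hex digits per byte; snprintf adds the trailing NUL. */
		snprintf(hex + 2 * i, 3, "%02x", (unsigned)sha1[i]);
	}
	if (all_zero)
		return 0;

	n = snprintf(buf, len, "%s/sharedindex.%s", git_dir, hex);
	return (n < 0 || (size_t)n >= len) ? -1 : 1;
}
```

A caller would pass the 20 bytes read from the split index extension and treat a return value of 0 as "the index is self-contained".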
35 changes: 35 additions & 0 deletions Documentation/technical/index-format.txt
@@ -129,6 +129,9 @@ Git index format
(Version 4) In version 4, the padding after the pathname does not
exist.

Interpretation of index entries in split index mode is completely
different. See below for details.

== Extensions

=== Cached tree
@@ -198,3 +201,35 @@ Git index format
- At most three 160-bit object names of the entry in stages from 1 to 3
(nothing is written for a missing stage).

=== Split index

In split index mode, the majority of index entries could be stored
in a separate file. This extension records the changes to be made on
top of that to produce the final index.

The signature for this extension is { 'l', 'i', 'n', 'k' }.

The extension consists of:

- 160-bit SHA-1 of the shared index file. The shared index file path
is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
index does not require a shared index file.

- An ewah-encoded delete bitmap, each bit represents an entry in the
shared index. If a bit is set, its corresponding entry in the
shared index will be removed from the final index. Note, because
a delete operation changes index entry positions, but we do need
original positions in replace phase, it's best to just mark
entries for removal, then do a mass deletion after replacement.

- An ewah-encoded replace bitmap, each bit represents an entry in
the shared index. If a bit is set, its corresponding entry in the
shared index will be replaced with an entry in this index
file. All replaced entries are stored in sorted order in this
index. The first "1" bit in the replace bitmap corresponds to the
first index entry, the second "1" bit to the second entry and so
on. Replaced entries may have empty path names to save space.

The remaining index entries after replaced ones will be added to the
final index. These added entries are also sorted by entry name then
stage.
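
The three steps described above (replace by original position, then mass-delete, then add the remaining entries in sorted order) can be sketched as follows. This is an illustration under stated assumptions, not the code in split-index.c: `struct toy_entry`, `toy_entry_cmp` and `toy_merge_split_index` are hypothetical names, the entry carries only a name, a stage and an integer standing in for the object name, and plain `bool` arrays stand in for the ewah-encoded bitmaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified index entry. */
struct toy_entry {
	const char *name;
	int stage;
	int blob;	/* stand-in for the 160-bit object name */
};

/* Order by entry name, then by stage. */
static int toy_entry_cmp(const struct toy_entry *a, const struct toy_entry *b)
{
	int c = strcmp(a->name, b->name);
	return c ? c : a->stage - b->stage;
}

/*
 * base[]  : entries of the shared index.
 * del/rep : one flag per base entry (toy replacement for the bitmaps).
 * split[] : replacement entries first, in bitmap order, then added entries.
 * out[]   : needs room for nr_base + nr_split entries.
 * Returns the number of entries in the final index, or -1 on error.
 */
static long toy_merge_split_index(const struct toy_entry *base, size_t nr_base,
				  const bool *del, const bool *rep,
				  const struct toy_entry *split, size_t nr_split,
				  struct toy_entry *out)
{
	struct toy_entry *work = malloc((nr_base ? nr_base : 1) * sizeof(*work));
	size_t i, k = 0, nr_kept = 0, n = 0;

	if (!work)
		return -1;
	if (nr_base)
		memcpy(work, base, nr_base * sizeof(*work));

	/* Phase 1: replace, while base positions are still the original ones. */
	for (i = 0; i < nr_base; i++) {
		if (!rep[i])
			continue;
		if (k >= nr_split) {
			free(work);
			return -1;	/* more "1" bits than stored entries */
		}
		work[i].stage = split[k].stage;
		work[i].blob = split[k].blob;
		if (split[k].name[0])	/* an empty name means "keep the base name" */
			work[i].name = split[k].name;
		k++;
	}

	/* Phase 2: mass deletion, only after every replacement is done. */
	for (i = 0; i < nr_base; i++)
		if (!del[i])
			work[nr_kept++] = work[i];

	/* Phase 3: merge the survivors with the remaining (added) entries. */
	i = 0;
	while (i < nr_kept && k < nr_split) {
		if (toy_entry_cmp(&work[i], &split[k]) <= 0)
			out[n++] = work[i++];
		else
			out[n++] = split[k++];
	}
	while (i < nr_kept)
		out[n++] = work[i++];
	while (k < nr_split)
		out[n++] = split[k++];

	free(work);
	return (long)n;
}
```

Deferring the deletion to its own phase is what keeps the replace bitmap usable: each of its bits refers to a position in the unmodified shared index.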
2 changes: 2 additions & 0 deletions Makefile
@@ -552,6 +552,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
TEST_PROGRAMS_NEED_X += test-date
TEST_PROGRAMS_NEED_X += test-delta
TEST_PROGRAMS_NEED_X += test-dump-cache-tree
TEST_PROGRAMS_NEED_X += test-dump-split-index
TEST_PROGRAMS_NEED_X += test-genrandom
TEST_PROGRAMS_NEED_X += test-hashmap
TEST_PROGRAMS_NEED_X += test-index-version
@@ -875,6 +876,7 @@ LIB_OBJS += sha1_name.o
LIB_OBJS += shallow.o
LIB_OBJS += sideband.o
LIB_OBJS += sigchain.o
LIB_OBJS += split-index.o
LIB_OBJS += strbuf.o
LIB_OBJS += streaming.o
LIB_OBJS += string-list.o
6 changes: 2 additions & 4 deletions builtin/add.c
@@ -299,7 +299,6 @@ static int add_files(struct dir_struct *dir, int flags)
int cmd_add(int argc, const char **argv, const char *prefix)
{
int exit_status = 0;
int newfd;
struct pathspec pathspec;
struct dir_struct dir;
int flags;
@@ -345,7 +344,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
add_new_files = !take_worktree_changes && !refresh_only;
require_pathspec = !take_worktree_changes;

newfd = hold_locked_index(&lock_file, 1);
hold_locked_index(&lock_file, 1);

flags = ((verbose ? ADD_CACHE_VERBOSE : 0) |
(show_only ? ADD_CACHE_PRETEND : 0) |
@@ -443,8 +442,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)

finish:
if (active_cache_changed) {
if (write_cache(newfd, active_cache, active_nr) ||
commit_locked_index(&lock_file))
if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
die(_("Unable to write new index file"));
}

17 changes: 9 additions & 8 deletions builtin/apply.c
@@ -3084,13 +3084,15 @@ static void prepare_fn_table(struct patch *patch)
}
}

static int checkout_target(struct cache_entry *ce, struct stat *st)
static int checkout_target(struct index_state *istate,
struct cache_entry *ce, struct stat *st)
{
struct checkout costate;

memset(&costate, 0, sizeof(costate));
costate.base_dir = "";
costate.refresh_cache = 1;
costate.istate = istate;
if (checkout_entry(ce, &costate, NULL) || lstat(ce->name, st))
return error(_("cannot checkout %s"), ce->name);
return 0;
@@ -3257,7 +3259,7 @@ static int load_current(struct image *image, struct patch *patch)
if (lstat(name, &st)) {
if (errno != ENOENT)
return error(_("%s: %s"), name, strerror(errno));
if (checkout_target(ce, &st))
if (checkout_target(&the_index, ce, &st))
return -1;
}
if (verify_index_match(ce, &st))
@@ -3411,7 +3413,7 @@ static int check_preimage(struct patch *patch, struct cache_entry **ce, struct s
}
*ce = active_cache[pos];
if (stat_ret < 0) {
if (checkout_target(*ce, st))
if (checkout_target(&the_index, *ce, st))
return -1;
}
if (!cached && verify_index_match(*ce, st))
@@ -3644,7 +3646,7 @@ static void build_fake_ancestor(struct patch *list, const char *filename)
{
struct patch *patch;
struct index_state result = { NULL };
int fd;
static struct lock_file lock;

/* Once we start supporting the reverse patch, it may be
* worth showing the new sha1 prefix, but until then...
@@ -3682,8 +3684,8 @@ static void build_fake_ancestor(struct patch *list, const char *filename)
die ("Could not add %s to temporary index", name);
}

fd = open(filename, O_WRONLY | O_CREAT, 0666);
if (fd < 0 || write_index(&result, fd) || close(fd))
hold_lock_file_for_update(&lock, filename, LOCK_DIE_ON_ERROR);
if (write_locked_index(&result, &lock, COMMIT_LOCK))
die ("Could not write temporary index to %s", filename);

discard_index(&result);
@@ -4502,8 +4504,7 @@ int cmd_apply(int argc, const char **argv, const char *prefix_)
}

if (update_index) {
if (write_cache(newfd, active_cache, active_nr) ||
commit_locked_index(&lock_file))
if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
die(_("Unable to write new index file"));
}

2 changes: 1 addition & 1 deletion builtin/blame.c
@@ -2389,7 +2389,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
* right now, but someday we might optimize diff-index --cached
* with cache-tree information.
*/
cache_tree_invalidate_path(active_cache_tree, path);
cache_tree_invalidate_path(&the_index, path);

return commit;
}
4 changes: 2 additions & 2 deletions builtin/checkout-index.c
@@ -135,6 +135,7 @@ static int option_parse_u(const struct option *opt,
int *newfd = opt->value;

state.refresh_cache = 1;
state.istate = &the_index;
if (*newfd < 0)
*newfd = hold_locked_index(&lock_file, 1);
return 0;
@@ -279,8 +280,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
checkout_all(prefix, prefix_length);

if (0 <= newfd &&
(write_cache(newfd, active_cache, active_nr) ||
commit_locked_index(&lock_file)))
write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
die("Unable to write new index file");
return 0;
}
12 changes: 5 additions & 7 deletions builtin/checkout.c
@@ -225,7 +225,6 @@ static int checkout_paths(const struct checkout_opts *opts,
int flag;
struct commit *head;
int errs = 0;
int newfd;
struct lock_file *lock_file;

if (opts->track != BRANCH_TRACK_UNSPECIFIED)
@@ -256,7 +255,7 @@ static int checkout_paths(const struct checkout_opts *opts,

lock_file = xcalloc(1, sizeof(struct lock_file));

newfd = hold_locked_index(lock_file, 1);
hold_locked_index(lock_file, 1);
if (read_cache_preload(&opts->pathspec) < 0)
return error(_("corrupt index file"));

@@ -337,6 +336,7 @@ static int checkout_paths(const struct checkout_opts *opts,
memset(&state, 0, sizeof(state));
state.force = 1;
state.refresh_cache = 1;
state.istate = &the_index;
for (pos = 0; pos < active_nr; pos++) {
struct cache_entry *ce = active_cache[pos];
if (ce->ce_flags & CE_MATCHED) {
@@ -352,8 +352,7 @@ static int checkout_paths(const struct checkout_opts *opts,
}
}

if (write_cache(newfd, active_cache, active_nr) ||
commit_locked_index(lock_file))
if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
die(_("unable to write new index file"));

read_ref_full("HEAD", rev, 0, &flag);
@@ -444,8 +443,8 @@ static int merge_working_tree(const struct checkout_opts *opts,
{
int ret;
struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
int newfd = hold_locked_index(lock_file, 1);

hold_locked_index(lock_file, 1);
if (read_cache_preload(NULL) < 0)
return error(_("corrupt index file"));

@@ -553,8 +552,7 @@ static int merge_working_tree(const struct checkout_opts *opts,
}
}

if (write_cache(newfd, active_cache, active_nr) ||
commit_locked_index(lock_file))
if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
die(_("unable to write new index file"));

if (!opts->force && !opts->quiet)
7 changes: 3 additions & 4 deletions builtin/clone.c
@@ -617,7 +617,7 @@ static int checkout(void)
struct unpack_trees_options opts;
struct tree *tree;
struct tree_desc t;
int err = 0, fd;
int err = 0;

if (option_no_checkout)
return 0;
@@ -641,7 +641,7 @@ static int checkout(void)
setup_work_tree();

lock_file = xcalloc(1, sizeof(struct lock_file));
fd = hold_locked_index(lock_file, 1);
hold_locked_index(lock_file, 1);

memset(&opts, 0, sizeof opts);
opts.update = 1;
@@ -657,8 +657,7 @@ static int checkout(void)
if (unpack_trees(1, &t, &opts) < 0)
die(_("unable to checkout working tree"));

if (write_cache(fd, active_cache, active_nr) ||
commit_locked_index(lock_file))
if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
die(_("unable to write new index file"));

err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
33 changes: 14 additions & 19 deletions builtin/commit.c
@@ -305,7 +305,6 @@ static void refresh_cache_or_die(int refresh_flags)
static char *prepare_index(int argc, const char **argv, const char *prefix,
const struct commit *current_head, int is_status)
{
int fd;
struct string_list partial;
struct pathspec pathspec;
int refresh_flags = REFRESH_QUIET;
@@ -321,12 +320,11 @@ static char *prepare_index(int argc, const char **argv, const char *prefix,

if (interactive) {
char *old_index_env = NULL;
fd = hold_locked_index(&index_lock, 1);
hold_locked_index(&index_lock, 1);

refresh_cache_or_die(refresh_flags);

if (write_cache(fd, active_cache, active_nr) ||
close_lock_file(&index_lock))
if (write_locked_index(&the_index, &index_lock, CLOSE_LOCK))
die(_("unable to create temporary index"));

old_index_env = getenv(INDEX_ENVIRONMENT);
@@ -360,12 +358,11 @@ static char *prepare_index(int argc, const char **argv, const char *prefix,
* (B) on failure, rollback the real index.
*/
if (all || (also && pathspec.nr)) {
fd = hold_locked_index(&index_lock, 1);
hold_locked_index(&index_lock, 1);
add_files_to_cache(also ? prefix : NULL, &pathspec, 0);
refresh_cache_or_die(refresh_flags);
update_main_cache_tree(WRITE_TREE_SILENT);
if (write_cache(fd, active_cache, active_nr) ||
close_lock_file(&index_lock))
if (write_locked_index(&the_index, &index_lock, CLOSE_LOCK))
die(_("unable to write new_index file"));
commit_style = COMMIT_NORMAL;
return index_lock.filename;
@@ -381,12 +378,12 @@ static char *prepare_index(int argc, const char **argv, const char *prefix,
* We still need to refresh the index here.
*/
if (!only && !pathspec.nr) {
fd = hold_locked_index(&index_lock, 1);
hold_locked_index(&index_lock, 1);
refresh_cache_or_die(refresh_flags);
if (active_cache_changed) {
update_main_cache_tree(WRITE_TREE_SILENT);
if (write_cache(fd, active_cache, active_nr) ||
commit_locked_index(&index_lock))
if (write_locked_index(&the_index, &index_lock,
COMMIT_LOCK))
die(_("unable to write new_index file"));
} else {
rollback_lock_file(&index_lock);
@@ -432,24 +429,22 @@ static char *prepare_index(int argc, const char **argv, const char *prefix,
if (read_cache() < 0)
die(_("cannot read the index"));

fd = hold_locked_index(&index_lock, 1);
hold_locked_index(&index_lock, 1);
add_remove_files(&partial);
refresh_cache(REFRESH_QUIET);
if (write_cache(fd, active_cache, active_nr) ||
close_lock_file(&index_lock))
if (write_locked_index(&the_index, &index_lock, CLOSE_LOCK))
die(_("unable to write new_index file"));

fd = hold_lock_file_for_update(&false_lock,
git_path("next-index-%"PRIuMAX,
(uintmax_t) getpid()),
LOCK_DIE_ON_ERROR);
hold_lock_file_for_update(&false_lock,
git_path("next-index-%"PRIuMAX,
(uintmax_t) getpid()),
LOCK_DIE_ON_ERROR);

create_base_index(current_head);
add_remove_files(&partial);
refresh_cache(REFRESH_QUIET);

if (write_cache(fd, active_cache, active_nr) ||
close_lock_file(&false_lock))
if (write_locked_index(&the_index, &false_lock, CLOSE_LOCK))
die(_("unable to write temporary index file"));

discard_cache();
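
The call sites above fold the old two-step sequence (write_cache() followed by commit_locked_index() or close_lock_file()) into a single write_locked_index() call whose last argument is COMMIT_LOCK or CLOSE_LOCK. The sketch below shows the general shape of such a helper on a toy lock file. It assumes POSIX and is not git's lockfile or read-cache code: `struct toy_lock`, `toy_hold_lock`, `toy_write_locked` and the `TOY_*` flags are hypothetical names.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flags mirroring the two modes used by the callers. */
enum { TOY_COMMIT_LOCK = 1, TOY_CLOSE_LOCK = 2 };

struct toy_lock {
	char path[256];		/* final file */
	char lock_path[264];	/* "<path>.lock", written first */
	int fd;
};

/* Create "<path>.lock" exclusively; fails if another writer holds it. */
static int toy_hold_lock(struct toy_lock *lk, const char *path)
{
	int n = snprintf(lk->path, sizeof(lk->path), "%s", path);

	if (n < 0 || (size_t)n >= sizeof(lk->path))
		return -1;
	snprintf(lk->lock_path, sizeof(lk->lock_path), "%s.lock", lk->path);
	lk->fd = open(lk->lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
	return lk->fd < 0 ? -1 : 0;
}

/*
 * Write data to the lock file, then either
 *  - TOY_COMMIT_LOCK: close it and rename it over the final path, or
 *  - TOY_CLOSE_LOCK:  close it but leave "<path>.lock" in place so the
 *                     caller can still use, commit or remove it later.
 */
static int toy_write_locked(const void *data, size_t len,
			    struct toy_lock *lk, unsigned flags)
{
	const char *p = data;

	while (len) {
		ssize_t w = write(lk->fd, p, len);
		if (w < 0)
			return -1;
		p += w;
		len -= (size_t)w;
	}
	if (flags & (TOY_COMMIT_LOCK | TOY_CLOSE_LOCK)) {
		int rc = close(lk->fd);
		lk->fd = -1;
		if (rc)
			return -1;
	}
	if (flags & TOY_COMMIT_LOCK)
		return rename(lk->lock_path, lk->path);
	return 0;
}
```

Passing the whole state and the lock to one function, rather than a raw file descriptor, gives the writer a single place to decide what ends up on disk, which is the kind of hook a two-file index layout needs.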