diffcore-rename: cache file deltas
We find rename candidates by computing a fingerprint hash of
each file, and then comparing those fingerprints. There are
inherently O(n^2) comparisons, so it pays in CPU time to
hoist the (rather expensive) computation of the fingerprint
out of that loop (or to cache it after it is first computed).
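
As a rough illustration of why caching pays off, the sketch below compares every source against every destination but computes each fingerprint at most once. Everything in it is hypothetical: `struct entry`, `get_fingerprint` and `compare_all` are illustrative names, and the byte histogram is only a toy stand-in for the real fingerprint, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a file entry; not git's struct diff_filespec. */
struct entry {
	const char *data;   /* file contents */
	size_t size;        /* length of data in bytes */
	unsigned long *fp;  /* cached fingerprint, NULL until first computed */
};

/* Counts the expensive computations so the saving is observable. */
static int fp_computations;

/* Toy fingerprint: a 256-bucket byte histogram (an assumption of this sketch). */
static unsigned long *compute_fingerprint(const char *data, size_t size)
{
	unsigned long *fp = calloc(256, sizeof(*fp));
	size_t i;

	assert(fp != NULL);
	for (i = 0; i < size; i++)
		fp[(unsigned char)data[i]]++;
	fp_computations++;
	return fp;
}

/* Return the cached fingerprint, computing it only on first use. */
static const unsigned long *get_fingerprint(struct entry *e)
{
	if (!e->fp)
		e->fp = compute_fingerprint(e->data, e->size);
	return e->fp;
}

/* Number of bytes the two entries have in common, per the histograms. */
static unsigned long common_bytes(struct entry *a, struct entry *b)
{
	const unsigned long *fa = get_fingerprint(a);
	const unsigned long *fb = get_fingerprint(b);
	unsigned long common = 0;
	int i;

	for (i = 0; i < 256; i++)
		common += fa[i] < fb[i] ? fa[i] : fb[i];
	return common;
}

/*
 * Fill scores[i * nsrc + j] for every (dst i, src j) pair and return how
 * many fingerprints were computed: nsrc + ndst rather than 2 * nsrc * ndst.
 */
static int compare_all(struct entry *src, int nsrc,
		       struct entry *dst, int ndst,
		       unsigned long *scores)
{
	int i, j;

	fp_computations = 0;
	for (i = 0; i < ndst; i++)
		for (j = 0; j < nsrc; j++)
			scores[i * nsrc + j] = common_bytes(&src[j], &dst[i]);
	return fp_computations;
}
```

The number of comparisons is still quadratic; only the expensive per-file step drops to linear, at the cost of holding one fingerprint per file in memory.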

Previously, we didn't keep the filespec information around
because then we had the potential to consume a great deal of
memory. However, instead of keeping all of the filespec
data, we can instead just keep the fingerprint.

This patch implements and uses diff_free_filespec_data_large
to accomplish that goal. We also have to change
estimate_similarity not to needlessly repopulate the
filespec data when we already have the hash.
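
The two-level release described above might be sketched as follows. This is a simplified model under stated assumptions, not the patch itself: `struct spec` and the `spec_*` helpers are hypothetical, the fingerprint is a single toy value, and `spec_prepare` folds the "skip repopulating when the hash is already there" check and the fingerprint computation into one function for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct diff_filespec. */
struct spec {
	char *data;               /* large: full file contents */
	size_t size;              /* length of the contents */
	unsigned long *cnt_data;  /* small: cached fingerprint */
};

/* Counts how often the contents actually had to be loaded. */
static int populate_calls;

/* Load the (here: fake) file contents unless they are already present. */
static int spec_populate(struct spec *s)
{
	assert(s != NULL);
	if (s->data)
		return 0;
	populate_calls++;
	s->data = malloc(s->size ? s->size : 1);
	if (!s->data)
		return -1;
	memset(s->data, 'x', s->size);
	return 0;
}

/* Drop only the large buffer; the cached fingerprint survives. */
static void spec_free_large(struct spec *s)
{
	free(s->data);
	s->data = NULL;
}

/* Drop everything, including the cached fingerprint. */
static void spec_free_all(struct spec *s)
{
	spec_free_large(s);
	free(s->cnt_data);
	s->cnt_data = NULL;
}

/* Make a fingerprint available, reloading contents only if none is cached. */
static int spec_prepare(struct spec *s)
{
	if (s->cnt_data)
		return 0;
	if (spec_populate(s))
		return -1;
	s->cnt_data = malloc(sizeof(*s->cnt_data));
	if (!s->cnt_data)
		return -1;
	*s->cnt_data = (unsigned long)s->size; /* toy fingerprint */
	return 0;
}
```

After `spec_free_large`, a second `spec_prepare` returns without touching the contents again, which is the behaviour the message asks of estimate_similarity.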

Practical tests showed a 4.5x speedup for a 10% increase in
memory usage.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King authored and Junio C Hamano committed Sep 26, 2007
1 parent 5166810 commit a5a3878
Showing 3 changed files with 11 additions and 4 deletions.
7 changes: 6 additions & 1 deletion diff.c
@@ -1675,7 +1675,7 @@ int diff_populate_filespec(struct diff_filespec *s, int size_only)
 	return 0;
 }
 
-void diff_free_filespec_data(struct diff_filespec *s)
+void diff_free_filespec_data_large(struct diff_filespec *s)
 {
 	if (s->should_free)
 		free(s->data);
@@ -1686,6 +1686,11 @@ void diff_free_filespec_data(struct diff_filespec *s)
 		s->should_free = s->should_munmap = 0;
 		s->data = NULL;
 	}
+}
+
+void diff_free_filespec_data(struct diff_filespec *s)
+{
+	diff_free_filespec_data_large(s);
 	free(s->cnt_data);
 	s->cnt_data = NULL;
 }
7 changes: 4 additions & 3 deletions diffcore-rename.c
@@ -184,7 +184,8 @@ static int estimate_similarity(struct diff_filespec *src,
 	if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;
 
-	if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0))
+	if ((!src->cnt_data && diff_populate_filespec(src, 0))
+		|| (!dst->cnt_data && diff_populate_filespec(dst, 0)))
 		return 0; /* error but caught downstream */
 
 
@@ -377,10 +378,10 @@ void diffcore_rename(struct diff_options *options)
 			m->score = estimate_similarity(one, two,
 						       minimum_score);
 			m->name_score = basename_same(one, two);
-			diff_free_filespec_data(one);
+			diff_free_filespec_data_large(one);
 		}
 		/* We do not need the text anymore */
-		diff_free_filespec_data(two);
+		diff_free_filespec_data_large(two);
 		dst_cnt++;
 	}
 	/* cost matrix sorted by most to least similar pair */
1 change: 1 addition & 0 deletions diffcore.h
@@ -48,6 +48,7 @@ extern void fill_filespec(struct diff_filespec *, const unsigned char *,
 
 extern int diff_populate_filespec(struct diff_filespec *, int);
 extern void diff_free_filespec_data(struct diff_filespec *);
+extern void diff_free_filespec_data_large(struct diff_filespec *);
 extern int diff_filespec_is_binary(struct diff_filespec *);
 
 struct diff_filepair {
