improve delta long block matching with big files
Martin Koegler noted that create_delta() performs a new hash lookup after encoding each block copy, and block copies are currently limited to 64KB. For larger identical blocks, the next hash lookup would normally point to the next 64KB block in the reference buffer, and multiple block copy operations would be encoded consecutively.

However, the reference buffer may be sparsely indexed if hash buckets were trimmed down in create_delta_index() because hashing of the reference buffer wasn't well balanced. In that case, the hash lookup following a block copy might fail to match anything, and the fact that the reference buffer still matches beyond the previous 64KB block would be missed.

Let's rework the code so that buffer comparison is no longer bounded to 64KB. The match size should be made as large as possible up front, and only then should multiple block copies be encoded to cover it all. Also, fewer hash lookups end up being performed.

According to Martin, this patch should reduce his 92MB pack down to 75MB with the dataset he has. Tests performed on the Linux kernel repo show a slightly smaller pack and a slightly faster repack.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Nicolas Pitre authored and Junio C Hamano committed May 27, 2007
1 parent 99b5a79, commit 8433669
Showing 1 changed file with 57 additions and 51 deletions.