user-manual: move object format details to hacking-git chapter
Most of this is probably only of interest to git developers.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
J. Bruce Fields committed Sep 16, 2007
1 parent 971aa71 commit f2327c6
Showing 1 changed file with 32 additions and 23 deletions.
55 changes: 32 additions & 23 deletions Documentation/user-manual.txt
@@ -2760,29 +2760,6 @@ used to sign other objects. It contains the identifier and type of
another object, a symbolic name (of course!) and, optionally, a
signature.

Regardless of object type, all objects share the following
characteristics: they are all deflated with zlib, and have a header
that not only specifies their type, but also provides size information
about the data in the object. It's worth noting that the SHA1 hash
that is used to name the object is the hash of the original data
plus this header, so `sha1sum` 'file' does not match the object name
for 'file'.
(Historical note: in the dawn of the age of git the hash
was the sha1 of the 'compressed' object.)

As a result, the general consistency of an object can always be tested
independently of the contents or the type of the object: all objects can
be validated by verifying that (a) their hashes match the content of the
file and (b) the object successfully inflates to a stream of bytes that
forms a sequence of <ascii type without space> {plus} <space> {plus} <ascii decimal
size> {plus} <byte\0> {plus} <binary object data>.

The structured objects can further have their structure and
connectivity to other objects verified. This is generally done with
the `git-fsck` program, which generates a full dependency graph
of all objects, and verifies their internal consistency (in addition
to just verifying their superficial consistency through the hash).

The object types in some more detail:

[[blob-object]]
@@ -3481,6 +3458,38 @@ Hacking git
This chapter covers internal details of the git implementation which
probably only git developers need to understand.

[[object-details]]
Object storage format
---------------------

All objects have a statically determined "type" which identifies the
format of the object (i.e. how it is used, and how it can refer to other
objects). There are currently four different object types: "blob",
"tree", "commit", and "tag".

Regardless of object type, all objects share the following
characteristics: they are all deflated with zlib, and have a header
that not only specifies their type, but also provides size information
about the data in the object. It's worth noting that the SHA1 hash
that is used to name the object is the hash of the original data
plus this header, so `sha1sum` 'file' does not match the object name
for 'file'.
(Historical note: in the dawn of the age of git the hash
was the sha1 of the 'compressed' object.)
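
As an illustration of the naming rule described above, here is a minimal
Python sketch (not git's own code; the helper name `git_object_id` is
made up for this example). It builds the header from the type and size,
then hashes the header plus the data, following the layout this section
describes:

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """Return the object name: SHA1 over header plus original data.

    Hypothetical helper for illustration only. The header layout
    (type, space, decimal size, NUL byte) follows the description
    in this section.
    """
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    data = b""  # an empty file
    # The plain SHA1 of the file contents, as `sha1sum` would report it.
    print(hashlib.sha1(data).hexdigest())
    # The name the same contents get when stored as a blob object.
    print(git_object_id("blob", data))
```

The two printed values differ even for an empty file, because the
second hash also covers the header.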

As a result, the general consistency of an object can always be tested
independently of the contents or the type of the object: all objects can
be validated by verifying that (a) their hashes match the content of the
file and (b) the object successfully inflates to a stream of bytes that
forms a sequence of <ascii type without space> {plus} <space> {plus} <ascii decimal
size> {plus} <byte\0> {plus} <binary object data>.
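
The two checks above could be sketched in Python roughly as follows.
This is only an illustration under stated assumptions, not how
`git-fsck` is implemented: the function name `check_object` is
hypothetical, the caller is assumed to pass in the deflated bytes of a
single object together with its expected name, and the comparison of
the declared size against the data length is an extra sanity check
added here.

```python
import hashlib
import zlib
from typing import Tuple


def check_object(deflated: bytes, expected_name: str) -> Tuple[str, bytes]:
    """Validate one deflated object; return (type, data) on success.

    Hypothetical helper for illustration only. Raises ValueError when
    a check fails, and zlib.error when the input does not inflate.
    """
    raw = zlib.decompress(deflated)

    # (b) The inflated stream must be: type, space, size, NUL, data.
    header, nul, data = raw.partition(b"\0")
    if not nul:
        raise ValueError("no NUL byte terminating the header")
    obj_type, space, size = header.partition(b" ")
    if not space or not obj_type or not size.isdigit():
        raise ValueError("malformed header")
    # Extra sanity check (an assumption of this sketch).
    if int(size) != len(data):
        raise ValueError("size in header does not match data length")

    # (a) The name must be the SHA1 of the inflated header plus data.
    if hashlib.sha1(raw).hexdigest() != expected_name:
        raise ValueError("hash does not match object name")

    return obj_type.decode("ascii"), data


if __name__ == "__main__":
    raw = b"blob 12\0hello world\n"
    name = hashlib.sha1(raw).hexdigest()
    print(check_object(zlib.compress(raw), name))
```

Note that neither check needs to know what the type means or how the
data is structured, which is why they apply to every object alike.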

The structured objects can further have their structure and
connectivity to other objects verified. This is generally done with
the `git-fsck` program, which generates a full dependency graph
of all objects, and verifies their internal consistency (in addition
to just verifying their superficial consistency through the hash).

[[birdview-on-the-source-code]]
A birds-eye view of Git's source code
-------------------------------------