Write Yourself a Git (by Thibault Polge) - Part 2
Git takes in a file and calculates its SHA-1 hash. Example:
Hash: a1b2c3d4...
Then the first two characters of the hash become a directory name, and the rest of the hash becomes the filename. So something like:
Stored as: .git/objects/a1/b2c3d4...
Essentially, Git has up to 256 directories (00 to ff in hexadecimal) in which these hash-named files can live. This spreads the file count across folders, since many filesystem operations slow to a crawl when too many files pile up in a single directory.
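To make the hash-to-path step concrete, here is a minimal sketch in Python. The function name object_path is made up for illustration. One detail not spelled out above: Git does not hash the bare file content, it hashes a short header ("<type> <size>" plus a NUL byte) followed by the content, which is why a plain sha1sum of the file gives a different value.

```python
import hashlib


def object_path(data: bytes, obj_type: str = "blob") -> tuple[str, str]:
    """Return (sha1_hex, relative_path) for a loose object holding `data`.

    `object_path` is a hypothetical helper, not part of Git.
    """
    # Git prepends "<type> <size>\0" to the content before hashing.
    header = f"{obj_type} {len(data)}".encode() + b"\x00"
    sha = hashlib.sha1(header + data).hexdigest()
    # First two hex characters: directory. Remaining 38: filename.
    return sha, f".git/objects/{sha[:2]}/{sha[2:]}"


sha, path = object_path(b"hello\n")
print(sha)
print(path)
```

For the same bytes, the printed hash should agree with what `git hash-object` reports for a file containing them.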
Hash functions are easy to “compute forward” and almost impossible to “compute backward”.
The length of a string is a simple example of a hash function.
I can easily calculate len(str) for any string. But given the number, I cannot derive what the string was in a definitive manner.
And len(str) is deterministic: as long as the input stays the same, the function returns the same output on every invocation.
Git uses a “cryptographic hash”, the SHA-1 algorithm.
However, one weakness of string length as a hash is that it leaks a clue: the length itself narrows down the space of possible original strings. Say, “3” means we are looking for a 3-character input.
A cryptographic hash does not have this weakness.
A cryptographic hash produces a larger output value (for SHA-1, 160 bits, usually written as a 40-character hexadecimal string). It is practically impossible to recover the original input from this hash. Heck, it is practically impossible to find any input that will generate a given hash.
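A small sketch of the contrast, using Python's hashlib. The name toy_hash is made up for illustration. The toy hash collides on any two strings of equal length, while SHA-1 gives unrelated-looking 40-character digests even for inputs that differ by a single letter.

```python
import hashlib


def toy_hash(s: str) -> int:
    """The "length of string" hash: deterministic, but it leaks the length."""
    return len(s)


# Same length, same toy hash: collisions are trivial to find.
print(toy_hash("cat"), toy_hash("cab"))

# SHA-1 is also deterministic, but the digest reveals nothing obvious
# about the input, and nearby inputs give very different digests.
digest_cat = hashlib.sha1(b"cat").hexdigest()
digest_cab = hashlib.sha1(b"cab").hexdigest()
print(digest_cat)
print(digest_cab)
print(len(digest_cat))  # 40 hex characters = 160 bits
```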
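This is what the objects in the git-lrc repo look like:
And here’s a hex dump of an object:
I was trying to figure out what type of object the file shown above is. It starts with 0178, which is the zlib header (the bytes 78 01, which hexdump's default 16-bit view prints swapped on a little-endian machine). Git compresses objects with zlib, so a raw hex dump does not show the contents directly.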
Instead, we can ask Git for the object's type with this command:
git cat-file -t 0a0df4c14146079ab578af0ee5dde72785587165
And that returns the string “tree”.
So the object I am looking at is a tree object.
Here is the structure of a git object:
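As a rough sketch of that layout: once decompressed with zlib, a loose object is a header of the form "<type> <size>", then a NUL byte, then the content. The snippet below builds such an object in memory and parses it back. The function name parse_loose_object is made up for illustration, and this is not how Git itself is implemented.

```python
import zlib


def parse_loose_object(raw: bytes) -> tuple[str, int, bytes]:
    """Decompress a loose object and split it into (type, size, content)."""
    data = zlib.decompress(raw)
    # The header ends at the first NUL byte; everything after is content.
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match content length")
    return obj_type.decode(), int(size), body


# Build a blob in memory, the way it would be stored on disk.
content = b"hello\n"
raw = zlib.compress(b"blob " + str(len(content)).encode() + b"\x00" + content)

print(parse_loose_object(raw))
```

To try it on a real object, read the bytes of a file under .git/objects/ in binary mode and pass them to the function; the type it returns should match what git cat-file -t prints.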







