Commits


attempt to speed up the deltification for big files The current hash table performs poorly on big files due to a small resize step that pushes the table to its limits continuously. Instead, to have both a better performing hash table and keep the memory consumption low, save the blocks in an array and use the hash table as an index. Then, use a more generous resizing scheme that guarantees the good properties of the hash table. To avoid having to rebuild the table when the array is resized, save the indices in the table, and to further reduce the memory consumption use 32-bit indices. On amd64 this means that each slot is 4 bytes instead of 8 for a pointer or 24 for a struct got_deltify_block. ok stsp@
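The layout described in this commit can be sketched roughly as follows. This is a minimal illustration and not the code in deltify.c: the names struct blk, struct blk_table, table_add() and table_lookup(), the linear probing, the doubling growth and the 1/2 load factor are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical block record; the real struct got_deltify_block differs. */
struct blk {
	uint64_t off;
	uint32_t len;
	uint32_t hash;
};

struct blk_table {
	struct blk *blocks;	/* dense array holding the blocks */
	uint32_t nblocks;
	uint32_t nalloc;
	uint32_t *slots;	/* 0 = empty, otherwise block index + 1 */
	uint32_t nslots;	/* always zero or a power of two */
};

/* Rebuild the index with a new slot count; the block array is untouched. */
static int
table_rehash(struct blk_table *t, uint32_t nslots)
{
	uint32_t *slots = calloc(nslots, sizeof(*slots));
	uint32_t i, h;

	if (slots == NULL)
		return -1;
	for (i = 0; i < t->nblocks; i++) {
		h = t->blocks[i].hash & (nslots - 1);
		while (slots[h] != 0)
			h = (h + 1) & (nslots - 1);
		slots[h] = i + 1;
	}
	free(t->slots);
	t->slots = slots;
	t->nslots = nslots;
	return 0;
}

int
table_add(struct blk_table *t, uint64_t off, uint32_t len, uint32_t hash)
{
	uint32_t h;

	/* Growing the array may move it; slots hold indices, so they stay valid. */
	if (t->nblocks == t->nalloc) {
		uint32_t n = t->nalloc ? t->nalloc * 2 : 64;
		struct blk *p = realloc(t->blocks, n * sizeof(*p));
		if (p == NULL)
			return -1;
		t->blocks = p;
		t->nalloc = n;
	}
	/* Keep the index at most half full so probe sequences stay short. */
	if (t->nslots == 0 || (t->nblocks + 1) * 2 > t->nslots) {
		if (table_rehash(t, t->nslots ? t->nslots * 2 : 128) == -1)
			return -1;
	}
	t->blocks[t->nblocks].off = off;
	t->blocks[t->nblocks].len = len;
	t->blocks[t->nblocks].hash = hash;
	h = hash & (t->nslots - 1);
	while (t->slots[h] != 0)
		h = (h + 1) & (t->nslots - 1);
	t->slots[h] = t->nblocks + 1;
	t->nblocks++;
	return 0;
}

const struct blk *
table_lookup(const struct blk_table *t, uint32_t hash, uint32_t len)
{
	uint32_t h;

	if (t->nslots == 0)
		return NULL;
	h = hash & (t->nslots - 1);
	while (t->slots[h] != 0) {
		const struct blk *b = &t->blocks[t->slots[h] - 1];
		if (b->hash == hash && b->len == len)
			return b;
		h = (h + 1) & (t->nslots - 1);
	}
	return NULL;
}
```

In this sketch each slot costs sizeof(uint32_t), which is the 4 bytes per slot the commit message mentions, and a realloc() of the block array never forces the index to be rebuilt.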


bump the deltify table resize step By incrementing the resize step from 64 to 256 deltifying takes less time on modestly sized files; the resize step is still a small constant instead of a fraction of the current table size (which would be more usual for a hash table) since this code is also used in gotd. ok stsp


fix off_t type mismatches off_t is a signed type and depending on the platform, it can be "long" or "long long", so cast to long long for printf(). ok stsp
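The cast described here looks roughly like this. It is a small sketch; format_offset() is a hypothetical helper written for the example and does not appear in the commit.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * off_t may be "long" or "long long" depending on the platform, so
 * cast it to long long and print it with %lld.
 */
int
format_offset(char *buf, size_t len, off_t off)
{
	return snprintf(buf, len, "%lld", (long long)off);
}
```

Because off_t is signed, the signed conversion %lld is the matching choice; negative values print with a leading minus sign.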


use random seeds for murmurhash2 change the three hardcoded seeds to fresh ones generated on demand via arc4random. Suggested/fixed by and ok stsp@


const-ify tables ok thomas_adam millert


consistently match size of hash variables to that returned by murmurhash ok millert stsp


use murmurhash instead of sha1 for deltification blocks; suggested by ori


map raw object files into memory while packing if possible


fix wrong function in error string of emitdelta()


link to the FastCDC paper from deltify.c; suggested by Ori some time ago


make the number of elements in deltify's geartab explicit


fix seek to incorrect offset in the delta base when creating deltas The stretchblk() function needs to compare data located after the block which has just been matched. However, upon entry it was resetting the file pointer of the delta base to the beginning(!) of the block. The other file is correctly positioned after the block. In many cases the data won't match and stretchblk() will not stretch the matched block. But when the data did happen to match this resulted in a bogus delta, and wrong file contents when the delta was applied. Fix this by setting the delta base file pointer to the end of the block. Problem reported by naddy after our server refused a pack file which was sent by 'got send'. I could reproduce the issue by running the 'gotadmin pack' command on a copy of naddy's repository. ok naddy
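The positioning issue can be illustrated with a simplified stand-in for stretchblk(). The function stretch_match() below and its byte-at-a-time comparison are assumptions made for this sketch; the real function's signature and buffering differ.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: count how many bytes following an already
 * matched block are identical in both files. The file f is assumed
 * to be positioned just after its copy of the block.
 */
int
stretch_match(FILE *base, off_t base_off, size_t blk_len, FILE *f,
    size_t *extra)
{
	int c1, c2;

	*extra = 0;
	/* Seek to the END of the matched block, not to its beginning. */
	if (fseeko(base, base_off + (off_t)blk_len, SEEK_SET) == -1)
		return -1;
	for (;;) {
		c1 = getc(base);
		c2 = getc(f);
		if (c1 == EOF || c2 == EOF || c1 != c2)
			break;
		(*extra)++;
	}
	return 0;
}
```

Seeking to base_off instead would compare the start of the base block against the data that follows the block in the other file, which is the mismatch the commit describes.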


plug a memory leak in an error path of got_deltify()


make got_deltify() reallocate the deltas array less often


fix deltas with trailing data that is smaller than the minimum chunk size


allow the delta base file to lose its header between deltify_init and deltify This simplifies pack file creation. A delta base could be read from a loose object, a packfile, or it might be available in a temporary file. All these cases can now be handled the same way. We may need to open, close, and re-open a given delta base multiple times while packing.


check for errors from emitdelta() in got_deltify()


handle fseek in got_deltify() instead of in stretchblk(); simplifies the code


Allow for skipping the base object header in got_deltify().


in addblk(), only read data into buffer1 if we will compare it to buffer2 suggested by and ok naddy@


addblk() may seek in its input file; reposition the file pointer afterwards


addblk: iterate over the correct number of entries after growing the array ok naddy


addblk: be more careful about expanding the blocks array when we outgrow it fixes + ok naddy


check a block's hash as well as its length before expensive comparisons suggested by + ok naddy, and Ori agrees
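The idea of rejecting candidates cheaply before comparing their contents can be sketched as below. struct cand and cand_equal() are hypothetical names for this example, and the in-memory memcmp() stands in for the file reads the real code performs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of a block; deltify.c works on files. */
struct cand {
	const unsigned char *data;
	size_t len;
	uint32_t hash;
};

/* Return 1 if both blocks hold the same bytes, 0 otherwise. */
int
cand_equal(const struct cand *a, const struct cand *b)
{
	/* Cheap checks first: differing length or hash rules out a match. */
	if (a->len != b->len || a->hash != b->hash)
		return 0;
	/* Equal hashes can still collide, so confirm with a full compare. */
	return memcmp(a->data, b->data, a->len) == 0;
}
```

The full comparison is still required after the cheap checks pass, since two different blocks can share both length and hash.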


allow got_deltify_free(NULL); will be needed by 'gotadmin pack'