Gotweb

Commits

Commit:: fcece7180725bba9a781eaa892af379b1986208b
From:: Omar Polo <op@omarpolo.com>
Date:: Mon Feb 26 16:25:15 2024 UTC

attempt to speed up the deltification for big files

The current hash table performs poorly on big files due to a small resize step that continuously pushes the table to its limits. Instead, to have both a better-performing hash table and keep memory consumption low, save the blocks in an array and use the hash table as an index. Then, use a more generous resizing scheme that guarantees the good properties of the hash table. To avoid having to rebuild the table when the array is resized, save the indexes in the table, and to further reduce memory consumption use 32-bit indices. On amd64 this means that each slot is 4 bytes instead of 8 for a pointer or 24 for a struct got_deltify_block.

ok stsp@
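
The layout described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names struct block, struct block_table, table_add() and table_lookup(), the linear probing, the 3/4 load factor and the doubling policy are all assumptions made for the example. Slots hold a 32-bit array index (plus one, so that 0 can mean "empty"), which is why growing the blocks array with realloc() never invalidates the index.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical block record; the real struct got_deltify_block may differ. */
struct block {
	uint64_t offset;
	uint32_t len;
	uint32_t hash;
};

struct block_table {
	struct block *blocks;	/* dense array, appended to in order */
	uint32_t nblocks;
	uint32_t blocks_cap;
	uint32_t *slots;	/* index: array position + 1, 0 means empty */
	uint32_t nslots;	/* always a power of two */
};

int
table_init(struct block_table *t)
{
	memset(t, 0, sizeof(*t));
	t->nslots = 16;
	t->slots = calloc(t->nslots, sizeof(*t->slots));
	return t->slots == NULL ? -1 : 0;
}

void
table_free(struct block_table *t)
{
	free(t->blocks);
	free(t->slots);
	memset(t, 0, sizeof(*t));
}

/* Store array index idx in the first free slot (linear probing). */
void
slot_insert(uint32_t *slots, uint32_t nslots, uint32_t hash, uint32_t idx)
{
	uint32_t i = hash & (nslots - 1);

	while (slots[i] != 0)
		i = (i + 1) & (nslots - 1);
	slots[i] = idx + 1;
}

/* Double the index and re-insert all indices; the blocks stay in place. */
int
table_grow(struct block_table *t)
{
	uint32_t i, n = t->nslots * 2;
	uint32_t *s = calloc(n, sizeof(*s));

	if (s == NULL)
		return -1;
	for (i = 0; i < t->nblocks; i++)
		slot_insert(s, n, t->blocks[i].hash, i);
	free(t->slots);
	t->slots = s;
	t->nslots = n;
	return 0;
}

int
table_add(struct block_table *t, uint64_t offset, uint32_t len, uint32_t hash)
{
	if (t->nblocks == t->blocks_cap) {
		/* Growing the array moves blocks, but indices stay valid. */
		uint32_t cap = t->blocks_cap ? t->blocks_cap * 2 : 16;
		struct block *b = realloc(t->blocks, cap * sizeof(*b));

		if (b == NULL)
			return -1;
		t->blocks = b;
		t->blocks_cap = cap;
	}
	/* Keep the load factor at or below 3/4 (an assumed threshold). */
	if ((uint64_t)(t->nblocks + 1) * 4 > (uint64_t)t->nslots * 3 &&
	    table_grow(t) == -1)
		return -1;
	t->blocks[t->nblocks] = (struct block){ offset, len, hash };
	slot_insert(t->slots, t->nslots, hash, t->nblocks);
	t->nblocks++;
	return 0;
}

const struct block *
table_lookup(const struct block_table *t, uint32_t hash, uint32_t len)
{
	uint32_t i = hash & (t->nslots - 1);

	while (t->slots[i] != 0) {
		const struct block *b = &t->blocks[t->slots[i] - 1];

		if (b->hash == hash && b->len == len)
			return b;
		i = (i + 1) & (t->nslots - 1);
	}
	return NULL;
}
```

Because the load factor never exceeds 3/4, a probe sequence always reaches an empty slot, so table_lookup() terminates even for hashes that are not present.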

Commit:: 0f2e686eec562e28977521d25101acfa4396b47a
From:: Omar Polo <op@omarpolo.com>
Date:: Fri Jul 28 19:10:27 2023 UTC

bump the deltify table resize step

By incrementing the resize step from 64 to 256, deltifying takes less time on modestly sized files. The resize step is still a small number instead of a fraction of the current table size (which would be more usual for a hash table) since this code is also used in gotd.

ok stsp
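
To make the trade-off concrete, the following sketch counts how many times a table must be resized to hold n entries under a fixed step versus doubling. Both functions are hypothetical helpers written for this illustration and do not appear in the source; they only do the arithmetic, not any real allocation.

```c
#include <stddef.h>

/* Resizes needed to reach a capacity of at least n with a fixed step. */
size_t
resizes_fixed_step(size_t n, size_t step)
{
	size_t cap = 0, count = 0;

	while (cap < n) {
		cap += step;
		count++;
	}
	return count;
}

/* Same, but doubling the capacity; the initial allocation counts as one. */
size_t
resizes_doubling(size_t n, size_t initial)
{
	size_t cap = initial, count = 1;

	while (cap < n) {
		cap *= 2;
		count++;
	}
	return count;
}
```

For 100000 entries this gives 1563 resizes with a step of 64, 391 with a step of 256, and 12 when doubling from 64. The fixed step wastes at most one step's worth of slots, which is the memory argument the commit message makes for gotd.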

Commit:: 04aed1557bf2e67bfef8d3a991fd54526142c8a8
From:: Christian Weisgerber <naddy@mips.inka.de>
Date:: Sun Jul 24 21:41:50 2022 UTC

fix off_t type mismatches

off_t is a signed type and, depending on the platform, it can be "long" or "long long", so cast to long long for printf().

ok stsp
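
A minimal sketch of the cast described here, assuming a POSIX system where off_t comes from <sys/types.h>. The function name format_offset() is made up for the example.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Format an off_t portably: off_t has no printf conversion of its own,
 * so cast to long long and use %lld regardless of the platform's choice.
 */
int
format_offset(char *buf, size_t bufsize, off_t offset)
{
	return snprintf(buf, bufsize, "offset %lld", (long long)offset);
}
```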

Commit:: d6a28ffe187127e3247254d7e242bb52d66eb26b
From:: Omar Polo <op@omarpolo.com>
Date:: Fri May 20 21:21:42 2022 UTC

use random seeds for murmurhash2

Change the three hardcoded seeds to fresh ones generated on demand via arc4random.

Suggested/fixed by and ok stsp@
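
The role of the seed can be seen in a sketch of the public-domain MurmurHash2 algorithm below. This is an independent rendering for illustration, not the file shipped with got, and its signature is an assumption. The seed is passed in by the caller; on systems that provide it, a caller could obtain one from arc4random(), which is left out here to keep the example portable.

```c
#include <stdint.h>
#include <string.h>

/* MurmurHash2, 32-bit variant; the seed perturbs the initial state. */
uint32_t
murmurhash2(const void *key, uint32_t len, uint32_t seed)
{
	const uint32_t m = 0x5bd1e995;
	const int r = 24;
	const unsigned char *data = key;
	uint32_t h = seed ^ len;
	uint32_t k;

	/* Mix four bytes at a time into the hash. */
	while (len >= 4) {
		memcpy(&k, data, sizeof(k));
		k *= m;
		k ^= k >> r;
		k *= m;
		h *= m;
		h ^= k;
		data += 4;
		len -= 4;
	}

	/* Handle the last few bytes of the input. */
	if (len >= 3)
		h ^= (uint32_t)data[2] << 16;
	if (len >= 2)
		h ^= (uint32_t)data[1] << 8;
	if (len >= 1) {
		h ^= data[0];
		h *= m;
	}

	/* Final avalanche. */
	h ^= h >> 13;
	h *= m;
	h ^= h >> 15;
	return h;
}
```

With a hardcoded seed, anyone who knows the seed can precompute inputs that collide; a seed chosen at run time makes the resulting hash values unpredictable from outside the process.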

Commit:: d58ddaf3fc10239711ae7a88664e3a100567ba3c
From:: Christian Weisgerber <naddy@mips.inka.de>
Date:: Thu Mar 17 20:02:40 2022 UTC

const-ify tables

ok thomas_adam millert

Commit:: f6027426102430eb80a6df7ce1bf2e31d15cf85d
From:: Christian Weisgerber <naddy@mips.inka.de>
Date:: Sat Feb 12 21:48:46 2022 UTC

consistently match size of hash variables to that returned by murmurhash

ok millert stsp

Commit:: 2b474c2514b417c6ead14e07c19c19c97dcbf7ff
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri Feb 11 22:45:00 2022 UTC

use murmurhash instead of sha1 for deltification blocks; suggested by ori

Commit:: 64a8571e126da3ef8c0488551727b87e4509b50d
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri Jan 7 23:32:27 2022 UTC

map raw object files into memory while packing if possible

Commit:: 2d467c6d020f635039e8a2fadf1b6ea7f7a18a9e
From:: Stefan Sperling <stsp@stsp.name>
Date:: Wed Oct 13 18:07:29 2021 UTC

fix wrong function in error string of emitdelta()

Commit:: f736d93a2da5b433c03766eee9f631af9dec2318
From:: Stefan Sperling <stsp@stsp.name>
Date:: Thu Oct 7 19:12:36 2021 UTC

link to the FastCDC paper from deltify.c; suggested by Ori some time ago

Commit:: 6eab69f730c8340837a82452cf8797251b3e69c2
From:: Stefan Sperling <stsp@stsp.name>
Date:: Thu Oct 7 19:08:52 2021 UTC

make the number of elements in deltify's geartab explicit

Commit:: 5de743f8fddcaaf2912ffc92dce239aa6227d6d0
From:: Stefan Sperling <stsp@stsp.name>
Date:: Sun Aug 29 13:15:27 2021 UTC

fix seek to incorrect offset in the delta base when creating deltas

The stretchblk() function needs to compare data located after the block which has just been matched. However, upon entry it was resetting the file pointer of the delta base to the beginning(!) of the block. The other file is correctly positioned after the block.

In many cases the data won't match and stretchblk() will not stretch the matched block. But when the data did happen to match, this resulted in a bogus delta, and wrong file contents when the delta was applied.

Fix this by setting the delta base file pointer to the end of the block.

Problem reported by naddy after our server refused a pack file which was sent by 'got send'. I could reproduce the issue by running the 'gotadmin pack' command on a copy of naddy's repository.

ok naddy
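
The corrected positioning can be illustrated with a small sketch. The function stretch_match() and its signature are hypothetical and much simpler than the real stretchblk(); it only shows that both files must be positioned at the end of the matched block before the following bytes are compared.

```c
#include <stdio.h>

/*
 * Given a block of length len that matched at base_off in the delta base
 * and at file_off in the new file, count how many bytes following the
 * block are identical in both files.
 */
int
stretch_match(FILE *base, long base_off, FILE *f, long file_off,
    long len, long *extra)
{
	int c1, c2;

	*extra = 0;

	/* Seek to the END of the matched block in both files. */
	if (fseek(base, base_off + len, SEEK_SET) == -1)
		return -1;
	if (fseek(f, file_off + len, SEEK_SET) == -1)
		return -1;

	for (;;) {
		c1 = getc(base);
		c2 = getc(f);
		if (c1 == EOF || c2 == EOF || c1 != c2)
			break;
		(*extra)++;
	}

	if (ferror(base) || ferror(f))
		return -1;
	return 0;
}
```

Seeking the base to base_off instead of base_off + len would compare the start of the block against the data after it, which is the kind of mismatch the commit message describes.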

Commit:: 0af64e86449b8d836b04b25ece0bbc5543a75238
From:: Stefan Sperling <stsp@stsp.name>
Date:: Sun Aug 22 12:54:21 2021 UTC

plug a memory leak in an error path of got_deltify()

Commit:: dd29967c8be9311a99ae3310d49789c65989498e
From:: Stefan Sperling <stsp@stsp.name>
Date:: Sun Aug 22 12:53:22 2021 UTC

make got_deltify() reallocate the deltas array less often

Commit:: 9a8dc2b3ec216fd01b3c33137eb92d98ddadb63e
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri Jun 18 14:10:55 2021 UTC

fix deltas with trailing data that is smaller than the minimum chunk size

Commit:: 740bba1c3179a597c83f7dd3a23bffb50a494bdf
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri Jun 18 14:07:35 2021 UTC

allow the delta base file to lose its header between deltify_init and deltify

This simplifies pack file creation. A delta base could be read from a loose object, a packfile, or it might be available in a temporary file. All these cases can now be handled the same way. We may need to open, close, and re-open a given delta base multiple times while packing.

Commit:: 7550e799ee994b0b74689a6895f84d8aaec86f49
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri Jun 18 13:59:46 2021 UTC

check for errors from emitdelta() in got_deltify()

Commit:: aa51f4a4acac901a4f1bf4062664644ce95d3e8c
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri Jun 18 13:57:59 2021 UTC

handle fseek in got_deltify() instead of in stretchblk(); simplifies the code

Commit:: f34b169e54fc4d4960f06b804cabe1aeec70e07d
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri Jun 18 13:28:25 2021 UTC

Allow for skipping the base object header in got_deltify().

Commit:: 0d15f6dcf929ae42606d3ca046621aee79e45890
From:: Stefan Sperling <stsp@stsp.name>
Date:: Sun Jun 13 17:03:59 2021 UTC

in addblk(), only read data into buffer1 if we will compare it to buffer2

suggested by and ok naddy@

Commit:: 68bdcdc2f5d3c37d918f85368c2537a8aa7d90eb
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri Jun 11 17:10:50 2021 UTC

addblk() may seek in its input file; reposition the file pointer afterwards

Commit:: a893025fd207950945eed1482170223a2d3b9ce3
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri Jun 11 17:02:57 2021 UTC

addblk: iterate over the correct number of entries after growing the array

ok naddy

Commit:: e89540a95a268f47ef2d1b24c41fbb72a1f0bdc9
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri Jun 11 17:02:13 2021 UTC

addblk: be more careful about expanding the blocks array when we outgrow it

fixes + ok naddy

Commit:: 51a494da48acb57ed84501a6d10f39ed624c711e
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri Jun 11 17:00:02 2021 UTC

check a block's hash as well as its length before expensive comparisons

suggested by + ok naddy, and Ori agrees
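
The idea of filtering on cheap fields first can be sketched as below. The struct blk type and block_matches() are hypothetical names for this example, and the block data is passed in memory for simplicity, whereas the real code may have to read it from files, which is what makes the full comparison expensive.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical block descriptor used only for this sketch. */
struct blk {
	uint32_t len;
	uint32_t hash;
};

/*
 * Return 1 if the two blocks have identical contents, 0 otherwise.
 * The length and hash checks are cheap integer comparisons; the byte
 * comparison only runs when both already agree.
 */
int
block_matches(const struct blk *a, const void *adata,
    const struct blk *b, const void *bdata)
{
	if (a->len != b->len || a->hash != b->hash)
		return 0;
	return memcmp(adata, bdata, a->len) == 0;
}
```

The final memcmp() is still required because two different blocks can share both length and hash.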

Commit:: dbbf4a5f0cfb712c5970dcb79a65c5dd2e62b19a
From:: Stefan Sperling <stsp@stsp.name>
Date:: Thu May 20 09:51:59 2021 UTC

allow got_deltify_free(NULL); will be needed by 'gotadmin pack'
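
Accepting NULL in a free function follows the convention of free(3) and lets callers use a single cleanup path. The sketch below shows the pattern with a made-up struct dtable and functions dtable_alloc() and dtable_free(); it is not the actual got_deltify_free() implementation.

```c
#include <stdlib.h>

/* Hypothetical table type used only for this sketch. */
struct dtable {
	unsigned int *slots;
	size_t nslots;
};

struct dtable *
dtable_alloc(size_t nslots)
{
	struct dtable *dt = calloc(1, sizeof(*dt));

	if (dt == NULL)
		return NULL;
	dt->slots = calloc(nslots, sizeof(*dt->slots));
	if (dt->slots == NULL) {
		free(dt);
		return NULL;
	}
	dt->nslots = nslots;
	return dt;
}

/* Like free(3), tolerate NULL so callers need no extra check. */
void
dtable_free(struct dtable *dt)
{
	if (dt == NULL)
		return;
	free(dt->slots);
	free(dt);
}
```

A caller can then initialize its pointer to NULL and call dtable_free() unconditionally on every exit path, whether or not the allocation ever happened.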
