Vdiff vpatch: Blockwise Keccaking

July 7th, 2019

Keccak hashes of the input and output files are crucial components of the vpatch format. The original Ada code for hash calculation allocates a buffer for the whole file on the stack; the array holding the bit representation of the file's octets also lives on the stack, increasing stack use 8x. This was the reason behind the vdiff failure on large files that was reported in the forum.

In this vpatch, I adapt the code found in the ksum utility to do the hashing block by block, making much more modest use of the stack. This modification makes it possible for vdiff to process larger files[1]. After an early announcement, I got feedback on the vpatch from phf, which I took into account in the revised vpatch:

   procedure C_Hash(Ctx: in C_Context_Access;
                    Input: Char_Star;
                    Len: Interfaces.C.size_t) is
      Buffer_Size: constant Natural := 2048;   -- octets hashed per block
      Byte_Size: constant Natural := 8;        -- bits per octet
      N: Natural := 0;                         -- octets filled in Buf this pass
      I: Natural := 0;                         -- input octets processed so far
      L: Natural := Natural(Len);
      Buf: String(1..Buffer_Size);
      B: Bitstream(1..Buf'Length*Byte_Size);
      Ptr: Char_Star := Input;
   begin
      if Input = null then
         raise Strings.Dereference_Error;
      end if;
      while L > I loop
         N := 0;
         -- Fill Buf from the input; the last block may be shorter.
         for Chr of Buf loop
            exit when L <= I;
            Chr := Character(Ptr.all);
            Char_Ptrs.Increment(Ptr);
            N := N + 1;
            I := I + 1;
         end loop;
         -- Hash only the N octets that were actually filled.
         ToBitstream(Buf(1..N), B(1..N*Byte_Size));
         KeccakHash(Ctx.all, B(1..N*Byte_Size));
      end loop;
   end C_Hash;

Here, N counts how many bytes of Buf are filled in the current pass, which is needed to pass slices of the correct length to ToBitstream and KeccakHash, while I counts the input octets processed so far, which is needed to break out of the hashing loop once the input is fully processed.
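
To see how the two counters interact, here is a minimal standalone sketch of the same loop with the hashing removed. It is not part of the vpatch: the procedure name Chunk_Demo and the 5000-octet in-memory Input are made up for illustration, and an ordinary String stands in for the C pointer that C_Hash walks.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Chunk_Demo is
   Buffer_Size : constant Natural := 2048;  -- same block size as in C_Hash
   --  Hypothetical input: 5000 octets held in memory instead of a C pointer.
   Input : constant String (1 .. 5000) := (others => 'x');
   L     : constant Natural := Input'Length;
   Buf   : String (1 .. Buffer_Size);
   N     : Natural := 0;  -- octets filled in Buf during the current pass
   I     : Natural := 0;  -- octets of Input processed so far
begin
   while L > I loop
      N := 0;
      for Chr of Buf loop
         exit when L <= I;
         Chr := Input (Input'First + I);
         N := N + 1;
         I := I + 1;
      end loop;
      --  C_Hash would hand Buf (1 .. N) to ToBitstream and KeccakHash here.
      Put_Line ("block of" & Natural'Image (Buf (1 .. N)'Length)
                & " octets, total" & Natural'Image (I));
   end loop;
end Chunk_Demo;
```

Saved as chunk_demo.adb and built with gnatmake, this should report two full blocks of 2048 octets followed by a final block of 904, with the running total ending at 5000. Only the last block is short, which is why the slices are taken up to N rather than up to Buffer_Size.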

The vpatch and seal are available at the following links:

curl 'http://bvt-trace.net/vpatches/vdiff_blockwise_read-2.vpatch' > vdiff_blockwise_read.vpatch
curl 'http://bvt-trace.net/vpatches/vdiff_blockwise_read-2.vpatch.bvt.sig' > vdiff_blockwise_read.vpatch.bvt.sig

  1. I did not study the input size limits of vdiff, so I can't claim 'arbitrarily large'.