v.sh part 2: Ada utilities

February 8th, 2020

In the part one, we had a look at the v.sh script, which should give the reader an understanding of how this v-presser works. In the part two, I will add the utilities required by the v.sh:

  • vflow
  • vtoposort
  • vsimplify
  • vfilter (in awk).

I should mention that while writing this article, I really started to hate the duplication of input-output code, so I think I have figured out a way of restructuring the whole thing as a single application. But I did not want to delay the release any further, so I present what I currently have.


vflow

The goal is to determine descendant-antecedent relationships between vpatches based on the information about hunks. From data processing point of view, the goal is to transform the input:

vpatch-id in-file-id out-file-id in-hash-id out-hash-id

into output:

v vpatches-count
d descendant-id antecedent-id
...
c vpatch1-id vpatch2-id
...

by matching the hunk inputs (path, hash) in vpatch A to the hunk output with the same values in vpatch B: in this case A is the descendant of B.

The source:

with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
with Ada.Command_Line; use Ada.Command_Line;
with Ada.Containers.Generic_Sort;

procedure Vflow is
   type Hunk_State is (Unsatisfied, Satisfied);
   Empty_Hash : constant := 0;
   type Hunk is record
      Vpatch:   Natural;
      In_File:  Natural;
      Out_File: Natural;
      In_Hash:  Natural;
      Out_Hash: Natural;
      State:    Hunk_State;
   end record;

Hunks are described by a tuple of (vpatch, input file path, output file path, input file hash, output file hash). State is added for internal bookkeeping.

   Num_Hunks    : Integer := Integer'Value(Argument(1));
   Num_Vpatches : Integer := Integer'Value(Argument(2));

The number of hunks and vpatches are provided as command-line arguments. Rest of the types are elaborated based on this input.

   subtype Hunk_Index is Natural range 1..Num_Hunks;
   type    Hunk_Array is array (Hunk_Index range <>) of Hunk;
   All_Hunks : Hunk_Array(Hunk_Index'Range);

All_Hunks stores all the input to the application.

   subtype Vpatch_Index is Natural range 0..Num_Vpatches - 1;
   type    Vpatch_Hunks is array (Vpatch_Index'Range) of Hunk_Index;
   VH_Low : Vpatch_Hunks := (others => Hunk_Index'First);
   VH_High: Vpatch_Hunks := (others => Hunk_Index'First);

Input should be provided sorted by vpatch identifier. VH_Low(I) stores the first index into All_Hunks where hunks belonging to vpatch I are stored, VH_High(I) - the last.

One problem is that all these data structures are stored on stack, which has been fine so far even with the whole Linux tree, but may require either "ulimit -s unlimited" in v.sh, or dynamic allocation here, if in the future the vtree becomes too large.

   type Vpatch_Relationship is array (Vpatch_Index'Range, Vpatch_Index'Range) of Boolean;
   Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False));
   Vpatch_Conflicts   : Vpatch_Relationship := (others => (others => False));

Vpatch_Dependencies(I,J) is set to true if vpatch I is a descendant on vpatch J, the same goes for Vpatch_Conflicts if vpatches I and J are mutually incompatible descendants of the same vpatch.

   function Get_Hunk return Hunk is
      H: Hunk;
   begin
      Get(H.Vpatch);
      Get(H.In_File);
      Get(H.Out_File);
      Get(H.In_Hash);
      Get(H.Out_Hash);
      H.State := Unsatisfied;
      return H;
   end Get_Hunk;

With simple numeric input, we don't have to rely on any hand-rolled parsers, and can just use Ada's built-in tools.

   procedure Read_Hunks is
      I: Natural := All_Hunks'First;
   begin
  Read_Loop:
      loop
	 exit Read_Loop when End_Of_File;
	 All_Hunks(I) := Get_Hunk;
	 I := I + 1;
    end loop Read_Loop;

   end Read_Hunks;

Here, we just read all hunks one-by-one.

   procedure Init_Vpatch_Hunk_Bounds is
   begin
      for I in All_Hunks'Range loop
	 declare
	    V : Integer := All_Hunks(I).Vpatch;
	 begin
	    if VH_Low(V) > I or All_Hunks(VH_Low(V)).Vpatch /= V then
	       VH_Low(V) := I;
	    end if;
	    VH_High(V) := I;
	 end;
      end loop;
   end Init_Vpatch_Hunk_Bounds;

We do not keep track of where each vpatch starts and ends in All_Hunks when reading, so this information is recovered by Init_Vpatch_Hunk_Bounds. One linear scan is enough.

   procedure Populate_Conflicts is
      function Conflicts(I: Vpatch_Index; J: Vpatch_Index) return Boolean is
      begin
	 for A in VH_Low(I)..VH_High(I) loop
	    for B in VH_Low(J)..VH_High(J) loop
	       if (All_Hunks(A).In_File = All_Hunks(B).In_File) and
		 (All_Hunks(A).In_Hash = All_Hunks(B).In_Hash) then
		  return True;
	       end if;
	    end loop;
	 end loop;
	 return False;
      end Conflicts;
   begin
      for I in Vpatch_Index'Range loop
	 for J in Vpatch_Index'Range loop
	    if I < J and then Conflicts(I,J) then
	       Vpatch_Conflicts(I,J) := True;
	       Vpatch_Conflicts(J,I) := True;
	    end if;
	 end loop;
      end loop;
   end Populate_Conflicts;

The definition of conflict between two vpatches is rather naive: two vpatches are conflicting when they both contain hunks have the same (path,hash) as inputs. We check this property among all pairs of vpatches.

Next comes the main dependency resolution function, Solve.

   procedure Solve is
      Input_Hunks : Hunk_Array := All_Hunks;
      Output_Hunks: Hunk_Array renames All_Hunks;
      Finished    : Boolean    := False;

The way vflow detects if vpatch I is a descendant of vpatch J is by checking, if any (output path, output hash) created by vpatch J is used as (input path, input hash) in vpatch I. To reduce the algorithmic complexity, we sort All_Hunks by join keys (because that's what this procedure is essentially doing) - input or output path ids. We have to sort using different keys because we cannot assume that the input path is always the same as the output path. So hunks sorted by input path are stored in Input_Hunks, for the hunks sorted by output path (Output_Hunks) we reuse the storage of All_Hunks.

      -- Start of sorting boilerplate
      --
      function Input_Before(A: Positive; B: Positive) return Boolean is
      begin
	 return Input_Hunks(A).In_File < Input_Hunks(B).In_File;
      end Input_Before;

      function Output_Before(A: Positive; B: Positive) return Boolean is
      begin
	 return Output_Hunks(A).Out_File < Output_Hunks(B).Out_File;
      end Output_Before;

      procedure Input_Swap(A: Positive; B: Positive) is
	 Tmp : Hunk;
      begin
	 Tmp := Input_Hunks(B);
	 Input_Hunks(B) := Input_Hunks(A);
	 Input_Hunks(A) := Tmp;
      end Input_Swap;

      procedure Output_Swap(A: Positive; B: Positive) is
	 Tmp : Hunk;
      begin
	 Tmp := Output_Hunks(B);
	 Output_Hunks(B) := Output_Hunks(A);
	 Output_Hunks(A) := Tmp;
      end Output_Swap;

      procedure Input_Sort is
	 new Ada.Containers.Generic_Sort (Positive, Input_Before, Input_Swap);
      procedure Output_Sort is
	 new Ada.Containers.Generic_Sort (Positive, Output_Before, Output_Swap);
      --
      -- End of sorting boilerplate

This boilerplate is required to use Ada's built-in generic sorting facilities.

      function Check_Genesis(Input: in out Hunk_Array) return Boolean is
	 Is_Genesis : Boolean := True;
	 Vpatch: Vpatch_Index := Input(Input'First).Vpatch;
      begin
	 for I in Input'Range loop
	    Is_Genesis := Is_Genesis and (Input(I).In_Hash = Empty_Hash);
	 end loop;
	 return Is_Genesis;
      end Check_Genesis;

A vpatch is genesis if all its hunks are file creations.

      function Try_Connect(Input: in out Hunk_Array;
			   Output: in out Hunk_Array) return Boolean is
	 I         : Positive := Input'First;
	 J         : Positive := Output'First;
	 All_Marked: Boolean  := True;
      begin
	 while I <= Input'Last and J <= Output'Last loop
	    if Input(I).State /= Unsatisfied or
	      Input(I).In_File < Output(J).Out_File then
	       I := I + 1;
	    elsif Input(I).In_File > Output(J).Out_File then
	       J := J + 1;
	    else
	       if Input(I).In_Hash = Output(J).Out_Hash and
		 Input(I).In_Hash /= Empty_Hash then
		  Input(I).State := Satisfied;
		  Vpatch_Dependencies(Input(Input'First).Vpatch,
				      Output(Output'First).Vpatch) := True;
	       end if;
	       I := I + 1;
	    end if;
	 end loop;

	 for I in Input'Range loop
	    All_Marked := All_Marked and Input(I).State = Satisfied;
	 end loop;
	 return All_Marked;
      end Try_Connect;

Try matching the hunks belonging of candidate descendant (in array Input) with hunks of candidate antecedent (in array Output). The arrays are sorted, so a variant of a sorted join can be used. If there is a match, we assume that there is a dependency among the vpatches. The return value of the function is True is all hunks in the vpatch are satisfied (that is, have an antecedent); to determine the return value, we iterate over Input once again.

   begin
      for I in Vpatch_Index'Range loop
	 Output_Sort(VH_Low(I), VH_High(I));
	 Input_Sort(VH_Low(I), VH_High(I));
      end loop;

      for I in Vpatch_Index'Range loop
	 declare
	    Hunks_In : Hunk_Array renames Input_Hunks(VH_Low(I)..VH_High(I));
	 begin
	    Finished := False;

	    for J in Vpatch_Index'Range loop
	       if J /= I then
		  declare
		     Hunks_Out : Hunk_Array renames
		       Output_Hunks(VH_Low(J)..VH_High(J));
		  begin
		     Finished := Try_Connect(Hunks_In, Hunks_Out);
		     exit when Finished;
		  end;
	       else
		  Finished := Check_Genesis(Hunks_In);
		  exit when Finished;
	       end if;
	    end loop;

	    -- Looped through all vpatches, there are hunks without matched inputs:
	    if not Finished then
	       -- Check if the remaining hunks are file creations:
	       declare
		  OK : Boolean := True;
	       begin
		  for I in Hunks_In'Range loop
		     if Hunks_In(I).In_Hash = Empty_Hash and
		       Hunks_In(I).State = Unsatisfied then
			Hunks_In(I).State := Satisfied;
		     end if;
		     OK := OK and (Hunks_In(I).State = Satisfied);
		  end loop;
		  if not OK then
		     raise Constraint_Error with "Unsatisfied hunks.";
		  end if;
	       end;
	    end if;
	 end;
      end loop;
   end Solve;

The body of the function, in a loop:

  1. First, sort hunks by input path and output path, in blocks belonging to different vpatches.
  2. Then, we try to find antecedents for each of the vpatches, also checking if a vpatch is a genesis.
  3. If after the looping through each potential antecedent there still are hunks without a matching input, we exit with an error.
   procedure Produce_Output is
   begin
      Put_Line("v " & Integer'Image(Num_Vpatches));
      for I in Vpatch_Dependencies'Range(1) loop
	 for J in Vpatch_Dependencies'Range(2) loop
	    if Vpatch_Dependencies(I,J) then
	       Put_Line("d " & Integer'Image(I) & " " & Integer'Image(J));
	    end if;
	 end loop;
      end loop;
      for I in Vpatch_Conflicts'Range(1) loop
	 for J in Vpatch_Conflicts'Range(2) loop
	    if Vpatch_Conflicts(I,J) then
	       Put_Line("c " & Integer'Image(I) & " " & Integer'Image(J));
	    end if;
	 end loop;
      end loop;
   end Produce_Output;
   

Produce_Output just prints out the number of vpatches, the antecedent-descendant relationships between vpatches stored in Vpatch_Dependencies, and the conflicts stored in Vpatch_Conflicts.

begin
   Read_Hunks;
   Init_Vpatch_Hunk_Bounds;
   Populate_Conflicts;
   Solve;
   Produce_Output;
end Vflow;

vtoposort

The goal is to calculate the press order of vpatches based on the dependency and the conflict information produced by vflow concatenated with the set of vpatches selected by the operator for pressing.

with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

procedure Vtoposort is

   Command      : Character;
   Num_Vpatches : Integer;

Command is used for reading all input, Num_Vpatches is self-describing.

   procedure Toposort(Num_Vpatches : Integer) is
      subtype Vpatch_Index        is Natural range 0..Num_Vpatches - 1;
      type    Vpatch_Set          is array (Vpatch_Index'Range) of Boolean;
      type    Vpatch_List         is array (Vpatch_Index'Range) of Vpatch_Index;
      type    Vpatch_Relationship is array
	(Vpatch_Index'Range, Vpatch_Index'Range) of Boolean;

Some new types are Vpatch_Set, which we simulate by an array of booleans, where Set(I) = True means that element I is present in the set, and Vpatch_List, which is used for storing the vpatch indices (the press order). Vpatch_Relationship we have seen before.

      Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False));
      Vpatch_Conflicts   : Vpatch_Relationship := (others => (others => False));
      Free_Vpatches      : Vpatch_Set          := (others => False);

Vpatch_Dependencies and Vpatch_Conflicts are the same as in vflow, and their duplication suggest that perhaps uniting all utilities in one tool would be useful. Free_Vpatches initially contains the list of vpatches selected for pressing by user, but if Toposort detects that it is one of the dependencies of any of the vpatches, it will add this dependency to the Free_Vpatches set.

      procedure Read_Commands is
	 I: Vpatch_Index;
	 J: Vpatch_Index;
      begin
     Read_Loop:
	 loop
	    exit Read_Loop when End_Of_File;
	    Get(Command);
	    if Command = 'd' then
	       Get(I);
	       Get(J);
	       Vpatch_Dependencies(I,J) := True;
	    elsif Command = 'c' then
	       Get(I);
	       Get(J);
	       Vpatch_Conflicts(I,J) := True;
	    elsif Command = 's' then
	       Get(I);
	       Free_Vpatches(I) := True;
	    else
	       raise Constraint_Error with "Unknown Command";
	    end if;
	 end loop Read_Loop;
      end Read_Commands;

Straightforward loop for reading the input, not much to write about.

      procedure Add_Dependencies is
	 Finish : Boolean := False;
      begin
	 while not Finish loop
	    Finish := True;
	    for I in Free_Vpatches'Range loop
	       if Free_Vpatches(I) then
		  for J in Vpatch_Dependencies'Range(2) loop
		     if Vpatch_Dependencies(I,J) and not Free_Vpatches(J) then
			Finish := False;
			Free_Vpatches(J) := True;
		     end if;
		  end loop;
	       end if;
	    end loop;
	 end loop;
      end Add_Dependencies;

This function loops through all vpatches in the Free_Vpatches set, and for each vpatch adds all the antecedents of the vpatch to Free_Vpatches set. Iteration stops when there are no more vpatches to add to the set.

      procedure Report_Conflicts is
      begin
	 for I in Free_Vpatches'Range loop
	    if Free_Vpatches(I) then
	       for J in Vpatch_Conflicts'Range(2) loop
		  if Vpatch_Conflicts(I,J) and Free_Vpatches(J) then
		     Put_Line("C " & Integer'Image(I));
		     Put_Line("C " & Integer'Image(J));
		  end if;
	       end loop;
	    end if;
	 end loop;
      end Report_Conflicts;

Report_Conflicts checks if mutually exclusive vpatches are present among the selected.

      procedure Sort is
	 Added_Vpatches : Vpatch_Set  := (others => False);
	 Vpatch_Order   : Vpatch_List := (others => 0);
	 N_Selected     : Natural     := 0;
	 Finished       : Boolean     := False;
	 Has_Loops      : Boolean     := False;
      begin
	 while not Finished loop
	    Finished := True;
	    for I in Free_Vpatches'Range loop
	       if Free_Vpatches(I) then
		  declare
		     All_Satisfied : Boolean := True;
		  begin

		     -- Check if all dependencies are already in the press order
		     for J in Vpatch_Dependencies'Range(2) loop
			if Vpatch_Dependencies(I,J) then
			   All_Satisfied := All_Satisfied and Added_Vpatches(J);
			end if;
		     end loop;

		     -- All dependencies present, can add this vpatch:
		     if All_Satisfied then
			Free_Vpatches(I)         := False;
			Added_Vpatches(I)        := True;
			Vpatch_Order(N_Selected) := I;
			N_Selected               := N_Selected + 1;
			Finished                 := False;
		     end if;
		  end;
	       end if;
	    end loop;
	 end loop;

	 -- Check if there are still free vpatches (there is a loop):
	 for I in Free_Vpatches'Range loop
	    if Free_Vpatches(I) then
	       Put_Line("L " & Integer'Image(I));
	       Has_Loops := True;
	    end if;
	 end loop;

	 -- Print vpatches in press order:
	 if not Has_Loops then
	    for I in Vpatch_Order'First..N_Selected - 1 loop
	       Put_Line("P " & Integer'Image(Vpatch_Order(I)));
	    end loop;
	 end if;
      end Sort;

Sort is the main, well, sorting function. The variables are:

  • Vpatch_Order - the press order, which is supposed to be main output of this tool.
  • N_Selected - the first free index in Vpatch_Order (point into which new vpatch is added).
  • Added_Vpatches - a set of vpatches already in Vpatch_Order.
  • Finished - the sorting loop converged on a solution.
  • Has_Loops - flag for the case when a loop has been discovered.

First, loop through vpatches in the Free_Vpatches set, checking if all the antecedents of each vpatch are present. If yes, the vpatch is added to Vpatch_Order, updating all the related variable accordingly.
If after this loop finishes, there are still selected but not added vpatches, there is a loop which gets printed. Otherwise, everything is OK and the press order gets printed.

   begin
      Read_Commands;
      Add_Dependencies;
      Report_Conflicts;
      Sort;
   end Toposort;

It should be clear what Toposort does, and why, at this point.

begin
   Get(Command);
   if Command /= 'v' then
      raise Constraint_Error with "Expected 'v' command.";
   end if;
   Get(Num_Vpatches);
   Toposort(Num_Vpatches);
end Vtoposort;

The main function gets the first command, verifies that it is the "number of vpatches" command, and starts toposort program, which can now allocate all its data structures on stack.


vsimplify

with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

procedure Vsimplify is

   Command      : Character;
   Num_Vpatches : Integer;

   procedure Simplify(Num_Vpatches : Integer) is
      subtype Vpatch_Index     is Natural range 0..Num_Vpatches - 1;
      type Vpatch_Set          is array (Vpatch_Index'Range) of Boolean;
      type Vpatch_List         is array (Vpatch_Index'Range) of Vpatch_Index;
      type Vpatch_Relationship is array (Vpatch_Index'Range, Vpatch_Index'Range) of Boolean;

      Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False));
      Vpatch_TC_Indirect : Vpatch_Relationship := (others => (others => False));
      Vpatch_Order   : Vpatch_List := (others => 0);
      Free_Vpatches  : Vpatch_Set  := (others => False);
      N_Selected     : Natural     := 0;

New item here is Vpatch_TC_Indirect, which holds the indirect transitive antecedent set of each vpatch.

      procedure Read_Commands is
	 I: Vpatch_Index;
	 J: Vpatch_Index;
      begin
     Read_Loop:
	 loop
	    exit Read_Loop when End_Of_File;
	    Get(Command);
	    if Command = 'd' then
	       Get(I);
	       Get(J);
	       Vpatch_Dependencies(I,J) := True;
	    elsif Command = 'C' then
	       -- Ignore conflicts in this tool, but consume the command
	       Get(I);
	    elsif Command = 'L' then
	       raise Constraint_Error with "Cannot simplify loops";
	    elsif Command = 'P' then
	       Get(I);
	       Vpatch_Order(N_Selected) := I;
	       N_Selected := N_Selected + 1;
	    end if;
	 end loop Read_Loop;

	 if N_Selected - 1 /= Vpatch_Order'Last then
	    raise Constraint_Error with "Not all vpatches present in the press order";
	 end if;

      end Read_Commands;

Up to this point, we have already seen everything, though Read_Commands is updated to be more in line with output of vtoposort.

      procedure Simplify_Inner is
      begin
	 -- Fill Vpatch_TC_Indirect
	 for I in Vpatch_Order'Range loop
	    declare
	       N : Vpatch_Index := Vpatch_Order(I);
	    begin
	       for J in Vpatch_Dependencies'Range(2) loop
		  if Vpatch_Dependencies(N,J) then
		     for K in Vpatch_Dependencies'Range(2) loop
			Vpatch_TC_Indirect(N,K) := Vpatch_TC_Indirect(N,K) or
			  Vpatch_Dependencies(J,K) or
			  Vpatch_TC_Indirect(J,K);
		     end loop;
		  end if;
	       end loop;
	    end;
	 end loop;

	 -- Output Vpatch_TC_Indirect
	 for I in Vpatch_TC_Indirect'Range(1) loop
	    for J in Vpatch_TC_Indirect'Range(2) loop
	       if Vpatch_TC_Indirect(I,J) then
		  Put_Line("I " & Integer'Image(I) & " " & Integer'Image(J));
	       end if;
	    end loop;
	 end loop;

	 -- Remove redundant connections from Vpatch_Dependencies
	 for I in Vpatch_Dependencies'Range(1) loop
	    for J in Vpatch_TC_Indirect'Range(2) loop
	       Vpatch_Dependencies(I,J) := Vpatch_Dependencies(I,J) and
		 not Vpatch_TC_Indirect(I,J);
	    end loop;
	 end loop;

	 -- Output filtered out Vpatch_Dependencies
	 for I in Vpatch_Dependencies'Range(1) loop
	    for J in Vpatch_TC_Indirect'Range(2) loop
	       if Vpatch_Dependencies(I,J) then
		  Put_Line("d " & Integer'Image(I) & " " & Integer'Image(J));
	       end if;
	    end loop;
	 end loop;
      end Simplify_Inner;

Simplify_Inner is the core of vsimplify:
To fill Vpatch_TC_Indirect, for each vpatch N in the press order, loop through antecedents of N, summing their direct antecedent sets (from vflow), and indirect antecedent sets (the ones being calculated in this very loop) into the row N. Because iteration through the vpatches happens in the press order, the indirect antecedent sets are guaranteed to be completed at the time when they are accessed.

Then, output the indirect antecedent sets, and substract the indirect antecedent set from the antecedent sets that came as input from vflow. Finally, we output the cleaned up direct antecendent sets.

   begin
      Read_Commands;
      Simplify_Inner;
   end Simplify;
begin
   Get(Command);
   if Command /= 'v' then
      raise Constraint_Error with "Expected 'v' command.";
   end if;
   Get(Num_Vpatches);
   Simplify(Num_Vpatches);
end Vsimplify;

Main function of vsimplify follows the pattern of vtoposort.


vfilter (in awk this time)

This is added to v.sh:

vfilter() {
	awk -v N="$1" '
BEGIN {r=0;}
$0 ~ /^diff -uNr/ {r=1;}
r == 1 && $1 == "---" {sub("[^/]*/", "", $2); ip=$2; ih=$3}
r == 1 && $1 == "+++" {sub("[^/]*/", "", $2); print N,ip,$2,ih,$3; r=0;}'
}

Upon detecting the "diff -uNr" in the colunm 0 of a line, read input and output information of a hunk. The path has the initial directory stripped (because a/ and b/ prefixes inside vpatches do not matter for the purposes of v-presser).


curl 'http://bvt-trace.net/vpatches/vtools_add_vsh_utils.vpatch' > vtools_add_vsh_utils.vpatch
curl 'http://bvt-trace.net/vpatches/vtools_add_vsh_utils.vpatch.bvt.sig' > vtools_add_vsh_utils.vpatch.bvt.sig

10 Responses to “v.sh part 2: Ada utilities”

  1. Diana Coman says:
    1

    A vpatch is genesis if all its hunks are file creations.

    Hmm, this risks to misfire for instance on older trees that have the manifest added at some point rather than baked in from the start. Arguably they are due for a regrind anyway but the above definition (and its implicit reliance on the manifest file mechanism) sits rather uncomfortably with me. To my mind a vpatch is genesis if it doesn't have any ancestors - would this be all that harder to use at this specific point?

  2. bvt says:
    2

    @Diana Coman:
    If you consider a case of two vpatches that only create files, a genesis by intention (which V can't know) and a vpatch that adds manifest, those two vpatches can either conflict (if they create files with the same name) or commute (if there is no intersection between the sets of created names). How would a ~stateless system of V know which one is the genesis, and which one is just a leaf with manifest? AFAIK in V as described by Ben Vulpes, these will be ordered in flow somehow, and you would be able to press genesis, genesis+manifest OR manifest, manifest+genesis, depending on the exact flow the tool will give you; in v.sh, pressing one of them *will not* pull in the other, however you can select multiple leafs for pressing, not just one, so you can press any combination of them.

    If you consider a case of more vpatches, then genesis is not special vpatch in any way (so I guess Is_Genesis is rather misleading name) - it just happens that all other vpatches have this one vpatch - genesis - as their antecedent, while not all vpatches have the vpatch with manifest in their antecedent set.

  3. Diana Coman says:
    3

    I kept thinking this over and despite my initial uneasiness, I think you are actually right in that a vpatch with *all* hunks creating files is indeed a genesis since it does not/can not properly speaking have an antecedent as it doesn't require/rely on anything being present.

  4. Jacob Welsh says:
    4

    > Upon detecting the "diff -uNr" in the colunm 0 of a line, read input and output information of a hunk.

    This wasn't previously a required part of the unified diff format, and at least busybox diff doesn't include it (see also). It also means one might need to be more careful in any commentary at the top of a patch file.

    Still, I'm thinking this is the saner way to go since it simplifies any tool that needs to parse patches while complicating only theoretical vdiffs that don't conform; nicely anti-Postel if we can afford it.

  5. bvt says:
    5

    Yes, it's nice, however it's phf who came up with this: vpatch utility won't detect a header unless it is preceded by a line that matches "^diff -uNr", and this is what I carried over to the rest of the v.sh components -- in the end, vpatch is the ultimate downstream consumer. Also, vdiff produces this header unconditionally.

    IMO if you want to get rid of Ada dependency, it's better to move vdiff to keksum code instead of relying on whatever diff busybox has -- also because if you diff two trees, it's not guaranteed that phf's vdiff and busybox+wrapper will produce identical hunks inside the vpatch.

  6. Jacob Welsh says:
    6

    Oh I see. The exact pattern looks to be "^diff ".

    I can't say I like the idea of needing a 2k+ line GNUistic vdiff compared to <100 lines of awk on top of a standard utility, or this bitcoin-like situation where "the code is the spec and realistically there can be only one implementation", or the idea that all vdiffs must produce the same output anyway as that closes off improvement of the algorithm. Do you have a ref handy for why identical output is important? I recall complaints about locale-dependent file sort order from the logs but not beyond that. Anyway, thank you for the caution and I will probably do as you suggest, as this doesn't seem like a battle I can or need to win.

  7. bvt says:
    7

    I don't have a reference specifically, but the clue for me was reading somewhere that phf took care that vdiff would reproduce all TRB vpatches accurately from the pressed trees. If you think about it, if vdiffs are allowed to produce differing outputs, shipping vpatch signatures without a vpatch à-la trinque becomes impossible. To still have this possibility without cementing the diff algorithm, signature would have to cover only the vpatch headers, with filenames and input and output hashes, leaving the hunk bodies out.

    And, well, the "standard utility" is also implemented in some ~1000 LoC even with busybox. This is not to say that vdiff could not use some improvement: it does have some crap that should go.

  8. Jacob Welsh says:
    8

    > shipping vpatch signatures without a vpatch à-la trinque becomes impossible

    Although that was for a genesis, where the diff hunks are fully determined anyway.

    I'd say it makes a good example of the importance of choosing optionality wisely, because putting it in one place closes it off in another.

    > it does have some crap that should go.

    Ah yes, the "quoting is good mmkay, so let's do it, sometimes, if we feel like it".

    Any thoughts on what if anything it should do on encountering file names with embedded whitespace (which on unix can even include newlines)?

  9. bvt says:
    9

    Any thoughts on what if anything it should do on encountering file names with embedded whitespace (which on unix can even include newlines)?

    In the context of vpatches I don't see a good solution, but for the sake of honestly I considered aborting vdiff process or at least reporting that the vpatch is borked to stderr, as most likely no V presser/patcher would be able to do anything with it.

  10. [...] his awk + posix shell + Ada Vtron in a 3 part series (1, 2, [...]

Leave a Reply