In the part one, we had a look at the v.sh script, which should give the reader an understanding of how this v-presser works. In the part two, I will add the utilities required by the v.sh:
- vflow
- vtoposort
- vsimplify
- vfilter (in awk).
I should mention that while writing this article, I really started to hate the duplication of input-output code, so I think I have figured out a way of restructuring the whole thing as a single application. But I did not want to delay the release any further, so I present what I currently have.
vflow
The goal is to determine descendant-antecedent relationships between vpatches based on the information about hunks. From data processing point of view, the goal is to transform the input:
vpatch-id in-file-id out-file-id in-hash-id out-hash-id
into output:
v vpatches-count d descendant-id antecedent-id ... c vpatch1-id vpatch2-id ...
by matching the hunk inputs (path, hash) in vpatch A to the hunk output with the same values in vpatch B: in this case A is the descendant of B.
The source:
with Ada.Text_IO; use Ada.Text_IO; with Ada.Integer_Text_IO; use Ada.Integer_Text_IO; with Ada.Command_Line; use Ada.Command_Line; with Ada.Containers.Generic_Sort; procedure Vflow is type Hunk_State is (Unsatisfied, Satisfied); Empty_Hash : constant := 0; type Hunk is record Vpatch: Natural; In_File: Natural; Out_File: Natural; In_Hash: Natural; Out_Hash: Natural; State: Hunk_State; end record;
Hunks are described by a tuple of (vpatch, input file path, output file path, input file hash, output file hash). State is added for internal bookkeeping.
Num_Hunks : Integer := Integer'Value(Argument(1)); Num_Vpatches : Integer := Integer'Value(Argument(2));
The number of hunks and vpatches are provided as command-line arguments. Rest of the types are elaborated based on this input.
subtype Hunk_Index is Natural range 1..Num_Hunks; type Hunk_Array is array (Hunk_Index range <>) of Hunk; All_Hunks : Hunk_Array(Hunk_Index'Range);
All_Hunks stores all the input to the application.
subtype Vpatch_Index is Natural range 0..Num_Vpatches - 1; type Vpatch_Hunks is array (Vpatch_Index'Range) of Hunk_Index; VH_Low : Vpatch_Hunks := (others => Hunk_Index'First); VH_High: Vpatch_Hunks := (others => Hunk_Index'First);
Input should be provided sorted by vpatch identifier. VH_Low(I) stores the first index into All_Hunks where hunks belonging to vpatch I are stored, VH_High(I) - the last.
One problem is that all these data structures are stored on stack, which has been fine so far even with the whole Linux tree, but may require either "ulimit -s unlimited" in v.sh, or dynamic allocation here, if in the future the vtree becomes too large.
type Vpatch_Relationship is array (Vpatch_Index'Range, Vpatch_Index'Range) of Boolean; Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False)); Vpatch_Conflicts : Vpatch_Relationship := (others => (others => False));
Vpatch_Dependencies(I,J) is set to true if vpatch I is a descendant on vpatch J, the same goes for Vpatch_Conflicts if vpatches I and J are mutually incompatible descendants of the same vpatch.
function Get_Hunk return Hunk is H: Hunk; begin Get(H.Vpatch); Get(H.In_File); Get(H.Out_File); Get(H.In_Hash); Get(H.Out_Hash); H.State := Unsatisfied; return H; end Get_Hunk;
With simple numeric input, we don't have to rely on any hand-rolled parsers, and can just use Ada's built-in tools.
procedure Read_Hunks is I: Natural := All_Hunks'First; begin Read_Loop: loop exit Read_Loop when End_Of_File; All_Hunks(I) := Get_Hunk; I := I + 1; end loop Read_Loop; end Read_Hunks;
Here, we just read all hunks one-by-one.
procedure Init_Vpatch_Hunk_Bounds is begin for I in All_Hunks'Range loop declare V : Integer := All_Hunks(I).Vpatch; begin if VH_Low(V) > I or All_Hunks(VH_Low(V)).Vpatch /= V then VH_Low(V) := I; end if; VH_High(V) := I; end; end loop; end Init_Vpatch_Hunk_Bounds;
We do not keep track of where each vpatch starts and ends in All_Hunks when reading, so this information is recovered by Init_Vpatch_Hunk_Bounds. One linear scan is enough.
procedure Populate_Conflicts is function Conflicts(I: Vpatch_Index; J: Vpatch_Index) return Boolean is begin for A in VH_Low(I)..VH_High(I) loop for B in VH_Low(J)..VH_High(J) loop if (All_Hunks(A).In_File = All_Hunks(B).In_File) and (All_Hunks(A).In_Hash = All_Hunks(B).In_Hash) then return True; end if; end loop; end loop; return False; end Conflicts; begin for I in Vpatch_Index'Range loop for J in Vpatch_Index'Range loop if I < J and then Conflicts(I,J) then Vpatch_Conflicts(I,J) := True; Vpatch_Conflicts(J,I) := True; end if; end loop; end loop; end Populate_Conflicts;
The definition of conflict between two vpatches is rather naive: two vpatches are conflicting when they both contain hunks have the same (path,hash) as inputs. We check this property among all pairs of vpatches.
Next comes the main dependency resolution function, Solve.
procedure Solve is Input_Hunks : Hunk_Array := All_Hunks; Output_Hunks: Hunk_Array renames All_Hunks; Finished : Boolean := False;
The way vflow detects if vpatch I is a descendant of vpatch J is by checking, if any (output path, output hash) created by vpatch J is used as (input path, input hash) in vpatch I. To reduce the algorithmic complexity, we sort All_Hunks by join keys (because that's what this procedure is essentially doing) - input or output path ids. We have to sort using different keys because we cannot assume that the input path is always the same as the output path. So hunks sorted by input path are stored in Input_Hunks, for the hunks sorted by output path (Output_Hunks) we reuse the storage of All_Hunks.
-- Start of sorting boilerplate -- function Input_Before(A: Positive; B: Positive) return Boolean is begin return Input_Hunks(A).In_File < Input_Hunks(B).In_File; end Input_Before; function Output_Before(A: Positive; B: Positive) return Boolean is begin return Output_Hunks(A).Out_File < Output_Hunks(B).Out_File; end Output_Before; procedure Input_Swap(A: Positive; B: Positive) is Tmp : Hunk; begin Tmp := Input_Hunks(B); Input_Hunks(B) := Input_Hunks(A); Input_Hunks(A) := Tmp; end Input_Swap; procedure Output_Swap(A: Positive; B: Positive) is Tmp : Hunk; begin Tmp := Output_Hunks(B); Output_Hunks(B) := Output_Hunks(A); Output_Hunks(A) := Tmp; end Output_Swap; procedure Input_Sort is new Ada.Containers.Generic_Sort (Positive, Input_Before, Input_Swap); procedure Output_Sort is new Ada.Containers.Generic_Sort (Positive, Output_Before, Output_Swap); -- -- End of sorting boilerplate
This boilerplate is required to use Ada's built-in generic sorting facilities.
function Check_Genesis(Input: in out Hunk_Array) return Boolean is Is_Genesis : Boolean := True; Vpatch: Vpatch_Index := Input(Input'First).Vpatch; begin for I in Input'Range loop Is_Genesis := Is_Genesis and (Input(I).In_Hash = Empty_Hash); end loop; return Is_Genesis; end Check_Genesis;
A vpatch is genesis if all its hunks are file creations.
function Try_Connect(Input: in out Hunk_Array; Output: in out Hunk_Array) return Boolean is I : Positive := Input'First; J : Positive := Output'First; All_Marked: Boolean := True; begin while I <= Input'Last and J <= Output'Last loop if Input(I).State /= Unsatisfied or Input(I).In_File < Output(J).Out_File then I := I + 1; elsif Input(I).In_File > Output(J).Out_File then J := J + 1; else if Input(I).In_Hash = Output(J).Out_Hash and Input(I).In_Hash /= Empty_Hash then Input(I).State := Satisfied; Vpatch_Dependencies(Input(Input'First).Vpatch, Output(Output'First).Vpatch) := True; end if; I := I + 1; end if; end loop; for I in Input'Range loop All_Marked := All_Marked and Input(I).State = Satisfied; end loop; return All_Marked; end Try_Connect;
Try matching the hunks belonging of candidate descendant (in array Input) with hunks of candidate antecedent (in array Output). The arrays are sorted, so a variant of a sorted join can be used. If there is a match, we assume that there is a dependency among the vpatches. The return value of the function is True is all hunks in the vpatch are satisfied (that is, have an antecedent); to determine the return value, we iterate over Input once again.
begin for I in Vpatch_Index'Range loop Output_Sort(VH_Low(I), VH_High(I)); Input_Sort(VH_Low(I), VH_High(I)); end loop; for I in Vpatch_Index'Range loop declare Hunks_In : Hunk_Array renames Input_Hunks(VH_Low(I)..VH_High(I)); begin Finished := False; for J in Vpatch_Index'Range loop if J /= I then declare Hunks_Out : Hunk_Array renames Output_Hunks(VH_Low(J)..VH_High(J)); begin Finished := Try_Connect(Hunks_In, Hunks_Out); exit when Finished; end; else Finished := Check_Genesis(Hunks_In); exit when Finished; end if; end loop; -- Looped through all vpatches, there are hunks without matched inputs: if not Finished then -- Check if the remaining hunks are file creations: declare OK : Boolean := True; begin for I in Hunks_In'Range loop if Hunks_In(I).In_Hash = Empty_Hash and Hunks_In(I).State = Unsatisfied then Hunks_In(I).State := Satisfied; end if; OK := OK and (Hunks_In(I).State = Satisfied); end loop; if not OK then raise Constraint_Error with "Unsatisfied hunks."; end if; end; end if; end; end loop; end Solve;
The body of the function, in a loop:
- First, sort hunks by input path and output path, in blocks belonging to different vpatches.
- Then, we try to find antecedents for each of the vpatches, also checking if a vpatch is a genesis.
- If after the looping through each potential antecedent there still are hunks without a matching input, we exit with an error.
procedure Produce_Output is begin Put_Line("v " & Integer'Image(Num_Vpatches)); for I in Vpatch_Dependencies'Range(1) loop for J in Vpatch_Dependencies'Range(2) loop if Vpatch_Dependencies(I,J) then Put_Line("d " & Integer'Image(I) & " " & Integer'Image(J)); end if; end loop; end loop; for I in Vpatch_Conflicts'Range(1) loop for J in Vpatch_Conflicts'Range(2) loop if Vpatch_Conflicts(I,J) then Put_Line("c " & Integer'Image(I) & " " & Integer'Image(J)); end if; end loop; end loop; end Produce_Output;
Produce_Output just prints out the number of vpatches, the antecedent-descendant relationships between vpatches stored in Vpatch_Dependencies, and the conflicts stored in Vpatch_Conflicts.
begin Read_Hunks; Init_Vpatch_Hunk_Bounds; Populate_Conflicts; Solve; Produce_Output; end Vflow;
vtoposort
The goal is to calculate the press order of vpatches based on the dependency and the conflict information produced by vflow concatenated with the set of vpatches selected by the operator for pressing.
with Ada.Text_IO; use Ada.Text_IO; with Ada.Integer_Text_IO; use Ada.Integer_Text_IO; procedure Vtoposort is Command : Character; Num_Vpatches : Integer;
Command is used for reading all input, Num_Vpatches is self-describing.
procedure Toposort(Num_Vpatches : Integer) is subtype Vpatch_Index is Natural range 0..Num_Vpatches - 1; type Vpatch_Set is array (Vpatch_Index'Range) of Boolean; type Vpatch_List is array (Vpatch_Index'Range) of Vpatch_Index; type Vpatch_Relationship is array (Vpatch_Index'Range, Vpatch_Index'Range) of Boolean;
Some new types are Vpatch_Set, which we simulate by an array of booleans, where Set(I) = True means that element I is present in the set, and Vpatch_List, which is used for storing the vpatch indices (the press order). Vpatch_Relationship we have seen before.
Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False)); Vpatch_Conflicts : Vpatch_Relationship := (others => (others => False)); Free_Vpatches : Vpatch_Set := (others => False);
Vpatch_Dependencies and Vpatch_Conflicts are the same as in vflow, and their duplication suggest that perhaps uniting all utilities in one tool would be useful. Free_Vpatches initially contains the list of vpatches selected for pressing by user, but if Toposort detects that it is one of the dependencies of any of the vpatches, it will add this dependency to the Free_Vpatches set.
procedure Read_Commands is I: Vpatch_Index; J: Vpatch_Index; begin Read_Loop: loop exit Read_Loop when End_Of_File; Get(Command); if Command = 'd' then Get(I); Get(J); Vpatch_Dependencies(I,J) := True; elsif Command = 'c' then Get(I); Get(J); Vpatch_Conflicts(I,J) := True; elsif Command = 's' then Get(I); Free_Vpatches(I) := True; else raise Constraint_Error with "Unknown Command"; end if; end loop Read_Loop; end Read_Commands;
Straightforward loop for reading the input, not much to write about.
procedure Add_Dependencies is Finish : Boolean := False; begin while not Finish loop Finish := True; for I in Free_Vpatches'Range loop if Free_Vpatches(I) then for J in Vpatch_Dependencies'Range(2) loop if Vpatch_Dependencies(I,J) and not Free_Vpatches(J) then Finish := False; Free_Vpatches(J) := True; end if; end loop; end if; end loop; end loop; end Add_Dependencies;
This function loops through all vpatches in the Free_Vpatches set, and for each vpatch adds all the antecedents of the vpatch to Free_Vpatches set. Iteration stops when there are no more vpatches to add to the set.
procedure Report_Conflicts is begin for I in Free_Vpatches'Range loop if Free_Vpatches(I) then for J in Vpatch_Conflicts'Range(2) loop if Vpatch_Conflicts(I,J) and Free_Vpatches(J) then Put_Line("C " & Integer'Image(I)); Put_Line("C " & Integer'Image(J)); end if; end loop; end if; end loop; end Report_Conflicts;
Report_Conflicts checks if mutually exclusive vpatches are present among the selected.
procedure Sort is Added_Vpatches : Vpatch_Set := (others => False); Vpatch_Order : Vpatch_List := (others => 0); N_Selected : Natural := 0; Finished : Boolean := False; Has_Loops : Boolean := False; begin while not Finished loop Finished := True; for I in Free_Vpatches'Range loop if Free_Vpatches(I) then declare All_Satisfied : Boolean := True; begin -- Check if all dependencies are already in the press order for J in Vpatch_Dependencies'Range(2) loop if Vpatch_Dependencies(I,J) then All_Satisfied := All_Satisfied and Added_Vpatches(J); end if; end loop; -- All dependencies present, can add this vpatch: if All_Satisfied then Free_Vpatches(I) := False; Added_Vpatches(I) := True; Vpatch_Order(N_Selected) := I; N_Selected := N_Selected + 1; Finished := False; end if; end; end if; end loop; end loop; -- Check if there are still free vpatches (there is a loop): for I in Free_Vpatches'Range loop if Free_Vpatches(I) then Put_Line("L " & Integer'Image(I)); Has_Loops := True; end if; end loop; -- Print vpatches in press order: if not Has_Loops then for I in Vpatch_Order'First..N_Selected - 1 loop Put_Line("P " & Integer'Image(Vpatch_Order(I))); end loop; end if; end Sort;
Sort is the main, well, sorting function. The variables are:
- Vpatch_Order - the press order, which is supposed to be main output of this tool.
- N_Selected - the first free index in Vpatch_Order (point into which new vpatch is added).
- Added_Vpatches - a set of vpatches already in Vpatch_Order.
- Finished - the sorting loop converged on a solution.
- Has_Loops - flag for the case when a loop has been discovered.
First, loop through vpatches in the Free_Vpatches set, checking if all the antecedents of each vpatch are present. If yes, the vpatch is added to Vpatch_Order, updating all the related variable accordingly.
If after this loop finishes, there are still selected but not added vpatches, there is a loop which gets printed. Otherwise, everything is OK and the press order gets printed.
begin Read_Commands; Add_Dependencies; Report_Conflicts; Sort; end Toposort;
It should be clear what Toposort does, and why, at this point.
begin Get(Command); if Command /= 'v' then raise Constraint_Error with "Expected 'v' command."; end if; Get(Num_Vpatches); Toposort(Num_Vpatches); end Vtoposort;
The main function gets the first command, verifies that it is the "number of vpatches" command, and starts toposort program, which can now allocate all its data structures on stack.
vsimplify
with Ada.Text_IO; use Ada.Text_IO; with Ada.Integer_Text_IO; use Ada.Integer_Text_IO; procedure Vsimplify is Command : Character; Num_Vpatches : Integer; procedure Simplify(Num_Vpatches : Integer) is subtype Vpatch_Index is Natural range 0..Num_Vpatches - 1; type Vpatch_Set is array (Vpatch_Index'Range) of Boolean; type Vpatch_List is array (Vpatch_Index'Range) of Vpatch_Index; type Vpatch_Relationship is array (Vpatch_Index'Range, Vpatch_Index'Range) of Boolean; Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False)); Vpatch_TC_Indirect : Vpatch_Relationship := (others => (others => False)); Vpatch_Order : Vpatch_List := (others => 0); Free_Vpatches : Vpatch_Set := (others => False); N_Selected : Natural := 0;
New item here is Vpatch_TC_Indirect, which holds the indirect transitive antecedent set of each vpatch.
procedure Read_Commands is I: Vpatch_Index; J: Vpatch_Index; begin Read_Loop: loop exit Read_Loop when End_Of_File; Get(Command); if Command = 'd' then Get(I); Get(J); Vpatch_Dependencies(I,J) := True; elsif Command = 'C' then -- Ignore conflicts in this tool, but consume the command Get(I); elsif Command = 'L' then raise Constraint_Error with "Cannot simplify loops"; elsif Command = 'P' then Get(I); Vpatch_Order(N_Selected) := I; N_Selected := N_Selected + 1; end if; end loop Read_Loop; if N_Selected - 1 /= Vpatch_Order'Last then raise Constraint_Error with "Not all vpatches present in the press order"; end if; end Read_Commands;
Up to this point, we have already seen everything, though Read_Commands is updated to be more in line with output of vtoposort.
procedure Simplify_Inner is begin -- Fill Vpatch_TC_Indirect for I in Vpatch_Order'Range loop declare N : Vpatch_Index := Vpatch_Order(I); begin for J in Vpatch_Dependencies'Range(2) loop if Vpatch_Dependencies(N,J) then for K in Vpatch_Dependencies'Range(2) loop Vpatch_TC_Indirect(N,K) := Vpatch_TC_Indirect(N,K) or Vpatch_Dependencies(J,K) or Vpatch_TC_Indirect(J,K); end loop; end if; end loop; end; end loop; -- Output Vpatch_TC_Indirect for I in Vpatch_TC_Indirect'Range(1) loop for J in Vpatch_TC_Indirect'Range(2) loop if Vpatch_TC_Indirect(I,J) then Put_Line("I " & Integer'Image(I) & " " & Integer'Image(J)); end if; end loop; end loop; -- Remove redundant connections from Vpatch_Dependencies for I in Vpatch_Dependencies'Range(1) loop for J in Vpatch_TC_Indirect'Range(2) loop Vpatch_Dependencies(I,J) := Vpatch_Dependencies(I,J) and not Vpatch_TC_Indirect(I,J); end loop; end loop; -- Output filtered out Vpatch_Dependencies for I in Vpatch_Dependencies'Range(1) loop for J in Vpatch_TC_Indirect'Range(2) loop if Vpatch_Dependencies(I,J) then Put_Line("d " & Integer'Image(I) & " " & Integer'Image(J)); end if; end loop; end loop; end Simplify_Inner;
Simplify_Inner is the core of vsimplify:
To fill Vpatch_TC_Indirect, for each vpatch N in the press order, loop through antecedents of N, summing their direct antecedent sets (from vflow), and indirect antecedent sets (the ones being calculated in this very loop) into the row N. Because iteration through the vpatches happens in the press order, the indirect antecedent sets are guaranteed to be completed at the time when they are accessed.
Then, output the indirect antecedent sets, and substract the indirect antecedent set from the antecedent sets that came as input from vflow. Finally, we output the cleaned up direct antecendent sets.
begin Read_Commands; Simplify_Inner; end Simplify;
begin Get(Command); if Command /= 'v' then raise Constraint_Error with "Expected 'v' command."; end if; Get(Num_Vpatches); Simplify(Num_Vpatches); end Vsimplify;
Main function of vsimplify follows the pattern of vtoposort.
vfilter (in awk this time)
This is added to v.sh:
vfilter() { awk -v N="$1" ' BEGIN {r=0;} $0 ~ /^diff -uNr/ {r=1;} r == 1 && $1 == "---" {sub("[^/]*/", "", $2); ip=$2; ih=$3} r == 1 && $1 == "+++" {sub("[^/]*/", "", $2); print N,ip,$2,ih,$3; r=0;}' }
Upon detecting the "diff -uNr" in the colunm 0 of a line, read input and output information of a hunk. The path has the initial directory stripped (because a/ and b/ prefixes inside vpatches do not matter for the purposes of v-presser).
curl 'http://bvt-trace.net/vpatches/vtools_add_vsh_utils.vpatch' > vtools_add_vsh_utils.vpatch
curl 'http://bvt-trace.net/vpatches/vtools_add_vsh_utils.vpatch.bvt.sig' > vtools_add_vsh_utils.vpatch.bvt.sig