In the part one, we had a look at the v.sh script, which should give the reader an understanding of how this v-presser works. In the part two, I will add the utilities required by the v.sh:
- vflow
- vtoposort
- vsimplify
- vfilter (in awk).
I should mention that while writing this article, I really started to hate the duplication of input-output code, so I think I have figured out a way of restructuring the whole thing as a single application. But I did not want to delay the release any further, so I present what I currently have.
vflow
The goal is to determine descendant-antecedent relationships between vpatches based on the information about hunks. From data processing point of view, the goal is to transform the input:
vpatch-id in-file-id out-file-id in-hash-id out-hash-id
into output:
v vpatches-count d descendant-id antecedent-id ... c vpatch1-id vpatch2-id ...
by matching the hunk inputs (path, hash) in vpatch A to the hunk output with the same values in vpatch B: in this case A is the descendant of B.
The source:
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
with Ada.Command_Line; use Ada.Command_Line;
with Ada.Containers.Generic_Sort;
procedure Vflow is
type Hunk_State is (Unsatisfied, Satisfied);
Empty_Hash : constant := 0;
type Hunk is record
Vpatch: Natural;
In_File: Natural;
Out_File: Natural;
In_Hash: Natural;
Out_Hash: Natural;
State: Hunk_State;
end record;
Hunks are described by a tuple of (vpatch, input file path, output file path, input file hash, output file hash). State is added for internal bookkeeping.
Num_Hunks : Integer := Integer'Value(Argument(1)); Num_Vpatches : Integer := Integer'Value(Argument(2));
The number of hunks and vpatches are provided as command-line arguments. Rest of the types are elaborated based on this input.
subtype Hunk_Index is Natural range 1..Num_Hunks; type Hunk_Array is array (Hunk_Index range <>) of Hunk; All_Hunks : Hunk_Array(Hunk_Index'Range);
All_Hunks stores all the input to the application.
subtype Vpatch_Index is Natural range 0..Num_Vpatches - 1; type Vpatch_Hunks is array (Vpatch_Index'Range) of Hunk_Index; VH_Low : Vpatch_Hunks := (others => Hunk_Index'First); VH_High: Vpatch_Hunks := (others => Hunk_Index'First);
Input should be provided sorted by vpatch identifier. VH_Low(I) stores the first index into All_Hunks where hunks belonging to vpatch I are stored, VH_High(I) - the last.
One problem is that all these data structures are stored on stack, which has been fine so far even with the whole Linux tree, but may require either "ulimit -s unlimited" in v.sh, or dynamic allocation here, if in the future the vtree becomes too large.
type Vpatch_Relationship is array (Vpatch_Index'Range, Vpatch_Index'Range) of Boolean; Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False)); Vpatch_Conflicts : Vpatch_Relationship := (others => (others => False));
Vpatch_Dependencies(I,J) is set to true if vpatch I is a descendant on vpatch J, the same goes for Vpatch_Conflicts if vpatches I and J are mutually incompatible descendants of the same vpatch.
function Get_Hunk return Hunk is
H: Hunk;
begin
Get(H.Vpatch);
Get(H.In_File);
Get(H.Out_File);
Get(H.In_Hash);
Get(H.Out_Hash);
H.State := Unsatisfied;
return H;
end Get_Hunk;
With simple numeric input, we don't have to rely on any hand-rolled parsers, and can just use Ada's built-in tools.
procedure Read_Hunks is
I: Natural := All_Hunks'First;
begin
Read_Loop:
loop
exit Read_Loop when End_Of_File;
All_Hunks(I) := Get_Hunk;
I := I + 1;
end loop Read_Loop;
end Read_Hunks;
Here, we just read all hunks one-by-one.
procedure Init_Vpatch_Hunk_Bounds is
begin
for I in All_Hunks'Range loop
declare
V : Integer := All_Hunks(I).Vpatch;
begin
if VH_Low(V) > I or All_Hunks(VH_Low(V)).Vpatch /= V then
VH_Low(V) := I;
end if;
VH_High(V) := I;
end;
end loop;
end Init_Vpatch_Hunk_Bounds;
We do not keep track of where each vpatch starts and ends in All_Hunks when reading, so this information is recovered by Init_Vpatch_Hunk_Bounds. One linear scan is enough.
procedure Populate_Conflicts is
function Conflicts(I: Vpatch_Index; J: Vpatch_Index) return Boolean is
begin
for A in VH_Low(I)..VH_High(I) loop
for B in VH_Low(J)..VH_High(J) loop
if (All_Hunks(A).In_File = All_Hunks(B).In_File) and
(All_Hunks(A).In_Hash = All_Hunks(B).In_Hash) then
return True;
end if;
end loop;
end loop;
return False;
end Conflicts;
begin
for I in Vpatch_Index'Range loop
for J in Vpatch_Index'Range loop
if I < J and then Conflicts(I,J) then
Vpatch_Conflicts(I,J) := True;
Vpatch_Conflicts(J,I) := True;
end if;
end loop;
end loop;
end Populate_Conflicts;
The definition of conflict between two vpatches is rather naive: two vpatches are conflicting when they both contain hunks have the same (path,hash) as inputs. We check this property among all pairs of vpatches.
Next comes the main dependency resolution function, Solve.
procedure Solve is
Input_Hunks : Hunk_Array := All_Hunks;
Output_Hunks: Hunk_Array renames All_Hunks;
Finished : Boolean := False;
The way vflow detects if vpatch I is a descendant of vpatch J is by checking, if any (output path, output hash) created by vpatch J is used as (input path, input hash) in vpatch I. To reduce the algorithmic complexity, we sort All_Hunks by join keys (because that's what this procedure is essentially doing) - input or output path ids. We have to sort using different keys because we cannot assume that the input path is always the same as the output path. So hunks sorted by input path are stored in Input_Hunks, for the hunks sorted by output path (Output_Hunks) we reuse the storage of All_Hunks.
-- Start of sorting boilerplate
--
function Input_Before(A: Positive; B: Positive) return Boolean is
begin
return Input_Hunks(A).In_File < Input_Hunks(B).In_File;
end Input_Before;
function Output_Before(A: Positive; B: Positive) return Boolean is
begin
return Output_Hunks(A).Out_File < Output_Hunks(B).Out_File;
end Output_Before;
procedure Input_Swap(A: Positive; B: Positive) is
Tmp : Hunk;
begin
Tmp := Input_Hunks(B);
Input_Hunks(B) := Input_Hunks(A);
Input_Hunks(A) := Tmp;
end Input_Swap;
procedure Output_Swap(A: Positive; B: Positive) is
Tmp : Hunk;
begin
Tmp := Output_Hunks(B);
Output_Hunks(B) := Output_Hunks(A);
Output_Hunks(A) := Tmp;
end Output_Swap;
procedure Input_Sort is
new Ada.Containers.Generic_Sort (Positive, Input_Before, Input_Swap);
procedure Output_Sort is
new Ada.Containers.Generic_Sort (Positive, Output_Before, Output_Swap);
--
-- End of sorting boilerplate
This boilerplate is required to use Ada's built-in generic sorting facilities.
function Check_Genesis(Input: in out Hunk_Array) return Boolean is
Is_Genesis : Boolean := True;
Vpatch: Vpatch_Index := Input(Input'First).Vpatch;
begin
for I in Input'Range loop
Is_Genesis := Is_Genesis and (Input(I).In_Hash = Empty_Hash);
end loop;
return Is_Genesis;
end Check_Genesis;
A vpatch is genesis if all its hunks are file creations.
function Try_Connect(Input: in out Hunk_Array;
Output: in out Hunk_Array) return Boolean is
I : Positive := Input'First;
J : Positive := Output'First;
All_Marked: Boolean := True;
begin
while I <= Input'Last and J <= Output'Last loop
if Input(I).State /= Unsatisfied or
Input(I).In_File < Output(J).Out_File then
I := I + 1;
elsif Input(I).In_File > Output(J).Out_File then
J := J + 1;
else
if Input(I).In_Hash = Output(J).Out_Hash and
Input(I).In_Hash /= Empty_Hash then
Input(I).State := Satisfied;
Vpatch_Dependencies(Input(Input'First).Vpatch,
Output(Output'First).Vpatch) := True;
end if;
I := I + 1;
end if;
end loop;
for I in Input'Range loop
All_Marked := All_Marked and Input(I).State = Satisfied;
end loop;
return All_Marked;
end Try_Connect;
Try matching the hunks belonging of candidate descendant (in array Input) with hunks of candidate antecedent (in array Output). The arrays are sorted, so a variant of a sorted join can be used. If there is a match, we assume that there is a dependency among the vpatches. The return value of the function is True is all hunks in the vpatch are satisfied (that is, have an antecedent); to determine the return value, we iterate over Input once again.
begin
for I in Vpatch_Index'Range loop
Output_Sort(VH_Low(I), VH_High(I));
Input_Sort(VH_Low(I), VH_High(I));
end loop;
for I in Vpatch_Index'Range loop
declare
Hunks_In : Hunk_Array renames Input_Hunks(VH_Low(I)..VH_High(I));
begin
Finished := False;
for J in Vpatch_Index'Range loop
if J /= I then
declare
Hunks_Out : Hunk_Array renames
Output_Hunks(VH_Low(J)..VH_High(J));
begin
Finished := Try_Connect(Hunks_In, Hunks_Out);
exit when Finished;
end;
else
Finished := Check_Genesis(Hunks_In);
exit when Finished;
end if;
end loop;
-- Looped through all vpatches, there are hunks without matched inputs:
if not Finished then
-- Check if the remaining hunks are file creations:
declare
OK : Boolean := True;
begin
for I in Hunks_In'Range loop
if Hunks_In(I).In_Hash = Empty_Hash and
Hunks_In(I).State = Unsatisfied then
Hunks_In(I).State := Satisfied;
end if;
OK := OK and (Hunks_In(I).State = Satisfied);
end loop;
if not OK then
raise Constraint_Error with "Unsatisfied hunks.";
end if;
end;
end if;
end;
end loop;
end Solve;
The body of the function, in a loop:
- First, sort hunks by input path and output path, in blocks belonging to different vpatches.
- Then, we try to find antecedents for each of the vpatches, also checking if a vpatch is a genesis.
- If after the looping through each potential antecedent there still are hunks without a matching input, we exit with an error.
procedure Produce_Output is
begin
Put_Line("v " & Integer'Image(Num_Vpatches));
for I in Vpatch_Dependencies'Range(1) loop
for J in Vpatch_Dependencies'Range(2) loop
if Vpatch_Dependencies(I,J) then
Put_Line("d " & Integer'Image(I) & " " & Integer'Image(J));
end if;
end loop;
end loop;
for I in Vpatch_Conflicts'Range(1) loop
for J in Vpatch_Conflicts'Range(2) loop
if Vpatch_Conflicts(I,J) then
Put_Line("c " & Integer'Image(I) & " " & Integer'Image(J));
end if;
end loop;
end loop;
end Produce_Output;
Produce_Output just prints out the number of vpatches, the antecedent-descendant relationships between vpatches stored in Vpatch_Dependencies, and the conflicts stored in Vpatch_Conflicts.
begin Read_Hunks; Init_Vpatch_Hunk_Bounds; Populate_Conflicts; Solve; Produce_Output; end Vflow;
vtoposort
The goal is to calculate the press order of vpatches based on the dependency and the conflict information produced by vflow concatenated with the set of vpatches selected by the operator for pressing.
with Ada.Text_IO; use Ada.Text_IO; with Ada.Integer_Text_IO; use Ada.Integer_Text_IO; procedure Vtoposort is Command : Character; Num_Vpatches : Integer;
Command is used for reading all input, Num_Vpatches is self-describing.
procedure Toposort(Num_Vpatches : Integer) is
subtype Vpatch_Index is Natural range 0..Num_Vpatches - 1;
type Vpatch_Set is array (Vpatch_Index'Range) of Boolean;
type Vpatch_List is array (Vpatch_Index'Range) of Vpatch_Index;
type Vpatch_Relationship is array
(Vpatch_Index'Range, Vpatch_Index'Range) of Boolean;
Some new types are Vpatch_Set, which we simulate by an array of booleans, where Set(I) = True means that element I is present in the set, and Vpatch_List, which is used for storing the vpatch indices (the press order). Vpatch_Relationship we have seen before.
Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False));
Vpatch_Conflicts : Vpatch_Relationship := (others => (others => False));
Free_Vpatches : Vpatch_Set := (others => False);
Vpatch_Dependencies and Vpatch_Conflicts are the same as in vflow, and their duplication suggest that perhaps uniting all utilities in one tool would be useful. Free_Vpatches initially contains the list of vpatches selected for pressing by user, but if Toposort detects that it is one of the dependencies of any of the vpatches, it will add this dependency to the Free_Vpatches set.
procedure Read_Commands is
I: Vpatch_Index;
J: Vpatch_Index;
begin
Read_Loop:
loop
exit Read_Loop when End_Of_File;
Get(Command);
if Command = 'd' then
Get(I);
Get(J);
Vpatch_Dependencies(I,J) := True;
elsif Command = 'c' then
Get(I);
Get(J);
Vpatch_Conflicts(I,J) := True;
elsif Command = 's' then
Get(I);
Free_Vpatches(I) := True;
else
raise Constraint_Error with "Unknown Command";
end if;
end loop Read_Loop;
end Read_Commands;
Straightforward loop for reading the input, not much to write about.
procedure Add_Dependencies is
Finish : Boolean := False;
begin
while not Finish loop
Finish := True;
for I in Free_Vpatches'Range loop
if Free_Vpatches(I) then
for J in Vpatch_Dependencies'Range(2) loop
if Vpatch_Dependencies(I,J) and not Free_Vpatches(J) then
Finish := False;
Free_Vpatches(J) := True;
end if;
end loop;
end if;
end loop;
end loop;
end Add_Dependencies;
This function loops through all vpatches in the Free_Vpatches set, and for each vpatch adds all the antecedents of the vpatch to Free_Vpatches set. Iteration stops when there are no more vpatches to add to the set.
procedure Report_Conflicts is
begin
for I in Free_Vpatches'Range loop
if Free_Vpatches(I) then
for J in Vpatch_Conflicts'Range(2) loop
if Vpatch_Conflicts(I,J) and Free_Vpatches(J) then
Put_Line("C " & Integer'Image(I));
Put_Line("C " & Integer'Image(J));
end if;
end loop;
end if;
end loop;
end Report_Conflicts;
Report_Conflicts checks if mutually exclusive vpatches are present among the selected.
procedure Sort is
Added_Vpatches : Vpatch_Set := (others => False);
Vpatch_Order : Vpatch_List := (others => 0);
N_Selected : Natural := 0;
Finished : Boolean := False;
Has_Loops : Boolean := False;
begin
while not Finished loop
Finished := True;
for I in Free_Vpatches'Range loop
if Free_Vpatches(I) then
declare
All_Satisfied : Boolean := True;
begin
-- Check if all dependencies are already in the press order
for J in Vpatch_Dependencies'Range(2) loop
if Vpatch_Dependencies(I,J) then
All_Satisfied := All_Satisfied and Added_Vpatches(J);
end if;
end loop;
-- All dependencies present, can add this vpatch:
if All_Satisfied then
Free_Vpatches(I) := False;
Added_Vpatches(I) := True;
Vpatch_Order(N_Selected) := I;
N_Selected := N_Selected + 1;
Finished := False;
end if;
end;
end if;
end loop;
end loop;
-- Check if there are still free vpatches (there is a loop):
for I in Free_Vpatches'Range loop
if Free_Vpatches(I) then
Put_Line("L " & Integer'Image(I));
Has_Loops := True;
end if;
end loop;
-- Print vpatches in press order:
if not Has_Loops then
for I in Vpatch_Order'First..N_Selected - 1 loop
Put_Line("P " & Integer'Image(Vpatch_Order(I)));
end loop;
end if;
end Sort;
Sort is the main, well, sorting function. The variables are:
- Vpatch_Order - the press order, which is supposed to be main output of this tool.
- N_Selected - the first free index in Vpatch_Order (point into which new vpatch is added).
- Added_Vpatches - a set of vpatches already in Vpatch_Order.
- Finished - the sorting loop converged on a solution.
- Has_Loops - flag for the case when a loop has been discovered.
First, loop through vpatches in the Free_Vpatches set, checking if all the antecedents of each vpatch are present. If yes, the vpatch is added to Vpatch_Order, updating all the related variable accordingly.
If after this loop finishes, there are still selected but not added vpatches, there is a loop which gets printed. Otherwise, everything is OK and the press order gets printed.
begin
Read_Commands;
Add_Dependencies;
Report_Conflicts;
Sort;
end Toposort;
It should be clear what Toposort does, and why, at this point.
begin
Get(Command);
if Command /= 'v' then
raise Constraint_Error with "Expected 'v' command.";
end if;
Get(Num_Vpatches);
Toposort(Num_Vpatches);
end Vtoposort;
The main function gets the first command, verifies that it is the "number of vpatches" command, and starts toposort program, which can now allocate all its data structures on stack.
vsimplify
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
procedure Vsimplify is
Command : Character;
Num_Vpatches : Integer;
procedure Simplify(Num_Vpatches : Integer) is
subtype Vpatch_Index is Natural range 0..Num_Vpatches - 1;
type Vpatch_Set is array (Vpatch_Index'Range) of Boolean;
type Vpatch_List is array (Vpatch_Index'Range) of Vpatch_Index;
type Vpatch_Relationship is array (Vpatch_Index'Range, Vpatch_Index'Range) of Boolean;
Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False));
Vpatch_TC_Indirect : Vpatch_Relationship := (others => (others => False));
Vpatch_Order : Vpatch_List := (others => 0);
Free_Vpatches : Vpatch_Set := (others => False);
N_Selected : Natural := 0;
New item here is Vpatch_TC_Indirect, which holds the indirect transitive antecedent set of each vpatch.
procedure Read_Commands is
I: Vpatch_Index;
J: Vpatch_Index;
begin
Read_Loop:
loop
exit Read_Loop when End_Of_File;
Get(Command);
if Command = 'd' then
Get(I);
Get(J);
Vpatch_Dependencies(I,J) := True;
elsif Command = 'C' then
-- Ignore conflicts in this tool, but consume the command
Get(I);
elsif Command = 'L' then
raise Constraint_Error with "Cannot simplify loops";
elsif Command = 'P' then
Get(I);
Vpatch_Order(N_Selected) := I;
N_Selected := N_Selected + 1;
end if;
end loop Read_Loop;
if N_Selected - 1 /= Vpatch_Order'Last then
raise Constraint_Error with "Not all vpatches present in the press order";
end if;
end Read_Commands;
Up to this point, we have already seen everything, though Read_Commands is updated to be more in line with output of vtoposort.
procedure Simplify_Inner is
begin
-- Fill Vpatch_TC_Indirect
for I in Vpatch_Order'Range loop
declare
N : Vpatch_Index := Vpatch_Order(I);
begin
for J in Vpatch_Dependencies'Range(2) loop
if Vpatch_Dependencies(N,J) then
for K in Vpatch_Dependencies'Range(2) loop
Vpatch_TC_Indirect(N,K) := Vpatch_TC_Indirect(N,K) or
Vpatch_Dependencies(J,K) or
Vpatch_TC_Indirect(J,K);
end loop;
end if;
end loop;
end;
end loop;
-- Output Vpatch_TC_Indirect
for I in Vpatch_TC_Indirect'Range(1) loop
for J in Vpatch_TC_Indirect'Range(2) loop
if Vpatch_TC_Indirect(I,J) then
Put_Line("I " & Integer'Image(I) & " " & Integer'Image(J));
end if;
end loop;
end loop;
-- Remove redundant connections from Vpatch_Dependencies
for I in Vpatch_Dependencies'Range(1) loop
for J in Vpatch_TC_Indirect'Range(2) loop
Vpatch_Dependencies(I,J) := Vpatch_Dependencies(I,J) and
not Vpatch_TC_Indirect(I,J);
end loop;
end loop;
-- Output filtered out Vpatch_Dependencies
for I in Vpatch_Dependencies'Range(1) loop
for J in Vpatch_TC_Indirect'Range(2) loop
if Vpatch_Dependencies(I,J) then
Put_Line("d " & Integer'Image(I) & " " & Integer'Image(J));
end if;
end loop;
end loop;
end Simplify_Inner;
Simplify_Inner is the core of vsimplify:
To fill Vpatch_TC_Indirect, for each vpatch N in the press order, loop through antecedents of N, summing their direct antecedent sets (from vflow), and indirect antecedent sets (the ones being calculated in this very loop) into the row N. Because iteration through the vpatches happens in the press order, the indirect antecedent sets are guaranteed to be completed at the time when they are accessed.
Then, output the indirect antecedent sets, and substract the indirect antecedent set from the antecedent sets that came as input from vflow. Finally, we output the cleaned up direct antecendent sets.
begin
Read_Commands;
Simplify_Inner;
end Simplify;
begin
Get(Command);
if Command /= 'v' then
raise Constraint_Error with "Expected 'v' command.";
end if;
Get(Num_Vpatches);
Simplify(Num_Vpatches);
end Vsimplify;
Main function of vsimplify follows the pattern of vtoposort.
vfilter (in awk this time)
This is added to v.sh:
vfilter() {
awk -v N="$1" '
BEGIN {r=0;}
$0 ~ /^diff -uNr/ {r=1;}
r == 1 && $1 == "---" {sub("[^/]*/", "", $2); ip=$2; ih=$3}
r == 1 && $1 == "+++" {sub("[^/]*/", "", $2); print N,ip,$2,ih,$3; r=0;}'
}
Upon detecting the "diff -uNr" in the colunm 0 of a line, read input and output information of a hunk. The path has the initial directory stripped (because a/ and b/ prefixes inside vpatches do not matter for the purposes of v-presser).
curl 'http://bvt-trace.net/vpatches/vtools_add_vsh_utils.vpatch' > vtools_add_vsh_utils.vpatch
curl 'http://bvt-trace.net/vpatches/vtools_add_vsh_utils.vpatch.bvt.sig' > vtools_add_vsh_utils.vpatch.bvt.sig