In the part one, we had a look at the v.sh script, which should give the reader an understanding of how this v-presser works. In the part two, I will add the utilities required by the v.sh:
- vflow
- vtoposort
- vsimplify
- vfilter (in awk).
I should mention that while writing this article, I really started to hate the duplication of input-output code, so I think I have figured out a way of restructuring the whole thing as a single application. But I did not want to delay the release any further, so I present what I currently have.
vflow
The goal is to determine descendant-antecedent relationships between vpatches based on the information about hunks. From data processing point of view, the goal is to transform the input:
vpatch-id in-file-id out-file-id in-hash-id out-hash-id
into output:
v vpatches-count d descendant-id antecedent-id ... c vpatch1-id vpatch2-id ...
by matching the hunk inputs (path, hash) in vpatch A to the hunk output with the same values in vpatch B: in this case A is the descendant of B.
The source:
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
with Ada.Command_Line; use Ada.Command_Line;
with Ada.Containers.Generic_Sort;
procedure Vflow is
type Hunk_State is (Unsatisfied, Satisfied);
Empty_Hash : constant := 0;
type Hunk is record
Vpatch: Natural;
In_File: Natural;
Out_File: Natural;
In_Hash: Natural;
Out_Hash: Natural;
State: Hunk_State;
end record;
Hunks are described by a tuple of (vpatch, input file path, output file path, input file hash, output file hash). State is added for internal bookkeeping.
Num_Hunks : Integer := Integer'Value(Argument(1)); Num_Vpatches : Integer := Integer'Value(Argument(2));
The number of hunks and vpatches are provided as command-line arguments. Rest of the types are elaborated based on this input.
subtype Hunk_Index is Natural range 1..Num_Hunks; type Hunk_Array is array (Hunk_Index range <>) of Hunk; All_Hunks : Hunk_Array(Hunk_Index'Range);
All_Hunks stores all the input to the application.
subtype Vpatch_Index is Natural range 0..Num_Vpatches - 1; type Vpatch_Hunks is array (Vpatch_Index'Range) of Hunk_Index; VH_Low : Vpatch_Hunks := (others => Hunk_Index'First); VH_High: Vpatch_Hunks := (others => Hunk_Index'First);
Input should be provided sorted by vpatch identifier. VH_Low(I) stores the first index into All_Hunks where hunks belonging to vpatch I are stored, VH_High(I) - the last.
One problem is that all these data structures are stored on stack, which has been fine so far even with the whole Linux tree, but may require either "ulimit -s unlimited" in v.sh, or dynamic allocation here, if in the future the vtree becomes too large.
type Vpatch_Relationship is array (Vpatch_Index'Range, Vpatch_Index'Range) of Boolean; Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False)); Vpatch_Conflicts : Vpatch_Relationship := (others => (others => False));
Vpatch_Dependencies(I,J) is set to true if vpatch I is a descendant on vpatch J, the same goes for Vpatch_Conflicts if vpatches I and J are mutually incompatible descendants of the same vpatch.
function Get_Hunk return Hunk is
H: Hunk;
begin
Get(H.Vpatch);
Get(H.In_File);
Get(H.Out_File);
Get(H.In_Hash);
Get(H.Out_Hash);
H.State := Unsatisfied;
return H;
end Get_Hunk;
With simple numeric input, we don't have to rely on any hand-rolled parsers, and can just use Ada's built-in tools.
procedure Read_Hunks is
I: Natural := All_Hunks'First;
begin
Read_Loop:
loop
exit Read_Loop when End_Of_File;
All_Hunks(I) := Get_Hunk;
I := I + 1;
end loop Read_Loop;
end Read_Hunks;
Here, we just read all hunks one-by-one.
procedure Init_Vpatch_Hunk_Bounds is
begin
for I in All_Hunks'Range loop
declare
V : Integer := All_Hunks(I).Vpatch;
begin
if VH_Low(V) > I or All_Hunks(VH_Low(V)).Vpatch /= V then
VH_Low(V) := I;
end if;
VH_High(V) := I;
end;
end loop;
end Init_Vpatch_Hunk_Bounds;
We do not keep track of where each vpatch starts and ends in All_Hunks when reading, so this information is recovered by Init_Vpatch_Hunk_Bounds. One linear scan is enough.
procedure Populate_Conflicts is
function Conflicts(I: Vpatch_Index; J: Vpatch_Index) return Boolean is
begin
for A in VH_Low(I)..VH_High(I) loop
for B in VH_Low(J)..VH_High(J) loop
if (All_Hunks(A).In_File = All_Hunks(B).In_File) and
(All_Hunks(A).In_Hash = All_Hunks(B).In_Hash) then
return True;
end if;
end loop;
end loop;
return False;
end Conflicts;
begin
for I in Vpatch_Index'Range loop
for J in Vpatch_Index'Range loop
if I < J and then Conflicts(I,J) then
Vpatch_Conflicts(I,J) := True;
Vpatch_Conflicts(J,I) := True;
end if;
end loop;
end loop;
end Populate_Conflicts;
The definition of conflict between two vpatches is rather naive: two vpatches are conflicting when they both contain hunks have the same (path,hash) as inputs. We check this property among all pairs of vpatches.
Next comes the main dependency resolution function, Solve.
procedure Solve is
Input_Hunks : Hunk_Array := All_Hunks;
Output_Hunks: Hunk_Array renames All_Hunks;
Finished : Boolean := False;
The way vflow detects if vpatch I is a descendant of vpatch J is by checking, if any (output path, output hash) created by vpatch J is used as (input path, input hash) in vpatch I. To reduce the algorithmic complexity, we sort All_Hunks by join keys (because that's what this procedure is essentially doing) - input or output path ids. We have to sort using different keys because we cannot assume that the input path is always the same as the output path. So hunks sorted by input path are stored in Input_Hunks, for the hunks sorted by output path (Output_Hunks) we reuse the storage of All_Hunks.
-- Start of sorting boilerplate
--
function Input_Before(A: Positive; B: Positive) return Boolean is
begin
return Input_Hunks(A).In_File < Input_Hunks(B).In_File;
end Input_Before;
function Output_Before(A: Positive; B: Positive) return Boolean is
begin
return Output_Hunks(A).Out_File < Output_Hunks(B).Out_File;
end Output_Before;
procedure Input_Swap(A: Positive; B: Positive) is
Tmp : Hunk;
begin
Tmp := Input_Hunks(B);
Input_Hunks(B) := Input_Hunks(A);
Input_Hunks(A) := Tmp;
end Input_Swap;
procedure Output_Swap(A: Positive; B: Positive) is
Tmp : Hunk;
begin
Tmp := Output_Hunks(B);
Output_Hunks(B) := Output_Hunks(A);
Output_Hunks(A) := Tmp;
end Output_Swap;
procedure Input_Sort is
new Ada.Containers.Generic_Sort (Positive, Input_Before, Input_Swap);
procedure Output_Sort is
new Ada.Containers.Generic_Sort (Positive, Output_Before, Output_Swap);
--
-- End of sorting boilerplate
This boilerplate is required to use Ada's built-in generic sorting facilities.
function Check_Genesis(Input: in out Hunk_Array) return Boolean is
Is_Genesis : Boolean := True;
Vpatch: Vpatch_Index := Input(Input'First).Vpatch;
begin
for I in Input'Range loop
Is_Genesis := Is_Genesis and (Input(I).In_Hash = Empty_Hash);
end loop;
return Is_Genesis;
end Check_Genesis;
A vpatch is genesis if all its hunks are file creations.
function Try_Connect(Input: in out Hunk_Array;
Output: in out Hunk_Array) return Boolean is
I : Positive := Input'First;
J : Positive := Output'First;
All_Marked: Boolean := True;
begin
while I <= Input'Last and J <= Output'Last loop
if Input(I).State /= Unsatisfied or
Input(I).In_File < Output(J).Out_File then
I := I + 1;
elsif Input(I).In_File > Output(J).Out_File then
J := J + 1;
else
if Input(I).In_Hash = Output(J).Out_Hash and
Input(I).In_Hash /= Empty_Hash then
Input(I).State := Satisfied;
Vpatch_Dependencies(Input(Input'First).Vpatch,
Output(Output'First).Vpatch) := True;
end if;
I := I + 1;
end if;
end loop;
for I in Input'Range loop
All_Marked := All_Marked and Input(I).State = Satisfied;
end loop;
return All_Marked;
end Try_Connect;
Try matching the hunks belonging of candidate descendant (in array Input) with hunks of candidate antecedent (in array Output). The arrays are sorted, so a variant of a sorted join can be used. If there is a match, we assume that there is a dependency among the vpatches. The return value of the function is True is all hunks in the vpatch are satisfied (that is, have an antecedent); to determine the return value, we iterate over Input once again.
begin
for I in Vpatch_Index'Range loop
Output_Sort(VH_Low(I), VH_High(I));
Input_Sort(VH_Low(I), VH_High(I));
end loop;
for I in Vpatch_Index'Range loop
declare
Hunks_In : Hunk_Array renames Input_Hunks(VH_Low(I)..VH_High(I));
begin
Finished := False;
for J in Vpatch_Index'Range loop
if J /= I then
declare
Hunks_Out : Hunk_Array renames
Output_Hunks(VH_Low(J)..VH_High(J));
begin
Finished := Try_Connect(Hunks_In, Hunks_Out);
exit when Finished;
end;
else
Finished := Check_Genesis(Hunks_In);
exit when Finished;
end if;
end loop;
-- Looped through all vpatches, there are hunks without matched inputs:
if not Finished then
-- Check if the remaining hunks are file creations:
declare
OK : Boolean := True;
begin
for I in Hunks_In'Range loop
if Hunks_In(I).In_Hash = Empty_Hash and
Hunks_In(I).State = Unsatisfied then
Hunks_In(I).State := Satisfied;
end if;
OK := OK and (Hunks_In(I).State = Satisfied);
end loop;
if not OK then
raise Constraint_Error with "Unsatisfied hunks.";
end if;
end;
end if;
end;
end loop;
end Solve;
The body of the function, in a loop:
- First, sort hunks by input path and output path, in blocks belonging to different vpatches.
- Then, we try to find antecedents for each of the vpatches, also checking if a vpatch is a genesis.
- If after the looping through each potential antecedent there still are hunks without a matching input, we exit with an error.
procedure Produce_Output is
begin
Put_Line("v " & Integer'Image(Num_Vpatches));
for I in Vpatch_Dependencies'Range(1) loop
for J in Vpatch_Dependencies'Range(2) loop
if Vpatch_Dependencies(I,J) then
Put_Line("d " & Integer'Image(I) & " " & Integer'Image(J));
end if;
end loop;
end loop;
for I in Vpatch_Conflicts'Range(1) loop
for J in Vpatch_Conflicts'Range(2) loop
if Vpatch_Conflicts(I,J) then
Put_Line("c " & Integer'Image(I) & " " & Integer'Image(J));
end if;
end loop;
end loop;
end Produce_Output;
Produce_Output just prints out the number of vpatches, the antecedent-descendant relationships between vpatches stored in Vpatch_Dependencies, and the conflicts stored in Vpatch_Conflicts.
begin Read_Hunks; Init_Vpatch_Hunk_Bounds; Populate_Conflicts; Solve; Produce_Output; end Vflow;
vtoposort
The goal is to calculate the press order of vpatches based on the dependency and the conflict information produced by vflow concatenated with the set of vpatches selected by the operator for pressing.
with Ada.Text_IO; use Ada.Text_IO; with Ada.Integer_Text_IO; use Ada.Integer_Text_IO; procedure Vtoposort is Command : Character; Num_Vpatches : Integer;
Command is used for reading all input, Num_Vpatches is self-describing.
procedure Toposort(Num_Vpatches : Integer) is
subtype Vpatch_Index is Natural range 0..Num_Vpatches - 1;
type Vpatch_Set is array (Vpatch_Index'Range) of Boolean;
type Vpatch_List is array (Vpatch_Index'Range) of Vpatch_Index;
type Vpatch_Relationship is array
(Vpatch_Index'Range, Vpatch_Index'Range) of Boolean;
Some new types are Vpatch_Set, which we simulate by an array of booleans, where Set(I) = True means that element I is present in the set, and Vpatch_List, which is used for storing the vpatch indices (the press order). Vpatch_Relationship we have seen before.
Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False));
Vpatch_Conflicts : Vpatch_Relationship := (others => (others => False));
Free_Vpatches : Vpatch_Set := (others => False);
Vpatch_Dependencies and Vpatch_Conflicts are the same as in vflow, and their duplication suggest that perhaps uniting all utilities in one tool would be useful. Free_Vpatches initially contains the list of vpatches selected for pressing by user, but if Toposort detects that it is one of the dependencies of any of the vpatches, it will add this dependency to the Free_Vpatches set.
procedure Read_Commands is
I: Vpatch_Index;
J: Vpatch_Index;
begin
Read_Loop:
loop
exit Read_Loop when End_Of_File;
Get(Command);
if Command = 'd' then
Get(I);
Get(J);
Vpatch_Dependencies(I,J) := True;
elsif Command = 'c' then
Get(I);
Get(J);
Vpatch_Conflicts(I,J) := True;
elsif Command = 's' then
Get(I);
Free_Vpatches(I) := True;
else
raise Constraint_Error with "Unknown Command";
end if;
end loop Read_Loop;
end Read_Commands;
Straightforward loop for reading the input, not much to write about.
procedure Add_Dependencies is
Finish : Boolean := False;
begin
while not Finish loop
Finish := True;
for I in Free_Vpatches'Range loop
if Free_Vpatches(I) then
for J in Vpatch_Dependencies'Range(2) loop
if Vpatch_Dependencies(I,J) and not Free_Vpatches(J) then
Finish := False;
Free_Vpatches(J) := True;
end if;
end loop;
end if;
end loop;
end loop;
end Add_Dependencies;
This function loops through all vpatches in the Free_Vpatches set, and for each vpatch adds all the antecedents of the vpatch to Free_Vpatches set. Iteration stops when there are no more vpatches to add to the set.
procedure Report_Conflicts is
begin
for I in Free_Vpatches'Range loop
if Free_Vpatches(I) then
for J in Vpatch_Conflicts'Range(2) loop
if Vpatch_Conflicts(I,J) and Free_Vpatches(J) then
Put_Line("C " & Integer'Image(I));
Put_Line("C " & Integer'Image(J));
end if;
end loop;
end if;
end loop;
end Report_Conflicts;
Report_Conflicts checks if mutually exclusive vpatches are present among the selected.
procedure Sort is
Added_Vpatches : Vpatch_Set := (others => False);
Vpatch_Order : Vpatch_List := (others => 0);
N_Selected : Natural := 0;
Finished : Boolean := False;
Has_Loops : Boolean := False;
begin
while not Finished loop
Finished := True;
for I in Free_Vpatches'Range loop
if Free_Vpatches(I) then
declare
All_Satisfied : Boolean := True;
begin
-- Check if all dependencies are already in the press order
for J in Vpatch_Dependencies'Range(2) loop
if Vpatch_Dependencies(I,J) then
All_Satisfied := All_Satisfied and Added_Vpatches(J);
end if;
end loop;
-- All dependencies present, can add this vpatch:
if All_Satisfied then
Free_Vpatches(I) := False;
Added_Vpatches(I) := True;
Vpatch_Order(N_Selected) := I;
N_Selected := N_Selected + 1;
Finished := False;
end if;
end;
end if;
end loop;
end loop;
-- Check if there are still free vpatches (there is a loop):
for I in Free_Vpatches'Range loop
if Free_Vpatches(I) then
Put_Line("L " & Integer'Image(I));
Has_Loops := True;
end if;
end loop;
-- Print vpatches in press order:
if not Has_Loops then
for I in Vpatch_Order'First..N_Selected - 1 loop
Put_Line("P " & Integer'Image(Vpatch_Order(I)));
end loop;
end if;
end Sort;
Sort is the main, well, sorting function. The variables are:
- Vpatch_Order - the press order, which is supposed to be main output of this tool.
- N_Selected - the first free index in Vpatch_Order (point into which new vpatch is added).
- Added_Vpatches - a set of vpatches already in Vpatch_Order.
- Finished - the sorting loop converged on a solution.
- Has_Loops - flag for the case when a loop has been discovered.
First, loop through vpatches in the Free_Vpatches set, checking if all the antecedents of each vpatch are present. If yes, the vpatch is added to Vpatch_Order, updating all the related variable accordingly.
If after this loop finishes, there are still selected but not added vpatches, there is a loop which gets printed. Otherwise, everything is OK and the press order gets printed.
begin
Read_Commands;
Add_Dependencies;
Report_Conflicts;
Sort;
end Toposort;
It should be clear what Toposort does, and why, at this point.
begin
Get(Command);
if Command /= 'v' then
raise Constraint_Error with "Expected 'v' command.";
end if;
Get(Num_Vpatches);
Toposort(Num_Vpatches);
end Vtoposort;
The main function gets the first command, verifies that it is the "number of vpatches" command, and starts toposort program, which can now allocate all its data structures on stack.
vsimplify
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
procedure Vsimplify is
Command : Character;
Num_Vpatches : Integer;
procedure Simplify(Num_Vpatches : Integer) is
subtype Vpatch_Index is Natural range 0..Num_Vpatches - 1;
type Vpatch_Set is array (Vpatch_Index'Range) of Boolean;
type Vpatch_List is array (Vpatch_Index'Range) of Vpatch_Index;
type Vpatch_Relationship is array (Vpatch_Index'Range, Vpatch_Index'Range) of Boolean;
Vpatch_Dependencies: Vpatch_Relationship := (others => (others => False));
Vpatch_TC_Indirect : Vpatch_Relationship := (others => (others => False));
Vpatch_Order : Vpatch_List := (others => 0);
Free_Vpatches : Vpatch_Set := (others => False);
N_Selected : Natural := 0;
New item here is Vpatch_TC_Indirect, which holds the indirect transitive antecedent set of each vpatch.
procedure Read_Commands is
I: Vpatch_Index;
J: Vpatch_Index;
begin
Read_Loop:
loop
exit Read_Loop when End_Of_File;
Get(Command);
if Command = 'd' then
Get(I);
Get(J);
Vpatch_Dependencies(I,J) := True;
elsif Command = 'C' then
-- Ignore conflicts in this tool, but consume the command
Get(I);
elsif Command = 'L' then
raise Constraint_Error with "Cannot simplify loops";
elsif Command = 'P' then
Get(I);
Vpatch_Order(N_Selected) := I;
N_Selected := N_Selected + 1;
end if;
end loop Read_Loop;
if N_Selected - 1 /= Vpatch_Order'Last then
raise Constraint_Error with "Not all vpatches present in the press order";
end if;
end Read_Commands;
Up to this point, we have already seen everything, though Read_Commands is updated to be more in line with output of vtoposort.
procedure Simplify_Inner is
begin
-- Fill Vpatch_TC_Indirect
for I in Vpatch_Order'Range loop
declare
N : Vpatch_Index := Vpatch_Order(I);
begin
for J in Vpatch_Dependencies'Range(2) loop
if Vpatch_Dependencies(N,J) then
for K in Vpatch_Dependencies'Range(2) loop
Vpatch_TC_Indirect(N,K) := Vpatch_TC_Indirect(N,K) or
Vpatch_Dependencies(J,K) or
Vpatch_TC_Indirect(J,K);
end loop;
end if;
end loop;
end;
end loop;
-- Output Vpatch_TC_Indirect
for I in Vpatch_TC_Indirect'Range(1) loop
for J in Vpatch_TC_Indirect'Range(2) loop
if Vpatch_TC_Indirect(I,J) then
Put_Line("I " & Integer'Image(I) & " " & Integer'Image(J));
end if;
end loop;
end loop;
-- Remove redundant connections from Vpatch_Dependencies
for I in Vpatch_Dependencies'Range(1) loop
for J in Vpatch_TC_Indirect'Range(2) loop
Vpatch_Dependencies(I,J) := Vpatch_Dependencies(I,J) and
not Vpatch_TC_Indirect(I,J);
end loop;
end loop;
-- Output filtered out Vpatch_Dependencies
for I in Vpatch_Dependencies'Range(1) loop
for J in Vpatch_TC_Indirect'Range(2) loop
if Vpatch_Dependencies(I,J) then
Put_Line("d " & Integer'Image(I) & " " & Integer'Image(J));
end if;
end loop;
end loop;
end Simplify_Inner;
Simplify_Inner is the core of vsimplify:
To fill Vpatch_TC_Indirect, for each vpatch N in the press order, loop through antecedents of N, summing their direct antecedent sets (from vflow), and indirect antecedent sets (the ones being calculated in this very loop) into the row N. Because iteration through the vpatches happens in the press order, the indirect antecedent sets are guaranteed to be completed at the time when they are accessed.
Then, output the indirect antecedent sets, and substract the indirect antecedent set from the antecedent sets that came as input from vflow. Finally, we output the cleaned up direct antecendent sets.
begin
Read_Commands;
Simplify_Inner;
end Simplify;
begin
Get(Command);
if Command /= 'v' then
raise Constraint_Error with "Expected 'v' command.";
end if;
Get(Num_Vpatches);
Simplify(Num_Vpatches);
end Vsimplify;
Main function of vsimplify follows the pattern of vtoposort.
vfilter (in awk this time)
This is added to v.sh:
vfilter() {
awk -v N="$1" '
BEGIN {r=0;}
$0 ~ /^diff -uNr/ {r=1;}
r == 1 && $1 == "---" {sub("[^/]*/", "", $2); ip=$2; ih=$3}
r == 1 && $1 == "+++" {sub("[^/]*/", "", $2); print N,ip,$2,ih,$3; r=0;}'
}
Upon detecting the "diff -uNr" in the colunm 0 of a line, read input and output information of a hunk. The path has the initial directory stripped (because a/ and b/ prefixes inside vpatches do not matter for the purposes of v-presser).
curl 'http://bvt-trace.net/vpatches/vtools_add_vsh_utils.vpatch' > vtools_add_vsh_utils.vpatch
curl 'http://bvt-trace.net/vpatches/vtools_add_vsh_utils.vpatch.bvt.sig' > vtools_add_vsh_utils.vpatch.bvt.sig
Hmm, this risks to misfire for instance on older trees that have the manifest added at some point rather than baked in from the start. Arguably they are due for a regrind anyway but the above definition (and its implicit reliance on the manifest file mechanism) sits rather uncomfortably with me. To my mind a vpatch is genesis if it doesn't have any ancestors - would this be all that harder to use at this specific point?
@Diana Coman:
If you consider a case of two vpatches that only create files, a genesis by intention (which V can't know) and a vpatch that adds manifest, those two vpatches can either conflict (if they create files with the same name) or commute (if there is no intersection between the sets of created names). How would a ~stateless system of V know which one is the genesis, and which one is just a leaf with manifest? AFAIK in V as described by Ben Vulpes, these will be ordered in flow somehow, and you would be able to press genesis, genesis+manifest OR manifest, manifest+genesis, depending on the exact flow the tool will give you; in v.sh, pressing one of them *will not* pull in the other, however you can select multiple leafs for pressing, not just one, so you can press any combination of them.
If you consider a case of more vpatches, then genesis is not special vpatch in any way (so I guess Is_Genesis is rather misleading name) - it just happens that all other vpatches have this one vpatch - genesis - as their antecedent, while not all vpatches have the vpatch with manifest in their antecedent set.
I kept thinking this over and despite my initial uneasiness, I think you are actually right in that a vpatch with *all* hunks creating files is indeed a genesis since it does not/can not properly speaking have an antecedent as it doesn't require/rely on anything being present.
> Upon detecting the "diff -uNr" in the colunm 0 of a line, read input and output information of a hunk.
This wasn't previously a required part of the unified diff format, and at least busybox diff doesn't include it (see also). It also means one might need to be more careful in any commentary at the top of a patch file.
Still, I'm thinking this is the saner way to go since it simplifies any tool that needs to parse patches while complicating only theoretical vdiffs that don't conform; nicely anti-Postel if we can afford it.
Yes, it's nice, however it's phf who came up with this: vpatch utility won't detect a header unless it is preceded by a line that matches "^diff -uNr", and this is what I carried over to the rest of the v.sh components -- in the end, vpatch is the ultimate downstream consumer. Also, vdiff produces this header unconditionally.
IMO if you want to get rid of Ada dependency, it's better to move vdiff to keksum code instead of relying on whatever diff busybox has -- also because if you diff two trees, it's not guaranteed that phf's vdiff and busybox+wrapper will produce identical hunks inside the vpatch.
Oh I see. The exact pattern looks to be "^diff ".
I can't say I like the idea of needing a 2k+ line GNUistic vdiff compared to <100 lines of awk on top of a standard utility, or this bitcoin-like situation where "the code is the spec and realistically there can be only one implementation", or the idea that all vdiffs must produce the same output anyway as that closes off improvement of the algorithm. Do you have a ref handy for why identical output is important? I recall complaints about locale-dependent file sort order from the logs but not beyond that. Anyway, thank you for the caution and I will probably do as you suggest, as this doesn't seem like a battle I can or need to win.
I don't have a reference specifically, but the clue for me was reading somewhere that phf took care that vdiff would reproduce all TRB vpatches accurately from the pressed trees. If you think about it, if vdiffs are allowed to produce differing outputs, shipping vpatch signatures without a vpatch à-la trinque becomes impossible. To still have this possibility without cementing the diff algorithm, signature would have to cover only the vpatch headers, with filenames and input and output hashes, leaving the hunk bodies out.
And, well, the "standard utility" is also implemented in some ~1000 LoC even with busybox. This is not to say that vdiff could not use some improvement: it does have some crap that should go.
> shipping vpatch signatures without a vpatch à-la trinque becomes impossible
Although that was for a genesis, where the diff hunks are fully determined anyway.
I'd say it makes a good example of the importance of choosing optionality wisely, because putting it in one place closes it off in another.
> it does have some crap that should go.
Ah yes, the "quoting is good mmkay, so let's do it, sometimes, if we feel like it".
Any thoughts on what if anything it should do on encountering file names with embedded whitespace (which on unix can even include newlines)?
In the context of vpatches I don't see a good solution, but for the sake of honestly I considered aborting vdiff process or at least reporting that the vpatch is borked to stderr, as most likely no V presser/patcher would be able to do anything with it.
[...] his awk + posix shell + Ada Vtron in a 3 part series (1, 2, [...]