create charts for methodology section

commit 841973f26f (parent c1c94fdf78), 2026-02-25 17:50:40 +01:00
All experiments were conducted on three bare-metal servers with
identical specifications:
\begin{itemize}
\bitem{CPU:} Intel Model 94, 4 cores / 8 threads
\bitem{Memory:} 64 GB RAM
\bitem{Network:} 1 Gbps Ethernet (e1000e driver; one machine
uses r8169)
\bitem{Cryptographic acceleration:} AES-NI, AVX, AVX2, PCLMULQDQ,
RDRAND, SSE4.2
\end{itemize}
may differ on systems without these features.
\subsection{Network Topology}
The three machines are connected via a direct 1 Gbps LAN on the same
network segment. Each machine has a publicly reachable IPv4 address,
which is used to deploy configuration changes via Clan. This baseline
topology provides a controlled environment with minimal latency and no
packet loss, allowing the overhead introduced by each VPN implementation
to be measured in isolation. Figure~\ref{fig:mesh_topology} illustrates
the full-mesh connectivity between the three machines.
\begin{figure}[H]
\centering
\begin{tikzpicture}[
node/.style={
draw, rounded corners, minimum width=2.2cm, minimum height=1cm,
font=\ttfamily\bfseries, align=center
},
link/.style={thick, <->}
]
% Nodes in an equilateral triangle
\node[node] (luna) at (0, 3.5) {luna};
\node[node] (yuki) at (-3, 0) {yuki};
\node[node] (lom) at (3, 0) {lom};
% Mesh links
\draw[link] (luna) -- node[left, font=\small] {1 Gbps} (yuki);
\draw[link] (luna) -- node[right, font=\small] {1 Gbps} (lom);
\draw[link] (yuki) -- node[below, font=\small] {1 Gbps} (lom);
\end{tikzpicture}
\caption{Full-mesh network topology of the three benchmark machines}
\label{fig:mesh_topology}
\end{figure}
To simulate real-world network conditions, Linux traffic control
(\texttt{tc netem}) is used to inject latency, jitter, packet loss,
for understanding the cost of mesh coordination and NAT traversal logic.
VPNs were selected based on:
\begin{itemize}
\bitem{NAT traversal capability:} All selected VPNs can establish
connections between peers behind NAT without manual port forwarding.
\bitem{Decentralization:} Preference for solutions without mandatory
central servers, though coordinated-mesh VPNs were included for comparison.
\bitem{Active development:} Only VPNs with recent commits and
maintained releases were considered.
\bitem{Linux support:} All VPNs must run on Linux.
\end{itemize}
\subsection{Configuration Methodology}
Each VPN is built from source within the Nix flake, ensuring that all
dependencies are pinned to exact versions. VPNs not packaged in nixpkgs
(Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
under \texttt{pkgs/} in the flake.
Cryptographic material (WireGuard keys, Nebula certificates, ZeroTier
work that relied exclusively on iperf3.
\subsection{Ping}
Measures ICMP round-trip latency and packet delivery reliability.
\begin{itemize}
\bitem{Method:} 100 ICMP echo requests at 200 ms intervals,
1-second per-packet timeout, repeated for 3 runs.
\bitem{Metrics:} RTT (min, avg, max, mdev), packet loss percentage,
per-packet RTTs.
\end{itemize}
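As a sketch, the invocation described above could be assembled as follows (assuming GNU ping on Linux; the target address is a placeholder, not one of the benchmark machines' real addresses):

```python
def ping_cmd(target: str, count: int = 100,
             interval_s: float = 0.2, timeout_s: int = 1) -> list[str]:
    # 100 ICMP echo requests at 200 ms intervals with a 1-second
    # per-packet timeout, matching the method description (GNU ping flags)
    return ["ping", "-c", str(count), "-i", str(interval_s),
            "-W", str(timeout_s), target]

# repeated for 3 runs per machine pair (the address is illustrative)
runs = [ping_cmd("10.0.0.2") for _ in range(3)]
```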
\subsection{TCP iPerf3}
Measures bulk TCP throughput with iperf3, a tool commonly used in
network performance research.
\begin{itemize}
\bitem{Method:} 30-second bidirectional test with zero-copy mode
(\texttt{-Z}) to minimize CPU overhead.
\bitem{Metrics:} Throughput (bits/s), retransmits, congestion
window and CPU utilization.
\end{itemize}
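The TCP invocation might look roughly as follows (a sketch: the server hostname and the JSON output flag are assumptions; \texttt{-t}, \texttt{--bidir}, and \texttt{-Z} are standard iperf3 options):

```python
def iperf3_tcp_cmd(server: str, seconds: int = 30) -> list[str]:
    # 30-second bidirectional TCP test; -Z requests zero-copy sends
    # (sendfile) to minimize CPU overhead
    return ["iperf3", "-c", server, "-t", str(seconds),
            "--bidir", "-Z", "--json"]
```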
\subsection{UDP iPerf3}
Measures bulk UDP throughput with the same flags as the TCP iPerf3 benchmark.
\begin{itemize}
\bitem{Method:} Same as the TCP test, plus unlimited target bandwidth
(\texttt{-b 0}) and 64-bit counter flags.
\bitem{Metrics:} Throughput (bits/s), jitter, packet loss and CPU
utilization.
\end{itemize}
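Building on the TCP sketch, the UDP variant would reuse the same argument list and append the UDP-specific flags; whether every TCP flag (notably \texttt{-Z}) behaves identically in UDP mode is an assumption here:

```python
def iperf3_udp_cmd(server: str, seconds: int = 30) -> list[str]:
    # TCP flags plus UDP mode, unlimited target bandwidth (-b 0),
    # and 64-bit packet counters
    return ["iperf3", "-c", server, "-t", str(seconds),
            "--bidir", "-Z", "--json",
            "-u", "-b", "0", "--udp-counters-64bit"]
```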
\subsection{Parallel iPerf3}
Tests concurrent overlay network traffic by running TCP streams on all machines
simultaneously in a circular pattern (A$\rightarrow$B,
B$\rightarrow$C, C$\rightarrow$A) for 60 seconds. This simulates
contention across the overlay network.
\begin{itemize}
\bitem{Method:} 60-second bidirectional test with zero-copy mode
(\texttt{-Z}) to minimize CPU overhead.
\bitem{Metrics:} Throughput (bits/s), retransmits, congestion
window and CPU utilization.
\end{itemize}
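The circular pairing logic can be sketched in a few lines (host names taken from Figure~\ref{fig:mesh_topology}):

```python
machines = ["luna", "yuki", "lom"]

def circular_pairs(hosts: list[str]) -> list[tuple[str, str]]:
    # each machine sends to the next one, wrapping around:
    # A->B, B->C, C->A
    return [(hosts[i], hosts[(i + 1) % len(hosts)])
            for i in range(len(hosts))]
```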
\subsection{QPerf}
Measures connection-level QUIC performance rather
than bulk UDP or TCP throughput.
\begin{itemize}
\bitem{Method:} One qperf process per CPU core in parallel, each
running for 30 seconds. Bandwidth from all cores is summed per second.
\bitem{Metrics:} Total bandwidth (Mbps), CPU usage, time to first
byte (TTFB), connection establishment time.
\end{itemize}
Measures real-time multimedia streaming performance.
\begin{itemize}
\bitem{Method:} The sender generates a 4K ($3840\times2160$) test
pattern at 30 fps using ffmpeg with H.264 encoding (ultrafast preset,
zerolatency tuning) at 25 Mbps target bitrate. The stream is transmitted
over the RIST protocol to a receiver on the target machine for 30 seconds.
\bitem{Encoding metrics:} Actual bitrate, frame rate, dropped frames.
\bitem{Network metrics:} Packets dropped, packets recovered via
RIST retransmission, RTT, quality score (0--100), received bitrate.
\end{itemize}
realistic test of VPN behavior under multimedia workloads.
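A hypothetical ffmpeg invocation matching this description might be assembled as follows (a sketch only: the exact flags, receiver address, and port are assumptions, and ffmpeg must be built with librist support):

```python
def rist_sender_cmd(receiver: str, port: int = 5000) -> list[str]:
    # 4K synthetic test pattern at 30 fps, H.264 with the ultrafast
    # preset and zerolatency tuning, 25 Mbps target bitrate,
    # streamed over RIST for 30 seconds
    return ["ffmpeg", "-re",
            "-f", "lavfi", "-i", "testsrc2=size=3840x2160:rate=30",
            "-t", "30",
            "-c:v", "libx264", "-preset", "ultrafast",
            "-tune", "zerolatency", "-b:v", "25M",
            "-f", "mpegts", f"rist://{receiver}:{port}"]
```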
\subsection{Nix Cache Download}
Measures sustained HTTP download performance of many small files
using a real-world workload.
\begin{itemize}
\bitem{Method:} A Harmonia Nix binary cache server on the target
machine serves the Firefox package. The client downloads it via
\texttt{nix copy} through the VPN. Benchmarked with hyperfine:
1 warmup run followed by 2 timed runs. The local cache and Nix's
SQLite metadata are cleared between runs.
\bitem{Metrics:} Mean duration (seconds), standard deviation,
min/max duration.
\end{itemize}
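The hyperfine wrapper could be sketched as follows (the store URI and package reference are placeholders, and the cache-clearing prepare step between runs is elided):

```python
def nix_cache_benchmark_cmd(store_uri: str, installable: str) -> list[str]:
    # 1 warmup run followed by 2 timed runs; hyperfine reports the
    # mean, standard deviation, and min/max durations
    return ["hyperfine", "--warmup", "1", "--runs", "2",
            f"nix copy --from {store_uri} {installable}"]
```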
This benchmark tests realistic HTTP traffic patterns and sustained
sequential download performance, complementing the synthetic throughput
tests.
\section{Network Impairment Profiles}
Four impairment profiles simulate a range of network conditions, from
effective round-trip impairment is approximately doubled.
\textbf{Profile} & \textbf{Latency} & \textbf{Jitter} &
\textbf{Loss} & \textbf{Reorder} & \textbf{Correlation} \\
\hline
Baseline & - & - & - & - & - \\
Low & 2 ms & 2 ms & 0.25\% & 0.5\% & 25\% \\
Medium & 4 ms & 7 ms & 1.0\% & 2.5\% & 50\% \\
High & 6 ms & 15 ms & 2.5\% & 5\% & 50\% \\
\hline
\end{tabular}
\end{table}
The correlation column controls how strongly each packet's impairment
depends on the preceding packet. At 0\% correlation, loss and
reordering events are independent; at higher values they occur in
bursts, because a packet that was lost or reordered increases the
probability that the next packet suffers the same fate. This produces
realistic bursty degradation rather than uniformly distributed drops.
The ``Low'' profile approximates a well-provisioned continental
connection, ``Medium'' represents intercontinental links or congested
networks, and ``High'' simulates severely degraded conditions such as
The benchmark suite is fully automated via a Python orchestrator
\begin{enumerate}
\item Applies TC rules via context manager (guarantees cleanup)
\item Waits 30 seconds for stabilization
\item Executes each benchmark three times sequentially,
once per machine pair: $A\to B$, then $B\to C$, and finally $C\to A$
\item Clears TC rules
\end{enumerate}
\item Collects results and metadata
\end{enumerate}
Figure~\ref{fig:orchestrator_flow} illustrates this procedure as a
flowchart.
\begin{figure}[H]
\centering
\begin{tikzpicture}[
box/.style={
draw, rounded corners, minimum width=4.8cm, minimum height=0.9cm,
font=\small, align=center, fill=white
},
decision/.style={
draw, diamond, aspect=2.5, minimum width=3cm,
font=\small, align=center, fill=white, inner sep=1pt
},
arr/.style={->, thick},
every node/.style={font=\small}
]
% Main flow
\node[box] (clean) at (0, 0) {Clean state directories};
\node[box] (deploy) at (0, -1.5) {Deploy VPN via Clan};
\node[box] (restart) at (0, -3) {Restart VPN services\\(up to 3 attempts)};
\node[box] (verify) at (0, -4.5) {Verify connectivity\\(120\,s timeout)};
% Inner loop
\node[decision] (profile) at (0, -6.3) {Next impairment\\profile?};
\node[box] (tc) at (0, -8.3) {Apply TC rules};
\node[box] (wait) at (0, -9.8) {Wait 30\,s};
\node[box] (bench) at (0, -11.3) {Run benchmarks\\$A{\to}B,\;
B{\to}C,\; C{\to}A$};
\node[box] (clear) at (0, -12.8) {Clear TC rules};
% After loop
\node[box] (collect) at (0, -14.8) {Collect results};
% Arrows -- main spine
\draw[arr] (clean) -- (deploy);
\draw[arr] (deploy) -- (restart);
\draw[arr] (restart) -- (verify);
\draw[arr] (verify) -- (profile);
\draw[arr] (profile) -- node[right] {yes} (tc);
\draw[arr] (tc) -- (wait);
\draw[arr] (wait) -- (bench);
\draw[arr] (bench) -- (clear);
% Loop back
\draw[arr] (clear) -- ++(3.8, 0) |- (profile);
% Exit loop
\draw[arr] (profile) -- ++(-3.2, 0) node[above, pos=0.3] {no}
|- (collect);
\end{tikzpicture}
\caption{Flowchart of the benchmark orchestrator procedure for a
single VPN}
\label{fig:orchestrator_flow}
\end{figure}
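The TC context-manager step can be illustrated with a simplified sketch (not the orchestrator's actual code; the injectable \texttt{run} parameter exists only to make the sketch testable without root privileges):

```python
import subprocess
import time
from contextlib import contextmanager

@contextmanager
def tc_rules(iface: str, netem_args: list[str],
             settle_s: float = 30.0, run=subprocess.run):
    # "Apply TC rules", then guarantee "Clear TC rules" via finally,
    # even if a benchmark inside the with-block raises
    run(["tc", "qdisc", "add", "dev", iface, "root", "netem",
         *netem_args], check=True)
    try:
        time.sleep(settle_s)  # the 30-second stabilization wait
        yield
    finally:
        run(["tc", "qdisc", "del", "dev", iface, "root"], check=False)
```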
\subsection{Retry Logic}
Tests use a retry wrapper with up to 2 retries (3 total attempts),
be identified during analysis.
Each metric is summarized as a statistics dictionary containing:
\begin{itemize}
\bitem{min / max:} Extreme values observed
\bitem{average:} Arithmetic mean across samples
\bitem{p25 / p50 / p75:} Quartiles via Python's
\texttt{statistics.quantiles()} function
\end{itemize}
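A minimal sketch of such a statistics dictionary, assuming the raw samples of one metric are available as a list of floats:

```python
import statistics

def summarize(samples: list[float]) -> dict[str, float]:
    # statistics.quantiles() with n=4 yields the three cut points
    # p25, p50, p75
    p25, p50, p75 = statistics.quantiles(samples, n=4)
    return {"min": min(samples), "max": max(samples),
            "average": statistics.mean(samples),
            "p25": p25, "p50": p50, "p75": p75}
```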
Aggregation differs by benchmark type. Benchmarks that execute
multiple discrete runs (ping: 3 runs of 100 packets each;
nix-cache: 2 timed runs via hyperfine) first compute statistics
within each run, then average the resulting statistics across runs.
Concretely, if ping produces three runs with mean RTTs of
5.1, 5.3, and 5.0\,ms, the reported average is the mean of
those three values (5.13\,ms). The reported minimum is the
single lowest RTT observed across all three runs.
Benchmarks that produce continuous per-second samples, such as qperf
and RIST streaming, pool all per-second measurements from a single
execution into one series before computing statistics. For qperf,
bandwidth is first summed across CPU cores for each second, and
statistics are then computed over the resulting time series.
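The per-core summation for qperf can be sketched as follows (assuming the per-core per-second samples have already been parsed into lists):

```python
def pool_qperf(per_core: dict[int, list[float]]) -> list[float]:
    # per_core maps a CPU core id to its per-second bandwidth samples
    # (Mbps); summing across cores for each second yields one pooled
    # time series over which the summary statistics are computed
    n_seconds = min(len(series) for series in per_core.values())
    return [sum(series[t] for series in per_core.values())
            for t in range(n_seconds)]
```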
The analysis reports empirical percentiles (p25, p50, p75) alongside
min/max bounds rather than parametric confidence intervals. This
choice is deliberate: benchmark latency and throughput distributions
are often skewed or multimodal, making assumptions of normality
unreliable. The interquartile range (p25--p75) conveys the spread of
typical observations, while min and max capture outlier behavior.
The nix-cache benchmark additionally reports standard deviation via
hyperfine's built-in statistical output.
\section{Source Code Analysis}
cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
Key pinned inputs include:
\begin{itemize}
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, ensuring a
single version across the dependency graph
\bitem{clan-core:} The Clan framework, pinned to a specific commit
\bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
exact commits
\bitem{Build infrastructure:} flake-parts, treefmt-nix, disko,
nixos-facter-modules
\end{itemize}
Custom packages not in nixpkgs (qperf, VpnCloud, iperf with auth patches,
EasyTier, Hyprspace) are built from source within the flake.
\subsection{Declarative System Configuration}