create charts for methodology section

commit 841973f26f (parent c1c94fdf78), 2026-02-25 17:50:40 +01:00
All experiments were conducted on three bare-metal servers with
identical specifications:
\begin{itemize}
\bitem{CPU:} Intel Model 94, 4 cores / 8 threads
\bitem{Memory:} 64 GB RAM
\bitem{Network:} 1 Gbps Ethernet (e1000e driver; one machine
uses r8169)
\bitem{Cryptographic acceleration:} AES-NI, AVX, AVX2, PCLMULQDQ,
RDRAND, SSE4.2
\end{itemize}
may differ on systems without these features.
\subsection{Network Topology}
The three machines are connected via a direct 1 Gbps LAN on the same
network segment. Each machine has a publicly reachable IPv4 address,
which is used to deploy configuration changes via Clan. This baseline
topology provides a controlled environment with minimal latency and no
packet loss, allowing the overhead introduced by each VPN implementation
to be measured in isolation. Figure~\ref{fig:mesh_topology} illustrates
the full-mesh connectivity between the three machines.
\begin{figure}[H]
\centering
\begin{tikzpicture}[
node/.style={
draw, rounded corners, minimum width=2.2cm, minimum height=1cm,
font=\ttfamily\bfseries, align=center
},
link/.style={thick, <->}
]
% Nodes in an equilateral triangle
\node[node] (luna) at (0, 3.5) {luna};
\node[node] (yuki) at (-3, 0) {yuki};
\node[node] (lom) at (3, 0) {lom};
% Mesh links
\draw[link] (luna) -- node[left, font=\small] {1 Gbps} (yuki);
\draw[link] (luna) -- node[right, font=\small] {1 Gbps} (lom);
\draw[link] (yuki) -- node[below, font=\small] {1 Gbps} (lom);
\end{tikzpicture}
\caption{Full-mesh network topology of the three benchmark machines}
\label{fig:mesh_topology}
\end{figure}
To simulate real-world network conditions, Linux traffic control
(\texttt{tc netem}) is used to inject latency, jitter, packet loss,
for understanding the cost of mesh coordination and NAT traversal logic.
VPNs were selected based on:
\begin{itemize}
\bitem{NAT traversal capability:} All selected VPNs can establish
connections between peers behind NAT without manual port forwarding.
\bitem{Decentralization:} Preference for solutions without mandatory
central servers, though coordinated-mesh VPNs were included for comparison.
\bitem{Active development:} Only VPNs with recent commits and
maintained releases were considered.
\bitem{Linux support:} All VPNs must run on Linux.
\end{itemize}
\subsection{Configuration Methodology}
Each VPN is built from source within the Nix flake, ensuring that all
dependencies are pinned to exact versions. VPNs not packaged in nixpkgs
(Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
under \texttt{pkgs/} in the flake.
Cryptographic material (WireGuard keys, Nebula certificates, ZeroTier
work that relied exclusively on iperf3.
\subsection{Ping}
Measures ICMP round-trip latency and packet delivery reliability.
\begin{itemize}
\bitem{Method:} 100 ICMP echo requests at 200 ms intervals,
1-second per-packet timeout, repeated for 3 runs.
\bitem{Metrics:} RTT (min, avg, max, mdev), packet loss percentage,
per-packet RTTs.
\end{itemize}
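As a sketch, the invocation described above could be assembled as follows (assuming GNU ping on Linux; the target address is a placeholder, not one of the benchmark machines' real addresses):

```python
def ping_cmd(target: str, count: int = 100,
             interval_s: float = 0.2, timeout_s: int = 1) -> list[str]:
    # 100 ICMP echo requests at 200 ms intervals with a 1-second
    # per-packet timeout, matching the method description (GNU ping flags)
    return ["ping", "-c", str(count), "-i", str(interval_s),
            "-W", str(timeout_s), target]

# repeated for 3 runs per machine pair (the address is illustrative)
runs = [ping_cmd("10.0.0.2") for _ in range(3)]
```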
\subsection{TCP iPerf3}
Measures bulk TCP throughput with iperf3, a tool commonly used in
network performance research.
\begin{itemize}
\bitem{Method:} 30-second bidirectional test with zero-copy mode
(\texttt{-Z}) to minimize CPU overhead.
\bitem{Metrics:} Throughput (bits/s), retransmits, congestion
window and CPU utilization.
\end{itemize}
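The TCP invocation might look roughly as follows (a sketch: the server hostname and the JSON output flag are assumptions; \texttt{-t}, \texttt{--bidir}, and \texttt{-Z} are standard iperf3 options):

```python
def iperf3_tcp_cmd(server: str, seconds: int = 30) -> list[str]:
    # 30-second bidirectional TCP test; -Z requests zero-copy sends
    # (sendfile) to minimize CPU overhead
    return ["iperf3", "-c", server, "-t", str(seconds),
            "--bidir", "-Z", "--json"]
```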
\subsection{UDP iPerf3}
Measures bulk UDP throughput with the same flags as the TCP iPerf3 benchmark.
\begin{itemize}
\bitem{Method:} Same as the TCP test, plus unlimited target bandwidth
(\texttt{-b 0}) and 64-bit counter flags.
\bitem{Metrics:} Throughput (bits/s), jitter, packet loss and CPU
utilization.
\end{itemize}
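Building on the TCP sketch, the UDP variant would reuse the same argument list and append the UDP-specific flags; whether every TCP flag (notably \texttt{-Z}) behaves identically in UDP mode is an assumption here:

```python
def iperf3_udp_cmd(server: str, seconds: int = 30) -> list[str]:
    # TCP flags plus UDP mode, unlimited target bandwidth (-b 0),
    # and 64-bit packet counters
    return ["iperf3", "-c", server, "-t", str(seconds),
            "--bidir", "-Z", "--json",
            "-u", "-b", "0", "--udp-counters-64bit"]
```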
\subsection{Parallel iPerf3}
Tests concurrent overlay network traffic by running TCP streams on all machines
simultaneously in a circular pattern (A$\rightarrow$B,
B$\rightarrow$C, C$\rightarrow$A) for 60 seconds. This simulates
contention across the overlay network.
\begin{itemize}
\bitem{Method:} 60-second bidirectional test with zero-copy mode
(\texttt{-Z}) to minimize CPU overhead.
\bitem{Metrics:} Throughput (bits/s), retransmits, congestion
window and CPU utilization.
\end{itemize}
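The circular pairing logic can be sketched in a few lines (host names taken from Figure~\ref{fig:mesh_topology}):

```python
machines = ["luna", "yuki", "lom"]

def circular_pairs(hosts: list[str]) -> list[tuple[str, str]]:
    # each machine sends to the next one, wrapping around:
    # A->B, B->C, C->A
    return [(hosts[i], hosts[(i + 1) % len(hosts)])
            for i in range(len(hosts))]
```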
\subsection{QPerf}
Measures connection-level QUIC performance rather
than bulk UDP or TCP throughput.
\begin{itemize}
\bitem{Method:} One qperf process per CPU core in parallel, each
running for 30 seconds. Bandwidth from all cores is summed per second.
\bitem{Metrics:} Total bandwidth (Mbps), CPU usage, time to first
byte (TTFB), connection establishment time.
\end{itemize}
Measures real-time multimedia streaming performance.
\begin{itemize}
\bitem{Method:} The sender generates a 4K ($3840\times2160$) test
pattern at 30 fps using ffmpeg with H.264 encoding (ultrafast preset,
zerolatency tuning) at 25 Mbps target bitrate. The stream is transmitted
over the RIST protocol to a receiver on the target machine for 30 seconds.
\bitem{Encoding metrics:} Actual bitrate, frame rate, dropped frames.
\bitem{Network metrics:} Packets dropped, packets recovered via
RIST retransmission, RTT, quality score (0--100), received bitrate.
\end{itemize}
realistic test of VPN behavior under multimedia workloads.
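A hypothetical ffmpeg invocation matching this description might be assembled as follows (a sketch only: the exact flags, receiver address, and port are assumptions, and ffmpeg must be built with librist support):

```python
def rist_sender_cmd(receiver: str, port: int = 5000) -> list[str]:
    # 4K synthetic test pattern at 30 fps, H.264 with the ultrafast
    # preset and zerolatency tuning, 25 Mbps target bitrate,
    # streamed over RIST for 30 seconds
    return ["ffmpeg", "-re",
            "-f", "lavfi", "-i", "testsrc2=size=3840x2160:rate=30",
            "-t", "30",
            "-c:v", "libx264", "-preset", "ultrafast",
            "-tune", "zerolatency", "-b:v", "25M",
            "-f", "mpegts", f"rist://{receiver}:{port}"]
```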
\subsection{Nix Cache Download}
Measures sustained HTTP download performance of many small files
using a real-world workload.
\begin{itemize}
\bitem{Method:} A Harmonia Nix binary cache server on the target
machine serves the Firefox package. The client downloads it via
\texttt{nix copy} through the VPN. Benchmarked with hyperfine:
1 warmup run followed by 2 timed runs. The local cache and Nix's
SQLite metadata are cleared between runs.
\bitem{Metrics:} Mean duration (seconds), standard deviation,
min/max duration.
\end{itemize}
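The hyperfine wrapper could be sketched as follows (the store URI and package reference are placeholders, and the cache-clearing prepare step between runs is elided):

```python
def nix_cache_benchmark_cmd(store_uri: str, installable: str) -> list[str]:
    # 1 warmup run followed by 2 timed runs; hyperfine reports the
    # mean, standard deviation, and min/max durations
    return ["hyperfine", "--warmup", "1", "--runs", "2",
            f"nix copy --from {store_uri} {installable}"]
```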
This benchmark tests realistic HTTP traffic patterns and sustained
sequential download performance, complementing the synthetic throughput
tests.
\section{Network Impairment Profiles}
Four impairment profiles simulate a range of network conditions, from
effective round-trip impairment is approximately doubled.
\textbf{Profile} & \textbf{Latency} & \textbf{Jitter} &
\textbf{Loss} & \textbf{Reorder} & \textbf{Correlation} \\
\hline
Baseline & - & - & - & - & - \\
Low & 2 ms & 2 ms & 0.25\% & 0.5\% & 25\% \\
Medium & 4 ms & 7 ms & 1.0\% & 2.5\% & 50\% \\
High & 6 ms & 15 ms & 2.5\% & 5\% & 50\% \\
\hline
\end{tabular}
\end{table}
The correlation column controls how strongly each packet's impairment
depends on the preceding packet. At 0\% correlation, loss and
reordering events are independent; at higher values they occur in
bursts, because a packet that was lost or reordered increases the
probability that the next packet suffers the same fate. This produces
realistic bursty degradation rather than uniformly distributed drops.
The ``Low'' profile approximates a well-provisioned continental
connection, ``Medium'' represents intercontinental links or congested
networks, and ``High'' simulates severely degraded conditions such as
The benchmark suite is fully automated via a Python orchestrator
\begin{enumerate}
\item Applies TC rules via context manager (guarantees cleanup)
\item Waits 30 seconds for stabilization
\item Executes each benchmark three times sequentially,
once per machine pair: $A\to B$, then $B\to C$, and finally $C\to A$
\item Clears TC rules
\end{enumerate}
\item Collects results and metadata
\end{enumerate}
Figure~\ref{fig:orchestrator_flow} illustrates this procedure as a
flowchart.
\begin{figure}[H]
\centering
\begin{tikzpicture}[
box/.style={
draw, rounded corners, minimum width=4.8cm, minimum height=0.9cm,
font=\small, align=center, fill=white
},
decision/.style={
draw, diamond, aspect=2.5, minimum width=3cm,
font=\small, align=center, fill=white, inner sep=1pt
},
arr/.style={->, thick},
every node/.style={font=\small}
]
% Main flow
\node[box] (clean) at (0, 0) {Clean state directories};
\node[box] (deploy) at (0, -1.5) {Deploy VPN via Clan};
\node[box] (restart) at (0, -3) {Restart VPN services\\(up to 3 attempts)};
\node[box] (verify) at (0, -4.5) {Verify connectivity\\(120\,s timeout)};
% Inner loop
\node[decision] (profile) at (0, -6.3) {Next impairment\\profile?};
\node[box] (tc) at (0, -8.3) {Apply TC rules};
\node[box] (wait) at (0, -9.8) {Wait 30\,s};
\node[box] (bench) at (0, -11.3) {Run benchmarks\\$A{\to}B,\;
B{\to}C,\; C{\to}A$};
\node[box] (clear) at (0, -12.8) {Clear TC rules};
% After loop
\node[box] (collect) at (0, -14.8) {Collect results};
% Arrows -- main spine
\draw[arr] (clean) -- (deploy);
\draw[arr] (deploy) -- (restart);
\draw[arr] (restart) -- (verify);
\draw[arr] (verify) -- (profile);
\draw[arr] (profile) -- node[right] {yes} (tc);
\draw[arr] (tc) -- (wait);
\draw[arr] (wait) -- (bench);
\draw[arr] (bench) -- (clear);
% Loop back
\draw[arr] (clear) -- ++(3.8, 0) |- (profile);
% Exit loop
\draw[arr] (profile) -- ++(-3.2, 0) node[above, pos=0.3] {no}
|- (collect);
\end{tikzpicture}
\caption{Flowchart of the benchmark orchestrator procedure for a
single VPN}
\label{fig:orchestrator_flow}
\end{figure}
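The TC context-manager step can be illustrated with a simplified sketch (not the orchestrator's actual code; the injectable \texttt{run} parameter exists only to make the sketch testable without root privileges):

```python
import subprocess
import time
from contextlib import contextmanager

@contextmanager
def tc_rules(iface: str, netem_args: list[str],
             settle_s: float = 30.0, run=subprocess.run):
    # "Apply TC rules", then guarantee "Clear TC rules" via finally,
    # even if a benchmark inside the with-block raises
    run(["tc", "qdisc", "add", "dev", iface, "root", "netem",
         *netem_args], check=True)
    try:
        time.sleep(settle_s)  # the 30-second stabilization wait
        yield
    finally:
        run(["tc", "qdisc", "del", "dev", iface, "root"], check=False)
```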
\subsection{Retry Logic}
Tests use a retry wrapper with up to 2 retries (3 total attempts),
be identified during analysis.
Each metric is summarized as a statistics dictionary containing:
\begin{itemize}
\bitem{min / max:} Extreme values observed
\bitem{average:} Arithmetic mean across samples
\bitem{p25 / p50 / p75:} Quartiles via Python's
\texttt{statistics.quantiles()} function
\end{itemize}
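A minimal sketch of such a statistics dictionary, assuming the raw samples of one metric are available as a list of floats:

```python
import statistics

def summarize(samples: list[float]) -> dict[str, float]:
    # statistics.quantiles() with n=4 yields the three cut points
    # p25, p50, p75
    p25, p50, p75 = statistics.quantiles(samples, n=4)
    return {"min": min(samples), "max": max(samples),
            "average": statistics.mean(samples),
            "p25": p25, "p50": p50, "p75": p75}
```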
Aggregation differs by benchmark type. Benchmarks that execute
multiple discrete runs (ping: 3 runs of 100 packets each;
nix-cache: 2 timed runs via hyperfine) first compute statistics
within each run, then average the resulting statistics across runs.
Concretely, if ping produces three runs with mean RTTs of
5.1, 5.3, and 5.0\,ms, the reported average is the mean of
those three values (5.13\,ms). The reported minimum is the
single lowest RTT observed across all three runs.
Benchmarks that produce continuous per-second samples, such as qperf
and RIST streaming, pool all per-second measurements from a single
execution into one series before computing statistics. For qperf,
bandwidth is first summed across CPU cores for each second, and
statistics are then computed over the resulting time series.
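The per-core summation for qperf can be sketched as follows (assuming the per-core per-second samples have already been parsed into lists):

```python
def pool_qperf(per_core: dict[int, list[float]]) -> list[float]:
    # per_core maps a CPU core id to its per-second bandwidth samples
    # (Mbps); summing across cores for each second yields one pooled
    # time series over which the summary statistics are computed
    n_seconds = min(len(series) for series in per_core.values())
    return [sum(series[t] for series in per_core.values())
            for t in range(n_seconds)]
```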
The analysis reports empirical percentiles (p25, p50, p75) alongside
min/max bounds rather than parametric confidence intervals. This
choice is deliberate: benchmark latency and throughput distributions
are often skewed or multimodal, making assumptions of normality
unreliable. The interquartile range (p25--p75) conveys the spread of
typical observations, while min and max capture outlier behavior.
The nix-cache benchmark additionally reports standard deviation via
hyperfine's built-in statistical output.
\section{Source Code Analysis}
cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
Key pinned inputs include:
\begin{itemize}
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, ensuring a
single version across the dependency graph
\bitem{clan-core:} The Clan framework, pinned to a specific commit
\bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
exact commits
\bitem{Build infrastructure:} flake-parts, treefmt-nix, disko,
nixos-facter-modules
\end{itemize}
Custom packages not in nixpkgs (qperf, VpnCloud, iperf with auth patches,
EasyTier, Hyprspace) are built from source within the flake.
\subsection{Declarative System Configuration}