Improved writing: made it less verbose

2026-03-18 21:50:15 +01:00
parent 6b32967f32
commit 64aeeb5772
4 changed files with 177 additions and 195 deletions


@@ -7,11 +7,9 @@
This chapter describes the methodology used to benchmark and analyze
peer-to-peer mesh VPN implementations. The evaluation combines
performance benchmarking under controlled network conditions with a
structured source code analysis of each implementation. All
dependencies, system configurations, and test procedures are pinned
or declared so that the experiments can be independently reproduced.
\section{Experimental Setup}
@@ -29,19 +27,19 @@ identical specifications:
RDRAND, SSE4.2
\end{itemize}
Results may differ on systems without hardware cryptographic
acceleration, since most of the tested VPNs offload encryption to
AES-NI.
\subsection{Network Topology}
The three machines are connected via a direct 1 Gbps LAN on the same
network segment. Each machine has a publicly reachable IPv4 address,
which is used to deploy configuration changes via Clan. On this
baseline topology, latency is sub-millisecond and there is no packet
loss, so measured overhead can be attributed to the VPN itself.
Figure~\ref{fig:mesh_topology} illustrates the full-mesh connectivity
between the three machines.
\begin{figure}[H]
\centering
@@ -74,8 +72,8 @@ double the per-machine values.
\subsection{Configuration Methodology}
Each VPN is built from source within the Nix flake, with all
dependencies pinned to exact versions. VPNs not packaged in nixpkgs
(Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
under \texttt{pkgs/} in the flake.
@@ -85,13 +83,14 @@ system.
Generated keys are stored in version control under
\texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
so key material is part of the reproducible configuration.
\section{Benchmark Suite}
The benchmark suite includes synthetic throughput tests and
application-level workloads. Prior comparative work relied exclusively
on iperf3; the additional benchmarks here capture behavior that
iperf3 alone misses.
Table~\ref{tab:benchmark_suite} summarizes each benchmark.
\begin{table}[H]
@@ -114,8 +113,8 @@ Table~\ref{tab:benchmark_suite} summarises each benchmark.
\end{tabular}
\end{table}
The first four benchmarks use standard network testing tools;
the remaining three test application-level workloads.
The subsections below describe configuration details that the table
does not capture.
@@ -133,48 +132,49 @@ counters.
\subsection{Parallel iPerf3}
Runs one bidirectional TCP stream on all three machine pairs
simultaneously in a circular pattern (A$\rightarrow$B,
B$\rightarrow$C, C$\rightarrow$A) for 60 seconds with zero-copy
(\texttt{-Z}). The three concurrent bidirectional links produce six
unidirectional flows in total. This contention stresses shared
resources that single-stream tests leave idle.
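The circular pairing can be sketched as follows; the machine names and helper functions are illustrative placeholders, not the actual \texttt{vpn\_bench} orchestrator code.

```python
def circular_pairs(machines):
    """Pair each machine with its successor, wrapping around: A->B, B->C, C->A."""
    return [(machines[i], machines[(i + 1) % len(machines)])
            for i in range(len(machines))]

def unidirectional_flows(machines):
    """Each pair carries one bidirectional stream, i.e. two unidirectional flows."""
    return 2 * len(circular_pairs(machines))
```

With three machines this yields three pairs and six unidirectional flows, matching the description above.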
\subsection{QPerf}
Spawns one qperf process per CPU core, each running for 30 seconds.
Per-core bandwidth is summed per second. In addition to throughput,
QPerf reports time to first byte and connection establishment time,
which iPerf3 does not measure.
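The per-core summation amounts to the following sketch; the data layout (one per-second sample list per qperf process) is an assumption for illustration, not taken from the orchestrator.

```python
def total_bandwidth_per_second(per_core_series):
    """Sum per-core qperf bandwidth samples into one aggregate per-second series.

    per_core_series: one list of per-second bandwidth samples per qperf
    process (one process per CPU core); all lists have equal length.
    """
    return [sum(second) for second in zip(*per_core_series)]
```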
\subsection{RIST Video Streaming}
Generates a 4K ($3840\times2160$) H.264 test pattern at 30\,fps
(ultrafast preset, zerolatency tuning, 25\,Mbps bitrate cap) with
ffmpeg and transmits it over the RIST protocol for 30 seconds. Because
the synthetic test pattern is highly compressible, the actual encoding
bitrate is approximately 3.3\,Mbps, well below the configured cap. RIST
(Reliable Internet Stream Transport) is a protocol for low-latency
video contribution over unreliable networks. The benchmark records
encoding-side statistics (actual bitrate, frame rate, dropped frames)
and RIST-specific counters (packets recovered via retransmission,
quality score).
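An ffmpeg invocation along these lines would produce such a stream; the receiver URL and the exact flags are assumptions for illustration, not the benchmark's verbatim command.

```python
def rist_sender_cmd(receiver_url, seconds=30):
    """Approximate ffmpeg command for the described workload: a 4K 30 fps
    test pattern, x264 ultrafast/zerolatency, 25 Mbps cap, sent over RIST.
    The receiver URL and flag set are illustrative assumptions."""
    return ["ffmpeg",
            "-f", "lavfi", "-i", "testsrc2=size=3840x2160:rate=30",  # synthetic 4K pattern
            "-t", str(seconds),
            "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
            "-b:v", "25M", "-maxrate", "25M",                        # 25 Mbps cap
            "-f", "mpegts", receiver_url]
```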
\subsection{Nix Cache Download}
A Harmonia Nix binary cache server on the target machine serves the
Firefox package. The client downloads it via \texttt{nix copy}
through the VPN. Unlike the iPerf3 tests, this workload issues many
short-lived HTTP requests instead of a single bulk transfer.
Benchmarked with hyperfine (1 warmup run, 2 timed runs); the local
Nix store and SQLite metadata are cleared between runs.
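The invocation looks roughly like the following sketch; the cache URL, installable, and the state-clearing script are placeholders, not the benchmark's actual values.

```python
def nix_cache_benchmark_cmd(cache_url, installable):
    """hyperfine invocation matching the description: 1 warmup run, 2 timed
    runs, local Nix state cleared between runs via --prepare. The cache URL,
    installable, and clear script are illustrative placeholders."""
    return ["hyperfine", "--warmup", "1", "--runs", "2",
            "--prepare", "./clear-nix-state.sh",  # placeholder reset script
            f"nix copy --from {cache_url} {installable}"]
```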
\section{Network Impairment Profiles}
Four impairment profiles simulate progressively worse network
conditions, from an unmodified baseline to a severely degraded link.
All impairments are injected with Linux traffic control
(\texttt{tc netem}) on the egress side of every machine's primary
interface.
Table~\ref{tab:impairment_profiles} lists the per-machine values.
Because impairments are applied on both ends of a connection, the
effective round-trip impact is roughly double the listed values.
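The injection step can be sketched as command construction for \texttt{tc netem}; the parameter values shown in the test are illustrative, not the profile values from Table~\ref{tab:impairment_profiles}.

```python
def netem_cmd(interface, delay_ms=0, loss_pct=0.0, rate_mbit=0):
    """Build the egress `tc netem` command for one machine's primary interface.
    Parameter values are illustrative, not the thesis's profile values."""
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    if delay_ms:
        cmd += ["delay", f"{delay_ms}ms"]    # added one-way latency
    if loss_pct:
        cmd += ["loss", f"{loss_pct}%"]      # random packet loss
    if rate_mbit:
        cmd += ["rate", f"{rate_mbit}mbit"]  # bandwidth cap
    return cmd
```

Because each machine applies its own egress qdisc, delay and loss act on both directions of a connection, which is why the round-trip impact is roughly double the per-machine values.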
@@ -222,14 +222,14 @@ aspect of the simulated degradation:
\end{itemize}
A 30-second stabilization period follows TC application before
measurements begin so that queuing disciplines can settle.
\section{Experimental Procedure}
\subsection{Automation}
A Python orchestrator (\texttt{vpn\_bench/}) automates the full
benchmark suite. For each VPN under test, it:
\begin{enumerate}
\item Cleans all state directories from previous VPN runs
@@ -327,11 +327,12 @@ Each metric is summarized as a statistics dictionary containing:
Aggregation differs by benchmark type. Benchmarks that execute
multiple discrete runs, ping (3 runs of 100 packets each) and
nix-cache (2 timed runs via hyperfine), first compute statistics
within each run, then aggregate across runs: averages and percentiles
are averaged, while the reported minimum and maximum are the global
extremes across all runs. Concretely, if ping produces three runs
with mean RTTs of 5.1, 5.3, and 5.0\,ms, the reported average is
the mean of those three values (5.13\,ms). The reported minimum is
the single lowest RTT observed across all three runs.
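This aggregation rule can be sketched as follows (a simplified illustration, not the orchestrator's code):

```python
from statistics import mean

def aggregate_discrete_runs(runs):
    """Per-run statistics first, then cross-run aggregation: averages are
    averaged, while min/max are the global extremes across all runs."""
    return {
        "avg": mean(mean(run) for run in runs),        # mean of per-run means
        "min": min(min(run) for run in runs),          # global minimum
        "max": max(max(run) for run in runs),          # global maximum
    }
```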
Benchmarks that produce continuous per-second samples, qperf and
RIST streaming for example, pool all per-second measurements from a single
@@ -340,9 +341,9 @@ bandwidth is first summed across CPU cores for each second, and
statistics are then computed over the resulting time series.
The analysis reports empirical percentiles (p25, p50, p75) alongside
min/max bounds rather than parametric confidence intervals.
Benchmark latency and throughput distributions are often skewed or
multimodal, so parametric assumptions of normality would be
unreliable. The interquartile range (p25--p75) conveys the spread of
typical observations, while min and max capture outlier behavior.
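A minimal sketch of such a five-number summary, using Python's standard library rather than the analysis code itself:

```python
from statistics import quantiles

def five_number_summary(samples):
    """Empirical summary reported in place of parametric confidence intervals."""
    p25, p50, p75 = quantiles(samples, n=4, method="inclusive")
    return {"min": min(samples), "p25": p25, "p50": p50,
            "p75": p75, "max": max(samples)}
```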
The nix-cache benchmark additionally reports standard deviation via
@@ -350,9 +351,8 @@ hyperfine's built-in statistical output.
\section{Source Code Analysis}
We also conducted a structured source code analysis of all ten VPN
implementations. The analysis followed three phases.
\subsection{Repository Collection and LLM-Assisted Overview}
@@ -378,23 +378,23 @@ aspects:
\end{itemize}
Each agent was required to reference the specific file and line
range supporting every claim so that outputs could be verified
against the source.
\subsection{Manual Verification}
The LLM-generated overviews served as a navigational aid rather than
a trusted source. The most important code paths identified in each
overview were manually read and verified against the actual source
code. Where the automated summaries were inaccurate or superficial,
they were corrected and expanded.
\subsection{Feature Matrix and Maintainer Review}
The findings from both phases were consolidated into a feature matrix
of 131 features across all ten VPN implementations, covering protocol
characteristics, cryptographic primitives, NAT traversal strategies,
routing behavior, and security properties.
The completed feature matrix was published and sent to the respective
VPN maintainers for review. We incorporated their feedback as
@@ -402,7 +402,7 @@ corrections and clarifications to the final classification.
\section{Reproducibility}
The experimental stack pins or declares the variables that could
affect results.
\subsection{Dependency Pinning}
@@ -412,8 +412,8 @@ cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
Key pinned inputs include:
\begin{itemize}
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, so a single
version is used across the dependency graph
\bitem{clan-core:} The Clan framework, pinned to a specific commit
\bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
exact commits
@@ -527,9 +527,8 @@ VPNs were selected based on:
\bitem{Linux support:} All VPNs must run on Linux.
\end{itemize}
Table~\ref{tab:vpn_selection} lists the ten VPN implementations
selected for evaluation.
\begin{table}[H]
\centering
@@ -556,7 +555,7 @@ mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
\end{tabular}
\end{table}
WireGuard is not a mesh VPN but is included as a reference point.
Comparing its overhead to the mesh VPNs isolates the cost of mesh
coordination and NAT traversal.