This chapter describes the methodology used to benchmark and analyze
peer-to-peer mesh VPN implementations. The evaluation combines
performance benchmarking under controlled network conditions with a
structured source code analysis of each implementation. All
dependencies, system configurations, and test procedures are pinned
or declared so that the experiments can be independently reproduced.

\section{Experimental Setup}

identical specifications:

RDRAND, SSE4.2
\end{itemize}

Results may differ on systems without hardware cryptographic
acceleration, since most of the tested VPNs offload encryption to
AES-NI.

\subsection{Network Topology}

The three machines are connected via a direct 1 Gbps LAN on the same
network segment. Each machine has a publicly reachable IPv4 address,
which is used to deploy configuration changes via Clan. On this
baseline topology, latency is sub-millisecond and there is no packet
loss, so measured overhead can be attributed to the VPN itself.
Figure~\ref{fig:mesh_topology} illustrates the full-mesh connectivity
between the three machines.

\begin{figure}[H]
\centering

double the per-machine values.

\subsection{Configuration Methodology}

Each VPN is built from source within the Nix flake, with all
dependencies pinned to exact versions. VPNs not packaged in nixpkgs
(Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
under \texttt{pkgs/} in the flake.

system.

Generated keys are stored in version control under
\texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
so key material is part of the reproducible configuration.

\section{Benchmark Suite}

The benchmark suite includes synthetic throughput tests and
application-level workloads. Prior comparative work relied exclusively
on iperf3; the additional benchmarks here capture behavior that
iperf3 alone misses.
Table~\ref{tab:benchmark_suite} summarizes each benchmark.

\begin{table}[H]

\end{tabular}
\end{table}

The first four benchmarks use standard network testing tools;
the remaining three test application-level workloads.
The subsections below describe configuration details that the table
does not capture.

counters.

\subsection{Parallel iPerf3}

Runs one bidirectional TCP stream on all three machine pairs
simultaneously in a circular pattern (A$\rightarrow$B,
B$\rightarrow$C, C$\rightarrow$A) for 60 seconds with zero-copy
(\texttt{-Z}). The three concurrent bidirectional links produce six
unidirectional flows in total. This contention stresses shared
resources that single-stream tests leave idle.
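The circular assignment generalizes to any ring of machines. A minimal sketch (the function name is ours, not taken from the orchestrator):

```python
def ring_pairs(machines):
    """Pair each machine with its successor, wrapping around:
    [A, B, C] -> [(A, B), (B, C), (C, A)]."""
    return [(machines[i], machines[(i + 1) % len(machines)])
            for i in range(len(machines))]

# Three machines yield three bidirectional links; with iperf3 in
# bidirectional mode each link carries two flows, six in total.
pairs = ring_pairs(["A", "B", "C"])
# -> [('A', 'B'), ('B', 'C'), ('C', 'A')]
```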

\subsection{QPerf}

Spawns one qperf process per CPU core, each running for 30 seconds.
Per-core bandwidth is summed per second. In addition to throughput,
QPerf reports time to first byte and connection establishment time,
which iPerf3 does not measure.
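The per-second summation across cores can be sketched as follows (the data layout is our assumption, not the orchestrator's actual structure):

```python
def sum_across_cores(per_core_series):
    """Given one bandwidth time series per qperf process (one per core),
    return the aggregate bandwidth for each second.

    per_core_series[core][second] is the bandwidth sample in Mbps.
    """
    # zip(*...) groups the samples of all cores second by second.
    return [sum(samples) for samples in zip(*per_core_series)]

# Two cores, three seconds of samples:
total = sum_across_cores([[400, 420, 410],
                          [390, 415, 405]])
# -> [790, 835, 815]
```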

\subsection{RIST Video Streaming}

Generates a 4K ($3840\times2160$) H.264 test pattern at 30\,fps
(ultrafast preset, zerolatency tuning, 25\,Mbps bitrate cap) with
ffmpeg and transmits it over the RIST protocol for 30 seconds. Because
the synthetic test pattern is highly compressible, the actual encoding
bitrate is approximately 3.3\,Mbps, well below the configured cap. RIST
(Reliable Internet Stream Transport) is a protocol for low-latency
video contribution over unreliable networks. The benchmark records
encoding-side statistics (actual bitrate, frame rate, dropped frames)
and RIST-specific counters (packets recovered via retransmission,
quality score).
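A sender with these parameters could be assembled roughly as below. This is our reconstruction from the description, not the exact command the benchmark runs; the destination address is hypothetical, and ffmpeg must be built with librist for the \texttt{rist://} output to work.

```python
def rist_sender_cmd(dest_url, seconds=30):
    """Build an ffmpeg command that encodes a synthetic 4K 30 fps test
    pattern and sends it over RIST. Illustrative reconstruction only."""
    return [
        "ffmpeg",
        "-re",                                  # pace input at real-time
        "-f", "lavfi",
        "-i", "testsrc2=size=3840x2160:rate=30",  # synthetic 4K pattern
        "-t", str(seconds),                     # stream duration
        "-c:v", "libx264",
        "-preset", "ultrafast", "-tune", "zerolatency",
        "-b:v", "25M", "-maxrate", "25M",       # 25 Mbps bitrate cap
        "-f", "mpegts", dest_url,               # MPEG-TS over RIST
    ]

cmd = rist_sender_cmd("rist://10.0.0.2:5000")  # address is hypothetical
```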

\subsection{Nix Cache Download}

A Harmonia Nix binary cache server on the target machine serves the
Firefox package. The client downloads it via \texttt{nix copy}
through the VPN. Unlike the iPerf3 tests, this workload issues many
short-lived HTTP requests instead of a single bulk transfer.
Benchmarked with hyperfine (1 warmup run, 2 timed runs); the local
Nix store and SQLite metadata are cleared between runs.

\section{Network Impairment Profiles}

Four impairment profiles simulate progressively worse network
conditions, from an unmodified baseline to a severely degraded link.
All impairments are injected with Linux traffic control
(\texttt{tc netem}) on the egress side of every machine's primary
interface.
Table~\ref{tab:impairment_profiles} lists the per-machine values.
Because impairments are applied on both ends of a connection, the
effective round-trip impact is roughly double the listed values.
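One way such an egress impairment can be expressed is the following helper; the profile values in the example are illustrative, not the thesis's actual profiles, and applying the command requires root privileges.

```python
def netem_cmd(iface, delay_ms=None, jitter_ms=None,
              loss_pct=None, rate_mbit=None):
    """Build a `tc qdisc` command that applies netem impairments
    (delay/jitter, random loss, rate limit) on an egress interface."""
    cmd = ["tc", "qdisc", "add", "dev", iface, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            cmd += [f"{jitter_ms}ms"]      # jitter follows the base delay
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    if rate_mbit is not None:
        cmd += ["rate", f"{rate_mbit}mbit"]
    return cmd

# A hypothetical degraded profile: 50 ms +/- 10 ms delay, 1% loss, 100 Mbit/s.
print(" ".join(netem_cmd("eth0", 50, 10, 1, 100)))
# tc qdisc add dev eth0 root netem delay 50ms 10ms loss 1% rate 100mbit
```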

aspect of the simulated degradation:
\end{itemize}

A 30-second stabilization period follows TC application before
measurements begin so that queuing disciplines can settle.

\section{Experimental Procedure}

\subsection{Automation}

A Python orchestrator (\texttt{vpn\_bench/}) automates the full
benchmark suite. For each VPN under test, it:

\begin{enumerate}
\item Cleans all state directories from previous VPN runs
Each metric is summarized as a statistics dictionary containing:

Aggregation differs by benchmark type. Benchmarks that execute
multiple discrete runs, ping (3 runs of 100 packets each) and
nix-cache (2 timed runs via hyperfine), first compute statistics
within each run, then aggregate across runs: averages and percentiles
are averaged, while the reported minimum and maximum are the global
extremes across all runs. Concretely, if ping produces three runs
with mean RTTs of 5.1, 5.3, and 5.0\,ms, the reported average is
the mean of those three values (5.13\,ms). The reported minimum is
the single lowest RTT observed across all three runs.
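The two aggregation rules can be sketched as follows (the statistics-dictionary layout and the min/max values in the example are our assumptions; the 5.1/5.3/5.0\,ms means are the ping example from the text):

```python
def aggregate_runs(runs):
    """Combine per-run statistics dictionaries into one summary:
    averages are averaged across runs, while the reported min/max
    are the global extremes over all runs."""
    n = len(runs)
    return {
        "avg": sum(r["avg"] for r in runs) / n,
        "min": min(r["min"] for r in runs),
        "max": max(r["max"] for r in runs),
    }

# Three ping runs with mean RTTs of 5.1, 5.3 and 5.0 ms
# average to 5.13 ms; min/max come from the pooled extremes.
summary = aggregate_runs([
    {"avg": 5.1, "min": 4.8, "max": 6.0},
    {"avg": 5.3, "min": 4.9, "max": 6.4},
    {"avg": 5.0, "min": 4.7, "max": 5.9},
])
# round(summary["avg"], 2) -> 5.13
```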

Benchmarks that produce continuous per-second samples, such as qperf
and RIST streaming, pool all per-second measurements from a single
bandwidth is first summed across CPU cores for each second, and
statistics are then computed over the resulting time series.

The analysis reports empirical percentiles (p25, p50, p75) alongside
min/max bounds rather than parametric confidence intervals.
Benchmark latency and throughput distributions are often skewed or
multimodal, so parametric assumptions of normality would be
unreliable. The interquartile range (p25--p75) conveys the spread of
typical observations, while min and max capture outlier behavior.
The nix-cache benchmark additionally reports standard deviation via
hyperfine's built-in statistical output.
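The reported spread statistics can be computed directly with the Python standard library; a sketch, not the analysis code itself:

```python
from statistics import quantiles

def spread_stats(samples):
    """Empirical quartiles plus global extremes, mirroring the reported
    summary: p25/p50/p75 describe typical spread, min/max outliers."""
    p25, p50, p75 = quantiles(samples, n=4)  # default 'exclusive' method
    return {"min": min(samples), "p25": p25, "p50": p50,
            "p75": p75, "max": max(samples)}

# Hypothetical RTT samples with one outlier: the quartiles stay near
# 5.1 ms while max exposes the 9.8 ms spike.
stats = spread_stats([5.0, 5.1, 5.1, 5.2, 5.3, 9.8])
```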

\section{Source Code Analysis}

We also conducted a structured source code analysis of all ten VPN
implementations. The analysis followed three phases.

\subsection{Repository Collection and LLM-Assisted Overview}

aspects:
\end{itemize}

Each agent was required to reference the specific file and line
range supporting every claim so that outputs could be verified
against the source.

\subsection{Manual Verification}

The LLM-generated overviews served as a navigational aid rather than
a trusted source. The most important code paths identified in each
overview were manually read and verified against the actual source
code. Where the automated summaries were inaccurate or superficial,
they were corrected and expanded.

\subsection{Feature Matrix and Maintainer Review}

The findings from both phases were consolidated into a feature matrix
of 131 features across all ten VPN implementations, covering protocol
characteristics, cryptographic primitives, NAT traversal strategies,
routing behavior, and security properties.

The completed feature matrix was published and sent to the respective
VPN maintainers for review. We incorporated their feedback as
corrections and clarifications to the final classification.

\section{Reproducibility}

The experimental stack pins or declares the variables that could
affect results.

\subsection{Dependency Pinning}

cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
Key pinned inputs include:

\begin{itemize}
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, so a single
version is used across the dependency graph
\bitem{clan-core:} The Clan framework, pinned to a specific commit
\bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
exact commits
VPNs were selected based on:
\bitem{Linux support:} All VPNs must run on Linux.
\end{itemize}

Table~\ref{tab:vpn_selection} lists the ten VPN implementations
selected for evaluation.

\begin{table}[H]
\centering

\end{tabular}
\end{table}

WireGuard is not a mesh VPN but is included as a reference point.
Comparing its overhead to the mesh VPNs isolates the cost of mesh
coordination and NAT traversal.