% Chapter Template
\chapter{Methodology} % Main chapter title

\label{Methodology}

This chapter describes the methodology used to benchmark and analyze
peer-to-peer mesh VPN implementations. The evaluation combines
performance benchmarking under controlled network conditions with a
structured source code analysis of each implementation. The
benchmarking framework prioritizes reproducibility at every layer,
from pinned dependencies and declarative system configuration to
automated test orchestration, enabling independent verification of
results and facilitating future comparative studies.

\section{Experimental Setup}

\subsection{Hardware Configuration}

All experiments were conducted on three bare-metal servers with
identical specifications:

\begin{itemize}
\bitem{CPU:} Intel Model 94, 4 cores / 8 threads
\bitem{Memory:} 64 GB RAM
\bitem{Network:} 1 Gbps Ethernet (e1000e driver; one machine
uses r8169)
\bitem{Cryptographic acceleration:} AES-NI, AVX, AVX2, PCLMULQDQ,
RDRAND, SSE4.2
\end{itemize}

The presence of hardware cryptographic acceleration is relevant because
many VPN implementations leverage AES-NI for encryption, and the results
may differ on systems without these features.

\subsection{Network Topology}

The three machines are connected via a direct 1 Gbps LAN on the same
network segment. Each machine has a publicly reachable IPv4 address,
which is used to deploy configuration changes via Clan. This baseline
topology provides a controlled environment with minimal latency and no
packet loss, allowing the overhead introduced by each VPN implementation
to be measured in isolation. Figure~\ref{fig:mesh_topology} illustrates
the full-mesh connectivity between the three machines.

\begin{figure}[H]
  \centering
  \begin{tikzpicture}[
    node/.style={
      draw, rounded corners, minimum width=2.2cm, minimum height=1cm,
      font=\ttfamily\bfseries, align=center
    },
    link/.style={thick, <->}
  ]
    % Nodes in an equilateral triangle
    \node[node] (luna) at (0, 3.5) {luna};
    \node[node] (yuki) at (-3, 0) {yuki};
    \node[node] (lom) at (3, 0) {lom};

    % Mesh links
    \draw[link] (luna) -- node[left, font=\small] {1 Gbps} (yuki);
    \draw[link] (luna) -- node[right, font=\small] {1 Gbps} (lom);
    \draw[link] (yuki) -- node[below, font=\small] {1 Gbps} (lom);
  \end{tikzpicture}
  \caption{Full-mesh network topology of the three benchmark machines}
  \label{fig:mesh_topology}
\end{figure}

To simulate real-world network conditions, Linux traffic control
(\texttt{tc netem}) is used to inject latency, jitter, packet loss,
and reordering. These impairments are applied symmetrically on all
machines, meaning the effective round-trip impairment is approximately
double the per-machine values.

\section{VPNs Under Test}

Ten VPN implementations were selected for evaluation, spanning a range
of architectures from centralized coordination to fully decentralized
mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.

\begin{table}[H]
  \centering
  \caption{VPN implementations included in the benchmark}
  \label{tab:vpn_selection}
  \begin{tabular}{lll}
    \hline
    \textbf{VPN} & \textbf{Architecture} & \textbf{Notes} \\
    \hline
    Tailscale (Headscale) & Coordinated mesh & Open-source
    coordination server \\
    ZeroTier & Coordinated mesh & Global virtual Ethernet \\
    Nebula & Coordinated mesh & Slack's overlay network \\
    Tinc & Fully decentralized & Established since 1998 \\
    Yggdrasil & Fully decentralized & Spanning-tree routing \\
    Mycelium & Fully decentralized & End-to-end encrypted IPv6 overlay \\
    Hyprspace & Fully decentralized & libp2p-based, IPFS-compatible \\
    EasyTier & Fully decentralized & Rust-based, multi-protocol \\
    VpnCloud & Fully decentralized & Lightweight, kernel bypass option \\
    WireGuard & Point-to-point & Reference baseline (not a mesh VPN) \\
    \hline
    Internal (no VPN) & N/A & Baseline for raw network performance \\
    \hline
  \end{tabular}
\end{table}

WireGuard is included as a reference point despite not being a mesh VPN.
Its minimal overhead and widespread adoption make it a useful comparison
for understanding the cost of mesh coordination and NAT traversal logic.

\subsection{Selection Criteria}

VPNs were selected based on:
\begin{itemize}
\bitem{NAT traversal capability:} All selected VPNs can establish
connections between peers behind NAT without manual port forwarding.
\bitem{Decentralization:} Preference for solutions without mandatory
central servers, though coordinated-mesh VPNs were included for comparison.
\bitem{Active development:} Only VPNs with recent commits and
maintained releases were considered.
\bitem{Linux support:} All VPNs must run on Linux.
\end{itemize}

\subsection{Configuration Methodology}

Each VPN is built from source within the Nix flake, ensuring that all
dependencies are pinned to exact versions. VPNs not packaged in nixpkgs
(Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
under \texttt{pkgs/} in the flake.

Cryptographic material (WireGuard keys, Nebula certificates, ZeroTier
identities) is generated deterministically via Clan's vars generator
system. For example, WireGuard keys are generated as:

\begin{verbatim}
wg genkey > "$out/private-key"
wg pubkey < "$out/private-key" > "$out/public-key"
\end{verbatim}

Generated keys are stored in version control under
\texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
making key material part of the reproducible configuration.

\section{Benchmark Suite}

The benchmark suite includes both synthetic throughput tests and
real-world workloads. This combination addresses a limitation of prior
work that relied exclusively on iperf3.

\subsection{Ping}

Measures ICMP round-trip latency and packet delivery reliability.

\begin{itemize}
\bitem{Method:} 100 ICMP echo requests at 200 ms intervals,
1-second per-packet timeout, repeated for 3 runs.
\bitem{Metrics:} RTT (min, avg, max, mdev), packet loss percentage,
per-packet RTTs.
\end{itemize}
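
With the standard Linux \texttt{ping} utility, a single run of this
method corresponds to an invocation of the following shape (the peer
address is illustrative, not the testbed's actual overlay address):

\begin{verbatim}
ping -c 100 -i 0.2 -W 1 10.100.0.2
\end{verbatim}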

\subsection{TCP iPerf3}

Measures bulk TCP throughput with iperf3, a tool commonly used in
networking research to measure performance.

\begin{itemize}
\bitem{Method:} 30-second bidirectional test in zero-copy mode
(\texttt{-Z}) to minimize CPU overhead.
\bitem{Metrics:} Throughput (bits/s), retransmits, congestion
window, and CPU utilization.
\end{itemize}
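
Using iperf3's standard flags, this method can be sketched as the
following invocation (server address illustrative):

\begin{verbatim}
iperf3 -c 10.100.0.2 -t 30 --bidir -Z
\end{verbatim}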

\subsection{UDP iPerf3}

Measures bulk UDP throughput with the same flags as the TCP iPerf3
benchmark.

\begin{itemize}
\bitem{Method:} Same as the TCP test, plus unlimited target bandwidth
(\texttt{-b 0}) and the 64-bit UDP counters flag.
\bitem{Metrics:} Throughput (bits/s), jitter, packet loss, and CPU
utilization.
\end{itemize}
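
With iperf3's standard flags, the UDP variant can be sketched as
(server address illustrative):

\begin{verbatim}
iperf3 -c 10.100.0.2 -u -t 30 --bidir -Z -b 0 --udp-counters-64bit
\end{verbatim}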

\subsection{Parallel iPerf3}

Tests concurrent overlay network traffic by running TCP streams on all
machines simultaneously in a circular pattern (A$\rightarrow$B,
B$\rightarrow$C, C$\rightarrow$A) for 60 seconds. This simulates
contention across the overlay network.

\begin{itemize}
\bitem{Method:} 60-second bidirectional test in zero-copy mode
(\texttt{-Z}) to minimize CPU overhead.
\bitem{Metrics:} Throughput (bits/s), retransmits, congestion
window, and CPU utilization.
\end{itemize}
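
The circular pattern generalizes to any machine list: each machine
streams to its successor, wrapping around at the end. A minimal sketch
in Python (machine names as in the testbed):

\begin{verbatim}
# Ring of streams: each machine sends to its successor, the last
# machine wraps around to the first.
machines = ["luna", "yuki", "lom"]
pairs = list(zip(machines, machines[1:] + machines[:1]))
# pairs == [("luna", "yuki"), ("yuki", "lom"), ("lom", "luna")]
\end{verbatim}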

\subsection{QPerf}

Measures connection-level QUIC performance rather
than bulk UDP or TCP throughput.

\begin{itemize}
\bitem{Method:} One qperf process per CPU core in parallel, each
running for 30 seconds. Bandwidth from all cores is summed per second.
\bitem{Metrics:} Total bandwidth (Mbps), CPU usage, time to first
byte (TTFB), connection establishment time.
\end{itemize}

\subsection{RIST Video Streaming}

Measures real-time multimedia streaming performance.

\begin{itemize}
\bitem{Method:} The sender generates a 4K ($3840\times2160$) test
pattern at 30 fps using ffmpeg with H.264 encoding (ultrafast preset,
zerolatency tuning) at a 25 Mbps target bitrate. The stream is transmitted
over the RIST protocol to a receiver on the target machine for 30 seconds.
\bitem{Encoding metrics:} Actual bitrate, frame rate, dropped frames.
\bitem{Network metrics:} Packets dropped, packets recovered via
RIST retransmission, RTT, quality score (0--100), received bitrate.
\end{itemize}
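
Assuming an ffmpeg build with librist support, the sender side of this
method can be sketched as follows (receiver address and port are
illustrative, and the exact command used may differ):

\begin{verbatim}
ffmpeg -f lavfi -i testsrc2=size=3840x2160:rate=30 \
  -c:v libx264 -preset ultrafast -tune zerolatency \
  -b:v 25M -f mpegts rist://192.0.2.10:8193
\end{verbatim}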

RIST (Reliable Internet Stream Transport) is a protocol designed for
low-latency video contribution over unreliable networks, making it a
realistic test of VPN behavior under multimedia workloads.

\subsection{Nix Cache Download}

Measures sustained HTTP download performance of many small files
using a real-world workload.

\begin{itemize}
\bitem{Method:} A Harmonia Nix binary cache server on the target
machine serves the Firefox package. The client downloads it via
\texttt{nix copy} through the VPN. Benchmarked with hyperfine:
1 warmup run followed by 2 timed runs. The local cache and Nix's
SQLite metadata are cleared between runs.
\bitem{Metrics:} Mean duration (seconds), standard deviation,
min/max duration.
\end{itemize}
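
Using hyperfine's standard options, the measurement takes roughly the
following shape (cache URL and store path are illustrative
placeholders; the cache-clearing step is abbreviated):

\begin{verbatim}
hyperfine --warmup 1 --runs 2 \
  --prepare '<clear local cache and SQLite metadata>' \
  'nix copy --from http://target:5000 <firefox store path>'
\end{verbatim}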

\section{Network Impairment Profiles}

Four impairment profiles simulate a range of network conditions, from
ideal to severely degraded. Impairments are applied via Linux traffic
control (\texttt{tc netem}) on every machine's primary interface.
Table~\ref{tab:impairment_profiles} shows the per-machine values;
effective round-trip impairment is approximately doubled.

\begin{table}[H]
  \centering
  \caption{Network impairment profiles (per-machine egress values)}
  \label{tab:impairment_profiles}
  \begin{tabular}{lccccc}
    \hline
    \textbf{Profile} & \textbf{Latency} & \textbf{Jitter} &
    \textbf{Loss} & \textbf{Reorder} & \textbf{Correlation} \\
    \hline
    Baseline & - & - & - & - & - \\
    Low & 2 ms & 2 ms & 0.25\% & 0.5\% & 25\% \\
    Medium & 4 ms & 7 ms & 1.0\% & 2.5\% & 50\% \\
    High & 6 ms & 15 ms & 2.5\% & 5\% & 50\% \\
    \hline
  \end{tabular}
\end{table}
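
For the Medium profile, the corresponding \texttt{tc netem} invocation
takes roughly the following shape (the interface name is illustrative,
and the exact qdisc setup may differ):

\begin{verbatim}
tc qdisc add dev eth0 root netem \
  delay 4ms 7ms 50% \
  loss 1% 50% \
  reorder 2.5% 50%
\end{verbatim}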

The correlation column controls how strongly each packet's impairment
depends on the preceding packet. At 0\% correlation, loss and
reordering events are independent; at higher values they occur in
bursts, because a packet that was lost or reordered increases the
probability that the next packet suffers the same fate. This produces
realistic bursty degradation rather than uniformly distributed drops.

The ``Low'' profile approximates a well-provisioned continental
connection, ``Medium'' represents intercontinental links or congested
networks, and ``High'' simulates severely degraded conditions such as
satellite links or highly congested mobile networks.

A 30-second stabilization period follows TC application before
measurements begin, allowing queuing disciplines to settle.

\section{Experimental Procedure}

\subsection{Automation}

The benchmark suite is fully automated via a Python orchestrator
(\texttt{vpn\_bench/}). For each VPN under test, the orchestrator:

\begin{enumerate}
\item Cleans all state directories from previous VPN runs
\item Deploys the VPN configuration to all machines via Clan
\item Restarts the VPN service on every machine (with retry:
up to 3 attempts, 2-second backoff)
\item Verifies VPN connectivity via a connection-check service
(120-second timeout)
\item For each impairment profile:
\begin{enumerate}
\item Applies TC rules via context manager (guarantees cleanup)
\item Waits 30 seconds for stabilization
\item Executes each benchmark three times sequentially,
once per machine pair: $A\to B$, then
$B\to C$, lastly $C\to A$
\item Clears TC rules
\end{enumerate}
\item Collects results and metadata
\end{enumerate}
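
The cleanup guarantee in step 5a is the standard Python
context-manager pattern: the teardown runs even when a benchmark
raises. A minimal sketch (the logging list stands in for the
orchestrator's actual TC helpers, which are not shown here):

\begin{verbatim}
from contextlib import contextmanager

@contextmanager
def tc_rules(profile, log):
    # Stand-in for applying the tc netem rules of `profile`.
    log.append(f"apply {profile}")
    try:
        yield
    finally:
        # Runs even if a benchmark raises, so TC rules never leak
        # into the next measurement.
        log.append(f"clear {profile}")

log = []
try:
    with tc_rules("medium", log):
        raise RuntimeError("benchmark failed")
except RuntimeError:
    pass
# log == ["apply medium", "clear medium"]
\end{verbatim}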

Figure~\ref{fig:orchestrator_flow} illustrates this procedure as a
flowchart.

\begin{figure}[H]
  \centering
  \begin{tikzpicture}[
    box/.style={
      draw, rounded corners, minimum width=4.8cm, minimum height=0.9cm,
      font=\small, align=center, fill=white
    },
    decision/.style={
      draw, diamond, aspect=2.5, minimum width=3cm,
      font=\small, align=center, fill=white, inner sep=1pt
    },
    arr/.style={->, thick},
    every node/.style={font=\small}
  ]
    % Main flow
    \node[box] (clean) at (0, 0) {Clean state directories};
    \node[box] (deploy) at (0, -1.5) {Deploy VPN via Clan};
    \node[box] (restart) at (0, -3) {Restart VPN services\\(up to 3 attempts)};
    \node[box] (verify) at (0, -4.5) {Verify connectivity\\(120\,s timeout)};

    % Inner loop
    \node[decision] (profile) at (0, -6.3) {Next impairment\\profile?};
    \node[box] (tc) at (0, -8.3) {Apply TC rules};
    \node[box] (wait) at (0, -9.8) {Wait 30\,s};
    \node[box] (bench) at (0, -11.3) {Run benchmarks\\$A{\to}B,\;
    B{\to}C,\; C{\to}A$};
    \node[box] (clear) at (0, -12.8) {Clear TC rules};

    % After loop
    \node[box] (collect) at (0, -14.8) {Collect results};

    % Arrows -- main spine
    \draw[arr] (clean) -- (deploy);
    \draw[arr] (deploy) -- (restart);
    \draw[arr] (restart) -- (verify);
    \draw[arr] (verify) -- (profile);
    \draw[arr] (profile) -- node[right] {yes} (tc);
    \draw[arr] (tc) -- (wait);
    \draw[arr] (wait) -- (bench);
    \draw[arr] (bench) -- (clear);

    % Loop back
    \draw[arr] (clear) -- ++(3.8, 0) |- (profile);

    % Exit loop
    \draw[arr] (profile) -- ++(-3.2, 0) node[above, pos=0.3] {no}
    |- (collect);
  \end{tikzpicture}
  \caption{Flowchart of the benchmark orchestrator procedure for a
  single VPN}
  \label{fig:orchestrator_flow}
\end{figure}

\subsection{Retry Logic}

Tests use a retry wrapper with up to 2 retries (3 total attempts),
5-second initial delay, and 700-second maximum total time. The number
of attempts is recorded in test metadata so that retried results can
be identified during analysis.
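
A minimal sketch of such a wrapper with the parameters above (the
function name and the flat delay between attempts are illustrative;
the orchestrator's actual implementation may differ):

\begin{verbatim}
import time

def run_with_retry(fn, retries=2, initial_delay=5.0, max_total=700.0):
    """Call fn, retrying on failure; return result plus attempt count."""
    start = time.monotonic()
    for attempt in range(1, retries + 2):  # first try + `retries` retries
        try:
            # The attempt count is recorded in metadata so retried
            # results can be identified during analysis.
            return {"result": fn(), "attempts": attempt}
        except Exception:
            over_budget = time.monotonic() - start + initial_delay > max_total
            if attempt > retries or over_budget:
                raise
            time.sleep(initial_delay)
\end{verbatim}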

\subsection{Statistical Analysis}

Each metric is summarized as a statistics dictionary containing:

\begin{itemize}
\bitem{min / max:} Extreme values observed
\bitem{average:} Arithmetic mean across samples
\bitem{p25 / p50 / p75:} Quartiles via Python's
\texttt{statistics.quantiles()} function
\end{itemize}
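
A sketch of such a summary using only the standard library (the
function name is illustrative, not the analysis code's actual API):

\begin{verbatim}
import statistics

def summarize(samples):
    # quantiles(..., n=4) returns the three quartile cut points.
    p25, p50, p75 = statistics.quantiles(samples, n=4)
    return {
        "min": min(samples),
        "max": max(samples),
        "average": statistics.mean(samples),
        "p25": p25, "p50": p50, "p75": p75,
    }
\end{verbatim}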

Aggregation differs by benchmark type. Benchmarks that execute
multiple discrete runs, namely ping (3 runs of 100 packets each) and
nix-cache (2 timed runs via hyperfine), first compute statistics
within each run, then average the resulting statistics across runs.
Concretely, if ping produces three runs with mean RTTs of
5.1, 5.3, and 5.0\,ms, the reported average is the mean of
those three values (5.13\,ms). The reported minimum is the
single lowest RTT observed across all three runs.
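
The two aggregation rules from this worked example, as a short sketch
(the per-run minima are hypothetical values added for illustration):

\begin{verbatim}
import statistics

# Per-run mean RTTs in ms, as in the example above.
run_means = [5.1, 5.3, 5.0]
# Hypothetical per-run minimum RTTs in ms.
run_mins = [4.8, 4.9, 4.7]

reported_average = statistics.mean(run_means)  # mean of the run means
reported_min = min(run_mins)                   # lowest RTT across all runs
\end{verbatim}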

Benchmarks that produce continuous per-second samples, for example
qperf and RIST streaming, pool all per-second measurements from a
single execution into one series before computing statistics. For
qperf, bandwidth is first summed across CPU cores for each second, and
statistics are then computed over the resulting time series.
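
The per-second summation for qperf can be sketched as follows (the
bandwidth figures are hypothetical):

\begin{verbatim}
# per_core[i][t]: Mbps reported by the qperf process on core i
# during second t (hypothetical values).
per_core = [
    [100.0, 110.0, 105.0],
    [ 95.0,  90.0, 100.0],
]
seconds = len(per_core[0])
# One pooled series: total bandwidth across all cores, per second.
total_per_second = [sum(core[t] for core in per_core)
                    for t in range(seconds)]
# total_per_second == [195.0, 200.0, 205.0]
\end{verbatim}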

The analysis reports empirical percentiles (p25, p50, p75) alongside
min/max bounds rather than parametric confidence intervals. This
choice is deliberate: benchmark latency and throughput distributions
are often skewed or multimodal, making assumptions of normality
unreliable. The interquartile range (p25--p75) conveys the spread of
typical observations, while min and max capture outlier behavior.
The nix-cache benchmark additionally reports standard deviation via
hyperfine's built-in statistical output.

\section{Source Code Analysis}

To complement the performance benchmarks with architectural
understanding, a structured source code analysis was conducted for
all ten VPN implementations. The analysis followed three phases.

\subsection{Repository Collection and LLM-Assisted Overview}

The latest main branch of each VPN's git repository was cloned,
together with key dependencies that implement core functionality
outside the main repository. For example, Yggdrasil delegates its
routing and cryptographic operations to the Ironwood library, which
was analyzed alongside the main codebase.

Ten LLM agents (Claude Code) were then spawned in parallel, one per
VPN. Each agent was instructed to read the full source tree and
produce an \texttt{overview.md} file documenting the following
aspects:

\begin{itemize}
\item Wire protocol and message framing
\item Encryption scheme and key exchange
\item Packet handling and performance
\item NAT traversal mechanism
\item Local routing and peer discovery
\item Security features and access control
\item Resilience / central point of failure
\end{itemize}

Every claim in the generated overview was required to reference the
specific file and line range in the repository that supports it,
enabling direct verification.

\subsection{Manual Verification}

The LLM-generated overviews served as a navigational aid rather than
a trusted source. The most important code paths identified in each
overview were manually read and verified against the actual source
code, correcting inaccuracies and deepening the analysis where the
automated summaries remained superficial.

\subsection{Feature Matrix and Maintainer Review}

The findings from both the automated and manual analysis were
consolidated into a comprehensive feature matrix cataloguing 131
features across all ten VPN implementations. The matrix covers
protocol characteristics, cryptographic primitives, NAT traversal
strategies, routing behavior, and security properties.

The completed feature matrix was published and sent to the respective
VPN maintainers for review. Maintainer feedback was incorporated as
corrections and clarifications, improving the accuracy of the final
classification.

\section{Reproducibility}

Reproducibility is ensured at every layer of the experimental stack.

\subsection{Dependency Pinning}

Every external dependency is pinned via \texttt{flake.lock}, which records
cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
Key pinned inputs include:

\begin{itemize}
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, ensuring a
single version across the dependency graph
\bitem{clan-core:} The Clan framework, pinned to a specific commit
\bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
exact commits
\bitem{Build infrastructure:} flake-parts, treefmt-nix, disko,
nixos-facter-modules
\end{itemize}

Custom packages not in nixpkgs (qperf, VpnCloud, iperf with auth patches,
EasyTier, Hyprspace) are built from source within the flake.

\subsection{Declarative System Configuration}

Each benchmark machine runs NixOS, where the entire operating system is
defined declaratively. There is no imperative package installation or
configuration drift. Given the same NixOS configuration, two machines
will have identical software, services, and kernel parameters.

Machine deployment is atomic: the system either switches to the new
configuration entirely or rolls back.

\subsection{Inventory-Driven Topology}

Clan's inventory system maps machines to service roles declaratively.
For each VPN, the orchestrator writes an inventory entry assigning
machines to roles (e.g., Nebula lighthouse vs.\ peer). The Clan module
system translates this into NixOS configuration: systemd services,
firewall rules, peer lists, and key references. The same inventory
entry always produces the same NixOS configuration.

\subsection{State Isolation}

Before installing a new VPN, the orchestrator deletes all state
directories from previous runs, including VPN-specific directories
(\texttt{/var/lib/zerotier-one}, \texttt{/var/lib/nebula}, etc.) and
benchmark directories. This prevents cross-contamination between tests.

\subsection{Data Provenance}

Every test result includes metadata recording:

\begin{itemize}
\item Wall-clock duration
\item Number of attempts (1 = first try succeeded)
\item VPN restart attempts and duration
\item Connectivity wait duration
\item Source and target machine names
\item Service logs (on failure)
\end{itemize}

Results are organized hierarchically by VPN, TC profile, and machine
pair. Each profile directory contains a \texttt{tc\_settings.json}
snapshot of the exact impairment parameters applied.