% Chapter Template

\chapter{Methodology} % Main chapter title

\label{Methodology}

This chapter describes the methodology used to benchmark peer-to-peer
mesh VPN implementations. The experimental design prioritizes
reproducibility at every layer, from dependency management to network
conditions, enabling independent verification of results and
facilitating future comparative studies.

\section{Experimental Setup}

\subsection{Hardware Configuration}

All experiments were conducted on three bare-metal servers with
identical specifications:

\begin{itemize}
\item \textbf{CPU:} Intel Model 94, 4 cores / 8 threads
\item \textbf{Memory:} 64 GB RAM
\item \textbf{Network:} 1 Gbps Ethernet (e1000e driver; one machine
uses r8169)
\item \textbf{Cryptographic acceleration:} AES-NI, AVX, AVX2, PCLMULQDQ,
RDRAND, SSE4.2
\end{itemize}

The presence of hardware cryptographic acceleration is relevant because
many VPN implementations leverage AES-NI for encryption, and the results
may differ on systems without these features.

\subsection{Network Topology}

The three machines are connected via a direct 1 Gbps LAN on the same
network segment. This baseline topology provides a controlled environment
with minimal latency and no packet loss, allowing the overhead introduced
by each VPN implementation to be measured in isolation.

To simulate real-world network conditions, Linux traffic control
(\texttt{tc netem}) is used to inject latency, jitter, packet loss,
and reordering. These impairments are applied symmetrically on all
machines, meaning the effective round-trip impairment is approximately
double the per-machine values.

\section{VPNs Under Test}

Ten VPN implementations were selected for evaluation, spanning a range
of architectures from centralized coordination to fully decentralized
mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.

\begin{table}[H]
\centering
\caption{VPN implementations included in the benchmark}
\label{tab:vpn_selection}
\begin{tabular}{lll}
\hline
\textbf{VPN} & \textbf{Architecture} & \textbf{Notes} \\
\hline
Tailscale (Headscale) & Coordinated mesh & Open-source
coordination server \\
ZeroTier & Coordinated mesh & Global virtual Ethernet \\
Nebula & Coordinated mesh & Slack's overlay network \\
Tinc & Fully decentralized & Established since 1998 \\
Yggdrasil & Fully decentralized & Spanning-tree routing \\
Mycelium & Fully decentralized & End-to-end encrypted IPv6 overlay \\
Hyprspace & Fully decentralized & libp2p-based, IPFS-compatible \\
EasyTier & Fully decentralized & Rust-based, multi-protocol \\
VpnCloud & Fully decentralized & Lightweight, kernel bypass option \\
WireGuard & Point-to-point & Reference baseline (not a mesh VPN) \\
\hline
Internal (no VPN) & N/A & Baseline for raw network performance \\
\hline
\end{tabular}
\end{table}

WireGuard is included as a reference point despite not being a mesh VPN.
Its minimal overhead and widespread adoption make it a useful comparison
for understanding the cost of mesh coordination and NAT traversal logic.

\subsection{Selection Criteria}

VPNs were selected based on:
\begin{itemize}
\item \textbf{NAT traversal capability:} All selected VPNs can establish
connections between peers behind NAT without manual port forwarding.
\item \textbf{Decentralization:} Preference for solutions without mandatory
central servers, though coordinated-mesh VPNs were included for comparison.
\item \textbf{Active development:} Only VPNs with recent commits and
maintained releases were considered.
\item \textbf{Linux support:} All VPNs must run on Linux.
\end{itemize}

\subsection{Configuration Methodology}

Each VPN is built from source within the Nix flake, ensuring that all
dependencies are pinned to exact versions. VPNs not packaged in nixpkgs
(Hyprspace, EasyTier, VpnCloud, qperf) have dedicated build expressions
under \texttt{pkgs/} in the flake.

Cryptographic material (WireGuard keys, Nebula certificates, ZeroTier
identities) is generated deterministically via Clan's vars generator
system. For example, WireGuard keys are generated as:

\begin{verbatim}
wg genkey > "$out/private-key"
wg pubkey < "$out/private-key" > "$out/public-key"
\end{verbatim}

Generated keys are stored in version control under
\texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
making key material part of the reproducible configuration.

\section{Benchmark Suite}

The benchmark suite includes both synthetic throughput tests and
real-world workloads. This combination addresses a limitation of prior
work that relied exclusively on iperf3.

\subsection{Ping}

Measures round-trip latency and packet delivery reliability.

\begin{itemize}
\item \textbf{Method:} 100 ICMP echo requests at 200 ms intervals,
1-second per-packet timeout, repeated for 3 runs.
\item \textbf{Metrics:} RTT (min, avg, max, mdev), packet loss percentage,
per-packet RTTs.
\end{itemize}
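As an illustration of how these metrics are extracted, the summary lines printed by iputils \texttt{ping} can be parsed with a small helper. The function name and field handling below are illustrative sketches, not the orchestrator's actual code.

```python
import re

def parse_ping_summary(output: str) -> dict:
    """Extract packet loss and RTT statistics from `ping` output.
    Illustrative parser; matches the summary lines printed by iputils ping."""
    loss = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    rtt = re.search(
        r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)", output)
    result = {"loss_pct": float(loss.group(1)) if loss else None}
    if rtt:
        result.update(zip(("rtt_min", "rtt_avg", "rtt_max", "rtt_mdev"),
                          map(float, rtt.groups())))
    return result

sample = (
    "100 packets transmitted, 100 received, 0% packet loss, time 19988ms\n"
    "rtt min/avg/max/mdev = 0.312/0.401/1.204/0.087 ms\n"
)
print(parse_ping_summary(sample))
```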

\subsection{iPerf3}

Measures bulk data transfer throughput.

\textbf{TCP variant:} 30-second bidirectional test with RSA authentication
and zero-copy mode (\texttt{-Z}) to minimize CPU overhead.

\textbf{UDP variant:} Same configuration with unlimited target bandwidth
(\texttt{-b 0}) and 64-bit counters.

\textbf{Parallel TCP variant:} Tests concurrent mesh traffic by running
TCP streams on all machines simultaneously in a circular pattern
(A$\rightarrow$B, B$\rightarrow$C, C$\rightarrow$A) for 60 seconds.
This simulates contention across the mesh.

\begin{itemize}
\item \textbf{Metrics:} Throughput (bits/s), retransmits, congestion window,
jitter (UDP), packet loss (UDP).
\end{itemize}
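The circular stream assignment for the parallel TCP variant can be sketched as a small helper that turns the machine list into a ring of sender/receiver pairs; the helper name is illustrative.

```python
def circular_pairs(machines):
    """Return (sender, receiver) pairs forming a ring: A->B, B->C, C->A.
    Illustrative helper; the real orchestrator's naming may differ."""
    return [(machines[i], machines[(i + 1) % len(machines)])
            for i in range(len(machines))]

print(circular_pairs(["A", "B", "C"]))  # [('A', 'B'), ('B', 'C'), ('C', 'A')]
```

With three machines every host is simultaneously a sender and a receiver, which is what makes the 60-second run a contention test rather than a point-to-point one.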

\subsection{qPerf}

Measures connection-level performance rather than bulk throughput.

\begin{itemize}
\item \textbf{Method:} One qperf instance per CPU core in parallel, each
running for 30 seconds. Bandwidth from all cores is summed per second.
\item \textbf{Metrics:} Total bandwidth (Mbps), CPU usage, time to first
byte (TTFB), connection establishment time.
\end{itemize}
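The per-second summation across cores can be sketched as follows; the data layout (one \texttt{\{second: Mbps\}} mapping per core) is an assumption for illustration, separate from the actual qperf output parsing.

```python
from collections import defaultdict

def sum_bandwidth_per_second(per_core_samples):
    """Sum per-second bandwidth (Mbps) across all cores.
    per_core_samples: one {second: mbps} dict per qperf instance (core)."""
    totals = defaultdict(float)
    for core in per_core_samples:
        for second, mbps in core.items():
            totals[second] += mbps
    return dict(sorted(totals.items()))

cores = [{0: 120.0, 1: 118.5}, {0: 119.0, 1: 121.0}]
print(sum_bandwidth_per_second(cores))  # {0: 239.0, 1: 239.5}
```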

\subsection{RIST Video Streaming}

Measures real-time multimedia streaming performance.

\begin{itemize}
\item \textbf{Method:} The sender generates a 4K (3840$\times$2160) test
pattern at 30 fps using ffmpeg with H.264 encoding (ultrafast preset,
zerolatency tuning) at a 25 Mbps target bitrate. The stream is transmitted
over the RIST protocol to a receiver on the target machine for 30 seconds.
\item \textbf{Encoding metrics:} Actual bitrate, frame rate, dropped frames.
\item \textbf{Network metrics:} Packets dropped, packets recovered via
RIST retransmission, RTT, quality score (0--100), received bitrate.
\end{itemize}

RIST (Reliable Internet Stream Transport) is a protocol designed for
low-latency video contribution over unreliable networks, making it a
realistic test of VPN behavior under multimedia workloads.
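A sender invocation matching the settings above could be assembled as below. The individual ffmpeg options (\texttt{testsrc}, \texttt{-preset}, \texttt{-tune}, \texttt{-b:v}) are standard, but the exact URL scheme, port, and container choice are assumptions, not the benchmark's verbatim command line.

```python
def ffmpeg_rist_sender_args(target_ip: str, port: int = 8193) -> list:
    """Assemble an ffmpeg command matching the described sender settings:
    4K test pattern at 30 fps, H.264 ultrafast/zerolatency, 25 Mbps, RIST out.
    Port and mpegts muxing are illustrative assumptions."""
    return [
        "ffmpeg",
        "-re",                                   # pace input at native frame rate
        "-f", "lavfi", "-i", "testsrc=size=3840x2160:rate=30",
        "-c:v", "libx264",
        "-preset", "ultrafast", "-tune", "zerolatency",
        "-b:v", "25M",
        "-f", "mpegts",
        f"rist://{target_ip}:{port}",
    ]

print(" ".join(ffmpeg_rist_sender_args("10.0.0.2")))
```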

\subsection{Nix Cache Download}

Measures sustained download performance using a real-world workload.

\begin{itemize}
\item \textbf{Method:} A Harmonia Nix binary cache server on the target
machine serves the Firefox package. The client downloads it via
\texttt{nix copy} through the VPN. Benchmarked with hyperfine:
1 warmup run followed by 2 timed runs. The local cache and Nix's
SQLite metadata are cleared between runs.
\item \textbf{Metrics:} Mean duration (seconds), standard deviation,
min/max duration.
\end{itemize}

This benchmark tests realistic HTTP traffic patterns and sustained
sequential download performance, complementing the synthetic throughput
tests.
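The hyperfine invocation described above could be constructed roughly as follows. The \texttt{--warmup}, \texttt{--runs}, and \texttt{--prepare} flags are real hyperfine options; the cache-clearing command and the installable name are placeholders, not the benchmark's actual paths.

```python
def hyperfine_nix_copy_cmd(cache_url: str, installable: str) -> list:
    """Build a hyperfine invocation for the Nix cache benchmark:
    1 warmup + 2 timed runs, clearing local caches before each run.
    The cleanup command is a placeholder for the real cache/SQLite reset."""
    download = f"nix copy --from {cache_url} {installable}"
    cleanup = "rm -rf /tmp/bench-store"   # placeholder cleanup step
    return [
        "hyperfine",
        "--warmup", "1",
        "--runs", "2",
        "--prepare", cleanup,
        download,
    ]

print(hyperfine_nix_copy_cmd("http://target:5000", "nixpkgs#firefox"))
```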

\section{Network Impairment Profiles}

Four impairment profiles simulate a range of network conditions, from
ideal to severely degraded. Impairments are applied via Linux traffic
control (\texttt{tc netem}) on every machine's primary interface.
Table~\ref{tab:impairment_profiles} shows the per-machine values;
effective round-trip impairment is approximately doubled.

\begin{table}[H]
\centering
\caption{Network impairment profiles (per-machine egress values)}
\label{tab:impairment_profiles}
\begin{tabular}{lccccc}
\hline
\textbf{Profile} & \textbf{Latency} & \textbf{Jitter} &
\textbf{Loss} & \textbf{Reorder} & \textbf{Correlation} \\
\hline
Baseline & -- & -- & -- & -- & -- \\
Low & 2 ms & 2 ms & 0.25\% & 0.5\% & 25\% \\
Medium & 4 ms & 7 ms & 1.0\% & 2.5\% & 50\% \\
High & 12 ms & 30 ms & 5.0\% & 10\% & 50\% \\
\hline
\end{tabular}
\end{table}

The ``Low'' profile approximates a well-provisioned continental
connection, ``Medium'' represents intercontinental links or congested
networks, and ``High'' simulates severely degraded conditions such as
satellite links or highly congested mobile networks.

A 30-second stabilization period follows TC application before
measurements begin, allowing queuing disciplines to settle.
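A profile can be rendered into a \texttt{tc} command line along these lines. The rendering is a simplified sketch (a single root netem qdisc, no rate limiting, simplified reorder handling); the profile keys mirror the per-machine table values.

```python
def netem_command(iface: str, profile: dict) -> str:
    """Render a `tc netem` command for one impairment profile.
    Simplified sketch: one root netem qdisc, per-machine egress values."""
    parts = [f"tc qdisc add dev {iface} root netem"]
    if profile.get("latency_ms"):
        parts.append(f"delay {profile['latency_ms']}ms {profile['jitter_ms']}ms "
                     f"{profile['correlation_pct']}%")
    if profile.get("loss_pct"):
        parts.append(f"loss {profile['loss_pct']}%")
    if profile.get("reorder_pct"):
        parts.append(f"reorder {profile['reorder_pct']}%")
    return " ".join(parts)

medium = {"latency_ms": 4, "jitter_ms": 7, "loss_pct": 1.0,
          "reorder_pct": 2.5, "correlation_pct": 50}
print(netem_command("eth0", medium))
```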

\section{Experimental Procedure}

\subsection{Automation}

The benchmark suite is fully automated via a Python orchestrator
(\texttt{vpn\_bench/}). For each VPN under test, the orchestrator:

\begin{enumerate}
\item Cleans all state directories from previous VPN runs
\item Deploys the VPN configuration to all machines via Clan
\item Restarts the VPN service on every machine (with retry:
up to 3 attempts, 2-second backoff)
\item Verifies VPN connectivity via a connection-check service
(120-second timeout)
\item For each impairment profile:
\begin{enumerate}
\item Applies TC rules via context manager (guarantees cleanup)
\item Waits 30 seconds for stabilization
\item Executes all benchmarks
\item Clears TC rules
\end{enumerate}
\item Collects results and metadata
\end{enumerate}
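The per-profile inner loop with guaranteed TC cleanup can be sketched with a context manager; \texttt{apply\_tc}/\texttt{clear\_tc} stand in for the real per-machine commands, and the stabilization wait is elided.

```python
from contextlib import contextmanager

applied = []
def apply_tc(profile): applied.append(profile)   # stand-in for real tc setup
def clear_tc(): applied.clear()                  # stand-in for real tc teardown

@contextmanager
def impairment(profile):
    """Apply TC rules on entry; the finally block guarantees cleanup
    even if a benchmark raises."""
    apply_tc(profile)
    try:
        yield
    finally:
        clear_tc()

def run_profile(profile, benchmarks):
    # (the real loop also waits 30 s for stabilization here)
    with impairment(profile):
        return [bench() for bench in benchmarks]

results = run_profile("medium", [lambda: "ping-ok", lambda: "iperf-ok"])
print(results, applied)  # ['ping-ok', 'iperf-ok'] []
```

The context-manager form mirrors the orchestrator's guarantee that TC rules never outlive a failed benchmark run.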

\subsection{Retry Logic}

Tests use a retry wrapper with up to 2 retries (3 total attempts),
5-second initial delay, and 700-second maximum total time. The number
of attempts is recorded in test metadata so that retried results can
be identified during analysis.
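A minimal sketch of such a wrapper, with delay and total-time handling simplified relative to the real implementation:

```python
import time

def with_retry(test_fn, max_retries=2, initial_delay=5, max_total=700):
    """Run test_fn with up to max_retries retries (3 attempts total).
    Returns (result, attempts) so retried runs are identifiable in metadata.
    Simplified sketch: fixed delay, coarse total-time check."""
    start = time.monotonic()
    for attempt in range(1, max_retries + 2):
        try:
            return test_fn(), attempt
        except Exception:
            if attempt > max_retries or time.monotonic() - start > max_total:
                raise
            time.sleep(initial_delay)

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 2:
        raise RuntimeError("transient failure")
    return "ok"

print(with_retry(flaky, initial_delay=0))  # ('ok', 2)
```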

\subsection{Statistical Analysis}

Each metric is summarized as a statistics dictionary containing:

\begin{itemize}
\item \textbf{min / max:} Extreme values observed
\item \textbf{average:} Arithmetic mean across samples
\item \textbf{p25 / p50 / p75:} Quartiles via \texttt{statistics.quantiles()}
\end{itemize}
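The statistics dictionary above can be computed directly with the standard library; \texttt{statistics.quantiles(..., n=4)} returns the three quartile cut points. The function name is illustrative.

```python
import statistics

def summarize(samples):
    """Summarize a metric as the statistics dictionary described above.
    quantiles(..., n=4) yields the three cut points p25/p50/p75
    (default 'exclusive' method)."""
    p25, p50, p75 = statistics.quantiles(samples, n=4)
    return {
        "min": min(samples), "max": max(samples),
        "average": statistics.fmean(samples),
        "p25": p25, "p50": p50, "p75": p75,
    }

print(summarize([1.0, 2.0, 3.0, 4.0, 5.0]))
```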

Multi-run tests (ping, nix-cache) aggregate across runs. Per-second
tests (qperf, RIST) aggregate across all per-second samples.

The approach uses empirical percentiles rather than parametric
confidence intervals, which is appropriate for benchmark data that
may not follow a normal distribution. The nix-cache test (via hyperfine)
additionally reports standard deviation.

\section{Reproducibility}

Reproducibility is ensured at every layer of the experimental stack.

\subsection{Dependency Pinning}

Every external dependency is pinned via \texttt{flake.lock}, which records
cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
Key pinned inputs include:

\begin{itemize}
\item \textbf{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, ensuring a
single version across the dependency graph
\item \textbf{clan-core:} The Clan framework, pinned to a specific commit
\item \textbf{VPN sources:} Hyprspace, EasyTier, Nebula locked to
exact commits
\item \textbf{Build infrastructure:} flake-parts, treefmt-nix, disko,
nixos-facter-modules
\end{itemize}

Custom packages not in nixpkgs (qperf, VpnCloud, iperf with auth patches,
phantun, EasyTier, Hyprspace) are built from source within the flake.
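Since \texttt{flake.lock} is plain JSON (inputs under \texttt{nodes}, each with a \texttt{locked} section), the pins can be inspected programmatically; the helper below is an illustrative sketch over a toy lock file.

```python
import json

def pinned_inputs(lock_text: str) -> dict:
    """Map each locked input to its (rev, narHash) pin from a flake.lock.
    flake.lock stores inputs under "nodes"; the root node has no "locked"."""
    nodes = json.loads(lock_text)["nodes"]
    return {name: (node["locked"].get("rev"), node["locked"].get("narHash"))
            for name, node in nodes.items() if "locked" in node}

sample_lock = json.dumps({
    "nodes": {
        "root": {"inputs": {"nixpkgs": "nixpkgs"}},
        "nixpkgs": {"locked": {"rev": "abc123", "narHash": "sha256-..."}},
    },
    "version": 7,
})
print(pinned_inputs(sample_lock))  # {'nixpkgs': ('abc123', 'sha256-...')}
```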

\subsection{Declarative System Configuration}

Each benchmark machine runs NixOS, where the entire operating system is
defined declaratively. There is no imperative package installation or
configuration drift. Given the same NixOS configuration, two machines
will have identical software, services, and kernel parameters.

Machine deployment is atomic: the system either switches to the new
configuration entirely or rolls back.

\subsection{Inventory-Driven Topology}

Clan's inventory system maps machines to service roles declaratively.
For each VPN, the orchestrator writes an inventory entry assigning
machines to roles (e.g., Nebula lighthouse vs.\ peer). The Clan module
system translates this into NixOS configuration: systemd services,
firewall rules, peer lists, and key references. The same inventory
entry always produces the same NixOS configuration.

\subsection{State Isolation}

Before installing a new VPN, the orchestrator deletes all state
directories from previous runs, including VPN-specific directories
(\texttt{/var/lib/zerotier-one}, \texttt{/var/lib/nebula}, etc.) and
benchmark directories. This prevents cross-contamination between tests.

\subsection{Data Provenance}

Every test result includes metadata recording:

\begin{itemize}
\item Wall-clock duration
\item Number of attempts (1 = first try succeeded)
\item VPN restart attempts and duration
\item Connectivity wait duration
\item Source and target machine names
\item Service logs (on failure)
\end{itemize}

Results are organized hierarchically by VPN, TC profile, and machine
pair. Each profile directory contains a \texttt{tc\_settings.json}
snapshot of the exact impairment parameters applied.

\section{Related Work}

\subsection{Nix: A Safe and Policy-Free System for Software Deployment}

Nix addresses significant issues in software deployment by utilizing
cryptographic hashes to ensure unique paths for component instances
\cite{dolstra_nix_2004}. Features such as concurrent installation of
multiple versions, atomic upgrades, and safe garbage collection make
Nix a flexible deployment system. This work uses Nix to ensure that
all VPN builds and system configurations are deterministic.

\subsection{NixOS: A Purely Functional Linux Distribution}

NixOS extends Nix principles to Linux system configuration
\cite{dolstra_nixos_2008}. System configurations are reproducible and
isolated from the stateful interactions typical of imperative package
management. This property is essential for ensuring identical test
environments across benchmark runs.

\subsection{A Comparative Study on Virtual Private Networks}

Lackorzynski et al.\ \cite{lackorzynski_comparative_2019} evaluate
VPN protocols in the context of industrial communication systems
(Industry 4.0), benchmarking OpenVPN, IPsec, Tinc, Freelan, MACsec,
and WireGuard. Their analysis focuses on point-to-point protocol
performance (throughput, latency, and CPU overhead) rather than
overlay network behavior. In contrast, this thesis evaluates VPNs
that provide a full data plane with peer-to-peer connectivity, NAT
traversal, and dynamic peer discovery.

\subsection{Full-Mesh VPN Performance Evaluation}

Kjorveziroski et al.\ \cite{kjorveziroski_full-mesh_2024} provide a
comprehensive evaluation of full-mesh VPN solutions for distributed
systems. Their benchmarks analyze throughput, reliability under packet
loss, and relay behavior for VPNs including ZeroTier.

This thesis extends their work in several ways:
\begin{itemize}
\item Broader VPN selection with emphasis on fully decentralized
architectures
\item Real-world workloads (video streaming, package downloads)
beyond synthetic iperf3 tests
\item Multiple impairment profiles to characterize behavior under
varying network conditions
\item Fully reproducible experimental framework via Nix/NixOS/Clan
\end{itemize}

\subsection{UDP NAT and Firewall Puncturing in the Wild}

Halkes and Pouwelse~\cite{halkes_udp_2011} measure UDP hole punching
efficacy on a live P2P network using the Tribler BitTorrent client.
Their study finds that 79\% of peers are unreachable due to NAT or
firewall restrictions, yet 64\% reside behind configurations amenable
to hole punching. Among compatible peers, over 80\% of puncturing
attempts succeed, establishing hole punching as a practical NAT
traversal technique. Their timeout measurements further indicate that
keep-alive messages must be sent at least every 55 seconds to maintain
open NAT mappings.

These findings directly inform the evaluation criteria for this thesis.
All mesh VPNs tested rely on UDP hole punching for NAT traversal;
the 80\% success rate sets a baseline expectation, while the 55-second
timeout informs analysis of each implementation's keep-alive behavior
during source code review.

\subsection{An Overview of Packet Reordering in TCP}

TODO \cite{leung_overview_2007}

\subsection{Performance Evaluation of TCP over QUIC Tunnels}

TODO \cite{guo_implementation_2025}