% Chapter Template

\chapter{Methodology} % Main chapter title
\label{Methodology}

This chapter describes the methodology used to benchmark peer-to-peer mesh VPN implementations. The experimental design prioritizes reproducibility at every layer---from dependency management to network conditions---enabling independent verification of results and facilitating future comparative studies.

\section{Experimental Setup}

\subsection{Hardware Configuration}

All experiments were conducted on three bare-metal servers with near-identical specifications:

\begin{itemize}
\item \textbf{CPU:} Intel (CPUID model 94, Skylake generation), 4 cores / 8 threads
\item \textbf{Memory:} 64 GB RAM
\item \textbf{Network:} 1 Gbps Ethernet (e1000e driver; one machine uses r8169)
\item \textbf{Cryptographic acceleration:} AES-NI, AVX, AVX2, PCLMULQDQ, RDRAND, SSE4.2
\end{itemize}

The presence of hardware cryptographic acceleration is relevant because many VPN implementations leverage AES-NI for encryption; results may differ on systems without these features.

\subsection{Network Topology}

The three machines are connected via a direct 1 Gbps LAN on the same network segment. This baseline topology provides a controlled environment with minimal latency and no packet loss, allowing the overhead introduced by each VPN implementation to be measured in isolation.

To simulate real-world network conditions, Linux traffic control (\texttt{tc netem}) is used to inject latency, jitter, packet loss, and reordering. These impairments are applied symmetrically on all machines, so the effective round-trip impairment is approximately double the per-machine values.

\section{VPNs Under Test}

Ten VPN implementations were selected for evaluation, spanning a range of architectures from centralized coordination to fully decentralized mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
\begin{table}[H]
\centering
\caption{VPN implementations included in the benchmark}
\label{tab:vpn_selection}
\begin{tabular}{lll}
\hline
\textbf{VPN} & \textbf{Architecture} & \textbf{Notes} \\
\hline
Tailscale (Headscale) & Coordinated mesh & Open-source coordination server \\
ZeroTier & Coordinated mesh & Global virtual Ethernet \\
Nebula & Coordinated mesh & Slack's overlay network \\
Tinc & Fully decentralized & In development since 1998 \\
Yggdrasil & Fully decentralized & Spanning-tree routing \\
Mycelium & Fully decentralized & End-to-end encrypted IPv6 overlay \\
Hyprspace & Fully decentralized & libp2p-based, IPFS-compatible \\
EasyTier & Fully decentralized & Rust-based, multi-protocol \\
VpnCloud & Fully decentralized & Lightweight, kernel bypass option \\
WireGuard & Point-to-point & Reference baseline (not a mesh VPN) \\
\hline
Internal (no VPN) & N/A & Baseline for raw network performance \\
\hline
\end{tabular}
\end{table}

WireGuard is included as a reference point despite not being a mesh VPN. Its minimal overhead and widespread adoption make it a useful comparison for understanding the cost of mesh coordination and NAT traversal logic.

\subsection{Selection Criteria}

VPNs were selected based on:
\begin{itemize}
\item \textbf{NAT traversal capability:} All selected VPNs can establish connections between peers behind NAT without manual port forwarding.
\item \textbf{Decentralization:} Preference for solutions without mandatory central servers, though coordinated-mesh VPNs were included for comparison.
\item \textbf{Active development:} Only VPNs with recent commits and maintained releases were considered.
\item \textbf{Linux support:} All VPNs must run on Linux.
\end{itemize}

\subsection{Configuration Methodology}

Each VPN is built from source within the Nix flake, ensuring that all dependencies are pinned to exact versions.
Packages not available in nixpkgs (the VPNs Hyprspace, EasyTier, and VpnCloud, together with the qperf benchmark tool) have dedicated build expressions under \texttt{pkgs/} in the flake.

Cryptographic material (WireGuard keys, Nebula certificates, ZeroTier identities) is generated deterministically via Clan's vars generator system. For example, WireGuard keys are generated as:

\begin{verbatim}
wg genkey > "$out/private-key"
wg pubkey < "$out/private-key" > "$out/public-key"
\end{verbatim}

Generated keys are stored in version control under \texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time, making key material part of the reproducible configuration.

\section{Benchmark Suite}

The benchmark suite includes both synthetic throughput tests and real-world workloads. This combination addresses a limitation of prior work that relied exclusively on iperf3.

\subsection{Ping}

Measures round-trip latency and packet delivery reliability.
\begin{itemize}
\item \textbf{Method:} 100 ICMP echo requests at 200 ms intervals, 1-second per-packet timeout, repeated for 3 runs.
\item \textbf{Metrics:} RTT (min, avg, max, mdev), packet loss percentage, per-packet RTTs.
\end{itemize}

\subsection{iPerf3}

Measures bulk data transfer throughput.

\textbf{TCP variant:} 30-second bidirectional test with RSA authentication and zero-copy mode (\texttt{-Z}) to minimize CPU overhead.

\textbf{UDP variant:} Same configuration with unlimited target bandwidth (\texttt{-b 0}) and 64-bit counters.

\textbf{Parallel TCP variant:} Tests concurrent mesh traffic by running TCP streams on all machines simultaneously in a circular pattern (A$\rightarrow$B, B$\rightarrow$C, C$\rightarrow$A) for 60 seconds. This simulates contention across the mesh.

\begin{itemize}
\item \textbf{Metrics:} Throughput (bits/s), retransmits, congestion window, jitter (UDP), packet loss (UDP).
\end{itemize}

\subsection{qPerf}

Measures connection-level performance rather than bulk throughput.
\begin{itemize} \item \textbf{Method:} One qperf instance per CPU core in parallel, each running for 30 seconds. Bandwidth from all cores is summed per second. \item \textbf{Metrics:} Total bandwidth (Mbps), CPU usage, time to first byte (TTFB), connection establishment time. \end{itemize} \subsection{RIST Video Streaming} Measures real-time multimedia streaming performance. \begin{itemize} \item \textbf{Method:} The sender generates a 4K (3840$\times$2160) test pattern at 30 fps using ffmpeg with H.264 encoding (ultrafast preset, zerolatency tuning) at 25 Mbps target bitrate. The stream is transmitted over the RIST protocol to a receiver on the target machine for 30 seconds. \item \textbf{Encoding metrics:} Actual bitrate, frame rate, dropped frames. \item \textbf{Network metrics:} Packets dropped, packets recovered via RIST retransmission, RTT, quality score (0--100), received bitrate. \end{itemize} RIST (Reliable Internet Stream Transport) is a protocol designed for low-latency video contribution over unreliable networks, making it a realistic test of VPN behavior under multimedia workloads. \subsection{Nix Cache Download} Measures sustained download performance using a real-world workload. \begin{itemize} \item \textbf{Method:} A Harmonia Nix binary cache server on the target machine serves the Firefox package. The client downloads it via \texttt{nix copy} through the VPN. Benchmarked with hyperfine: 1 warmup run followed by 2 timed runs. The local cache and Nix's SQLite metadata are cleared between runs. \item \textbf{Metrics:} Mean duration (seconds), standard deviation, min/max duration. \end{itemize} This benchmark tests realistic HTTP traffic patterns and sustained sequential download performance, complementing the synthetic throughput tests. \section{Network Impairment Profiles} Four impairment profiles simulate a range of network conditions, from ideal to severely degraded. 
Impairments are applied via Linux traffic control (\texttt{tc netem}) on every machine's primary interface. Table~\ref{tab:impairment_profiles} shows the per-machine values; effective round-trip impairment is approximately doubled. \begin{table}[H] \centering \caption{Network impairment profiles (per-machine egress values)} \label{tab:impairment_profiles} \begin{tabular}{lccccc} \hline \textbf{Profile} & \textbf{Latency} & \textbf{Jitter} & \textbf{Loss} & \textbf{Reorder} & \textbf{Correlation} \\ \hline Baseline & --- & --- & --- & --- & --- \\ Low & 2 ms & 2 ms & 0.25\% & 0.5\% & 25\% \\ Medium & 4 ms & 7 ms & 1.0\% & 2.5\% & 50\% \\ High & 12 ms & 30 ms & 5.0\% & 10\% & 50\% \\ \hline \end{tabular} \end{table} The ``Low'' profile approximates a well-provisioned continental connection, ``Medium'' represents intercontinental links or congested networks, and ``High'' simulates severely degraded conditions such as satellite links or highly congested mobile networks. A 30-second stabilization period follows TC application before measurements begin, allowing queuing disciplines to settle. \section{Experimental Procedure} \subsection{Automation} The benchmark suite is fully automated via a Python orchestrator (\texttt{vpn\_bench/}). 
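As a sketch of the impairment step in this automation, the shell fragment below applies the ``Medium'' profile from Table~\ref{tab:impairment_profiles} and registers cleanup. The interface name, the \texttt{TC} indirection for dry runs, and the attachment of the correlation value to the reorder parameter are illustrative; the orchestrator itself provides the same cleanup guarantee through a Python context manager.

```shell
# Illustrative impairment step: "Medium" profile, hypothetical device eth0.
# TC may be overridden (TC=echo) to dry-run the commands without root.
TC="${TC:-tc}"

apply_profile() {
  dev="$1"
  # 4 ms delay with 7 ms jitter, 1% loss, 2.5% reordering (50% correlation).
  $TC qdisc add dev "$dev" root netem delay 4ms 7ms loss 1% reorder 2.5% 50%
  # Remove the qdisc when the shell exits, so a failed benchmark can never
  # leave impairments active (a 30 s stabilization wait would follow here).
  trap "$TC qdisc del dev $dev root" EXIT
}
```

Deleting and re-adding the root qdisc between profiles mirrors the orchestrator's guarantee that impairments from one profile cannot leak into the next.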
For each VPN under test, the orchestrator: \begin{enumerate} \item Cleans all state directories from previous VPN runs \item Deploys the VPN configuration to all machines via Clan \item Restarts the VPN service on every machine (with retry: up to 3 attempts, 2-second backoff) \item Verifies VPN connectivity via a connection-check service (120-second timeout) \item For each impairment profile: \begin{enumerate} \item Applies TC rules via context manager (guarantees cleanup) \item Waits 30 seconds for stabilization \item Executes all benchmarks \item Clears TC rules \end{enumerate} \item Collects results and metadata \end{enumerate} \subsection{Retry Logic} Tests use a retry wrapper with up to 2 retries (3 total attempts), 5-second initial delay, and 700-second maximum total time. The number of attempts is recorded in test metadata so that retried results can be identified during analysis. \subsection{Statistical Analysis} Each metric is summarized as a statistics dictionary containing: \begin{itemize} \item \textbf{min / max:} Extreme values observed \item \textbf{average:} Arithmetic mean across samples \item \textbf{p25 / p50 / p75:} Quartiles via \texttt{statistics.quantiles()} \end{itemize} Multi-run tests (ping, nix-cache) aggregate across runs. Per-second tests (qperf, RIST) aggregate across all per-second samples. The approach uses empirical percentiles rather than parametric confidence intervals, which is appropriate for benchmark data that may not follow a normal distribution. The nix-cache test (via hyperfine) additionally reports standard deviation. \section{Reproducibility} Reproducibility is ensured at every layer of the experimental stack. \subsection{Dependency Pinning} Every external dependency is pinned via \texttt{flake.lock}, which records cryptographic hashes (\texttt{narHash}) and commit SHAs for each input. 
Key pinned inputs include: \begin{itemize} \item \textbf{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, ensuring a single version across the dependency graph \item \textbf{clan-core:} The Clan framework, pinned to a specific commit \item \textbf{VPN sources:} Hyprspace, EasyTier, Nebula locked to exact commits \item \textbf{Build infrastructure:} flake-parts, treefmt-nix, disko, nixos-facter-modules \end{itemize} Custom packages not in nixpkgs (qperf, VpnCloud, iperf with auth patches, phantun, EasyTier, Hyprspace) are built from source within the flake. \subsection{Declarative System Configuration} Each benchmark machine runs NixOS, where the entire operating system is defined declaratively. There is no imperative package installation or configuration drift. Given the same NixOS configuration, two machines will have identical software, services, and kernel parameters. Machine deployment is atomic: the system either switches to the new configuration entirely or rolls back. \subsection{Inventory-Driven Topology} Clan's inventory system maps machines to service roles declaratively. For each VPN, the orchestrator writes an inventory entry assigning machines to roles (e.g., Nebula lighthouse vs.\ peer). The Clan module system translates this into NixOS configuration---systemd services, firewall rules, peer lists, and key references. The same inventory entry always produces the same NixOS configuration. \subsection{State Isolation} Before installing a new VPN, the orchestrator deletes all state directories from previous runs, including VPN-specific directories (\texttt{/var/lib/zerotier-one}, \texttt{/var/lib/nebula}, etc.) and benchmark directories. This prevents cross-contamination between tests. 
\subsection{Data Provenance}

Every test result includes metadata recording:
\begin{itemize}
\item Wall-clock duration
\item Number of attempts (1 = first try succeeded)
\item VPN restart attempts and duration
\item Connectivity wait duration
\item Source and target machine names
\item Service logs (on failure)
\end{itemize}

Results are organized hierarchically by VPN, TC profile, and machine pair. Each profile directory contains a \texttt{tc\_settings.json} snapshot of the exact impairment parameters applied.

\section{Related Work}

\subsection{Nix: A Safe and Policy-Free System for Software Deployment}

Nix addresses long-standing problems in software deployment by using cryptographic hashes to give each component instance a unique store path \cite{dolstra_nix_2004}. Features such as concurrent installation of multiple versions, atomic upgrades, and safe garbage collection make Nix a flexible deployment system. This work uses Nix to ensure that all VPN builds and system configurations are deterministic.

\subsection{NixOS: A Purely Functional Linux Distribution}

NixOS extends Nix principles to Linux system configuration \cite{dolstra_nixos_2008}. System configurations are reproducible and isolated from the stateful interactions typical of imperative package management. This property is essential for ensuring identical test environments across benchmark runs.

\subsection{A Comparative Study on Virtual Private Networks}

Lackorzynski et al.\ \cite{lackorzynski_comparative_2019} evaluate VPN protocols in the context of industrial communication systems (Industry 4.0), benchmarking OpenVPN, IPSec, Tinc, Freelan, MACsec, and WireGuard. Their analysis focuses on point-to-point protocol performance---throughput, latency, and CPU overhead---rather than overlay network behavior. In contrast, this thesis evaluates VPNs that provide a full data plane with peer-to-peer connectivity, NAT traversal, and dynamic peer discovery.
\subsection{Full-Mesh VPN Performance Evaluation} Kjorveziroski et al.\ \cite{kjorveziroski_full-mesh_2024} provide a comprehensive evaluation of full-mesh VPN solutions for distributed systems. Their benchmarks analyze throughput, reliability under packet loss, and relay behavior for VPNs including ZeroTier. This thesis extends their work in several ways: \begin{itemize} \item Broader VPN selection with emphasis on fully decentralized architectures \item Real-world workloads (video streaming, package downloads) beyond synthetic iperf3 tests \item Multiple impairment profiles to characterize behavior under varying network conditions \item Fully reproducible experimental framework via Nix/NixOS/Clan \end{itemize}