\chapter{Methodology} % Main chapter title
\label{Methodology} % For referencing this chapter elsewhere, use \ref{Methodology}
%----------------------------------------------------------------------------------------
% SECTION 1
%----------------------------------------------------------------------------------------
This chapter describes the methodology used to benchmark peer-to-peer
overlay VPN implementations. The experimental design prioritizes
reproducibility at every layer---from dependency management to network
conditions---enabling independent verification of results and
facilitating future comparative studies.
\section{Experimental Setup}
\subsection{Hardware Configuration}
All experiments were conducted on three bare-metal servers with
identical specifications:
\begin{itemize}
\item \textbf{CPU:} Intel (CPUID model 94), 4 cores / 8 threads
\item \textbf{Memory:} 64 GB RAM
\item \textbf{Network:} 1 Gbps Ethernet (e1000e driver; one machine uses r8169)
\item \textbf{Cryptographic acceleration:} AES-NI, AVX, AVX2, PCLMULQDQ,
RDRAND, SSE4.2
\end{itemize}
The presence of hardware cryptographic acceleration is relevant because
many VPN implementations leverage AES-NI for encryption, and the results
may differ on systems without these features.
\subsection{Network Topology}
The three machines are connected via a direct 1 Gbps LAN on the same
network segment. This baseline topology provides a controlled environment
with minimal latency and no packet loss, allowing the overhead introduced
by each VPN implementation to be measured in isolation.
To simulate real-world network conditions, Linux traffic control
(\texttt{tc netem}) is used to inject latency, jitter, packet loss,
and reordering. These impairments are applied symmetrically on all
machines, meaning effective round-trip impairment is approximately
double the per-machine values.
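Because netem shapes each machine's egress independently, per-machine values compound over a request/response round trip. The following back-of-the-envelope sketch (not part of the benchmark code) makes the compounding explicit, under the assumption that losses on the two directions are independent:

```python
def effective_round_trip(latency_ms, loss_pct):
    """Per-machine egress impairment as seen over a full round trip:
    delay is added once per direction; loss on the two directions is
    assumed independent, so the probabilities compound."""
    added_rtt_ms = 2 * latency_ms
    combined_loss_pct = (1 - (1 - loss_pct / 100) ** 2) * 100
    return added_rtt_ms, combined_loss_pct

effective_round_trip(12.0, 5.0)  # ~24 ms added RTT, ~9.75% round-trip loss
```

Note that combined loss is slightly below the naive doubling (9.75\% rather than 10\% for the 5\% case), since a packet lost outbound cannot also be lost on the return path.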
\section{VPNs Under Test}
Ten VPN implementations were selected for evaluation, spanning a range
of architectures from centralized coordination to fully decentralized
mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
\begin{table}[H]
\centering
\caption{VPN implementations included in the benchmark}
\label{tab:vpn_selection}
\begin{tabular}{lll}
\hline
\textbf{VPN} & \textbf{Architecture} & \textbf{Notes} \\
\hline
Tailscale (Headscale) & Coordinated mesh & Open-source coordination server \\
ZeroTier & Coordinated mesh & Global virtual Ethernet \\
Nebula & Lighthouse-based mesh & Slack's overlay network \\
Tinc & Decentralized mesh & Established since 1998 \\
Yggdrasil & Fully decentralized & Spanning-tree routing \\
Mycelium & Fully decentralized & End-to-end encrypted IPv6 overlay \\
Hyprspace & Fully decentralized & libp2p-based, IPFS-compatible \\
EasyTier & Decentralized mesh & Rust-based, multi-protocol \\
VpnCloud & Decentralized mesh & Lightweight, kernel bypass option \\
WireGuard & Point-to-point & Reference baseline (not a mesh VPN) \\
\hline
Internal (no VPN) & N/A & Baseline for raw network performance \\
\hline
\end{tabular}
\end{table}
WireGuard is included as a reference point despite not being a mesh VPN.
Its minimal overhead and widespread adoption make it a useful comparison
for understanding the cost of mesh coordination and NAT traversal logic.
\subsection{Selection Criteria}
VPNs were selected based on:
\begin{itemize}
\item \textbf{NAT traversal capability:} All selected VPNs can establish
connections between peers behind NAT without manual port forwarding.
\item \textbf{Decentralization:} Preference for solutions without mandatory
central servers, though coordinated-mesh VPNs were included for comparison.
\item \textbf{Active development:} Only VPNs with recent commits and
maintained releases were considered.
\item \textbf{Linux support:} All VPNs must run on Linux.
\end{itemize}
\subsection{Configuration Methodology}
Each VPN is built from source within the Nix flake, ensuring that all
dependencies are pinned to exact versions. VPNs not packaged in nixpkgs
(Hyprspace, EasyTier, VpnCloud, qperf) have dedicated build expressions
under \texttt{pkgs/} in the flake.
Cryptographic material (WireGuard keys, Nebula certificates, ZeroTier
identities) is generated deterministically via Clan's vars generator
system. For example, WireGuard keys are generated as:
\begin{verbatim}
wg genkey > "$out/private-key"
wg pubkey < "$out/private-key" > "$out/public-key"
\end{verbatim}
Generated keys are stored in version control under
\texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
making key material part of the reproducible configuration.
\section{Benchmark Suite}
The benchmark suite includes both synthetic throughput tests and
real-world workloads. This combination addresses a limitation of prior
work that relied exclusively on iperf3.
\subsection{Ping}
Measures round-trip latency and packet delivery reliability.
\begin{itemize}
\item \textbf{Method:} 100 ICMP echo requests at 200 ms intervals,
1-second per-packet timeout, repeated for 3 runs.
\item \textbf{Metrics:} RTT (min, avg, max, mdev), packet loss percentage,
per-packet RTTs.
\end{itemize}
\subsection{iPerf3}
Measures bulk data transfer throughput.
\textbf{TCP variant:} 30-second bidirectional test with RSA authentication
and zero-copy mode (\texttt{-Z}) to minimize CPU overhead.
\textbf{UDP variant:} Same configuration with unlimited target bandwidth
(\texttt{-b 0}) and 64-bit counters.
\textbf{Parallel TCP variant:} Tests concurrent mesh traffic by running
TCP streams on all machines simultaneously in a circular pattern
(A$\rightarrow$B, B$\rightarrow$C, C$\rightarrow$A) for 60 seconds.
This simulates contention across the mesh.
\begin{itemize}
\item \textbf{Metrics:} Throughput (bits/s), retransmits, congestion window,
jitter (UDP), packet loss (UDP).
\end{itemize}
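The circular traffic pattern can be sketched as a small helper; the function name is illustrative and not taken from the orchestrator:

```python
def circular_pairs(machines):
    """Each machine streams to its successor, wrapping around the list,
    so every machine sends exactly one stream and receives exactly one."""
    return [(m, machines[(i + 1) % len(machines)])
            for i, m in enumerate(machines)]

circular_pairs(["machine1", "machine2", "machine3"])
# → [('machine1', 'machine2'), ('machine2', 'machine3'), ('machine3', 'machine1')]
```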
\subsection{qPerf}
Measures connection-level performance rather than bulk throughput.
\begin{itemize}
\item \textbf{Method:} One qperf instance per CPU core in parallel, each
running for 30 seconds. Bandwidth from all cores is summed per second.
\item \textbf{Metrics:} Total bandwidth (Mbps), CPU usage, time to first
byte (TTFB), connection establishment time.
\end{itemize}
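The per-second summation across cores can be sketched as follows; this is a simplified stand-in for the orchestrator's aggregation code, assuming each core reports one bandwidth sample per second:

```python
def sum_per_second(per_core):
    """per_core[i][t] is core i's bandwidth (Mbps) during second t.
    Total bandwidth per second is the sum across cores; zip truncates
    to the shortest trace if per-core sample counts differ slightly."""
    return [sum(second) for second in zip(*per_core)]

sum_per_second([[100.0, 110.0, 105.0],
                [ 90.0,  95.0, 100.0]])
# → [190.0, 205.0, 205.0]
```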
\subsection{RIST Video Streaming}
Measures real-time multimedia streaming performance.
\begin{itemize}
\item \textbf{Method:} The sender generates a 4K (3840$\times$2160) test
pattern at 30 fps using ffmpeg with H.264 encoding (ultrafast preset,
zerolatency tuning) at 25 Mbps target bitrate. The stream is transmitted
over the RIST protocol to a receiver on the target machine for 30 seconds.
\item \textbf{Encoding metrics:} Actual bitrate, frame rate, dropped frames.
\item \textbf{Network metrics:} Packets dropped, packets recovered via
RIST retransmission, RTT, quality score (0--100), received bitrate.
\end{itemize}
RIST (Reliable Internet Stream Transport) is a protocol designed for
low-latency video contribution over unreliable networks, making it a
realistic test of VPN behavior under multimedia workloads.
\subsection{Nix Cache Download}
Measures sustained download performance using a real-world workload.
\begin{itemize}
\item \textbf{Method:} A Harmonia Nix binary cache server on the target
machine serves the Firefox package. The client downloads it via
\texttt{nix copy} through the VPN. Benchmarked with hyperfine:
1 warmup run followed by 2 timed runs. The local cache and Nix's
SQLite metadata are cleared between runs.
\item \textbf{Metrics:} Mean duration (seconds), standard deviation,
min/max duration.
\end{itemize}
This benchmark tests realistic HTTP traffic patterns and sustained
sequential download performance, complementing the synthetic throughput
tests.
\section{Network Impairment Profiles}
Four impairment profiles simulate a range of network conditions, from
ideal to severely degraded. Impairments are applied via Linux traffic
control (\texttt{tc netem}) on every machine's primary interface.
Table~\ref{tab:impairment_profiles} shows the per-machine values;
effective round-trip impairment is approximately doubled.
\begin{table}[H]
\centering
\caption{Network impairment profiles (per-machine egress values)}
\label{tab:impairment_profiles}
\begin{tabular}{lccccc}
\hline
\textbf{Profile} & \textbf{Latency} & \textbf{Jitter} &
\textbf{Loss} & \textbf{Reorder} & \textbf{Correlation} \\
\hline
Baseline & --- & --- & --- & --- & --- \\
Low & 2 ms & 2 ms & 0.25\% & 0.5\% & 25\% \\
Medium & 4 ms & 7 ms & 1.0\% & 2.5\% & 50\% \\
High & 12 ms & 30 ms & 5.0\% & 10\% & 50\% \\
\hline
\end{tabular}
\end{table}
The ``Low'' profile approximates a well-provisioned continental
connection, ``Medium'' represents intercontinental links or congested
networks, and ``High'' simulates severely degraded conditions such as
satellite links or highly congested mobile networks.
A 30-second stabilization period follows TC application before
measurements begin, allowing queuing disciplines to settle.
\section{Experimental Procedure}
\subsection{Automation}
The benchmark suite is fully automated via a Python orchestrator
(\texttt{vpn\_bench/}). For each VPN under test, the orchestrator:
\begin{enumerate}
\item Cleans all state directories from previous VPN runs
\item Deploys the VPN configuration to all machines via Clan
\item Restarts the VPN service on every machine (with retry:
up to 3 attempts, 2-second backoff)
\item Verifies VPN connectivity via a connection-check service
(120-second timeout)
\item For each impairment profile:
\begin{enumerate}
\item Applies TC rules via context manager (guarantees cleanup)
\item Waits 30 seconds for stabilization
\item Executes all benchmarks
\item Clears TC rules
\end{enumerate}
\item Collects results and metadata
\end{enumerate}
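The TC context manager described above can be sketched as below. The command shape follows standard \texttt{tc netem} syntax (reordering and correlation parameters omitted for brevity); the \texttt{run} callable is an assumption standing in for however the orchestrator executes commands on each machine:

```python
import contextlib

@contextlib.contextmanager
def tc_impairment(run, iface, latency, jitter, loss):
    """Apply a netem qdisc for the duration of a benchmark and clear it
    afterwards, even if a benchmark raises. `run` is a stand-in for the
    command executor (e.g. subprocess.run over SSH)."""
    run(["tc", "qdisc", "add", "dev", iface, "root", "netem",
         "delay", latency, jitter, "loss", loss])
    try:
        yield
    finally:
        # mirrors the orchestrator's guarantee: rules are always cleared
        run(["tc", "qdisc", "del", "dev", iface, "root"])
```

With a real executor, the High profile would issue \texttt{tc qdisc add dev eth0 root netem delay 12ms 30ms loss 5\%}; the \texttt{finally} block ensures the matching \texttt{del} runs even when a benchmark inside the block fails.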
\subsection{Retry Logic}
Tests use a retry wrapper with up to 2 retries (3 total attempts),
5-second initial delay, and 700-second maximum total time. The number
of attempts is recorded in test metadata so that retried results can
be identified during analysis.
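A minimal sketch of such a retry wrapper, with injectable \texttt{sleep} and \texttt{clock} so the behavior is testable; the names are illustrative, not the orchestrator's actual API:

```python
import time

def with_retry(test_fn, retries=2, initial_delay=5.0, max_total=700.0,
               sleep=time.sleep, clock=time.monotonic):
    """Run `test_fn` up to retries + 1 times, recording the attempt
    count so retried results can be flagged during analysis."""
    start = clock()
    last_exc = None
    for attempt in range(1, retries + 2):
        try:
            return {"result": test_fn(), "attempts": attempt}
        except Exception as exc:
            last_exc = exc
            # give up when out of attempts or past the total time budget
            if attempt > retries or clock() - start > max_total:
                break
            sleep(initial_delay)
    raise last_exc
```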
\subsection{Statistical Analysis}
Each metric is summarized as a statistics dictionary containing:
\begin{itemize}
\item \textbf{min / max:} Extreme values observed
\item \textbf{average:} Arithmetic mean across samples
\item \textbf{p25 / p50 / p75:} Quartiles via \texttt{statistics.quantiles()}
\end{itemize}
Multi-run tests (ping, nix-cache) aggregate across runs. Per-second
tests (qperf, RIST) aggregate across all per-second samples.
The approach uses empirical percentiles rather than parametric
confidence intervals, which is appropriate for benchmark data that
may not follow a normal distribution. The nix-cache test (via hyperfine)
additionally reports standard deviation.
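A minimal sketch of the statistics dictionary, using Python's standard \texttt{statistics} module as described above:

```python
import statistics

def summarize(samples):
    """Build the per-metric statistics dictionary: extremes, mean, and
    empirical quartiles via statistics.quantiles (no normality assumed)."""
    p25, p50, p75 = statistics.quantiles(samples, n=4)
    return {"min": min(samples), "max": max(samples),
            "average": statistics.fmean(samples),
            "p25": p25, "p50": p50, "p75": p75}

summarize([10.0, 12.0, 14.0, 16.0, 18.0])
# → p25=11.0, p50=14.0, p75=17.0; average=14.0
```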
\section{Reproducibility}
Reproducibility is ensured at every layer of the experimental stack.
\subsection{Dependency Pinning}
Every external dependency is pinned via \texttt{flake.lock}, which records
cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
Key pinned inputs include:
\begin{itemize}
\item \textbf{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, ensuring a
single version across the dependency graph
\item \textbf{clan-core:} The Clan framework, pinned to a specific commit
\item \textbf{VPN sources:} Hyprspace, EasyTier, Nebula locked to exact commits
\item \textbf{Build infrastructure:} flake-parts, treefmt-nix, disko,
nixos-facter-modules
\end{itemize}
Custom packages not in nixpkgs (qperf, VpnCloud, iperf with auth patches,
phantun, EasyTier, Hyprspace) are built from source within the flake.
\subsection{Declarative System Configuration}
Each benchmark machine runs NixOS, where the entire operating system is
defined declaratively. There is no imperative package installation or
configuration drift. Given the same NixOS configuration, two machines
will have identical software, services, and kernel parameters.
Machine deployment is atomic: the system either switches to the new
configuration entirely or rolls back.
\subsection{Inventory-Driven Topology}
Clan's inventory system maps machines to service roles declaratively.
For each VPN, the orchestrator writes an inventory entry assigning
machines to roles (e.g., Nebula lighthouse vs.\ peer). The Clan module
system translates this into NixOS configuration---systemd services,
firewall rules, peer lists, and key references. The same inventory
entry always produces the same NixOS configuration.
\subsection{State Isolation}
Before installing a new VPN, the orchestrator deletes all state
directories from previous runs, including VPN-specific directories
(\texttt{/var/lib/zerotier-one}, \texttt{/var/lib/nebula}, etc.) and
benchmark directories. This prevents cross-contamination between tests.
\subsection{Data Provenance}
Every test result includes metadata recording:
\begin{itemize}
\item Wall-clock duration
\item Number of attempts (1 = first try succeeded)
\item VPN restart attempts and duration
\item Connectivity wait duration
\item Source and target machine names
\item Service logs (on failure)
\end{itemize}
Results are organized hierarchically by VPN, TC profile, and machine
pair. Each profile directory contains a \texttt{tc\_settings.json}
snapshot of the exact impairment parameters applied.
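The hierarchy can be illustrated with a short path-building sketch; the directory names here are hypothetical placeholders, not the exact on-disk layout:

```python
from pathlib import Path

def result_dir(root, vpn, profile, source, target):
    """Group results by VPN, then impairment profile, then machine pair."""
    return Path(root) / vpn / profile / f"{source}-{target}"

result_dir("results", "zerotier", "high", "machine1", "machine2")
# → results/zerotier/high/machine1-machine2
```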
\section{Related Work}
\subsection{Nix: A Safe and Policy-Free System for Software Deployment}
Nix addresses significant issues in software deployment by employing
cryptographic hashes to ensure unique paths for component instances
\cite{dolstra_nix_2004}. Features such as concurrent installation of
multiple versions and variants, atomic upgrades, and safe garbage
collection make Nix a flexible deployment system that harmonizes
source and binary deployments. This work uses Nix to ensure that
all VPN builds and system configurations are deterministic.
\subsection{NixOS: A Purely Functional Linux Distribution}
NixOS extends Nix principles to Linux system configuration
\cite{dolstra_nixos_2008}. System configurations are reproducible and
isolated from stateful interactions typical in imperative package
management. This property is essential for ensuring identical test
environments across benchmark runs.
\subsection{Full-Mesh VPN Performance Evaluation}
Kjorveziroski et al.\ \cite{kjorveziroski_full-mesh_2024} provide a
comprehensive evaluation of full-mesh VPN solutions for distributed
systems. Their benchmarks analyze throughput, reliability under packet
loss, and relay behavior for VPNs including ZeroTier.
This thesis extends their work in several ways:
\begin{itemize}
\item Broader VPN selection with emphasis on fully decentralized
architectures
\item Real-world workloads (video streaming, package downloads)
beyond synthetic iperf3 tests
\item Multiple impairment profiles to characterize behavior under
varying network conditions
\item Fully reproducible experimental framework via Nix/NixOS/Clan
\end{itemize}
\subsection{Low Maintenance Peer-to-Peer Overlays}
Shukla et al.\ propose integrating Software Defined Networks with
DHT-based P2P overlays to reduce maintenance overhead
\cite{shukla_towards_2021}. Their work on aligning overlay topology
with physical networks is relevant to understanding the performance
characteristics of mesh VPNs that must discover and maintain peer
connectivity dynamically.