Compare commits

3 Commits

Author SHA1 Message Date
8a6d676e93 Improved writing 2026-03-18 23:57:01 +01:00
f3cf653ab5 Improved Writing made it less verbouse 2026-03-18 21:50:30 +01:00
64aeeb5772 Improved Writing made it less verbouse 2026-03-18 21:50:15 +01:00
5 changed files with 206 additions and 202 deletions

@@ -2,38 +2,33 @@
\label{Introduction} \label{Introduction}
Peer-to-peer overlay VPNs promise to restore genuine decentralization Peer-to-peer overlay VPNs allow nodes to connect directly regardless
by enabling direct connectivity between nodes regardless of NAT or of NAT or firewall restrictions. Yet practitioners choosing among the
firewall restrictions. Yet practitioners choosing among the growing growing number of mesh VPN implementations must rely largely on
number of mesh VPN implementations must rely largely on anecdotal anecdotal evidence: systematic, reproducible comparisons under
evidence: systematic, reproducible comparisons under realistic realistic conditions are scarce.
conditions are scarce.
This thesis addresses that gap. We benchmark ten peer-to-peer VPN This thesis addresses that gap. We benchmark ten peer-to-peer VPN
implementations across seven workloads and four network impairment implementations across seven workloads and four network impairment
profiles, yielding over 300 unique measurements. We complement these profiles. We complement these performance benchmarks with a source
performance benchmarks with a source code analysis of each code analysis of each implementation, verified by the respective
implementation, verified through direct engagement with the respective maintainers. The entire
maintainers. The entire experimental framework is built on Nix, NixOS, experimental framework is built on Nix, NixOS, and the Clan deployment
and the Clan deployment system, making every result independently system, so every result is independently reproducible.
reproducible.
\section{Motivation} \section{Motivation}
Peer-to-peer architectures promise censorship-resistant, fault-tolerant Peer-to-peer architectures can provide censorship-resistant,
infrastructure by eliminating single points of failure fault-tolerant infrastructure because they have no single point of
\cite{shukla_towards_2021}. failure \cite{shukla_towards_2021}. Blockchain platforms like Ethereum
These architectures underpin a growing range of systems, from IoT depend on this property, as do IoT edge networks and content delivery
edge computing and content delivery networks to blockchain platforms systems. But these benefits only hold when nodes are spread across
like Ethereum. diverse hosting entities.
Yet realizing these benefits requires distributing nodes across
genuinely diverse hosting entities.
In practice, this diversity remains illusory. In practice, this diversity remains illusory.
Amazon, Hetzner, and OVH collectively host 70\% of all Ethereum nodes Amazon, Hetzner, and OVH collectively host 70\% of all Ethereum nodes
(see Figure~\ref{fig:ethernodes_hosting}), (see Figure~\ref{fig:ethernodes_hosting}), so nominally decentralized
concentrating nominally decentralized infrastructure infrastructure actually sits in a handful of cloud providers.
within a handful of cloud providers.
More concerning, these providers operate under overlapping regulatory More concerning, these providers operate under overlapping regulatory
jurisdictions, jurisdictions,
predominantly the United States and the European Union. predominantly the United States and the European Union.
@@ -49,108 +44,96 @@ data disclosure, or traffic manipulation across a majority of the network.
\label{fig:ethernodes_hosting} \label{fig:ethernodes_hosting}
\end{figure} \end{figure}
Why does this centralization persist despite the explicit goals of This centralization persists because self-hosting is hard. Cloud
decentralization? providers offer static IP addresses and publicly routable endpoints,
The answer lies in the practical barriers to self-hosting. which avoids the networking problems that residential and small-office
Cloud providers offer static IP addresses and publicly routable endpoints, deployments face.
eliminating the networking complexity that plagues residential and
small-office deployments.
Most internet-connected devices sit behind Network Address Translation (NAT), Most internet-connected devices sit behind Network Address Translation (NAT),
which prevents incoming connections without explicit port forwarding which prevents incoming connections without explicit port forwarding
or relay infrastructure. or relay infrastructure.
Combined with dynamic IP assignments from ISPs, maintaining stable Combined with dynamic IP assignments from ISPs, stable peer
peer connectivity connectivity from self-hosted infrastructure has traditionally
from self-hosted infrastructure traditionally required significant required significant technical expertise.
technical expertise.
Overlay VPNs offer a solution to this fundamental barrier. Overlay VPNs solve this problem. They establish encrypted tunnels
By establishing encrypted tunnels that traverse NAT boundaries, that traverse NAT boundaries, so peers can connect directly without
mesh VPNs enable direct peer-to-peer connectivity without requiring static IP addresses or manual firewall configuration. Each node
static IP addresses or manual firewall configuration. receives a stable virtual address within the overlay network,
Each node receives a stable virtual address within the overlay network, regardless of its physical network topology. A device behind
regardless of its underlying network topology. consumer-grade NAT can therefore participate as a first-class peer
In practice, this means a device behind consumer-grade NAT can in a distributed system.
participate as a first-class peer in a distributed system,
removing the primary technical advantage that cloud providers hold.
The Clan deployment framework builds on this foundation. The Clan deployment framework uses Nix and NixOS to eliminate
Clan uses Nix and NixOS to eliminate configuration drift and configuration drift and dependency conflicts, which makes it
dependency conflicts, reducing operational overhead enough for a practical for a single administrator to self-host distributed
single administrator to reliably self-host complex distributed
services. services.
Overlay VPNs are central to Clan's architecture, Overlay VPNs are central to Clan's architecture: they supply the
providing the secure peer connectivity that enables nodes peer connectivity that lets nodes form a network regardless of
to form cohesive networks regardless of their physical location or physical location or NAT situation.
NAT situation. As illustrated in Figure~\ref{fig:vision-stages}, Clan plans to offer
As illustrated in Figure~\ref{fig:vision-stages}, Clan envisions a web interface that lets users design and deploy private P2P networks
a web interface that enables users to design and deploy private P2P networks with minimal configuration, assisted by an integrated LLM.
with minimal configuration, assisted by an integrated LLM
for contextual guidance and troubleshooting.
During the development of Clan, a recurring challenge became apparent: During Clan's development, a recurring problem surfaced:
practitioners held divergent preferences for mesh VPN solutions, practitioners disagreed on which mesh VPN to use, each pointing to
each citing different edge cases where their chosen VPN different edge cases where their preferred VPN failed or lacked a
proved unreliable or lacked essential features. needed feature. These discussions relied on anecdotal evidence rather
These discussions were grounded in anecdotal evidence rather than than systematic evaluation, which motivated the present work.
systematic evaluation, motivating the present work.
\subsection{Related Work} \subsection{Related Work}
Existing research offers only partial coverage of this space. Existing research offers only partial coverage of this space.
Lackorzynski et al.\ \cite{lackorzynski_comparative_2019} benchmark Lackorzynski et al.\ \cite{lackorzynski_comparative_2019} benchmark
OpenVPN, IPSec, Tinc, Freelan, MACsec, and WireGuard in the context OpenVPN, IPSec, Tinc, Freelan, MACsec, and WireGuard in the context
of industrial communication systems, measuring point-to-point of industrial communication systems. They measure point-to-point
throughput, latency, and CPU overhead. Their work does not address throughput, latency, and CPU overhead but do not address overlay
overlay network behavior such as NAT traversal or dynamic peer discovery. network behavior such as NAT traversal or dynamic peer discovery.
The most closely related study by Kjorveziroski et al.\ The most closely related study by Kjorveziroski et al.\
\cite{kjorveziroski_full-mesh_2024} evaluates full-mesh VPN solutions \cite{kjorveziroski_full-mesh_2024} evaluates full-mesh VPN solutions
for distributed systems, analyzing throughput, reliability under packet for distributed systems, looking at throughput, reliability under
loss, and relay behavior for VPNs including ZeroTier. However, it packet loss, and relay behavior for VPNs including ZeroTier. However,
focuses primarily on solutions with a central point of failure and it focuses primarily on solutions with a central point of failure and
limits its workloads to synthetic iperf3 tests. This thesis extends limits its workloads to synthetic iperf3 tests.
that foundation by evaluating a broader set of VPN implementations
with emphasis on fully decentralized architectures, exercising them
under real-world workloads such as video streaming and package
downloads, applying multiple network impairment profiles, and
providing a fully reproducible experimental framework built on
Nix, NixOS, and Clan.
Beyond filling this research gap, a further goal was to create a fully This thesis extends that work in several directions. It evaluates a
automated benchmarking framework capable of generating a public broader set of VPN implementations with emphasis on fully
leaderboard, similar in spirit to the js-framework-benchmark decentralized architectures and tests them under application-level
(see Figure~\ref{fig:js-framework-benchmark}). By providing an workloads such as video streaming and package downloads. It also
accessible web interface with regularly updated applies multiple network impairment profiles and provides a
results, the framework gives VPN developers a concrete, public reproducible experimental framework built on Nix, NixOS, and Clan.
baseline to measure against.
A secondary goal was to create an automated benchmarking framework
that generates a public leaderboard, similar in spirit to the
js-framework-benchmark (see Figure~\ref{fig:js-framework-benchmark}).
A web interface with regularly updated results gives VPN developers a
concrete baseline to measure against.
\section{Research Contribution} \section{Research Contribution}
This thesis makes the following contributions: This thesis makes the following contributions:
\begin{enumerate} \begin{enumerate}
\item A comprehensive benchmark of ten peer-to-peer VPN \item A benchmark of ten peer-to-peer VPN implementations across
implementations across seven workloads (including real-world seven workloads and four network impairment profiles. The workloads
video streaming and package downloads) and four network include video streaming and package downloads alongside synthetic
impairment profiles, producing over 300 unique measurements. throughput tests.
\item A source code analysis of all ten VPN implementations, \item A source code analysis of all ten VPN implementations. Manual
combining manual code review with LLM-assisted analysis, code review was combined with LLM-assisted analysis and the results
followed by verification through direct engagement with the were verified by the respective maintainers on GitHub.
respective maintainers on GitHub. \item A reproducible experimental framework built on Nix, NixOS,
\item A fully reproducible experimental framework built on and the Clan deployment system. Dependencies are pinned and system
Nix, NixOS, and the Clan deployment system, with pinned configuration is declarative, down to deterministic cryptographic
dependencies, declarative system configuration, and material generation. Every result can be independently replicated.
deterministic cryptographic material generation, enabling \item A performance analysis showing that Tailscale outperforms the
independent replication of all results. Linux kernel's default networking stack under degraded conditions,
\item A performance analysis demonstrating that Tailscale and that kernel parameter tuning (Reno congestion control in place
outperforms the Linux kernel's default networking stack under of CUBIC, with RACK disabled) yields measurable throughput
degraded conditions, and that kernel parameter tuning (Reno improvements.
congestion control in place of CUBIC, with RACK
disabled) yields measurable throughput improvements.
\item The discovery of several security vulnerabilities across \item The discovery of several security vulnerabilities across
the evaluated VPN implementations. the evaluated VPN implementations.
\item An automated benchmarking framework designed for public \item An automated benchmarking framework that produces a public
leaderboard generation, intended to encourage ongoing leaderboard, giving VPN developers a target to optimize
optimization by VPN developers. against.
\end{enumerate} \end{enumerate}
\begin{figure}[H] \begin{figure}[H]

@@ -7,11 +7,9 @@
This chapter describes the methodology used to benchmark and analyze This chapter describes the methodology used to benchmark and analyze
peer-to-peer mesh VPN implementations. The evaluation combines peer-to-peer mesh VPN implementations. The evaluation combines
performance benchmarking under controlled network conditions with a performance benchmarking under controlled network conditions with a
structured source code analysis of each implementation. The structured source code analysis of each implementation. All
benchmarking framework prioritizes reproducibility at every layer, dependencies, system configurations, and test procedures are pinned
from pinned dependencies and declarative system configuration to or declared so that the experiments can be independently reproduced.
automated test orchestration, enabling independent verification of
results and facilitating future comparative studies.
\section{Experimental Setup} \section{Experimental Setup}
@@ -29,19 +27,19 @@ identical specifications:
RDRAND, SSE4.2 RDRAND, SSE4.2
\end{itemize} \end{itemize}
The presence of hardware cryptographic acceleration is relevant because Results may differ on systems without hardware cryptographic
many VPN implementations use AES-NI for encryption, and the results acceleration, since most of the tested VPNs offload encryption to
may differ on systems without these features. AES-NI.
\subsection{Network Topology} \subsection{Network Topology}
The three machines are connected via a direct 1 Gbps LAN on the same The three machines are connected via a direct 1 Gbps LAN on the same
network segment. Each machine has a publicly reachable IPv4 address, network segment. Each machine has a publicly reachable IPv4 address,
which is used to deploy configuration changes via Clan. This baseline which is used to deploy configuration changes via Clan. On this
topology provides a controlled environment with minimal latency and no baseline topology, latency is sub-millisecond and there is no packet
packet loss, allowing the overhead introduced by each VPN implementation loss, so measured overhead can be attributed to the VPN itself.
to be measured in isolation. Figure~\ref{fig:mesh_topology} illustrates Figure~\ref{fig:mesh_topology} illustrates the full-mesh connectivity
the full-mesh connectivity between the three machines. between the three machines.
\begin{figure}[H] \begin{figure}[H]
\centering \centering
@@ -74,8 +72,8 @@ double the per-machine values.
\subsection{Configuration Methodology} \subsection{Configuration Methodology}
Each VPN is built from source within the Nix flake, ensuring that all Each VPN is built from source within the Nix flake, with all
dependencies are pinned to exact versions. VPNs not packaged in nixpkgs dependencies pinned to exact versions. VPNs not packaged in nixpkgs
(Hyprspace, EasyTier, VpnCloud) have dedicated build expressions (Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
under \texttt{pkgs/} in the flake. under \texttt{pkgs/} in the flake.
@@ -85,13 +83,14 @@ system.
Generated keys are stored in version control under Generated keys are stored in version control under
\texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time, \texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
making key material part of the reproducible configuration. so key material is part of the reproducible configuration.
\section{Benchmark Suite} \section{Benchmark Suite}
The benchmark suite includes both synthetic throughput tests and The benchmark suite includes synthetic throughput tests and
real-world workloads. This combination addresses a limitation of prior application-level workloads. Prior comparative work relied exclusively
work that relied exclusively on iperf3. on iperf3; the additional benchmarks here capture behavior that
iperf3 alone misses.
Table~\ref{tab:benchmark_suite} summarises each benchmark. Table~\ref{tab:benchmark_suite} summarises each benchmark.
\begin{table}[H] \begin{table}[H]
@@ -114,8 +113,8 @@ Table~\ref{tab:benchmark_suite} summarises each benchmark.
\end{tabular} \end{tabular}
\end{table} \end{table}
The first four benchmarks use well-known network testing tools; The first four benchmarks use standard network testing tools;
the remaining three target workloads closer to real-world usage. the remaining three test application-level workloads.
The subsections below describe configuration details that the table The subsections below describe configuration details that the table
does not capture. does not capture.
@@ -133,48 +132,49 @@ counters.
\subsection{Parallel iPerf3} \subsection{Parallel iPerf3}
Runs TCP streams on all three machines simultaneously in a circular Runs one bidirectional TCP stream on all three machine pairs
pattern (A$\rightarrow$B, B$\rightarrow$C, C$\rightarrow$A) for simultaneously in a circular pattern (A$\rightarrow$B,
60 seconds with zero-copy (\texttt{-Z}). This creates contention B$\rightarrow$C, C$\rightarrow$A) for 60 seconds with zero-copy
across the overlay network, stressing shared resources that (\texttt{-Z}). The three concurrent bidirectional links produce six
single-stream tests leave idle. unidirectional flows in total. This contention stresses shared
resources that single-stream tests leave idle.
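The flow count stated in the revised text can be checked with a short sketch. This is only an illustration of the circular pairing; the machine names A, B, and C and both helper functions are hypothetical and are not taken from the benchmark code.

```python
def circular_pairs(nodes):
    """Pair each node with its successor, wrapping around at the end."""
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]


def unidirectional_flows(pairs):
    """A bidirectional stream contributes one flow in each direction."""
    flows = []
    for src, dst in pairs:
        flows.append((src, dst))
        flows.append((dst, src))
    return flows


machines = ["A", "B", "C"]  # hypothetical labels for the three hosts
pairs = circular_pairs(machines)
flows = unidirectional_flows(pairs)
print(pairs)       # the three concurrent links
print(len(flows))  # total number of unidirectional flows
```

With three machines, every host ends up sending and receiving on two links at once, which is what produces the contention described above.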
\subsection{QPerf} \subsection{QPerf}
Spawns one qperf process per CPU core, each running for 30 seconds. Spawns one qperf process per CPU core, each running for 30 seconds.
Per-core bandwidth is summed per second. Unlike the iPerf3 tests, Per-core bandwidth is summed per second. In addition to throughput,
QPerf targets QUIC connection-level performance, capturing time to QPerf reports time to first byte and connection establishment time,
first byte and connection establishment time alongside throughput. which iPerf3 does not measure.
\subsection{RIST Video Streaming} \subsection{RIST Video Streaming}
Generates a 4K ($3840\times2160$) H.264 test pattern at 30\,fps Generates a 4K ($3840\times2160$) H.264 test pattern at 30\,fps
(ultrafast preset, zerolatency tuning, 25\,Mbps target bitrate) with (ultrafast preset, zerolatency tuning, 25\,Mbps bitrate cap) with
ffmpeg and transmits it over the RIST protocol for 30 seconds. RIST ffmpeg and transmits it over the RIST protocol for 30 seconds. Because
(Reliable Internet Stream Transport) is designed for low-latency the synthetic test pattern is highly compressible, the actual encoding
video contribution over unreliable networks, making it a realistic bitrate is approximately 3.3\,Mbps, well below the configured cap. RIST
test of VPN behavior under multimedia workloads. In addition to (Reliable Internet Stream Transport) is a protocol for low-latency
standard network metrics, the benchmark records encoding-side video contribution over unreliable networks. The benchmark records
statistics (actual bitrate, frame rate, dropped frames) and encoding-side statistics (actual bitrate, frame rate, dropped frames)
RIST-specific counters (packets recovered via retransmission, quality and RIST-specific counters (packets recovered via retransmission,
score). quality score).
\subsection{Nix Cache Download} \subsection{Nix Cache Download}
A Harmonia Nix binary cache server on the target machine serves the A Harmonia Nix binary cache server on the target machine serves the
Firefox package. The client downloads it via \texttt{nix copy} Firefox package. The client downloads it via \texttt{nix copy}
through the VPN, exercising many small HTTP requests rather than a through the VPN. Unlike the iPerf3 tests, this workload issues many
single bulk transfer. Benchmarked with hyperfine (1 warmup run, short-lived HTTP requests instead of a single bulk transfer.
2 timed runs); the local Nix store and SQLite metadata are cleared Benchmarked with hyperfine (1 warmup run, 2 timed runs); the local
between runs. Nix store and SQLite metadata are cleared between runs.
\section{Network Impairment Profiles} \section{Network Impairment Profiles}
To evaluate VPN performance under different network conditions, four Four impairment profiles simulate progressively worse network
impairment profiles are defined, ranging from an unmodified baseline conditions, from an unmodified baseline to a severely degraded link.
to a severely degraded link. All impairments are injected with Linux All impairments are injected with Linux traffic control
traffic control (\texttt{tc netem}) on the egress side of every (\texttt{tc netem}) on the egress side of every machine's primary
machine's primary interface. interface.
Table~\ref{tab:impairment_profiles} lists the per-machine values. Table~\ref{tab:impairment_profiles} lists the per-machine values.
Because impairments are applied on both ends of a connection, the Because impairments are applied on both ends of a connection, the
effective round-trip impact is roughly double the listed values. effective round-trip impact is roughly double the listed values.
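As a rough sketch of how one such profile might be applied, the snippet below only builds the argument list for a tc netem call and does not run it. The interface name, the delay and loss values, and the choice of delay and loss as the impaired parameters are assumptions for illustration; the real per-machine values are the ones in the table, and the orchestrator's actual invocation is not shown in this diff.

```python
def netem_command(interface, delay_ms, loss_pct):
    """Build a tc command that installs a netem qdisc on the egress root.

    All arguments are illustrative placeholders, not values from the thesis.
    """
    return [
        "tc", "qdisc", "replace", "dev", interface, "root", "netem",
        "delay", f"{delay_ms}ms",
        "loss", f"{loss_pct}%",
    ]


def round_trip_delay_ms(per_machine_delay_ms):
    """Egress delay is added once on each endpoint, so it counts twice per round trip."""
    return 2 * per_machine_delay_ms


cmd = netem_command("eth0", 10, 1)  # hypothetical interface and profile values
print(" ".join(cmd))
print(round_trip_delay_ms(10))
```

The second helper spells out the doubling mentioned above: a per-machine egress delay is paid once by the request and once by the reply.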
@@ -222,14 +222,14 @@ aspect of the simulated degradation:
\end{itemize} \end{itemize}
A 30-second stabilization period follows TC application before A 30-second stabilization period follows TC application before
measurements begin, allowing queuing disciplines to settle. measurements begin so that queuing disciplines can settle.
\section{Experimental Procedure} \section{Experimental Procedure}
\subsection{Automation} \subsection{Automation}
The benchmark suite is fully automated via a Python orchestrator A Python orchestrator (\texttt{vpn\_bench/}) automates the full
(\texttt{vpn\_bench/}). For each VPN under test, the orchestrator: benchmark suite. For each VPN under test, it:
\begin{enumerate} \begin{enumerate}
\item Cleans all state directories from previous VPN runs \item Cleans all state directories from previous VPN runs
@@ -327,11 +327,12 @@ Each metric is summarized as a statistics dictionary containing:
Aggregation differs by benchmark type. Benchmarks that execute Aggregation differs by benchmark type. Benchmarks that execute
multiple discrete runs, ping (3 runs of 100 packets each) and multiple discrete runs, ping (3 runs of 100 packets each) and
nix-cache (2 timed runs via hyperfine), first compute statistics nix-cache (2 timed runs via hyperfine), first compute statistics
within each run, then average the resulting statistics across runs. within each run, then aggregate across runs: averages and percentiles
Concretely, if ping produces three runs with mean RTTs of are averaged, while the reported minimum and maximum are the global
5.1, 5.3, and 5.0\,ms, the reported average is the mean of extremes across all runs. Concretely, if ping produces three runs
those three values (5.13\,ms). The reported minimum is the with mean RTTs of 5.1, 5.3, and 5.0\,ms, the reported average is
single lowest RTT observed across all three runs. the mean of those three values (5.13\,ms). The reported minimum is
the single lowest RTT observed across all three runs.
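The aggregation rule for multi-run benchmarks can be sketched as follows. The function name and the dictionary keys are assumptions, and the per-run minimum and maximum values are invented for illustration; only the three mean RTTs come from the example in the text.

```python
from statistics import mean


def aggregate_runs(runs):
    """Combine per-run statistics into a single summary.

    Averages are averaged across runs, while min and max are the
    global extremes over all runs.
    """
    return {
        "average": mean(r["average"] for r in runs),
        "min": min(r["min"] for r in runs),
        "max": max(r["max"] for r in runs),
    }


# Mean RTTs from the text; min/max values are hypothetical.
ping_runs = [
    {"min": 4.8, "average": 5.1, "max": 6.0},
    {"min": 4.9, "average": 5.3, "max": 6.4},
    {"min": 4.7, "average": 5.0, "max": 5.8},
]

summary = aggregate_runs(ping_runs)
print(round(summary["average"], 2))
print(summary["min"], summary["max"])
```

Averaging the per-run means weights each run equally, which matches pooling all samples only when the runs have the same length, as the three ping runs of 100 packets do.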
Benchmarks that produce continuous per-second samples, qperf and Benchmarks that produce continuous per-second samples, qperf and
RIST streaming for example, pool all per-second measurements from a single RIST streaming for example, pool all per-second measurements from a single
@@ -340,9 +341,9 @@ bandwidth is first summed across CPU cores for each second, and
statistics are then computed over the resulting time series. statistics are then computed over the resulting time series.
The analysis reports empirical percentiles (p25, p50, p75) alongside The analysis reports empirical percentiles (p25, p50, p75) alongside
min/max bounds rather than parametric confidence intervals. This min/max bounds rather than parametric confidence intervals.
choice is deliberate: benchmark latency and throughput distributions Benchmark latency and throughput distributions are often skewed or
are often skewed or multimodal, making assumptions of normality multimodal, so parametric assumptions of normality would be
unreliable. The interquartile range (p25--p75) conveys the spread of unreliable. The interquartile range (p25--p75) conveys the spread of
typical observations, while min and max capture outlier behavior. typical observations, while min and max capture outlier behavior.
The nix-cache benchmark additionally reports standard deviation via The nix-cache benchmark additionally reports standard deviation via
@@ -350,9 +351,8 @@ hyperfine's built-in statistical output.
\section{Source Code Analysis} \section{Source Code Analysis}
To complement the performance benchmarks with architectural We also conducted a structured source code analysis of all ten VPN
understanding, we conducted a structured source code analysis of implementations. The analysis followed three phases.
all ten VPN implementations. The analysis followed three phases.
\subsection{Repository Collection and LLM-Assisted Overview} \subsection{Repository Collection and LLM-Assisted Overview}
@@ -378,23 +378,23 @@ aspects:
\end{itemize} \end{itemize}
Each agent was required to reference the specific file and line Each agent was required to reference the specific file and line
range supporting every claim, enabling direct verification. range supporting every claim so that outputs could be verified
against the source.
\subsection{Manual Verification} \subsection{Manual Verification}
The LLM-generated overviews served as a navigational aid rather than The LLM-generated overviews served as a navigational aid rather than
a trusted source. The most important code paths identified in each a trusted source. The most important code paths identified in each
overview were manually read and verified against the actual source overview were manually read and verified against the actual source
code, correcting inaccuracies and deepening the analysis where the code. Where the automated summaries were inaccurate or superficial,
automated summaries remained superficial. they were corrected and expanded.
\subsection{Feature Matrix and Maintainer Review} \subsection{Feature Matrix and Maintainer Review}
The findings from both the automated and manual analysis were The findings from both phases were consolidated into a feature matrix
consolidated into a feature matrix cataloguing 131 features across of 131 features across all ten VPN implementations, covering protocol
all ten VPN implementations. The matrix covers characteristics, cryptographic primitives, NAT traversal strategies,
protocol characteristics, cryptographic primitives, NAT traversal routing behavior, and security properties.
strategies, routing behavior, and security properties.
The completed feature matrix was published and sent to the respective The completed feature matrix was published and sent to the respective
VPN maintainers for review. We incorporated their feedback as VPN maintainers for review. We incorporated their feedback as
@@ -402,7 +402,7 @@ corrections and clarifications to the final classification.
\section{Reproducibility} \section{Reproducibility}
The experimental stack pins or declares every variable that could The experimental stack pins or declares the variables that could
affect results. affect results.
\subsection{Dependency Pinning} \subsection{Dependency Pinning}
@@ -412,8 +412,8 @@ cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
Key pinned inputs include: Key pinned inputs include:
\begin{itemize} \begin{itemize}
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, ensuring a \bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, so a single
single version across the dependency graph version is used across the dependency graph
\bitem{clan-core:} The Clan framework, pinned to a specific commit \bitem{clan-core:} The Clan framework, pinned to a specific commit
\bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to \bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
exact commits exact commits
@@ -527,9 +527,8 @@ VPNs were selected based on:
\bitem{Linux support:} All VPNs must run on Linux. \bitem{Linux support:} All VPNs must run on Linux.
\end{itemize} \end{itemize}
Ten VPN implementations were selected for evaluation, spanning a range Table~\ref{tab:vpn_selection} lists the ten VPN implementations
of architectures from centralized coordination to fully decentralized selected for evaluation.
mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
\begin{table}[H] \begin{table}[H]
\centering \centering
@@ -556,7 +555,7 @@ mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
\end{tabular} \end{tabular}
\end{table} \end{table}
WireGuard is included as a reference point despite not being a mesh VPN. WireGuard is not a mesh VPN but is included as a reference point.
Its minimal overhead and widespread adoption make it a useful comparison Comparing its overhead to the mesh VPNs isolates the cost of mesh
for understanding the cost of mesh coordination and NAT traversal logic. coordination and NAT traversal.

@@ -10,8 +10,9 @@ follows the impairment profiles from ideal to degraded:
Section~\ref{sec:baseline} establishes overhead under ideal Section~\ref{sec:baseline} establishes overhead under ideal
conditions, then subsequent sections examine how each VPN responds to conditions, then subsequent sections examine how each VPN responds to
increasing network impairment. The chapter concludes with findings increasing network impairment. The chapter concludes with findings
from the source code analysis. A recurring theme throughout is that from the source code analysis. A recurring theme is that no single
no single metric captures VPN performance; the rankings shift metric captures VPN
performance; the rankings shift
depending on whether one measures throughput, latency, retransmit depending on whether one measures throughput, latency, retransmit
behavior, or real-world application performance. behavior, or real-world application performance.
@@ -184,7 +185,7 @@ opposite extreme: brute-force retransmission can still yield high
throughput (814\,Mbps with 1\,163 retransmits), at the cost of wasted throughput (814\,Mbps with 1\,163 retransmits), at the cost of wasted
bandwidth and unstable flow behavior. bandwidth and unstable flow behavior.
VpnCloud warrants specific attention: its sender reports 538.8\,Mbps VpnCloud stands out: its sender reports 538.8\,Mbps
but the receiver measures only 413.4\,Mbps, leaving a 23\,\% gap (the largest but the receiver measures only 413.4\,Mbps, leaving a 23\,\% gap (the largest
in the dataset). This suggests significant in-tunnel packet loss or in the dataset). This suggests significant in-tunnel packet loss or
buffering at the VpnCloud layer that the retransmit count (857) buffering at the VpnCloud layer that the retransmit count (857)
@@ -256,10 +257,10 @@ times, which cluster into three distinct ranges.
\end{table} \end{table}
Six VPNs stay below 1.3\,ms, comfortably close to the bare-metal Six VPNs stay below 1.3\,ms, comfortably close to the bare-metal
0.60\,ms. VpnCloud is a notable result: it posts the lowest latency 0.60\,ms. VpnCloud posts the lowest latency of any VPN (1.13\,ms), below
of any VPN (1.13\,ms), edging out WireGuard (1.20\,ms), yet its WireGuard (1.20\,ms), yet its throughput tops out at only 539\,Mbps.
throughput tops out at only 539\,Mbps. Low per-packet latency does Low per-packet latency does not guarantee high bulk throughput. A
not guarantee high bulk throughput. A second group (Headscale, second group (Headscale,
Hyprspace, Yggdrasil) lands in the 1.5--2.2\,ms range, representing Hyprspace, Yggdrasil) lands in the 1.5--2.2\,ms range, representing
moderate overhead. Then there is Mycelium at 34.9\,ms, so far moderate overhead. Then there is Mycelium at 34.9\,ms, so far
removed from the rest that Section~\ref{sec:mycelium_routing} gives removed from the rest that Section~\ref{sec:mycelium_routing} gives
@@ -289,8 +290,8 @@ the CPU, not the network, is the bottleneck.
Figure~\ref{fig:latency_throughput} makes this disconnect easy to Figure~\ref{fig:latency_throughput} makes this disconnect easy to
spot. spot.
Looking at CPU efficiency more broadly, the qperf measurements The qperf measurements also reveal a wide spread in CPU usage.
reveal a wide spread. Hyprspace (55.1\,\%) and Yggdrasil Hyprspace (55.1\,\%) and Yggdrasil
(52.8\,\%) consume 5--6$\times$ as much CPU as Internal's (52.8\,\%) consume 5--6$\times$ as much CPU as Internal's
9.7\,\%. WireGuard sits at 30.8\,\%, surprisingly high for a 9.7\,\%. WireGuard sits at 30.8\,\%, surprisingly high for a
kernel-level implementation, though much of that goes to kernel-level implementation, though much of that goes to
@@ -318,20 +319,21 @@ The single-stream benchmark tests one link direction at a time. The
parallel benchmark changes this setup: all three link directions parallel benchmark changes this setup: all three link directions
(lom$\rightarrow$yuki, yuki$\rightarrow$luna, (lom$\rightarrow$yuki, yuki$\rightarrow$luna,
luna$\rightarrow$lom) run simultaneously in a circular pattern for luna$\rightarrow$lom) run simultaneously in a circular pattern for
60~seconds, each carrying ten TCP streams. Because three independent 60~seconds, each carrying one bidirectional TCP stream (six
unidirectional flows in total). Because three independent
link pairs now compete for shared tunnel resources at once, the link pairs now compete for shared tunnel resources at once, the
aggregate throughput is naturally higher than any single direction aggregate throughput is naturally higher than any single direction
alone, which is why even Internal reaches 1.50$\times$ its alone, which is why even Internal reaches 1.50$\times$ its
single-stream figure. The scaling factor (parallel throughput single-stream figure. The scaling factor (parallel throughput
divided by single-stream throughput) therefore captures two effects: divided by single-stream throughput) captures two effects:
the benefit of utilizing multiple link pairs in parallel, and how the benefit of using multiple link pairs in parallel, and how
well the VPN handles the resulting contention. well the VPN handles the resulting contention.
Table~\ref{tab:parallel_scaling} lists the results. Table~\ref{tab:parallel_scaling} lists the results.
\begin{table}[H] \begin{table}[H]
\centering \centering
\caption{Parallel TCP scaling at baseline. Scaling factor is the \caption{Parallel TCP scaling at baseline. Scaling factor is the
ratio of ten-stream to single-stream throughput. Internal's ratio of parallel to single-stream throughput. Internal's
1.50$\times$ represents the expected scaling on this hardware.} 1.50$\times$ represents the expected scaling on this hardware.}
\label{tab:parallel_scaling} \label{tab:parallel_scaling}
\begin{tabular}{lrrr} \begin{tabular}{lrrr}
@@ -357,7 +359,7 @@ Table~\ref{tab:parallel_scaling} lists the results.
The VPNs that gain the most are those most constrained in The VPNs that gain the most are those most constrained in
single-stream mode. Mycelium's 34.9\,ms RTT means a lone TCP stream single-stream mode. Mycelium's 34.9\,ms RTT means a lone TCP stream
can never fill the pipe: the bandwidth-delay product demands a window can never fill the pipe: the bandwidth-delay product demands a window
larger than any single flow maintains, so ten streams collectively larger than any single flow maintains, so multiple concurrent flows
compensate for that constraint and push throughput to 2.20$\times$ compensate for that constraint and push throughput to 2.20$\times$
the single-stream figure. Hyprspace scales almost as well the single-stream figure. Hyprspace scales almost as well
(2.18$\times$) but for a (2.18$\times$) but for a
@@ -379,8 +381,8 @@ streams: throughput drops from 706\,Mbps to 648\,Mbps
streams are clearly fighting each other for resources inside the streams are clearly fighting each other for resources inside the
tunnel. tunnel.
More streams also amplify existing retransmit problems across the More streams also amplify existing retransmit problems. Hyprspace
board. Hyprspace climbs from 4\,965 to 17\,426~retransmits; climbs from 4\,965 to 17\,426~retransmits;
VpnCloud from 857 to 6\,023. VPNs that were clean in single-stream VpnCloud from 857 to 6\,023. VPNs that were clean in single-stream
mode stay clean under load, while the stressed ones only get worse. mode stay clean under load, while the stressed ones only get worse.
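
Reviewer note: the scaling and bandwidth-delay argument in the hunks above can be checked with a few lines of arithmetic. The sketch below is illustrative only. The throughput, RTT, and retransmit figures are taken from the text, while the 1 Gbit/s link rate and the 64 KiB window are hypothetical values chosen for the example and are not stated in the thesis. The function names are also made up for this note.

```python
def scaling_factor(parallel_mbps: float, single_mbps: float) -> float:
    """Parallel throughput divided by single-stream throughput."""
    return parallel_mbps / single_mbps


def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on one flow's throughput when the window is the limit."""
    return window_bytes * 8 / rtt_s


def retransmit_growth(parallel_count: int, single_count: int) -> float:
    """How many times more retransmits the parallel run produced."""
    return parallel_count / single_count


if __name__ == "__main__":
    rtt = 0.0349          # Mycelium RTT from the text (34.9 ms)
    link = 1e9            # ASSUMPTION: 1 Gbit/s link, not stated in the text
    window = 64 * 1024    # ASSUMPTION: 64 KiB window, purely illustrative

    print(f"BDP at assumed link rate: {bdp_bytes(link, rtt) / 1e6:.2f} MB")
    print(f"One flow, assumed window: {window_limited_bps(window, rtt) / 1e6:.1f} Mbps")
    print(f"706 -> 648 Mbps scaling:  {scaling_factor(648, 706):.2f}x")
    print(f"Hyprspace retransmits:    {retransmit_growth(17426, 4965):.1f}x")
    print(f"VpnCloud retransmits:     {retransmit_growth(6023, 857):.1f}x")
```

Under these assumptions a single window-limited flow stays far below the link rate at 34.9 ms RTT, which is consistent with the claim that several concurrent flows are needed to fill the pipe. A scaling factor below 1.0, as in the 706 to 648 Mbps case, indicates contention rather than gain.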
@@ -702,8 +704,8 @@ propagate.
\label{sec:pathological} \label{sec:pathological}
Three VPNs exhibit behaviors that the aggregate numbers alone cannot Three VPNs exhibit behaviors that the aggregate numbers alone cannot
explain. The following subsections synthesize observations from the explain. The following subsections piece together observations from
preceding benchmarks into per-VPN diagnoses. earlier benchmarks into per-VPN diagnoses.
\paragraph{Hyprspace: Buffer Bloat.} \paragraph{Hyprspace: Buffer Bloat.}
\label{sec:hyprspace_bloat} \label{sec:hyprspace_bloat}

@@ -8,21 +8,21 @@
\addchaptertocentry{Zusammenfassung} \addchaptertocentry{Zusammenfassung}
Diese Arbeit evaluiert zehn Peer-to-Peer-Mesh-VPN-Implementierungen Diese Arbeit evaluiert zehn Peer-to-Peer-Mesh-VPN-Implementierungen
unter kontrollierten Netzwerkbedingungen mithilfe eines unter kontrollierten Netzwerkbedingungen mithilfe eines
reproduzierbaren, Nix-basierten Benchmark-Frameworks, das auf einem reproduzierbaren, Nix-basierten Benchmark-Frameworks, das auf einem
Deployment-System namens Clan aufbaut. Die Implementierungen reichen Deployment-System namens Clan aufbaut. Die Implementierungen reichen
von Kernel-Protokollen (WireGuard, als Referenz-Baseline) bis zu von Kernel-Protokollen (WireGuard, als Referenz-Baseline) bis zu
Userspace-Overlays (Tinc, Yggdrasil, Nebula, Hyprspace und Userspace-Overlays (Tinc, Yggdrasil, Nebula, Hyprspace und
weitere). Jede wird unter vier Beeinträchtigungsprofilen mit weitere). Jede wird unter vier Beeinträchtigungsprofilen mit
variierendem Paketverlust, Paketumsortierung, Latenz und Jitter variierendem Paketverlust, Paketumsortierung, Latenz und Jitter
getestet, was über 300 Messungen in sieben Benchmarks ergibt, von getestet, was über 300 Messungen in sieben Benchmarks ergibt, von
reinem TCP- und UDP-Durchsatz bis zu Video-Streaming und reinem TCP- und UDP-Durchsatz bis zu Video-Streaming und
Anwendungs-Downloads. Anwendungs-Downloads.
Ein zentrales Ergebnis ist, dass keine einzelne Metrik die Ein zentrales Ergebnis ist, dass keine einzelne Metrik die
VPN-Leistung vollständig erfasst: Die Rangfolge verschiebt sich je VPN-Leistung vollständig erfasst: Die Rangfolge verschiebt sich je
nachdem, ob Durchsatz, Latenz, Retransmit-Verhalten oder nachdem, ob Durchsatz, Latenz, Retransmit-Verhalten oder
Transferzeit auf Anwendungsebene gemessen wird. Unter Transferzeit auf Anwendungsebene gemessen wird. Unter
Netzwerkbeeinträchtigung übertrifft Tailscale (über Headscale) den Netzwerkbeeinträchtigung übertrifft Tailscale (über Headscale) den
Standard-Netzwerkstack des Linux-Kernels, eine Anomalie, die wir Standard-Netzwerkstack des Linux-Kernels, eine Anomalie, die wir
auf die optimierten Congestion-Control- und Pufferparameter seines auf die optimierten Congestion-Control- und Pufferparameter seines

_typos.toml Normal file

@@ -0,0 +1,20 @@
[files]
extend-exclude = [
"**/secret",
"**/value",
"**.rev",
"**/facter-report.nix",
"Chapters/Zusammenfassung.tex",
"**/key.json",
"pkgs/clan-cli/clan_lib/machines/test_suggestions.py",
]
[default.extend-words]
facter = "facter"
metalness = "metalness" # would be corrected to metallicity, not sure which one's preferred
hda = "hda" # snd_hda_intel
dynamicdns = "dynamicdns"
substituters = "substituters"
[default.extend-identifiers]
pn = "pn"