Compare commits
3 Commits
6b32967f32
...
main
| Author | SHA1 | Date | |
|---|---|---|---|
| 8a6d676e93 | |||
| f3cf653ab5 | |||
| 64aeeb5772 |
@@ -2,38 +2,33 @@
|
||||
|
||||
\label{Introduction}
|
||||
|
||||
Peer-to-peer overlay VPNs promise to restore genuine decentralization
|
||||
by enabling direct connectivity between nodes regardless of NAT or
|
||||
firewall restrictions. Yet practitioners choosing among the growing
|
||||
number of mesh VPN implementations must rely largely on anecdotal
|
||||
evidence: systematic, reproducible comparisons under realistic
|
||||
conditions are scarce.
|
||||
Peer-to-peer overlay VPNs allow nodes to connect directly regardless
|
||||
of NAT or firewall restrictions. Yet practitioners choosing among the
|
||||
growing number of mesh VPN implementations must rely largely on
|
||||
anecdotal evidence: systematic, reproducible comparisons under
|
||||
realistic conditions are scarce.
|
||||
|
||||
This thesis addresses that gap. We benchmark ten peer-to-peer VPN
|
||||
implementations across seven workloads and four network impairment
|
||||
profiles, yielding over 300 unique measurements. We complement these
|
||||
performance benchmarks with a source code analysis of each
|
||||
implementation, verified through direct engagement with the respective
|
||||
maintainers. The entire experimental framework is built on Nix, NixOS,
|
||||
and the Clan deployment system, making every result independently
|
||||
reproducible.
|
||||
profiles. We complement these performance benchmarks with a source
|
||||
code analysis of each implementation, verified by the respective
|
||||
maintainers. The entire
|
||||
experimental framework is built on Nix, NixOS, and the Clan deployment
|
||||
system, so every result is independently reproducible.
|
||||
|
||||
\section{Motivation}
|
||||
|
||||
Peer-to-peer architectures promise censorship-resistant, fault-tolerant
|
||||
infrastructure by eliminating single points of failure
|
||||
\cite{shukla_towards_2021}.
|
||||
These architectures underpin a growing range of systems, from IoT
|
||||
edge computing and content delivery networks to blockchain platforms
|
||||
like Ethereum.
|
||||
Yet realizing these benefits requires distributing nodes across
|
||||
genuinely diverse hosting entities.
|
||||
Peer-to-peer architectures can provide censorship-resistant,
|
||||
fault-tolerant infrastructure because they have no single point of
|
||||
failure \cite{shukla_towards_2021}. Blockchain platforms like Ethereum
|
||||
depend on this property, as do IoT edge networks and content delivery
|
||||
systems. But these benefits only hold when nodes are spread across
|
||||
diverse hosting entities.
|
||||
|
||||
In practice, this diversity remains illusory.
|
||||
Amazon, Hetzner, and OVH collectively host 70\% of all Ethereum nodes
|
||||
(see Figure~\ref{fig:ethernodes_hosting}),
|
||||
concentrating nominally decentralized infrastructure
|
||||
within a handful of cloud providers.
|
||||
(see Figure~\ref{fig:ethernodes_hosting}), so nominally decentralized
|
||||
infrastructure actually sits in a handful of cloud providers.
|
||||
More concerning, these providers operate under overlapping regulatory
|
||||
jurisdictions,
|
||||
predominantly the United States and the European Union.
|
||||
@@ -49,108 +44,96 @@ data disclosure, or traffic manipulation across a majority of the network.
|
||||
\label{fig:ethernodes_hosting}
|
||||
\end{figure}
|
||||
|
||||
Why does this centralization persist despite the explicit goals of
|
||||
decentralization?
|
||||
The answer lies in the practical barriers to self-hosting.
|
||||
Cloud providers offer static IP addresses and publicly routable endpoints,
|
||||
eliminating the networking complexity that plagues residential and
|
||||
small-office deployments.
|
||||
This centralization persists because self-hosting is hard. Cloud
|
||||
providers offer static IP addresses and publicly routable endpoints,
|
||||
which avoids the networking problems that residential and small-office
|
||||
deployments face.
|
||||
Most internet-connected devices sit behind Network Address Translation (NAT),
|
||||
which prevents incoming connections without explicit port forwarding
|
||||
or relay infrastructure.
|
||||
Combined with dynamic IP assignments from ISPs, maintaining stable
|
||||
peer connectivity
|
||||
from self-hosted infrastructure traditionally required significant
|
||||
technical expertise.
|
||||
Combined with dynamic IP assignments from ISPs, stable peer
|
||||
connectivity from self-hosted infrastructure has traditionally
|
||||
required significant technical expertise.
|
||||
|
||||
Overlay VPNs offer a solution to this fundamental barrier.
|
||||
By establishing encrypted tunnels that traverse NAT boundaries,
|
||||
mesh VPNs enable direct peer-to-peer connectivity without requiring
|
||||
static IP addresses or manual firewall configuration.
|
||||
Each node receives a stable virtual address within the overlay network,
|
||||
regardless of its underlying network topology.
|
||||
In practice, this means a device behind consumer-grade NAT can
|
||||
participate as a first-class peer in a distributed system,
|
||||
removing the primary technical advantage that cloud providers hold.
|
||||
Overlay VPNs solve this problem. They establish encrypted tunnels
|
||||
that traverse NAT boundaries, so peers can connect directly without
|
||||
static IP addresses or manual firewall configuration. Each node
|
||||
receives a stable virtual address within the overlay network,
|
||||
regardless of its physical network topology. A device behind
|
||||
consumer-grade NAT can therefore participate as a first-class peer
|
||||
in a distributed system.
|
||||
|
||||
The Clan deployment framework builds on this foundation.
|
||||
Clan uses Nix and NixOS to eliminate configuration drift and
|
||||
dependency conflicts, reducing operational overhead enough for a
|
||||
single administrator to reliably self-host complex distributed
|
||||
The Clan deployment framework uses Nix and NixOS to eliminate
|
||||
configuration drift and dependency conflicts, which makes it
|
||||
practical for a single administrator to self-host distributed
|
||||
services.
|
||||
Overlay VPNs are central to Clan's architecture,
|
||||
providing the secure peer connectivity that enables nodes
|
||||
to form cohesive networks regardless of their physical location or
|
||||
NAT situation.
|
||||
As illustrated in Figure~\ref{fig:vision-stages}, Clan envisions
|
||||
a web interface that enables users to design and deploy private P2P networks
|
||||
with minimal configuration, assisted by an integrated LLM
|
||||
for contextual guidance and troubleshooting.
|
||||
Overlay VPNs are central to Clan's architecture: they supply the
|
||||
peer connectivity that lets nodes form a network regardless of
|
||||
physical location or NAT situation.
|
||||
As illustrated in Figure~\ref{fig:vision-stages}, Clan plans to offer
|
||||
a web interface that lets users design and deploy private P2P networks
|
||||
with minimal configuration, assisted by an integrated LLM.
|
||||
|
||||
During the development of Clan, a recurring challenge became apparent:
|
||||
practitioners held divergent preferences for mesh VPN solutions,
|
||||
each citing different edge cases where their chosen VPN
|
||||
proved unreliable or lacked essential features.
|
||||
These discussions were grounded in anecdotal evidence rather than
|
||||
systematic evaluation, motivating the present work.
|
||||
During Clan's development, a recurring problem surfaced:
|
||||
practitioners disagreed on which mesh VPN to use, each pointing to
|
||||
different edge cases where their preferred VPN failed or lacked a
|
||||
needed feature. These discussions relied on anecdotal evidence rather
|
||||
than systematic evaluation, which motivated the present work.
|
||||
|
||||
\subsection{Related Work}
|
||||
|
||||
Existing research offers only partial coverage of this space.
|
||||
Lackorzynski et al.\ \cite{lackorzynski_comparative_2019} benchmark
|
||||
OpenVPN, IPSec, Tinc, Freelan, MACsec, and WireGuard in the context
|
||||
of industrial communication systems, measuring point-to-point
|
||||
throughput, latency, and CPU overhead. Their work does not address
|
||||
overlay network behavior such as NAT traversal or dynamic peer discovery.
|
||||
of industrial communication systems. They measure point-to-point
|
||||
throughput, latency, and CPU overhead but do not address overlay
|
||||
network behavior such as NAT traversal or dynamic peer discovery.
|
||||
The most closely related study by Kjorveziroski et al.\
|
||||
\cite{kjorveziroski_full-mesh_2024} evaluates full-mesh VPN solutions
|
||||
for distributed systems, analyzing throughput, reliability under packet
|
||||
loss, and relay behavior for VPNs including ZeroTier. However, it
|
||||
focuses primarily on solutions with a central point of failure and
|
||||
limits its workloads to synthetic iperf3 tests. This thesis extends
|
||||
that foundation by evaluating a broader set of VPN implementations
|
||||
with emphasis on fully decentralized architectures, exercising them
|
||||
under real-world workloads such as video streaming and package
|
||||
downloads, applying multiple network impairment profiles, and
|
||||
providing a fully reproducible experimental framework built on
|
||||
Nix, NixOS, and Clan.
|
||||
for distributed systems, looking at throughput, reliability under
|
||||
packet loss, and relay behavior for VPNs including ZeroTier. However,
|
||||
it focuses primarily on solutions with a central point of failure and
|
||||
limits its workloads to synthetic iperf3 tests.
|
||||
|
||||
Beyond filling this research gap, a further goal was to create a fully
|
||||
automated benchmarking framework capable of generating a public
|
||||
leaderboard, similar in spirit to the js-framework-benchmark
|
||||
(see Figure~\ref{fig:js-framework-benchmark}). By providing an
|
||||
accessible web interface with regularly updated
|
||||
results, the framework gives VPN developers a concrete, public
|
||||
baseline to measure against.
|
||||
This thesis extends that work in several directions. It evaluates a
|
||||
broader set of VPN implementations with emphasis on fully
|
||||
decentralized architectures and tests them under application-level
|
||||
workloads such as video streaming and package downloads. It also
|
||||
applies multiple network impairment profiles and provides a
|
||||
reproducible experimental framework built on Nix, NixOS, and Clan.
|
||||
|
||||
A secondary goal was to create an automated benchmarking framework
|
||||
that generates a public leaderboard, similar in spirit to the
|
||||
js-framework-benchmark (see Figure~\ref{fig:js-framework-benchmark}).
|
||||
A web interface with regularly updated results gives VPN developers a
|
||||
concrete baseline to measure against.
|
||||
|
||||
\section{Research Contribution}
|
||||
|
||||
This thesis makes the following contributions:
|
||||
|
||||
\begin{enumerate}
|
||||
\item A comprehensive benchmark of ten peer-to-peer VPN
|
||||
implementations across seven workloads (including real-world
|
||||
video streaming and package downloads) and four network
|
||||
impairment profiles, producing over 300 unique measurements.
|
||||
\item A source code analysis of all ten VPN implementations,
|
||||
combining manual code review with LLM-assisted analysis,
|
||||
followed by verification through direct engagement with the
|
||||
respective maintainers on GitHub.
|
||||
\item A fully reproducible experimental framework built on
|
||||
Nix, NixOS, and the Clan deployment system, with pinned
|
||||
dependencies, declarative system configuration, and
|
||||
deterministic cryptographic material generation, enabling
|
||||
independent replication of all results.
|
||||
\item A performance analysis demonstrating that Tailscale
|
||||
outperforms the Linux kernel's default networking stack under
|
||||
degraded conditions, and that kernel parameter tuning (Reno
|
||||
congestion control in place of CUBIC, with RACK
|
||||
disabled) yields measurable throughput improvements.
|
||||
\item A benchmark of ten peer-to-peer VPN implementations across
|
||||
seven workloads and four network impairment profiles. The workloads
|
||||
include video streaming and package downloads alongside synthetic
|
||||
throughput tests.
|
||||
\item A source code analysis of all ten VPN implementations. Manual
|
||||
code review was combined with LLM-assisted analysis and the results
|
||||
were verified by the respective maintainers on GitHub.
|
||||
\item A reproducible experimental framework built on Nix, NixOS,
|
||||
and the Clan deployment system. Dependencies are pinned and system
|
||||
configuration is declarative, down to deterministic cryptographic
|
||||
material generation. Every result can be independently replicated.
|
||||
\item A performance analysis showing that Tailscale outperforms the
|
||||
Linux kernel's default networking stack under degraded conditions,
|
||||
and that kernel parameter tuning (Reno congestion control in place
|
||||
of CUBIC, with RACK disabled) yields measurable throughput
|
||||
improvements.
|
||||
\item The discovery of several security vulnerabilities across
|
||||
the evaluated VPN implementations.
|
||||
\item An automated benchmarking framework designed for public
|
||||
leaderboard generation, intended to encourage ongoing
|
||||
optimization by VPN developers.
|
||||
\item An automated benchmarking framework that produces a public
|
||||
leaderboard, giving VPN developers a target to optimize
|
||||
against.
|
||||
\end{enumerate}
|
||||
|
||||
\begin{figure}[H]
|
||||
|
||||
@@ -7,11 +7,9 @@
|
||||
This chapter describes the methodology used to benchmark and analyze
|
||||
peer-to-peer mesh VPN implementations. The evaluation combines
|
||||
performance benchmarking under controlled network conditions with a
|
||||
structured source code analysis of each implementation. The
|
||||
benchmarking framework prioritizes reproducibility at every layer,
|
||||
from pinned dependencies and declarative system configuration to
|
||||
automated test orchestration, enabling independent verification of
|
||||
results and facilitating future comparative studies.
|
||||
structured source code analysis of each implementation. All
|
||||
dependencies, system configurations, and test procedures are pinned
|
||||
or declared so that the experiments can be independently reproduced.
|
||||
|
||||
\section{Experimental Setup}
|
||||
|
||||
@@ -29,19 +27,19 @@ identical specifications:
|
||||
RDRAND, SSE4.2
|
||||
\end{itemize}
|
||||
|
||||
The presence of hardware cryptographic acceleration is relevant because
|
||||
many VPN implementations use AES-NI for encryption, and the results
|
||||
may differ on systems without these features.
|
||||
Results may differ on systems without hardware cryptographic
|
||||
acceleration, since most of the tested VPNs offload encryption to
|
||||
AES-NI.
|
||||
|
||||
\subsection{Network Topology}
|
||||
|
||||
The three machines are connected via a direct 1 Gbps LAN on the same
|
||||
network segment. Each machine has a publicly reachable IPv4 address,
|
||||
which is used to deploy configuration changes via Clan. This baseline
|
||||
topology provides a controlled environment with minimal latency and no
|
||||
packet loss, allowing the overhead introduced by each VPN implementation
|
||||
to be measured in isolation. Figure~\ref{fig:mesh_topology} illustrates
|
||||
the full-mesh connectivity between the three machines.
|
||||
which is used to deploy configuration changes via Clan. On this
|
||||
baseline topology, latency is sub-millisecond and there is no packet
|
||||
loss, so measured overhead can be attributed to the VPN itself.
|
||||
Figure~\ref{fig:mesh_topology} illustrates the full-mesh connectivity
|
||||
between the three machines.
|
||||
|
||||
\begin{figure}[H]
|
||||
\centering
|
||||
@@ -74,8 +72,8 @@ double the per-machine values.
|
||||
|
||||
\subsection{Configuration Methodology}
|
||||
|
||||
Each VPN is built from source within the Nix flake, ensuring that all
|
||||
dependencies are pinned to exact versions. VPNs not packaged in nixpkgs
|
||||
Each VPN is built from source within the Nix flake, with all
|
||||
dependencies pinned to exact versions. VPNs not packaged in nixpkgs
|
||||
(Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
|
||||
under \texttt{pkgs/} in the flake.
|
||||
|
||||
@@ -85,13 +83,14 @@ system.
|
||||
|
||||
Generated keys are stored in version control under
|
||||
\texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
|
||||
making key material part of the reproducible configuration.
|
||||
so key material is part of the reproducible configuration.
|
||||
|
||||
\section{Benchmark Suite}
|
||||
|
||||
The benchmark suite includes both synthetic throughput tests and
|
||||
real-world workloads. This combination addresses a limitation of prior
|
||||
work that relied exclusively on iperf3.
|
||||
The benchmark suite includes synthetic throughput tests and
|
||||
application-level workloads. Prior comparative work relied exclusively
|
||||
on iperf3; the additional benchmarks here capture behavior that
|
||||
iperf3 alone misses.
|
||||
Table~\ref{tab:benchmark_suite} summarises each benchmark.
|
||||
|
||||
\begin{table}[H]
|
||||
@@ -114,8 +113,8 @@ Table~\ref{tab:benchmark_suite} summarises each benchmark.
|
||||
\end{tabular}
|
||||
\end{table}
|
||||
|
||||
The first four benchmarks use well-known network testing tools;
|
||||
the remaining three target workloads closer to real-world usage.
|
||||
The first four benchmarks use standard network testing tools;
|
||||
the remaining three test application-level workloads.
|
||||
The subsections below describe configuration details that the table
|
||||
does not capture.
|
||||
|
||||
@@ -133,48 +132,49 @@ counters.
|
||||
|
||||
\subsection{Parallel iPerf3}
|
||||
|
||||
Runs TCP streams on all three machines simultaneously in a circular
|
||||
pattern (A$\rightarrow$B, B$\rightarrow$C, C$\rightarrow$A) for
|
||||
60 seconds with zero-copy (\texttt{-Z}). This creates contention
|
||||
across the overlay network, stressing shared resources that
|
||||
single-stream tests leave idle.
|
||||
Runs one bidirectional TCP stream on all three machine pairs
|
||||
simultaneously in a circular pattern (A$\rightarrow$B,
|
||||
B$\rightarrow$C, C$\rightarrow$A) for 60 seconds with zero-copy
|
||||
(\texttt{-Z}). The three concurrent bidirectional links produce six
|
||||
unidirectional flows in total. This contention stresses shared
|
||||
resources that single-stream tests leave idle.
|
||||
|
||||
\subsection{QPerf}
|
||||
|
||||
Spawns one qperf process per CPU core, each running for 30 seconds.
|
||||
Per-core bandwidth is summed per second. Unlike the iPerf3 tests,
|
||||
QPerf targets QUIC connection-level performance, capturing time to
|
||||
first byte and connection establishment time alongside throughput.
|
||||
Per-core bandwidth is summed per second. In addition to throughput,
|
||||
QPerf reports time to first byte and connection establishment time,
|
||||
which iPerf3 does not measure.
|
||||
|
||||
\subsection{RIST Video Streaming}
|
||||
|
||||
Generates a 4K ($3840\times2160$) H.264 test pattern at 30\,fps
|
||||
(ultrafast preset, zerolatency tuning, 25\,Mbps target bitrate) with
|
||||
ffmpeg and transmits it over the RIST protocol for 30 seconds. RIST
|
||||
(Reliable Internet Stream Transport) is designed for low-latency
|
||||
video contribution over unreliable networks, making it a realistic
|
||||
test of VPN behavior under multimedia workloads. In addition to
|
||||
standard network metrics, the benchmark records encoding-side
|
||||
statistics (actual bitrate, frame rate, dropped frames) and
|
||||
RIST-specific counters (packets recovered via retransmission, quality
|
||||
score).
|
||||
(ultrafast preset, zerolatency tuning, 25\,Mbps bitrate cap) with
|
||||
ffmpeg and transmits it over the RIST protocol for 30 seconds. Because
|
||||
the synthetic test pattern is highly compressible, the actual encoding
|
||||
bitrate is approximately 3.3\,Mbps, well below the configured cap. RIST
|
||||
(Reliable Internet Stream Transport) is a protocol for low-latency
|
||||
video contribution over unreliable networks. The benchmark records
|
||||
encoding-side statistics (actual bitrate, frame rate, dropped frames)
|
||||
and RIST-specific counters (packets recovered via retransmission,
|
||||
quality score).
|
||||
|
||||
\subsection{Nix Cache Download}
|
||||
|
||||
A Harmonia Nix binary cache server on the target machine serves the
|
||||
Firefox package. The client downloads it via \texttt{nix copy}
|
||||
through the VPN, exercising many small HTTP requests rather than a
|
||||
single bulk transfer. Benchmarked with hyperfine (1 warmup run,
|
||||
2 timed runs); the local Nix store and SQLite metadata are cleared
|
||||
between runs.
|
||||
through the VPN. Unlike the iPerf3 tests, this workload issues many
|
||||
short-lived HTTP requests instead of a single bulk transfer.
|
||||
Benchmarked with hyperfine (1 warmup run, 2 timed runs); the local
|
||||
Nix store and SQLite metadata are cleared between runs.
|
||||
|
||||
\section{Network Impairment Profiles}
|
||||
|
||||
To evaluate VPN performance under different network conditions, four
|
||||
impairment profiles are defined, ranging from an unmodified baseline
|
||||
to a severely degraded link. All impairments are injected with Linux
|
||||
traffic control (\texttt{tc netem}) on the egress side of every
|
||||
machine's primary interface.
|
||||
Four impairment profiles simulate progressively worse network
|
||||
conditions, from an unmodified baseline to a severely degraded link.
|
||||
All impairments are injected with Linux traffic control
|
||||
(\texttt{tc netem}) on the egress side of every machine's primary
|
||||
interface.
|
||||
Table~\ref{tab:impairment_profiles} lists the per-machine values.
|
||||
Because impairments are applied on both ends of a connection, the
|
||||
effective round-trip impact is roughly double the listed values.
|
||||
@@ -222,14 +222,14 @@ aspect of the simulated degradation:
|
||||
\end{itemize}
|
||||
|
||||
A 30-second stabilization period follows TC application before
|
||||
measurements begin, allowing queuing disciplines to settle.
|
||||
measurements begin so that queuing disciplines can settle.
|
||||
|
||||
\section{Experimental Procedure}
|
||||
|
||||
\subsection{Automation}
|
||||
|
||||
The benchmark suite is fully automated via a Python orchestrator
|
||||
(\texttt{vpn\_bench/}). For each VPN under test, the orchestrator:
|
||||
A Python orchestrator (\texttt{vpn\_bench/}) automates the full
|
||||
benchmark suite. For each VPN under test, it:
|
||||
|
||||
\begin{enumerate}
|
||||
\item Cleans all state directories from previous VPN runs
|
||||
@@ -327,11 +327,12 @@ Each metric is summarized as a statistics dictionary containing:
|
||||
Aggregation differs by benchmark type. Benchmarks that execute
|
||||
multiple discrete runs, ping (3 runs of 100 packets each) and
|
||||
nix-cache (2 timed runs via hyperfine), first compute statistics
|
||||
within each run, then average the resulting statistics across runs.
|
||||
Concretely, if ping produces three runs with mean RTTs of
|
||||
5.1, 5.3, and 5.0\,ms, the reported average is the mean of
|
||||
those three values (5.13\,ms). The reported minimum is the
|
||||
single lowest RTT observed across all three runs.
|
||||
within each run, then aggregate across runs: averages and percentiles
|
||||
are averaged, while the reported minimum and maximum are the global
|
||||
extremes across all runs. Concretely, if ping produces three runs
|
||||
with mean RTTs of 5.1, 5.3, and 5.0\,ms, the reported average is
|
||||
the mean of those three values (5.13\,ms). The reported minimum is
|
||||
the single lowest RTT observed across all three runs.
|
||||
|
||||
Benchmarks that produce continuous per-second samples, qperf and
|
||||
RIST streaming for example, pool all per-second measurements from a single
|
||||
@@ -340,9 +341,9 @@ bandwidth is first summed across CPU cores for each second, and
|
||||
statistics are then computed over the resulting time series.
|
||||
|
||||
The analysis reports empirical percentiles (p25, p50, p75) alongside
|
||||
min/max bounds rather than parametric confidence intervals. This
|
||||
choice is deliberate: benchmark latency and throughput distributions
|
||||
are often skewed or multimodal, making assumptions of normality
|
||||
min/max bounds rather than parametric confidence intervals.
|
||||
Benchmark latency and throughput distributions are often skewed or
|
||||
multimodal, so parametric assumptions of normality would be
|
||||
unreliable. The interquartile range (p25--p75) conveys the spread of
|
||||
typical observations, while min and max capture outlier behavior.
|
||||
The nix-cache benchmark additionally reports standard deviation via
|
||||
@@ -350,9 +351,8 @@ hyperfine's built-in statistical output.
|
||||
|
||||
\section{Source Code Analysis}
|
||||
|
||||
To complement the performance benchmarks with architectural
|
||||
understanding, we conducted a structured source code analysis of
|
||||
all ten VPN implementations. The analysis followed three phases.
|
||||
We also conducted a structured source code analysis of all ten VPN
|
||||
implementations. The analysis followed three phases.
|
||||
|
||||
\subsection{Repository Collection and LLM-Assisted Overview}
|
||||
|
||||
@@ -378,23 +378,23 @@ aspects:
|
||||
\end{itemize}
|
||||
|
||||
Each agent was required to reference the specific file and line
|
||||
range supporting every claim, enabling direct verification.
|
||||
range supporting every claim so that outputs could be verified
|
||||
against the source.
|
||||
|
||||
\subsection{Manual Verification}
|
||||
|
||||
The LLM-generated overviews served as a navigational aid rather than
|
||||
a trusted source. The most important code paths identified in each
|
||||
overview were manually read and verified against the actual source
|
||||
code, correcting inaccuracies and deepening the analysis where the
|
||||
automated summaries remained superficial.
|
||||
code. Where the automated summaries were inaccurate or superficial,
|
||||
they were corrected and expanded.
|
||||
|
||||
\subsection{Feature Matrix and Maintainer Review}
|
||||
|
||||
The findings from both the automated and manual analysis were
|
||||
consolidated into a feature matrix cataloguing 131 features across
|
||||
all ten VPN implementations. The matrix covers
|
||||
protocol characteristics, cryptographic primitives, NAT traversal
|
||||
strategies, routing behavior, and security properties.
|
||||
The findings from both phases were consolidated into a feature matrix
|
||||
of 131 features across all ten VPN implementations, covering protocol
|
||||
characteristics, cryptographic primitives, NAT traversal strategies,
|
||||
routing behavior, and security properties.
|
||||
|
||||
The completed feature matrix was published and sent to the respective
|
||||
VPN maintainers for review. We incorporated their feedback as
|
||||
@@ -402,7 +402,7 @@ corrections and clarifications to the final classification.
|
||||
|
||||
\section{Reproducibility}
|
||||
|
||||
The experimental stack pins or declares every variable that could
|
||||
The experimental stack pins or declares the variables that could
|
||||
affect results.
|
||||
|
||||
\subsection{Dependency Pinning}
|
||||
@@ -412,8 +412,8 @@ cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
|
||||
Key pinned inputs include:
|
||||
|
||||
\begin{itemize}
|
||||
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, ensuring a
|
||||
single version across the dependency graph
|
||||
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, so a single
|
||||
version is used across the dependency graph
|
||||
\bitem{clan-core:} The Clan framework, pinned to a specific commit
|
||||
\bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
|
||||
exact commits
|
||||
@@ -527,9 +527,8 @@ VPNs were selected based on:
|
||||
\bitem{Linux support:} All VPNs must run on Linux.
|
||||
\end{itemize}
|
||||
|
||||
Ten VPN implementations were selected for evaluation, spanning a range
|
||||
of architectures from centralized coordination to fully decentralized
|
||||
mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
|
||||
Table~\ref{tab:vpn_selection} lists the ten VPN implementations
|
||||
selected for evaluation.
|
||||
|
||||
\begin{table}[H]
|
||||
\centering
|
||||
@@ -556,7 +555,7 @@ mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
|
||||
\end{tabular}
|
||||
\end{table}
|
||||
|
||||
WireGuard is included as a reference point despite not being a mesh VPN.
|
||||
Its minimal overhead and widespread adoption make it a useful comparison
|
||||
for understanding the cost of mesh coordination and NAT traversal logic.
|
||||
WireGuard is not a mesh VPN but is included as a reference point.
|
||||
Comparing its overhead to the mesh VPNs isolates the cost of mesh
|
||||
coordination and NAT traversal.
|
||||
|
||||
|
||||
@@ -10,8 +10,9 @@ follows the impairment profiles from ideal to degraded:
|
||||
Section~\ref{sec:baseline} establishes overhead under ideal
|
||||
conditions, then subsequent sections examine how each VPN responds to
|
||||
increasing network impairment. The chapter concludes with findings
|
||||
from the source code analysis. A recurring theme throughout is that
|
||||
no single metric captures VPN performance; the rankings shift
|
||||
from the source code analysis. A recurring theme is that no single
|
||||
metric captures VPN
|
||||
performance; the rankings shift
|
||||
depending on whether one measures throughput, latency, retransmit
|
||||
behavior, or real-world application performance.
|
||||
|
||||
@@ -184,7 +185,7 @@ opposite extreme: brute-force retransmission can still yield high
|
||||
throughput (814\,Mbps with 1\,163 retransmits), at the cost of wasted
|
||||
bandwidth and unstable flow behavior.
|
||||
|
||||
VpnCloud warrants specific attention: its sender reports 538.8\,Mbps
|
||||
VpnCloud stands out: its sender reports 538.8\,Mbps
|
||||
but the receiver measures only 413.4\,Mbps, leaving a 23\,\% gap (the largest
|
||||
in the dataset). This suggests significant in-tunnel packet loss or
|
||||
buffering at the VpnCloud layer that the retransmit count (857)
|
||||
@@ -256,10 +257,10 @@ times, which cluster into three distinct ranges.
|
||||
\end{table}
|
||||
|
||||
Six VPNs stay below 1.3\,ms, comfortably close to the bare-metal
|
||||
0.60\,ms. VpnCloud is a notable result: it posts the lowest latency
|
||||
of any VPN (1.13\,ms), edging out WireGuard (1.20\,ms), yet its
|
||||
throughput tops out at only 539\,Mbps. Low per-packet latency does
|
||||
not guarantee high bulk throughput. A second group (Headscale,
|
||||
0.60\,ms. VpnCloud posts the lowest latency of any VPN (1.13\,ms), below
|
||||
WireGuard (1.20\,ms), yet its throughput tops out at only 539\,Mbps.
|
||||
Low per-packet latency does not guarantee high bulk throughput. A
|
||||
second group (Headscale,
|
||||
Hyprspace, Yggdrasil) lands in the 1.5--2.2\,ms range, representing
|
||||
moderate overhead. Then there is Mycelium at 34.9\,ms, so far
|
||||
removed from the rest that Section~\ref{sec:mycelium_routing} gives
|
||||
@@ -289,8 +290,8 @@ the CPU, not the network, is the bottleneck.
|
||||
Figure~\ref{fig:latency_throughput} makes this disconnect easy to
|
||||
spot.
|
||||
|
||||
Looking at CPU efficiency more broadly, the qperf measurements
|
||||
reveal a wide spread. Hyprspace (55.1\,\%) and Yggdrasil
|
||||
The qperf measurements also reveal a wide spread in CPU usage.
|
||||
Hyprspace (55.1\,\%) and Yggdrasil
|
||||
(52.8\,\%) consume 5--6$\times$ as much CPU as Internal's
|
||||
9.7\,\%. WireGuard sits at 30.8\,\%, surprisingly high for a
|
||||
kernel-level implementation, though much of that goes to
|
||||
@@ -318,20 +319,21 @@ The single-stream benchmark tests one link direction at a time. The
|
||||
parallel benchmark changes this setup: all three link directions
|
||||
(lom$\rightarrow$yuki, yuki$\rightarrow$luna,
|
||||
luna$\rightarrow$lom) run simultaneously in a circular pattern for
|
||||
60~seconds, each carrying ten TCP streams. Because three independent
|
||||
60~seconds, each carrying one bidirectional TCP stream (six
|
||||
unidirectional flows in total). Because three independent
|
||||
link pairs now compete for shared tunnel resources at once, the
|
||||
aggregate throughput is naturally higher than any single direction
|
||||
alone, which is why even Internal reaches 1.50$\times$ its
|
||||
single-stream figure. The scaling factor (parallel throughput
|
||||
divided by single-stream throughput) therefore captures two effects:
|
||||
the benefit of utilizing multiple link pairs in parallel, and how
|
||||
divided by single-stream throughput) captures two effects:
|
||||
the benefit of using multiple link pairs in parallel, and how
|
||||
well the VPN handles the resulting contention.
|
||||
Table~\ref{tab:parallel_scaling} lists the results.
|
||||
|
||||
\begin{table}[H]
|
||||
\centering
|
||||
\caption{Parallel TCP scaling at baseline. Scaling factor is the
|
||||
ratio of ten-stream to single-stream throughput. Internal's
|
||||
ratio of parallel to single-stream throughput. Internal's
|
||||
1.50$\times$ represents the expected scaling on this hardware.}
|
||||
\label{tab:parallel_scaling}
|
||||
\begin{tabular}{lrrr}
|
||||
@@ -357,7 +359,7 @@ Table~\ref{tab:parallel_scaling} lists the results.
|
||||
The VPNs that gain the most are those most constrained in
|
||||
single-stream mode. Mycelium's 34.9\,ms RTT means a lone TCP stream
|
||||
can never fill the pipe: the bandwidth-delay product demands a window
|
||||
larger than any single flow maintains, so ten streams collectively
|
||||
larger than any single flow maintains, so multiple concurrent flows
|
||||
compensate for that constraint and push throughput to 2.20$\times$
|
||||
the single-stream figure. Hyprspace scales almost as well
|
||||
(2.18$\times$) but for a
|
||||
@@ -379,8 +381,8 @@ streams: throughput drops from 706\,Mbps to 648\,Mbps
|
||||
streams are clearly fighting each other for resources inside the
|
||||
tunnel.
|
||||
|
||||
More streams also amplify existing retransmit problems across the
|
||||
board. Hyprspace climbs from 4\,965 to 17\,426~retransmits;
|
||||
More streams also amplify existing retransmit problems. Hyprspace
|
||||
climbs from 4\,965 to 17\,426~retransmits;
|
||||
VpnCloud from 857 to 6\,023. VPNs that were clean in single-stream
|
||||
mode stay clean under load, while the stressed ones only get worse.
|
||||
|
||||
@@ -702,8 +704,8 @@ propagate.
|
||||
\label{sec:pathological}
|
||||
|
||||
Three VPNs exhibit behaviors that the aggregate numbers alone cannot
|
||||
explain. The following subsections synthesize observations from the
|
||||
preceding benchmarks into per-VPN diagnoses.
|
||||
explain. The following subsections piece together observations from
|
||||
earlier benchmarks into per-VPN diagnoses.
|
||||
|
||||
\paragraph{Hyprspace: Buffer Bloat.}
|
||||
\label{sec:hyprspace_bloat}
|
||||
|
||||
@@ -8,21 +8,21 @@
|
||||
\addchaptertocentry{Zusammenfassung}
|
||||
|
||||
Diese Arbeit evaluiert zehn Peer-to-Peer-Mesh-VPN-Implementierungen
|
||||
unter kontrollierten Netzwerkbedingungen mithilfe eines
|
||||
under kontrollierten Netzwerkbedingungen mithilfe eines
|
||||
reproduzierbaren, Nix-basierten Benchmark-Frameworks, das auf einem
|
||||
Deployment-System namens Clan aufbaut. Die Implementierungen reichen
|
||||
von Kernel-Protokollen (WireGuard, als Referenz-Baseline) bis zu
|
||||
von Kernel-Protokollen (WireGuard, also Reference-Baseline) bis zu
|
||||
Userspace-Overlays (Tinc, Yggdrasil, Nebula, Hyprspace und
|
||||
weitere). Jede wird unter vier Beeinträchtigungsprofilen mit
|
||||
weitere). Jede wird under vier Beeinträchtigungsprofilen mit
|
||||
variierendem Paketverlust, Paketumsortierung, Latenz und Jitter
|
||||
getestet, was über 300 Messungen in sieben Benchmarks ergibt, von
|
||||
reinem TCP- und UDP-Durchsatz bis zu Video-Streaming und
|
||||
Anwendungs-Downloads.
|
||||
|
||||
Ein zentrales Ergebnis ist, dass keine einzelne Metrik die
|
||||
In zentrales Ergebnis ist, dass keine einzelne Metrik die
|
||||
VPN-Leistung vollständig erfasst: Die Rangfolge verschiebt sich je
|
||||
nachdem, ob Durchsatz, Latenz, Retransmit-Verhalten oder
|
||||
Transferzeit auf Anwendungsebene gemessen wird. Unter
|
||||
Transferzeit auf Anwendungsebene gemessen wird. Under
|
||||
Netzwerkbeeinträchtigung übertrifft Tailscale (über Headscale) den
|
||||
Standard-Netzwerkstack des Linux-Kernels, eine Anomalie, die wir
|
||||
auf die optimierten Congestion-Control- und Pufferparameter seines
|
||||
|
||||
20
_typos.toml
Normal file
20
_typos.toml
Normal file
@@ -0,0 +1,20 @@
|
||||
[files]
|
||||
extend-exclude = [
|
||||
"**/secret",
|
||||
"**/value",
|
||||
"**.rev",
|
||||
"**/facter-report.nix",
|
||||
"Chapters/Zusammenfassung.tex",
|
||||
"**/key.json",
|
||||
"pkgs/clan-cli/clan_lib/machines/test_suggestions.py",
|
||||
]
|
||||
|
||||
[default.extend-words]
|
||||
facter = "facter"
|
||||
metalness = "metalness" # would be corrected to metallicity, not sure which one's preferred
|
||||
hda = "hda" # snd_hda_intel
|
||||
dynamicdns = "dynamicdns"
|
||||
substituters = "substituters"
|
||||
|
||||
[default.extend-identifiers]
|
||||
pn = "pn"
|
||||
Reference in New Issue
Block a user