Improved writing, made it less verbose
@@ -2,38 +2,33 @@
|
|||||||
|
|
||||||
\label{Introduction}
|
\label{Introduction}
|
||||||
|
|
||||||
Peer-to-peer overlay VPNs promise to restore genuine decentralization
|
Peer-to-peer overlay VPNs allow nodes to connect directly regardless
|
||||||
by enabling direct connectivity between nodes regardless of NAT or
|
of NAT or firewall restrictions. Yet practitioners choosing among the
|
||||||
firewall restrictions. Yet practitioners choosing among the growing
|
growing number of mesh VPN implementations must rely largely on
|
||||||
number of mesh VPN implementations must rely largely on anecdotal
|
anecdotal evidence: systematic, reproducible comparisons under
|
||||||
evidence: systematic, reproducible comparisons under realistic
|
realistic conditions are scarce.
|
||||||
conditions are scarce.
|
|
||||||
|
|
||||||
This thesis addresses that gap. We benchmark ten peer-to-peer VPN
|
This thesis addresses that gap. We benchmark ten peer-to-peer VPN
|
||||||
implementations across seven workloads and four network impairment
|
implementations across seven workloads and four network impairment
|
||||||
profiles, yielding over 300 unique measurements. We complement these
|
profiles, producing over 300 unique measurements. We complement these
|
||||||
performance benchmarks with a source code analysis of each
|
performance benchmarks with a source code analysis of each
|
||||||
implementation, verified through direct engagement with the respective
|
implementation, verified by the respective maintainers. The entire
|
||||||
maintainers. The entire experimental framework is built on Nix, NixOS,
|
experimental framework is built on Nix, NixOS, and the Clan deployment
|
||||||
and the Clan deployment system, making every result independently
|
system, so every result is independently reproducible.
|
||||||
reproducible.
|
|
||||||
|
|
||||||
\section{Motivation}
|
\section{Motivation}
|
||||||
|
|
||||||
Peer-to-peer architectures promise censorship-resistant, fault-tolerant
|
Peer-to-peer architectures can provide censorship-resistant,
|
||||||
infrastructure by eliminating single points of failure
|
fault-tolerant infrastructure because they have no single point of
|
||||||
\cite{shukla_towards_2021}.
|
failure \cite{shukla_towards_2021}. IoT edge computing, content
|
||||||
These architectures underpin a growing range of systems, from IoT
|
delivery networks, and blockchain platforms like Ethereum all rely on
|
||||||
edge computing and content delivery networks to blockchain platforms
|
some form of peer-to-peer topology. But these benefits only hold when
|
||||||
like Ethereum.
|
nodes are spread across diverse hosting entities.
|
||||||
Yet realizing these benefits requires distributing nodes across
|
|
||||||
genuinely diverse hosting entities.
|
|
||||||
|
|
||||||
In practice, this diversity remains illusory.
|
In practice, this diversity remains illusory.
|
||||||
Amazon, Hetzner, and OVH collectively host 70\% of all Ethereum nodes
|
Amazon, Hetzner, and OVH collectively host 70\% of all Ethereum nodes
|
||||||
(see Figure~\ref{fig:ethernodes_hosting}),
|
(see Figure~\ref{fig:ethernodes_hosting}), so nominally decentralized
|
||||||
concentrating nominally decentralized infrastructure
|
infrastructure actually sits in a handful of cloud providers.
|
||||||
within a handful of cloud providers.
|
|
||||||
More concerning, these providers operate under overlapping regulatory
|
More concerning, these providers operate under overlapping regulatory
|
||||||
jurisdictions,
|
jurisdictions,
|
||||||
predominantly the United States and the European Union.
|
predominantly the United States and the European Union.
|
||||||
@@ -49,50 +44,40 @@ data disclosure, or traffic manipulation across a majority of the network.
|
|||||||
\label{fig:ethernodes_hosting}
|
\label{fig:ethernodes_hosting}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
Why does this centralization persist despite the explicit goals of
|
This centralization persists because self-hosting is hard. Cloud
|
||||||
decentralization?
|
providers offer static IP addresses and publicly routable endpoints,
|
||||||
The answer lies in the practical barriers to self-hosting.
|
which avoids the networking problems that residential and small-office
|
||||||
Cloud providers offer static IP addresses and publicly routable endpoints,
|
deployments face.
|
||||||
eliminating the networking complexity that plagues residential and
|
|
||||||
small-office deployments.
|
|
||||||
Most internet-connected devices sit behind Network Address Translation (NAT),
|
Most internet-connected devices sit behind Network Address Translation (NAT),
|
||||||
which prevents incoming connections without explicit port forwarding
|
which prevents incoming connections without explicit port forwarding
|
||||||
or relay infrastructure.
|
or relay infrastructure.
|
||||||
Combined with dynamic IP assignments from ISPs, maintaining stable
|
Combined with dynamic IP assignments from ISPs, stable peer
|
||||||
peer connectivity
|
connectivity from self-hosted infrastructure has traditionally
|
||||||
from self-hosted infrastructure traditionally required significant
|
required significant technical expertise.
|
||||||
technical expertise.
|
|
||||||
|
|
||||||
Overlay VPNs offer a solution to this fundamental barrier.
|
Overlay VPNs solve this problem. They establish encrypted tunnels
|
||||||
By establishing encrypted tunnels that traverse NAT boundaries,
|
that traverse NAT boundaries, so peers can connect directly without
|
||||||
mesh VPNs enable direct peer-to-peer connectivity without requiring
|
static IP addresses or manual firewall configuration. Each node
|
||||||
static IP addresses or manual firewall configuration.
|
receives a stable virtual address within the overlay network,
|
||||||
Each node receives a stable virtual address within the overlay network,
|
regardless of its physical network topology. A device behind
|
||||||
regardless of its underlying network topology.
|
consumer-grade NAT can therefore participate as a first-class peer
|
||||||
In practice, this means a device behind consumer-grade NAT can
|
in a distributed system.
|
||||||
participate as a first-class peer in a distributed system,
|
|
||||||
removing the primary technical advantage that cloud providers hold.
|
|
||||||
|
|
||||||
The Clan deployment framework builds on this foundation.
|
The Clan deployment framework uses Nix and NixOS to eliminate
|
||||||
Clan uses Nix and NixOS to eliminate configuration drift and
|
configuration drift and dependency conflicts. The result is that a
|
||||||
dependency conflicts, reducing operational overhead enough for a
|
single administrator can reliably self-host distributed services.
|
||||||
single administrator to reliably self-host complex distributed
|
Overlay VPNs are central to Clan's architecture: they supply the
|
||||||
services.
|
peer connectivity that lets nodes form a network regardless of
|
||||||
Overlay VPNs are central to Clan's architecture,
|
physical location or NAT situation.
|
||||||
providing the secure peer connectivity that enables nodes
|
As illustrated in Figure~\ref{fig:vision-stages}, Clan plans to offer
|
||||||
to form cohesive networks regardless of their physical location or
|
a web interface that lets users design and deploy private P2P networks
|
||||||
NAT situation.
|
with minimal configuration, assisted by an integrated LLM.
|
||||||
As illustrated in Figure~\ref{fig:vision-stages}, Clan envisions
|
|
||||||
a web interface that enables users to design and deploy private P2P networks
|
|
||||||
with minimal configuration, assisted by an integrated LLM
|
|
||||||
for contextual guidance and troubleshooting.
|
|
||||||
|
|
||||||
During the development of Clan, a recurring challenge became apparent:
|
During Clan's development, a recurring problem surfaced:
|
||||||
practitioners held divergent preferences for mesh VPN solutions,
|
practitioners disagreed on which mesh VPN to use, each pointing to
|
||||||
each citing different edge cases where their chosen VPN
|
different edge cases where their preferred VPN failed or lacked a
|
||||||
proved unreliable or lacked essential features.
|
needed feature. These discussions relied on anecdotal evidence rather
|
||||||
These discussions were grounded in anecdotal evidence rather than
|
than systematic evaluation, which motivated the present work.
|
||||||
systematic evaluation, motivating the present work.
|
|
||||||
|
|
||||||
\subsection{Related Work}
|
\subsection{Related Work}
|
||||||
|
|
||||||
@@ -108,49 +93,45 @@ for distributed systems, analyzing throughput, reliability under packet
|
|||||||
loss, and relay behavior for VPNs including ZeroTier. However, it
|
loss, and relay behavior for VPNs including ZeroTier. However, it
|
||||||
focuses primarily on solutions with a central point of failure and
|
focuses primarily on solutions with a central point of failure and
|
||||||
limits its workloads to synthetic iperf3 tests. This thesis extends
|
limits its workloads to synthetic iperf3 tests. This thesis extends
|
||||||
that foundation by evaluating a broader set of VPN implementations
|
that work: it evaluates a broader set of VPN implementations with
|
||||||
with emphasis on fully decentralized architectures, exercising them
|
emphasis on fully decentralized architectures, tests them under
|
||||||
under real-world workloads such as video streaming and package
|
application-level workloads (video streaming, package downloads),
|
||||||
downloads, applying multiple network impairment profiles, and
|
applies multiple network impairment profiles, and provides a
|
||||||
providing a fully reproducible experimental framework built on
|
reproducible experimental framework built on Nix, NixOS, and Clan.
|
||||||
Nix, NixOS, and Clan.
|
|
||||||
|
|
||||||
Beyond filling this research gap, a further goal was to create a fully
|
A secondary goal was to create an automated benchmarking framework
|
||||||
automated benchmarking framework capable of generating a public
|
that generates a public leaderboard, similar in spirit to the
|
||||||
leaderboard, similar in spirit to the js-framework-benchmark
|
js-framework-benchmark (see Figure~\ref{fig:js-framework-benchmark}).
|
||||||
(see Figure~\ref{fig:js-framework-benchmark}). By providing an
|
A web interface with regularly updated results gives VPN developers a
|
||||||
accessible web interface with regularly updated
|
concrete baseline to measure against.
|
||||||
results, the framework gives VPN developers a concrete, public
|
|
||||||
baseline to measure against.
|
|
||||||
|
|
||||||
\section{Research Contribution}
|
\section{Research Contribution}
|
||||||
|
|
||||||
This thesis makes the following contributions:
|
This thesis makes the following contributions:
|
||||||
|
|
||||||
\begin{enumerate}
|
\begin{enumerate}
|
||||||
\item A comprehensive benchmark of ten peer-to-peer VPN
|
\item A benchmark of ten peer-to-peer VPN implementations across
|
||||||
implementations across seven workloads (including real-world
|
seven workloads (including video streaming and package downloads)
|
||||||
video streaming and package downloads) and four network
|
and four network impairment profiles, with over 300 unique
|
||||||
impairment profiles, producing over 300 unique measurements.
|
measurements.
|
||||||
\item A source code analysis of all ten VPN implementations,
|
\item A source code analysis of all ten VPN implementations. Manual
|
||||||
combining manual code review with LLM-assisted analysis,
|
code review was combined with LLM-assisted analysis and the results
|
||||||
followed by verification through direct engagement with the
|
were verified by the respective maintainers on GitHub.
|
||||||
respective maintainers on GitHub.
|
\item A reproducible experimental framework built on Nix, NixOS,
|
||||||
\item A fully reproducible experimental framework built on
|
and the Clan deployment system. All dependencies are pinned,
|
||||||
Nix, NixOS, and the Clan deployment system, with pinned
|
system configuration is declarative, and cryptographic material
|
||||||
dependencies, declarative system configuration, and
|
is generated deterministically, so every result can be
|
||||||
deterministic cryptographic material generation, enabling
|
independently replicated.
|
||||||
independent replication of all results.
|
\item A performance analysis showing that Tailscale outperforms the
|
||||||
\item A performance analysis demonstrating that Tailscale
|
Linux kernel's default networking stack under degraded conditions,
|
||||||
outperforms the Linux kernel's default networking stack under
|
and that kernel parameter tuning (Reno congestion control in place
|
||||||
degraded conditions, and that kernel parameter tuning (Reno
|
of CUBIC, with RACK disabled) yields measurable throughput
|
||||||
congestion control in place of CUBIC, with RACK
|
improvements.
|
||||||
disabled) yields measurable throughput improvements.
|
|
||||||
\item The discovery of several security vulnerabilities across
|
\item The discovery of several security vulnerabilities across
|
||||||
the evaluated VPN implementations.
|
the evaluated VPN implementations.
|
||||||
\item An automated benchmarking framework designed for public
|
\item An automated benchmarking framework that produces a public
|
||||||
leaderboard generation, intended to encourage ongoing
|
leaderboard, giving VPN developers a target to optimize
|
||||||
optimization by VPN developers.
|
against.
|
||||||
\end{enumerate}
|
\end{enumerate}
|
||||||
|
|
||||||
\begin{figure}[H]
|
\begin{figure}[H]
|
||||||
|
|||||||
@@ -7,11 +7,9 @@
|
|||||||
This chapter describes the methodology used to benchmark and analyze
|
This chapter describes the methodology used to benchmark and analyze
|
||||||
peer-to-peer mesh VPN implementations. The evaluation combines
|
peer-to-peer mesh VPN implementations. The evaluation combines
|
||||||
performance benchmarking under controlled network conditions with a
|
performance benchmarking under controlled network conditions with a
|
||||||
structured source code analysis of each implementation. The
|
structured source code analysis of each implementation. All
|
||||||
benchmarking framework prioritizes reproducibility at every layer,
|
dependencies, system configurations, and test procedures are pinned
|
||||||
from pinned dependencies and declarative system configuration to
|
or declared so that the experiments can be independently reproduced.
|
||||||
automated test orchestration, enabling independent verification of
|
|
||||||
results and facilitating future comparative studies.
|
|
||||||
|
|
||||||
\section{Experimental Setup}
|
\section{Experimental Setup}
|
||||||
|
|
||||||
@@ -29,19 +27,19 @@ identical specifications:
|
|||||||
RDRAND, SSE4.2
|
RDRAND, SSE4.2
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
The presence of hardware cryptographic acceleration is relevant because
|
Results may differ on systems without hardware cryptographic
|
||||||
many VPN implementations use AES-NI for encryption, and the results
|
acceleration, since most of the tested VPNs offload encryption to
|
||||||
may differ on systems without these features.
|
AES-NI.
|
||||||
|
|
||||||
\subsection{Network Topology}
|
\subsection{Network Topology}
|
||||||
|
|
||||||
The three machines are connected via a direct 1 Gbps LAN on the same
|
The three machines are connected via a direct 1 Gbps LAN on the same
|
||||||
network segment. Each machine has a publicly reachable IPv4 address,
|
network segment. Each machine has a publicly reachable IPv4 address,
|
||||||
which is used to deploy configuration changes via Clan. This baseline
|
which is used to deploy configuration changes via Clan. On this
|
||||||
topology provides a controlled environment with minimal latency and no
|
baseline topology, latency is sub-millisecond and there is no packet
|
||||||
packet loss, allowing the overhead introduced by each VPN implementation
|
loss, so measured overhead can be attributed to the VPN itself.
|
||||||
to be measured in isolation. Figure~\ref{fig:mesh_topology} illustrates
|
Figure~\ref{fig:mesh_topology} illustrates the full-mesh connectivity
|
||||||
the full-mesh connectivity between the three machines.
|
between the three machines.
|
||||||
|
|
||||||
\begin{figure}[H]
|
\begin{figure}[H]
|
||||||
\centering
|
\centering
|
||||||
@@ -74,8 +72,8 @@ double the per-machine values.
|
|||||||
|
|
||||||
\subsection{Configuration Methodology}
|
\subsection{Configuration Methodology}
|
||||||
|
|
||||||
Each VPN is built from source within the Nix flake, ensuring that all
|
Each VPN is built from source within the Nix flake, with all
|
||||||
dependencies are pinned to exact versions. VPNs not packaged in nixpkgs
|
dependencies pinned to exact versions. VPNs not packaged in nixpkgs
|
||||||
(Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
|
(Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
|
||||||
under \texttt{pkgs/} in the flake.
|
under \texttt{pkgs/} in the flake.
|
||||||
|
|
||||||
@@ -85,13 +83,14 @@ system.
|
|||||||
|
|
||||||
Generated keys are stored in version control under
|
Generated keys are stored in version control under
|
||||||
\texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
|
\texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
|
||||||
making key material part of the reproducible configuration.
|
so key material is part of the reproducible configuration.
|
||||||
|
|
||||||
\section{Benchmark Suite}
|
\section{Benchmark Suite}
|
||||||
|
|
||||||
The benchmark suite includes both synthetic throughput tests and
|
The benchmark suite includes synthetic throughput tests and
|
||||||
real-world workloads. This combination addresses a limitation of prior
|
application-level workloads. Prior comparative work relied exclusively
|
||||||
work that relied exclusively on iperf3.
|
on iperf3; the additional benchmarks here capture behavior that
|
||||||
|
iperf3 alone misses.
|
||||||
Table~\ref{tab:benchmark_suite} summarises each benchmark.
|
Table~\ref{tab:benchmark_suite} summarises each benchmark.
|
||||||
|
|
||||||
\begin{table}[H]
|
\begin{table}[H]
|
||||||
@@ -114,8 +113,8 @@ Table~\ref{tab:benchmark_suite} summarises each benchmark.
|
|||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
The first four benchmarks use well-known network testing tools;
|
The first four benchmarks use standard network testing tools;
|
||||||
the remaining three target workloads closer to real-world usage.
|
the remaining three test application-level workloads.
|
||||||
The subsections below describe configuration details that the table
|
The subsections below describe configuration details that the table
|
||||||
does not capture.
|
does not capture.
|
||||||
|
|
||||||
@@ -133,48 +132,49 @@ counters.
|
|||||||
|
|
||||||
\subsection{Parallel iPerf3}
|
\subsection{Parallel iPerf3}
|
||||||
|
|
||||||
Runs TCP streams on all three machines simultaneously in a circular
|
Runs one bidirectional TCP stream on all three machine pairs
|
||||||
pattern (A$\rightarrow$B, B$\rightarrow$C, C$\rightarrow$A) for
|
simultaneously in a circular pattern (A$\rightarrow$B,
|
||||||
60 seconds with zero-copy (\texttt{-Z}). This creates contention
|
B$\rightarrow$C, C$\rightarrow$A) for 60 seconds with zero-copy
|
||||||
across the overlay network, stressing shared resources that
|
(\texttt{-Z}). The three concurrent bidirectional links produce six
|
||||||
single-stream tests leave idle.
|
unidirectional flows in total. This contention stresses shared
|
||||||
|
resources that single-stream tests leave idle.
|
||||||
|
|
||||||
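The circular pattern can be sketched as follows. This is an illustration, not the orchestrator's actual code: the host names and helper functions are hypothetical, and using \texttt{--bidir} (available in iperf3 3.7 and later) to obtain the bidirectional stream is an assumption. Only \texttt{-Z} and the 60-second duration are taken from the description above.

```python
def circular_pairs(hosts):
    """Pair each host with its successor: A->B, B->C, C->A."""
    return [(hosts[i], hosts[(i + 1) % len(hosts)]) for i in range(len(hosts))]


def iperf3_client_cmd(server, seconds=60):
    """Build an iperf3 client argv: timed run, zero-copy, bidirectional.

    --bidir is an assumption about how the bidirectional stream is obtained.
    """
    return ["iperf3", "-c", server, "-t", str(seconds), "-Z", "--bidir"]


hosts = ["A", "B", "C"]  # hypothetical machine names
pairs = circular_pairs(hosts)

# One client command per machine, all started at the same time.
commands = {client: iperf3_client_cmd(server) for client, server in pairs}

# Each bidirectional link carries two unidirectional flows.
flows = 2 * len(pairs)
```

With three machines this yields three client invocations and six unidirectional flows, matching the count stated above.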
\subsection{QPerf}
|
\subsection{QPerf}
|
||||||
|
|
||||||
Spawns one qperf process per CPU core, each running for 30 seconds.
|
Spawns one qperf process per CPU core, each running for 30 seconds.
|
||||||
Per-core bandwidth is summed per second. Unlike the iPerf3 tests,
|
Per-core bandwidth is summed per second. In addition to throughput,
|
||||||
QPerf targets QUIC connection-level performance, capturing time to
|
QPerf reports time to first byte and connection establishment time,
|
||||||
first byte and connection establishment time alongside throughput.
|
which iPerf3 does not measure.
|
||||||
|
|
||||||
\subsection{RIST Video Streaming}
|
\subsection{RIST Video Streaming}
|
||||||
|
|
||||||
Generates a 4K ($3840\times2160$) H.264 test pattern at 30\,fps
|
Generates a 4K ($3840\times2160$) H.264 test pattern at 30\,fps
|
||||||
(ultrafast preset, zerolatency tuning, 25\,Mbps target bitrate) with
|
(ultrafast preset, zerolatency tuning, 25\,Mbps bitrate cap) with
|
||||||
ffmpeg and transmits it over the RIST protocol for 30 seconds. RIST
|
ffmpeg and transmits it over the RIST protocol for 30 seconds. Because
|
||||||
(Reliable Internet Stream Transport) is designed for low-latency
|
the synthetic test pattern is highly compressible, the actual encoding
|
||||||
video contribution over unreliable networks, making it a realistic
|
bitrate is approximately 3.3\,Mbps, well below the configured cap. RIST
|
||||||
test of VPN behavior under multimedia workloads. In addition to
|
(Reliable Internet Stream Transport) is a protocol for low-latency
|
||||||
standard network metrics, the benchmark records encoding-side
|
video contribution over unreliable networks. The benchmark records
|
||||||
statistics (actual bitrate, frame rate, dropped frames) and
|
encoding-side statistics (actual bitrate, frame rate, dropped frames)
|
||||||
RIST-specific counters (packets recovered via retransmission, quality
|
and RIST-specific counters (packets recovered via retransmission,
|
||||||
score).
|
quality score).
|
||||||
|
|
||||||
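A sender invocation along these lines would produce such a stream. This is a sketch under assumptions: the \texttt{testsrc2} source, the buffer size, the MPEG-TS container, and the target address are illustrative choices that the text does not specify, and the ffmpeg build must include librist for the \texttt{rist://} protocol to be available.

```python
def rist_stream_cmd(target_url, seconds=30):
    """Build an ffmpeg argv that encodes a 4K test pattern and sends it over RIST."""
    return [
        "ffmpeg",
        "-f", "lavfi",                              # synthetic input (assumed source)
        "-i", "testsrc2=size=3840x2160:rate=30",    # 4K pattern at 30 fps
        "-t", str(seconds),                         # stop after the test duration
        "-c:v", "libx264",                          # H.264 encoder
        "-preset", "ultrafast",
        "-tune", "zerolatency",
        "-maxrate", "25M", "-bufsize", "50M",       # bitrate cap; buffer size assumed
        "-f", "mpegts",                             # container choice is an assumption
        target_url,
    ]


# Hypothetical receiver address inside the overlay network.
cmd = rist_stream_cmd("rist://10.0.0.2:5000")
```

Because \texttt{-maxrate} is only an upper bound, a highly compressible pattern can encode far below it, which is consistent with the gap between the configured cap and the observed bitrate.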
\subsection{Nix Cache Download}
|
\subsection{Nix Cache Download}
|
||||||
|
|
||||||
A Harmonia Nix binary cache server on the target machine serves the
|
A Harmonia Nix binary cache server on the target machine serves the
|
||||||
Firefox package. The client downloads it via \texttt{nix copy}
|
Firefox package. The client downloads it via \texttt{nix copy}
|
||||||
through the VPN, exercising many small HTTP requests rather than a
|
through the VPN. Unlike the iPerf3 tests, this workload issues many
|
||||||
single bulk transfer. Benchmarked with hyperfine (1 warmup run,
|
short-lived HTTP requests instead of a single bulk transfer.
|
||||||
2 timed runs); the local Nix store and SQLite metadata are cleared
|
Benchmarked with hyperfine (1 warmup run, 2 timed runs); the local
|
||||||
between runs.
|
Nix store and SQLite metadata are cleared between runs.
|
||||||
|
|
||||||
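The hyperfine invocation could look roughly like the sketch below. The cache URL, the store path placeholder, and the \texttt{clear-local-store.sh} preparation script are hypothetical names introduced for illustration; only the warmup and run counts come from the text.

```python
import shlex


def hyperfine_cmd(cache_url, store_path, prepare_cmd):
    """Build a hyperfine argv: 1 warmup run, 2 timed runs, cleanup before each run."""
    copy = f"nix copy --from {shlex.quote(cache_url)} {shlex.quote(store_path)}"
    return [
        "hyperfine",
        "--warmup", "1",
        "--runs", "2",
        "--prepare", prepare_cmd,  # executed before every run, warmup included
        copy,
    ]


cmd = hyperfine_cmd(
    "http://10.0.0.2:5000",          # hypothetical Harmonia endpoint
    "/nix/store/EXAMPLE-firefox",    # placeholder, not a real store path
    "./clear-local-store.sh",        # hypothetical script that clears store and metadata
)
```

Running the cleanup through \texttt{--prepare} keeps it out of the timed region, so only the download itself is measured.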
\section{Network Impairment Profiles}
|
\section{Network Impairment Profiles}
|
||||||
|
|
||||||
To evaluate VPN performance under different network conditions, four
|
Four impairment profiles simulate progressively worse network
|
||||||
impairment profiles are defined, ranging from an unmodified baseline
|
conditions, from an unmodified baseline to a severely degraded link.
|
||||||
to a severely degraded link. All impairments are injected with Linux
|
All impairments are injected with Linux traffic control
|
||||||
traffic control (\texttt{tc netem}) on the egress side of every
|
(\texttt{tc netem}) on the egress side of every machine's primary
|
||||||
machine's primary interface.
|
interface.
|
||||||
Table~\ref{tab:impairment_profiles} lists the per-machine values.
|
Table~\ref{tab:impairment_profiles} lists the per-machine values.
|
||||||
Because impairments are applied on both ends of a connection, the
|
Because impairments are applied on both ends of a connection, the
|
||||||
effective round-trip impact is roughly double the listed values.
|
effective round-trip impact is roughly double the listed values.
|
||||||
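The doubling can be made concrete with a small sketch. The interface name and the numeric values below are hypothetical and are not taken from the table; the loss calculation additionally assumes that drops in the two directions are independent.

```python
def netem_cmd(iface, delay_ms, loss_pct):
    """Build a tc argv that installs a netem qdisc on an interface's egress."""
    return [
        "tc", "qdisc", "replace", "dev", iface, "root", "netem",
        "delay", f"{delay_ms}ms",
        "loss", f"{loss_pct}%",
    ]


def round_trip_effect(delay_ms, loss_pct):
    """Combine per-machine egress impairments over a request/response pair.

    Each direction crosses exactly one impaired egress, so the delay adds up
    and the packet must survive two independent loss stages.
    """
    p = loss_pct / 100
    return {
        "added_rtt_ms": 2 * delay_ms,
        "loss_pct": 100 * (1 - (1 - p) ** 2),
    }


cmd = netem_cmd("eth0", delay_ms=10, loss_pct=1)       # hypothetical values
effect = round_trip_effect(delay_ms=10, loss_pct=1)    # 20 ms added RTT, ~1.99% loss
```

For small loss rates $p$, $1-(1-p)^2 = 2p - p^2 \approx 2p$, which is why the round-trip impact is roughly, not exactly, double the listed value.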
@@ -222,14 +222,14 @@ aspect of the simulated degradation:
|
|||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
A 30-second stabilization period follows TC application before
|
A 30-second stabilization period follows TC application before
|
||||||
measurements begin, allowing queuing disciplines to settle.
|
measurements begin so that queuing disciplines can settle.
|
||||||
|
|
||||||
\section{Experimental Procedure}
|
\section{Experimental Procedure}
|
||||||
|
|
||||||
\subsection{Automation}
|
\subsection{Automation}
|
||||||
|
|
||||||
The benchmark suite is fully automated via a Python orchestrator
|
A Python orchestrator (\texttt{vpn\_bench/}) automates the full
|
||||||
(\texttt{vpn\_bench/}). For each VPN under test, the orchestrator:
|
benchmark suite. For each VPN under test, it:
|
||||||
|
|
||||||
\begin{enumerate}
|
\begin{enumerate}
|
||||||
\item Cleans all state directories from previous VPN runs
|
\item Cleans all state directories from previous VPN runs
|
||||||
@@ -327,11 +327,12 @@ Each metric is summarized as a statistics dictionary containing:
|
|||||||
Aggregation differs by benchmark type. Benchmarks that execute
|
Aggregation differs by benchmark type. Benchmarks that execute
|
||||||
multiple discrete runs, ping (3 runs of 100 packets each) and
|
multiple discrete runs, ping (3 runs of 100 packets each) and
|
||||||
nix-cache (2 timed runs via hyperfine), first compute statistics
|
nix-cache (2 timed runs via hyperfine), first compute statistics
|
||||||
within each run, then average the resulting statistics across runs.
|
within each run, then aggregate across runs: averages and percentiles
|
||||||
Concretely, if ping produces three runs with mean RTTs of
|
are averaged, while the reported minimum and maximum are the global
|
||||||
5.1, 5.3, and 5.0\,ms, the reported average is the mean of
|
extremes across all runs. Concretely, if ping produces three runs
|
||||||
those three values (5.13\,ms). The reported minimum is the
|
with mean RTTs of 5.1, 5.3, and 5.0\,ms, the reported average is
|
||||||
single lowest RTT observed across all three runs.
|
the mean of those three values (5.13\,ms). The reported minimum is
|
||||||
|
the single lowest RTT observed across all three runs.
|
||||||
|
|
||||||
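The two-stage aggregation for discrete runs can be sketched as follows. The dictionary keys, the helper names, and the RTT samples are assumptions chosen for illustration; the samples are constructed so that the per-run means match the 5.1, 5.3, and 5.0\,ms of the example above.

```python
from statistics import mean, quantiles


def run_stats(samples):
    """Summarize a single run: extremes, average, and quartiles."""
    p25, p50, p75 = quantiles(samples, n=4, method="inclusive")
    return {
        "min": min(samples), "max": max(samples), "average": mean(samples),
        "p25": p25, "p50": p50, "p75": p75,
    }


def aggregate_runs(runs):
    """Average the per-run averages and percentiles; take global min and max."""
    per_run = [run_stats(r) for r in runs]
    agg = {
        key: mean([s[key] for s in per_run])
        for key in ("average", "p25", "p50", "p75")
    }
    agg["min"] = min(s["min"] for s in per_run)
    agg["max"] = max(s["max"] for s in per_run)
    return agg


# Hypothetical RTT samples in ms; per-run means are 5.1, 5.3, and 5.0.
runs = [[4.9, 5.1, 5.3], [5.0, 5.3, 5.6], [4.8, 5.0, 5.2]]
result = aggregate_runs(runs)
```

Here the reported average is the mean of the three run means, while the minimum (4.8\,ms) and maximum (5.6\,ms) are single observations taken across all runs.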
Benchmarks that produce continuous per-second samples, qperf and
|
Benchmarks that produce continuous per-second samples, qperf and
|
||||||
RIST streaming for example, pool all per-second measurements from a single
|
RIST streaming for example, pool all per-second measurements from a single
|
||||||
@@ -340,9 +341,9 @@ bandwidth is first summed across CPU cores for each second, and
|
|||||||
statistics are then computed over the resulting time series.
|
statistics are then computed over the resulting time series.
|
||||||
|
|
||||||
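For the continuous benchmarks, the per-second pooling might look like the sketch below. The bandwidth values, the two-core layout, and the function names are hypothetical; the quartile method is also an assumption, since the text does not say how percentiles are interpolated.

```python
from statistics import quantiles


def sum_per_second(per_core):
    """Sum bandwidth across cores for each second (one sample list per core)."""
    return [sum(second) for second in zip(*per_core)]


def summarize(series):
    """Empirical quartiles plus min/max over the pooled time series."""
    p25, p50, p75 = quantiles(series, n=4, method="inclusive")
    return {"min": min(series), "p25": p25, "p50": p50, "p75": p75, "max": max(series)}


# Hypothetical per-core bandwidth in Mbit/s: two cores, four seconds each.
per_core = [
    [100, 120, 110, 90],
    [105, 115, 95, 100],
]
series = sum_per_second(per_core)
summary = summarize(series)
```

Summing before computing statistics means the percentiles describe aggregate throughput per second, not the behavior of any individual core.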
The analysis reports empirical percentiles (p25, p50, p75) alongside
|
The analysis reports empirical percentiles (p25, p50, p75) alongside
|
||||||
min/max bounds rather than parametric confidence intervals. This
|
min/max bounds rather than parametric confidence intervals.
|
||||||
choice is deliberate: benchmark latency and throughput distributions
|
Benchmark latency and throughput distributions are often skewed or
|
||||||
are often skewed or multimodal, making assumptions of normality
|
multimodal, so parametric assumptions of normality would be
|
||||||
unreliable. The interquartile range (p25--p75) conveys the spread of
|
unreliable. The interquartile range (p25--p75) conveys the spread of
|
||||||
typical observations, while min and max capture outlier behavior.
|
typical observations, while min and max capture outlier behavior.
|
||||||
The nix-cache benchmark additionally reports standard deviation via
|
The nix-cache benchmark additionally reports standard deviation via
|
||||||
@@ -350,9 +351,8 @@ hyperfine's built-in statistical output.
|
|||||||
|
|
||||||
\section{Source Code Analysis}
|
\section{Source Code Analysis}
|
||||||
|
|
||||||
To complement the performance benchmarks with architectural
|
We also conducted a structured source code analysis of all ten VPN
|
||||||
understanding, we conducted a structured source code analysis of
|
implementations. The analysis followed three phases.
|
||||||
all ten VPN implementations. The analysis followed three phases.
|
|
||||||
|
|
||||||
\subsection{Repository Collection and LLM-Assisted Overview}
|
\subsection{Repository Collection and LLM-Assisted Overview}
|
||||||
|
|
||||||
@@ -378,23 +378,23 @@ aspects:
|
|||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
Each agent was required to reference the specific file and line
|
Each agent was required to reference the specific file and line
|
||||||
range supporting every claim, enabling direct verification.
|
range supporting every claim so that outputs could be verified
|
||||||
|
against the source.
|
||||||
|
|
||||||
\subsection{Manual Verification}
|
\subsection{Manual Verification}
|
||||||
|
|
||||||
The LLM-generated overviews served as a navigational aid rather than
|
The LLM-generated overviews served as a navigational aid rather than
|
||||||
a trusted source. The most important code paths identified in each
|
a trusted source. The most important code paths identified in each
|
||||||
overview were manually read and verified against the actual source
|
overview were manually read and verified against the actual source
|
||||||
code, correcting inaccuracies and deepening the analysis where the
|
code. Where the automated summaries were inaccurate or superficial,
|
||||||
automated summaries remained superficial.
|
they were corrected and expanded.
|
||||||
|
|
||||||
\subsection{Feature Matrix and Maintainer Review}
|
\subsection{Feature Matrix and Maintainer Review}
|
||||||
|
|
||||||
The findings from both the automated and manual analysis were
|
The findings from both phases were consolidated into a feature matrix
|
||||||
consolidated into a feature matrix cataloguing 131 features across
|
of 131 features across all ten VPN implementations, covering protocol
|
||||||
all ten VPN implementations. The matrix covers
|
characteristics, cryptographic primitives, NAT traversal strategies,
|
||||||
protocol characteristics, cryptographic primitives, NAT traversal
|
routing behavior, and security properties.
|
||||||
strategies, routing behavior, and security properties.
|
|
||||||
|
|
||||||
The completed feature matrix was published and sent to the respective
|
The completed feature matrix was published and sent to the respective
|
||||||
VPN maintainers for review. We incorporated their feedback as
|
VPN maintainers for review. We incorporated their feedback as
|
||||||
@@ -402,7 +402,7 @@ corrections and clarifications to the final classification.
|
|||||||
|
|
||||||
\section{Reproducibility}
|
\section{Reproducibility}
|
||||||
|
|
||||||
The experimental stack pins or declares every variable that could
|
The experimental stack pins or declares the variables that could
|
||||||
affect results.
|
affect results.
|
||||||
|
|
||||||
\subsection{Dependency Pinning}
|
\subsection{Dependency Pinning}
|
||||||
@@ -412,8 +412,8 @@ cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
|
|||||||
Key pinned inputs include:
|
Key pinned inputs include:
|
||||||
|
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, ensuring a
|
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, so a single
|
||||||
single version across the dependency graph
|
version is used across the dependency graph
|
||||||
\bitem{clan-core:} The Clan framework, pinned to a specific commit
|
\bitem{clan-core:} The Clan framework, pinned to a specific commit
|
||||||
\bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
|
\bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
|
||||||
exact commits
|
exact commits
|
||||||
@@ -527,9 +527,8 @@ VPNs were selected based on:
|
|||||||
\bitem{Linux support:} All VPNs must run on Linux.
|
\bitem{Linux support:} All VPNs must run on Linux.
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
Ten VPN implementations were selected for evaluation, spanning a range
|
Table~\ref{tab:vpn_selection} lists the ten VPN implementations
|
||||||
of architectures from centralized coordination to fully decentralized
|
selected for evaluation.
|
||||||
mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
|
|
||||||
|
|
||||||
\begin{table}[H]
|
\begin{table}[H]
|
||||||
\centering
|
\centering
|
||||||
@@ -556,7 +555,7 @@ mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
|
|||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
WireGuard is included as a reference point despite not being a mesh VPN.
|
WireGuard is not a mesh VPN but is included as a reference point.
|
||||||
Its minimal overhead and widespread adoption make it a useful comparison
|
Comparing its overhead with that of the mesh VPNs isolates the cost of mesh
|
||||||
for understanding the cost of mesh coordination and NAT traversal logic.
|
coordination and NAT traversal.
|
||||||
|
|
||||||
|
|||||||
@@ -10,8 +10,9 @@ follows the impairment profiles from ideal to degraded:
|
|||||||
Section~\ref{sec:baseline} establishes overhead under ideal
|
Section~\ref{sec:baseline} establishes overhead under ideal
|
||||||
conditions, then subsequent sections examine how each VPN responds to
|
conditions, then subsequent sections examine how each VPN responds to
|
||||||
increasing network impairment. The chapter concludes with findings
|
increasing network impairment. The chapter concludes with findings
|
||||||
from the source code analysis. A recurring theme throughout is that
|
from the source code analysis. A recurring theme is that no single
|
||||||
no single metric captures VPN performance; the rankings shift
|
metric captures VPN
|
||||||
|
performance; the rankings shift
|
||||||
depending on whether one measures throughput, latency, retransmit
|
depending on whether one measures throughput, latency, retransmit
|
||||||
behavior, or real-world application performance.
|
behavior, or real-world application performance.
|
||||||
|
|
||||||
@@ -184,7 +185,7 @@ opposite extreme: brute-force retransmission can still yield high
|
|||||||
throughput (814\,Mbps with 1\,163 retransmits), at the cost of wasted
|
throughput (814\,Mbps with 1\,163 retransmits), at the cost of wasted
|
||||||
bandwidth and unstable flow behavior.
|
bandwidth and unstable flow behavior.
|
||||||
|
|
||||||
VpnCloud warrants specific attention: its sender reports 538.8\,Mbps
|
VpnCloud stands out: its sender reports 538.8\,Mbps
|
||||||
but the receiver measures only 413.4\,Mbps, leaving a 23\,\% gap (the largest
|
but the receiver measures only 413.4\,Mbps, leaving a 23\,\% gap (the largest
|
||||||
in the dataset). This suggests significant in-tunnel packet loss or
|
in the dataset). This suggests significant in-tunnel packet loss or
|
||||||
buffering at the VpnCloud layer that the retransmit count (857)
|
buffering at the VpnCloud layer that the retransmit count (857)
|
||||||
@@ -256,10 +257,10 @@ times, which cluster into three distinct ranges.
|
|||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
Six VPNs stay below 1.3\,ms, comfortably close to the bare-metal
|
Six VPNs stay below 1.3\,ms, comfortably close to the bare-metal
|
||||||
0.60\,ms. VpnCloud is a notable result: it posts the lowest latency
|
0.60\,ms. VpnCloud posts the lowest latency of any VPN (1.13\,ms), below
|
||||||
of any VPN (1.13\,ms), edging out WireGuard (1.20\,ms), yet its
|
WireGuard (1.20\,ms), yet its throughput tops out at only 539\,Mbps.
|
||||||
throughput tops out at only 539\,Mbps. Low per-packet latency does
|
Low per-packet latency does not guarantee high bulk throughput. A
|
||||||
not guarantee high bulk throughput. A second group (Headscale,
|
second group (Headscale,
|
||||||
Hyprspace, Yggdrasil) lands in the 1.5--2.2\,ms range, representing
|
Hyprspace, Yggdrasil) lands in the 1.5--2.2\,ms range, representing
|
||||||
moderate overhead. Then there is Mycelium at 34.9\,ms, so far
|
moderate overhead. Then there is Mycelium at 34.9\,ms, so far
|
||||||
removed from the rest that Section~\ref{sec:mycelium_routing} gives
|
removed from the rest that Section~\ref{sec:mycelium_routing} gives
|
||||||
@@ -289,8 +290,8 @@ the CPU, not the network, is the bottleneck.
|
|||||||
Figure~\ref{fig:latency_throughput} makes this disconnect easy to
|
Figure~\ref{fig:latency_throughput} makes this disconnect easy to
|
||||||
spot.
|
spot.
|
||||||
|
|
||||||
Looking at CPU efficiency more broadly, the qperf measurements
|
The qperf measurements also reveal a wide spread in CPU usage.
|
||||||
reveal a wide spread. Hyprspace (55.1\,\%) and Yggdrasil
|
Hyprspace (55.1\,\%) and Yggdrasil
|
||||||
(52.8\,\%) consume 5--6$\times$ as much CPU as Internal's
|
(52.8\,\%) consume 5--6$\times$ as much CPU as Internal's
|
||||||
9.7\,\%. WireGuard sits at 30.8\,\%, surprisingly high for a
|
9.7\,\%. WireGuard sits at 30.8\,\%, surprisingly high for a
|
||||||
kernel-level implementation, though much of that goes to
|
kernel-level implementation, though much of that goes to
|
||||||
@@ -318,20 +319,21 @@ The single-stream benchmark tests one link direction at a time. The
|
|||||||
parallel benchmark changes this setup: all three link directions
|
parallel benchmark changes this setup: all three link directions
|
||||||
(lom$\rightarrow$yuki, yuki$\rightarrow$luna,
|
(lom$\rightarrow$yuki, yuki$\rightarrow$luna,
|
||||||
luna$\rightarrow$lom) run simultaneously in a circular pattern for
|
luna$\rightarrow$lom) run simultaneously in a circular pattern for
|
||||||
60~seconds, each carrying ten TCP streams. Because three independent
|
60~seconds, each carrying one bidirectional TCP stream (six
|
||||||
|
unidirectional flows in total). Because three independent
|
||||||
link pairs now compete for shared tunnel resources at once, the
|
link pairs now compete for shared tunnel resources at once, the
|
||||||
aggregate throughput is naturally higher than any single direction
|
aggregate throughput is naturally higher than any single direction
|
||||||
alone, which is why even Internal reaches 1.50$\times$ its
|
alone, which is why even Internal reaches 1.50$\times$ its
|
||||||
single-stream figure. The scaling factor (parallel throughput
|
single-stream figure. The scaling factor (parallel throughput
|
||||||
divided by single-stream throughput) therefore captures two effects:
|
divided by single-stream throughput) captures two effects:
|
||||||
the benefit of utilizing multiple link pairs in parallel, and how
|
the benefit of using multiple link pairs in parallel, and how
|
||||||
well the VPN handles the resulting contention.
|
well the VPN handles the resulting contention.
|
||||||
Table~\ref{tab:parallel_scaling} lists the results.
|
Table~\ref{tab:parallel_scaling} lists the results.
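
As a compact restatement of the definition above, let
$T_{\mathrm{par}}$ denote the aggregate parallel throughput and
$T_{\mathrm{single}}$ the single-stream throughput (both symbols are
introduced here only for illustration):
\[
  S = \frac{T_{\mathrm{par}}}{T_{\mathrm{single}}}
\]
Dividing by Internal's factor gives one possible hardware-normalized
reading. This is a convenience of this sketch, not a metric reported
in the table:
\[
  S_{\mathrm{rel}} = \frac{S}{S_{\mathrm{Internal}}} = \frac{S}{1.50}
\]
Under this assumption, $S_{\mathrm{rel}} > 1$ indicates that a VPN
gains more from parallelism than the bare link does. Mycelium's
2.20$\times$, for example, would correspond to
$S_{\mathrm{rel}} \approx 1.47$.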
|
||||||
|
|
||||||
\begin{table}[H]
|
\begin{table}[H]
|
||||||
\centering
|
\centering
|
||||||
\caption{Parallel TCP scaling at baseline. Scaling factor is the
|
\caption{Parallel TCP scaling at baseline. Scaling factor is the
|
||||||
ratio of ten-stream to single-stream throughput. Internal's
|
ratio of parallel to single-stream throughput. Internal's
|
||||||
1.50$\times$ represents the expected scaling on this hardware.}
|
1.50$\times$ represents the expected scaling on this hardware.}
|
||||||
\label{tab:parallel_scaling}
|
\label{tab:parallel_scaling}
|
||||||
\begin{tabular}{lrrr}
|
\begin{tabular}{lrrr}
|
||||||
@@ -357,7 +359,7 @@ Table~\ref{tab:parallel_scaling} lists the results.
|
|||||||
The VPNs that gain the most are those most constrained in
|
The VPNs that gain the most are those most constrained in
|
||||||
single-stream mode. Mycelium's 34.9\,ms RTT means a lone TCP stream
|
single-stream mode. Mycelium's 34.9\,ms RTT means a lone TCP stream
|
||||||
can never fill the pipe: the bandwidth-delay product demands a window
|
can never fill the pipe: the bandwidth-delay product demands a window
|
||||||
larger than any single flow maintains, so ten streams collectively
|
larger than any single flow maintains, so multiple concurrent flows
|
||||||
compensate for that constraint and push throughput to 2.20$\times$
|
compensate for that constraint and push throughput to 2.20$\times$
|
||||||
the single-stream figure. Hyprspace scales almost as well
|
the single-stream figure. Hyprspace scales almost as well
|
||||||
(2.18$\times$) but for a
|
(2.18$\times$) but for a
|
||||||
@@ -379,8 +381,8 @@ streams: throughput drops from 706\,Mbps to 648\,Mbps
|
|||||||
streams are clearly fighting each other for resources inside the
|
streams are clearly fighting each other for resources inside the
|
||||||
tunnel.
|
tunnel.
|
||||||
|
|
||||||
More streams also amplify existing retransmit problems across the
|
More streams also amplify existing retransmit problems. Hyprspace
|
||||||
board. Hyprspace climbs from 4\,965 to 17\,426~retransmits;
|
climbs from 4\,965 to 17\,426~retransmits;
|
||||||
VpnCloud from 857 to 6\,023. VPNs that were clean in single-stream
|
VpnCloud from 857 to 6\,023. VPNs that were clean in single-stream
|
||||||
mode stay clean under load, while the stressed ones only get worse.
|
mode stay clean under load, while the stressed ones only get worse.
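
The growth implied by these counts can be made explicit. As a quick
check using only the figures quoted above (the symbol $g$ is
introduced here for illustration):
\[
  g_{\mathrm{Hyprspace}} = \frac{17\,426}{4\,965} \approx 3.5,
  \qquad
  g_{\mathrm{VpnCloud}} = \frac{6\,023}{857} \approx 7.0
\]
In relative terms, VpnCloud's retransmits grow roughly twice as much
as Hyprspace's, although Hyprspace remains far higher in absolute
count.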
|
||||||
|
|
||||||
@@ -702,8 +704,8 @@ propagate.
|
|||||||
\label{sec:pathological}
|
\label{sec:pathological}
|
||||||
|
|
||||||
Three VPNs exhibit behaviors that the aggregate numbers alone cannot
|
Three VPNs exhibit behaviors that the aggregate numbers alone cannot
|
||||||
explain. The following subsections synthesize observations from the
|
explain. The following subsections piece together observations from
|
||||||
preceding benchmarks into per-VPN diagnoses.
|
earlier benchmarks into per-VPN diagnoses.
|
||||||
|
|
||||||
\paragraph{Hyprspace: Buffer Bloat.}
|
\paragraph{Hyprspace: Buffer Bloat.}
|
||||||
\label{sec:hyprspace_bloat}
|
\label{sec:hyprspace_bloat}
|
||||||
|
|||||||
@@ -8,21 +8,21 @@
|
|||||||
\addchaptertocentry{Zusammenfassung}
|
\addchaptertocentry{Zusammenfassung}
|
||||||
|
|
||||||
Diese Arbeit evaluiert zehn Peer-to-Peer-Mesh-VPN-Implementierungen
|
Diese Arbeit evaluiert zehn Peer-to-Peer-Mesh-VPN-Implementierungen
|
||||||
unter kontrollierten Netzwerkbedingungen mithilfe eines
|
unter kontrollierten Netzwerkbedingungen mithilfe eines
|
||||||
reproduzierbaren, Nix-basierten Benchmark-Frameworks, das auf einem
|
reproduzierbaren, Nix-basierten Benchmark-Frameworks, das auf einem
|
||||||
Deployment-System namens Clan aufbaut. Die Implementierungen reichen
|
Deployment-System namens Clan aufbaut. Die Implementierungen reichen
|
||||||
von Kernel-Protokollen (WireGuard, als Referenz-Baseline) bis zu
|
von Kernel-Protokollen (WireGuard, als Referenz-Baseline) bis zu
|
||||||
Userspace-Overlays (Tinc, Yggdrasil, Nebula, Hyprspace und
|
Userspace-Overlays (Tinc, Yggdrasil, Nebula, Hyprspace und
|
||||||
weitere). Jede wird unter vier Beeinträchtigungsprofilen mit
|
weitere). Jede wird unter vier Beeinträchtigungsprofilen mit
|
||||||
variierendem Paketverlust, Paketumsortierung, Latenz und Jitter
|
variierendem Paketverlust, Paketumsortierung, Latenz und Jitter
|
||||||
getestet, was über 300 Messungen in sieben Benchmarks ergibt, von
|
getestet, was über 300 Messungen in sieben Benchmarks ergibt, von
|
||||||
reinem TCP- und UDP-Durchsatz bis zu Video-Streaming und
|
reinem TCP- und UDP-Durchsatz bis zu Video-Streaming und
|
||||||
Anwendungs-Downloads.
|
Anwendungs-Downloads.
|
||||||
|
|
||||||
Ein zentrales Ergebnis ist, dass keine einzelne Metrik die
|
Ein zentrales Ergebnis ist, dass keine einzelne Metrik die
|
||||||
VPN-Leistung vollständig erfasst: Die Rangfolge verschiebt sich je
|
VPN-Leistung vollständig erfasst: Die Rangfolge verschiebt sich je
|
||||||
nachdem, ob Durchsatz, Latenz, Retransmit-Verhalten oder
|
nachdem, ob Durchsatz, Latenz, Retransmit-Verhalten oder
|
||||||
Transferzeit auf Anwendungsebene gemessen wird. Unter
|
Transferzeit auf Anwendungsebene gemessen wird. Unter
|
||||||
Netzwerkbeeinträchtigung übertrifft Tailscale (über Headscale) den
|
Netzwerkbeeinträchtigung übertrifft Tailscale (über Headscale) den
|
||||||
Standard-Netzwerkstack des Linux-Kernels, eine Anomalie, die wir
|
Standard-Netzwerkstack des Linux-Kernels, eine Anomalie, die wir
|
||||||
auf die optimierten Congestion-Control- und Pufferparameter seines
|
auf die optimierten Congestion-Control- und Pufferparameter seines
|
||||||
|
|||||||