Improved Writing made it less verbouse

This commit is contained in:
2026-03-18 21:50:15 +01:00
parent 6b32967f32
commit 64aeeb5772
4 changed files with 177 additions and 195 deletions

View File

@@ -2,38 +2,33 @@
\label{Introduction}
Peer-to-peer overlay VPNs promise to restore genuine decentralization
by enabling direct connectivity between nodes regardless of NAT or
firewall restrictions. Yet practitioners choosing among the growing
number of mesh VPN implementations must rely largely on anecdotal
evidence: systematic, reproducible comparisons under realistic
conditions are scarce.
Peer-to-peer overlay VPNs allow nodes to connect directly regardless
of NAT or firewall restrictions. Yet practitioners choosing among the
growing number of mesh VPN implementations must rely largely on
anecdotal evidence: systematic, reproducible comparisons under
realistic conditions are scarce.
This thesis addresses that gap. We benchmark ten peer-to-peer VPN
implementations across seven workloads and four network impairment
profiles, yielding over 300 unique measurements. We complement these
profiles, producing over 300 unique measurements. We complement these
performance benchmarks with a source code analysis of each
implementation, verified through direct engagement with the respective
maintainers. The entire experimental framework is built on Nix, NixOS,
and the Clan deployment system, making every result independently
reproducible.
implementation, verified by the respective maintainers. The entire
experimental framework is built on Nix, NixOS, and the Clan deployment
system, so every result is independently reproducible.
\section{Motivation}
Peer-to-peer architectures promise censorship-resistant, fault-tolerant
infrastructure by eliminating single points of failure
\cite{shukla_towards_2021}.
These architectures underpin a growing range of systems, from IoT
edge computing and content delivery networks to blockchain platforms
like Ethereum.
Yet realizing these benefits requires distributing nodes across
genuinely diverse hosting entities.
Peer-to-peer architectures can provide censorship-resistant,
fault-tolerant infrastructure because they have no single point of
failure \cite{shukla_towards_2021}. IoT edge computing, content
delivery networks, and blockchain platforms like Ethereum all rely on
some form of peer-to-peer topology. But these benefits only hold when
nodes are spread across diverse hosting entities.
In practice, this diversity remains illusory.
Amazon, Hetzner, and OVH collectively host 70\% of all Ethereum nodes
(see Figure~\ref{fig:ethernodes_hosting}),
concentrating nominally decentralized infrastructure
within a handful of cloud providers.
(see Figure~\ref{fig:ethernodes_hosting}), so nominally decentralized
infrastructure actually sits in a handful of cloud providers.
More concerning, these providers operate under overlapping regulatory
jurisdictions,
predominantly the United States and the European Union.
@@ -49,50 +44,40 @@ data disclosure, or traffic manipulation across a majority of the network.
\label{fig:ethernodes_hosting}
\end{figure}
Why does this centralization persist despite the explicit goals of
decentralization?
The answer lies in the practical barriers to self-hosting.
Cloud providers offer static IP addresses and publicly routable endpoints,
eliminating the networking complexity that plagues residential and
small-office deployments.
This centralization persists because self-hosting is hard. Cloud
providers offer static IP addresses and publicly routable endpoints,
which avoids the networking problems that residential and small-office
deployments face.
Most internet-connected devices sit behind Network Address Translation (NAT),
which prevents incoming connections without explicit port forwarding
or relay infrastructure.
Combined with dynamic IP assignments from ISPs, maintaining stable
peer connectivity
from self-hosted infrastructure traditionally required significant
technical expertise.
Combined with dynamic IP assignments from ISPs, stable peer
connectivity from self-hosted infrastructure has traditionally
required significant technical expertise.
Overlay VPNs offer a solution to this fundamental barrier.
By establishing encrypted tunnels that traverse NAT boundaries,
mesh VPNs enable direct peer-to-peer connectivity without requiring
static IP addresses or manual firewall configuration.
Each node receives a stable virtual address within the overlay network,
regardless of its underlying network topology.
In practice, this means a device behind consumer-grade NAT can
participate as a first-class peer in a distributed system,
removing the primary technical advantage that cloud providers hold.
Overlay VPNs solve this problem. They establish encrypted tunnels
that traverse NAT boundaries, so peers can connect directly without
static IP addresses or manual firewall configuration. Each node
receives a stable virtual address within the overlay network,
regardless of its physical network topology. A device behind
consumer-grade NAT can therefore participate as a first-class peer
in a distributed system.
The Clan deployment framework builds on this foundation.
Clan uses Nix and NixOS to eliminate configuration drift and
dependency conflicts, reducing operational overhead enough for a
single administrator to reliably self-host complex distributed
services.
Overlay VPNs are central to Clan's architecture,
providing the secure peer connectivity that enables nodes
to form cohesive networks regardless of their physical location or
NAT situation.
As illustrated in Figure~\ref{fig:vision-stages}, Clan envisions
a web interface that enables users to design and deploy private P2P networks
with minimal configuration, assisted by an integrated LLM
for contextual guidance and troubleshooting.
The Clan deployment framework uses Nix and NixOS to eliminate
configuration drift and dependency conflicts. The result is that a
single administrator can reliably self-host distributed services.
Overlay VPNs are central to Clan's architecture: they supply the
peer connectivity that lets nodes form a network regardless of
physical location or NAT situation.
As illustrated in Figure~\ref{fig:vision-stages}, Clan plans to offer
a web interface that lets users design and deploy private P2P networks
with minimal configuration, assisted by an integrated LLM.
During the development of Clan, a recurring challenge became apparent:
practitioners held divergent preferences for mesh VPN solutions,
each citing different edge cases where their chosen VPN
proved unreliable or lacked essential features.
These discussions were grounded in anecdotal evidence rather than
systematic evaluation, motivating the present work.
During Clan's development, a recurring problem surfaced:
practitioners disagreed on which mesh VPN to use, each pointing to
different edge cases where their preferred VPN failed or lacked a
needed feature. These discussions relied on anecdotal evidence rather
than systematic evaluation, which motivated the present work.
\subsection{Related Work}
@@ -108,49 +93,45 @@ for distributed systems, analyzing throughput, reliability under packet
loss, and relay behavior for VPNs including ZeroTier. However, it
focuses primarily on solutions with a central point of failure and
limits its workloads to synthetic iperf3 tests. This thesis extends
that foundation by evaluating a broader set of VPN implementations
with emphasis on fully decentralized architectures, exercising them
under real-world workloads such as video streaming and package
downloads, applying multiple network impairment profiles, and
providing a fully reproducible experimental framework built on
Nix, NixOS, and Clan.
that work: it evaluates a broader set of VPN implementations with
emphasis on fully decentralized architectures, tests them under
application-level workloads (video streaming, package downloads),
applies multiple network impairment profiles, and provides a
reproducible experimental framework built on Nix, NixOS, and Clan.
Beyond filling this research gap, a further goal was to create a fully
automated benchmarking framework capable of generating a public
leaderboard, similar in spirit to the js-framework-benchmark
(see Figure~\ref{fig:js-framework-benchmark}). By providing an
accessible web interface with regularly updated
results, the framework gives VPN developers a concrete, public
baseline to measure against.
A secondary goal was to create an automated benchmarking framework
that generates a public leaderboard, similar in spirit to the
js-framework-benchmark (see Figure~\ref{fig:js-framework-benchmark}).
A web interface with regularly updated results gives VPN developers a
concrete baseline to measure against.
\section{Research Contribution}
This thesis makes the following contributions:
\begin{enumerate}
\item A comprehensive benchmark of ten peer-to-peer VPN
implementations across seven workloads (including real-world
video streaming and package downloads) and four network
impairment profiles, producing over 300 unique measurements.
\item A source code analysis of all ten VPN implementations,
combining manual code review with LLM-assisted analysis,
followed by verification through direct engagement with the
respective maintainers on GitHub.
\item A fully reproducible experimental framework built on
Nix, NixOS, and the Clan deployment system, with pinned
dependencies, declarative system configuration, and
deterministic cryptographic material generation, enabling
independent replication of all results.
\item A performance analysis demonstrating that Tailscale
outperforms the Linux kernel's default networking stack under
degraded conditions, and that kernel parameter tuning (Reno
congestion control in place of CUBIC, with RACK
disabled) yields measurable throughput improvements.
\item A benchmark of ten peer-to-peer VPN implementations across
seven workloads (including video streaming and package downloads)
and four network impairment profiles, with over 300 unique
measurements.
\item A source code analysis of all ten VPN implementations. Manual
code review was combined with LLM-assisted analysis and the results
were verified by the respective maintainers on GitHub.
\item A reproducible experimental framework built on Nix, NixOS,
and the Clan deployment system. All dependencies are pinned,
system configuration is declarative, and cryptographic material
is generated deterministically, so every result can be
independently replicated.
\item A performance analysis showing that Tailscale outperforms the
Linux kernel's default networking stack under degraded conditions,
and that kernel parameter tuning (Reno congestion control in place
of CUBIC, with RACK disabled) yields measurable throughput
improvements.
\item The discovery of several security vulnerabilities across
the evaluated VPN implementations.
\item An automated benchmarking framework designed for public
leaderboard generation, intended to encourage ongoing
optimization by VPN developers.
\item An automated benchmarking framework that produces a public
leaderboard, giving VPN developers a target to optimize
against.
\end{enumerate}
\begin{figure}[H]

View File

@@ -7,11 +7,9 @@
This chapter describes the methodology used to benchmark and analyze
peer-to-peer mesh VPN implementations. The evaluation combines
performance benchmarking under controlled network conditions with a
structured source code analysis of each implementation. The
benchmarking framework prioritizes reproducibility at every layer,
from pinned dependencies and declarative system configuration to
automated test orchestration, enabling independent verification of
results and facilitating future comparative studies.
structured source code analysis of each implementation. All
dependencies, system configurations, and test procedures are pinned
or declared so that the experiments can be independently reproduced.
\section{Experimental Setup}
@@ -29,19 +27,19 @@ identical specifications:
RDRAND, SSE4.2
\end{itemize}
The presence of hardware cryptographic acceleration is relevant because
many VPN implementations use AES-NI for encryption, and the results
may differ on systems without these features.
Results may differ on systems without hardware cryptographic
acceleration, since most of the tested VPNs offload encryption to
AES-NI.
\subsection{Network Topology}
The three machines are connected via a direct 1 Gbps LAN on the same
network segment. Each machine has a publicly reachable IPv4 address,
which is used to deploy configuration changes via Clan. This baseline
topology provides a controlled environment with minimal latency and no
packet loss, allowing the overhead introduced by each VPN implementation
to be measured in isolation. Figure~\ref{fig:mesh_topology} illustrates
the full-mesh connectivity between the three machines.
which is used to deploy configuration changes via Clan. On this
baseline topology, latency is sub-millisecond and there is no packet
loss, so measured overhead can be attributed to the VPN itself.
Figure~\ref{fig:mesh_topology} illustrates the full-mesh connectivity
between the three machines.
\begin{figure}[H]
\centering
@@ -74,8 +72,8 @@ double the per-machine values.
\subsection{Configuration Methodology}
Each VPN is built from source within the Nix flake, ensuring that all
dependencies are pinned to exact versions. VPNs not packaged in nixpkgs
Each VPN is built from source within the Nix flake, with all
dependencies pinned to exact versions. VPNs not packaged in nixpkgs
(Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
under \texttt{pkgs/} in the flake.
@@ -85,13 +83,14 @@ system.
Generated keys are stored in version control under
\texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
making key material part of the reproducible configuration.
so key material is part of the reproducible configuration.
\section{Benchmark Suite}
The benchmark suite includes both synthetic throughput tests and
real-world workloads. This combination addresses a limitation of prior
work that relied exclusively on iperf3.
The benchmark suite includes synthetic throughput tests and
application-level workloads. Prior comparative work relied exclusively
on iperf3; the additional benchmarks here capture behavior that
iperf3 alone misses.
Table~\ref{tab:benchmark_suite} summarises each benchmark.
\begin{table}[H]
@@ -114,8 +113,8 @@ Table~\ref{tab:benchmark_suite} summarises each benchmark.
\end{tabular}
\end{table}
The first four benchmarks use well-known network testing tools;
the remaining three target workloads closer to real-world usage.
The first four benchmarks use standard network testing tools;
the remaining three test application-level workloads.
The subsections below describe configuration details that the table
does not capture.
@@ -133,48 +132,49 @@ counters.
\subsection{Parallel iPerf3}
Runs TCP streams on all three machines simultaneously in a circular
pattern (A$\rightarrow$B, B$\rightarrow$C, C$\rightarrow$A) for
60 seconds with zero-copy (\texttt{-Z}). This creates contention
across the overlay network, stressing shared resources that
single-stream tests leave idle.
Runs one bidirectional TCP stream on all three machine pairs
simultaneously in a circular pattern (A$\rightarrow$B,
B$\rightarrow$C, C$\rightarrow$A) for 60 seconds with zero-copy
(\texttt{-Z}). The three concurrent bidirectional links produce six
unidirectional flows in total. This contention stresses shared
resources that single-stream tests leave idle.
\subsection{QPerf}
Spawns one qperf process per CPU core, each running for 30 seconds.
Per-core bandwidth is summed per second. Unlike the iPerf3 tests,
QPerf targets QUIC connection-level performance, capturing time to
first byte and connection establishment time alongside throughput.
Per-core bandwidth is summed per second. In addition to throughput,
QPerf reports time to first byte and connection establishment time,
which iPerf3 does not measure.
\subsection{RIST Video Streaming}
Generates a 4K ($3840\times2160$) H.264 test pattern at 30\,fps
(ultrafast preset, zerolatency tuning, 25\,Mbps target bitrate) with
ffmpeg and transmits it over the RIST protocol for 30 seconds. RIST
(Reliable Internet Stream Transport) is designed for low-latency
video contribution over unreliable networks, making it a realistic
test of VPN behavior under multimedia workloads. In addition to
standard network metrics, the benchmark records encoding-side
statistics (actual bitrate, frame rate, dropped frames) and
RIST-specific counters (packets recovered via retransmission, quality
score).
(ultrafast preset, zerolatency tuning, 25\,Mbps bitrate cap) with
ffmpeg and transmits it over the RIST protocol for 30 seconds. Because
the synthetic test pattern is highly compressible, the actual encoding
bitrate is approximately 3.3\,Mbps, well below the configured cap. RIST
(Reliable Internet Stream Transport) is a protocol for low-latency
video contribution over unreliable networks. The benchmark records
encoding-side statistics (actual bitrate, frame rate, dropped frames)
and RIST-specific counters (packets recovered via retransmission,
quality score).
\subsection{Nix Cache Download}
A Harmonia Nix binary cache server on the target machine serves the
Firefox package. The client downloads it via \texttt{nix copy}
through the VPN, exercising many small HTTP requests rather than a
single bulk transfer. Benchmarked with hyperfine (1 warmup run,
2 timed runs); the local Nix store and SQLite metadata are cleared
between runs.
through the VPN. Unlike the iPerf3 tests, this workload issues many
short-lived HTTP requests instead of a single bulk transfer.
Benchmarked with hyperfine (1 warmup run, 2 timed runs); the local
Nix store and SQLite metadata are cleared between runs.
\section{Network Impairment Profiles}
To evaluate VPN performance under different network conditions, four
impairment profiles are defined, ranging from an unmodified baseline
to a severely degraded link. All impairments are injected with Linux
traffic control (\texttt{tc netem}) on the egress side of every
machine's primary interface.
Four impairment profiles simulate progressively worse network
conditions, from an unmodified baseline to a severely degraded link.
All impairments are injected with Linux traffic control
(\texttt{tc netem}) on the egress side of every machine's primary
interface.
Table~\ref{tab:impairment_profiles} lists the per-machine values.
Because impairments are applied on both ends of a connection, the
effective round-trip impact is roughly double the listed values.
@@ -222,14 +222,14 @@ aspect of the simulated degradation:
\end{itemize}
A 30-second stabilization period follows TC application before
measurements begin, allowing queuing disciplines to settle.
measurements begin so that queuing disciplines can settle.
\section{Experimental Procedure}
\subsection{Automation}
The benchmark suite is fully automated via a Python orchestrator
(\texttt{vpn\_bench/}). For each VPN under test, the orchestrator:
A Python orchestrator (\texttt{vpn\_bench/}) automates the full
benchmark suite. For each VPN under test, it:
\begin{enumerate}
\item Cleans all state directories from previous VPN runs
@@ -327,11 +327,12 @@ Each metric is summarized as a statistics dictionary containing:
Aggregation differs by benchmark type. Benchmarks that execute
multiple discrete runs, ping (3 runs of 100 packets each) and
nix-cache (2 timed runs via hyperfine), first compute statistics
within each run, then average the resulting statistics across runs.
Concretely, if ping produces three runs with mean RTTs of
5.1, 5.3, and 5.0\,ms, the reported average is the mean of
those three values (5.13\,ms). The reported minimum is the
single lowest RTT observed across all three runs.
within each run, then aggregate across runs: averages and percentiles
are averaged, while the reported minimum and maximum are the global
extremes across all runs. Concretely, if ping produces three runs
with mean RTTs of 5.1, 5.3, and 5.0\,ms, the reported average is
the mean of those three values (5.13\,ms). The reported minimum is
the single lowest RTT observed across all three runs.
Benchmarks that produce continuous per-second samples, qperf and
RIST streaming for example, pool all per-second measurements from a single
@@ -340,9 +341,9 @@ bandwidth is first summed across CPU cores for each second, and
statistics are then computed over the resulting time series.
The analysis reports empirical percentiles (p25, p50, p75) alongside
min/max bounds rather than parametric confidence intervals. This
choice is deliberate: benchmark latency and throughput distributions
are often skewed or multimodal, making assumptions of normality
min/max bounds rather than parametric confidence intervals.
Benchmark latency and throughput distributions are often skewed or
multimodal, so parametric assumptions of normality would be
unreliable. The interquartile range (p25--p75) conveys the spread of
typical observations, while min and max capture outlier behavior.
The nix-cache benchmark additionally reports standard deviation via
@@ -350,9 +351,8 @@ hyperfine's built-in statistical output.
\section{Source Code Analysis}
To complement the performance benchmarks with architectural
understanding, we conducted a structured source code analysis of
all ten VPN implementations. The analysis followed three phases.
We also conducted a structured source code analysis of all ten VPN
implementations. The analysis followed three phases.
\subsection{Repository Collection and LLM-Assisted Overview}
@@ -378,23 +378,23 @@ aspects:
\end{itemize}
Each agent was required to reference the specific file and line
range supporting every claim, enabling direct verification.
range supporting every claim so that outputs could be verified
against the source.
\subsection{Manual Verification}
The LLM-generated overviews served as a navigational aid rather than
a trusted source. The most important code paths identified in each
overview were manually read and verified against the actual source
code, correcting inaccuracies and deepening the analysis where the
automated summaries remained superficial.
code. Where the automated summaries were inaccurate or superficial,
they were corrected and expanded.
\subsection{Feature Matrix and Maintainer Review}
The findings from both the automated and manual analysis were
consolidated into a feature matrix cataloguing 131 features across
all ten VPN implementations. The matrix covers
protocol characteristics, cryptographic primitives, NAT traversal
strategies, routing behavior, and security properties.
The findings from both phases were consolidated into a feature matrix
of 131 features across all ten VPN implementations, covering protocol
characteristics, cryptographic primitives, NAT traversal strategies,
routing behavior, and security properties.
The completed feature matrix was published and sent to the respective
VPN maintainers for review. We incorporated their feedback as
@@ -402,7 +402,7 @@ corrections and clarifications to the final classification.
\section{Reproducibility}
The experimental stack pins or declares every variable that could
The experimental stack pins or declares the variables that could
affect results.
\subsection{Dependency Pinning}
@@ -412,8 +412,8 @@ cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
Key pinned inputs include:
\begin{itemize}
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, ensuring a
single version across the dependency graph
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, so a single
version is used across the dependency graph
\bitem{clan-core:} The Clan framework, pinned to a specific commit
\bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
exact commits
@@ -527,9 +527,8 @@ VPNs were selected based on:
\bitem{Linux support:} All VPNs must run on Linux.
\end{itemize}
Ten VPN implementations were selected for evaluation, spanning a range
of architectures from centralized coordination to fully decentralized
mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
Table~\ref{tab:vpn_selection} lists the ten VPN implementations
selected for evaluation.
\begin{table}[H]
\centering
@@ -556,7 +555,7 @@ mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
\end{tabular}
\end{table}
WireGuard is included as a reference point despite not being a mesh VPN.
Its minimal overhead and widespread adoption make it a useful comparison
for understanding the cost of mesh coordination and NAT traversal logic.
WireGuard is not a mesh VPN but is included as a reference point.
Comparing its overhead to the mesh VPNs isolates the cost of mesh
coordination and NAT traversal.

View File

@@ -10,8 +10,9 @@ follows the impairment profiles from ideal to degraded:
Section~\ref{sec:baseline} establishes overhead under ideal
conditions, then subsequent sections examine how each VPN responds to
increasing network impairment. The chapter concludes with findings
from the source code analysis. A recurring theme throughout is that
no single metric captures VPN performance; the rankings shift
from the source code analysis. A recurring theme is that no single
metric captures VPN
performance; the rankings shift
depending on whether one measures throughput, latency, retransmit
behavior, or real-world application performance.
@@ -184,7 +185,7 @@ opposite extreme: brute-force retransmission can still yield high
throughput (814\,Mbps with 1\,163 retransmits), at the cost of wasted
bandwidth and unstable flow behavior.
VpnCloud warrants specific attention: its sender reports 538.8\,Mbps
VpnCloud stands out: its sender reports 538.8\,Mbps
but the receiver measures only 413.4\,Mbps, leaving a 23\,\% gap (the largest
in the dataset). This suggests significant in-tunnel packet loss or
buffering at the VpnCloud layer that the retransmit count (857)
@@ -256,10 +257,10 @@ times, which cluster into three distinct ranges.
\end{table}
Six VPNs stay below 1.3\,ms, comfortably close to the bare-metal
0.60\,ms. VpnCloud is a notable result: it posts the lowest latency
of any VPN (1.13\,ms), edging out WireGuard (1.20\,ms), yet its
throughput tops out at only 539\,Mbps. Low per-packet latency does
not guarantee high bulk throughput. A second group (Headscale,
0.60\,ms. VpnCloud posts the lowest latency of any VPN (1.13\,ms), below
WireGuard (1.20\,ms), yet its throughput tops out at only 539\,Mbps.
Low per-packet latency does not guarantee high bulk throughput. A
second group (Headscale,
Hyprspace, Yggdrasil) lands in the 1.5--2.2\,ms range, representing
moderate overhead. Then there is Mycelium at 34.9\,ms, so far
removed from the rest that Section~\ref{sec:mycelium_routing} gives
@@ -289,8 +290,8 @@ the CPU, not the network, is the bottleneck.
Figure~\ref{fig:latency_throughput} makes this disconnect easy to
spot.
Looking at CPU efficiency more broadly, the qperf measurements
reveal a wide spread. Hyprspace (55.1\,\%) and Yggdrasil
The qperf measurements also reveal a wide spread in CPU usage.
Hyprspace (55.1\,\%) and Yggdrasil
(52.8\,\%) consume 5--6$\times$ as much CPU as Internal's
9.7\,\%. WireGuard sits at 30.8\,\%, surprisingly high for a
kernel-level implementation, though much of that goes to
@@ -318,20 +319,21 @@ The single-stream benchmark tests one link direction at a time. The
parallel benchmark changes this setup: all three link directions
(lom$\rightarrow$yuki, yuki$\rightarrow$luna,
luna$\rightarrow$lom) run simultaneously in a circular pattern for
60~seconds, each carrying ten TCP streams. Because three independent
60~seconds, each carrying one bidirectional TCP stream (six
unidirectional flows in total). Because three independent
link pairs now compete for shared tunnel resources at once, the
aggregate throughput is naturally higher than any single direction
alone, which is why even Internal reaches 1.50$\times$ its
single-stream figure. The scaling factor (parallel throughput
divided by single-stream throughput) therefore captures two effects:
the benefit of utilizing multiple link pairs in parallel, and how
divided by single-stream throughput) captures two effects:
the benefit of using multiple link pairs in parallel, and how
well the VPN handles the resulting contention.
Table~\ref{tab:parallel_scaling} lists the results.
\begin{table}[H]
\centering
\caption{Parallel TCP scaling at baseline. Scaling factor is the
ratio of ten-stream to single-stream throughput. Internal's
ratio of parallel to single-stream throughput. Internal's
1.50$\times$ represents the expected scaling on this hardware.}
\label{tab:parallel_scaling}
\begin{tabular}{lrrr}
@@ -357,7 +359,7 @@ Table~\ref{tab:parallel_scaling} lists the results.
The VPNs that gain the most are those most constrained in
single-stream mode. Mycelium's 34.9\,ms RTT means a lone TCP stream
can never fill the pipe: the bandwidth-delay product demands a window
larger than any single flow maintains, so ten streams collectively
larger than any single flow maintains, so multiple concurrent flows
compensate for that constraint and push throughput to 2.20$\times$
the single-stream figure. Hyprspace scales almost as well
(2.18$\times$) but for a
@@ -379,8 +381,8 @@ streams: throughput drops from 706\,Mbps to 648\,Mbps
streams are clearly fighting each other for resources inside the
tunnel.
More streams also amplify existing retransmit problems across the
board. Hyprspace climbs from 4\,965 to 17\,426~retransmits;
More streams also amplify existing retransmit problems. Hyprspace
climbs from 4\,965 to 17\,426~retransmits;
VpnCloud from 857 to 6\,023. VPNs that were clean in single-stream
mode stay clean under load, while the stressed ones only get worse.
@@ -702,8 +704,8 @@ propagate.
\label{sec:pathological}
Three VPNs exhibit behaviors that the aggregate numbers alone cannot
explain. The following subsections synthesize observations from the
preceding benchmarks into per-VPN diagnoses.
explain. The following subsections piece together observations from
earlier benchmarks into per-VPN diagnoses.
\paragraph{Hyprspace: Buffer Bloat.}
\label{sec:hyprspace_bloat}

View File

@@ -8,21 +8,21 @@
\addchaptertocentry{Zusammenfassung}
Diese Arbeit evaluiert zehn Peer-to-Peer-Mesh-VPN-Implementierungen
unter kontrollierten Netzwerkbedingungen mithilfe eines
under kontrollierten Netzwerkbedingungen mithilfe eines
reproduzierbaren, Nix-basierten Benchmark-Frameworks, das auf einem
Deployment-System namens Clan aufbaut. Die Implementierungen reichen
von Kernel-Protokollen (WireGuard, als Referenz-Baseline) bis zu
von Kernel-Protokollen (WireGuard, also Reference-Baseline) bis zu
Userspace-Overlays (Tinc, Yggdrasil, Nebula, Hyprspace und
weitere). Jede wird unter vier Beeinträchtigungsprofilen mit
weitere). Jede wird under vier Beeinträchtigungsprofilen mit
variierendem Paketverlust, Paketumsortierung, Latenz und Jitter
getestet, was über 300 Messungen in sieben Benchmarks ergibt, von
reinem TCP- und UDP-Durchsatz bis zu Video-Streaming und
Anwendungs-Downloads.
Ein zentrales Ergebnis ist, dass keine einzelne Metrik die
In zentrales Ergebnis ist, dass keine einzelne Metrik die
VPN-Leistung vollständig erfasst: Die Rangfolge verschiebt sich je
nachdem, ob Durchsatz, Latenz, Retransmit-Verhalten oder
Transferzeit auf Anwendungsebene gemessen wird. Unter
Transferzeit auf Anwendungsebene gemessen wird. Under
Netzwerkbeeinträchtigung übertrifft Tailscale (über Headscale) den
Standard-Netzwerkstack des Linux-Kernels, eine Anomalie, die wir
auf die optimierten Congestion-Control- und Pufferparameter seines