% Chapter Template

\chapter{Results} % Main chapter title

\label{Results}

This chapter presents the results of the benchmark suite across all
ten VPN implementations and the internal baseline. The structure
follows the impairment profiles from ideal to degraded:
Section~\ref{sec:baseline} establishes overhead under ideal
conditions, then subsequent sections examine how each VPN responds to
increasing network impairment. The chapter concludes with findings
from the source code analysis. A recurring theme is that no single
metric captures VPN performance; the rankings shift depending on
whether one measures throughput, latency, retransmit behavior, or
real-world application performance.

\section{Baseline Performance}
\label{sec:baseline}

The baseline impairment profile introduces no artificial loss or
reordering, so any performance gap between VPNs can be attributed to
the VPN itself. Throughout the plots in this section, the
\emph{internal} bar marks a direct host-to-host connection with no VPN
in the path; it represents the best the hardware can do. On its own,
this link delivers 934\,Mbps on a single TCP stream and a round-trip
latency of just 0.60\,ms. WireGuard comes remarkably close to these
numbers, reaching 92.5\,\% of bare-metal throughput with only a single
retransmit across an entire 30-second test. Mycelium sits at the other
extreme, adding 34.9\,ms of latency, roughly 58$\times$ the bare-metal
figure.
\subsection{Test Execution Overview}

Running the full baseline suite across all ten VPNs and the internal
reference took just over four hours. The bulk of that time, about
2.6~hours (63\,\%), was spent on actual benchmark execution; VPN
installation and deployment accounted for another 45~minutes (19\,\%),
and roughly 21~minutes (9\,\%) went to waiting for VPN tunnels to come
up after restarts. The remaining time was consumed by VPN service
restarts and traffic-control (tc) stabilization.
Figure~\ref{fig:test_duration} breaks this down per VPN.

Most VPNs completed every benchmark without issues, but four failed
one test each: Nebula and Headscale timed out on the qperf QUIC
performance benchmark after six retries, while Hyprspace and Mycelium
failed the UDP iPerf3 test with a 120-second timeout. Their
individual success rate is 85.7\,\%, with all other VPNs passing the
full suite (Figure~\ref{fig:success_rate}).

\begin{figure}[H]
\centering
\begin{subfigure}[t]{1.0\textwidth}
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/Average Test Duration per Machine}.png}
\caption{Average test duration per VPN, including installation
time and benchmark execution}
\label{fig:test_duration}
\end{subfigure}

\vspace{1em}

\begin{subfigure}[t]{1.0\textwidth}
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/Benchmark Success Rate}.png}
\caption{Benchmark success rate across all seven tests}
\label{fig:success_rate}
\end{subfigure}
\caption{Test execution overview. Hyprspace has the longest average
duration due to UDP timeouts and long VPN connectivity
waits. WireGuard completes fastest. Nebula, Headscale,
Hyprspace, and Mycelium each fail one benchmark.}
\label{fig:test_overview}
\end{figure}
\subsection{TCP Throughput}

Each VPN ran a single-stream iPerf3 session for 30~seconds on every
link direction (lom$\rightarrow$yuki, yuki$\rightarrow$luna,
luna$\rightarrow$lom); Table~\ref{tab:tcp_baseline} shows the
averages. Three distinct performance tiers emerge, separated by
natural gaps in the data.

\begin{table}[H]
\centering
\caption{Single-stream TCP throughput at baseline, sorted by
throughput. Retransmits are averaged per 30-second test across
all three link directions. The horizontal rules separate the
three performance tiers.}
\label{tab:tcp_baseline}
\begin{tabular}{lrrr}
\hline
\textbf{VPN} & \textbf{Throughput (Mbps)} &
\textbf{Baseline (\%)} & \textbf{Retransmits} \\
\hline
Internal & 934 & 100.0 & 1.7 \\
WireGuard & 864 & 92.5 & 1 \\
ZeroTier & 814 & 87.2 & 1163 \\
Headscale & 800 & 85.6 & 102 \\
Yggdrasil & 795 & 85.1 & 75 \\
\hline
Nebula & 706 & 75.6 & 955 \\
EasyTier & 636 & 68.1 & 537 \\
VpnCloud & 539 & 57.7 & 857 \\
\hline
Hyprspace & 368 & 39.4 & 4965 \\
Tinc & 336 & 36.0 & 240 \\
Mycelium & 259 & 27.7 & 710 \\
\hline
\end{tabular}
\end{table}
The top tier ($>$80\,\% of baseline) groups WireGuard, ZeroTier,
Headscale, and Yggdrasil, all within 15\,\% of the bare-metal link.
A middle tier (55--80\,\%) follows with Nebula, EasyTier, and
VpnCloud, while Hyprspace, Tinc, and Mycelium occupy the bottom tier
at under 40\,\% of baseline.
Figure~\ref{fig:tcp_throughput} visualizes this hierarchy.

Raw throughput alone is incomplete, however. The retransmit column
reveals that not all high-throughput VPNs get there cleanly.
ZeroTier, for instance, reaches 814\,Mbps but accumulates
1\,163~retransmits per test, over 1\,000$\times$ what WireGuard
needs. ZeroTier compensates for tunnel-internal packet loss by
repeatedly triggering TCP congestion-control recovery, whereas
WireGuard sends data once and it arrives. Across all VPNs,
retransmit behavior falls into three groups: \emph{clean} ($<$110:
WireGuard, Internal, Yggdrasil, Headscale), \emph{stressed}
(200--900: Tinc, EasyTier, Mycelium, VpnCloud), and
\emph{pathological} ($>$950: Nebula, ZeroTier, Hyprspace).

% TODO: Is this naming scheme any good?

% TODO: Fix TCP Throughput plot
\begin{figure}[H]
\centering
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/tcp/TCP Throughput}.png}
\caption{Average single-stream TCP throughput}
\label{fig:tcp_throughput}
\end{subfigure}

\vspace{1em}

\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/tcp/TCP Retransmit Rate}.png}
\caption{Average TCP retransmits per 30-second test (log scale)}
\label{fig:tcp_retransmits}
\end{subfigure}
\caption{TCP throughput and retransmit rate at baseline. WireGuard
leads at 864\,Mbps with 1 retransmit. Hyprspace has nearly 5000
retransmits per test. The retransmit count does not always track
inversely with throughput: ZeroTier achieves high throughput
\emph{despite} high retransmits.}
\label{fig:tcp_results}
\end{figure}
Retransmits have a direct mechanical relationship with TCP congestion
control. Each retransmit triggers a reduction in the congestion window
(\texttt{cwnd}), throttling the sender. This relationship is visible
in Figure~\ref{fig:retransmit_correlations}: Hyprspace, with 4\,965
retransmits, maintains the smallest average congestion window in the
dataset (205\,KB), while Yggdrasil's 75 retransmits allow a 4.3\,MB
window, the largest of any VPN. At first glance this suggests a
clean inverse correlation between retransmits and congestion window
size, but the picture is misleading. Yggdrasil's outsized window is
largely an artifact of its jumbo overlay MTU (32\,731 bytes): each
segment carries far more data, so the window in bytes is inflated
relative to VPNs using a standard ${\sim}$1\,400-byte MTU. Comparing
congestion windows across different MTU sizes is not meaningful
without normalizing for segment size. What \emph{is} clear is that
high retransmit rates force TCP to spend more time in congestion
recovery than in steady-state transmission, capping throughput
regardless of available bandwidth. ZeroTier illustrates the
opposite extreme: brute-force retransmission can still yield high
throughput (814\,Mbps with 1\,163 retransmits), at the cost of wasted
bandwidth and unstable flow behavior.
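To make the MTU caveat concrete, the two windows can be re-expressed
in segments rather than bytes (illustrative arithmetic, assuming each
VPN's effective MTU approximates its TCP segment size):
\[
W_{\mathrm{seg}} = \frac{W_{\mathrm{bytes}}}{\mathrm{MSS}},
\qquad
\frac{4.3\,\mathrm{MB}}{32\,731\,\mathrm{B}} \approx 131,
\qquad
\frac{205\,\mathrm{KB}}{1\,400\,\mathrm{B}} \approx 146.
\]
Measured in segments, Yggdrasil's window (${\approx}131$) and
Hyprspace's (${\approx}146$) are of the same order of magnitude,
underlining that the byte-denominated gap is largely an MTU artifact.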
VpnCloud stands out: its sender reports 538.8\,Mbps but the receiver
measures only 413.4\,Mbps, leaving a 23\,\% gap (the largest in the
dataset). This suggests significant in-tunnel packet loss or
buffering at the VpnCloud layer that the retransmit count (857)
alone does not fully explain.

Run-to-run variability also differs substantially. WireGuard ranges
from 824 to 884\,Mbps (a 60\,Mbps window), while Mycelium ranges
from 122 to 379\,Mbps, a 3:1 ratio between worst and best runs. A
VPN with wide variance is harder to capacity-plan around than one
with consistent performance, even if the average is lower.
\begin{figure}[H]
\centering
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{Figures/baseline/retransmits-vs-throughput.png}
\caption{Retransmits vs.\ throughput}
\label{fig:retransmit_throughput}
\end{subfigure}

\vspace{1em}

\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{Figures/baseline/retransmits-vs-max-congestion-window.png}
\caption{Retransmits vs.\ max congestion window}
\label{fig:retransmit_cwnd}
\end{subfigure}
\caption{Retransmit correlations (log scale on x-axis). High
retransmits do not always mean low throughput (ZeroTier: 1\,163
retransmits, 814\,Mbps), but extreme retransmits do (Hyprspace:
4\,965 retransmits, 368\,Mbps). The apparent inverse correlation
between retransmits and congestion window size is dominated by
Yggdrasil's outlier (4.3\,MB \texttt{cwnd}), which is inflated
by its 32\,KB jumbo overlay MTU rather than by low retransmits
alone.}
\label{fig:retransmit_correlations}
\end{figure}
\subsection{Latency}

Sorting by latency rearranges the rankings considerably.
Table~\ref{tab:latency_baseline} lists the average ping round-trip
times, which cluster into three distinct ranges.

\begin{table}[H]
\centering
\caption{Average ping RTT at baseline, sorted by latency}
\label{tab:latency_baseline}
\begin{tabular}{lr}
\hline
\textbf{VPN} & \textbf{Avg RTT (ms)} \\
\hline
Internal & 0.60 \\
VpnCloud & 1.13 \\
Tinc & 1.19 \\
WireGuard & 1.20 \\
Nebula & 1.25 \\
ZeroTier & 1.28 \\
EasyTier & 1.33 \\
\hline
Headscale & 1.64 \\
Hyprspace & 1.79 \\
Yggdrasil & 2.20 \\
\hline
Mycelium & 34.9 \\
\hline
\end{tabular}
\end{table}
Six VPNs stay below 1.3\,ms, comfortably close to the bare-metal
0.60\,ms. VpnCloud posts the lowest latency of any VPN (1.13\,ms),
below WireGuard (1.20\,ms), yet its throughput tops out at only
539\,Mbps. Low per-packet latency does not guarantee high bulk
throughput. A second group (Headscale, Hyprspace, Yggdrasil) lands
in the 1.5--2.2\,ms range, representing moderate overhead. Then
there is Mycelium at 34.9\,ms, so far removed from the rest that
Section~\ref{sec:mycelium_routing} gives it a dedicated analysis.

ZeroTier's average of 1.28\,ms looks unremarkable, but its maximum
RTT spikes to 8.6\,ms, a 6.8$\times$ jump and the largest for any
sub-2\,ms VPN. These spikes point to periodic control-plane
interference that the average hides.

\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/ping/Average RTT}.png}
\caption{Average ping RTT at baseline. Mycelium (34.9\,ms) is a
massive outlier at 58$\times$ the internal baseline. VpnCloud is
the fastest VPN at 1.13\,ms, slightly below WireGuard (1.20\,ms).}
\label{fig:ping_rtt}
\end{figure}
Tinc presents a paradox: it has the third-lowest latency (1.19\,ms)
but only the second-lowest throughput (336\,Mbps). Packets traverse
the tunnel quickly, yet single-threaded userspace processing cannot
keep up with the link speed. The qperf benchmark backs this up: Tinc
plateaus at 14.9\,\% of total CPU while delivering just 336\,Mbps,
consistent with a single saturated core on this multi-core host; the
CPU, not the network, is the bottleneck.
Figure~\ref{fig:latency_throughput} makes this disconnect easy to
spot.

The qperf measurements also reveal a wide spread in CPU usage.
Hyprspace (55.1\,\%) and Yggdrasil (52.8\,\%) consume 5--6$\times$
as much CPU as Internal's 9.7\,\%. WireGuard sits at 30.8\,\%,
surprisingly high for a kernel-level implementation, though much of
that goes to cryptographic processing. On the frugal end, VpnCloud
(14.9\,\%), Tinc (14.9\,\%), and EasyTier (15.4\,\%) consume the
least CPU time, though Tinc's case shows that a low figure can also
reflect a single-threaded ceiling rather than genuine efficiency.
Nebula and Headscale are missing from this comparison because qperf
failed for both.

%TODO: Explain why they consistently failed

\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{Figures/baseline/latency-vs-throughput.png}
\caption{Latency vs.\ throughput at baseline. Each point represents
one VPN. The quadrants reveal different bottleneck types:
VpnCloud (low latency, moderate throughput), Tinc (low latency,
low throughput, CPU-bound), Mycelium (high latency, low
throughput, overlay routing overhead).}
\label{fig:latency_throughput}
\end{figure}
\subsection{Parallel TCP Scaling}

The single-stream benchmark tests one link direction at a time. The
parallel benchmark changes this setup: all three link directions
(lom$\rightarrow$yuki, yuki$\rightarrow$luna,
luna$\rightarrow$lom) run simultaneously in a circular pattern for
60~seconds, each carrying one bidirectional TCP stream (six
unidirectional flows in total). Because three independent link
pairs now compete for shared tunnel resources at once, the
aggregate throughput is naturally higher than any single direction
alone, which is why even Internal reaches 1.50$\times$ its
single-stream figure. The scaling factor (parallel throughput
divided by single-stream throughput) captures two effects: the
benefit of using multiple link pairs in parallel, and how well the
VPN handles the resulting contention.
Table~\ref{tab:parallel_scaling} lists the results.
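Written out, the scaling factor is
\[
S = \frac{T_{\mathrm{parallel}}}{T_{\mathrm{single}}},
\qquad
S_{\mathrm{Internal}} = \frac{1398}{934} \approx 1.50,
\]
so values near 1.50 match the hardware's expected scaling, while
values well above or below it indicate that the VPN either benefits
disproportionately from parallelism or suffers under contention.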
\begin{table}[H]
\centering
\caption{Parallel TCP scaling at baseline. Scaling factor is the
ratio of parallel to single-stream throughput. Internal's
1.50$\times$ represents the expected scaling on this hardware.}
\label{tab:parallel_scaling}
\begin{tabular}{lrrr}
\hline
\textbf{VPN} & \textbf{Single (Mbps)} &
\textbf{Parallel (Mbps)} & \textbf{Scaling} \\
\hline
Mycelium & 259 & 569 & 2.20$\times$ \\
Hyprspace & 368 & 803 & 2.18$\times$ \\
Tinc & 336 & 563 & 1.68$\times$ \\
Yggdrasil & 795 & 1265 & 1.59$\times$ \\
Headscale & 800 & 1228 & 1.54$\times$ \\
Internal & 934 & 1398 & 1.50$\times$ \\
ZeroTier & 814 & 1206 & 1.48$\times$ \\
WireGuard & 864 & 1281 & 1.48$\times$ \\
EasyTier & 636 & 927 & 1.46$\times$ \\
VpnCloud & 539 & 763 & 1.42$\times$ \\
Nebula & 706 & 648 & 0.92$\times$ \\
\hline
\end{tabular}
\end{table}
The VPNs that gain the most are those most constrained in
single-stream mode. Mycelium's 34.9\,ms RTT means a lone TCP stream
can never fill the pipe: the bandwidth-delay product demands a window
larger than any single flow maintains, so multiple concurrent flows
compensate for that constraint and push throughput to 2.20$\times$
the single-stream figure. Hyprspace scales almost as well
(2.18$\times$) but for a different reason: multiple streams work
around the buffer bloat that cripples any individual flow
(Section~\ref{sec:hyprspace_bloat}). Tinc picks up a 1.68$\times$
boost because several streams can collectively keep its
single-threaded CPU busy during what would otherwise be idle gaps in
a single flow.
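The window requirement follows directly from the bandwidth-delay
product (taking the 934\,Mbps bare-metal capacity as the target
rate):
\[
\mathrm{BDP} = 934\,\mathrm{Mbps} \times 34.9\,\mathrm{ms}
\approx 32.6\,\mathrm{Mbit} \approx 4.1\,\mathrm{MB},
\]
far more in-flight data than a single flow sustains through a lossy
overlay, whereas several concurrent flows can hold that much
collectively.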
WireGuard and Internal both scale cleanly at around
1.48--1.50$\times$ with zero retransmits, suggesting that
WireGuard's overhead is a fixed per-packet cost that does not worsen
under multiplexing.

Nebula is the only VPN that actually gets \emph{slower} with more
streams: throughput drops from 706\,Mbps to 648\,Mbps
(0.92$\times$) while retransmits jump from 955 to 2\,462. The six
concurrent flows are clearly fighting each other for resources
inside the tunnel.

More streams also amplify existing retransmit problems. Hyprspace
climbs from 4\,965 to 17\,426~retransmits; VpnCloud from 857 to
6\,023. VPNs that were clean in single-stream mode stay clean under
load, while the stressed ones only get worse.
\begin{figure}[H]
\centering
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{Figures/baseline/single-stream-vs-parallel-tcp-throughput.png}
\caption{Single-stream vs.\ parallel throughput}
\label{fig:single_vs_parallel}
\end{subfigure}

\vspace{1em}

\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{Figures/baseline/parallel-tcp-scaling-factor.png}
\caption{Parallel TCP scaling factor}
\label{fig:scaling_factor}
\end{subfigure}
\caption{Parallel TCP scaling at baseline. Nebula is the only VPN
where parallel throughput is lower than single-stream
(0.92$\times$). Mycelium and Hyprspace benefit most from
parallelism ($>$2$\times$), compensating for latency and buffer
bloat respectively. The dashed line at 1.0$\times$ marks the
break-even point.}
\label{fig:parallel_tcp}
\end{figure}
\subsection{UDP Stress Test}

The UDP iPerf3 test uses an unlimited sender rate (\texttt{-b 0}),
which makes it a deliberate overload test rather than a realistic
workload. The sender throughput values are artifacts: they reflect
how fast the sender can write to the socket, not how fast data
traverses the tunnel. Yggdrasil, for example, reports
63\,744\,Mbps sender throughput because it uses a 32\,731-byte
block size (a jumbo-frame overlay MTU), inflating the apparent rate
per \texttt{send()} system call. Only the receiver throughput is
meaningful.

\begin{table}[H]
\centering
\caption{UDP receiver throughput and packet loss at baseline
(\texttt{-b 0} stress test). Hyprspace and Mycelium timed out
at 120 seconds and are excluded.}
\label{tab:udp_baseline}
\begin{tabular}{lrr}
\hline
\textbf{VPN} & \textbf{Receiver (Mbps)} &
\textbf{Loss (\%)} \\
\hline
Internal & 952 & 0.0 \\
WireGuard & 898 & 0.0 \\
Nebula & 890 & 76.2 \\
Headscale & 876 & 69.8 \\
EasyTier & 865 & 78.3 \\
Yggdrasil & 852 & 98.7 \\
ZeroTier & 851 & 89.5 \\
VpnCloud & 773 & 83.7 \\
Tinc & 471 & 89.9 \\
\hline
\end{tabular}
\end{table}
%TODO: Explain that the UDP test also crashes often,
% which makes the test somewhat unreliable
% but a good indicator if the network traffic is "different" than
% the programmer expected

Only Internal and WireGuard achieve 0\,\% packet loss. Both operate
at the kernel level with proper backpressure that matches the sender
to the receiver rate. Every userspace VPN shows massive loss
(69--99\,\%) because the sender overwhelms the tunnel's processing
capacity. Yggdrasil's 98.7\,\% loss is the most extreme: it sends
the most data (due to its large block size) but loses almost all of
it. These loss rates do not reflect real-world UDP behavior but
reveal which VPNs implement effective flow control. Hyprspace and
Mycelium could not complete the UDP test at all, timing out after
120 seconds.

The \texttt{blksize\_bytes} field reveals each VPN's effective path
MTU: Yggdrasil at 32\,731 bytes (jumbo overlay), ZeroTier at 2\,728,
Internal at 1\,448, VpnCloud at 1\,375, WireGuard at 1\,368, Tinc at
1\,353, EasyTier at 1\,288, Nebula at 1\,228, and Headscale at
1\,208 (the smallest). These differences affect fragmentation
behavior under real workloads, particularly for protocols that send
large datagrams.

%TODO: Mention QUIC
%TODO: Mention again that the "default" settings of every VPN have been used
% to better reflect real world use, as most users probably won't
% change these defaults
% and explain that good defaults are as much a part of good software as
% having the features but they are hard to configure correctly
\begin{figure}[H]
\centering
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/udp/UDP Throughput}.png}
\caption{UDP receiver throughput}
\label{fig:udp_throughput}
\end{subfigure}

\vspace{1em}

\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/udp/UDP Packet Loss}.png}
\caption{UDP packet loss}
\label{fig:udp_loss}
\end{subfigure}
\caption{UDP stress test results at baseline (\texttt{-b 0},
unlimited sender rate). Internal and WireGuard are the only
implementations with 0\,\% loss. Hyprspace and Mycelium are
excluded due to 120-second timeouts.}
\label{fig:udp_results}
\end{figure}

% TODO: Compare parallel TCP retransmit rate
% with single TCP retransmit rate and see what changed
\subsection{Real-World Workloads}

Saturating a link with iPerf3 measures peak capacity, but not how a
VPN performs under realistic traffic. This subsection switches to
application-level workloads: downloading packages from a Nix binary
cache and streaming video over RIST. Both interact with the VPN
tunnel the way real software does, through many short-lived
connections, TLS handshakes, and latency-sensitive UDP packets.

\paragraph{Nix Binary Cache Downloads.}

This test downloads a fixed set of Nix packages through each VPN and
measures the total transfer time. The results
(Table~\ref{tab:nix_cache}) compress the throughput hierarchy
considerably: even Hyprspace, the worst performer, finishes in
11.92\,s, only 40\,\% slower than bare metal. Once connection
setup, TLS handshakes, and HTTP round-trips enter the picture,
throughput differences between 500 and 900\,Mbps matter far less
than per-connection latency.
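Overhead here is the relative increase in download time over the
internal baseline; for WireGuard, for example,
\[
\mathrm{overhead}
= \frac{t_{\mathrm{VPN}} - t_{\mathrm{int}}}{t_{\mathrm{int}}}
\times 100\,\%
= \frac{9.45 - 8.53}{8.53} \times 100\,\% \approx 10.8\,\%.
\]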
\begin{table}[H]
\centering
\caption{Nix binary cache download time at baseline, sorted by
duration. Overhead is relative to the internal baseline (8.53\,s).}
\label{tab:nix_cache}
\begin{tabular}{lrr}
\hline
\textbf{VPN} & \textbf{Mean (s)} &
\textbf{Overhead (\%)} \\
\hline
Internal & 8.53 & -- \\
Nebula & 9.15 & +7.3 \\
ZeroTier & 9.22 & +8.1 \\
VpnCloud & 9.39 & +10.0 \\
EasyTier & 9.39 & +10.1 \\
WireGuard & 9.45 & +10.8 \\
Headscale & 9.79 & +14.8 \\
Tinc & 10.00 & +17.2 \\
Mycelium & 10.07 & +18.1 \\
Yggdrasil & 10.59 & +24.2 \\
Hyprspace & 11.92 & +39.7 \\
\hline
\end{tabular}
\end{table}

Several rankings invert relative to raw throughput. ZeroTier
finishes faster than WireGuard (9.22\,s vs.\ 9.45\,s) despite
6\,\% fewer raw Mbps and 1\,000$\times$ more retransmits. Yggdrasil
is the clearest example: it has the fourth-highest VPN throughput at
795\,Mbps, yet lands at 24\,\% overhead because its 2.2\,ms latency
adds up over the many small sequential HTTP requests that constitute
a Nix cache download.
Figure~\ref{fig:throughput_vs_download} confirms this weak link
between raw throughput and real-world download speed.
\begin{figure}[H]
\centering
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/Nix Cache Mean Download Time}.png}
\caption{Nix cache download time per VPN}
\label{fig:nix_cache}
\end{subfigure}

\vspace{1em}

\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{Figures/baseline/raw-throughput-vs-nix-cache-download-time.png}
\caption{Raw throughput vs.\ download time}
\label{fig:throughput_vs_download}
\end{subfigure}
\caption{Application-level download performance. The throughput
hierarchy compresses under real HTTP workloads: the worst VPN
(Hyprspace, 11.92\,s) is only 40\,\% slower than bare metal.
Throughput explains some variance but not all: Yggdrasil
(795\,Mbps, 10.59\,s) is slower than Nebula (706\,Mbps, 9.15\,s)
because latency matters more for HTTP workloads.}
\label{fig:nix_download}
\end{figure}
\paragraph{Video Streaming (RIST).}

At just 3.3\,Mbps, the RIST video stream sits comfortably within
every VPN's throughput budget. This test therefore measures
something different: how well the VPN handles real-time UDP packet
delivery under steady load. Nine of the eleven VPNs pass without
incident, delivering 100\,\% video quality. The 14--16 dropped
frames that appear uniformly across all VPNs, including Internal,
trace back to encoder warm-up rather than tunnel overhead.
Headscale is the exception. It averages just 13.1\,\% quality,
dropping 288~packets per test interval. The degradation is not
bursty but sustained: median quality sits at 10\,\%, and the
interquartile range of dropped packets spans a narrow 255--330 band.
The qperf benchmark independently corroborates this, having failed
outright for Headscale, confirming that something beyond bulk TCP is
broken.
What makes this failure unexpected is that Headscale builds on
WireGuard, which handles video flawlessly. TCP throughput places
Headscale squarely in the top tier. Yet the RIST test runs over UDP,
and qperf probes latency-sensitive paths using both TCP and UDP. The
pattern points toward Headscale's DERP relay or NAT traversal layer
as the source. Its effective path MTU of 1\,208~bytes, the smallest
of any VPN, likely compounds the issue: RIST packets that exceed
this limit must be fragmented, and reassembling fragments under
sustained load produces exactly the kind of steady, uniform packet
drops the data shows. For video conferencing, VoIP, or any
real-time media workload, this is a disqualifying result regardless
of TCP throughput.
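A back-of-the-envelope check supports the fragmentation argument.
RIST commonly packs seven 188-byte MPEG-TS packets into one datagram
(an assumption about this stream's packetization, not a measured
value):
\[
7 \times 188\,\mathrm{B} = 1\,316\,\mathrm{B} > 1\,208\,\mathrm{B},
\]
so each such datagram would split into two fragments over Headscale's
tunnel, and losing either fragment discards the whole datagram,
roughly doubling the effective exposure to packet loss.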
Hyprspace reveals a different failure mode. Its average quality
reads 100\,\%, but the raw numbers underneath are far from stable:
mean packet drops of 1\,194 and a maximum spike of 55\,500, with
the 25th, 50th, and 75th percentiles all at zero. Hyprspace
alternates between perfect delivery and catastrophic bursts.
RIST's forward error correction compensates for most of these
events, but the worst spikes are severe enough to overwhelm FEC
entirely.
\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/Video Streaming/RIST Quality}.png}
\caption{RIST video streaming quality at baseline. Headscale at
13.1\,\% average quality is the clear outlier; every other VPN
achieves 99.8\,\% or higher, with Nebula's 99.8\,\% the only
minor degradation. The video bitrate (3.3\,Mbps) is well within
every VPN's throughput capacity, so this test reveals real-time
UDP handling quality rather than bandwidth limits.}
\label{fig:rist_quality}
\end{figure}
\subsection{Operational Resilience}

Sustained-load performance does not predict recovery speed. How
quickly a tunnel comes up after a reboot, and how reliably it
reconverges, matters as much as peak throughput for operational use.

First-time connectivity spans a wide range. Headscale and WireGuard
are ready in under 50\,ms, while ZeroTier (8--17\,s) and VpnCloud
(10--14\,s) spend seconds negotiating with their control planes
before passing traffic.

%TODO: Maybe we want to scrap first-time connectivity

Reboot reconnection rearranges the rankings. Hyprspace, the worst
performer under sustained TCP load, recovers in just 8.7~seconds on
average, faster than any other VPN. WireGuard and Nebula follow at
10.1\,s each. Nebula's consistency is striking: 10.06, 10.06, and
10.07\,s across its three nodes, pointing to a hard-coded timer
rather than topology-dependent convergence. Mycelium sits at the
opposite end, needing 76.6~seconds and showing the same suspiciously
uniform pattern (75.7, 75.7, 78.3\,s), suggesting a fixed
protocol-level wait built into the overlay.

%TODO: Hard coded timer needs to be verified

Yggdrasil produces the most lopsided result in the dataset: its yuki
node is back in 7.1~seconds while lom and luna take 94.8 and
97.3~seconds respectively. The gap likely reflects the overlay's
spanning-tree rebuild: Yggdrasil organizes its peers into a global
spanning tree and routes along it, so after a restart each node must
re-establish its position in that tree before traffic flows. A node
near the root of the tree reconverges quickly, while one further out
has to wait for the topology to propagate.
\begin{figure}[H]
\centering
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{Figures/baseline/reboot-reconnection-time-per-vpn.png}
\caption{Average reconnection time per VPN}
\label{fig:reboot_bar}
\end{subfigure}

\vspace{1em}

\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[width=\textwidth]{Figures/baseline/reboot-reconnection-time-heatmap.png}
\caption{Per-node reconnection time heatmap}
\label{fig:reboot_heatmap}
\end{subfigure}
\caption{Reboot reconnection time at baseline. The heatmap reveals
Yggdrasil's extreme per-node asymmetry (7\,s for yuki vs.\
95--97\,s for lom/luna) and Mycelium's uniform slowness (75--78\,s
across all nodes). Hyprspace reconnects fastest (8.7\,s average)
despite its poor sustained-load performance.}
\label{fig:reboot_reconnection}
\end{figure}
\subsection{Pathological Cases}
\label{sec:pathological}

Three VPNs exhibit behaviors that the aggregate numbers alone cannot
explain. The following subsections piece together observations from
earlier benchmarks into per-VPN diagnoses.
\paragraph{Hyprspace: Buffer Bloat.}
\label{sec:hyprspace_bloat}

Hyprspace produces the most severe performance collapse in the
dataset. At idle, its ping latency is a modest 1.79\,ms. Under TCP
load, that number balloons to roughly 2\,800\,ms, a 1\,556$\times$
increase. This is not the network becoming congested; it is the VPN
tunnel itself filling up with buffered packets and refusing to
drain.
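The scale of the buffering can be estimated from the load-induced
RTT itself (a rough illustration, assuming the queue drains at the
measured 368\,Mbps throughput):
\[
Q \approx 368\,\mathrm{Mbps} \times 2.8\,\mathrm{s}
\approx 1.03\,\mathrm{Gbit} \approx 129\,\mathrm{MB}
\]
of data standing inside the tunnel at any moment, orders of
magnitude beyond a reasonable queue depth.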
The consequences ripple through every TCP metric. With 4\,965
retransmits per 30-second test (one in every 200~segments), TCP
spends most of its time in congestion recovery rather than
steady-state transfer, shrinking the average congestion window to
205\,KB, the smallest in the dataset. Under parallel load the
situation worsens: retransmits climb to 17\,426. The buffering even
inverts iPerf3's measurements: the receiver reports 419.8\,Mbps
while the sender sees only 367.9\,Mbps, because massive ACK delays
cause the sender-side timer to undercount the actual data rate. The
UDP test never finished at all, timing out at 120~seconds.

% Should we always use percentages for retransmits?
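The one-in-200 figure follows from simple arithmetic (assuming
${\sim}$1\,400-byte segments):
\[
\frac{368\,\mathrm{Mbps} \times 30\,\mathrm{s}}
     {8 \times 1\,400\,\mathrm{B}}
\approx 9.9 \times 10^{5}\ \mathrm{segments},
\qquad
\frac{4\,965}{9.9 \times 10^{5}} \approx 0.5\,\%
\approx \frac{1}{200}.
\]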
What prevents Hyprspace from being entirely unusable is everything
|
|
\emph{except} sustained load. It has the fastest reboot
|
|
reconnection in the dataset (8.7\,s) and delivers 100\,\% video
|
|
quality outside of its burst events. The pathology is narrow but
|
|
severe: any continuous data stream saturates the tunnel's internal
|
|
buffers.
|
|
|
|
\paragraph{Mycelium: Routing Anomaly.}
\label{sec:mycelium_routing}

Mycelium's 34.9\,ms average latency appears to be the cost of
routing through a global overlay. The per-path numbers, however,
reveal a bimodal distribution:

\begin{itemize}
\bitem{luna$\rightarrow$lom:} 1.63\,ms (direct path, comparable
to Headscale at 1.64\,ms)
\bitem{lom$\rightarrow$yuki:} 51.47\,ms (overlay-routed)
\bitem{yuki$\rightarrow$luna:} 51.60\,ms (overlay-routed)
\end{itemize}
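The reported 34.9\,ms average is exactly the mean of these three per-path RTTs, which confirms that the aggregate number is an artifact of the bimodal split rather than uniform overlay delay. A minimal sketch (Python, illustrative only):

```python
# Per-path Mycelium RTTs (ms) from the baseline run: one direct
# path and two overlay-routed paths.
rtts = {
    "luna->lom":  1.63,   # direct route discovered
    "lom->yuki":  51.47,  # overlay-routed
    "yuki->luna": 51.60,  # overlay-routed
}

avg = sum(rtts.values()) / len(rtts)
print(f"mean RTT: {avg:.1f} ms")  # -> mean RTT: 34.9 ms
```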
One of the three links has found a direct route; the other two still
bounce through the overlay. All three machines sit on the same
physical network, so Mycelium's path discovery is failing
intermittently, a more specific problem than blanket overlay
overhead. Throughput mirrors the split:
yuki$\rightarrow$luna reaches 379\,Mbps while
luna$\rightarrow$lom manages only 122\,Mbps, a 3:1 gap. In
bidirectional mode, the reverse direction on that worst link drops
to 58.4\,Mbps, the lowest single-direction figure in the entire
dataset.

\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/tcp/Mycelium/Average Throughput}.png}
\caption{Per-link TCP throughput for Mycelium, showing extreme
path asymmetry caused by inconsistent direct route discovery.
The 3:1 ratio between best (yuki$\rightarrow$luna, 379\,Mbps)
and worst (luna$\rightarrow$lom, 122\,Mbps) links reflects
different overlay routing paths.}
\label{fig:mycelium_paths}
\end{figure}
The overlay penalty shows up most clearly at connection setup.
Mycelium's average time-to-first-byte is 93.7\,ms (vs.\ Internal's
16.8\,ms, a 5.6$\times$ overhead), and connection establishment
alone costs 47.3\,ms (3$\times$ overhead). Every new connection
incurs that overhead, so workloads dominated by
short-lived connections accumulate it rapidly. Bulk downloads, by
contrast, amortize it: the Nix cache test finishes only 18\,\%
slower than Internal (10.07\,s vs.\ 8.53\,s) because once the
transfer phase begins, per-connection latency fades into the
background.

Mycelium is also the slowest VPN to recover from a reboot:
76.6~seconds on average, and almost suspiciously uniform across
nodes (75.7, 75.7, 78.3\,s). That kind of consistency points to a
hard-coded convergence timer in the overlay protocol rather than
anything topology-dependent, i.e.\ dependent on how many peers each
node must rediscover after coming back up. The UDP test timed out
at 120~seconds, and even first-time connectivity required a
70-second wait at startup.
\paragraph{Tinc: Userspace Processing Bottleneck.}

Tinc is a clear case of a CPU bottleneck masquerading as a network
problem. At 1.19\,ms latency, packets get through the
tunnel quickly. Yet throughput tops out at 336\,Mbps, barely a
third of the bare-metal link. The usual suspects do not apply:
Tinc's path MTU is a healthy 1\,500~bytes
(\texttt{blksize\_bytes} of 1\,353 from UDP iPerf3, comparable to
VpnCloud at 1\,375 and WireGuard at 1\,368), and its retransmit
count (240) is moderate. What limits Tinc is its single-threaded
userspace architecture: one CPU core simply cannot encrypt, copy,
and forward packets fast enough to fill the pipe.

The parallel benchmark confirms this diagnosis. Tinc scales to
563\,Mbps (1.68$\times$), beating Internal's 1.50$\times$ ratio.
Multiple TCP streams collectively keep that single core busy during
what would otherwise be idle gaps in any individual flow, squeezing
out throughput that no single stream could reach alone.
\section{Impact of Network Impairment}
\label{sec:impairment}

The impairment profiles from Table~\ref{tab:impairment_profiles} are
applied to the full benchmark suite. Baseline results from
Section~\ref{sec:baseline} serve as the reference.

\subsection{Ping}

Table~\ref{tab:ping_impairment} lists average round-trip times across
all four profiles. Most VPNs track the expected increase closely:
tc~netem adds roughly 4\,ms, 8\,ms, and 15\,ms of round-trip delay
at Low, Medium, and High respectively, and Internal's measured values
(4.82, 9.38, 15.49\,ms) confirm this.
\begin{table}[H]
\centering
\caption{Average ping RTT (ms) across impairment profiles, sorted
by High-profile RTT}
\label{tab:ping_impairment}
\begin{tabular}{lrrrr}
\hline
\textbf{VPN} & \textbf{Baseline} & \textbf{Low} &
\textbf{Medium} & \textbf{High} \\
\hline
Internal & 0.60 & 4.82 & 9.38 & 15.49 \\
Tinc & 1.19 & 5.32 & 9.85 & 15.92 \\
Nebula & 1.25 & 5.38 & 9.99 & 15.96 \\
WireGuard & 1.20 & 5.36 & 9.88 & 15.99 \\
Headscale & 1.64 & 5.82 & 10.39 & 16.07 \\
VpnCloud & 1.13 & 5.41 & 10.35 & 16.21 \\
ZeroTier & 1.28 & 5.34 & 10.02 & 16.54 \\
Yggdrasil & 2.20 & 6.73 & 11.99 & 20.20 \\
Hyprspace & 1.79 & 6.15 & 10.76 & 24.49 \\
EasyTier & 1.33 & 6.27 & 14.13 & 26.60 \\
Mycelium & 34.90 & 23.42 & 43.88 & 33.05 \\
\hline
\end{tabular}
\end{table}

% PLOT: line chart
% File: Figures/impairment/Ping Average RTT Heatmap.png
% Data: Average ping RTT for all 11 VPNs at baseline, low, medium, high
% Show: Most VPNs in a tight parallel band; Mycelium's non-monotonic curve;
%       EasyTier and Hyprspace diverging upward at high impairment
Mycelium defies the pattern. Its RTT \emph{drops} from 34.9\,ms at
baseline to 23.4\,ms at Low impairment, a 33\% improvement where
every other VPN gets slower. It then rises to 43.9\,ms at Medium
before falling again to 33.0\,ms at High. The baseline analysis
(Section~\ref{sec:mycelium_routing}) showed that Mycelium's latency
comes from a bimodal routing distribution: one path runs at 1.63\,ms
while two others route through the global overlay at
${\sim}$51\,ms. The impairment appears to push Mycelium's path
discovery toward shorter routes, so a larger share of traffic takes
the direct path. The non-monotonic pattern is consistent with a path
selection algorithm that responds to measured link quality, but not
linearly with degradation severity.

Mycelium also achieves 0\% ping packet loss at Low and Medium
impairment, while most VPNs show 0.1--3.2\% loss at those profiles.
At High impairment, Mycelium's loss jumps to 11.1\%.

EasyTier accumulates 11\,ms of excess latency at High impairment
beyond what tc~netem accounts for. Its average RTT of 26.6\,ms and
maximum of 290\,ms (vs.\ ${\sim}$40\,ms for WireGuard) point to a
userspace scheduling or retry mechanism that introduces escalating
variance. EasyTier's RTT standard deviation reaches 44.6\,ms at
High, the worst jitter of any VPN.

Hyprspace shows 11.1\% ping packet loss at every impairment level ---
Low, Medium, and High alike. With 9~measurement runs (3~machine
pairs $\times$ 3~runs of 100~packets), 11.1\% equals exactly 1/9:
one run per profile fails completely while the other eight report zero
loss. This binary pass/fail behavior is consistent with the buffer
bloat diagnosis from Section~\ref{sec:hyprspace_bloat}. When buffers
fill, an entire path stalls rather than degrading gradually.
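The 1/9 interpretation can be checked arithmetically: if a single run of the nine loses every packet and the rest lose none, the aggregate loss matches the observed figure exactly. A short sketch (Python, illustrative only):

```python
# 3 machine pairs x 3 runs, 100 packets per run.
runs, packets_per_run = 9, 100
failed_runs = 1  # one run loses all packets, the others lose none

lost = failed_runs * packets_per_run
total = runs * packets_per_run
loss_pct = 100 * lost / total
print(f"aggregate loss: {loss_pct:.1f}%")  # -> aggregate loss: 11.1%
```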
\subsection{TCP Throughput}

Table~\ref{tab:tcp_impairment} presents single-stream TCP throughput
across all four profiles. The baseline performance tiers from
Section~\ref{sec:baseline} dissolve almost immediately under
impairment.

\begin{table}[H]
\centering
\caption{Single-stream TCP throughput (Mbps) across impairment
profiles, sorted by baseline. Retention is the
Low-to-baseline ratio.}
\label{tab:tcp_impairment}
\begin{tabular}{lrrrrr}
\hline
\textbf{VPN} & \textbf{Baseline} & \textbf{Low} &
\textbf{Medium} & \textbf{High} & \textbf{Retention} \\
\hline
Internal & 934 & 333 & 29.6 & 4.25 & 35.7\% \\
WireGuard & 864 & 54.7 & 8.77 & 2.63 & 6.3\% \\
ZeroTier & 814 & 63.7 & 12.0 & 4.01 & 7.8\% \\
Headscale & 800 & 274 & 41.5 & 4.21 & 34.3\% \\
Yggdrasil & 795 & 13.2 & 6.08 & 3.40 & 1.7\% \\
\hline
Nebula & 706 & 49.8 & 7.82 & 2.60 & 7.1\% \\
EasyTier & 636 & 156 & 17.4 & 3.59 & 24.6\% \\
VpnCloud & 539 & 58.2 & 8.33 & 1.86 & 10.8\% \\
\hline
Hyprspace & 368 & 4.42 & 2.05 & 1.39 & 1.2\% \\
Tinc & 336 & 54.4 & 5.53 & 2.77 & 16.2\% \\
Mycelium & 259 & 16.2 & 3.87 & 2.73 & 6.3\% \\
\hline
\end{tabular}
\end{table}
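The retention column is derived directly from the two throughput columns. A minimal reproduction for a subset of the rows (Python, illustrative only):

```python
# Single-stream TCP throughput (Mbps): (baseline, Low-profile).
throughput = {
    "Internal":  (934, 333),
    "WireGuard": (864, 54.7),
    "Yggdrasil": (795, 13.2),
    "Hyprspace": (368, 4.42),
}

# Retention = Low-profile throughput as a share of baseline.
retention = {vpn: 100 * low / base for vpn, (base, low) in throughput.items()}
for vpn, pct in retention.items():
    print(f"{vpn:10s} retention = {pct:.1f}%")
```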
% PLOT: line chart
% File: Figures/impairment/TCP Throughput Heatmap.png
% Data: Single-stream TCP throughput for all 11 VPNs at baseline,
%       low, medium, high
% Show: Headscale crossing above Internal at medium impairment;
%       Yggdrasil's cliff from baseline to low; convergence of all
%       VPNs at high impairment

Yggdrasil crashes from 795\,Mbps to 13.2\,Mbps at Low impairment, a
98.3\% throughput loss from adding just 2\,ms latency, 2\,ms jitter,
0.25\% packet loss, and 0.5\% reordering per machine. Even Mycelium,
the slowest VPN at baseline (259\,Mbps), retains more throughput at
Low than Yggdrasil does. The jumbo overlay MTU of 32\,731~bytes,
which inflated baseline metrics
(Section~\ref{sec:baseline}), becomes a liability under impairment:
each lost or reordered outer packet triggers retransmission of
${\sim}$24$\times$ more inner-layer data than a standard
1\,400-byte MTU VPN would lose.
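The ${\sim}$24$\times$ amplification factor is simply the ratio between Yggdrasil's jumbo overlay MTU and a conventional VPN payload size; using WireGuard's measured 1\,368-byte payload as the reference point gives the quoted figure. A sketch of the arithmetic (Python, illustrative only):

```python
# Each lost outer packet invalidates one overlay frame's worth of
# inner data; compare Yggdrasil's jumbo MTU against a conventional
# VPN payload (WireGuard's measured 1,368 bytes).
yggdrasil_mtu = 32_731  # bytes
wireguard_mtu = 1_368   # bytes

amplification = yggdrasil_mtu / wireguard_mtu
print(f"loss amplification: ~{amplification:.0f}x")  # -> ~24x
```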
Headscale retains 34.3\% of its baseline throughput at Low, nearly
matching Internal's 35.7\%. At Medium impairment, Headscale
(41.5\,Mbps) overtakes Internal (29.6\,Mbps) --- a VPN outperforming
the bare-metal baseline.
Section~\ref{sec:tailscale_degraded} investigates this anomaly in
detail.

At High impairment, the throughput range compresses from 675\,Mbps at
baseline to just 2.9\,Mbps. Internal leads at 4.25\,Mbps; Hyprspace
trails at 1.39\,Mbps. The impairment profile itself becomes the
bottleneck. With 2.5\% packet loss and 5\% reordering per machine,
every implementation is TCP-loss-limited, and architectural
differences that matter at gigabit speeds become irrelevant.
\subsection{UDP Throughput}

The UDP stress test (\texttt{-b~0}) suffers from widespread failures
under impairment. Hyprspace and Mycelium, which already failed at
baseline, continue to fail at all profiles. Tinc and ZeroTier fail
at most non-baseline profiles. The sparse dataset limits
conclusions, but one pattern stands out.

Kernel-level implementations maintain throughput regardless of
impairment. Internal holds ${\sim}$950\,Mbps across all profiles
where data exists. Headscale sustains 700--876\,Mbps and WireGuard
850--908\,Mbps; % TODO: verify WireGuard UDP range -- analysis doc says 850-898, possible digit transposition
both rely on WireGuard's in-kernel UDP handling with
proper backpressure. Userspace VPNs collapse: EasyTier drops from
865 to 435 to 38.5 to 6.1\,Mbps across successive profiles.
Yggdrasil, already pathological at baseline (98.7\% loss), crashes to
12.3\,Mbps at Low and fails entirely at Medium and High.

% PLOT: heatmap
% File: Figures/impairment/UDP Receiver Throughput Heatmap.png
% Data: UDP receiver throughput for all 11 VPNs at baseline, low,
%       medium, high (grey/hatched cells for failures)
% Show: Kernel-level VPNs (Internal, WireGuard, Headscale) maintaining
%       high throughput across all profiles; userspace VPNs failing or
%       collapsing; the large number of empty cells

The failure rate of this benchmark under impairment makes it more
useful as a robustness indicator than a throughput measurement. A VPN
that cannot complete a 30-second UDP flood under 0.25\% packet loss
has fundamental flow-control problems that will surface under real
workloads too, even if the symptoms are milder.
\subsection{Parallel TCP}

Table~\ref{tab:parallel_impairment} shows aggregate throughput across
three concurrent bidirectional links (six unidirectional flows). The
Headscale anomaly from the single-stream results is amplified here.

\begin{table}[H]
\centering
\caption{Parallel TCP throughput (Mbps) across impairment profiles.
Three concurrent bidirectional links produce six unidirectional
flows.}
\label{tab:parallel_impairment}
\begin{tabular}{lrrrr}
\hline
\textbf{VPN} & \textbf{Baseline} & \textbf{Low} &
\textbf{Medium} & \textbf{High} \\
\hline
Internal & 1398 & 277 & 82.6 & 10.4 \\
Headscale & 1228 & 718 & 113 & 20.0 \\
WireGuard & 1281 & 173 & 24.5 & 8.39 \\
Yggdrasil & 1265 & 38.7 & 16.7 & 8.95 \\
ZeroTier & 1206 & 176 & 35.4 & 7.97 \\
EasyTier & 927 & 473 & 57.4 & 10.7 \\
Hyprspace & 803 & 2.87 & 6.94 & 3.62 \\
VpnCloud & 763 & 174 & 23.7 & 8.25 \\
Nebula & 648 & 103 & 15.3 & 4.93 \\
Mycelium & 569 & 72.7 & 7.51 & 3.69 \\
Tinc & 563 & 168 & 23.7 & 8.25 \\
\hline
\end{tabular}
\end{table}
% PLOT: heatmap
% File: Figures/impairment/Parallel TCP Throughput Heatmap.png
% Data: Parallel TCP throughput for all 11 VPNs at baseline, low,
%       medium, high
% Show: Headscale dominating at low impairment (718 Mbps vs Internal's
%       277); EasyTier as runner-up (473 Mbps); Hyprspace's collapse
%       to 2.87 Mbps

At Low impairment, Headscale reaches 718\,Mbps --- 2.6$\times$
Internal (277\,Mbps) and 4.1$\times$ WireGuard (173\,Mbps). At
Medium, Headscale (113\,Mbps) still leads Internal (82.6\,Mbps) by
37\%. The single-stream anomaly from
Section~\ref{sec:tailscale_degraded} compounds when multiple flows
each independently benefit from Headscale's congestion control
tuning.

EasyTier is the second-most resilient VPN under parallel load, at
473\,Mbps at Low (51\% of baseline). Both EasyTier and Headscale
retain more than half their baseline parallel throughput at Low
impairment; no other VPN exceeds 30\%.

Hyprspace collapses from 803\,Mbps to 2.87\,Mbps at Low, a 99.6\%
loss. The buffer bloat that plagues single-stream transfers
(Section~\ref{sec:hyprspace_bloat}) becomes catastrophic when six
concurrent flows compete for the same bloated buffers.

The High-profile convergence effect is even more pronounced here than
in single-stream mode. Tinc and VpnCloud land at identical
8.25\,Mbps despite differing by 200\,Mbps at baseline.
\subsection{QUIC Performance}

Headscale and Nebula failed the qperf QUIC benchmark at baseline
(Section~\ref{sec:baseline}) and continue to fail across all
impairment profiles.

Yggdrasil's QUIC bandwidth drops from 745\,Mbps at baseline to
7.67\,Mbps at Low, 3.45\,Mbps at Medium, and 2.17\,Mbps at High ---
the same cliff observed in its TCP results, driven by the same
jumbo-MTU amplification of outer-layer packet loss.

At High impairment, WireGuard (23.2\,Mbps), VpnCloud (23.4\,Mbps),
ZeroTier (23.0\,Mbps), and Tinc (23.4\,Mbps) converge to within
0.4\,Mbps of each other. At baseline these four span a 500\,Mbps
range. QUIC's own congestion control, operating atop the
already-degraded outer link, becomes the sole limiter.

% PLOT: heatmap
% File: Figures/impairment/QUIC Bandwidth Heatmap.png
% Data: QPerf QUIC bandwidth for VPNs with data at all four profiles
%       (WireGuard, VpnCloud, ZeroTier, Tinc, Yggdrasil, Internal)
% Show: Yggdrasil's cliff from baseline to low; convergence of
%       WireGuard, VpnCloud, ZeroTier, Tinc at high (~23 Mbps)
\subsection{Video Streaming}

Table~\ref{tab:rist_impairment} presents RIST video quality scores
across profiles. The actual encoding bitrate of ${\sim}$3.3\,Mbps
sits well within every VPN's throughput budget even at High
impairment, so quality differences reflect packet delivery reliability
rather than bandwidth limits.

\begin{table}[H]
\centering
\caption{RIST video streaming quality (\%) across impairment
profiles, sorted by High-profile quality}
\label{tab:rist_impairment}
\begin{tabular}{lrrrr}
\hline
\textbf{VPN} & \textbf{Baseline} & \textbf{Low} &
\textbf{Medium} & \textbf{High} \\
\hline
Mycelium & 100.0 & 100.0 & 100.0 & 99.9 \\
EasyTier & 100.0 & 100.0 & 96.2 & 85.5 \\
Internal & 100.0 & 99.2 & 89.3 & 80.2 \\
ZeroTier & 100.0 & 99.3 & 89.9 & 80.2 \\
VpnCloud & 100.0 & 99.2 & 89.7 & 80.1 \\
WireGuard & 100.0 & 99.3 & 90.0 & 80.0 \\
Hyprspace & 100.0 & 92.9 & 87.9 & 78.1 \\
Tinc & 100.0 & 99.3 & 90.0 & 77.8 \\
Nebula & 99.8 & 98.8 & 85.6 & 72.1 \\
Yggdrasil & 100.0 & 94.7 & 71.4 & 43.3 \\
Headscale & 13.1 & 13.0 & 13.0 & 13.0 \\
\hline
\end{tabular}
\end{table}

% PLOT: heatmap
% File: Figures/impairment/Video Streaming Quality Heatmap.png
% Data: RIST quality for all 11 VPNs at baseline, low, medium, high
% Show: Headscale stuck at 13% (red row); Mycelium stuck near 100%
%       (green row); gradual degradation for the bulk; Yggdrasil's
%       steep decline to 43%
Headscale stays at ${\sim}$13\% across all four profiles: 13.1\%,
13.0\%, 13.0\%, 13.0\%. The profile-independence confirms the
baseline diagnosis from Section~\ref{sec:baseline}. The failure is
structural --- likely MTU fragmentation in the DERP relay layer ---
and cannot worsen because it is already saturated. Adding latency or
loss on top of an 87\% packet drop floor changes nothing.

Mycelium delivers 99.9\% quality even at High impairment, better than
Internal (80.2\%) and every other VPN. At 3.3\,Mbps, even
Mycelium's degraded overlay paths can sustain the stream. Its
overlay retransmission mechanism, which cripples bulk TCP transfers,
works well for steady low-bandwidth UDP flows. RIST's own forward
error correction handles whatever Mycelium's retransmissions miss.

Yggdrasil degrades the most steeply: 100\% at baseline, 94.7\% at
Low, 71.4\% at Medium, 43.3\% at High. The jumbo MTU that hurt TCP
throughput also hurts here --- large overlay packets carrying RIST
data are more likely to be lost or reordered at the outer layer, and
RIST's FEC cannot recover from the resulting burst losses.
\subsection{Application-Level Download}

Table~\ref{tab:nix_impairment} shows Nix binary cache download times
across profiles. This HTTP-heavy workload, dominated by many
short-lived TCP connections, is more sensitive to per-connection
latency than to raw bandwidth.

\begin{table}[H]
\centering
\caption{Nix binary cache download time (seconds) across impairment
profiles, sorted by Low-profile time. ``--'' marks a failed
run.}
\label{tab:nix_impairment}
\begin{tabular}{lrrrr}
\hline
\textbf{VPN} & \textbf{Baseline} & \textbf{Low} &
\textbf{Medium} & \textbf{High} \\
\hline
Internal & 8.53 & 11.9 & 58.6 & -- \\
Headscale & 9.79 & 13.5 & 48.8 & 219 \\
EasyTier & 9.39 & 22.1 & 141 & -- \\
VpnCloud & 9.39 & 27.9 & 163 & -- \\
WireGuard & 9.45 & 28.8 & 161 & -- \\
Nebula & 9.15 & 30.8 & 180 & 547 \\
Tinc & 10.0 & 30.9 & 166 & 496 \\
ZeroTier & 9.22 & 36.2 & 141 & -- \\
Mycelium & 10.1 & 79.5 & -- & -- \\
Yggdrasil & 10.6 & 230 & -- & -- \\
Hyprspace & 11.9 & -- & 170 & -- \\
\hline
\end{tabular}
\end{table}

% PLOT: heatmap
% File: Figures/impairment/Nix Cache Download Time Heatmap.png
% Data: Nix cache download time for all VPNs at baseline, low, medium,
%       high (hatched/absent bars for failures)
% Show: Headscale as the only VPN completing all four profiles;
%       Headscale beating Internal at medium (48.8 vs 58.6 s);
%       Yggdrasil's 22x slowdown at low impairment
Headscale is the only VPN to complete all four profiles. At Medium
impairment, it finishes in 48.8~seconds --- faster than Internal's
58.6~seconds. Internal itself fails at High impairment while
Headscale completes in 219~seconds. Only Nebula (547\,s) and Tinc
(496\,s) also survive High impairment.

Yggdrasil's download time explodes from 10.6\,s to 230\,s at Low
impairment, a 22$\times$ slowdown. Every HTTP request incurs the
latency penalty from Yggdrasil's impairment-amplified
retransmissions. Mycelium also degrades severely (10.1\,s to
79.5\,s, an 8$\times$ increase), consistent with its overlay routing
overhead, which compounds over hundreds of sequential HTTP
connections.

The failure map reveals a clean gradient: more demanding profiles
knock out more VPNs. At Low, 10 of 11 complete (Hyprspace fails).
At Medium, 9 complete. At High, only 3 survive (Headscale, Nebula,
Tinc). Internal's failure at High is the most surprising --- the
bare-metal baseline cannot sustain a multi-connection HTTP workload
under severe degradation, but Headscale, shielded by its userspace
TCP stack, can. Section~\ref{sec:tailscale_degraded} explains why.
\section{Tailscale Under Degraded Conditions}
\label{sec:tailscale_degraded}

\subsection{Observed Anomaly}

At Medium impairment, Headscale delivers 41.5\,Mbps single-stream TCP
throughput --- 40\% more than Internal's 29.6\,Mbps. A VPN built
atop WireGuard outperforms the bare-metal connection it tunnels
through. The anomaly is consistent across benchmarks:
Table~\ref{tab:headscale_anomaly} summarizes the comparison.

\begin{table}[H]
\centering
\caption{Headscale vs.\ Internal vs.\ WireGuard under impairment
(18.12.2025 run). For TCP benchmarks, higher is better. For
Nix cache, lower is better; ``--'' marks a failed run.}
\label{tab:headscale_anomaly}
\begin{tabular}{llrrr}
\hline
\textbf{Benchmark} & \textbf{Profile} & \textbf{Internal} &
\textbf{Headscale} & \textbf{WireGuard} \\
\hline
Single TCP (Mbps) & Low & 333 & 274 & 54.7 \\
Single TCP (Mbps) & Medium & 29.6 & 41.5 & 8.77 \\
Single TCP (Mbps) & High & 4.25 & 4.21 & 2.63 \\
Parallel TCP (Mbps) & Low & 277 & 718 & 173 \\
Parallel TCP (Mbps) & Medium & 82.6 & 113 & 24.5 \\
Nix cache (s) & Medium & 58.6 & 48.8 & 161 \\
Nix cache (s) & High & -- & 219 & -- \\
\hline
\end{tabular}
\end{table}
% TODO: Needs to be created, use the tools/ folder
% PLOT: line chart
% File: Figures/impairment/headscale-vs-internal-across-profiles.png
% Data: Single-stream TCP throughput for Internal, Headscale, and
%       WireGuard across all four profiles
% Show: Headscale crossing above Internal at medium impairment;
%       WireGuard far below both; convergence at high
% Y-axis: log scale

In parallel TCP at Low impairment, Headscale reaches 718\,Mbps vs.\
Internal's 277\,Mbps (2.6$\times$). The Nix cache download at
Medium takes Headscale 48.8\,s vs.\ Internal's 58.6\,s (17\%
faster). At High impairment, Internal fails the Nix cache entirely
while Headscale completes in 219\,s.

WireGuard, which shares Headscale's cryptographic layer, shows no
such advantage: 54.7\,Mbps at Low, 8.77\,Mbps at Medium. Whatever
protects Headscale is not the encryption or the tunnel --- it is
something in Tailscale's userspace networking stack.

The retransmit data provides the first clue. At Medium impairment,
Headscale's retransmit percentage is approximately 2.4\%, matching
Internal's ${\sim}$2.4\%. WireGuard's is 5.2\%. Headscale achieves
Internal's retransmit efficiency while delivering higher throughput
--- fewer spurious retransmissions leave more bandwidth for actual
data.
\subsection{Congestion Control Analysis}

Tailscale uses a userspace TCP/IP stack derived from Google's gVisor
(netstack). This stack does not inherit the host kernel's TCP
parameters. Three defaults differ from the Linux kernel in ways that
matter under packet reordering:

\begin{itemize}
\bitem{\texttt{tcp\_reordering}:} gVisor uses 10; the Linux kernel
defaults to~3. This parameter controls how many out-of-order
packets TCP tolerates before treating the event as a loss. With
tc~netem injecting 0.5--2.5\% reordering per machine, bursts of
3+ reordered packets are frequent. The kernel's threshold of~3
causes spurious fast retransmits and congestion window reductions
for packets that are merely reordered, not lost.
\bitem{\texttt{tcp\_recovery} (RACK):} gVisor disables it; the
Linux kernel enables it by default. RACK uses timing-based loss
detection that is more aggressive than the pure sequence-based
approach gVisor uses. Under reordering, RACK's timing heuristics
can falsely classify delayed packets as lost.
\bitem{\texttt{tcp\_early\_retrans} (TLP):} gVisor disables it; the
kernel enables it. Tail Loss Probe sends speculative retransmits
on idle connections, which can worsen congestion when the link is
already impaired.
\end{itemize}
The combined effect: under network conditions with packet reordering,
the default Linux TCP stack fires retransmits and cuts the congestion
window far more often than necessary. Each false positive shrinks the
window and reduces throughput. Tailscale's gVisor stack tolerates
more reordering before reacting, so its congestion window stays larger
and throughput stays higher.

This explains why the anomaly grows with impairment severity. At
baseline, there is no reordering, so the threshold difference is
irrelevant and Internal's kernel-level processing advantage dominates.
As reordering increases from 0.5\% (Low) to 2.5\% (Medium) per
machine, the kernel's aggressive loss detection fires more often, and
the throughput gap shifts in Headscale's favor.
\subsection{Tuned Kernel Parameters}

Two follow-up benchmark runs applied Tailscale's gVisor TCP
parameters to the host kernel via sysctl:

\begin{itemize}
\bitem{Full gVisor (27.02.2026):} All parameters ---
\texttt{tcp\_reordering=10}, \texttt{tcp\_recovery=0},
\texttt{tcp\_early\_retrans=0}, plus enlarged buffer sizes
(\texttt{tcp\_rmem}, \texttt{tcp\_wmem}, \texttt{rmem\_max},
\texttt{wmem\_max}). Tested on Internal, Headscale, WireGuard,
Tinc, and ZeroTier.
\bitem{Reorder-only (06.03.2026):} Only
\texttt{tcp\_reordering=10}, \texttt{tcp\_recovery=0}, and
\texttt{tcp\_early\_retrans=0}. Buffer sizes left at kernel
defaults. Tested on Internal and Headscale only.
\end{itemize}
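The reorder-only configuration amounts to three sysctl writes. A minimal sketch of how such a run could be scripted (Python, illustrative only --- the actual deployment tooling used in the benchmark is not shown here, and executing the commands requires root via e.g.\ \texttt{subprocess}):

```python
# gVisor-derived TCP parameters from the reorder-only run
# (06.03.2026); buffer sizes stay at kernel defaults.
REORDER_ONLY = {
    "net.ipv4.tcp_reordering": 10,   # kernel default: 3
    "net.ipv4.tcp_recovery": 0,      # disable RACK loss detection
    "net.ipv4.tcp_early_retrans": 0, # disable Tail Loss Probe
}

def sysctl_commands(params: dict) -> list:
    """Render one `sysctl -w` command per parameter."""
    return [f"sysctl -w {key}={value}" for key, value in params.items()]

for cmd in sysctl_commands(REORDER_ONLY):
    print(cmd)  # as root: subprocess.run(cmd.split(), check=True)
```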
Table~\ref{tab:kernel_tuning_internal} shows how Internal responds
to the tuning. Both follow-up runs used the same impairment profiles
and hardware as the original 18.12.2025 run.

\begin{table}[H]
\centering
\caption{Internal (no VPN) throughput across three kernel
configurations. ``Default'' is the 18.12.2025 run with stock
Linux TCP parameters.}
\label{tab:kernel_tuning_internal}
\begin{tabular}{llrrr}
\hline
\textbf{Metric} & \textbf{Profile} & \textbf{Default} &
\textbf{Full gVisor} & \textbf{Reorder-only} \\
\hline
Single TCP (Mbps) & Baseline & 934 & 934 & 934 \\
Single TCP (Mbps) & Low & 333 & 363 & 354 \\
Single TCP (Mbps) & Medium & 29.6 & 64.2 & 72.7 \\
Parallel TCP (Mbps) & Low & 277 & 893 & 902 \\
Parallel TCP (Mbps) & Medium & 82.6 & 226 & 211 \\
Retransmit \% & Medium & ${\sim}$2.4 & 1.21 & 1.11 \\
Nix cache (s) & Medium & 58.6 & 29.7 & 29.1 \\
\hline
\end{tabular}
\end{table}
% PLOT: grouped bar chart
% File: Figures/impairment/no_vpn_kernel_tuning_comparison.png
% Data: Internal single-stream TCP at baseline/low/medium across
%       original, full gVisor, and reorder-only configurations
% Show: Dramatic jump at medium (29.6 -> 64.2 -> 72.7 Mbps);
%       baseline unchanged; modest improvement at low
% Y-axis: linear scale

Internal's Medium-impairment throughput jumps from 29.6 to
72.7\,Mbps --- a 146\% increase from a three-line sysctl change. The
retransmit percentage drops from ${\sim}$2.4\% to 1.11\%; most of the
original retransmissions were spurious. The Nix cache download at
Medium halves from 58.6\,s to 29.1\,s.
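The quoted gains follow directly from the before/after values in Table~\ref{tab:kernel_tuning_internal}; a quick check (Python, illustrative only):

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative means the metric fell."""
    return 100 * (after - before) / before

# Internal: default kernel vs. reorder-only tuning.
single_medium = pct_change(29.6, 72.7)  # single TCP, Medium
parallel_low = pct_change(277, 902)     # parallel TCP, Low
nix_medium = pct_change(58.6, 29.1)     # Nix cache time, Medium

print(f"single TCP (Medium):  {single_medium:+.0f}%")  # -> +146%
print(f"parallel TCP (Low):   {parallel_low:+.0f}%")   # -> +226%
print(f"Nix cache (Medium):   {nix_medium:+.0f}%")     # -> -50%
```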
Parallel TCP sees an even larger gain. Internal at Low impairment
climbs from 277 to 902\,Mbps, a 226\% increase that now exceeds
Headscale's original 718\,Mbps. With six concurrent flows each
independently benefiting from the higher reordering threshold, the
aggregate improvement compounds.

The anomaly reverses. At every impairment level and benchmark, tuned
Internal now meets or exceeds Headscale. At Medium impairment:
Internal 72.7\,Mbps vs.\ Headscale 50.1\,Mbps (Internal 45\% ahead),
where the original result had Headscale 40\% ahead. The Nix cache
flips too: Internal completes in 29.1\,s vs.\ Headscale's 36.3\,s,
where the original had Headscale 17\% faster.

% PLOT: before/after comparison
% File: Figures/impairment/headscale-gap-reversal.png
% Data: Internal vs Headscale throughput ratio at each impairment
%       level, original vs tuned (reorder-only)
% Show: The crossover from "Headscale wins" (ratio < 1) to "Internal
%       wins" (ratio > 1) at medium impairment after tuning
% Y-axis: ratio (Internal / Headscale), 1.0 as break-even

The reorder-only configuration (06.03) matches or exceeds the full
gVisor configuration (27.02) at most metrics; the two exceptions are
single-stream TCP at Low (354 vs.\ 363\,Mbps) and parallel TCP at
Medium (211 vs.\ 226\,Mbps), both within 7\%. Internal
reaches 72.7\,Mbps at Medium with reorder-only vs.\ 64.2\,Mbps with
full gVisor. The enlarged buffer sizes are unnecessary and may
introduce mild buffer bloat that partially offsets the reordering
benefit. The entire Headscale advantage is explained by three kernel
parameters: \texttt{tcp\_reordering}, \texttt{tcp\_recovery}, and
\texttt{tcp\_early\_retrans}.
Other VPNs benefit less from the kernel tuning. WireGuard's Medium
throughput rises from 8.77 to 12.2\,Mbps (+39\%) and Tinc's from
5.53 to 11.5\,Mbps (+108\%). ZeroTier is essentially unchanged
(12.0 to 11.5\,Mbps). The tuning helps the kernel TCP stack, but
VPNs that add their own encapsulation overhead and userspace
processing have independent bottlenecks that the sysctl parameters
cannot remove.

Headscale itself gets modestly faster with kernel tuning (+21\% at
Medium) but slightly slower at Low impairment ($-$5\%). Its
userspace gVisor stack already optimizes for reordering tolerance.
When the kernel stack also increases its tolerance, the two layers of
tuning may interact suboptimally --- both independently delay
retransmits, which can cause compound delays on the
kernel-to-Headscale socket path.
\section{Source Code Analysis}

\subsection{Feature Matrix Overview}

% Summary of the 131-feature matrix across all ten VPNs.
% Highlight key architectural differences that explain
% performance results.

\subsection{Security Vulnerabilities}

% Vulnerabilities discovered during source code review.

\section{Summary of Findings}

% Brief summary table or ranking of VPNs by key metrics.
% Save deeper interpretation for a Discussion chapter.