add impairment profile chapter
This commit is contained in:
+566
-20
@@ -552,9 +552,9 @@ than per-connection latency.
|
|||||||
|
|
||||||
Several rankings invert relative to raw throughput. ZeroTier
|
Several rankings invert relative to raw throughput. ZeroTier
|
||||||
finishes faster than WireGuard (9.22\,s vs.\ 9.45\,s) despite
|
finishes faster than WireGuard (9.22\,s vs.\ 9.45\,s) despite
|
||||||
30\,\% fewer raw Mbps and 1\,000$\times$ more retransmits. Yggdrasil
|
6\,\% fewer raw Mbps and 1\,000$\times$ more retransmits. Yggdrasil
|
||||||
is the clearest example: it has the
|
is the clearest example: it has the
|
||||||
third-highest throughput at 795\,Mbps, yet lands at 24\,\% overhead
|
fourth-highest VPN throughput at 795\,Mbps, yet lands at 24\,\% overhead
|
||||||
because its
|
because its
|
||||||
2.2\,ms latency adds up over the many small sequential HTTP requests
|
2.2\,ms latency adds up over the many small sequential HTTP requests
|
||||||
that constitute a Nix cache download.
|
that constitute a Nix cache download.
|
||||||
@@ -815,57 +815,603 @@ what would otherwise be idle gaps in any individual flow, squeezing
|
|||||||
out throughput that no single stream could reach alone.
|
out throughput that no single stream could reach alone.
|
||||||
|
|
||||||
\section{Impact of Network Impairment}
|
\section{Impact of Network Impairment}
|
||||||
|
\label{sec:impairment}
|
||||||
|
|
||||||
This section examines how each VPN responds to the Low, Medium, and
|
The impairment profiles from Table~\ref{tab:impairment_profiles} are
|
||||||
High impairment profiles defined in Chapter~\ref{Methodology}.
|
applied to the full benchmark suite. Baseline results from
|
||||||
|
Section~\ref{sec:baseline} serve as the reference.
|
||||||
|
|
||||||
\subsection{Ping}
|
\subsection{Ping}
|
||||||
|
|
||||||
% RTT and packet loss across impairment profiles.
|
Table~\ref{tab:ping_impairment} lists average round-trip times across
|
||||||
|
all four profiles. Most VPNs track the expected increase closely:
|
||||||
|
tc~netem adds roughly 4\,ms, 8\,ms, and 15\,ms of round-trip delay
|
||||||
|
at Low, Medium, and High respectively, and Internal's measured values
|
||||||
|
(4.82, 9.38, 15.49\,ms) confirm this.
|
||||||
|
|
||||||
|
\begin{table}[H]
|
||||||
|
\centering
|
||||||
|
\caption{Average ping RTT (ms) across impairment profiles, sorted
|
||||||
|
by High-profile RTT}
|
||||||
|
\label{tab:ping_impairment}
|
||||||
|
\begin{tabular}{lrrrr}
|
||||||
|
\hline
|
||||||
|
\textbf{VPN} & \textbf{Baseline} & \textbf{Low} &
|
||||||
|
\textbf{Medium} & \textbf{High} \\
|
||||||
|
\hline
|
||||||
|
Internal & 0.60 & 4.82 & 9.38 & 15.49 \\
|
||||||
|
Tinc & 1.19 & 5.32 & 9.85 & 15.92 \\
|
||||||
|
Nebula & 1.25 & 5.38 & 9.99 & 15.96 \\
|
||||||
|
WireGuard & 1.20 & 5.36 & 9.88 & 15.99 \\
|
||||||
|
Headscale & 1.64 & 5.82 & 10.39 & 16.07 \\
|
||||||
|
VpnCloud & 1.13 & 5.41 & 10.35 & 16.21 \\
|
||||||
|
ZeroTier & 1.28 & 5.34 & 10.02 & 16.54 \\
|
||||||
|
Yggdrasil & 2.20 & 6.73 & 11.99 & 20.20 \\
|
||||||
|
Hyprspace & 1.79 & 6.15 & 10.76 & 24.49 \\
|
||||||
|
EasyTier & 1.33 & 6.27 & 14.13 & 26.60 \\
|
||||||
|
Mycelium & 34.90 & 23.42 & 43.88 & 33.05 \\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
|
% PLOT: line chart
|
||||||
|
% File: Figures/impairment/Ping Average RTT Heatmap.png
|
||||||
|
% Data: Average ping RTT for all 11 VPNs at baseline, low, medium, high
|
||||||
|
% Show: Most VPNs in a tight parallel band; Mycelium's non-monotonic curve;
|
||||||
|
% EasyTier and Hyprspace diverging upward at high impairment
|
||||||
|
|
||||||
|
Mycelium defies the pattern. Its RTT \emph{drops} from 34.9\,ms at
|
||||||
|
baseline to 23.4\,ms at Low impairment, a 33\% improvement where
|
||||||
|
every other VPN gets slower. It then rises to 43.9\,ms at Medium
|
||||||
|
before falling again to 33.0\,ms at High. The baseline analysis
|
||||||
|
(Section~\ref{sec:mycelium_routing}) showed that Mycelium's latency
|
||||||
|
comes from a bimodal routing distribution: one path runs at 1.63\,ms
|
||||||
|
while two others route through the global overlay at
|
||||||
|
${\sim}$51\,ms. The impairment appears to push Mycelium's path
|
||||||
|
discovery toward shorter routes, so a larger share of traffic takes
|
||||||
|
the direct path. The non-monotonic pattern is consistent with a path
|
||||||
|
selection algorithm that responds to measured link quality, but not
|
||||||
|
linearly with degradation severity.
|
||||||
|
|
||||||
|
Mycelium also achieves 0\% ping packet loss at Low and Medium
|
||||||
|
impairment, while most VPNs show 0.1--3.2\% loss at those profiles.
|
||||||
|
At High impairment, Mycelium's loss jumps to 11.1\%.
|
||||||
|
|
||||||
|
EasyTier accumulates 11\,ms of excess latency at High impairment
|
||||||
|
beyond what tc~netem accounts for. Its average RTT of 26.6\,ms and
|
||||||
|
maximum of 290\,ms (vs.\ ${\sim}$40\,ms for WireGuard) point to a
|
||||||
|
userspace scheduling or retry mechanism that introduces escalating
|
||||||
|
variance. EasyTier's RTT standard deviation reaches 44.6\,ms at
|
||||||
|
High, the worst jitter of any VPN.
|
||||||
|
|
||||||
|
Hyprspace shows 11.1\% ping packet loss at every impairment level ---
|
||||||
|
Low, Medium, and High alike. With 9~measurement runs (3~machine
|
||||||
|
pairs $\times$ 3~runs of 100~packets), 11.1\% equals exactly 1/9:
|
||||||
|
one run per profile fails completely while the other eight report zero
|
||||||
|
loss. This binary pass/fail behavior is consistent with the buffer
|
||||||
|
bloat diagnosis from Section~\ref{sec:hyprspace_bloat}. When buffers
|
||||||
|
fill, an entire path stalls rather than degrading gradually.
|
||||||
|
|
||||||
\subsection{TCP Throughput}
|
\subsection{TCP Throughput}
|
||||||
|
|
||||||
% TCP iperf3: throughput, retransmits, congestion window.
|
Table~\ref{tab:tcp_impairment} presents single-stream TCP throughput
|
||||||
|
across all four profiles. The baseline performance tiers from
|
||||||
|
Section~\ref{sec:baseline} dissolve almost immediately under
|
||||||
|
impairment.
|
||||||
|
|
||||||
|
\begin{table}[H]
|
||||||
|
\centering
|
||||||
|
\caption{Single-stream TCP throughput (Mbps) across impairment
|
||||||
|
profiles, sorted by baseline. Retention is the
|
||||||
|
Low-to-baseline ratio.}
|
||||||
|
\label{tab:tcp_impairment}
|
||||||
|
\begin{tabular}{lrrrrr}
|
||||||
|
\hline
|
||||||
|
\textbf{VPN} & \textbf{Baseline} & \textbf{Low} &
|
||||||
|
\textbf{Medium} & \textbf{High} & \textbf{Retention} \\
|
||||||
|
\hline
|
||||||
|
Internal & 934 & 333 & 29.6 & 4.25 & 35.7\% \\
|
||||||
|
WireGuard & 864 & 54.7 & 8.77 & 2.63 & 6.3\% \\
|
||||||
|
ZeroTier & 814 & 63.7 & 12.0 & 4.01 & 7.8\% \\
|
||||||
|
Headscale & 800 & 274 & 41.5 & 4.21 & 34.3\% \\
|
||||||
|
Yggdrasil & 795 & 13.2 & 6.08 & 3.40 & 1.7\% \\
|
||||||
|
\hline
|
||||||
|
Nebula & 706 & 49.8 & 7.82 & 2.60 & 7.1\% \\
|
||||||
|
EasyTier & 636 & 156 & 17.4 & 3.59 & 24.6\% \\
|
||||||
|
VpnCloud & 539 & 58.2 & 8.33 & 1.86 & 10.8\% \\
|
||||||
|
\hline
|
||||||
|
Hyprspace & 368 & 4.42 & 2.05 & 1.39 & 1.2\% \\
|
||||||
|
Tinc & 336 & 54.4 & 5.53 & 2.77 & 16.2\% \\
|
||||||
|
Mycelium & 259 & 16.2 & 3.87 & 2.73 & 6.3\% \\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
|
% PLOT: line chart
|
||||||
|
% File: Figures/impairment/TCP Throughput Heatmap.png
|
||||||
|
% Data: Single-stream TCP throughput for all 11 VPNs at baseline,
|
||||||
|
% low, medium, high
|
||||||
|
% Show: Headscale crossing above Internal at medium impairment;
|
||||||
|
% Yggdrasil's cliff from baseline to low; convergence of all
|
||||||
|
% VPNs at high impairment
|
||||||
|
|
||||||
|
Yggdrasil crashes from 795\,Mbps to 13.2\,Mbps at Low impairment, a
|
||||||
|
98.3\% throughput loss from adding just 2\,ms latency, 2\,ms jitter,
|
||||||
|
0.25\% packet loss, and 0.5\% reordering per machine. Even Mycelium,
|
||||||
|
the slowest VPN at baseline (259\,Mbps), retains more throughput at
|
||||||
|
Low than Yggdrasil does. The jumbo overlay MTU of 32\,731~bytes,
|
||||||
|
which inflated baseline metrics
|
||||||
|
(Section~\ref{sec:baseline}), becomes a liability under impairment:
|
||||||
|
each lost or reordered outer packet triggers retransmission of
|
||||||
|
${\sim}$24$\times$ more inner-layer data than a standard
|
||||||
|
1\,400-byte MTU VPN would lose.
|
||||||
|
|
||||||
|
Headscale retains 34.3\% of its baseline throughput at Low, nearly
|
||||||
|
matching Internal's 35.7\%. At Medium impairment, Headscale
|
||||||
|
(41.5\,Mbps) overtakes Internal (29.6\,Mbps) --- a VPN outperforming
|
||||||
|
the bare-metal baseline.
|
||||||
|
Section~\ref{sec:tailscale_degraded} investigates this anomaly in
|
||||||
|
detail.
|
||||||
|
|
||||||
|
At High impairment, the throughput range compresses from 675\,Mbps at
|
||||||
|
baseline to just 2.9\,Mbps. Internal leads at 4.25\,Mbps; Hyprspace
|
||||||
|
trails at 1.39\,Mbps. The impairment profile itself becomes the
|
||||||
|
bottleneck. With 2.5\% packet loss and 5\% reordering per machine,
|
||||||
|
every implementation is TCP-loss-limited, and architectural
|
||||||
|
differences that matter at gigabit speeds become irrelevant.
|
||||||
|
|
||||||
\subsection{UDP Throughput}
|
\subsection{UDP Throughput}
|
||||||
|
|
||||||
% UDP iperf3: throughput, jitter, packet loss.
|
The UDP stress test (\texttt{-b~0}) suffers from widespread failures
|
||||||
|
under impairment. Hyprspace and Mycelium, which already failed at
|
||||||
|
baseline, continue to fail at all profiles. Tinc and ZeroTier fail
|
||||||
|
at most non-baseline profiles. The sparse dataset limits
|
||||||
|
conclusions, but one pattern stands out.
|
||||||
|
|
||||||
|
Kernel-level implementations maintain throughput regardless of
|
||||||
|
impairment. Internal holds ${\sim}$950\,Mbps across all profiles
|
||||||
|
where data exists. Headscale sustains 700--876\,Mbps and WireGuard
|
||||||
|
850--908\,Mbps; % TODO: verify WireGuard UDP range -- analysis doc says 850-898, possible digit transposition
|
||||||
|
both rely on WireGuard's in-kernel UDP handling with
|
||||||
|
proper backpressure. Userspace VPNs collapse: EasyTier drops from
|
||||||
|
865 to 435 to 38.5 to 6.1\,Mbps across successive profiles.
|
||||||
|
Yggdrasil, already pathological at baseline (98.7\% loss), crashes to
|
||||||
|
12.3\,Mbps at Low and fails entirely at Medium and High.
|
||||||
|
|
||||||
|
% PLOT: heatmap
|
||||||
|
% File: Figures/impairment/UDP Receiver Throughput Heatmap.png
|
||||||
|
% Data: UDP receiver throughput for all 11 VPNs at baseline, low,
|
||||||
|
% medium, high (grey/hatched cells for failures)
|
||||||
|
% Show: Kernel-level VPNs (Internal, WireGuard, Headscale) maintaining
|
||||||
|
% high throughput across all profiles; userspace VPNs failing or
|
||||||
|
% collapsing; the large number of empty cells
|
||||||
|
|
||||||
|
|
||||||
|
The failure rate of this benchmark under impairment makes it more
|
||||||
|
useful as a robustness indicator than a throughput measurement. A VPN
|
||||||
|
that cannot complete a 30-second UDP flood under 0.25\% packet loss
|
||||||
|
has fundamental flow-control problems that will surface under real
|
||||||
|
workloads too, even if the symptoms are milder.
|
||||||
|
|
||||||
\subsection{Parallel TCP}
|
\subsection{Parallel TCP}
|
||||||
|
|
||||||
% Parallel iperf3: throughput under contention (A->B, B->C, C->A).
|
Table~\ref{tab:parallel_impairment} shows aggregate throughput across
|
||||||
|
three concurrent bidirectional links (six unidirectional flows). The
|
||||||
|
Headscale anomaly from the single-stream results is amplified here.
|
||||||
|
|
||||||
|
\begin{table}[H]
|
||||||
|
\centering
|
||||||
|
\caption{Parallel TCP throughput (Mbps) across impairment profiles.
|
||||||
|
Three concurrent bidirectional links produce six unidirectional
|
||||||
|
flows.}
|
||||||
|
\label{tab:parallel_impairment}
|
||||||
|
\begin{tabular}{lrrrr}
|
||||||
|
\hline
|
||||||
|
\textbf{VPN} & \textbf{Baseline} & \textbf{Low} &
|
||||||
|
\textbf{Medium} & \textbf{High} \\
|
||||||
|
\hline
|
||||||
|
Internal & 1398 & 277 & 82.6 & 10.4 \\
|
||||||
|
Headscale & 1228 & 718 & 113 & 20.0 \\
|
||||||
|
WireGuard & 1281 & 173 & 24.5 & 8.39 \\
|
||||||
|
Yggdrasil & 1265 & 38.7 & 16.7 & 8.95 \\
|
||||||
|
ZeroTier & 1206 & 176 & 35.4 & 7.97 \\
|
||||||
|
EasyTier & 927 & 473 & 57.4 & 10.7 \\
|
||||||
|
Hyprspace & 803 & 2.87 & 6.94 & 3.62 \\
|
||||||
|
VpnCloud & 763 & 174 & 23.7 & 8.25 \\
|
||||||
|
Nebula & 648 & 103 & 15.3 & 4.93 \\
|
||||||
|
Mycelium & 569 & 72.7 & 7.51 & 3.69 \\
|
||||||
|
Tinc & 563 & 168 & 23.7 & 8.25 \\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
|
% PLOT: heatmap
|
||||||
|
% File: Figures/impairment/Parallel TCP Throughput Heatmap.png
|
||||||
|
% Data: Parallel TCP throughput for all 11 VPNs at baseline, low,
|
||||||
|
% medium, high
|
||||||
|
% Show: Headscale dominating at low impairment (718 Mbps vs Internal's
|
||||||
|
% 277); EasyTier as runner-up (473 Mbps); Hyprspace's collapse
|
||||||
|
% to 2.87 Mbps
|
||||||
|
|
||||||
|
|
||||||
|
Headscale at Low impairment: 718\,Mbps --- 2.6$\times$ Internal
|
||||||
|
(277\,Mbps) and 4.1$\times$ WireGuard (173\,Mbps). At Medium,
|
||||||
|
Headscale (113\,Mbps) still leads Internal (82.6\,Mbps) by 37\%.
|
||||||
|
The single-stream anomaly from
|
||||||
|
Section~\ref{sec:tailscale_degraded} compounds when multiple flows
|
||||||
|
each independently benefit from Headscale's congestion control
|
||||||
|
tuning.
|
||||||
|
|
||||||
|
EasyTier is the second-most resilient VPN under parallel load, at
|
||||||
|
473\,Mbps at Low (51\% of baseline). Both EasyTier and Headscale
|
||||||
|
retain more than half their baseline parallel throughput at Low
|
||||||
|
impairment; no other VPN exceeds 30\%.
|
||||||
|
|
||||||
|
Hyprspace collapses from 803\,Mbps to 2.87\,Mbps at Low, a 99.6\%
|
||||||
|
loss. The buffer bloat that plagues single-stream transfers
|
||||||
|
(Section~\ref{sec:hyprspace_bloat}) becomes catastrophic when six
|
||||||
|
concurrent flows compete for the same bloated buffers.
|
||||||
|
|
||||||
|
The High-profile convergence effect is even more pronounced here than
|
||||||
|
in single-stream mode. Tinc and VpnCloud land at identical
|
||||||
|
8.25\,Mbps despite differing by 200\,Mbps at baseline.
|
||||||
|
|
||||||
\subsection{QUIC Performance}
|
\subsection{QUIC Performance}
|
||||||
|
|
||||||
% qperf: bandwidth, TTFB, connection establishment time.
|
Headscale and Nebula failed the qperf QUIC benchmark at baseline
|
||||||
|
(Section~\ref{sec:baseline}) and continue to fail across all
|
||||||
|
impairment profiles.
|
||||||
|
|
||||||
|
Yggdrasil's QUIC bandwidth drops from 745\,Mbps at baseline to
|
||||||
|
7.67\,Mbps at Low, 3.45\,Mbps at Medium, and 2.17\,Mbps at High ---
|
||||||
|
the same cliff observed in its TCP results, driven by the same
|
||||||
|
jumbo-MTU amplification of outer-layer packet loss.
|
||||||
|
|
||||||
|
At High impairment, WireGuard (23.2\,Mbps), VpnCloud (23.4\,Mbps),
|
||||||
|
ZeroTier (23.0\,Mbps), and Tinc (23.4\,Mbps) converge to within
|
||||||
|
0.4\,Mbps of each other. At baseline these four span a 500\,Mbps
|
||||||
|
range. QUIC's own congestion control, operating atop the
|
||||||
|
already-degraded outer link, becomes the sole limiter.
|
||||||
|
|
||||||
|
% PLOT: heatmap
|
||||||
|
% File: Figures/impairment/QUIC Bandwidth Heatmap.png
|
||||||
|
% Data: QPerf QUIC bandwidth for VPNs with data at all four profiles
|
||||||
|
% (WireGuard, VpnCloud, ZeroTier, Tinc, Yggdrasil, Internal)
|
||||||
|
% Show: Yggdrasil's cliff from baseline to low; convergence of
|
||||||
|
% WireGuard, VpnCloud, ZeroTier, Tinc at high (~23 Mbps)
|
||||||
|
|
||||||
|
|
||||||
\subsection{Video Streaming}
|
\subsection{Video Streaming}
|
||||||
|
|
||||||
% RIST: bitrate, dropped frames, packets recovered, quality score.
|
Table~\ref{tab:rist_impairment} presents RIST video quality scores
|
||||||
|
across profiles. The actual encoding bitrate of ${\sim}$3.3\,Mbps
|
||||||
|
sits well within every VPN's throughput budget even at High
|
||||||
|
impairment, so quality differences reflect packet delivery reliability
|
||||||
|
rather than bandwidth limits.
|
||||||
|
|
||||||
|
\begin{table}[H]
|
||||||
|
\centering
|
||||||
|
\caption{RIST video streaming quality (\%) across impairment
|
||||||
|
profiles, sorted by High-profile quality}
|
||||||
|
\label{tab:rist_impairment}
|
||||||
|
\begin{tabular}{lrrrr}
|
||||||
|
\hline
|
||||||
|
\textbf{VPN} & \textbf{Baseline} & \textbf{Low} &
|
||||||
|
\textbf{Medium} & \textbf{High} \\
|
||||||
|
\hline
|
||||||
|
Mycelium & 100.0 & 100.0 & 100.0 & 99.9 \\
|
||||||
|
EasyTier & 100.0 & 100.0 & 96.2 & 85.5 \\
|
||||||
|
Internal & 100.0 & 99.2 & 89.3 & 80.2 \\
|
||||||
|
ZeroTier & 100.0 & 99.3 & 89.9 & 80.2 \\
|
||||||
|
VpnCloud & 100.0 & 99.2 & 89.7 & 80.1 \\
|
||||||
|
WireGuard & 100.0 & 99.3 & 90.0 & 80.0 \\
|
||||||
|
Hyprspace & 100.0 & 92.9 & 87.9 & 78.1 \\
|
||||||
|
Tinc & 100.0 & 99.3 & 90.0 & 77.8 \\
|
||||||
|
Nebula & 99.8 & 98.8 & 85.6 & 72.1 \\
|
||||||
|
Yggdrasil & 100.0 & 94.7 & 71.4 & 43.3 \\
|
||||||
|
Headscale & 13.1 & 13.0 & 13.0 & 13.0 \\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
|
% PLOT: heatmap
|
||||||
|
% File: Figures/impairment/Video Streaming Quality Heatmap.png
|
||||||
|
% Data: RIST quality for all 11 VPNs at baseline, low, medium, high
|
||||||
|
% Show: Headscale stuck at 13% (red row); Mycelium stuck near 100%
|
||||||
|
% (green row); gradual degradation for the bulk; Yggdrasil's
|
||||||
|
% steep decline to 43%
|
||||||
|
|
||||||
|
|
||||||
|
Headscale stays at ${\sim}$13\% across all four profiles: 13.1\%,
|
||||||
|
13.0\%, 13.0\%, 13.0\%. The profile-independence confirms the
|
||||||
|
baseline diagnosis from Section~\ref{sec:baseline}. The failure is
|
||||||
|
structural --- likely MTU fragmentation in the DERP relay layer ---
|
||||||
|
and cannot worsen because it is already saturated. Adding latency or
|
||||||
|
loss on top of an 87\% packet drop floor changes nothing.
|
||||||
|
|
||||||
|
Mycelium delivers 99.9\% quality even at High impairment, better than
|
||||||
|
Internal (80.2\%) and every other VPN. At 3.3\,Mbps, even
|
||||||
|
Mycelium's degraded overlay paths can sustain the stream. Its
|
||||||
|
overlay retransmission mechanism, which cripples bulk TCP transfers,
|
||||||
|
works well for steady low-bandwidth UDP flows. RIST's own forward
|
||||||
|
error correction handles whatever Mycelium's retransmissions miss.
|
||||||
|
|
||||||
|
Yggdrasil degrades the most steeply: 100\% at baseline, 94.7\% at
|
||||||
|
Low, 71.4\% at Medium, 43.3\% at High. The jumbo MTU that hurt TCP
|
||||||
|
throughput also hurts here --- large overlay packets carrying RIST
|
||||||
|
data are more likely to be lost or reordered at the outer layer, and
|
||||||
|
RIST's FEC cannot recover from the resulting burst losses.
|
||||||
|
|
||||||
\subsection{Application-Level Download}
|
\subsection{Application-Level Download}
|
||||||
|
|
||||||
% Nix cache: download duration for Firefox package.
|
Table~\ref{tab:nix_impairment} shows Nix binary cache download times
|
||||||
|
across profiles. This HTTP-heavy workload, dominated by many
|
||||||
|
short-lived TCP connections, is more sensitive to per-connection
|
||||||
|
latency than to raw bandwidth.
|
||||||
|
|
||||||
|
\begin{table}[H]
|
||||||
|
\centering
|
||||||
|
\caption{Nix binary cache download time (seconds) across impairment
|
||||||
|
profiles, sorted by Low-profile time. ``--'' marks a failed
|
||||||
|
run.}
|
||||||
|
\label{tab:nix_impairment}
|
||||||
|
\begin{tabular}{lrrrr}
|
||||||
|
\hline
|
||||||
|
\textbf{VPN} & \textbf{Baseline} & \textbf{Low} &
|
||||||
|
\textbf{Medium} & \textbf{High} \\
|
||||||
|
\hline
|
||||||
|
Internal & 8.53 & 11.9 & 58.6 & -- \\
|
||||||
|
Headscale & 9.79 & 13.5 & 48.8 & 219 \\
|
||||||
|
EasyTier & 9.39 & 22.1 & 141 & -- \\
|
||||||
|
VpnCloud & 9.39 & 27.9 & 163 & -- \\
|
||||||
|
WireGuard & 9.45 & 28.8 & 161 & -- \\
|
||||||
|
Nebula & 9.15 & 30.8 & 180 & 547 \\
|
||||||
|
Tinc & 10.0 & 30.9 & 166 & 496 \\
|
||||||
|
ZeroTier & 9.22 & 36.2 & 141 & -- \\
|
||||||
|
Mycelium & 10.1 & 79.5 & -- & -- \\
|
||||||
|
Yggdrasil & 10.6 & 230 & -- & -- \\
|
||||||
|
Hyprspace & 11.9 & -- & 170 & -- \\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
|
% PLOT: heatmap
|
||||||
|
% File: Figures/impairment/Nix Cache Download Time Heatmap.png
|
||||||
|
% Data: Nix cache download time for all VPNs at baseline, low, medium,
|
||||||
|
% high (hatched/absent bars for failures)
|
||||||
|
% Show: Headscale as the only VPN completing all four profiles;
|
||||||
|
% Headscale beating Internal at medium (48.8 vs 58.6 s);
|
||||||
|
% Yggdrasil's 22x slowdown at low impairment
|
||||||
|
|
||||||
|
|
||||||
|
Headscale is the only VPN to complete all four profiles. At Medium
|
||||||
|
impairment, it finishes in 48.8~seconds --- faster than Internal's
|
||||||
|
58.6~seconds. Internal itself fails at High impairment while
|
||||||
|
Headscale completes in 219~seconds. Only Nebula (547\,s) and Tinc
|
||||||
|
(496\,s) also survive High impairment.
|
||||||
|
|
||||||
|
Yggdrasil's download time explodes from 10.6\,s to 230\,s at Low
|
||||||
|
impairment, a 22$\times$ slowdown. Every HTTP request incurs the
|
||||||
|
latency penalty from Yggdrasil's impairment-amplified
|
||||||
|
retransmissions. Mycelium also degrades severely (10.1\,s to
|
||||||
|
79.5\,s, an 8$\times$ increase), consistent with its overlay routing
|
||||||
|
overhead, which compounds over hundreds of sequential HTTP
|
||||||
|
connections.
|
||||||
|
|
||||||
|
The failure map reveals a clean gradient: more demanding profiles
|
||||||
|
knock out more VPNs. At Low, 10 of 11 complete (Hyprspace fails).
|
||||||
|
At Medium, 9 complete. At High, only 3 survive (Headscale, Nebula,
|
||||||
|
Tinc). Internal's failure at High is the most surprising --- the
|
||||||
|
bare-metal baseline cannot sustain a multi-connection HTTP workload
|
||||||
|
under severe degradation, but Headscale, shielded by its userspace
|
||||||
|
TCP stack, can. Section~\ref{sec:tailscale_degraded} explains why.
|
||||||
|
|
||||||
\section{Tailscale Under Degraded Conditions}
|
\section{Tailscale Under Degraded Conditions}
|
||||||
|
\label{sec:tailscale_degraded}
|
||||||
% The central finding: Tailscale outperforming the raw Linux
|
|
||||||
% networking stack under impairment.
|
|
||||||
|
|
||||||
\subsection{Observed Anomaly}
|
\subsection{Observed Anomaly}
|
||||||
|
|
||||||
% Present the data showing Tailscale exceeding internal baseline
|
At Medium impairment, Headscale delivers 41.5\,Mbps single-stream TCP
|
||||||
% throughput under Medium/High impairment.
|
throughput --- 40\% more than Internal's 29.6\,Mbps. A VPN built
|
||||||
|
atop WireGuard outperforms the bare-metal connection it tunnels
|
||||||
|
through. The anomaly is consistent across benchmarks:
|
||||||
|
Table~\ref{tab:headscale_anomaly} summarizes the comparison.
|
||||||
|
|
||||||
|
\begin{table}[H]
|
||||||
|
\centering
|
||||||
|
\caption{Headscale vs.\ Internal vs.\ WireGuard under impairment
|
||||||
|
(18.12.2025 run). For TCP benchmarks, higher is better. For
|
||||||
|
Nix cache, lower is better; ``--'' marks a failed run.}
|
||||||
|
\label{tab:headscale_anomaly}
|
||||||
|
\begin{tabular}{llrrr}
|
||||||
|
\hline
|
||||||
|
\textbf{Benchmark} & \textbf{Profile} & \textbf{Internal} &
|
||||||
|
\textbf{Headscale} & \textbf{WireGuard} \\
|
||||||
|
\hline
|
||||||
|
Single TCP (Mbps) & Low & 333 & 274 & 54.7 \\
|
||||||
|
Single TCP (Mbps) & Medium & 29.6 & 41.5 & 8.77 \\
|
||||||
|
Single TCP (Mbps) & High & 4.25 & 4.21 & 2.63 \\
|
||||||
|
Parallel TCP (Mbps) & Low & 277 & 718 & 173 \\
|
||||||
|
Parallel TCP (Mbps) & Medium & 82.6 & 113 & 24.5 \\
|
||||||
|
Nix cache (s) & Medium & 58.6 & 48.8 & 161 \\
|
||||||
|
Nix cache (s) & High & -- & 219 & -- \\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
|
% TODO: Needs to be created, use the tools/ folder
|
||||||
|
% PLOT: line chart
|
||||||
|
% File: Figures/impairment/headscale-vs-internal-across-profiles.png
|
||||||
|
% Data: Single-stream TCP throughput for Internal, Headscale, and
|
||||||
|
% WireGuard across all four profiles
|
||||||
|
% Show: Headscale crossing above Internal at medium impairment;
|
||||||
|
% WireGuard far below both; convergence at high
|
||||||
|
% Y-axis: log scale
|
||||||
|
|
||||||
|
In parallel TCP at Low impairment, Headscale reaches 718\,Mbps vs.\
|
||||||
|
Internal's 277\,Mbps (2.6$\times$). The Nix cache download at
|
||||||
|
Medium takes Headscale 48.8\,s vs.\ Internal's 58.6\,s (17\%
|
||||||
|
faster). At High impairment, Internal fails the Nix cache entirely
|
||||||
|
while Headscale completes in 219\,s.
|
||||||
|
|
||||||
|
WireGuard, which shares Headscale's cryptographic layer, shows no
|
||||||
|
such advantage: 54.7\,Mbps at Low, 8.77\,Mbps at Medium. Whatever
|
||||||
|
protects Headscale is not the encryption or the tunnel --- it is
|
||||||
|
something in Tailscale's userspace networking stack.
|
||||||
|
|
||||||
|
The retransmit data provides the first clue. At Medium impairment,
|
||||||
|
Headscale's retransmit percentage is approximately 2.4\%, matching
|
||||||
|
Internal's ${\sim}$2.4\%. WireGuard's is 5.2\%. Headscale achieves
|
||||||
|
Internal's retransmit efficiency while delivering higher throughput
|
||||||
|
--- fewer spurious retransmissions leave more bandwidth for actual
|
||||||
|
data.
|
||||||
|
|
||||||
\subsection{Congestion Control Analysis}
|
\subsection{Congestion Control Analysis}
|
||||||
|
|
||||||
% Reno vs CUBIC, RACK disabled to avoid spurious retransmits
|
Tailscale uses a userspace TCP/IP stack derived from Google's gVisor
|
||||||
% under reordering.
|
(netstack). This stack does not inherit the host kernel's TCP
|
||||||
|
parameters. Three defaults differ from the Linux kernel in ways that
|
||||||
|
matter under packet reordering:
|
||||||
|
|
||||||
|
\begin{itemize}
|
||||||
|
\bitem{\texttt{tcp\_reordering}:} gVisor uses 10; the Linux kernel
|
||||||
|
defaults to~3. This parameter controls how many out-of-order
|
||||||
|
packets TCP tolerates before treating the event as a loss. With
|
||||||
|
tc~netem injecting 0.5--2.5\% reordering per machine, bursts of
|
||||||
|
3+ reordered packets are frequent. The kernel's threshold of~3
|
||||||
|
causes spurious fast retransmits and congestion window reductions
|
||||||
|
for packets that are merely reordered, not lost.
|
||||||
|
\bitem{\texttt{tcp\_recovery} (RACK):} gVisor disables it; the
|
||||||
|
Linux kernel enables it by default. RACK uses timing-based loss
|
||||||
|
detection that is more aggressive than the pure sequence-based
|
||||||
|
approach gVisor uses. Under reordering, RACK's timing heuristics
|
||||||
|
can falsely classify delayed packets as lost.
|
||||||
|
\bitem{\texttt{tcp\_early\_retrans} (TLP):} gVisor disables it; the
|
||||||
|
kernel enables it. Tail Loss Probe sends speculative retransmits
|
||||||
|
on idle connections, which can worsen congestion when the link is
|
||||||
|
already impaired.
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
The combined effect: under network conditions with packet reordering,
|
||||||
|
the default Linux TCP stack fires retransmits and cuts the congestion
|
||||||
|
window far more often than necessary. Each false positive shrinks the
|
||||||
|
window and reduces throughput. Tailscale's gVisor stack tolerates
|
||||||
|
more reordering before reacting, so its congestion window stays larger
|
||||||
|
and throughput stays higher.
|
||||||
|
|
||||||
|
This explains why the anomaly grows with impairment severity. At
|
||||||
|
baseline, there is no reordering, so the threshold difference is
|
||||||
|
irrelevant and Internal's kernel-level processing advantage dominates.
|
||||||
|
As reordering increases from 0.5\% (Low) to 2.5\% (Medium) per
|
||||||
|
machine, the kernel's aggressive loss detection fires more often, and
|
||||||
|
the throughput gap shifts in Headscale's favor.
|
||||||
|
|
||||||
\subsection{Tuned Kernel Parameters}
|
\subsection{Tuned Kernel Parameters}
|
||||||
|
|
||||||
% Re-run results with tuned buffer sizes and congestion control
|
Two follow-up benchmark runs applied Tailscale's gVisor TCP
|
||||||
% on the internal baseline, showing the gap closes.
|
parameters to the host kernel via sysctl:
|
||||||
|
|
||||||
|
\begin{itemize}
|
||||||
|
\bitem{Full gVisor (27.02.2026):} All parameters ---
|
||||||
|
\texttt{tcp\_reordering=10}, \texttt{tcp\_recovery=0},
|
||||||
|
\texttt{tcp\_early\_retrans=0}, plus enlarged buffer sizes
|
||||||
|
(\texttt{tcp\_rmem}, \texttt{tcp\_wmem}, \texttt{rmem\_max},
|
||||||
|
\texttt{wmem\_max}). Tested on Internal, Headscale, WireGuard,
|
||||||
|
Tinc, and ZeroTier.
|
||||||
|
\bitem{Reorder-only (06.03.2026):} Only
|
||||||
|
\texttt{tcp\_reordering=10}, \texttt{tcp\_recovery=0}, and
|
||||||
|
\texttt{tcp\_early\_retrans=0}. Buffer sizes left at kernel
|
||||||
|
defaults. Tested on Internal and Headscale only.
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
Table~\ref{tab:kernel_tuning_internal} shows how Internal responds
|
||||||
|
to the tuning. Both follow-up runs used the same impairment profiles
|
||||||
|
and hardware as the original 18.12.2025 run.
|
||||||
|
|
||||||
|
\begin{table}[H]
|
||||||
|
\centering
|
||||||
|
\caption{Internal (no VPN) throughput across three kernel
|
||||||
|
configurations. ``Default'' is the 18.12.2025 run with stock
|
||||||
|
Linux TCP parameters.}
|
||||||
|
\label{tab:kernel_tuning_internal}
|
||||||
|
\begin{tabular}{llrrr}
|
||||||
|
\hline
|
||||||
|
\textbf{Metric} & \textbf{Profile} & \textbf{Default} &
|
||||||
|
\textbf{Full gVisor} & \textbf{Reorder-only} \\
|
||||||
|
\hline
|
||||||
|
Single TCP (Mbps) & Baseline & 934 & 934 & 934 \\
|
||||||
|
Single TCP (Mbps) & Low & 333 & 363 & 354 \\
|
||||||
|
Single TCP (Mbps) & Medium & 29.6 & 64.2 & 72.7 \\
|
||||||
|
Parallel TCP (Mbps) & Low & 277 & 893 & 902 \\
|
||||||
|
Parallel TCP (Mbps) & Medium & 82.6 & 226 & 211 \\
|
||||||
|
Retransmit \% & Medium & ${\sim}$2.4 & 1.21 & 1.11 \\
|
||||||
|
Nix cache (s) & Medium & 58.6 & 29.7 & 29.1 \\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
|
% TODO: Needs to be created
|
||||||
|
% PLOT: grouped bar chart
|
||||||
|
% File: Figures/impairment/kernel-tuning-internal-throughput.png
|
||||||
|
% Data: Internal single-stream TCP at baseline/low/medium across
|
||||||
|
% original, full gVisor, and reorder-only configurations
|
||||||
|
% Show: Dramatic jump at medium (29.6 -> 64.2 -> 72.7 Mbps);
|
||||||
|
% baseline unchanged; modest improvement at low
|
||||||
|
% Y-axis: linear scale
|
||||||
|
|
||||||
|
Internal's Medium-impairment throughput jumps from 29.6 to
|
||||||
|
72.7\,Mbps --- a 146\% increase from a three-line sysctl change. The
|
||||||
|
retransmit percentage drops from ${\sim}$2.4\% to 1.11\%; most of the
|
||||||
|
original retransmissions were spurious. The Nix cache download at
|
||||||
|
Medium halves from 58.6\,s to 29.1\,s.
|
||||||
|
|
||||||
|
Parallel TCP sees an even larger gain. Internal at Low impairment
|
||||||
|
climbs from 277 to 902\,Mbps, a 226\% increase that now exceeds
|
||||||
|
Headscale's original 718\,Mbps. With six concurrent flows each
|
||||||
|
independently benefiting from the higher reordering threshold, the
|
||||||
|
aggregate improvement compounds.
|
||||||
|
|
||||||
|
The anomaly reverses. At every impairment level and benchmark, tuned
|
||||||
|
Internal now meets or exceeds Headscale. At Medium impairment:
|
||||||
|
Internal 72.7\,Mbps vs.\ Headscale 50.1\,Mbps (Internal 45\% ahead),
|
||||||
|
where the original result had Headscale 40\% ahead. The Nix cache
|
||||||
|
flips too: Internal completes in 29.1\,s vs.\ Headscale's 36.3\,s,
|
||||||
|
where the original had Headscale 17\% faster.
|
||||||
|
|
||||||
|
% TODO: Needs to be created
|
||||||
|
% PLOT: before/after comparison
|
||||||
|
% File: Figures/impairment/headscale-gap-reversal.png
|
||||||
|
% Data: Internal vs Headscale throughput ratio at each impairment
|
||||||
|
% level, original vs tuned (reorder-only)
|
||||||
|
% Show: The crossover from "Headscale wins" (ratio < 1) to "Internal
|
||||||
|
% wins" (ratio > 1) at medium impairment after tuning
|
||||||
|
% Y-axis: ratio (Internal / Headscale), 1.0 as break-even
|
||||||
|
|
||||||
|
The reorder-only configuration (06.03) matches or exceeds the full
|
||||||
|
gVisor configuration (27.02) at most metrics; the two exceptions are
|
||||||
|
single-stream TCP at Low (354 vs.\ 363\,Mbps) and parallel TCP at
|
||||||
|
Medium (211 vs.\ 226\,Mbps), both within 7\%. Internal
|
||||||
|
reaches 72.7\,Mbps at Medium with reorder-only vs.\ 64.2\,Mbps with
|
||||||
|
full gVisor. The enlarged buffer sizes are unnecessary and may
|
||||||
|
introduce mild buffer bloat that partially offsets the reordering
|
||||||
|
benefit. The entire Headscale advantage is explained by three kernel
|
||||||
|
parameters: \texttt{tcp\_reordering}, \texttt{tcp\_recovery}, and
|
||||||
|
\texttt{tcp\_early\_retrans}.
|
||||||
|
|
||||||
|
Other VPNs benefit less from the kernel tuning. WireGuard's Medium
|
||||||
|
throughput rises from 8.77 to 12.2\,Mbps (+39\%) and Tinc's from
|
||||||
|
5.53 to 11.5\,Mbps (+108\%). ZeroTier shows no change (12.0 to
|
||||||
|
11.5\,Mbps). The tuning helps the kernel TCP stack, but VPNs that
|
||||||
|
add their own encapsulation overhead and userspace processing have
|
||||||
|
independent bottlenecks that the sysctl parameters cannot remove.
|
||||||
|
|
||||||
|
Headscale itself gets modestly faster with kernel tuning (+21\% at
|
||||||
|
Medium) but slightly slower at Low impairment ($-$5\%). Its
|
||||||
|
userspace gVisor stack already optimizes for reordering tolerance.
|
||||||
|
When the kernel stack also increases its tolerance, the two layers of
|
||||||
|
tuning may interact suboptimally --- both independently delay
|
||||||
|
retransmits, which can cause compound delays on the
|
||||||
|
kernel-to-Headscale socket path.
|
||||||
|
|
||||||
\section{Source Code Analysis}
|
\section{Source Code Analysis}
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user