diff --git a/Chapters/Results.tex b/Chapters/Results.tex index 9295c00..297c227 100644 --- a/Chapters/Results.tex +++ b/Chapters/Results.tex @@ -552,9 +552,9 @@ than per-connection latency. Several rankings invert relative to raw throughput. ZeroTier finishes faster than WireGuard (9.22\,s vs.\ 9.45\,s) despite -30\,\% fewer raw Mbps and 1\,000$\times$ more retransmits. Yggdrasil +6\,\% fewer raw Mbps and 1\,000$\times$ more retransmits. Yggdrasil is the clearest example: it has the -third-highest throughput at 795\,Mbps, yet lands at 24\,\% overhead +fourth-highest VPN throughput at 795\,Mbps, yet lands at 24\,\% overhead because its 2.2\,ms latency adds up over the many small sequential HTTP requests that constitute a Nix cache download. @@ -815,57 +815,603 @@ what would otherwise be idle gaps in any individual flow, squeezing out throughput that no single stream could reach alone. \section{Impact of Network Impairment} +\label{sec:impairment} -This section examines how each VPN responds to the Low, Medium, and -High impairment profiles defined in Chapter~\ref{Methodology}. +The impairment profiles from Table~\ref{tab:impairment_profiles} are +applied to the full benchmark suite. Baseline results from +Section~\ref{sec:baseline} serve as the reference. \subsection{Ping} -% RTT and packet loss across impairment profiles. +Table~\ref{tab:ping_impairment} lists average round-trip times across +all four profiles. Most VPNs track the expected increase closely: +tc~netem adds roughly 4\,ms, 8\,ms, and 15\,ms of round-trip delay +at Low, Medium, and High respectively, and Internal's measured values +(4.82, 9.38, 15.49\,ms) confirm this. + +\begin{table}[H] + \centering + \caption{Average ping RTT (ms) across impairment profiles, sorted + by High-profile RTT} + \label{tab:ping_impairment} + \begin{tabular}{lrrrr} + \hline + \textbf{VPN} & \textbf{Baseline} & \textbf{Low} & + \textbf{Medium} & \textbf{High} \\ + \hline + Internal & 0.60 & 4.82 & 9.38 & 15.49 \\ + Tinc & 1.19 & 5.32 & 9.85 & 15.92 \\ + Nebula & 1.25 & 5.38 & 9.99 & 15.96 \\ + WireGuard & 1.20 & 5.36 & 9.88 & 15.99 \\ + Headscale & 1.64 & 5.82 & 10.39 & 16.07 \\ + VpnCloud & 1.13 & 5.41 & 10.35 & 16.21 \\ + ZeroTier & 1.28 & 5.34 & 10.02 & 16.54 \\ + Yggdrasil & 2.20 & 6.73 & 11.99 & 20.20 \\ + Hyprspace & 1.79 & 6.15 & 10.76 & 24.49 \\ + EasyTier & 1.33 & 6.27 & 14.13 & 26.60 \\ + Mycelium & 34.90 & 23.42 & 43.88 & 33.05 \\ + \hline + \end{tabular} +\end{table} + +% PLOT: line chart +% File: Figures/impairment/Ping Average RTT Heatmap.png +% Data: Average ping RTT for all 11 VPNs at baseline, low, medium, high +% Show: Most VPNs in a tight parallel band; Mycelium's non-monotonic curve; +% EasyTier and Hyprspace diverging upward at high impairment + +Mycelium defies the pattern. Its RTT \emph{drops} from 34.9\,ms at +baseline to 23.4\,ms at Low impairment, a 33\% improvement where +every other VPN gets slower. It then rises to 43.9\,ms at Medium +before falling again to 33.0\,ms at High. The baseline analysis +(Section~\ref{sec:mycelium_routing}) showed that Mycelium's latency +comes from a bimodal routing distribution: one path runs at 1.63\,ms +while two others route through the global overlay at +${\sim}$51\,ms. The impairment appears to push Mycelium's path +discovery toward shorter routes, so a larger share of traffic takes +the direct path. The non-monotonic pattern is consistent with a path +selection algorithm that responds to measured link quality, but not +linearly with degradation severity. 
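+
+A rough mixture calculation supports this reading. One of the three
+measured paths runs direct at 1.63\,ms while the other two route
+through the global overlay at ${\sim}$51\,ms
+(Section~\ref{sec:mycelium_routing}). Taking Internal's measured
+Low-profile RTT of ${\sim}$4.8\,ms as the added underlay delay, and
+assuming (an assumption, not a measurement) that the Low profile
+shifts one of the two overlay-routed paths onto a direct route, the
+averages follow as
+\[
+  \frac{1.63 + 2 \times 51}{3} \approx 34.5\,\mathrm{ms}
+  \qquad \mbox{and} \qquad
+  \frac{2 \times (1.63 + 4.8) + (51 + 4.8)}{3} \approx 22.9\,\mathrm{ms},
+\]
+within about half a millisecond of the measured 34.9\,ms baseline and
+23.4\,ms Low-profile averages.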
+ +Mycelium also achieves 0\% ping packet loss at Low and Medium +impairment, while most VPNs show 0.1--3.2\% loss at those profiles. +At High impairment, Mycelium's loss jumps to 11.1\%. + +EasyTier accumulates 11\,ms of excess latency at High impairment +beyond what tc~netem accounts for. Its average RTT of 26.6\,ms and +maximum of 290\,ms (vs.\ ${\sim}$40\,ms for WireGuard) point to a +userspace scheduling or retry mechanism that introduces escalating +variance. EasyTier's RTT standard deviation reaches 44.6\,ms at +High, the worst jitter of any VPN. + +Hyprspace shows 11.1\% ping packet loss at every impairment level --- +Low, Medium, and High alike. With 9~measurement runs (3~machine +pairs $\times$ 3~runs of 100~packets), 11.1\% equals exactly 1/9: +one run per profile fails completely while the other eight report zero +loss. This binary pass/fail behavior is consistent with the buffer +bloat diagnosis from Section~\ref{sec:hyprspace_bloat}. When buffers +fill, an entire path stalls rather than degrading gradually. \subsection{TCP Throughput} -% TCP iperf3: throughput, retransmits, congestion window. +Table~\ref{tab:tcp_impairment} presents single-stream TCP throughput +across all four profiles. The baseline performance tiers from +Section~\ref{sec:baseline} dissolve almost immediately under +impairment. + +\begin{table}[H] + \centering + \caption{Single-stream TCP throughput (Mbps) across impairment + profiles, sorted by baseline. Retention is the + Low-to-baseline ratio.} + \label{tab:tcp_impairment} + \begin{tabular}{lrrrrr} + \hline + \textbf{VPN} & \textbf{Baseline} & \textbf{Low} & + \textbf{Medium} & \textbf{High} & \textbf{Retention} \\ + \hline + Internal & 934 & 333 & 29.6 & 4.25 & 35.7\% \\ + WireGuard & 864 & 54.7 & 8.77 & 2.63 & 6.3\% \\ + ZeroTier & 814 & 63.7 & 12.0 & 4.01 & 7.8\% \\ + Headscale & 800 & 274 & 41.5 & 4.21 & 34.3\% \\ + Yggdrasil & 795 & 13.2 & 6.08 & 3.40 & 1.7\% \\ + \hline + Nebula & 706 & 49.8 & 7.82 & 2.60 & 7.1\% \\ + EasyTier & 636 & 156 & 17.4 & 3.59 & 24.6\% \\ + VpnCloud & 539 & 58.2 & 8.33 & 1.86 & 10.8\% \\ + \hline + Hyprspace & 368 & 4.42 & 2.05 & 1.39 & 1.2\% \\ + Tinc & 336 & 54.4 & 5.53 & 2.77 & 16.2\% \\ + Mycelium & 259 & 16.2 & 3.87 & 2.73 & 6.3\% \\ + \hline + \end{tabular} +\end{table} + +% PLOT: line chart +% File: Figures/impairment/TCP Throughput Heatmap.png +% Data: Single-stream TCP throughput for all 11 VPNs at baseline, +% low, medium, high +% Show: Headscale crossing above Internal at medium impairment; +% Yggdrasil's cliff from baseline to low; convergence of all +% VPNs at high impairment + +Yggdrasil crashes from 795\,Mbps to 13.2\,Mbps at Low impairment, a +98.3\% throughput loss from adding just 2\,ms latency, 2\,ms jitter, +0.25\% packet loss, and 0.5\% reordering per machine. Even Mycelium, +the slowest VPN at baseline (259\,Mbps), retains more throughput at +Low than Yggdrasil does. The jumbo overlay MTU of 32\,731~bytes, +which inflated baseline metrics +(Section~\ref{sec:baseline}), becomes a liability under impairment: +each lost or reordered outer packet triggers retransmission of +${\sim}$24$\times$ more inner-layer data than a standard +1\,400-byte MTU VPN would lose. + +Headscale retains 34.3\% of its baseline throughput at Low, nearly +matching Internal's 35.7\%. At Medium impairment, Headscale +(41.5\,Mbps) overtakes Internal (29.6\,Mbps) --- a VPN outperforming +the bare-metal baseline. +Section~\ref{sec:tailscale_degraded} investigates this anomaly in +detail. 
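+
+For scale, the ${\sim}$24$\times$ amplification factor quoted for
+Yggdrasil above follows directly from the ratio of inner to outer
+packet sizes. In a simplified model that ignores headers and
+partially filled segments, a full-sized inner segment spans many
+outer packets, and the loss or reordering of any one of them forces
+the whole inner segment to be resent:
+\[
+  \frac{32\,731\ \mathrm{bytes}}{1\,400\ \mathrm{bytes}} \approx 23.4,
+\]
+roughly two dozen standard-MTU packets' worth of tunneled payload put
+back onto the retransmission queue per outer-layer event.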
+
+At High impairment, the throughput range compresses from 675\,Mbps at
+baseline to just 2.9\,Mbps. Internal leads at 4.25\,Mbps; Hyprspace
+trails at 1.39\,Mbps. The impairment profile itself becomes the
+bottleneck. With 2.5\% packet loss and 5\% reordering per machine,
+every implementation is TCP-loss-limited, and architectural
+differences that matter at gigabit speeds become irrelevant.
\subsection{UDP Throughput}
-% UDP iperf3: throughput, jitter, packet loss.
+The UDP stress test (\texttt{-b~0}) suffers from widespread failures
+under impairment. Hyprspace and Mycelium, which already failed at
+baseline, continue to fail at all profiles. Tinc and ZeroTier fail
+at most non-baseline profiles. The sparse dataset limits
+conclusions, but one pattern stands out.
+
+Internal and the two WireGuard-based configurations maintain
+throughput regardless of impairment. Internal holds ${\sim}$950\,Mbps
+across all profiles where data exists. Headscale sustains
+700--876\,Mbps and WireGuard
+850--908\,Mbps; % TODO: verify WireGuard UDP range -- analysis doc says 850-898, possible digit transposition
+both benefit from WireGuard's UDP handling with proper backpressure
+(in-kernel for WireGuard, via the userspace wireguard-go
+implementation for Headscale). The remaining userspace VPNs collapse:
+EasyTier drops from 865 to 435 to 38.5 to 6.1\,Mbps across successive
+profiles. Yggdrasil, already pathological at baseline (98.7\% loss),
+crashes to 12.3\,Mbps at Low and fails entirely at Medium and High.
+
+% PLOT: heatmap
+% File: Figures/impairment/UDP Receiver Throughput Heatmap.png
+% Data: UDP receiver throughput for all 11 VPNs at baseline, low,
+%       medium, high (grey/hatched cells for failures)
+% Show: Internal, WireGuard, and Headscale maintaining high
+%       throughput across all profiles; the remaining userspace VPNs
+%       failing or collapsing; the large number of empty cells
+
+
+The failure rate of this benchmark under impairment makes it more
+useful as a robustness indicator than a throughput measurement. A VPN
+that cannot complete a 30-second UDP flood under 0.25\% packet loss
+has fundamental flow-control problems that will surface under real
+workloads too, even if the symptoms are milder.
\subsection{Parallel TCP}
-% Parallel iperf3: throughput under contention (A->B, B->C, C->A).
+Table~\ref{tab:parallel_impairment} shows aggregate throughput across
+three concurrent bidirectional links (six unidirectional flows). The
+Headscale anomaly from the single-stream results is amplified here.
+
+\begin{table}[H]
+  \centering
+  \caption{Parallel TCP throughput (Mbps) across impairment profiles.
+ Three concurrent bidirectional links produce six unidirectional + flows.} + \label{tab:parallel_impairment} + \begin{tabular}{lrrrr} + \hline + \textbf{VPN} & \textbf{Baseline} & \textbf{Low} & + \textbf{Medium} & \textbf{High} \\ + \hline + Internal & 1398 & 277 & 82.6 & 10.4 \\ + Headscale & 1228 & 718 & 113 & 20.0 \\ + WireGuard & 1281 & 173 & 24.5 & 8.39 \\ + Yggdrasil & 1265 & 38.7 & 16.7 & 8.95 \\ + ZeroTier & 1206 & 176 & 35.4 & 7.97 \\ + EasyTier & 927 & 473 & 57.4 & 10.7 \\ + Hyprspace & 803 & 2.87 & 6.94 & 3.62 \\ + VpnCloud & 763 & 174 & 23.7 & 8.25 \\ + Nebula & 648 & 103 & 15.3 & 4.93 \\ + Mycelium & 569 & 72.7 & 7.51 & 3.69 \\ + Tinc & 563 & 168 & 23.7 & 8.25 \\ + \hline + \end{tabular} +\end{table} + +% PLOT: heatmap +% File: Figures/impairment/Parallel TCP Throughput Heatmap.png +% Data: Parallel TCP throughput for all 11 VPNs at baseline, low, +% medium, high +% Show: Headscale dominating at low impairment (718 Mbps vs Internal's +% 277); EasyTier as runner-up (473 Mbps); Hyprspace's collapse +% to 2.87 Mbps + + +Headscale at Low impairment: 718\,Mbps --- 2.6$\times$ Internal +(277\,Mbps) and 4.1$\times$ WireGuard (173\,Mbps). At Medium, +Headscale (113\,Mbps) still leads Internal (82.6\,Mbps) by 37\%. +The single-stream anomaly from +Section~\ref{sec:tailscale_degraded} compounds when multiple flows +each independently benefit from Headscale's congestion control +tuning. + +EasyTier is the second-most resilient VPN under parallel load, at +473\,Mbps at Low (51\% of baseline). Both EasyTier and Headscale +retain more than half their baseline parallel throughput at Low +impairment; no other VPN exceeds 30\%. + +Hyprspace collapses from 803\,Mbps to 2.87\,Mbps at Low, a 99.6\% +loss. The buffer bloat that plagues single-stream transfers +(Section~\ref{sec:hyprspace_bloat}) becomes catastrophic when six +concurrent flows compete for the same bloated buffers. + +The High-profile convergence effect is even more pronounced here than +in single-stream mode. Tinc and VpnCloud land at identical +8.25\,Mbps despite differing by 200\,Mbps at baseline. \subsection{QUIC Performance} -% qperf: bandwidth, TTFB, connection establishment time. +Headscale and Nebula failed the qperf QUIC benchmark at baseline +(Section~\ref{sec:baseline}) and continue to fail across all +impairment profiles. + +Yggdrasil's QUIC bandwidth drops from 745\,Mbps at baseline to +7.67\,Mbps at Low, 3.45\,Mbps at Medium, and 2.17\,Mbps at High --- +the same cliff observed in its TCP results, driven by the same +jumbo-MTU amplification of outer-layer packet loss. + +At High impairment, WireGuard (23.2\,Mbps), VpnCloud (23.4\,Mbps), +ZeroTier (23.0\,Mbps), and Tinc (23.4\,Mbps) converge to within +0.4\,Mbps of each other. At baseline these four span a 500\,Mbps +range. QUIC's own congestion control, operating atop the +already-degraded outer link, becomes the sole limiter. + +% PLOT: heatmap +% File: Figures/impairment/QUIC Bandwidth Heatmap.png +% Data: QPerf QUIC bandwidth for VPNs with data at all four profiles +% (WireGuard, VpnCloud, ZeroTier, Tinc, Yggdrasil, Internal) +% Show: Yggdrasil's cliff from baseline to low; convergence of +% WireGuard, VpnCloud, ZeroTier, Tinc at high (~23 Mbps) + \subsection{Video Streaming} -% RIST: bitrate, dropped frames, packets recovered, quality score. +Table~\ref{tab:rist_impairment} presents RIST video quality scores +across profiles. 
The actual encoding bitrate of ${\sim}$3.3\,Mbps +sits well within every VPN's throughput budget even at High +impairment, so quality differences reflect packet delivery reliability +rather than bandwidth limits. + +\begin{table}[H] + \centering + \caption{RIST video streaming quality (\%) across impairment + profiles, sorted by High-profile quality} + \label{tab:rist_impairment} + \begin{tabular}{lrrrr} + \hline + \textbf{VPN} & \textbf{Baseline} & \textbf{Low} & + \textbf{Medium} & \textbf{High} \\ + \hline + Mycelium & 100.0 & 100.0 & 100.0 & 99.9 \\ + EasyTier & 100.0 & 100.0 & 96.2 & 85.5 \\ + Internal & 100.0 & 99.2 & 89.3 & 80.2 \\ + ZeroTier & 100.0 & 99.3 & 89.9 & 80.2 \\ + VpnCloud & 100.0 & 99.2 & 89.7 & 80.1 \\ + WireGuard & 100.0 & 99.3 & 90.0 & 80.0 \\ + Hyprspace & 100.0 & 92.9 & 87.9 & 78.1 \\ + Tinc & 100.0 & 99.3 & 90.0 & 77.8 \\ + Nebula & 99.8 & 98.8 & 85.6 & 72.1 \\ + Yggdrasil & 100.0 & 94.7 & 71.4 & 43.3 \\ + Headscale & 13.1 & 13.0 & 13.0 & 13.0 \\ + \hline + \end{tabular} +\end{table} + +% PLOT: heatmap +% File: Figures/impairment/Video Streaming Quality Heatmap.png +% Data: RIST quality for all 11 VPNs at baseline, low, medium, high +% Show: Headscale stuck at 13% (red row); Mycelium stuck near 100% +% (green row); gradual degradation for the bulk; Yggdrasil's +% steep decline to 43% + + +Headscale stays at ${\sim}$13\% across all four profiles: 13.1\%, +13.0\%, 13.0\%, 13.0\%. The profile-independence confirms the +baseline diagnosis from Section~\ref{sec:baseline}. The failure is +structural --- likely MTU fragmentation in the DERP relay layer --- +and cannot worsen because it is already saturated. Adding latency or +loss on top of an 87\% packet drop floor changes nothing. + +Mycelium delivers 99.9\% quality even at High impairment, better than +Internal (80.2\%) and every other VPN. At 3.3\,Mbps, even +Mycelium's degraded overlay paths can sustain the stream. Its +overlay retransmission mechanism, which cripples bulk TCP transfers, +works well for steady low-bandwidth UDP flows. RIST's own forward +error correction handles whatever Mycelium's retransmissions miss. + +Yggdrasil degrades the most steeply: 100\% at baseline, 94.7\% at +Low, 71.4\% at Medium, 43.3\% at High. The jumbo MTU that hurt TCP +throughput also hurts here --- large overlay packets carrying RIST +data are more likely to be lost or reordered at the outer layer, and +RIST's FEC cannot recover from the resulting burst losses. \subsection{Application-Level Download} -% Nix cache: download duration for Firefox package. +Table~\ref{tab:nix_impairment} shows Nix binary cache download times +across profiles. This HTTP-heavy workload, dominated by many +short-lived TCP connections, is more sensitive to per-connection +latency than to raw bandwidth. + +\begin{table}[H] + \centering + \caption{Nix binary cache download time (seconds) across impairment + profiles, sorted by Low-profile time. 
``--'' marks a failed
+    run.}
+  \label{tab:nix_impairment}
+  \begin{tabular}{lrrrr}
+    \hline
+    \textbf{VPN} & \textbf{Baseline} & \textbf{Low} &
+    \textbf{Medium} & \textbf{High} \\
+    \hline
+    Internal & 8.53 & 11.9 & 58.6 & -- \\
+    Headscale & 9.79 & 13.5 & 48.8 & 219 \\
+    EasyTier & 9.39 & 22.1 & 141 & -- \\
+    VpnCloud & 9.39 & 27.9 & 163 & -- \\
+    WireGuard & 9.45 & 28.8 & 161 & -- \\
+    Nebula & 9.15 & 30.8 & 180 & 547 \\
+    Tinc & 10.0 & 30.9 & 166 & 496 \\
+    ZeroTier & 9.22 & 36.2 & 141 & -- \\
+    Mycelium & 10.1 & 79.5 & -- & -- \\
+    Yggdrasil & 10.6 & 230 & -- & -- \\
+    Hyprspace & 11.9 & -- & 170 & -- \\
+    \hline
+  \end{tabular}
+\end{table}
+
+% PLOT: heatmap
+% File: Figures/impairment/Nix Cache Download Time Heatmap.png
+% Data: Nix cache download time for all VPNs at baseline, low, medium,
+%       high (hatched/absent bars for failures)
+% Show: Headscale as the fastest VPN at every impaired profile;
+%       Headscale beating Internal at medium (48.8 vs 58.6 s);
+%       Yggdrasil's 22x slowdown at low impairment
+
+
+Headscale is the fastest VPN at every impaired profile and one of
+only three to complete all four. At Medium impairment, it finishes in
+48.8~seconds --- faster than Internal's 58.6~seconds. Internal itself
+fails at High impairment while Headscale completes in 219~seconds.
+Only Nebula (547\,s) and Tinc (496\,s) also survive High impairment.
+
+Yggdrasil's download time explodes from 10.6\,s to 230\,s at Low
+impairment, a 22$\times$ slowdown. Every HTTP request incurs the
+latency penalty from Yggdrasil's impairment-amplified
+retransmissions. Mycelium also degrades severely (10.1\,s to
+79.5\,s, an 8$\times$ increase), consistent with its overlay routing
+overhead, which compounds over hundreds of sequential HTTP
+connections.
+
+The failure map reveals a clean gradient: more demanding profiles
+knock out more VPNs. At Low, 10 of 11 complete (Hyprspace fails).
+At Medium, 9 complete (Mycelium and Yggdrasil drop out, while
+Hyprspace recovers). At High, only 3 survive (Headscale, Nebula,
+Tinc). Internal's failure at High is the most surprising --- the
+bare-metal baseline cannot sustain a multi-connection HTTP workload
+under severe degradation, but Headscale, shielded by its userspace
+TCP stack, can. Section~\ref{sec:tailscale_degraded} explains why.
\section{Tailscale Under Degraded Conditions}
-
-% The central finding: Tailscale outperforming the raw Linux
-% networking stack under impairment.
+\label{sec:tailscale_degraded}
\subsection{Observed Anomaly}
-% Present the data showing Tailscale exceeding internal baseline
-% throughput under Medium/High impairment.
+At Medium impairment, Headscale delivers 41.5\,Mbps single-stream TCP
+throughput --- 40\% more than Internal's 29.6\,Mbps. A VPN built
+atop WireGuard outperforms the bare-metal connection it tunnels
+through. The anomaly is consistent across benchmarks:
+Table~\ref{tab:headscale_anomaly} summarizes the comparison.
+
+\begin{table}[H]
+  \centering
+  \caption{Headscale vs.\ Internal vs.\ WireGuard under impairment
+    (18.12.2025 run). For TCP benchmarks, higher is better.
For
+    Nix cache, lower is better; ``--'' marks a failed run.}
+  \label{tab:headscale_anomaly}
+  \begin{tabular}{llrrr}
+    \hline
+    \textbf{Benchmark} & \textbf{Profile} & \textbf{Internal} &
+    \textbf{Headscale} & \textbf{WireGuard} \\
+    \hline
+    Single TCP (Mbps) & Low & 333 & 274 & 54.7 \\
+    Single TCP (Mbps) & Medium & 29.6 & 41.5 & 8.77 \\
+    Single TCP (Mbps) & High & 4.25 & 4.21 & 2.63 \\
+    Parallel TCP (Mbps) & Low & 277 & 718 & 173 \\
+    Parallel TCP (Mbps) & Medium & 82.6 & 113 & 24.5 \\
+    Nix cache (s) & Medium & 58.6 & 48.8 & 161 \\
+    Nix cache (s) & High & -- & 219 & -- \\
+    \hline
+  \end{tabular}
+\end{table}
+
+% TODO: Needs to be created, use the tools/ folder
+% PLOT: line chart
+% File: Figures/impairment/headscale-vs-internal-across-profiles.png
+% Data: Single-stream TCP throughput for Internal, Headscale, and
+%       WireGuard across all four profiles
+% Show: Headscale crossing above Internal at medium impairment;
+%       WireGuard far below both; convergence at high
+% Y-axis: log scale
+
+In parallel TCP at Low impairment, Headscale reaches 718\,Mbps vs.\
+Internal's 277\,Mbps (2.6$\times$). The Nix cache download at
+Medium takes Headscale 48.8\,s vs.\ Internal's 58.6\,s (17\%
+faster). At High impairment, Internal fails the Nix cache entirely
+while Headscale completes in 219\,s.
+
+WireGuard, which shares Headscale's cryptographic layer, shows no
+such advantage: 54.7\,Mbps at Low, 8.77\,Mbps at Medium. Whatever
+protects Headscale is not the encryption or the tunnel --- it is
+something in Tailscale's userspace networking stack.
+
+The retransmit data provides the first clue. At Medium impairment,
+Headscale's retransmit percentage is approximately 2.4\%, matching
+Internal's ${\sim}$2.4\%. WireGuard's is 5.2\%. Headscale achieves
+Internal's retransmit efficiency while delivering higher throughput
+--- fewer spurious retransmissions leave more bandwidth for actual
+data.
\subsection{Congestion Control Analysis}
-% Reno vs CUBIC, RACK disabled to avoid spurious retransmits
-% under reordering.
+Tailscale uses a userspace TCP/IP stack derived from Google's gVisor
+(netstack). This stack does not inherit the host kernel's TCP
+parameters. Three defaults differ from the Linux kernel in ways that
+matter under packet reordering:
+
+\begin{itemize}
+  \bitem{\texttt{tcp\_reordering}:} gVisor uses 10; the Linux kernel
+    defaults to~3. This parameter sets the initial estimate of how
+    many out-of-order segments TCP tolerates before declaring a loss.
+    With tc~netem injecting 0.5--5\% reordering per machine, bursts of
+    3+ reordered packets are frequent. The kernel's threshold of~3
+    causes spurious fast retransmits and congestion window reductions
+    for packets that are merely reordered, not lost.
+  \bitem{\texttt{tcp\_recovery} (RACK):} gVisor disables it; the
+    Linux kernel enables it by default. RACK uses timing-based loss
+    detection that is more aggressive than the pure sequence-based
+    approach gVisor uses. Under reordering, RACK's timing heuristics
+    can falsely classify delayed packets as lost.
+  \bitem{\texttt{tcp\_early\_retrans} (TLP):} gVisor disables it; the
+    kernel enables it. Tail Loss Probe retransmits speculatively when
+    the tail of a transfer goes unacknowledged for a short probe
+    timeout, rather than waiting for a full retransmission timeout; on
+    an already impaired link these extra probes can be spurious and
+    worsen congestion.
+\end{itemize}
+
+The combined effect: under network conditions with packet reordering,
+the default Linux TCP stack fires retransmits and cuts the congestion
+window far more often than necessary. Each false positive shrinks the
+window and reduces throughput.
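+
+These three netstack defaults can be replicated on the host kernel,
+which is what the follow-up runs in the next subsection do via
+sysctl. The sketch below is an illustration rather than the benchmark
+tooling: it assumes root privileges and writes the standard
+\texttt{/proc/sys} entries directly, which is equivalent to
+\texttt{sysctl -w}.
+
+\begin{verbatim}
+# Align the kernel's TCP loss detection with the gVisor/netstack
+# defaults discussed above; all other keys stay at their kernel
+# defaults (the "reorder-only" configuration of the next subsection).
+GVISOR_LIKE = {
+    "net/ipv4/tcp_reordering": "10",    # kernel default: 3
+    "net/ipv4/tcp_recovery": "0",       # 0 disables RACK (default: 1)
+    "net/ipv4/tcp_early_retrans": "0",  # 0 disables ER/TLP (default: 3)
+}
+
+def apply(params):
+    # Writing /proc/sys/<key> requires root and mirrors `sysctl -w`.
+    for key, value in params.items():
+        with open("/proc/sys/" + key, "w") as f:
+            f.write(value)
+
+if __name__ == "__main__":
+    apply(GVISOR_LIKE)
+\end{verbatim}
+
+The next subsection applies exactly these values to the host kernel
+(in one run together with enlarged buffer sizes) and measures the
+effect.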
Tailscale's gVisor stack tolerates +more reordering before reacting, so its congestion window stays larger +and throughput stays higher. + +This explains why the anomaly grows with impairment severity. At +baseline, there is no reordering, so the threshold difference is +irrelevant and Internal's kernel-level processing advantage dominates. +As reordering increases from 0.5\% (Low) to 2.5\% (Medium) per +machine, the kernel's aggressive loss detection fires more often, and +the throughput gap shifts in Headscale's favor. \subsection{Tuned Kernel Parameters} -% Re-run results with tuned buffer sizes and congestion control -% on the internal baseline, showing the gap closes. +Two follow-up benchmark runs applied Tailscale's gVisor TCP +parameters to the host kernel via sysctl: + +\begin{itemize} + \bitem{Full gVisor (27.02.2026):} All parameters --- + \texttt{tcp\_reordering=10}, \texttt{tcp\_recovery=0}, + \texttt{tcp\_early\_retrans=0}, plus enlarged buffer sizes + (\texttt{tcp\_rmem}, \texttt{tcp\_wmem}, \texttt{rmem\_max}, + \texttt{wmem\_max}). Tested on Internal, Headscale, WireGuard, + Tinc, and ZeroTier. + \bitem{Reorder-only (06.03.2026):} Only + \texttt{tcp\_reordering=10}, \texttt{tcp\_recovery=0}, and + \texttt{tcp\_early\_retrans=0}. Buffer sizes left at kernel + defaults. Tested on Internal and Headscale only. +\end{itemize} + +Table~\ref{tab:kernel_tuning_internal} shows how Internal responds +to the tuning. Both follow-up runs used the same impairment profiles +and hardware as the original 18.12.2025 run. + +\begin{table}[H] + \centering + \caption{Internal (no VPN) throughput across three kernel + configurations. ``Default'' is the 18.12.2025 run with stock + Linux TCP parameters.} + \label{tab:kernel_tuning_internal} + \begin{tabular}{llrrr} + \hline + \textbf{Metric} & \textbf{Profile} & \textbf{Default} & + \textbf{Full gVisor} & \textbf{Reorder-only} \\ + \hline + Single TCP (Mbps) & Baseline & 934 & 934 & 934 \\ + Single TCP (Mbps) & Low & 333 & 363 & 354 \\ + Single TCP (Mbps) & Medium & 29.6 & 64.2 & 72.7 \\ + Parallel TCP (Mbps) & Low & 277 & 893 & 902 \\ + Parallel TCP (Mbps) & Medium & 82.6 & 226 & 211 \\ + Retransmit \% & Medium & ${\sim}$2.4 & 1.21 & 1.11 \\ + Nix cache (s) & Medium & 58.6 & 29.7 & 29.1 \\ + \hline + \end{tabular} +\end{table} + +% TODO: Needs to be created +% PLOT: grouped bar chart +% File: Figures/impairment/kernel-tuning-internal-throughput.png +% Data: Internal single-stream TCP at baseline/low/medium across +% original, full gVisor, and reorder-only configurations +% Show: Dramatic jump at medium (29.6 -> 64.2 -> 72.7 Mbps); +% baseline unchanged; modest improvement at low +% Y-axis: linear scale + +Internal's Medium-impairment throughput jumps from 29.6 to +72.7\,Mbps --- a 146\% increase from a three-line sysctl change. The +retransmit percentage drops from ${\sim}$2.4\% to 1.11\%; most of the +original retransmissions were spurious. The Nix cache download at +Medium halves from 58.6\,s to 29.1\,s. + +Parallel TCP sees an even larger gain. Internal at Low impairment +climbs from 277 to 902\,Mbps, a 226\% increase that now exceeds +Headscale's original 718\,Mbps. With six concurrent flows each +independently benefiting from the higher reordering threshold, the +aggregate improvement compounds. + +The anomaly reverses. At every impairment level and benchmark, tuned +Internal now meets or exceeds Headscale. 
At Medium impairment: +Internal 72.7\,Mbps vs.\ Headscale 50.1\,Mbps (Internal 45\% ahead), +where the original result had Headscale 40\% ahead. The Nix cache +flips too: Internal completes in 29.1\,s vs.\ Headscale's 36.3\,s, +where the original had Headscale 17\% faster. + +% TODO: Needs to be created +% PLOT: before/after comparison +% File: Figures/impairment/headscale-gap-reversal.png +% Data: Internal vs Headscale throughput ratio at each impairment +% level, original vs tuned (reorder-only) +% Show: The crossover from "Headscale wins" (ratio < 1) to "Internal +% wins" (ratio > 1) at medium impairment after tuning +% Y-axis: ratio (Internal / Headscale), 1.0 as break-even + +The reorder-only configuration (06.03) matches or exceeds the full +gVisor configuration (27.02) at most metrics; the two exceptions are +single-stream TCP at Low (354 vs.\ 363\,Mbps) and parallel TCP at +Medium (211 vs.\ 226\,Mbps), both within 7\%. Internal +reaches 72.7\,Mbps at Medium with reorder-only vs.\ 64.2\,Mbps with +full gVisor. The enlarged buffer sizes are unnecessary and may +introduce mild buffer bloat that partially offsets the reordering +benefit. The entire Headscale advantage is explained by three kernel +parameters: \texttt{tcp\_reordering}, \texttt{tcp\_recovery}, and +\texttt{tcp\_early\_retrans}. + +Other VPNs benefit less from the kernel tuning. WireGuard's Medium +throughput rises from 8.77 to 12.2\,Mbps (+39\%) and Tinc's from +5.53 to 11.5\,Mbps (+108\%). ZeroTier shows no change (12.0 to +11.5\,Mbps). The tuning helps the kernel TCP stack, but VPNs that +add their own encapsulation overhead and userspace processing have +independent bottlenecks that the sysctl parameters cannot remove. + +Headscale itself gets modestly faster with kernel tuning (+21\% at +Medium) but slightly slower at Low impairment ($-$5\%). Its +userspace gVisor stack already optimizes for reordering tolerance. +When the kernel stack also increases its tolerance, the two layers of +tuning may interact suboptimally --- both independently delay +retransmits, which can cause compound delays on the +kernel-to-Headscale socket path. \section{Source Code Analysis}