diff --git a/Chapters/Results.tex b/Chapters/Results.tex index 981711c..bfa248e 100644 --- a/Chapters/Results.tex +++ b/Chapters/Results.tex @@ -31,6 +31,24 @@ latency of just an entire 30-second test. Mycelium sits at the other extreme, adding 34.9\,ms of latency, roughly 58$\times$ the bare-metal figure. +A note on naming: ``Headscale'' in every table and figure of this +chapter labels the test scenario in which the Tailscale client +(\texttt{tailscaled}) connects to a self-hosted Headscale control +server. The data plane is therefore the Tailscale client built on +\texttt{wireguard-go}, not the Headscale binary itself, which is +only a control-plane server. The test rig launches +\texttt{tailscaled} via the NixOS \texttt{services.tailscale} +module with \texttt{interfaceName = "ts-headscale"}, which +translates to \texttt{--tun ts-headscale}; this means the Tailscale +client uses a real kernel TUN device and the host kernel's TCP/IP +stack handles every tunneled packet. The alternate +\texttt{--tun=userspace-networking} mode, in which gVisor netstack +terminates tunneled TCP inside the \texttt{tailscaled} process, is +\emph{not} engaged in any of the benchmarks reported here. +Statements below about ``Headscale'' running \texttt{wireguard-go} +should be read as statements about the Tailscale client in this +scenario. + \subsection{Test Execution Overview} Running the full baseline suite across all ten VPNs and the internal @@ -127,16 +145,15 @@ ZeroTier, for instance, reaches 814\,Mbps but accumulates 1\,163~retransmits per test, over 1\,000$\times$ what WireGuard needs. ZeroTier compensates for tunnel-internal packet loss by repeatedly triggering TCP congestion-control recovery, whereas -WireGuard delivers data with negligible in-tunnel loss. Across all VPNs, -retransmit behaviour falls into three groups: \emph{clean} ($<$110: -WireGuard, Internal, Yggdrasil, Headscale), \emph{stressed} -(200--900: Tinc, EasyTier, Mycelium, VpnCloud), and -\emph{pathological} ($>$950: Nebula, ZeroTier, Hyprspace). +WireGuard delivers data with negligible in-tunnel loss. The +bare-metal Internal reference sits at 1.7~retransmits per test — +essentially noise — and the VPNs split into three groups around +it: \emph{clean} ($<$110: WireGuard, Yggdrasil, Headscale), +\emph{stressed} (200--900: Tinc, EasyTier, Mycelium, VpnCloud), +and \emph{pathological} ($>$950: Nebula, ZeroTier, Hyprspace). % TODO: Is this naming scheme any good? -% TODO: Fix TCP Throughput plot - \begin{figure}[H] \centering \begin{subfigure}[t]{\textwidth} @@ -171,10 +188,7 @@ WireGuard, Internal, Yggdrasil, Headscale), \emph{stressed} Retransmits have a direct mechanical relationship with TCP congestion control. Each retransmit triggers a reduction in the congestion window -(\texttt{cwnd}), throttling the sender. % TODO: The text says "average congestion window" but -% Figure~\ref{fig:retransmit_cwnd} plots "Max Congestion Window." -% Use consistent terminology --- either change the text to "max" or -% change the figure axis label. +(\texttt{cwnd}), throttling the sender. This relationship is visible in Figure~\ref{fig:retransmit_correlations}: Hyprspace, with 4965 retransmits, maintains the smallest max congestion window in the @@ -200,17 +214,17 @@ in the dataset). This suggests significant in-tunnel packet loss or buffering at the VpnCloud layer that the retransmit count (857) alone does not fully explain. 
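+A back-of-envelope bound makes the mechanical link between loss
+rate and throughput concrete. The Mathis approximation bounds a
+loss-limited TCP flow at
+\[
+  \mathrm{Throughput} \;\le\; \frac{\mathrm{MSS}}{\mathrm{RTT}}
+  \sqrt{\frac{3}{2p}},
+\]
+where $p$ is the loss-event rate. With illustrative round numbers
+rather than fitted ones (a 1\,400-byte MSS, a 1\,ms tunnel RTT,
+and one loss event per 200 segments, $p = 0.005$),
+$\mathrm{MSS}/\mathrm{RTT}$ is about 11\,Mbps and
+$\sqrt{3/(2p)} \approx 17$, giving a ceiling of roughly
+190\,Mbps, far below the gigabit line rate. The estimate is
+qualitative rather than a fit to this dataset, but it shows why
+the pathological group cannot buy its way out of in-tunnel loss
+with raw link capacity.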
-% TODO: Mycelium's 122--379 Mbps range is per-link asymmetry (different -% overlay routing paths), not stochastic run-to-run variability. -% Section~\ref{sec:mycelium_routing} confirms the same numbers as -% per-link throughput. Conflating link asymmetry with run-to-run -% variance is misleading --- either separate the two or clarify that -% Mycelium's spread comes from path selection, not randomness. -Run-to-run variability also differs substantially. WireGuard ranges -from 824 to 884\,Mbps (a 60\,Mbps window), while Mycelium ranges -from 122 to 379\,Mbps, a 3:1 ratio between worst and best runs. A -VPN with wide variance is harder to capacity-plan around than one -with consistent performance, even if the average is lower. +Variability — whether stochastic across runs or systematic across +links — also differs substantially. WireGuard's three link +directions cluster tightly (824 to 884\,Mbps, a 60\,Mbps window), +behaving almost identically. Mycelium's three directions span +122 to 379\,Mbps, a 3:1 ratio, but this is not run-to-run noise: +Section~\ref{sec:mycelium_routing} shows the spread is per-link +path-selection asymmetry, with one link finding a direct route and +the other two routing through the global overlay. Either way, a +VPN whose throughput varies that widely across links is harder to +capacity-plan around than one that delivers a consistent figure +on every direction. \begin{figure}[H] \centering @@ -306,22 +320,27 @@ keep up with the link speed. The qperf benchmark backs this up: Tinc maxes out at 14.9\,\% total system CPU while delivering just 336\,Mbps. % TODO: 14.9\% total CPU does not obviously indicate a bottleneck. -% Clarify that this is whole-system utilization on a multi-core -% machine, and that Tinc's single-threaded design means one core is -% saturated while the rest are idle. Also note that VpnCloud reports -% the same 14.9\% yet achieves 539 Mbps --- explain why the same CPU -% utilization yields different throughput (e.g., different per-packet -% processing cost). -On a multi-core system, the low percentage reflects a single -saturated core, a clear sign that the CPU, not the network, is the -bottleneck. +% This is whole-system utilization on a multi-core machine, and a +% single saturated core fits the budget — but VpnCloud reports the +% same 14.9\% \emph{and} reaches 539\,Mbps, much more than Tinc. +% The single-saturated-core story alone therefore cannot explain +% the throughput gap; per-packet processing cost must differ +% materially between the two. Verify with per-thread CPU sampling +% or eBPF profiling. +On a multi-core system, this low percentage is consistent with a +single saturated core (and Tinc is single-threaded), which would +explain why the CPU rather than the network is the bottleneck. +The story is incomplete, however: VpnCloud shows the same 14.9\,\% +total system CPU yet delivers 539\,Mbps — 60\,\% more than Tinc — +so a difference in per-packet processing cost between the two +implementations must also be in play. Figure~\ref{fig:latency_throughput} makes this disconnect easy to spot. % TODO: These CPU numbers are stated inline but never shown in a plot % or table. Add a CPU utilization figure or table so readers can % verify. Also, the claim that WireGuard's CPU usage "goes to -% cryptographic processing" is unsubstantiated --- no profiling data +% cryptographic processing" is unsubstantiated: no profiling data % is presented. Either add profiling evidence or soften to % "likely" / "presumably." 
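+One way to settle the single-saturated-core question without a
+full profiling setup is to sample per-thread CPU time from
+\texttt{/proc} while an iPerf3 run is in flight. The sketch below
+is illustrative and not part of the benchmark rig; it assumes the
+VPN daemon's PID is supplied on the command line and that
+\texttt{USER\_HZ} is the conventional 100. A Tinc run showing one
+thread pinned near 100\,\% of a core while its siblings idle would
+confirm the single-core diagnosis; the same measurement on
+VpnCloud would show whether its extra throughput comes from
+spreading work across threads or from a cheaper per-packet path.
+
+\begin{lstlisting}[language=Go,caption={Illustrative per-thread
+  CPU sampler (not part of the benchmark rig). It reads
+  \texttt{utime}+\texttt{stime} for every thread of a target PID
+  from \texttt{/proc} and reports each thread's share of one core
+  over a five-second interval; \texttt{USER\_HZ} is assumed to be
+  100.},label={lst:per_thread_cpu_sketch}]
+package main
+
+import (
+    "fmt"
+    "os"
+    "path/filepath"
+    "strconv"
+    "strings"
+    "time"
+)
+
+// USER_HZ is assumed to be 100 (the common value); query
+// sysconf(_SC_CLK_TCK) where exactness matters.
+const clkTck = 100.0
+
+// threadTicks returns utime+stime in clock ticks for one thread,
+// parsed from /proc/PID/task/TID/stat. utime and stime are the
+// 12th and 13th whitespace-separated fields after the
+// parenthesised command name.
+func threadTicks(pid, tid string) (uint64, error) {
+    raw, err := os.ReadFile(filepath.Join("/proc", pid, "task", tid, "stat"))
+    if err != nil {
+        return 0, err
+    }
+    s := string(raw)
+    fields := strings.Fields(s[strings.LastIndexByte(s, ')')+1:])
+    utime, _ := strconv.ParseUint(fields[11], 10, 64)
+    stime, _ := strconv.ParseUint(fields[12], 10, 64)
+    return utime + stime, nil
+}
+
+// snapshot collects the tick counters of every thread of pid.
+func snapshot(pid string) map[string]uint64 {
+    ticks := map[string]uint64{}
+    entries, err := os.ReadDir(filepath.Join("/proc", pid, "task"))
+    if err != nil {
+        return ticks
+    }
+    for _, e := range entries {
+        if t, err := threadTicks(pid, e.Name()); err == nil {
+            ticks[e.Name()] = t
+        }
+    }
+    return ticks
+}
+
+func main() {
+    if len(os.Args) != 2 {
+        fmt.Fprintln(os.Stderr, "usage: cpusample PID-of-VPN-daemon")
+        os.Exit(1)
+    }
+    pid := os.Args[1]
+    const interval = 5 * time.Second
+
+    before := snapshot(pid)
+    time.Sleep(interval)
+    after := snapshot(pid)
+
+    // 100% means a thread kept one core fully busy for the whole
+    // interval; one thread near 100% with idle siblings is the
+    // single-saturated-core signature.
+    for tid, b := range before {
+        if a, ok := after[tid]; ok {
+            pct := float64(a-b) / clkTck / interval.Seconds() * 100
+            fmt.Printf("tid %s: %5.1f%% of one core\n", tid, pct)
+        }
+    }
+}
+\end{lstlisting}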
The qperf measurements also reveal a wide spread in CPU usage. @@ -329,12 +348,7 @@ Hyprspace (55.1\,\%) and Yggdrasil (52.8\,\%) consume 5--6$\times$ as much CPU as Internal's 9.7\,\%. WireGuard sits at 30.8\,\%, surprisingly high for a kernel-level implementation, presumably due to in-kernel -cryptographic processing. % TODO: "do the most with the least CPU time" is misleading --- -% Tinc gets only 336 Mbps at 14.9% CPU (22.6 Mbps/%), while -% WireGuard gets 864 Mbps at 30.8% (28 Mbps/%). These three use -% the least CPU but don't necessarily achieve the best throughput/CPU -% ratio. Rephrase to "use the least CPU" or calculate actual -% efficiency ratios. +cryptographic processing. On the efficient end, VpnCloud (14.9\,\%), Tinc (14.9\,\%), and EasyTier (15.4\,\%) use the least CPU time. Nebula and Headscale are missing from @@ -355,7 +369,8 @@ this comparison because qperf failed for both. \subsection{Parallel TCP Scaling} -The single-stream benchmark tests one link direction at a time. % TODO: The plot labels this benchmark "10-stream parallel" but this +The single-stream benchmark tests one link direction at a time. % +% TODO: The plot labels this benchmark "10-stream parallel" but this % description says "six unidirectional flows." Verify the actual test % configuration and reconcile the two. The @@ -404,18 +419,29 @@ single-stream mode. Mycelium's 34.9\,ms RTT means a lone TCP stream can never fill the pipe: the bandwidth-delay product demands a window larger than any single flow maintains, so multiple concurrent flows compensate for that constraint and push throughput to 2.20$\times$ -the single-stream figure. % TODO: The buffer-bloat workaround explanation for Hyprspace's -% parallel scaling is a hypothesis. No direct evidence is shown -% that multiple streams specifically alleviate buffer bloat. -% Consider adding bufferbloat measurements or softening the claim. -% TODO: DOWNSTREAM DEPENDENCY — This claim depends on the buffer bloat -% diagnosis in Section hyprspace_bloat, which itself rests on the unverified -% 2,800 ms under-load latency (see TODO there). If that latency figure -% is not confirmed, this parallel-scaling explanation collapses. -Hyprspace scales almost as well -(2.18$\times$), possibly because multiple streams collectively work -around the buffer bloat that cripples any individual flow -(Section~\ref{sec:hyprspace_bloat}). Tinc picks up a +the single-stream figure. Hyprspace scales almost as well +(2.18$\times$) for the same reason but with a different +bottleneck. Its libp2p send pipeline accumulates roughly +2\,800\,ms of under-load latency +(Section~\ref{sec:hyprspace_bloat}), which gives any single TCP +flow a bandwidth-delay product on the order of hundreds of +megabytes to fill — far beyond any single kernel cwnd. And +because Hyprspace keys \texttt{activeStreams} by destination +\texttt{peer.ID} (Listing~\ref{lst:hyprspace_sendpacket}), the +three concurrent peer pairs in the parallel benchmark each get +their own libp2p stream, their own mutex, and their own yamux +flow-control window. The three TCP senders therefore maintain +three independent windows in flight, and three windows fill +more of the bloated pipeline than one can. 
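+The ``hundreds of megabytes'' figure is just the bandwidth-delay
+product written out. Taking the roughly 800\,Mbps that Hyprspace
+sustains in the parallel baseline as the available capacity, and
+the ${\sim}$2\,800\,ms of queued latency as the effective
+round-trip time a sender sees under load,
+\[
+  \mathrm{BDP} \approx 800\,\mathrm{Mbps} \times 2.8\,\mathrm{s}
+  = 2.24\,\mathrm{Gbit} \approx 280\,\mathrm{MB}.
+\]
+A single kernel TCP flow, whose window stock Linux autotuning
+caps at a few megabytes, can keep only a sliver of that pipeline
+occupied; three flows with independent windows keep roughly three
+times as much of it filled. The numbers are an order-of-magnitude
+sketch, not a measurement of the actual window evolution.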
+% TODO: This is still a hypothesis: it generalises the same +% bandwidth-delay-product argument used for Mycelium directly +% above, and is now grounded in the per-peer +% \texttt{SharedStream} structure verified in +% Listing~\ref{lst:hyprspace_sendpacket}, but neither the +% per-flow window evolution nor the actual under-load latency +% has been measured directly. A tcpdump of one Hyprspace +% iPerf3 run with inter-arrival timing analysis would settle +% it. Tinc picks up a 1.68$\times$ boost because several streams can collectively keep its single-threaded CPU busy during what would otherwise be idle gaps in a single flow. @@ -430,8 +456,6 @@ under multiplexing. Nebula is the only VPN that actually gets \emph{slower} with more streams: throughput drops from 706\,Mbps to 648\,Mbps (0.92$\times$) while retransmits jump from 955 to 2\,462. The -% TODO: "ten streams" vs "six unidirectional flows" --- reconcile -% with the test description above. streams are clearly fighting each other for resources inside the tunnel. @@ -510,12 +534,14 @@ Only Internal and WireGuard achieve 0\,\% packet loss. Both operate at the kernel level with proper backpressure that matches sender to receiver rate. Every other VPN shows massive loss (69--99\%) because the sender overwhelms the tunnel's userspace processing capacity. -% TODO: Headscale also uses WireGuard's kernel module but still shows -% 69.8\% loss. Explain that Headscale's userspace netstack sits -% between the application and the WireGuard kernel module, so UDP -% traffic must pass through userspace before reaching the kernel -% tunnel --- this is why it behaves like a userspace VPN here despite -% using WireGuard underneath. +Headscale shares WireGuard's cryptographic protocol but, contrary to +intuition, does not share its kernel datapath: Tailscale's +\texttt{magicsock} layer intercepts every packet to handle endpoint +selection and DERP relay, which is incompatible with the in-kernel +WireGuard module. Headscale therefore runs \texttt{wireguard-go} +entirely in userspace, and the unbounded \texttt{-b~0} flood overruns +that userspace pipeline just as it overruns every other userspace +implementation, producing 69.8\,\% loss despite the WireGuard branding. Yggdrasil's 98.7\% loss is the most extreme: it sends the most data (due to its large block size) but loses almost all of it. These loss rates do not reflect real-world UDP behavior but reveal which VPNs @@ -653,61 +679,54 @@ between raw throughput and real-world download speed. \label{fig:nix_download} \end{figure} -\paragraph{Video Streaming (RIST).} +\paragraph{Video streaming (RIST).} -At just 3.3\,Mbps, the RIST video stream sits comfortably within -every VPN's throughput budget. This test therefore measures -something different: how well the VPN handles real-time UDP packet -delivery under steady load. % TODO: The RIST plot shows Nebula at 99.8\%, not 100\%. "Nine of -% eleven deliver 100\%" is inaccurate --- eight deliver 100\%, Nebula -% delivers 99.8\%. Also, the claim that 14--16 dropped frames trace -% to encoder warm-up is stated without evidence. How was this -% determined? Add a reference or explain the methodology. -Nine of the eleven VPNs pass without -incident, delivering near-perfect video quality. The 14--16 dropped -frames that appear uniformly across all VPNs, including Internal, -likely trace back to encoder warm-up rather than tunnel overhead. +At 3.3\,Mbps, the RIST video stream sits well within every VPN's +throughput budget. 
The test therefore measures something else: how +well each VPN handles real-time UDP delivery under steady load. + +Most VPNs pass without incident. Eight deliver 100\% quality, +Nebula sits just below at 99.8\%, and Hyprspace's headline figure +of 100\% conceals a separate failure mode discussed below. The +14--16 dropped frames that appear uniformly across every run, including +Internal, are most likely encoder warm-up artefacts rather than +tunnel overhead, though we have not verified this directly. % TODO: The packet-drop distribution statistics (288 mean, % 10\% median, IQR 255--330) are not shown in any figure. % Add a box plot or distribution figure for Headscale's RIST drops. -Headscale is the exception. It averages just 13.1\,\% quality, -dropping 288~packets per test interval. The degradation is not -bursty but sustained: median quality sits at 10\,\%, and the -interquartile range of dropped packets spans a narrow 255--330 band. -The qperf benchmark independently corroborates this, having failed -outright for Headscale, confirming that something beyond bulk TCP is -broken. +Headscale is the clear failure. Its mean quality is 13.1\%, and +each test interval drops 288 packets. The degradation is sustained +rather than bursty: median quality is 10\%, and the interquartile +range of dropped packets is a narrow 255--330. The qperf benchmark +also fails outright for Headscale at baseline, which rules out a +bulk-TCP explanation. Something in the real-time path is broken. -What makes this failure unexpected is that Headscale builds on -WireGuard, which handles video flawlessly. TCP throughput places -Headscale squarely in Tier~1. Yet the RIST test runs over UDP, and +The failure is unexpected because Headscale builds on WireGuard, +which handles video without trouble, and Headscale's own TCP +throughput puts it in Tier~1. RIST runs over UDP, however, and qperf probes latency-sensitive paths using both TCP and UDP. The -% TODO: The DERP relay / MTU fragmentation hypothesis is plausible -% but unverified. No packet capture or fragmentation analysis is -% presented. Either add tcpdump / packet-level evidence or mark -% this more clearly as a hypothesis. -pattern points toward Headscale's DERP relay or NAT traversal layer -as the source. Its effective UDP payload size of 1\,208~bytes, the smallest -of any VPN, may compound the issue: RIST packets that exceed -this limit would be fragmented, and reassembling fragments under -sustained load could produce exactly the kind of steady, uniform packet -drops the data shows. For video conferencing, VoIP, or any -real-time media workload, this is a disqualifying result regardless -of TCP throughput. +most plausible source is Headscale's DERP relay or NAT traversal +layer. Headscale's effective UDP payload size is 1\,208~bytes, the +smallest in the dataset. RIST packets larger than this would be +fragmented, and fragment reassembly under sustained load could +produce exactly the steady, uniform drop pattern the data shows. +This is a hypothesis, not a confirmed cause: it would need a +packet capture to verify. Either way, the result disqualifies +Headscale from video conferencing, VoIP, or any other real-time +media workload, regardless of TCP throughput. % TODO: Hyprspace's packet-drop statistics (mean 1,194, max 55,500, % percentiles all zero) are not visible in the RIST Quality bar chart. % Add a distribution plot or note in the caption that the bar % chart hides this variance. -Hyprspace reveals a different failure mode. 
Its average quality -reads 100\,\%, but the raw numbers underneath are far from stable: -mean packet drops of 1\,194 and a maximum spike of 55\,500, with -the 25th, 50th, and 75th percentiles all at zero. Hyprspace -alternates between perfect delivery and catastrophic bursts. -RIST's forward error correction compensates for most of these -events, but the worst spikes are severe enough to overwhelm FEC -entirely. +Hyprspace fails differently. Its average quality reads 100\%, but +the raw drop counts underneath are unstable: mean packet drops of +1\,194 and a maximum spike of 55\,500. The 25th, 50th, and 75th +percentiles are all zero, so most runs deliver perfectly while a +small number suffer catastrophic bursts. RIST's forward error +correction recovers from most of these events, but the worst spikes +overwhelm FEC entirely. \begin{figure}[H] \centering @@ -742,14 +761,17 @@ Reboot reconnection rearranges the rankings. Hyprspace, the worst performer under sustained TCP load, recovers in just 8.7~seconds on average, faster than any other VPN. WireGuard and Nebula follow at 10.1\,s each. Nebula's consistency is striking: 10.06, 10.06, -10.07\,s across its three nodes, pointing to a hard-coded timer -rather than topology-dependent convergence. +10.07\,s across its three nodes, an exact match for Nebula's +\texttt{HostUpdateNotification} interval, whose default is +10~seconds in the lighthouse protocol (configurable, but the +benchmarks use the default). After a reboot, a node must +wait until the next periodic update before its lighthouses learn +its new endpoint, so the reconnection time tracks the timer rather +than any topology-dependent convergence. Mycelium sits at the opposite end, needing 76.6~seconds and showing the same suspiciously uniform pattern (75.7, 75.7, 78.3\,s), suggesting a fixed protocol-level wait built into the overlay. -%TODO: Hard coded timer needs to be verified - Yggdrasil produces the most lopsided result in the dataset: its yuki node is back in 7.1~seconds while lom and luna take 94.8 and 97.3~seconds respectively. The gap likely reflects the overlay's @@ -810,7 +832,8 @@ retransmits per 30-second test (one in every 200~segments), TCP spends most of its time in congestion recovery rather than steady-state transfer, shrinking the max congestion window to 205\,KB, the smallest in the dataset. Under parallel load the -situation worsens: retransmits climb to 17\,426. % TODO: The explanation for the sender/receiver inversion (ACK delays +situation worsens: retransmits climb to 17\,426. % TODO: The +% explanation for the sender/receiver inversion (ACK delays % causing sender-side timer undercounting) is a hypothesis. Normally % sender >= receiver. Consider verifying with packet captures or % note this as a likely but unconfirmed explanation. @@ -829,43 +852,145 @@ quality outside of its burst events. The pathology is narrow but severe: any continuous data stream saturates the tunnel's internal buffers. +Hyprspace does import gVisor netstack, but reading the source +confirms that the gVisor TCP stack sits exclusively behind the +in-VPN ``service network'' feature. 
Regular tunnel traffic uses +an ordinary kernel TUN device created through the +\texttt{songgao/water} library, and the forwarding loop in +\texttt{node/node.go} only diverts a packet into the gVisor +stack when its destination falls inside the +\texttt{fd00:hyprspsv::/80} service prefix \emph{and} the L4 +protocol is TCP; everything else is shipped verbatim over a +libp2p stream and written back into the receiving peer's kernel +TUN. Listings~\ref{lst:hyprspace_kernel_tun}, +\ref{lst:hyprspace_dispatch}, and \ref{lst:hyprspace_netstack} +show the relevant code in the upstream Hyprspace tree. + +\lstinputlisting[language=Go,caption={Hyprspace creates a real + kernel TUN via \texttt{songgao/water}; this is the device every + peer-to-peer packet traverses. +\textit{hyprspace/tun/tun\_linux.go:14--36}},label={lst:hyprspace_kernel_tun}]{Listings/hyprspace_tun_linux.go} + +\lstinputlisting[language=Go,caption={The IPv6 dispatch in the + Hyprspace forwarding loop only diverts to the gVisor service-network + TUN when the destination matches the + \texttt{fd00:hyprspsv::/80} service prefix \emph{and} the L4 + protocol byte is \texttt{0x06} (TCP); every other packet is left + on the kernel TUN path and forwarded over libp2p. +\textit{hyprspace/node/node.go:255--283}},label={lst:hyprspace_dispatch}]{Listings/hyprspace_dispatch.go} + +\lstinputlisting[language=Go,caption={Hyprspace's gVisor netstack + initialiser only enables TCP SACK; there is no \texttt{TCPRecovery} + override (RACK stays at gVisor's default), no congestion-control + override, and no buffer-size override. The text in + \texttt{tun.go} also notes the file is taken verbatim from + wireguard-go. +\textit{hyprspace/netstack/tun.go:6--80}},label={lst:hyprspace_netstack}]{Listings/hyprspace_netstack.go} + +Since the benchmark targets the regular Hyprspace IPv4/IPv6 +addresses rather than service-network proxies, both endpoints +rely on their host kernel's TCP stack for the entire transfer. +Whatever options Hyprspace's gVisor instance might set +internally — congestion control, loss recovery, buffer sizes — +are therefore irrelevant to these measurements; the inner TCP +state machine the kernel runs is the only one in the path. +The same caveat applies more sharply to Tailscale, where the +upstream documentation talks about an in-process gVisor TCP +stack but the benchmark traffic never reaches it; that case is +the subject of Section~\ref{sec:tailscale_degraded}. + +If gVisor is out of scope, the buffer bloat must originate +further up the Hyprspace stack instead. The most plausible +source is the libp2p / yamux stream layer through which raw IP +packets are funnelled. Hyprspace's TUN-read loop dispatches +each outbound packet on its own goroutine, and every such +goroutine ends up in \texttt{node/node.go}'s +\texttt{sendPacket}, which keeps exactly one libp2p stream per +destination peer in \texttt{activeStreams} and guards it with a +single per-peer \texttt{sync.Mutex} +(Listing~\ref{lst:hyprspace_sendpacket}). Concurrent +application TCP flows to the same Hyprspace neighbour therefore +serialise behind that one lock: the parallel iPerf3 test, which +opens multiple TCP connections to the same peer at once, +collapses to a single send pipeline at this layer. Each +goroutine waiting for the lock pins its own 1420-byte packet +buffer, and the underlying yamux session adds a per-stream +flow-control window on top. 
None of this is visible to the +kernel TCP sender that produced the inner segments — the kernel +sees only that the TUN write returned — so it keeps growing +its congestion window while the libp2p layer falls further +behind. The geometry is the textbook one for buffer bloat: a +fast producer (kernel TCP) sitting upstream of a slow, +serialised consumer (the single yamux stream per peer) with +no flow-control signal coupling the two. + +\lstinputlisting[language=Go,caption={Hyprspace's outbound + fast path keeps exactly one libp2p stream per destination peer + in \texttt{activeStreams} and guards it with a per-peer + \texttt{sync.Mutex} held inside the \texttt{SharedStream} + record. The TUN-read loop spawns a fresh goroutine per packet + (\texttt{node.go:282}); each one calls \texttt{sendPacket} and + takes \texttt{ms.Lock} for the duration of the libp2p stream + write, so concurrent application TCP flows to the same + Hyprspace neighbour are serialised behind a single mutex. + \textit{hyprspace/node/node.go:36--39, 282, +328--348}},label={lst:hyprspace_sendpacket}]{Listings/hyprspace_sendpacket.go} + \paragraph{Mycelium: Routing Anomaly.} \label{sec:mycelium_routing} Mycelium's 34.9\,ms average latency appears to be the cost of -routing through a global overlay. The per-path numbers, however, +routing through a global overlay. The per-path +numbers, however, reveal a bimodal distribution: \begin{itemize} - \bitem{luna$\rightarrow$lom:} 1.63\,ms (direct path, comparable + \bitem{luna$\rightarrow$lom:} 1.63\,ms (direct + path, comparable to Headscale at 1.64\,ms) \bitem{lom$\rightarrow$yuki:} 51.47\,ms (overlay-routed) \bitem{yuki$\rightarrow$luna:} 51.60\,ms (overlay-routed) \end{itemize} -One of the three links has found a direct route; the other two still +One of the three links has found a direct route; the +other two still bounce through the overlay. All three machines sit on the same -% TODO: Characterising path discovery as "failing intermittently" assumes -% direct routing is the expected outcome on a LAN. Mycelium is designed -% as a global overlay and may intentionally route through supernodes. -% If this is by-design behaviour, rephrase to avoid implying a bug. -% This characterisation also propagates to the impairment ping analysis -% (around line 966) which says impairment "pushes path discovery toward -% shorter routes." -% TODO: The throughput data INVERTS the latency split rather than -% "mirroring" it. The direct path (luna→lom, 1.63 ms RTT) achieves -% only 122 Mbps, while the overlay-routed path (yuki→luna, 51.60 ms -% RTT) reaches 379 Mbps --- the opposite of what TCP theory predicts. -% The plot also shows luna→lom receiver throughput at only 57.2 Mbps -% (a 53% sender/receiver gap on that link). Explain why the direct +% TODO: Characterising path discovery as "failing +% intermittently" assumes +% direct routing is the expected outcome on a LAN. +% Mycelium is designed +% as a global overlay and may intentionally route +% through supernodes. +% If this is by-design behaviour, rephrase to avoid +% implying a bug. +% This characterisation also propagates to the +% impairment ping analysis +% in Section sec:impairment, which says impairment "pushes path +% discovery toward shorter routes." +% TODO: The throughput data INVERTS the latency split +% rather than +% "mirroring" it. The direct path (luna→lom, 1.63 ms +% RTT) achieves +% only 122 Mbps, while the overlay-routed path +% (yuki→luna, 51.60 ms +% RTT) reaches 379 Mbps: the opposite of what TCP +% theory predicts. 
+% The plot also shows luna→lom receiver throughput at +% only 57.2 Mbps +% (a 53% sender/receiver gap on that link). Explain +% why the direct % path is 3× slower than the overlay path, or acknowledge the -% contradiction. The current wording "mirrors the split" is incorrect. -physical network, so Mycelium's path discovery is not consistently -selecting the direct route, a more specific problem than blanket overlay +% contradiction. The current wording "mirrors the +% split" is incorrect. +physical network, so Mycelium's path discovery is not +consistently +selecting the direct route, a more specific problem +than blanket overlay overhead. Throughput shows a similarly lopsided split: yuki$\rightarrow$luna reaches 379\,Mbps while luna$\rightarrow$lom manages only 122\,Mbps, a 3:1 gap. In -bidirectional mode, the reverse direction on that worst link drops +bidirectional mode, the reverse direction on that +worst link drops to 58.4\,Mbps, the lowest single-direction figure in the entire dataset. @@ -873,29 +998,37 @@ dataset. \centering \includegraphics[width=\textwidth]{{Figures/baseline/tcp/Mycelium/Average Throughput}.png} - % TODO: The caption attributes the asymmetry to "inconsistent direct - % route discovery" but the direct-route link (luna→lom, 1.63 ms RTT) - % is actually the SLOWEST (122 Mbps). The caption should address + % TODO: The caption attributes the asymmetry to + % "inconsistent direct + % route discovery" but the direct-route link + % (luna→lom, 1.63 ms RTT) + % is actually the SLOWEST (122 Mbps). The caption + % should address % why the direct path underperforms the overlay paths. \caption{Per-link TCP throughput for Mycelium, showing extreme path asymmetry. The 3:1 ratio between best (yuki$\rightarrow$luna, 379\,Mbps) and worst - (luna$\rightarrow$lom, 122\,Mbps) links does not correlate with + (luna$\rightarrow$lom, 122\,Mbps) links does not + correlate with the latency split (Section~\ref{sec:mycelium_routing}).} \label{fig:mycelium_paths} \end{figure} % TODO: TTFB (93.7 ms vs.\ 16.8 ms) and connection establishment % (47.3 ms) numbers are from qperf but not shown in any figure. -% Add a connection-setup latency table or plot. Also clarify what -% Internal's connection establishment time is (47.3 / 3 = 15.8 ms?) +% Add a connection-setup latency table or plot. Also +% clarify what +% Internal's connection establishment time is (47.3 / +% 3 = 15.8 ms?) % so the "3× overhead" can be verified. The overlay penalty shows up most clearly at connection setup. -Mycelium's average time-to-first-byte is 93.7\,ms (vs.\ Internal's +Mycelium's average time-to-first-byte is 93.7\,ms +(vs.\ Internal's 16.8\,ms, a 5.6$\times$ overhead), and connection establishment alone costs 47.3\,ms (3$\times$ overhead). Every new connection incurs that overhead, so workloads dominated by -short-lived connections accumulate it rapidly. Bulk downloads, by +short-lived connections accumulate it rapidly. Bulk +downloads, by contrast, amortize it: the Nix cache test finishes only 18\,\% slower than Internal (10.07\,s vs.\ 8.53\,s) because once the transfer phase begins, per-connection latency fades into the @@ -903,69 +1036,101 @@ background. Mycelium is also the slowest VPN to recover from a reboot: 76.6~seconds on average, and almost suspiciously uniform across -nodes (75.7, 75.7, 78.3\,s). That kind of consistency points to a -hard-coded convergence timer in the overlay protocol rather than -anything topology-dependent. 
The UDP test timed out at -120~seconds, and even first-time connectivity required a -70-second wait at startup. +nodes (75.7, 75.7, 78.3\,s). That kind of consistency points to +a fixed convergence timer in the overlay protocol — +most likely a +default interval rather than anything topology-dependent. +% TODO: Identify which Mycelium constant or default this 75-78 s +% recovery actually corresponds to before claiming it is a fixed +% timer; the source code would settle whether it is hard-coded, +% a configurable default, or coincidence. +The UDP test timed out at 120~seconds, and even first-time +connectivity required a 70-second wait at startup. % Explain what topology-dependent means in this case. \paragraph{Tinc: Userspace Processing Bottleneck.} -Tinc is a clear case of a CPU bottleneck masquerading as a network +Tinc is a clear case of a CPU bottleneck masquerading +as a network problem. At 1.19\,ms latency, packets get through the tunnel quickly. Yet throughput tops out at 336\,Mbps, barely a -third of the bare-metal link. % TODO: "path MTU is a healthy 1,500 bytes" but blksize_bytes is -% 1,353. These are different metrics --- blksize_bytes is the UDP -% payload size, not the path MTU. Clarify the distinction or -% remove the 1,500 claim. +third of the bare-metal link. The usual suspects do not apply: Tinc's effective UDP payload size (\texttt{blksize\_bytes} of -1\,353 from UDP iPerf3, comparable to VpnCloud at 1\,375 and + 1\,353 from UDP iPerf3, comparable to VpnCloud at 1\,375 and WireGuard at 1\,368) is in the normal range, and its retransmit -count (240) is moderate. What limits Tinc is its single-threaded -userspace architecture: one CPU core simply cannot encrypt, copy, +count (240) is moderate. What limits Tinc is its +single-threaded +userspace architecture: one CPU core simply cannot +encrypt, copy, and forward packets fast enough to fill the pipe. -% TODO: DOWNSTREAM DEPENDENCY — This "confirms" the Tinc CPU bottleneck -% diagnosis from above, but the 14.9% CPU figure has an unresolved TODO -% (the same utilization as VpnCloud at 539 Mbps). If the CPU claim is +% TODO: DOWNSTREAM DEPENDENCY — This "confirms" the +% Tinc CPU bottleneck +% diagnosis from above, but the 14.9% CPU figure has +% an unresolved TODO +% (the same utilization as VpnCloud at 539 Mbps). If +% the CPU claim is % revised or refuted, this confirmation must be updated too. The parallel benchmark confirms this diagnosis. Tinc scales to 563\,Mbps (1.68$\times$), beating Internal's 1.50$\times$ ratio. -Multiple TCP streams collectively keep that single core busy during -what would otherwise be idle gaps in any individual flow, squeezing +Multiple TCP streams collectively keep that single +core busy during +what would otherwise be idle gaps in any individual +flow, squeezing out throughput that no single stream could reach alone. -\section{Impact of Network Impairment} +\section{Impact of network impairment} \label{sec:impairment} -Baseline benchmarks rank VPNs by overhead under ideal conditions. -The impairment profiles from Table~\ref{tab:impairment_profiles} -test a different property: resilience. Two results dominate the -data. First, the throughput hierarchy from -Section~\ref{sec:baseline} collapses under degradation --- at High -impairment, the 675\,Mbps spread across all implementations compresses -to under 3\,Mbps, and architectural differences that matter at gigabit speeds -vanish. 
Second, Headscale outperforms the bare-metal Internal -baseline at Medium impairment across TCP, parallel TCP, and Nix -cache benchmarks. A VPN built on WireGuard should not beat a direct -connection; Section~\ref{sec:tailscale_degraded} traces the cause to -three TCP parameters in Tailscale's userspace network stack. +Baseline benchmarks rank VPNs by overhead under ideal +conditions. +The impairment profiles in +Table~\ref{tab:impairment_profiles} test +a different property: resilience. Two results +dominate the data. + +The first is the collapse of the throughput hierarchy. At High +impairment, the 675\,Mbps spread between fastest and slowest +implementation compresses to under 3\,Mbps. Architectural +differences that mattered at gigabit speeds become +invisible once +the network is the bottleneck. + +The second is harder to explain. Headscale outperforms the +bare-metal Internal baseline at Medium impairment across TCP, +parallel TCP, and the Nix cache benchmark. A VPN built on +WireGuard should not beat a direct connection. +Section~\ref{sec:tailscale_degraded} pursues this anomaly +through what turns out to be the wrong hypothesis. The +investigation begins with Tailscale's much-discussed gVisor TCP +stack, validates the candidate parameters in isolation on the +bare-metal host, and only then discovers — by reading the rig's +own NixOS module — that the gVisor stack is not actually in the +data path of the benchmark at all. The real culprit is a +combination of the Linux kernel's tight default +\texttt{tcp\_reordering} threshold and the way +\texttt{wireguard-go} +batches packets between the wire and the host kernel TCP stack. \subsection{Ping} -Latency is the most predictable metric under impairment. Most VPNs -absorb the injected delay with a fixed per-hop overhead, and rankings +Latency is the most predictable metric under +impairment. Most VPNs +absorb the injected delay with a fixed per-hop +overhead, and rankings within the central cluster barely change across profiles -(Table~\ref{tab:ping_impairment}). tc~netem adds roughly 4, 8, and -15\,ms of round-trip delay at Low, Medium, and High respectively; +(Table~\ref{tab:ping_impairment}). tc~netem adds +roughly 4, 8, and +15\,ms of round-trip delay at Low, Medium, and High +respectively; Internal's measured values (4.82, 9.38, 15.49\,ms) confirm this. \begin{table}[H] \centering - \caption{Average ping RTT (ms) across impairment profiles, sorted + \caption{Average ping RTT (ms) across impairment + profiles, sorted by High-profile RTT} \label{tab:ping_impairment} \begin{tabular}{lrrrr} @@ -990,74 +1155,107 @@ Internal's measured values (4.82, 9.38, 15.49\,ms) confirm this. \begin{figure}[H] \centering - \includegraphics[width=\textwidth]{{Figures/impairment/Ping Average RTT Heatmap}.png} - \caption{Average ping RTT across impairment profiles. Most VPNs + \includegraphics[width=\textwidth]{{Figures/impairment/Ping + Average RTT Heatmap}.png} + \caption{Average ping RTT across impairment + profiles. Most VPNs form a tight parallel band; Mycelium's non-monotonic curve, EasyTier's excess latency at High, and Hyprspace's upward divergence stand out.} \label{fig:ping_impairment_heatmap} \end{figure} -Mycelium defies the pattern. Its RTT \emph{drops} from 34.9\,ms at -baseline to 23.4\,ms at Low impairment, a 33\% improvement where -every other VPN gets slower. It then rises to 43.9\,ms at Medium -before falling again to 33.0\,ms at High. 
The baseline analysis -(Section~\ref{sec:mycelium_routing}) showed that Mycelium's latency -comes from a bimodal routing distribution: one path runs at 1.63\,ms -while two others route through the global overlay at -${\sim}$51\,ms. % TODO: DOWNSTREAM DEPENDENCY — This explanation depends on the baseline -% characterisation of Mycelium's path discovery as "failing intermittently" -% (Section mycelium_routing). If that characterisation is revised (e.g., -% overlay routing is by-design, not a failure), then the claim that -% impairment "pushes path discovery toward shorter routes" needs rethinking: -% the mechanism would be different if Mycelium is not trying to find direct +Mycelium defies the pattern. Its RTT \emph{drops} +from 34.9\,ms at +baseline to 23.4\,ms at Low impairment, a 33\% +improvement at the +profile where every other VPN gets slower. It then climbs to +43.9\,ms at Medium before falling again to 33.0\,ms +at High. The +baseline analysis +(Section~\ref{sec:mycelium_routing}) showed that +Mycelium's latency comes from a bimodal routing +distribution: one +path runs at 1.63\,ms, two others route through the +global overlay at +${\sim}$51\,ms. % TODO: DOWNSTREAM DEPENDENCY — This +% explanation depends on the baseline +% characterisation of Mycelium's path discovery as +% "failing intermittently" +% (Section mycelium_routing). If that +% characterisation is revised (e.g., +% overlay routing is by-design, not a failure), then +% the claim that +% impairment "pushes path discovery toward shorter +% routes" needs rethinking: +% the mechanism would be different if Mycelium is not +% trying to find direct % routes in the first place. -The impairment appears to push Mycelium's path -discovery toward shorter routes, so a larger share of traffic takes -the direct path. The non-monotonic pattern is consistent with a path -selection algorithm that responds to measured link quality, but not -linearly with degradation severity. +Impairment seems to push Mycelium's path selection toward the +shorter route, so a larger share of traffic avoids the overlay +detour. The non-monotonic curve is consistent with a +path selection +algorithm that reacts to measured link quality but +not linearly with +degradation severity. % TODO: Ping packet loss data is not shown in any figure. Add a -% packet loss table/figure or reference the raw data so readers can +% packet loss table/figure or reference the raw data +% so readers can % verify these numbers. -Mycelium also achieves 0\% ping packet loss at Low and Medium -impairment, while most VPNs show 0.1--3.2\% loss at those profiles. -At High impairment, Mycelium's loss jumps to 11.1\%. +Mycelium loses zero ping packets at Low and Medium impairment. +Most other VPNs show 0.1--3.2\% loss at those profiles. At High +impairment Mycelium's loss jumps to 11.1\%. -% TODO: EasyTier's max RTT (290 ms), WireGuard's max (~40 ms), and -% EasyTier's std dev (44.6 ms) are not shown in any plot. The ping -% heatmap only shows averages. Add a jitter/distribution figure. +% TODO: EasyTier's max RTT (290 ms), WireGuard's max +% (~40 ms), and +% EasyTier's std dev (44.6 ms) are not shown in any +% plot. The ping +% heatmap only shows averages. Add a +% jitter/distribution figure. % Also, the "userspace retry mechanism" is a hypothesized cause % without source-code or packet-level evidence. EasyTier accumulates 11\,ms of excess latency at High impairment -beyond what tc~netem accounts for. 
Its average RTT of 26.6\,ms and -maximum of 290\,ms (vs.\ ${\sim}$40\,ms for WireGuard) suggest a -userspace retry mechanism that introduces escalating variance. -EasyTier's RTT standard deviation reaches 44.6\,ms at High, the -worst jitter of any VPN. +beyond what tc~netem injects. Its average RTT is +26.6\,ms and its +maximum reaches 290\,ms, against ${\sim}$40\,ms for +WireGuard. The +RTT standard deviation reaches 44.6\,ms at High, the +worst jitter +of any VPN. A userspace retry mechanism is the +likely cause, but +without source-code evidence we cannot say so with certainty. % TODO: Ping packet loss data is not shown in any plot. The 1/9 -% = 11.1\% interpretation is clever but depends on the exact test -% structure (3 pairs × 3 runs × 100 packets). Verify this matches +% = 11.1\% interpretation is clever but depends on +% the exact test +% structure (3 pairs × 3 runs × 100 packets). Verify +% this matches % the actual test setup and add a supporting figure or table. -Hyprspace shows 11.1\% ping packet loss at every impairment level --- -Low, Medium, and High alike. With 9~measurement runs (3~machine -pairs $\times$ 3~runs of 100~packets), 11.1\% equals exactly 1/9: -one run per profile fails completely while the other eight report zero -loss. % TODO: DOWNSTREAM DEPENDENCY — This is a third reference to the buffer -% bloat diagnosis from Section hyprspace_bloat, which depends on the +Hyprspace shows the same 11.1\% ping packet loss at Low, Medium, +and High impairment. With 9~measurement runs per +profile (3~machine +pairs $\times$ 3~runs of 100~packets), 11.1\% is +exactly 1/9: one +run fails completely while the other eight report zero loss. +% TODO: DOWNSTREAM DEPENDENCY — This is a third +% reference to the buffer +% bloat diagnosis from Section hyprspace_bloat, which +% depends on the % unverified 2,800 ms under-load latency. If that diagnosis is % revised, this explanation must also be revisited. -This binary pass/fail behavior is consistent with the buffer bloat -diagnosis from Section~\ref{sec:hyprspace_bloat}: when buffers fill, -an entire path stalls rather than degrading gradually. +The binary pass/fail behaviour fits the buffer bloat +diagnosis from +Section~\ref{sec:hyprspace_bloat}: when the tunnel's +buffers fill, a +path stalls completely rather than degrading gradually. -\subsection{TCP Throughput} +\subsection{TCP throughput} -TCP throughput is where the baseline hierarchy breaks down. The -three performance tiers from Section~\ref{sec:baseline} dissolve at -the first impairment step (Table~\ref{tab:tcp_impairment}). +The baseline TCP hierarchy does not survive impairment. The +three performance tiers from +Section~\ref{sec:baseline} dissolve at +the first step (Table~\ref{tab:tcp_impairment}). \begin{table}[H] \centering @@ -1089,116 +1287,186 @@ the first impairment step (Table~\ref{tab:tcp_impairment}). \begin{figure}[H] \centering - \includegraphics[width=\textwidth]{{Figures/impairment/TCP Throughput Heatmap}.png} - \caption{Single-stream TCP throughput across impairment profiles. + \includegraphics[width=\textwidth]{{Figures/impairment/TCP + Throughput Heatmap}.png} + \caption{Single-stream TCP throughput across + impairment profiles. 
Headscale crosses above Internal at Medium impairment; Yggdrasil collapses from 795 to 13\,Mbps at Low; all VPNs converge at High.} \label{fig:tcp_impairment_heatmap} \end{figure} -Yggdrasil crashes from 795\,Mbps to 13.2\,Mbps at Low impairment, a -98.3\% throughput loss from adding just 2\,ms latency, 2\,ms jitter, -0.25\% packet loss, and 0.5\% reordering per machine. Even Mycelium, -the slowest VPN at baseline (259\,Mbps), retains more throughput at -Low than Yggdrasil does. The jumbo overlay MTU of 32\,731~bytes, -which inflated baseline metrics -(Section~\ref{sec:baseline}), becomes a liability under impairment: -each lost or reordered outer packet triggers retransmission of -${\sim}$24$\times$ more inner-layer data than a standard -1\,400-byte MTU VPN would lose. +Yggdrasil crashes from 795\,Mbps to 13.2\,Mbps at Low +impairment, a +98.3\% loss after adding only 2\,ms of latency, 2\,ms of jitter, +0.25\% packet loss, and 0.5\% reordering per machine. +Even Mycelium, +the slowest VPN at baseline (259\,Mbps), retains more +throughput at +Low than Yggdrasil does. The jumbo overlay MTU of 32\,731~bytes +that inflated Yggdrasil's baseline numbers +(Section~\ref{sec:baseline}) becomes a liability +under impairment: +every lost or reordered outer packet costs roughly +24$\times$ more +retransmitted inner data than a standard 1\,400-byte +MTU VPN would +lose. -Headscale retains 34.3\% of its baseline throughput at Low, nearly -matching Internal's 35.7\%. At Medium impairment, Headscale -(41.5\,Mbps) overtakes Internal (29.6\,Mbps) --- a VPN outperforming -the bare-metal baseline. -Section~\ref{sec:tailscale_degraded} investigates this anomaly in +Headscale retains 34.3\% of its baseline throughput +at Low, almost +the same as Internal's 35.7\%. At Medium impairment, Headscale +(41.5\,Mbps) overtakes Internal (29.6\,Mbps). +Section~\ref{sec:tailscale_degraded} investigates +this anomaly in detail. -At High impairment, the throughput range compresses from 675\,Mbps at -baseline to just 2.9\,Mbps. Internal leads at 4.25\,Mbps; Hyprspace -trails at 1.39\,Mbps. The impairment profile itself becomes the -bottleneck. With 2.5\% packet loss and 5\% reordering per machine, -every implementation is TCP-loss-limited, and architectural -differences that matter at gigabit speeds become irrelevant. +At High impairment, the throughput range collapses +from 675\,Mbps to +2.9\,Mbps. Internal leads at 4.25\,Mbps, Hyprspace trails at +1.39\,Mbps, and the impairment profile itself is the bottleneck. +With 2.5\% packet loss and 5\% reordering per machine, every +implementation is loss-limited, and the architectural +differences +that mattered at gigabit speeds no longer matter at all. -\subsection{UDP Throughput} +\subsection{UDP throughput} -The UDP stress test (\texttt{-b~0}) separates kernel-level from -userspace implementations more cleanly than any TCP benchmark. It -also produces widespread failures under impairment: Hyprspace and -Mycelium, which already failed at baseline, continue to time out at -% TODO: Tinc fails at Low and Medium but succeeds at High (8 Mbps) --- -% the same non-monotonic failure pattern as Internal/WireGuard (fail -% at Low, succeed at Medium/High). This suggests the failures are -% iPerf3/tc interaction issues rather than fundamental VPN limitations. -% Nebula and VpnCloud also fail selectively. The widespread non-monotonic -% failure pattern undermines using this benchmark as a reliability -% indicator (see line 1163 claim). Consider discussing this pattern. 
-all profiles, and Tinc drops out at Low and Medium while ZeroTier -fails at Medium. Despite the sparse dataset, one pattern is clear. +The UDP stress test (\texttt{-b~0}) separates +implementations with +effective backpressure from those without it more +cleanly than any +TCP benchmark. Under impairment, it also produces widespread +failures. +% TODO: Tinc fails at Low and Medium but succeeds at +% High (8 Mbps): +% the same non-monotonic failure pattern as +% Internal/WireGuard (fail +% at Low, succeed at Medium/High). This suggests the +% failures are +% iPerf3/tc interaction issues rather than +% fundamental VPN limitations. +% Nebula and VpnCloud also fail selectively. The +% widespread non-monotonic +% failure pattern undermines using this benchmark as +% a reliability +% indicator (see line 1163 claim). Consider +% discussing this pattern. +Hyprspace and Mycelium continue to time out at all profiles, +extending their baseline failures. Tinc drops out at Low and +Medium, ZeroTier at Medium. The data is sparse, but one pattern +emerges from the runs that did complete. -% TODO: The heatmap shows Internal and WireGuard both fail (×) at -% some impairment profiles (e.g., Internal fails at Low, WireGuard +% TODO: The heatmap shows Internal and WireGuard both +% fail (×) at +% some impairment profiles (e.g., Internal fails at +% Low, WireGuard % at Low and High). "Regardless of impairment" overstates the % evidence. Rephrase to reflect the failures, or explain why % those runs failed despite the claim of maintained throughput. -% TODO: Internal (and WireGuard) fail at Low impairment in the UDP -% test but succeed at Medium and High --- the opposite of what one -% would expect. This is never explained. Investigate and add an -% explanation (e.g., iPerf3 crash, tc interaction, timing issue). -Kernel-level implementations maintain throughput at the profiles -where data exists. Internal holds ${\sim}$950\,Mbps at -Baseline, Medium, and High. Headscale sustains 700--876\,Mbps and WireGuard -850--898\,Mbps; % TODO: verify WireGuard UDP range -- analysis doc says 850-898, possible digit transposition -both use WireGuard's kernel module for the outer tunnel, which -provides proper backpressure at the transport layer. Userspace VPNs collapse: EasyTier drops from +% TODO: Internal (and WireGuard) fail at Low +% impairment in the UDP +% test but succeed at Medium and High: the opposite of what one +% would expect. This is never explained. +% Investigate and add an +% explanation (e.g., iPerf3 crash, tc interaction, +% timing issue). +Three implementations maintain throughput at the profiles where +data exists. Internal holds ${\sim}$950\,Mbps at +Baseline, Medium, +and High; WireGuard sustains 850--898\,Mbps; and +Headscale sustains +700--876\,Mbps. % TODO: verify WireGuard UDP range -- +% analysis doc says 850-898, possible digit transposition +Internal and WireGuard ride the host kernel's transport-layer +backpressure (Internal directly, WireGuard via the in-kernel +WireGuard module). Headscale, by contrast, never +uses the kernel +module even though it builds on the WireGuard protocol: as +established in Section~\ref{sec:baseline}, Tailscale's +\texttt{magicsock} layer intercepts every packet for endpoint +selection, DERP relay, and the disco protocol, and that +interception is incompatible with the kernel WireGuard datapath. 
+Headscale therefore runs \texttt{wireguard-go} in userspace and +compensates with UDP batching +(\texttt{recvmmsg}/\texttt{sendmmsg}), +host-kernel UDP segmentation/aggregation offload +(\texttt{UDP\_SEGMENT}/\texttt{UDP\_GRO}, applied to the outer +WireGuard socket), and a 7\,MB socket buffer on the same outer +socket. These offloads live in the host kernel; gVisor netstack +itself implements no UDP GSO or UDP GRO of its own. +Together they +absorb a \texttt{-b 0} sender flood without +collapsing. Userspace +VPNs without the same engineering do collapse: +EasyTier drops from 865 to 435 to 38.5 to 6.1\,Mbps across successive profiles. -Yggdrasil, already pathological at baseline (98.7\% loss), crashes to -12.3\,Mbps at Low and fails entirely at Medium and High. +Yggdrasil, already pathological at baseline (98.7\% +loss), crashes +to 12.3\,Mbps at Low and fails entirely at Medium and High. \begin{figure}[H] \centering - \includegraphics[width=\textwidth]{{Figures/impairment/UDP Receiver Throughput Heatmap}.png} - % TODO: This caption says "kernel-level VPNs maintain high throughput" - % but the heatmap shows Internal, WireGuard, and Headscale ALL fail - % ($\times$) at Low impairment. WireGuard also fails at High. - % Rephrase to acknowledge the failures or explain them. + \includegraphics[width=\textwidth]{{Figures/impairment/UDP + Receiver Throughput Heatmap}.png} + % TODO: The heatmap shows Internal, WireGuard, and + % Headscale all + % fail ($\times$) at Low impairment. WireGuard also fails at + % High. These selective failures need an explanation + % (iPerf3/tc interaction?). \caption{UDP receiver throughput across impairment profiles. - Kernel-level VPNs (Internal, WireGuard, Headscale) maintain high - throughput where they complete; userspace VPNs collapse or fail - entirely ($\times$ marks a failed run).} + Implementations with effective UDP backpressure + (Internal and + WireGuard via the in-kernel datapath; Headscale via + \texttt{wireguard-go} batching plus large socket buffers) + maintain high throughput where they complete; + other userspace + VPNs collapse or fail entirely ($\times$ marks a failed run).} \label{fig:udp_impairment_heatmap} \end{figure} -% TODO: This "robustness indicator" interpretation is undermined by -% the non-monotonic failure pattern. Internal and WireGuard fail at -% Low (0.25% loss) but succeed at Medium and High (1%+ loss). If -% failures indicated "fundamental flow-control problems," they should -% get worse with more impairment, not better. The pattern suggests -% iPerf3 or tc timing issues rather than VPN limitations. Either +% TODO: This "robustness indicator" interpretation is +% undermined by +% the non-monotonic failure pattern. Internal and +% WireGuard fail at +% Low (0.25% loss) but succeed at Medium and High +% (1%+ loss). If +% failures indicated "fundamental flow-control +% problems," they should +% get worse with more impairment, not better. The +% pattern suggests +% iPerf3 or tc timing issues rather than VPN +% limitations. Either % explain the non-monotonic failures or weaken this conclusion. -The failure rate of this benchmark under impairment makes it more -useful as a robustness indicator than a throughput measurement. A VPN -that cannot complete a 30-second UDP flood under 0.25\% packet loss -has fundamental flow-control problems that will surface under real -workloads too, even if the symptoms are milder. +Under impairment this benchmark is more useful as a robustness +indicator than as a throughput measurement. 
A VPN that cannot +complete a 30-second UDP flood under 0.25\% packet loss has a +flow-control problem that will surface under real workloads too, +even when the symptoms are milder. \subsection{Parallel TCP} -% TODO: DOWNSTREAM DEPENDENCY — "six unidirectional flows" must match -% the baseline parallel test description. The baseline section has an -% unresolved TODO about whether the test uses 6 or 10 streams. If the -% baseline is corrected to 10, this section must also be updated. +% TODO: DOWNSTREAM DEPENDENCY — "six unidirectional +% flows" must match +% the baseline parallel test description. The +% baseline section has an +% unresolved TODO about whether the test uses 6 or 10 +% streams. If the +% baseline is corrected to 10, this section must also +% be updated. The Headscale anomaly from single-stream TCP grows larger under -parallel load. Table~\ref{tab:parallel_impairment} shows aggregate +parallel load. Table~\ref{tab:parallel_impairment} +shows aggregate throughput across three concurrent bidirectional links (six unidirectional flows). \begin{table}[H] \centering - \caption{Parallel TCP throughput (Mbps) across impairment profiles. - Three concurrent bidirectional links produce six unidirectional + \caption{Parallel TCP throughput (Mbps) across + impairment profiles. + Three concurrent bidirectional links produce six + unidirectional flows.} \label{tab:parallel_impairment} \begin{tabular}{lrrrr} @@ -1223,77 +1491,93 @@ unidirectional flows). \begin{figure}[H] \centering - \includegraphics[width=\textwidth]{{Figures/impairment/Parallel TCP Throughput Heatmap}.png} + \includegraphics[width=\textwidth]{{Figures/impairment/Parallel + TCP Throughput Heatmap}.png} \caption{Parallel TCP throughput across impairment profiles. Headscale dominates at Low (718\,Mbps vs.\ Internal's 277); - EasyTier is the runner-up (473\,Mbps); Hyprspace collapses to + EasyTier is the runner-up (473\,Mbps); Hyprspace + collapses to 2.87\,Mbps.} \label{fig:parallel_impairment_heatmap} \end{figure} -Headscale at Low impairment: 718\,Mbps --- 2.6$\times$ Internal -(277\,Mbps) and 4.1$\times$ WireGuard (173\,Mbps). At Medium, -Headscale (113\,Mbps) still leads Internal (82.6\,Mbps) by 37\%. -Whatever mechanism produces the single-stream crossover at Medium -scales with the number of flows: six independent streams each -benefit from it. +At Low impairment, Headscale reaches 718\,Mbps: 2.6$\times$ +Internal's 277\,Mbps and 4.1$\times$ WireGuard's 173\,Mbps. At +Medium, Headscale (113\,Mbps) still leads Internal +(82.6\,Mbps) by +37\%. Whatever mechanism produces the single-stream +crossover at +Medium scales with the flow count, because each of the six +concurrent streams benefits from it independently. -% TODO: EasyTier's resilience (473 Mbps at Low, 51% retention) is the -% second-best result after Headscale, yet receives no architectural -% explanation. Headscale gets an entire subsection attributing its -% resilience to gVisor TCP tuning. Either explain what gives EasyTier -% its resilience (e.g., its own TCP stack, congestion control, FEC) -% or acknowledge the gap explicitly. -EasyTier is the second-most resilient VPN under parallel load, at -473\,Mbps at Low (51\% of baseline). Both EasyTier and Headscale -retain more than half their baseline parallel throughput at Low -impairment; no other VPN exceeds 30\%. +EasyTier is the runner-up under parallel load: 473\,Mbps at Low, +51\% of its baseline. 
Headscale and EasyTier are the only VPNs +that retain more than half their baseline parallel throughput at +Low impairment; no other implementation exceeds 30\%. +We have no +direct architectural explanation for EasyTier's resilience and +do not claim one here. -Hyprspace collapses from 803\,Mbps to 2.87\,Mbps at Low, a 99.6\% -loss. % TODO: DOWNSTREAM DEPENDENCY — This references the buffer bloat diagnosis -% from Section hyprspace_bloat, which depends on the unverified 2,800 ms -% under-load latency. If that diagnosis is revised, this explanation +Hyprspace collapses from 803\,Mbps to 2.87\,Mbps at +Low, a 99.6\% +loss. % TODO: DOWNSTREAM DEPENDENCY — This +% references the buffer bloat diagnosis +% from Section hyprspace_bloat, which depends on the +% unverified 2,800 ms +% under-load latency. If that diagnosis is revised, +% this explanation % for parallel collapse must also be revisited. -The buffer bloat that plagues single-stream transfers -(Section~\ref{sec:hyprspace_bloat}) becomes catastrophic when six -concurrent flows compete for the same bloated buffers. +The buffer bloat that already plagues single-stream transfers +(Section~\ref{sec:hyprspace_bloat}) turns catastrophic when six +flows compete for the same bloated buffers at once. -The High-profile convergence effect is even more pronounced here than -in single-stream mode. Tinc and VpnCloud land at identical -8.25\,Mbps despite differing by 200\,Mbps at baseline. +High-profile convergence is more pronounced here than in +single-stream mode. Tinc and VpnCloud land at identical +8.25\,Mbps even though they differ by 200\,Mbps at baseline. -\subsection{QUIC Performance} +\subsection{QUIC performance} Headscale and Nebula failed the qperf QUIC benchmark at baseline -(Section~\ref{sec:baseline}) and continue to fail across all -impairment profiles. +(Section~\ref{sec:baseline}) and continue to fail at every +impairment profile. Yggdrasil's QUIC bandwidth drops from 745\,Mbps at baseline to -7.67\,Mbps at Low, 3.45\,Mbps at Medium, and 2.17\,Mbps at High --- -the same cliff observed in its TCP results, again driven by -jumbo-MTU amplification of outer-layer packet loss. +7.67\,Mbps at Low, 3.45\,Mbps at Medium, and 2.17\,Mbps at High. +This is the same cliff observed in its TCP results, +driven by the +same jumbo-MTU amplification of outer-layer packet loss. -At High impairment, WireGuard (23.2\,Mbps), VpnCloud (23.4\,Mbps), +At High impairment, WireGuard (23.2\,Mbps), VpnCloud +(23.4\,Mbps), ZeroTier (23.0\,Mbps), and Tinc (23.4\,Mbps) converge to within -0.4\,Mbps of each other. At baseline these four span a 188\,Mbps -range (844 to 656\,Mbps). QUIC's own congestion control, operating atop the -already-degraded outer link, becomes the sole limiter. +0.4\,Mbps of one another. At baseline these four +span a 188\,Mbps +range (656 to 844\,Mbps). QUIC's own congestion +control, running on +top of an already-degraded outer link, has become the +sole limiter. \begin{figure}[H] \centering - \includegraphics[width=\textwidth]{{Figures/impairment/QUIC Bandwidth Heatmap}.png} + \includegraphics[width=\textwidth]{{Figures/impairment/QUIC + Bandwidth Heatmap}.png} \caption{QUIC bandwidth across impairment profiles. Yggdrasil - drops from 745 to 8\,Mbps at Low; WireGuard, VpnCloud, ZeroTier, - and Tinc converge to ${\sim}$23\,Mbps at High. Headscale and + drops from 745 to 8\,Mbps at Low; WireGuard, + VpnCloud, ZeroTier, + and Tinc converge to ${\sim}$23\,Mbps at High. 
+ Headscale and Nebula fail at all profiles ($\times$).} \label{fig:quic_impairment_heatmap} \end{figure} -\subsection{Video Streaming} +\subsection{Video streaming} -At ${\sim}$3.3\,Mbps, the RIST video stream sits within every VPN's -throughput budget even at High impairment. Quality differences in -Table~\ref{tab:rist_impairment} therefore reflect packet delivery +At ${\sim}$3.3\,Mbps, the RIST video stream sits +within every VPN's +throughput budget even at High impairment. Quality +differences in +Table~\ref{tab:rist_impairment} therefore reflect +packet delivery reliability, not bandwidth. \begin{table}[H] @@ -1323,51 +1607,69 @@ reliability, not bandwidth. \begin{figure}[H] \centering - \includegraphics[width=\textwidth]{{Figures/impairment/Video Streaming Quality Heatmap}.png} - \caption{RIST video streaming quality across impairment profiles. + \includegraphics[width=\textwidth]{{Figures/impairment/Video + Streaming Quality Heatmap}.png} + \caption{RIST video streaming quality across + impairment profiles. Headscale is stuck at ${\sim}$13\% regardless of profile; Mycelium maintains ${\sim}$100\% even at High; Yggdrasil declines steeply to 43\%.} \label{fig:rist_impairment_heatmap} \end{figure} -Headscale stays at ${\sim}$13\% across all four profiles: 13.1\%, -13.0\%, 13.0\%, 13.0\%. The profile-independence confirms the -baseline diagnosis from Section~\ref{sec:baseline}. The failure is -% TODO: DOWNSTREAM DEPENDENCY — This repeats the DERP/MTU hypothesis from -% Section baseline as though it were established. The baseline TODO notes -% this hypothesis is unverified (no packet capture evidence). Do not -% present it as a confirmed diagnosis here without resolving the upstream TODO. -structural --- likely MTU fragmentation in the DERP relay layer --- -and cannot worsen because it is already saturated. Adding latency or -loss on top of an 87\% packet drop floor changes nothing. +Headscale sits at ${\sim}$13\% across all four profiles: 13.1\%, +13.0\%, 13.0\%, 13.0\%. This profile-independence confirms the +baseline diagnosis (Section~\ref{sec:baseline}): the failure is +% TODO: DOWNSTREAM DEPENDENCY — This repeats the +% DERP/MTU hypothesis from +% Section baseline as though it were established. +% The baseline TODO notes +% this hypothesis is unverified (no packet capture +% evidence). Do not +% present it as a confirmed diagnosis here without +% resolving the upstream TODO. +structural (most plausibly MTU fragmentation in the DERP relay +layer) and cannot worsen because it is already +saturated. Adding +latency or loss on top of an 87\% packet drop floor changes +nothing. -Mycelium delivers 99.9\% quality even at High impairment, better than +Mycelium holds 99.9\% quality even at High impairment, ahead of Internal (80.2\%) and every other VPN. At 3.3\,Mbps, even -Mycelium's degraded overlay paths can sustain the stream. The same -overlay routing that adds 34.9\,ms of latency and cripples bulk TCP -transfers is harmless at video bitrates. RIST's own forward error -correction compensates for whatever packet loss remains. +Mycelium's degraded overlay paths comfortably sustain +the stream. +The same overlay routing that adds 34.9\,ms of +latency and cripples +bulk TCP transfers is harmless at video bitrates, and RIST's +forward error correction handles the residual loss. -% TODO: The claim that jumbo MTU causes burst losses that overwhelm -% FEC is a hypothesis. No FEC analysis or packet-level evidence is -% shown. Consider adding packet capture data or softening the claim. 
-Yggdrasil degrades the most steeply: 100\% at baseline, 94.7\% at -Low, 71.4\% at Medium, 43.3\% at High. The jumbo MTU that hurt TCP -throughput likely hurts here too --- large overlay packets carrying -RIST data are more likely to be lost or reordered at the outer layer, -and RIST's FEC may not recover from the resulting burst losses. +% TODO: The claim that jumbo MTU causes burst losses +% that overwhelm +% FEC is a hypothesis. No FEC analysis or +% packet-level evidence is +% shown. Consider adding packet capture data or +% softening the claim. +Yggdrasil degrades the most steeply: 100\% at +baseline, 94.7\% at +Low, 71.4\% at Medium, 43.3\% at High. The jumbo MTU +that hurt TCP +throughput likely hurts here as well: large overlay packets are +more exposed to loss and reordering at the outer layer, and the +resulting burst losses may exceed what RIST's FEC can recover. -\subsection{Application-Level Download} +\subsection{Application-level download} + +The Nix binary cache download is the most demanding +application-level benchmark. Hundreds of sequential HTTP +connections amplify the per-connection latency +penalties that bulk +throughput tests amortise. Table~\ref{tab:nix_impairment} shows +download times across profiles. -The Nix binary cache download is the most demanding application-level -benchmark: hundreds of sequential HTTP connections amplify -per-connection latency penalties that bulk throughput tests amortize. -Table~\ref{tab:nix_impairment} shows download times across profiles. - \begin{table}[H] \centering - \caption{Nix binary cache download time (seconds) across impairment + \caption{Nix binary cache download time (seconds) + across impairment profiles, sorted by Low-profile time. ``--'' marks a failed run.} \label{tab:nix_impairment} @@ -1393,55 +1695,81 @@ Table~\ref{tab:nix_impairment} shows download times across profiles. \begin{figure}[H] \centering - \includegraphics[width=\textwidth]{{Figures/impairment/Nix Cache Download Time Heatmap}.png} - \caption{Nix binary cache download time across impairment profiles. - Headscale, Nebula, and Tinc complete all four profiles; Headscale + \includegraphics[width=\textwidth]{{Figures/impairment/Nix + Cache Download Time Heatmap}.png} + \caption{Nix binary cache download time across + impairment profiles. + Headscale, Nebula, and Tinc complete all four + profiles; Headscale beats Internal at Medium (49\,s vs.\ 59\,s). Yggdrasil's - Low-profile time explodes to 230\,s ($\times$ marks a failed run).} + Low-profile time explodes to 230\,s ($\times$ marks + a failed run).} \label{fig:nix_impairment_heatmap} \end{figure} -Headscale, Nebula, and Tinc are the only VPNs to complete all four -profiles. At Medium impairment, Headscale finishes in 48.8~seconds ---- faster than Internal's 58.6~seconds. Internal itself fails at -High impairment while Headscale completes in 219~seconds, Tinc in +Headscale, Nebula, and Tinc are the only VPNs to +complete all four +profiles. At Medium impairment, Headscale finishes +in 48.8~seconds, +faster than Internal's 58.6~seconds. Internal itself +fails at High +impairment while Headscale completes in 219~seconds, Tinc in 496~seconds, and Nebula in 547~seconds. Yggdrasil's download time explodes from 10.6\,s to 230\,s at Low -impairment, a 22$\times$ slowdown. Every HTTP request incurs the -latency penalty from Yggdrasil's impairment-amplified -retransmissions. 
Mycelium also degrades severely (10.1\,s to -79.5\,s, an 8$\times$ increase), consistent with its overlay routing -overhead, which compounds over hundreds of sequential HTTP -connections. +impairment, a 22$\times$ slowdown. Every HTTP request pays the +latency penalty of Yggdrasil's impairment-amplified +retransmissions. +Mycelium degrades almost as badly (10.1\,s to 79.5\,s, an +8$\times$ increase): its overlay routing overhead compounds over +hundreds of sequential HTTP connections. % TODO: Hyprspace fails at Low but completes at Medium (170 s). -% This contradicts the "clean gradient" claim. Explain why a VPN +% This contradicts the "clean gradient" claim. +% Explain why a VPN % can fail at Low but succeed at Medium, or note the anomaly. -The failure map reveals a mostly clean gradient: more demanding -profiles knock out more VPNs. At Low, 10 of 11 complete (Hyprspace -fails). At Medium, 9 complete (though Hyprspace, which failed at -Low, completes at 170\,s). At High, only 3 survive (Headscale, -Nebula, Tinc). Internal's failure at High is the most surprising --- the -bare-metal baseline cannot sustain a multi-connection HTTP workload -under severe degradation, but Headscale, shielded by its userspace -TCP stack, can. Section~\ref{sec:tailscale_degraded} explains why. +The failure map shows a mostly clean gradient: more demanding +profiles knock out more VPNs. At Low, 10 of 11 +finish (Hyprspace +fails). At Medium, 9 finish, though Hyprspace, which had failed +at Low, completes here in 170\,s. At High, only Headscale, +Nebula, and Tinc survive. Internal's failure at High is the +surprising one: the bare-metal baseline cannot sustain a +multi-connection HTTP workload under severe degradation, while +Headscale's userspace TCP stack pulls it through. +Section~\ref{sec:tailscale_degraded} explains why. -\section{Tailscale Under Degraded Conditions} +\section{Tailscale under degraded conditions} \label{sec:tailscale_degraded} -\subsection{Observed Anomaly} +This section is about an observation that should not exist: +Headscale, a tunnelling VPN built on a kernel TCP stack and +\texttt{wireguard-go}, beats the bare-metal Internal baseline at +Medium impairment, and at Low impairment under parallel load +beats it by a factor of 2.6. The short answer turns out to be +different from the obvious answer, and we worked it out only by +chasing the obvious answer to its end. -At Medium impairment, Headscale delivers 41.5\,Mbps single-stream TCP -throughput --- 40\% more than Internal's 29.6\,Mbps. A VPN built -atop WireGuard outperforms the bare-metal connection it tunnels -through. The anomaly is consistent across benchmarks: -Table~\ref{tab:headscale_anomaly} summarizes the comparison. +\subsection{An anomaly worth pursuing} + +At Medium impairment, Headscale reaches 41.5\,Mbps on a single +TCP stream against Internal's 29.6\,Mbps — a 40\,\% lead for +the VPN over the direct host-to-host link it tunnels through. +Headscale costs the expected ${\sim}$14\,\% at baseline, and at +Low and High impairment it lags Internal by some margin. Yet at +Medium the order inverts, and not by a sliver: a 12\,Mbps gap on +a 30\,Mbps link is well above measurement noise. The same thing +happens, more dramatically, on the parallel TCP test, where +Headscale's 718\,Mbps at Low beats Internal's 277\,Mbps by a +factor of 2.6. Table~\ref{tab:headscale_anomaly} collects the +comparison. \begin{table}[H] \centering - \caption{Headscale vs.\ Internal vs.\ WireGuard under impairment - (18.12.2025 run). 
For TCP benchmarks, higher is better. For + \caption{Headscale vs.\ Internal vs.\ WireGuard + under impairment + (18.12.2025 run). For TCP benchmarks, higher is + better. For Nix cache, lower is better; ``--'' marks a failed run.} \label{tab:headscale_anomaly} \begin{tabular}{llrrr} @@ -1463,236 +1791,507 @@ Table~\ref{tab:headscale_anomaly} summarizes the comparison. \begin{figure}[H] \centering \includegraphics[width=\textwidth]{Figures/impairment/headscale-vs-internal-across-profiles.png} - \caption{Single-stream TCP throughput for Internal, Headscale, and + \caption{Single-stream TCP throughput for Internal, + Headscale, and WireGuard across impairment profiles (log scale). Headscale - crosses above Internal at Medium impairment; WireGuard stays far + crosses above Internal at Medium impairment; + WireGuard stays far below both; all three converge at High.} \label{fig:headscale_vs_internal} \end{figure} -In parallel TCP at Low impairment, Headscale reaches 718\,Mbps vs.\ -Internal's 277\,Mbps (2.6$\times$). The Nix cache download at -Medium takes Headscale 48.8\,s vs.\ Internal's 58.6\,s (17\% -faster). At High impairment, Internal fails the Nix cache entirely -while Headscale completes in 219\,s. - -WireGuard, which shares Headscale's cryptographic layer, shows no -such advantage: 54.7\,Mbps at Low, 8.77\,Mbps at Medium. Whatever -protects Headscale is not the encryption or the tunnel --- it is -something in Tailscale's userspace networking stack. +WireGuard-the-kernel-module is the obvious sanity +check. It uses +the same Noise/WireGuard cryptographic protocol Tailscale ships +and is the closest available comparison without the rest of +Tailscale's stack. WireGuard shows none of Headscale's +advantage: 54.7\,Mbps at Low and 8.77\,Mbps at Medium, both well +below Internal at the same profile. So the encryption layer is +not the answer, and the basic UDP tunnel is not the answer. +Whatever Headscale is doing differently lives somewhere else in +the rest of Tailscale's implementation. % TODO: The Medium-impairment retransmit percentages (5.2\%, -% 2.4\%) are not in any table or figure. Add a retransmit rate -% table for impaired profiles or reference the data source. -The retransmit data provides the first clue. At Medium impairment, -WireGuard's retransmit rate is 5.2\% --- more than double Internal's -${\sim}$2.4\%. Headscale, despite being a VPN, matches Internal at -${\sim}$2.4\%. WireGuard uses the host kernel's TCP stack, which -treats reordered packets as losses and fires spurious retransmits; -Headscale's gVisor stack tolerates more reordering, so fewer -retransmissions are wasted on packets that were merely delayed. +% 2.4\%) are not in any table or figure. Add a retransmit +% rate table for impaired profiles or reference the data +% source. +The retransmit data narrows the search. At Medium, WireGuard's +TCP retransmit rate is 5.2\,\%, more than double Internal's +${\sim}$2.4\,\%. Headscale matches Internal at ${\sim}$2.4\,\% +even though it is a tunnelling VPN. Both Headscale and +bare-metal Internal run the same host kernel TCP stack at the +inner layer, so the asymmetry is not about a different TCP +implementation. It is about what the kernel TCP stack is being +asked to process: something on Headscale's path is suppressing +the spurious retransmits the kernel would otherwise fire under +\texttt{tc netem}-induced reordering, and WireGuard's path is +not. 
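+
+Because correlated \texttt{tc netem} reordering drives so much of
+what follows, it helps to see what such a discipline looks like in
+practice. The sketch below is illustrative only: the service name
+and the numeric values are placeholders in the spirit of the
+Medium profile, not the rig's actual profile definitions, and it
+is written as a NixOS oneshot unit purely to match the rest of the
+test setup.
+
+\begin{lstlisting}[language=Nix,caption={Illustrative only: a
+  correlated-impairment netem discipline expressed as a NixOS
+  oneshot service. The delay, loss, and reorder values are
+  placeholders, not the benchmark's profile definitions.},
+  label={lst:netem_sketch}]
+{ pkgs, ... }:
+{
+  systemd.services.impair-eth0 = {
+    wantedBy = [ "multi-user.target" ];
+    serviceConfig = {
+      Type = "oneshot";
+      RemainAfterExit = true;
+    };
+    # netem only reorders when a delay is configured; the second
+    # percentage after "loss" and "reorder" is the correlation,
+    # which is what makes the impairment arrive in bursts.
+    script = ''
+      ${pkgs.iproute2}/bin/tc qdisc replace dev eth0 root netem \
+        delay 20ms 5ms loss 1% 50% reorder 2.5% 50%
+    '';
+  };
+}
+\end{lstlisting}
+
+The correlation argument is the important part: it clusters the
+losses and reorderings into bursts rather than spreading them
+independently, which is exactly the pattern that trips the
+kernel's sequence-count loss detection in these results.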
-\subsection{Congestion Control Analysis} +\subsection{A plausible villain: Tailscale's gVisor stack} -Tailscale uses a userspace TCP/IP stack derived from Google's gVisor -(netstack). This stack does not inherit the host kernel's TCP -parameters. Three defaults differ from the Linux kernel in ways that -matter under packet reordering: +The candidate explanation we pursued first, and the one any +reading of the upstream Tailscale documentation will lead to, +is Tailscale's userspace TCP/IP stack. The Tailscale client +imports Google's gVisor netstack +(\texttt{gvisor.dev/gvisor/pkg/tcpip}) as a Go library and uses +it as an in-process TCP implementation. The gVisor +documentation is direct about why this matters: netstack is +designed for adverse networks where the host kernel's TCP +defaults are too aggressive. Tailscale's release notes go +further, calling out specific overrides on top of gVisor — the +most visible being an explicit RACK disable and 8\,MiB / 6\,MiB +receive and send buffers. + +Reading Tailscale's source confirms it. +\texttt{wgengine/netstack/netstack.go} contains the netstack +initialiser, and Listing~\ref{lst:tailscale_netstack_overrides} +reproduces the relevant overrides verbatim. RACK is disabled +(\texttt{TCPRecovery(0)}) with a comment pointing at +\texttt{tailscale/issues/9707}: ``gVisor's RACK performs +poorly. ACKs do not appear to be handled in a timely manner, +leading to spurious retransmissions and a reduced congestion +window.'' Reno is set explicitly with a comment pointing at +\texttt{gvisor/issues/11632}, an integer-overflow bug in +gVisor's CUBIC implementation. The TCP send and receive +buffer maxima are pushed up to 8\,MiB and 6\,MiB. SACK is +enabled (gVisor's default is off). + +\lstinputlisting[language=Go,caption={Tailscale's gVisor + netstack initialiser explicitly disables RACK, pins Reno as + the congestion control, and enlarges the TCP buffer maxima. + These overrides live inside + \texttt{wgengine/netstack/netstack.go}. +\textit{tailscale/wgengine/netstack/netstack.go:264--339}},label={lst:tailscale_netstack_overrides}]{Listings/tailscale_netstack_overrides.go} + +Read against the Linux kernel defaults — RACK on, CUBIC by +default, ${\sim}$1\,MiB receive and send buffers, +\texttt{tcp\_reordering=3}, Tail Loss Probe enabled — these +overrides describe a TCP stack better suited to a lossy, +reordering link than the host kernel. The hypothesis writes +itself: Headscale's iPerf3 traffic is processed +by this gVisor +instance instead of by the host kernel TCP stack, and so it +inherits the more reordering-tolerant behaviour. +WireGuard-the-kernel-module shares only the cryptographic +protocol; it does not get the gVisor stack, and +therefore does +not get the advantage. + +It is a clean story. The natural way to test it +is to extract +the parameters Tailscale sets inside gVisor, apply their +nearest Linux equivalents to the bare-metal host as sysctls, +and see whether Internal — with no VPN at all — picks up the +same advantage. If it does, the gVisor explanation is +supported. If it does not, the hypothesis fails. + +\subsection{Reproducing the effect on bare metal} +\label{sec:tuned} + +We ran two follow-up benchmarks on the same hardware and +impairment setup as the original 18.12.2025 run. \begin{itemize} - \bitem{\texttt{tcp\_reordering}:} gVisor uses 10; the Linux kernel - defaults to~3. This parameter controls how many out-of-order - packets TCP tolerates before treating the event as a loss. 
With - tc~netem injecting 0.5--2.5\% reordering per machine, bursts of - 3+ reordered packets are frequent. The kernel's threshold of~3 - causes spurious fast retransmits and congestion window reductions - for packets that are merely reordered, not lost. - \bitem{\texttt{tcp\_recovery} (RACK):} gVisor disables it; the - Linux kernel enables it by default. RACK uses timing-based loss - detection that is more aggressive than the pure sequence-based - approach gVisor uses. Under reordering, RACK's timing heuristics - can falsely classify delayed packets as lost. - \bitem{\texttt{tcp\_early\_retrans} (TLP):} gVisor disables it; the - kernel enables it. Tail Loss Probe sends speculative retransmits - on idle connections, which can worsen congestion when the link is - already impaired. -\end{itemize} - -Under packet reordering, these three defaults compound. The Linux -TCP stack fires retransmits and cuts the congestion window far more -often than necessary; each false positive shrinks the window and -reduces throughput. Tailscale's gVisor stack tolerates more -reordering before reacting, so its congestion window stays larger and -throughput stays higher. - -% TODO: The claim that the anomaly "grows with impairment severity" is -% not fully supported. At High impairment, Headscale (4.21 Mbps) and -% Internal (4.25 Mbps) converge --- the anomaly vanishes rather than -% growing. The logic predicts continued divergence at High reordering -% (5% per machine), but the data shows both become loss-limited. -% Rephrase to say the anomaly emerges at Medium but disappears at High -% when absolute loss dominates. -This explains why the anomaly emerges as impairment increases. At -baseline, there is no reordering, so the threshold difference is -irrelevant and Internal's kernel-level processing advantage dominates. -As reordering increases from 0.5\% (Low) to 2.5\% (Medium) per -machine, the kernel's aggressive loss detection fires more often, and -the throughput gap shifts in Headscale's favor. At High impairment, -however, both converge to ${\sim}$4.2\,Mbps: the absolute packet loss -rate becomes the dominant bottleneck, overriding the reordering -tolerance advantage. - -\subsection{Tuned Kernel Parameters} - -Two follow-up benchmark runs applied Tailscale's gVisor TCP -parameters to the host kernel via sysctl: - -\begin{itemize} - \bitem{Full gVisor (27.02.2026):} All parameters --- + \bitem{Tailscale-style (27.02.2026):} \texttt{tcp\_reordering=10}, \texttt{tcp\_recovery=0}, - \texttt{tcp\_early\_retrans=0}, plus enlarged buffer sizes - (\texttt{tcp\_rmem}, \texttt{tcp\_wmem}, \texttt{rmem\_max}, - \texttt{wmem\_max}). Tested on Internal, Headscale, WireGuard, - Tinc, and ZeroTier. - \bitem{Reorder-only (06.03.2026):} Only - \texttt{tcp\_reordering=10}, \texttt{tcp\_recovery=0}, and - \texttt{tcp\_early\_retrans=0}. Buffer sizes left at kernel - defaults. Tested on Internal and Headscale only. + \texttt{tcp\_early\_retrans=0}, plus enlarged + buffer sizes + (\texttt{tcp\_rmem}, \texttt{tcp\_wmem}, + \texttt{rmem\_max}, + \texttt{wmem\_max}). Tested on Internal, Headscale, + WireGuard, Tinc, and ZeroTier. + \bitem{Reorder-only (06.03.2026):} Only + \texttt{tcp\_reordering=10}, + \texttt{tcp\_recovery=0}, and + \texttt{tcp\_early\_retrans=0}. Buffer sizes left at + kernel defaults. Tested on Internal and Headscale only. \end{itemize} -Table~\ref{tab:kernel_tuning_internal} shows how Internal responds -to the tuning. 
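+
+For reference, the reorder-only profile is small enough to spell
+out. The sketch below shows one way to express it as a NixOS host
+configuration using the standard \texttt{boot.kernel.sysctl}
+option; it is an illustration of the parameters listed above, not
+a reproduction of the benchmark hosts' actual module.
+
+\begin{lstlisting}[language=Nix,caption={A minimal sketch of the
+  reorder-only sysctl profile as a NixOS module. The three values
+  match the 06.03.2026 run; the Tailscale-style run additionally
+  enlarges \texttt{tcp\_rmem}, \texttt{tcp\_wmem},
+  \texttt{rmem\_max}, and \texttt{wmem\_max} (values not shown
+  here).},label={lst:reorder_only_sysctls}]
+{
+  # Relax the kernel's reordering-sensitive loss detection;
+  # buffer sizes stay at their defaults.
+  boot.kernel.sysctl = {
+    "net.ipv4.tcp_reordering" = 10;   # kernel default: 3
+    "net.ipv4.tcp_recovery" = 0;      # kernel default: 1 (RACK on)
+    "net.ipv4.tcp_early_retrans" = 0; # kernel default: 3 (TLP on)
+  };
+}
+\end{lstlisting}
+
+The same three keys can also be applied imperatively with
+\texttt{sysctl -w} for a single boot, which is enough to reproduce
+the comparison on any Linux host.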
Both follow-up runs used the same impairment profiles -and hardware as the original 18.12.2025 run. - \begin{table}[H] \centering \caption{Internal (no VPN) throughput across three kernel - configurations. ``Default'' is the 18.12.2025 run with stock + configurations. ``Default'' is the + 18.12.2025 run with stock Linux TCP parameters.} \label{tab:kernel_tuning_internal} \begin{tabular}{llrrr} \hline \textbf{Metric} & \textbf{Profile} & \textbf{Default} & - \textbf{Full gVisor} & \textbf{Reorder-only} \\ + \textbf{Tailscale-style} & \textbf{Reorder-only} \\ \hline - Single TCP (Mbps) & Baseline & 934 & 934 & 934 \\ - Single TCP (Mbps) & Low & 333 & 363 & 354 \\ - Single TCP (Mbps) & Medium & 29.6 & 64.2 & 72.7 \\ - Parallel TCP (Mbps) & Low & 277 & 893 & 902 \\ - Parallel TCP (Mbps) & Medium & 82.6 & 226 & 211 \\ - Retransmit \% & Medium & ${\sim}$2.4 & 1.21 & 1.11 \\ - Nix cache (s) & Medium & 58.6 & 29.7 & 29.1 \\ + Single TCP (Mbps) & Baseline & 934 & + 934 & 934 \\ + Single TCP (Mbps) & Low & 333 & + 363 & 354 \\ + Single TCP (Mbps) & Medium & 29.6 & + 64.2 & 72.7 \\ + Parallel TCP (Mbps) & Low & 277 & + 893 & 902 \\ + Parallel TCP (Mbps) & Medium & 82.6 & + 226 & 211 \\ + Retransmit \% & Medium & ${\sim}$2.4 + & 1.21 & 1.11 \\ + Nix cache (s) & Medium & 58.6 & + 29.7 & 29.1 \\ \hline \end{tabular} \end{table} - \begin{figure}[H] \centering \includegraphics[width=\textwidth]{Figures/impairment/no_vpn_kernel_tuning_comparison.png} - \caption{Internal (no VPN) single-stream TCP throughput across three + \caption{Internal (no VPN) single-stream TCP + throughput across three kernel configurations. Baseline is unchanged; at Medium - impairment, throughput jumps from 30 to 64 to 73\,Mbps as + impairment, throughput jumps from 30 to 64 to + 73\,Mbps as reordering tolerance increases.} \label{fig:kernel_tuning_comparison} \end{figure} -Internal's Medium-impairment throughput jumps from 29.6 to -72.7\,Mbps --- a 146\% increase from a three-line sysctl change. The -retransmit percentage drops from ${\sim}$2.4\% to 1.11\%; over half -of the original retransmissions were spurious. The Nix cache download at -Medium halves from 58.6\,s to 29.1\,s. +The result felt like confirmation. Internal's +Medium-impairment throughput jumped from 29.6\,Mbps to +72.7\,Mbps under the reorder-only configuration — a 146\,\% +increase from a three-line sysctl change — and +the retransmit +rate at Medium dropped from ${\sim}$2.4\,\% to +1.11\,\%, which +means more than half of the original retransmissions were +spurious. The Nix cache download at Medium roughly halved, +from 58.6\,s to 29.1\,s. -Parallel TCP sees an even larger gain. Internal at Low impairment -climbs from 277 to 902\,Mbps, a 226\% increase that now exceeds -Headscale's original 718\,Mbps. % TODO: DOWNSTREAM DEPENDENCY — "six concurrent flows" inherits the -% unresolved 6-vs-10 stream count from the baseline parallel test +Parallel TCP gained more. Internal at Low +climbed from 277 to +902\,Mbps, a 226\,\% increase that not only +exceeds Internal's +old single-stream best but actually overtakes Headscale's +original 718\,Mbps from the unmodified run. % +% TODO: DOWNSTREAM +% DEPENDENCY — "six concurrent flows" inherits +% the unresolved +% 6-vs-10 stream count from the baseline parallel test % description. Update when that TODO is resolved. -With six concurrent flows each -independently benefiting from the higher reordering threshold, the -aggregate improvement compounds. 
+Each of the six concurrent flows benefits independently from +the higher reordering threshold, and the gains compound. -% TODO: Headscale's tuned-run values (50.1 Mbps, 36.3 s) are not in -% any table. Add a table showing Headscale's results from the -% follow-up runs alongside Internal's so readers can verify the -% reversal. -% TODO: "At every impairment level and benchmark" is a strong claim -% but only single-stream TCP at Medium and Nix cache at Medium are -% shown with both Internal and Headscale values. The Headscale tuned -% data is not in any table (see TODO above). Either add the full -% comparison table or weaken to "at the metrics shown." -The anomaly reverses. At the measured impairment levels and benchmarks, -tuned Internal now meets or exceeds Headscale. At Medium impairment: -Internal 72.7\,Mbps vs.\ Headscale 50.1\,Mbps (Internal 45\% ahead), -where the original result had Headscale 40\% ahead. The Nix cache -flips too: Internal completes in 29.1\,s vs.\ Headscale's 36.3\,s, -where the original had Headscale 17\% faster. +% TODO: Headscale's tuned-run values (50.1 Mbps, 36.3 s) are +% not in any table. Add a table showing Headscale's results +% from the follow-up runs alongside Internal's so +% readers can +% verify the reversal. +Headscale itself, retested with the same sysctls, +gained more +modestly: +21\,\% at Medium and a small $-$5\,\% wobble at +Low. And the anomaly reversed entirely. At Medium, tuned +Internal reached 72.7\,Mbps against Headscale's 50.1\,Mbps — +a 45\,\% lead for Internal where the original run +had Headscale +40\,\% ahead. The Nix cache flipped the same way: Internal +completed in 29.1\,s against Headscale's 36.3\,s, where the +original had Headscale 17\,\% faster. \begin{figure}[H] \centering \includegraphics[width=\textwidth]{Figures/impairment/headscale-gap-reversal.png} - \caption{Internal-to-Headscale speed-up factor before and after - kernel tuning. Values above 1.0 mean Internal is faster. At - Medium impairment, the ratio flips from 0.71$\times$ (Headscale + \caption{Internal-to-Headscale speed-up factor + before and after + kernel tuning. Values above 1.0 mean + Internal is faster. At + Medium impairment, the ratio flips from + 0.71$\times$ (Headscale ahead) to 1.45$\times$ (Internal ahead).} \label{fig:headscale_gap_reversal} \end{figure} -The reorder-only configuration (06.03) matches or exceeds the full -gVisor configuration (27.02) at most metrics; the two exceptions are -single-stream TCP at Low (354 vs.\ 363\,Mbps) and parallel TCP at -Medium (211 vs.\ 226\,Mbps), both within 7\%. Internal -reaches 72.7\,Mbps at Medium with reorder-only vs.\ 64.2\,Mbps with -full gVisor. % TODO: The "mild buffer bloat" explanation for full-gVisor being -% slightly slower than reorder-only is speculative. The difference -% (64.2 vs 72.7 Mbps) could be within run-to-run variance. Either -% test with more runs or present this as one possible explanation. -The enlarged buffer sizes appear unnecessary and may -introduce mild buffer bloat that partially offsets the reordering -benefit, though the difference could also reflect normal run-to-run -variance. The entire Headscale advantage is explained by three kernel -parameters: \texttt{tcp\_reordering}, \texttt{tcp\_recovery}, and +The reorder-only configuration matched or exceeded the full +Tailscale-style configuration on most metrics. The two +exceptions were single-stream TCP at Low (354 +vs.\ 363\,Mbps) +and parallel TCP at Medium (211 vs.\ 226\,Mbps), both within +7\,\%. 
The enlarged buffer sizes did not help and may have +added mild buffer bloat that partially offset the reordering +benefit, though the gap could also be run-to-run variance. +Either way, the entire Headscale advantage on Internal +collapsed to three host-kernel sysctls: +\texttt{tcp\_reordering}, \texttt{tcp\_recovery}, and \texttt{tcp\_early\_retrans}. +At this point in the investigation the hypothesis seemed +settled. Tailscale's gVisor stack ships with +these overrides; +the bare-metal kernel ships with stricter defaults; matching +the kernel to gVisor reproduces the effect. Then we checked +which Tailscale code path the test rig was actually running. + +\subsection{The data path that was not there} + +In default mode — what anyone running \texttt{tailscale up} +on a Linux host gets — the Tailscale client creates a real +kernel TUN device, registers a route for the +Tailscale subnet +through it, and forwards inbound and outbound +packets through +that interface. An application like iPerf3 issues a +\texttt{connect} to the remote peer's Tailscale +IP. The host +kernel TCP stack handles the application TCP. The kernel +routes the resulting outbound packets to the TUN device. +\texttt{tailscaled} (with \texttt{wireguard-go} embedded) +reads them from the TUN, encrypts them, and sends them as +outer WireGuard UDP packets on the wire. The receiving side +reverses the process and writes the decrypted inner packets +back into its own TUN, where the host kernel TCP stack +delivers them to the iPerf3 server. + +In that path, gVisor netstack is never instantiated. The +netstack initialiser in +Listing~\ref{lst:tailscale_netstack_overrides} +only runs when +\texttt{tailscaled} is launched with +\texttt{--tun=userspace-networking}, a mode that has no +kernel TUN at all and is reachable only from processes +running inside \texttt{tailscaled} itself (Tailscale SSH, +Taildrop, the metric endpoint). External processes such as +iPerf3 cannot reach the Tailscale network in that mode. + +The test rig does not use that mode. +Listing~\ref{lst:nixos_tailscale} shows the relevant line of +the upstream NixOS \texttt{services.tailscale} module, which +assembles the daemon command line as +\texttt{tailscaled --tun +\$\{cfg.interfaceName\}~\dots}, with +no \texttt{userspace-networking} fall-back unless +the operator +explicitly sets \texttt{interfaceName = +"userspace-networking"}. +Listing~\ref{lst:rig_interface_name} shows what +the benchmark +suite's Headscale module sets the interface name to: +\texttt{ts-\$\{instanceName\}}, truncated to fifteen +characters. The two together resolve to +\texttt{tailscaled --tun ts-headscale} on every +test machine, +a real kernel TUN. gVisor netstack is unreachable from any +external benchmark traffic in this rig. + +\lstinputlisting[language=Nix,caption={The NixOS + \texttt{services.tailscale} module passes \texttt{--tun + \$\{interfaceName\}} as the daemon's TUN argument. There is + no \texttt{--tun=userspace-networking} fall-back unless the + user explicitly sets \texttt{interfaceName = "userspace-networking"}. +\textit{nixpkgs/nixos/modules/services/networking/tailscale.nix:158}},label={lst:nixos_tailscale}]{Listings/nixos_tailscale.nix} + +\lstinputlisting[language=Nix,caption={The + benchmark suite's + Headscale module sets \texttt{interfaceName} to a real kernel + TUN name (\texttt{ts-}, truncated to 15 characters). 
+ Combined with Listing~\ref{lst:nixos_tailscale}, this means + \texttt{tailscaled} runs as \texttt{tailscaled --tun ts-headscale} + on every test machine. +\textit{vpn-benchmark-suite/clanModules/headscale/shared.nix:19,273--277}},label={lst:rig_interface_name}]{Listings/rig_interface_name.nix} + +The empirical fingerprint pins the same conclusion down without +source-code reading. Headscale itself gained +21\,\% at Medium +from the host-kernel sysctl tuning. If Headscale's iPerf3 +traffic were processed by gVisor netstack, host-kernel sysctls +would change nothing — they configure the host kernel TCP stack +and only the host kernel TCP stack. The fact that Headscale moves +measurably under those sysctls is direct evidence that +Headscale's application TCP runs on the host kernel stack, just +as Internal's does. + +The validation experiment was therefore validating something +other than the hypothesis it was supposed to validate. It was +confirming, very cleanly, that the Linux kernel's default +\texttt{tcp\_reordering=3} is too tight for the kind of bursty, +correlated reordering the Medium profile produces, and that +loosening it produces a large throughput gain on a kernel-TCP +data path. That part of the result stands. What does not stand +is the inference that the gain reproduces something Tailscale was +already doing in gVisor. For this benchmark, Tailscale is not in +the gVisor TCP business at all. + +\subsection{Where the advantage actually lives} + +The puzzle the investigation began with has not gone away. +Headscale starts at 41.5\,Mbps where Internal starts at +29.6\,Mbps, and both run their iPerf3 TCP on the same host kernel +TCP stack. Whatever Headscale is doing — partially, weakly, but +reproducibly — is worth roughly twelve megabits per second on the +Medium profile, and it is not gVisor netstack. + +The +21\,\% sysctl gain for Headscale itself is also informative +about the size of the mechanism. If the gain were 0\,\%, +Headscale would already be doing the sysctls' work; if it were ++146\,\% like Internal's, Headscale would be doing nothing of its +own. The partial response says Headscale's mechanism produces an +effect similar in kind to the sysctls but smaller in size, and +that the two effects are not fully additive. + +Two features of the \texttt{wireguard-go} data-plane pipeline are +the most likely candidates, and both live on the kernel-TUN path +that Tailscale actually uses in the rig. + +The first is TUN TCP and UDP generic receive offload. Tailscale's +\texttt{tstun} wrapper enables both on the kernel TUN device on +Linux unless an environment knob disables them or a runtime probe +rejects the feature (Listing~\ref{lst:tstun_gro}). On the +receive side, this means \texttt{wireguard-go} decrypts a burst +of inbound WireGuard frames and then coalesces consecutive +in-order TCP segments belonging to the same flow into a single +super-segment before writing them back to the kernel TUN. On the +transmit side, it accepts GSO super-segments from the kernel TUN +read in the same way. The receiving kernel TCP stack therefore +sees fewer, larger segments per coalesced batch instead of $N$ +small ones, and the segment timing that survives to the kernel is +the timing of GRO batches rather than of individual on-the-wire +packets. Bare-metal Internal traffic has no equivalent path +because it does not pass through any user-space TUN at all. 
+ +\lstinputlisting[language=Go,caption={Tailscale enables TUN TCP + and UDP GRO on every Linux non-TAP \texttt{tailscaled} process + unless the operator disables them via environment knobs or a + kernel runtime probe rejects the feature. This is in the default + kernel-TUN data path; it is not gated on + \texttt{--tun=userspace-networking}. +\textit{tailscale/net/tstun/wrap\_linux.go:25--43}},label={lst:tstun_gro}]{Listings/tstun_gro.go} + +The second is the 7\,MiB outer-UDP socket buffer that +\texttt{magicsock} pins on the WireGuard UDP socket +(Listing~\ref{lst:magicsock_buffer}), using the ``force'' +\texttt{SO\_*BUFFORCE} variant where available so the value is +honoured even past \texttt{net.core.rmem\_max}. The host kernel +default is in the low hundreds of KiB. Under burst-correlated +impairment — Medium and High both use 50\,\% correlation, so +losses and reorderings cluster — this larger buffer absorbs +spikes in arrival rate that would otherwise overflow the kernel +UDP receive queue and surface as additional inner-TCP losses. +Internal has no such cushion on its incoming wire path. + +\lstinputlisting[language=Go,caption={\texttt{magicsock} pins the + outer WireGuard UDP socket's send and receive buffers to 7\,MiB + and uses \texttt{SetBufferSize} with the \texttt{SO\_*BUFFORCE} + (``force'') variant where available, so the value is honoured + even past \texttt{net.core.rmem\_max}. +\textit{tailscale/wgengine/magicsock/magicsock.go:86,3908--3913}},label={lst:magicsock_buffer}]{Listings/magicsock_buffer.go} + +% TODO: Neither of the two candidate mechanisms above is directly +% verified in this chapter. A targeted follow-up — for example +% tcpdump on the receiving \texttt{tailscale0} interface during a +% Medium-impairment iPerf3 run, with inter-arrival timing +% analysis — would distinguish their relative contributions and +% confirm the mechanism. The argument here is that they are the +% most plausible candidates consistent with the evidence, not +% measured causes. + +A third feature, batched UDP I/O, completes the picture without +changing it qualitatively. \texttt{wireguard-go} uses +\texttt{recvmmsg} and \texttt{sendmmsg} on the outer UDP socket +so a burst of WireGuard frames moves through a single system +call. This does not change \emph{whether} packets are reordered, +but it reduces per-packet timing jitter that the kernel might +otherwise interpret as additional reordering. + +Hyprspace cannot be used as a negative control for any of this. +It does import gVisor netstack, but only for its in-VPN +service-network feature, and the Hyprspace benchmark traffic goes +through a kernel TUN exactly like Headscale's +(Section~\ref{sec:hyprspace_bloat}). The two VPNs differ on the +wireguard-go pipeline (TUN GRO and the 7\,MiB outer-UDP buffer), +not on whether gVisor handles their inner TCP. The gVisor angle +simply does not apply to either of them in this benchmark. + +The kernel-side picture closes the loop. Three host-kernel TCP +parameters dominate the bare-metal behaviour the benchmarks +expose. \texttt{net.ipv4.tcp\_reordering} (default 3) is the +number of out-of-order segments the kernel will tolerate before +declaring fast retransmit, and with \texttt{tc netem} injecting +0.5--2.5\,\% reordering per machine, bursts of several reordered +packets are frequent enough that the threshold is repeatedly +tripped on the bare-metal path. 
\texttt{net.ipv4.tcp\_recovery} +(default \texttt{1}, RACK enabled) adds time-based reordering +detection on top of the segment-count threshold, which compounds +the spurious retransmits when reordering is high. And +\texttt{net.ipv4.tcp\_early\_retrans} (default \texttt{3}, Tail +Loss Probe enabled) fires speculative retransmits when +unacknowledged segments sit at the tail of a transmission window, +which interacts poorly with an already-impaired link. Loosening +any one of the three softens the kernel's loss detection on the +bare-metal path; loosening all three recovers most of the +throughput. The Headscale path reaches the same kernel TCP stack +but is already feeding it the GRO-coalesced, buffer-cushioned +stream described above, so the kernel's tight defaults fire less +often there to begin with. + +The same logic explains the anomaly's shape across profiles. At +baseline there is no reordering, so the kernel's tight +\texttt{tcp\_reordering} threshold never trips and Internal's +native kernel-stack speed wins. As reordering rises from 0.5\,\% +(Low) to 2.5\,\% (Medium) per machine, the kernel's loss +detection fires on the bare-metal path more often than on the +GRO-coalesced Headscale path, and the throughput gap shifts in +Headscale's favour. At High impairment, both converge to +${\sim}$4.2\,Mbps: absolute packet loss becomes the dominant +bottleneck, and reordering tolerance no longer matters. + % TODO: WireGuard (12.2 Mbps), Tinc (11.5 Mbps), and ZeroTier % (11.5 Mbps) tuned values are not in any table. Add them to % Table~\ref{tab:kernel_tuning_internal} or a new table. -Other VPNs benefit less from the kernel tuning. WireGuard's Medium -throughput rises from 8.77 to 12.2\,Mbps (+39\%) and Tinc's from -5.53 to 11.5\,Mbps (+108\%). ZeroTier stays flat (12.0 to -11.5\,Mbps). The tuning helps the kernel TCP stack, but VPNs that -add their own encapsulation overhead and userspace processing have -independent bottlenecks that the sysctl parameters cannot remove. +Other VPNs respond unevenly to the same sysctl tuning. +WireGuard's Medium throughput rises from 8.77 to 12.2\,Mbps +(+39\,\%), Tinc's from 5.53 to 11.5\,Mbps (+108\,\%), and +ZeroTier stays flat (12.0 to 11.5\,Mbps). % TODO: The +% reading below — that VPNs which add their own encapsulation and +% userspace processing have bottlenecks the host kernel sysctls +% cannot touch — does not cleanly fit the data: Tinc (a fully +% userspace VPN) shows the largest gain (+108\,\%), larger than +% kernel-WireGuard's. A more complete explanation has to account +% for which TCP stack each VPN's application traffic actually +% traverses and which of those stacks the sysctls actually reach. +The intuitive reading is that VPNs which add their own +encapsulation and userspace processing have bottlenecks the host +kernel sysctls cannot touch, but Tinc's large gain shows the +picture is not that simple. -% TODO: Headscale tuned-run percentages (+21\%, $-$5\%) are not in -% any table. Also, the "compound delays" hypothesis is speculative -% --- no evidence is shown that double reordering tolerance causes -% compound delays. Either verify experimentally or weaken the claim. -Headscale itself gets modestly faster with kernel tuning (+21\% at -Medium) but slightly slower at Low impairment ($-$5\%). Its -userspace gVisor stack already optimizes for reordering tolerance. 
-When the kernel stack also increases its tolerance, the two layers of -tuning may interact suboptimally --- both independently delay -retransmits, which could cause compound delays on the -kernel-to-Headscale socket path. +The resilient finding from this section, the one that survives +regardless of which of the two Tailscale-side mechanisms turns +out to dominate, is not about Tailscale at all. It is about +Linux. The kernel's default \texttt{tcp\_reordering=3} threshold +is too tight for the kind of bursty, correlated reordering +\texttt{tc netem} produces at the Medium profile, and it costs +the bare-metal host more than half of its achievable throughput. +Three lines of \texttt{sysctl} repair it. The fix is portable to +any Linux host and entirely independent of any VPN. -% TODO: These sections are empty stubs but the chapter introduction -% (line 12--13) promises "findings from the source code analysis." -% Either write these sections or remove the promise from the intro. +The unresilient finding — the one that motivated us to write this +section in the first place — is that Tailscale's much-discussed +userspace TCP stack is, for the workload that exposed the +anomaly, sitting on the bench. The advantage we attributed to it +must come from a more ordinary place: the way +\texttt{wireguard-go} batches and coalesces packets between the +wire and the kernel TCP stack, and the larger UDP buffer it pins +on its outer socket. We were chasing the wrong hypothesis with +the right experiment, and the experiment turned out to be more +useful than the hypothesis. -\section{Source Code Analysis} +% TODO: These sections are empty stubs but the chapter +% introduction (line 12--13) promises "findings from the source +% code analysis." Either write these sections or remove the +% promise from the intro. -\subsection{Feature Matrix Overview} +\section{Source code analysis} -% Summary of the 131-feature matrix across all ten VPNs. +\subsection{Feature matrix overview} + +% Summary of the 108-feature matrix across all ten VPNs. % Highlight key architectural differences that explain % performance results. -\subsection{Security Vulnerabilities} +\subsection{Security vulnerabilities} % Vulnerabilities discovered during source code review. -\section{Summary of Findings} +\section{Summary of findings} -% Brief summary table or ranking of VPNs by key metrics. -% Save deeper interpretation for a Discussion chapter. +% Brief summary table or ranking of VPNs by key metrics. Save +% deeper interpretation for a Discussion chapter. diff --git a/Figures/baseline/tcp/TCP Throughput.png b/Figures/baseline/tcp/TCP Throughput.png index 49dce1c..a6bf25f 100644 Binary files a/Figures/baseline/tcp/TCP Throughput.png and b/Figures/baseline/tcp/TCP Throughput.png differ diff --git a/Listings/hyprspace_dispatch.go b/Listings/hyprspace_dispatch.go new file mode 100644 index 0000000..4231639 --- /dev/null +++ b/Listings/hyprspace_dispatch.go @@ -0,0 +1,26 @@ +} else if proto == 0x60 { + dstIP = net.IP(packet[24:40]) + if node.cfg.BuiltinAddr6.Equal(dstIP) { + continue + } else if serviceNet.NetworkRange.Contains(dstIP) { + // Are you TCP because your protocol is 6, or is your + // protocol 6 because you are TCP? 
+ if packet[6] == 0x06 { + port := uint16(packet[42])*256 + uint16(packet[43]) + if serviceNet.EnsureListener([16]byte(packet[24:40]), port) { + count, err := (*serviceNet.Tun).Write([][]byte{packet}, 0) + if count == 0 || err != nil { + logger.With(err).Error("Error writing to service-network tunnel") + } + } + } + continue + } +} +... +// Check route table for destination address. +route, found := node.cfg.FindRouteForIP(dstIP) +if found { + dst = route.Target.ID + go node.sendPacket(dst, packet, plen) +} diff --git a/Listings/hyprspace_netstack.go b/Listings/hyprspace_netstack.go new file mode 100644 index 0000000..8bf1676 --- /dev/null +++ b/Listings/hyprspace_netstack.go @@ -0,0 +1,21 @@ +// taken from https://git.zx2c4.com/wireguard-go/tree/tun/netstack/tun.go +// rev 2b73054b299aec80cbb064954001810d30ee2e3c +... +func CreateNetTUN(localAddresses, dnsServers []netip.Addr, mtu int) (tun.Device, *Net, error) { + opts := stack.Options{ + NetworkProtocols: []stack.NetworkProtocolFactory{ipv4.NewProtocol, ipv6.NewProtocol}, + TransportProtocols: []stack.TransportProtocolFactory{tcp.NewProtocol, udp.NewProtocol, icmp.NewProtocol6, icmp.NewProtocol4}, + HandleLocal: true, + } + dev := &netTun{ + ep: channel.New(1024, uint32(mtu), ""), + stack: stack.New(opts), + ... + } + sackEnabledOpt := tcpip.TCPSACKEnabled(true) // TCP SACK is disabled by default + tcpipErr := dev.stack.SetTransportProtocolOption(tcp.ProtocolNumber, &sackEnabledOpt) + if tcpipErr != nil { + return nil, nil, fmt.Errorf("could not enable TCP SACK: %v", tcpipErr) + } + ... +} diff --git a/Listings/hyprspace_sendpacket.go b/Listings/hyprspace_sendpacket.go new file mode 100644 index 0000000..540f1a3 --- /dev/null +++ b/Listings/hyprspace_sendpacket.go @@ -0,0 +1,31 @@ +type SharedStream struct { + Stream *network.Stream + Lock *sync.Mutex +} +... +// Inside the TUN-read loop: +if found { + dst = route.Target.ID + go node.sendPacket(dst, packet, plen) +} +... +func (node *Node) sendPacket(dst peer.ID, packet []byte, plen int) { + // Check if we already have an open connection to the destination peer. + ms, ok := node.activeStreams[dst] + if ok { + if func() bool { + ms.Lock.Lock() + defer ms.Lock.Unlock() + // Write out the packet's length to the libp2p stream to ensure + // we know the full size of the packet at the other end. + err := binary.Write(*ms.Stream, binary.LittleEndian, uint16(plen)) + if err == nil { + // Write the packet out to the libp2p stream. + _, err = (*ms.Stream).Write(packet[:plen]) + ... + } + ... + }() { return } + } + ... +} diff --git a/Listings/hyprspace_tun_linux.go b/Listings/hyprspace_tun_linux.go new file mode 100644 index 0000000..9a889bc --- /dev/null +++ b/Listings/hyprspace_tun_linux.go @@ -0,0 +1,15 @@ +// New creates and returns a new TUN interface for the application. +func New(name string, opts ...Option) (*TUN, error) { + // Setup TUN Config + cfg := water.Config{ + DeviceType: water.TUN, + } + cfg.Name = name + + // Create Water Interface + iface, err := water.New(cfg) + if err != nil { + return nil, err + } + ... +} diff --git a/Listings/magicsock_buffer.go b/Listings/magicsock_buffer.go new file mode 100644 index 0000000..6cec3cd --- /dev/null +++ b/Listings/magicsock_buffer.go @@ -0,0 +1,9 @@ +socketBufferSize = 7 << 20 +... 
+forceErr, portableErr := sockopts.SetBufferSize(pconn, direction, socketBufferSize) +if forceErr != nil { + logf("magicsock: [warning] failed to force-set UDP %v buffer size to %d: %v; using kernel default values (impacts throughput only)", direction, socketBufferSize, forceErr) +} +if portableErr != nil { + logf("magicsock: failed to set UDP %v buffer size to %d: %v", direction, socketBufferSize, portableErr) +} diff --git a/Listings/nixos_tailscale.nix b/Listings/nixos_tailscale.nix new file mode 100644 index 0000000..bda9f2d --- /dev/null +++ b/Listings/nixos_tailscale.nix @@ -0,0 +1 @@ +''"FLAGS=--tun ${lib.escapeShellArg cfg.interfaceName} ${lib.concatStringsSep " " cfg.extraDaemonFlags}"'' diff --git a/Listings/rig_interface_name.nix b/Listings/rig_interface_name.nix new file mode 100644 index 0000000..9912e9c --- /dev/null +++ b/Listings/rig_interface_name.nix @@ -0,0 +1,10 @@ +let + interface = lib.substring 0 15 "ts-${instanceName}"; +in +{ + services.tailscale = { + enable = true; + # Use the interface name for the tunnel + interfaceName = interface; + }; +} diff --git a/Listings/tailscale_netstack_overrides.go b/Listings/tailscale_netstack_overrides.go new file mode 100644 index 0000000..ad474ba --- /dev/null +++ b/Listings/tailscale_netstack_overrides.go @@ -0,0 +1,28 @@ +// values are biased towards higher throughput on high bandwidth-delay +// product paths, except on memory-constrained platforms. +tcpRXBufOpt := tcpip.TCPReceiveBufferSizeRangeOption{ + ... + Max: tcpRXBufMaxSize, +} +tcpipErr := ipstack.SetTransportProtocolOption(tcp.ProtocolNumber, &tcpRXBufOpt) +... +tcpTXBufOpt := tcpip.TCPSendBufferSizeRangeOption{ + ... + Max: tcpTXBufMaxSize, +} +tcpipErr = ipstack.SetTransportProtocolOption(tcp.ProtocolNumber, &tcpTXBufOpt) +... +sackEnabledOpt := tcpip.TCPSACKEnabled(true) // TCP SACK is disabled by default +tcpipErr := ipstack.SetTransportProtocolOption(tcp.ProtocolNumber, &sackEnabledOpt) +... +// See https://github.com/tailscale/tailscale/issues/9707 +// gVisor's RACK performs poorly. ACKs do not appear to be handled in a +// timely manner, leading to spurious retransmissions and a reduced +// congestion window. +tcpRecoveryOpt := tcpip.TCPRecovery(0) +tcpipErr = ipstack.SetTransportProtocolOption(tcp.ProtocolNumber, &tcpRecoveryOpt) +... +// gVisor defaults to reno at the time of writing. We explicitly set reno +// See https://github.com/google/gvisor/issues/11632 +renoOpt := tcpip.CongestionControlOption("reno") +tcpipErr = ipstack.SetTransportProtocolOption(tcp.ProtocolNumber, &renoOpt) diff --git a/Listings/tstun_gro.go b/Listings/tstun_gro.go new file mode 100644 index 0000000..3516af9 --- /dev/null +++ b/Listings/tstun_gro.go @@ -0,0 +1,22 @@ +// SetLinkFeaturesPostUp configures link features on t based on select TS_TUN_ +// environment variables and OS feature tests. Callers should ensure t is +// up prior to calling, otherwise OS feature tests may be inconclusive. 
+func (t *Wrapper) SetLinkFeaturesPostUp() { + if t.isTAP || runtime.GOOS == "android" { + return + } + if groDev, ok := t.tdev.(tun.GRODevice); ok { + if envknob.Bool("TS_TUN_DISABLE_UDP_GRO") { + groDev.DisableUDPGRO() + } + if envknob.Bool("TS_TUN_DISABLE_TCP_GRO") { + groDev.DisableTCPGRO() + } + err := probeTCPGRO(groDev) + if errors.Is(err, unix.EINVAL) { + groDev.DisableTCPGRO() + groDev.DisableUDPGRO() + t.logf("disabled TUN TCP & UDP GRO due to GRO probe error: %v", err) + } + } +} diff --git a/_typos.toml b/_typos.toml index 1c9e7b8..76bade6 100644 --- a/_typos.toml +++ b/_typos.toml @@ -4,7 +4,7 @@ extend-exclude = [ "**/value", "**.rev", "**/facter-report.nix", - "Chapters/Zusammenfassung.tex", + "**/Zusammenfassung.tex", "**/key.json", "pkgs/clan-cli/clan_lib/machines/test_suggestions.py", ] diff --git a/main.tex b/main.tex index 24edf20..836c9fd 100644 --- a/main.tex +++ b/main.tex @@ -62,6 +62,55 @@ \usepackage{tikz} \usetikzlibrary{shapes.geometric} \usepackage[edges]{forest} +\usepackage{listings} % Source code listings for evidence snippets +% Syntax-highlighting colors (xcolor is already loaded by the class file) +\definecolor{lstKeyword}{HTML}{0B5FA5} +\definecolor{lstComment}{HTML}{4B7B4D} +\definecolor{lstString}{HTML}{A31515} +\definecolor{lstNumber}{HTML}{707070} +\definecolor{lstBackground}{HTML}{F7F7F7} +\definecolor{lstFrame}{HTML}{C8C8C8} +\lstset{ + basicstyle=\ttfamily\footnotesize, + keywordstyle=\color{lstKeyword}\bfseries, + commentstyle=\color{lstComment}\itshape, + stringstyle=\color{lstString}, + numberstyle=\tiny\color{lstNumber}, + identifierstyle=\color{black}, + backgroundcolor=\color{lstBackground}, + rulecolor=\color{lstFrame}, + breaklines=true, + breakatwhitespace=false, + columns=fullflexible, + keepspaces=true, + showstringspaces=false, + frame=single, + framerule=0.4pt, + xleftmargin=0.5em, + xrightmargin=0.5em, + aboveskip=0.6em, + belowskip=0.6em, + captionpos=b, +} +\lstdefinelanguage{Nix}{ + morekeywords={with,let,in,inherit,rec,if,then,else,import,true,false,null}, + morecomment=[l]{\#}, + morestring=[b]", + sensitive=true, +} +\lstdefinelanguage{Go}{ + morekeywords={break,case,chan,const,continue,default,defer,else,fallthrough, + for,func,go,goto,if,import,interface,map,package,range,return,select, + struct,switch,type,var,bool,byte,complex64,complex128,error,float32, + float64,int,int8,int16,int32,int64,rune,string,uint,uint8,uint16,uint32, + uint64,uintptr,true,false,iota,nil,append,cap,close,complex,copy,delete, + imag,len,make,new,panic,print,println,real,recover}, + morecomment=[l]{//}, + morecomment=[s]{/*}{*/}, + morestring=[b]", + morestring=[b]`, + sensitive=true, +} \usepackage[backend=bibtex,style=numeric,natbib=true]{biblatex} % % Use the bibtex backend with the authoryear citation style (which diff --git a/treefmt.nix b/treefmt.nix index 1e5a60c..2871f75 100644 --- a/treefmt.nix +++ b/treefmt.nix @@ -18,6 +18,7 @@ settings.global.excludes = [ "AI_Data/**" "Figures/**" + "Chapters/Zusammenfassung.tex" ]; programs.typos = {