improve mycelium argument

2026-04-14 11:36:11 +02:00
parent 13633f092a
commit bbb5c6e886
13 changed files with 454 additions and 228 deletions
+77
@@ -37,6 +37,83 @@ the 80\% success rate sets a baseline expectation, while the 55-second
timeout informs analysis of each implementation's keep-alive behavior
during source code review.
\subsection{The Babel routing protocol}
\label{sec:babel}
Babel~\cite{chroboczek_babel_2021} is a distance-vector routing
protocol designed for both wired and wireless mesh networks. Each
node periodically sends \emph{Hello} messages to discover neighbours
and \emph{Update} messages to advertise reachable prefixes along with
a numeric cost metric. A node selects the route with the lowest
cumulative metric for each destination, subject to a
\emph{feasibility condition} that prevents routing loops. Because
Babel is distance-vector rather than link-state, nodes only know the
cost of their own best path, not the full topology.
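
To make the feasibility condition concrete, the sketch below spells it
out in code. This is a simplified illustration with hypothetical types,
not Mycelium's implementation: real Babel speakers use modular
(wrapping) sequence-number comparison and keep one feasibility entry
per (prefix, router-id) pair~\cite{chroboczek_babel_2021}.

\begin{lstlisting}[language=Rust]
// What this node itself has advertised for a given source
// (prefix, router-id): Babel's "feasibility distance".
struct FeasibilityDistance {
    seqno: u16,
    metric: u16,
}

/// An update is feasible if accepting it cannot close a routing loop:
/// the node never advertised this source, or the update carries a
/// strictly newer seqno, or the same seqno with a strictly better metric.
fn is_feasible(own: Option<&FeasibilityDistance>, seqno: u16, metric: u16) -> bool {
    const INFINITY: u16 = 0xFFFF; // retractions are always feasible
    if metric == INFINITY {
        return true;
    }
    match own {
        None => true,
        Some(fd) => seqno > fd.seqno || (seqno == fd.seqno && metric < fd.metric),
    }
}

fn main() {
    let fd = FeasibilityDistance { seqno: 7, metric: 96 };
    assert!(is_feasible(Some(&fd), 8, 200));  // newer seqno: accepted
    assert!(is_feasible(Some(&fd), 7, 64));   // same seqno, better metric
    assert!(!is_feasible(Some(&fd), 7, 128)); // worse metric: rejected
}
\end{lstlisting}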
Two properties of Babel matter for the benchmarks in
Chapter~\ref{Results}. First, route advertisements are periodic: a
node will not learn about a new path until the next Update interval,
which can be on the order of minutes depending on the implementation's
timer settings. Second, Babel intentionally resists frequent route
changes to avoid flapping; a node may continue using a suboptimal path
until a significantly better alternative is advertised. Both
properties can cause the selected route for a given destination to
differ across consecutive benchmark runs, even when the physical
topology has not changed.
\subsection{TCP flow control and congestion control}
\label{sec:tcp_windows}
TCP uses two window mechanisms to regulate how much unacknowledged data
a sender may have in flight. The \emph{receive window}
(\texttt{rwnd}), also called the \emph{send window} in
\texttt{iperf3} output, is advertised by the receiver and reflects how
much buffer space it has available. The \emph{congestion window}
(\texttt{cwnd}) is maintained locally by the sender and tracks the
network's estimated capacity. At any point, the sender may transmit
up to $\min(\texttt{rwnd}, \texttt{cwnd})$ bytes beyond the last
acknowledged byte \cite{rfc5681}.
The congestion window starts small (typically a few segments) and
grows during the \emph{slow-start} phase, doubling each round trip
until it reaches a threshold or triggers a loss event. After that,
\emph{congestion avoidance} takes over and the window grows linearly.
When the sender detects a loss (through duplicate ACKs or a
retransmission timeout), it treats the loss as a signal of congestion:
the window is reduced, often halved, and the sender enters a recovery
phase before resuming growth. Each retransmission therefore has a
direct mechanical cost: it shrinks the congestion window and reduces
the instantaneous sending rate.
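
The interaction of the two windows and the loss reaction can be
summarised in a toy model. The sketch below is illustrative only
(Reno-style behaviour, windows counted in segments); real stacks such
as CUBIC differ in detail, but the shape of the argument is the same.

\begin{lstlisting}[language=Rust]
// Toy model of TCP window regulation, for illustration only.
// Windows are counted in segments; real stacks track bytes.
struct Sender {
    cwnd: f64,     // congestion window, the network-limited bound
    rwnd: f64,     // receive window advertised by the peer
    ssthresh: f64, // slow-start threshold
}

impl Sender {
    /// The sender may keep min(rwnd, cwnd) segments unacknowledged.
    fn in_flight_limit(&self) -> f64 {
        self.cwnd.min(self.rwnd)
    }

    /// Growth after one loss-free round trip.
    fn on_rtt_without_loss(&mut self) {
        if self.cwnd < self.ssthresh {
            self.cwnd *= 2.0; // slow start: exponential growth
        } else {
            self.cwnd += 1.0; // congestion avoidance: linear growth
        }
    }

    /// Reaction to a loss signal (duplicate ACKs or a timeout).
    fn on_loss(&mut self) {
        self.ssthresh = (self.cwnd / 2.0).max(2.0);
        self.cwnd = self.ssthresh; // multiplicative decrease
    }
}

fn main() {
    let mut s = Sender { cwnd: 10.0, rwnd: 1000.0, ssthresh: 64.0 };
    for _ in 0..5 {
        s.on_rtt_without_loss();
    }
    // After growth the sender is limited by cwnd, not rwnd.
    println!("limit before loss: {} segments", s.in_flight_limit());
    s.on_loss(); // each retransmission roughly halves the sending rate
    println!("limit after loss:  {} segments", s.in_flight_limit());
}
\end{lstlisting}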
The \emph{bandwidth-delay product} (BDP) determines how large the
window must be to fully utilize a link. It is the product of the
link's bandwidth and the round-trip time:
\begin{equation}
\text{BDP} = \text{bandwidth} \times \text{RTT}
\label{eq:bdp}
\end{equation}
A 1\,Gbps link with a 1\,ms RTT has a BDP of 125\,KB: the sender
must keep at least 125\,KB of unacknowledged data in flight to
saturate the link. If the congestion window is smaller than the BDP,
the sender will finish transmitting its window and then wait idle for
acknowledgements, leaving bandwidth unused. High-latency paths make
this problem worse because the BDP grows linearly with RTT. A
34\,ms RTT on the same 1\,Gbps link raises the BDP to 4.25\,MB, well
beyond the default congestion window of most TCP stacks. One common
workaround is to run multiple TCP flows in parallel: each flow
maintains its own congestion window, and their aggregate in-flight
data can approach the BDP even when no single flow could.
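
The worked numbers above, and the parallel-flow workaround, follow
from a few lines of arithmetic. The sketch below reproduces them; the
512\,KB per-flow congestion window used in the last step is an assumed
figure, not a measured one.

\begin{lstlisting}[language=Rust]
// Back-of-the-envelope BDP calculations for the examples in the text.
fn bdp_bytes(bandwidth_bps: f64, rtt_seconds: f64) -> f64 {
    bandwidth_bps * rtt_seconds / 8.0 // bits in flight -> bytes
}

fn main() {
    let gbit = 1e9;

    // 1 Gbps, 1 ms RTT: 125 KB must stay unacknowledged in flight.
    println!("{:.0} KB", bdp_bytes(gbit, 0.001) / 1e3); // 125 KB

    // 1 Gbps, 34 ms RTT: 4.25 MB, beyond most default windows.
    println!("{:.2} MB", bdp_bytes(gbit, 0.034) / 1e6); // 4.25 MB

    // Parallel flows: if one flow sustains at most ~512 KB of cwnd
    // (assumed here), roughly ceil(BDP / cwnd) flows are needed to
    // keep the 34 ms path full.
    let per_flow_cwnd = 512e3;
    let flows = (bdp_bytes(gbit, 0.034) / per_flow_cwnd).ceil();
    println!("~{flows} parallel flows");
}
\end{lstlisting}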
In VPN benchmarks these two windows appear as distinct bottlenecks. A
small receive window means the receiver (or the tunnel endpoint in
front of it) cannot absorb data fast enough. A small congestion
window means the path between sender and receiver is experiencing
loss, forcing TCP into repeated recovery cycles. Comparing congestion
windows across VPNs with different maximum segment sizes requires
care, because the window is measured in bytes: a VPN with jumbo
segments will report a larger byte-valued window for the same number
of in-flight segments.
\subsection{An Overview of Packet Reordering in TCP}
TODO \cite{leung_overview_2007}
+260 -220
@@ -132,87 +132,77 @@ VpnCloud, while Hyprspace, Tinc, and Mycelium occupy the bottom tier
at under 40\,\% of baseline.
Figure~\ref{fig:tcp_throughput} visualizes this hierarchy.
\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/tcp/TCP
Throughput}.png}
\caption{Average single-stream TCP throughput}
\label{fig:tcp_throughput}
\end{figure}

Raw throughput alone is incomplete. The retransmit rate
(Figure~\ref{fig:tcp_retransmits}) normalizes raw retransmit counts
by estimated packet count, accounting for the different segment sizes
each VPN negotiates (1\,228 to 32\,731 bytes). WireGuard and
Headscale are effectively loss-free ($<$\,0.01\,\%). Tinc, EasyTier,
Nebula, and VpnCloud form a moderate band (0.03--0.06\,\%).
Yggdrasil, ZeroTier, and Mycelium cluster between 0.09\,\% and
0.13\,\%, and Hyprspace is the clear outlier at 0.49\,\%. ZeroTier
reaches 814\,Mbps despite a 0.10\,\% retransmit rate by compensating
for tunnel-internal loss through repeated TCP congestion-control
recovery; WireGuard delivers comparable throughput with effectively
zero loss.
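
The normalization itself is a one-liner; the sketch below spells it
out. The field names are illustrative and not the benchmark
harness's actual data structures.

\begin{lstlisting}[language=Rust]
// Raw retransmit counts are only comparable across VPNs after dividing
// by how many segments were actually sent, which depends on the
// segment size each tunnel negotiates.
struct TcpRun {
    bytes_sent: u64,
    retransmits: u64,
    segment_size: u64, // negotiated segment size over the tunnel, bytes
}

fn retransmit_rate_percent(run: &TcpRun) -> f64 {
    let estimated_segments = (run.bytes_sent / run.segment_size).max(1);
    100.0 * run.retransmits as f64 / estimated_segments as f64
}

fn main() {
    // The same raw count of 300 retransmits means very different loss
    // rates for a 32,731-byte jumbo segment and a 1,228-byte one.
    let jumbo = TcpRun { bytes_sent: 3_000_000_000, retransmits: 300, segment_size: 32_731 };
    let small = TcpRun { bytes_sent: 3_000_000_000, retransmits: 300, segment_size: 1_228 };
    println!("{:.3} % vs {:.3} %",
        retransmit_rate_percent(&jumbo),
        retransmit_rate_percent(&small));
}
\end{lstlisting}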
\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/tcp/TCP
Retransmit Rate}.png}
\caption{TCP retransmit rate at baseline. WireGuard and Headscale
are effectively loss-free ($<$\,0.01\,\%). Hyprspace is the clear
outlier at 0.49\,\%.}
\label{fig:tcp_retransmits}
\end{figure}
Retransmits have a direct mechanical relationship with TCP congestion
control: each one triggers a reduction in the congestion window
(\texttt{cwnd}) and throttles the sender.
Figure~\ref{fig:tcp_window} shows the raw window sizes, and
Figure~\ref{fig:retransmit_correlations} plots them against retransmit
rate. Hyprspace, with a 0.49\,\% retransmit rate, maintains the
smallest max congestion window in the dataset (200\,KB), while
Yggdrasil's 0.09\,\% rate allows a 4.2\,MB window, the largest of
any VPN. At first glance this suggests a clean inverse correlation
between retransmit rate and congestion window size, but the picture is
misleading. Yggdrasil's outsized window is largely an artifact of
its jumbo overlay MTU (32\,731 bytes): each segment carries far more
data, so the window in bytes is inflated relative to VPNs using a
standard ${\sim}$1\,400-byte MTU. Comparing congestion windows
across different MTU sizes is not meaningful without normalizing for
segment size. The reliable conclusion is simpler: high retransmit
rates force TCP to spend more time in congestion recovery than in
steady-state transmission, and that caps throughput regardless of
available bandwidth. ZeroTier illustrates the opposite extreme:
brute-force retransmission can still yield high throughput
(814\,Mbps at a 0.10\,\% rate), at the cost of wasted bandwidth and
unstable flow behavior.
\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/tcp/Max TCP
Window Size}.png}
\caption{Maximum TCP window sizes (send and congestion) at baseline.
Yggdrasil's congestion window (4\,219\,KB) dwarfs all others but
is inflated by its 32\,KB jumbo overlay MTU. Hyprspace has the
smallest congestion window (200\,KB).}
\label{fig:tcp_window}
\end{figure}
VpnCloud stands out: its sender reports 538.8\,Mbps but the
receiver measures only 413.4\,Mbps, a 23\,\% gap and the largest
in the dataset. This points to significant in-tunnel packet loss
or buffering at the VpnCloud layer that the retransmit rate
(0.06\,\%) alone does not fully explain.
% TODO: Clarify whether the headline TCP table
% (Table~\ref{tab:tcp_baseline}, 539\,Mbps for VpnCloud) reports
% sender or receiver throughput. The prose here cites sender
% 538.8 vs.\ receiver 413.4 --- the 539 figure matches the sender
% column, so the table caption should say so explicitly. Same
% clarification needed for Hyprspace (368 in table vs.\ sender
% 367.9 / receiver 419.8 in the pathological-cases paragraph).
Variability, whether stochastic across runs or systematic across
links, also differs substantially. WireGuard's three link
@@ -243,14 +233,14 @@ on every direction.
\caption{Retransmits vs.\ max congestion window}
\label{fig:retransmit_cwnd}
\end{subfigure}
\caption{Retransmit correlations (log scale on x-axis). A high
retransmit rate does not always mean low throughput (ZeroTier:
0.10\,\%, 814\,Mbps), but an extreme rate does (Hyprspace:
0.49\,\%, 368\,Mbps). The apparent inverse correlation between
retransmit rate and congestion window size is dominated by
Yggdrasil's outlier (4.2\,MB \texttt{cwnd}), which is inflated
by its 32\,KB jumbo overlay MTU rather than by a low retransmit
rate alone.}
\label{fig:retransmit_correlations}
\end{figure}
@@ -258,29 +248,35 @@ on every direction.
Sorting by latency rearranges the rankings considerably.
Table~\ref{tab:latency_baseline} lists the average ping round-trip
times, which cluster into three distinct ranges. The table also
reports the average maximum RTT observed across test runs and the
resulting spike ratio (max/avg); a high ratio signals bursty tail
latency that the average alone conceals.
\begin{table}[H]
\centering
\caption{Ping RTT statistics at baseline, sorted by average latency.
The spike ratio is max\,RTT\,/\,avg\,RTT; higher values indicate
bursty tail latency.}
\label{tab:latency_baseline}
\begin{tabular}{lrrrr}
\hline
\textbf{VPN} & \textbf{Avg RTT (ms)} & \textbf{Max RTT (ms)}
& \textbf{Spike Ratio} & \textbf{Jitter (ms)} \\
\hline
Internal & 0.60 & 0.65 & 1.1$\times$ & 0.04 \\
VpnCloud & 1.13 & 3.14 & 2.8$\times$ & 0.25 \\
Tinc & 1.19 & 1.31 & 1.1$\times$ & 0.07 \\
WireGuard & 1.20 & 1.81 & 1.5$\times$ & 0.13 \\
Nebula & 1.25 & 1.53 & 1.2$\times$ & 0.10 \\
ZeroTier & 1.28 & 3.00 & 2.3$\times$ & 0.25 \\
EasyTier & 1.33 & 1.55 & 1.2$\times$ & 0.10 \\
\hline
Headscale & 1.64 & 1.81 & 1.1$\times$ & 0.09 \\
Hyprspace & 1.79 & 2.21 & 1.2$\times$ & 0.13 \\
Yggdrasil & 2.20 & 3.13 & 1.4$\times$ & 0.20 \\
\hline
Mycelium & 34.9 & 48.6 & 1.4$\times$ & 1.49 \\
\hline
\end{tabular}
\end{table}
@@ -296,13 +292,16 @@ moderate overhead. Then there is Mycelium at 34.9\,ms, so far
removed from the rest that Section~\ref{sec:mycelium_routing} gives
it a dedicated analysis.

The spike-ratio column in Table~\ref{tab:latency_baseline} exposes two
outliers among the low-latency VPNs. VpnCloud leads at
2.8$\times$ (avg 1.13\,ms, max 3.14\,ms) and ZeroTier follows at
2.3$\times$ (avg 1.28\,ms, max 3.00\,ms); both share the highest
jitter in the table (0.25\,ms). Tinc and Headscale, by contrast,
stay at 1.1$\times$ with jitter of at most 0.09\,ms, so their packet
timing is nearly as stable as bare metal. The spikes in VpnCloud and
ZeroTier are consistent with periodic control-plane work such as key
rotation or peer heartbeats that briefly stalls the data path.
\begin{figure}[H]
\centering
@@ -315,43 +314,42 @@ interference that the average hides.
Tinc presents a paradox: it has the third-lowest latency (1.19\,ms)
but only the second-lowest throughput (336\,Mbps). Packets traverse
the tunnel quickly, yet something caps the overall rate.
Figure~\ref{fig:tcp_cpu} shows that Tinc uses only 12.3\,\% host CPU
during the TCP test. On a multi-core host this figure is consistent
with a single saturated core, which fits Tinc's single-threaded
userspace architecture: one core encrypts, copies, and forwards
packets, and the remaining cores sit idle.

\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/tcp/TCP CPU
Utilization}.png}
\caption{CPU utilization during TCP throughput tests, split by host
(sender) and remote (receiver). Tinc (12.3\,\%) and VpnCloud
(14.2\,\%) use similar CPU, yet VpnCloud achieves 60\,\% higher
throughput. Yggdrasil's low CPU (2.7\,\%) reflects its
kernel-level forwarding with jumbo segments.}
\label{fig:tcp_cpu}
\end{figure}
VpnCloud is also
single-threaded and uses slightly more CPU (14.2\,\%), yet reaches
539\,Mbps (60\,\% more throughput). The gap comes down to per-packet
cost. Tinc uses a hand-written ChaCha20-Poly1305 implementation
without hardware acceleration, allocates a fresh stack buffer and
copies the payload for each packet, and routes through a splay-tree
lookup. VpnCloud uses the \texttt{ring} cryptographic library, which
employs optimized assembly and can select AES-128-GCM with hardware
AES-NI instructions at runtime; it encrypts in place with no extra
buffer copies and routes through an $O(1)$ hash-map lookup. These
differences compound in a tight single-threaded loop: every
microsecond saved per packet raises the maximum packet rate the one
available core can sustain.
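
A back-of-the-envelope calculation makes the per-packet budget
explicit. The throughput figures come from the measurements above;
the ${\sim}$1\,400-byte packet size is an assumption used only to
illustrate the orders of magnitude.

\begin{lstlisting}[language=Rust]
// How much time one core may spend on each packet at a given
// throughput, assuming ~1,400-byte tunnel packets (an approximation).
fn per_packet_budget_us(throughput_mbps: f64, packet_bytes: f64) -> f64 {
    let packets_per_second = throughput_mbps * 1e6 / (packet_bytes * 8.0);
    1e6 / packets_per_second
}

fn main() {
    let packet = 1_400.0;
    // Tinc: 336 Mbps -> ~30,000 packets/s -> ~33 us of budget per packet.
    println!("Tinc:     {:.1} us/packet", per_packet_budget_us(336.0, packet));
    // VpnCloud: 539 Mbps -> ~48,000 packets/s -> ~21 us per packet.
    println!("VpnCloud: {:.1} us/packet", per_packet_budget_us(539.0, packet));
    // The ~12 us difference is the kind of gap that in-place AES-NI
    // encryption and an O(1) route lookup can plausibly close.
}
\end{lstlisting}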
Figure~\ref{fig:latency_throughput} makes this disconnect easy to
spot.
\begin{figure}[H] \begin{figure}[H]
\centering \centering
\includegraphics[width=\textwidth]{Figures/baseline/latency-vs-throughput.png} \includegraphics[width=\textwidth]{Figures/baseline/latency-vs-throughput.png}
@@ -365,10 +363,7 @@ this comparison because qperf failed for both.
\subsection{Parallel TCP Scaling}

% The single-stream benchmark tests one link direction at a time.
The parallel benchmark changes this setup: all three link directions
(lom$\rightarrow$yuki, yuki$\rightarrow$luna,
@@ -411,26 +406,25 @@ Table~\ref{tab:parallel_scaling} lists the results.
\end{table}

The VPNs that gain the most are those most constrained in
single-stream mode. Mycelium's 34.9\,ms RTT gives it a
bandwidth-delay product (Equation~\ref{eq:bdp}) of roughly
4.4\,MB on a 1\,Gbps link. No single TCP flow maintains a
congestion window that large, so the link is never fully utilized.
Multiple concurrent flows each contribute their own window, and
their aggregate in-flight data approaches the BDP, which pushes
throughput to 2.20$\times$ the single-stream figure.
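Plugging the measured RTT into Equation~\ref{eq:bdp} makes the
constraint concrete:
\[
  \text{BDP} = \frac{10^{9}\,\text{bit/s} \times 0.0349\,\text{s}}
  {8\,\text{bit/byte}} \approx 4.4\,\text{MB}
\]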

Hyprspace scales almost as well (2.18$\times$) for the same
structural reason, but the bottleneck is different. Its libp2p send
pipeline accumulates roughly 2\,800\,ms of under-load latency
(Section~\ref{sec:hyprspace_bloat}), which inflates the effective BDP
to hundreds of megabytes, far beyond any single kernel congestion
window. Because Hyprspace keys \texttt{activeStreams} by destination
\texttt{peer.ID} (Listing~\ref{lst:hyprspace_sendpacket}), the three
concurrent peer pairs in the parallel benchmark each get their own
libp2p stream, their own mutex, and their own yamux flow-control
window. Three independent windows in flight fill more of the bloated
pipeline than one can.
% TODO: This is still a hypothesis: it generalises the same
% bandwidth-delay-product argument used for Mycelium directly
% above, and is now grounded in the per-peer
@@ -445,23 +439,41 @@ Tinc picks up a
single-threaded CPU busy during what would otherwise be idle gaps in
a single flow.
% TODO: "zero retransmits" in parallel mode is not shown in any table
% or figure. Add parallel-mode retransmit data or remove the claim.
WireGuard and Internal both scale cleanly at around WireGuard and Internal both scale cleanly at around
1.48--1.50$\times$ with zero retransmits. This is consistent 1.48--1.50$\times$ with a 0.00\,\% retransmit rate in both modes.
with WireGuard's overhead being a fixed per-packet cost that does This is consistent with WireGuard's overhead being a fixed per-packet
not worsen under multiplexing. cost that does not worsen under multiplexing.

Nebula is the only VPN that actually gets \emph{slower} with more
streams: throughput drops from 706\,Mbps to 648\,Mbps
(0.92$\times$). The cause is lock contention in Nebula's firewall
connection tracker (Listing~\ref{lst:nebula_conntrack}). A single
\texttt{sync.Mutex} protects the global \texttt{Conns} map, and every
packet in both directions must acquire it. The lock holder also
purges the timer wheel before releasing the lock, so other goroutines
stall while that housekeeping runs. Nebula mitigates this with a
per-routine cache that bypasses the global lock for known flows, but
the cache is invalidated every second, at which point all goroutines
contend on the mutex again. With parallel streams, the increased
goroutine count turns this periodic contention into a throughput
bottleneck.

\lstinputlisting[language=Go,caption={Nebula's firewall conntrack: a
global mutex protects the connection map and is acquired on every
packet.
\textit{nebula/firewall.go:79--84,
486--558}},label={lst:nebula_conntrack}]{Listings/nebula_conntrack.go}

Retransmit rates under parallel load shift in two directions.
VpnCloud's rate climbs from 0.06\,\% to 0.14\,\% (2.5$\times$) and
Yggdrasil's from 0.09\,\% to 0.23\,\% (2.7$\times$), so
multiplexing genuinely increases loss for these VPNs. Hyprspace's
rate, by contrast, drops slightly from 0.49\,\% to 0.39\,\% even
though it sends far more data in parallel; the per-packet loss
probability does not worsen, but the absolute count still triples
because three pairs are transmitting simultaneously. VPNs that were
clean in single-stream mode (WireGuard, Internal) stay clean under
parallel load.
\begin{figure}[H]
\centering
@@ -938,81 +950,109 @@ no flow-control signal coupling the two.
\textit{hyprspace/node/node.go:36--39, 282,
328--348}},label={lst:hyprspace_sendpacket}]{Listings/hyprspace_sendpacket.go}
\paragraph{Mycelium: routing anomaly.}
\label{sec:mycelium_routing}
Mycelium's 34.9\,ms average latency looks like a
straightforward cost of routing through a global
overlay. The per-path numbers do not fit this
explanation:
\begin{itemize}
\bitem{luna$\rightarrow$lom:} 1.63\,ms (comparable
to Headscale at 1.64\,ms)
\bitem{lom$\rightarrow$yuki:} 51.47\,ms
\bitem{yuki$\rightarrow$luna:} 51.60\,ms
\end{itemize}
One link found a direct LAN path; the other two
bounce through the overlay. All three machines sit on
the same physical network, so the split is not a matter
of topology.

The throughput results invert the latency ranking.
The link with the low ping latency,
luna$\rightarrow$lom at 1.63\,ms, should be the fastest
according to TCP congestion theory. It is the slowest:
122\,Mbps, with the reverse direction dropping to
58.4\,Mbps in bidirectional mode. Meanwhile
yuki$\rightarrow$luna, whose ICMP~RTT was 30$\times$
higher, reaches 379\,Mbps
(Figure~\ref{fig:mycelium_paths}). The throughput
ranking is the exact inverse of what the ping data
predicts.

The explanation is in the iperf3 logs. Each TCP stream
reports a kernel-measured RTT that is independent of
ICMP ping. For the luna$\rightarrow$lom stream, this
TCP~RTT starts at 51.6\,ms and climbs to a mean of
144\,ms over the 30-second run, with
757~retransmits---the link was clearly overlay-routed
during the throughput test, even though ping had found a
direct path eight minutes earlier. For
yuki$\rightarrow$luna the reverse happened: the TCP
stream measured only 12--22\,ms, and its bidirectional
return path recorded 1.0\,ms, a direct LAN connection
that the earlier ICMP test had not seen. The routes
changed between the two tests.

Mycelium uses the Babel routing protocol
(Section~\ref{sec:babel}) to discover and select paths.
Two properties of its implementation explain why routes
shifted mid-benchmark. First, Mycelium advertises
routes at a five-minute interval
(Listing~\ref{lst:mycelium_constants}):

\lstinputlisting[language=Rust,caption={Mycelium's
Babel timing constants. Routes are re-advertised
every 300\,s; the router will not learn about a new
path until the next cycle.
\textit{mycelium/src/router.rs:33--59}},label={lst:mycelium_constants}]{Listings/mycelium_route_constants.rs}

A direct path that appears between update cycles is
invisible to the router until the next advertisement
arrives. The benchmark's ping and throughput tests ran
sequentially with several minutes between them, so each
test observed whichever route happened to be selected at
that point in Babel's five-minute cycle.

Second, even when a better route \emph{is} advertised,
the router resists switching to it.
Listing~\ref{lst:mycelium_best_route} shows the
\texttt{find\_best\_route} function: a candidate route
is rejected unless its metric improves on the current
route by more than 10, or unless it is directly
connected (metric~0). This hysteresis prevents
flapping but also means that an overlay path, once
established, can persist for the remainder of the
update interval even after a shorter path becomes
available.

\lstinputlisting[language=Rust,caption={Route
selection with hysteresis. Lines~16--25 reject a
candidate route unless it is directly connected or
improves the composite metric by more than
\texttt{SIGNIFICANT\_METRIC\_IMPROVEMENT}\,(10).
\textit{mycelium/src/router.rs:1213--1238}},label={lst:mycelium_best_route}]{Listings/mycelium_find_best_route.rs}

The five-minute update interval and the switching
hysteresis together explain the throughput asymmetry.
The TCP-measured RTTs are consistent with the observed
throughput on every link; only the ICMP~RTTs, measured
minutes earlier under a different routing state, give
the impression of an inversion.
\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{{Figures/baseline/tcp/Mycelium/Average
Throughput}.png}
\caption{Per-link TCP throughput for Mycelium. The
luna$\rightarrow$lom link appears slow despite its
low ping latency because Babel had switched to an
overlay route by the time the throughput test ran.
The TCP-level RTTs reported by iperf3, not the
earlier ICMP measurements, explain the 3:1 ratio.}
\label{fig:mycelium_paths}
\end{figure}
Binary files not shown: five new figure PNGs added (29-53 KiB each).
+29
@@ -0,0 +1,29 @@
fn find_best_route<'a>(&self, routes: &'a RouteList)
fn find_best_route<'a>(&self, routes: &'a RouteList)
-> Option<&'a RouteEntry>
{
let source_table = self.source_table.read().unwrap();
let current = routes.selected();
let best = routes
.iter()
.filter(|re| !re.metric().is_infinite()
&& source_table.route_feasible(re))
.min_by_key(|re|
re.metric() + Metric::from(re.neighbour().link_cost()));
if let (Some(best), Some(current)) = (best, current) {
// Only switch if the metric is significantly better
// OR if the route is directly connected (metric 0).
if (best.source() != current.source()
|| best.neighbour() != current.neighbour())
&& !(best.metric()
+ Metric::from(best.neighbour().link_cost())
< current.metric()
+ Metric::from(current.neighbour().link_cost())
- SIGNIFICANT_METRIC_IMPROVEMENT
|| best.metric().is_direct())
{
return Some(current); // keep existing route
}
}
best
}
+9
@@ -0,0 +1,9 @@
/// Time between HELLO messages, in seconds
/// Time between HELLO messages, in seconds
const HELLO_INTERVAL: u64 = 20;
/// Max time used in UPDATE packets.
const UPDATE_INTERVAL: Duration =
Duration::from_secs(HELLO_INTERVAL * 3 * 5); // 300 s
/// The amount a metric of a route needs to improve
/// before we will consider switching to it.
const SIGNIFICANT_METRIC_IMPROVEMENT: Metric = Metric::new(10);
+39
@@ -0,0 +1,39 @@
type FirewallConntrack struct {
type FirewallConntrack struct {
sync.Mutex
Conns map[firewall.Packet]*conn
TimerWheel *TimerWheel[firewall.Packet]
}
func (f *Firewall) inConns(
fp firewall.Packet, h *HostInfo,
caPool *cert.CAPool,
localCache firewall.ConntrackCache,
) bool {
if localCache != nil {
if _, ok := localCache[fp]; ok {
return true
}
}
conntrack := f.Conntrack
conntrack.Lock()
// Purge every time we test
ep, has := conntrack.TimerWheel.Purge()
if has {
f.evict(ep)
}
c, ok := conntrack.Conns[fp]
if !ok {
conntrack.Unlock()
return false
}
// ... update expiry ...
conntrack.Unlock()
if localCache != nil {
localCache[fp] = struct{}{}
}
return true
}
+10
@@ -98,6 +98,16 @@
morestring=[b]", morestring=[b]",
sensitive=true, sensitive=true,
} }
\lstdefinelanguage{Rust}{
morekeywords={as,break,const,continue,crate,else,enum,extern,false,fn,for,
if,impl,in,let,loop,match,mod,move,mut,pub,ref,return,self,Self,static,
struct,super,trait,true,type,unsafe,use,where,while,async,await,dyn,
Some,None,Option,Result,Ok,Err,Duration},
morecomment=[l]{//},
morecomment=[s]{/*}{*/},
morestring=[b]",
sensitive=true,
}
\lstdefinelanguage{Go}{
morekeywords={break,case,chan,const,continue,default,defer,else,fallthrough,
for,func,go,goto,if,import,interface,map,package,range,return,select,
+22
@@ -617,3 +617,25 @@
PDF:/home/lhebendanz/Zotero/storage/KM9D625Y/Whitner et al. - 2008
- Improved Packet Reordering Metrics.pdf:application/pdf},
}
@misc{rfc5681,
title = {TCP Congestion Control},
author = {Allman, Mark and Paxson, Vern and Blanton, Ethan},
year = {2009},
month = sep,
howpublished = {RFC 5681},
doi = {10.17487/RFC5681},
url = {https://www.rfc-editor.org/rfc/rfc5681},
note = {Obsoletes RFC 2581},
}
@misc{chroboczek_babel_2021,
title = {The {Babel} Routing Protocol},
author = {Chroboczek, Juliusz and Schinazi, David},
year = {2021},
month = jan,
howpublished = {RFC 8966},
doi = {10.17487/RFC8966},
url = {https://www.rfc-editor.org/rfc/rfc8966},
note = {Obsoletes RFC 6126},
}