Improved Writing made it less verbose

2026-03-18 21:50:15 +01:00
parent 6b32967f32
commit 64aeeb5772
4 changed files with 177 additions and 195 deletions
@@ -2,38 +2,33 @@

 \label{Introduction}

-Peer-to-peer overlay VPNs promise to restore genuine decentralization
-by enabling direct connectivity between nodes regardless of NAT or
-firewall restrictions. Yet practitioners choosing among the growing
-number of mesh VPN implementations must rely largely on anecdotal
-evidence: systematic, reproducible comparisons under realistic
-conditions are scarce.
+Peer-to-peer overlay VPNs allow nodes to connect directly regardless
+of NAT or firewall restrictions. Yet practitioners choosing among the
+growing number of mesh VPN implementations must rely largely on
+anecdotal evidence: systematic, reproducible comparisons under
+realistic conditions are scarce.

 This thesis addresses that gap. We benchmark ten peer-to-peer VPN
 implementations across seven workloads and four network impairment
-profiles, yielding over 300 unique measurements. We complement these
+profiles, producing over 300 unique measurements. We complement these
 performance benchmarks with a source code analysis of each
-implementation, verified through direct engagement with the respective
-maintainers. The entire experimental framework is built on Nix, NixOS,
-and the Clan deployment system, making every result independently
-reproducible.
+implementation, verified by the respective maintainers. The entire
+experimental framework is built on Nix, NixOS, and the Clan deployment
+system, so every result is independently reproducible.

 \section{Motivation}

-Peer-to-peer architectures promise censorship-resistant, fault-tolerant
-infrastructure by eliminating single points of failure
-\cite{shukla_towards_2021}.
-These architectures underpin a growing range of systems, from IoT
-edge computing and content delivery networks to blockchain platforms
-like Ethereum.
-Yet realizing these benefits requires distributing nodes across
-genuinely diverse hosting entities.
+Peer-to-peer architectures can provide censorship-resistant,
+fault-tolerant infrastructure because they have no single point of
+failure \cite{shukla_towards_2021}. IoT edge computing, content
+delivery networks, and blockchain platforms like Ethereum all rely on
+some form of peer-to-peer topology. But these benefits only hold when
+nodes are spread across diverse hosting entities.

 In practice, this diversity remains illusory.
 Amazon, Hetzner, and OVH collectively host 70\% of all Ethereum nodes
-(see Figure~\ref{fig:ethernodes_hosting}),
-concentrating nominally decentralized infrastructure
-within a handful of cloud providers.
+(see Figure~\ref{fig:ethernodes_hosting}), so nominally decentralized
+infrastructure actually sits in a handful of cloud providers.
 More concerning, these providers operate under overlapping regulatory
 jurisdictions,
 predominantly the United States and the European Union.
@@ -49,50 +44,40 @@ data disclosure, or traffic manipulation across a majority of the network.
  \label{fig:ethernodes_hosting}
 \end{figure}

-Why does this centralization persist despite the explicit goals of
-decentralization?
-The answer lies in the practical barriers to self-hosting.
-Cloud providers offer static IP addresses and publicly routable endpoints,
-eliminating the networking complexity that plagues residential and
-small-office deployments.
+This centralization persists because self-hosting is hard. Cloud
+providers offer static IP addresses and publicly routable endpoints,
+which avoids the networking problems that residential and small-office
+deployments face.
 Most internet-connected devices sit behind Network Address Translation (NAT),
 which prevents incoming connections without explicit port forwarding
 or relay infrastructure.
-Combined with dynamic IP assignments from ISPs, maintaining stable
-peer connectivity
-from self-hosted infrastructure traditionally required significant
-technical expertise.
+Combined with dynamic IP assignments from ISPs, stable peer
+connectivity from self-hosted infrastructure has traditionally
+required significant technical expertise.

-Overlay VPNs offer a solution to this fundamental barrier.
-By establishing encrypted tunnels that traverse NAT boundaries,
-mesh VPNs enable direct peer-to-peer connectivity without requiring
-static IP addresses or manual firewall configuration.
-Each node receives a stable virtual address within the overlay network,
-regardless of its underlying network topology.
-In practice, this means a device behind consumer-grade NAT can
-participate as a first-class peer in a distributed system,
-removing the primary technical advantage that cloud providers hold.
+Overlay VPNs solve this problem. They establish encrypted tunnels
+that traverse NAT boundaries, so peers can connect directly without
+static IP addresses or manual firewall configuration. Each node
+receives a stable virtual address within the overlay network,
+regardless of its physical network topology. A device behind
+consumer-grade NAT can therefore participate as a first-class peer
+in a distributed system.

-The Clan deployment framework builds on this foundation.
-Clan uses Nix and NixOS to eliminate configuration drift and
-dependency conflicts, reducing operational overhead enough for a
-single administrator to reliably self-host complex distributed
-services.
-Overlay VPNs are central to Clan's architecture,
-providing the secure peer connectivity that enables nodes
-to form cohesive networks regardless of their physical location or
-NAT situation.
-As illustrated in Figure~\ref{fig:vision-stages}, Clan envisions
-a web interface that enables users to design and deploy private P2P networks
-with minimal configuration, assisted by an integrated LLM
-for contextual guidance and troubleshooting.
+The Clan deployment framework uses Nix and NixOS to eliminate
+configuration drift and dependency conflicts. The result is that a
+single administrator can reliably self-host distributed services.
+Overlay VPNs are central to Clan's architecture: they supply the
+peer connectivity that lets nodes form a network regardless of
+physical location or NAT situation.
+As illustrated in Figure~\ref{fig:vision-stages}, Clan plans to offer
+a web interface that lets users design and deploy private P2P networks
+with minimal configuration, assisted by an integrated LLM.

-During the development of Clan, a recurring challenge became apparent:
-practitioners held divergent preferences for mesh VPN solutions,
-each citing different edge cases where their chosen VPN
-proved unreliable or lacked essential features.
-These discussions were grounded in anecdotal evidence rather than
-systematic evaluation, motivating the present work.
+During Clan's development, a recurring problem surfaced:
+practitioners disagreed on which mesh VPN to use, each pointing to
+different edge cases where their preferred VPN failed or lacked a
+needed feature. These discussions relied on anecdotal evidence rather
+than systematic evaluation, which motivated the present work.

 \subsection{Related Work}

@@ -108,49 +93,45 @@ for distributed systems, analyzing throughput, reliability under packet
 loss, and relay behavior for VPNs including ZeroTier. However, it
 focuses primarily on solutions with a central point of failure and
 limits its workloads to synthetic iperf3 tests. This thesis extends
-that foundation by evaluating a broader set of VPN implementations
-with emphasis on fully decentralized architectures, exercising them
-under real-world workloads such as video streaming and package
-downloads, applying multiple network impairment profiles, and
-providing a fully reproducible experimental framework built on
-Nix, NixOS, and Clan.
+that work: it evaluates a broader set of VPN implementations with
+emphasis on fully decentralized architectures, tests them under
+application-level workloads (video streaming, package downloads),
+applies multiple network impairment profiles, and provides a
+reproducible experimental framework built on Nix, NixOS, and Clan.

-Beyond filling this research gap, a further goal was to create a fully
-automated benchmarking framework capable of generating a public
-leaderboard, similar in spirit to the js-framework-benchmark
-(see Figure~\ref{fig:js-framework-benchmark}). By providing an
-accessible web interface with regularly updated
-results, the framework gives VPN developers a concrete, public
-baseline to measure against.
+A secondary goal was to create an automated benchmarking framework
+that generates a public leaderboard, similar in spirit to the
+js-framework-benchmark (see Figure~\ref{fig:js-framework-benchmark}).
+A web interface with regularly updated results gives VPN developers a
+concrete baseline to measure against.

 \section{Research Contribution}

 This thesis makes the following contributions:

 \begin{enumerate}
-  \item A comprehensive benchmark of ten peer-to-peer VPN
-    implementations across seven workloads (including real-world
-    video streaming and package downloads) and four network
-    impairment profiles, producing over 300 unique measurements.
-  \item A source code analysis of all ten VPN implementations,
-    combining manual code review with LLM-assisted analysis,
-    followed by verification through direct engagement with the
-    respective maintainers on GitHub.
-  \item A fully reproducible experimental framework built on
-    Nix, NixOS, and the Clan deployment system, with pinned
-    dependencies, declarative system configuration, and
-    deterministic cryptographic material generation, enabling
-    independent replication of all results.
-  \item A performance analysis demonstrating that Tailscale
-    outperforms the Linux kernel's default networking stack under
-    degraded conditions, and that kernel parameter tuning (Reno
-      congestion control in place of CUBIC, with RACK
-    disabled) yields measurable throughput improvements.
+  \item A benchmark of ten peer-to-peer VPN implementations across
+    seven workloads (including video streaming and package downloads)
+    and four network impairment profiles, with over 300 unique
+    measurements.
+  \item A source code analysis of all ten VPN implementations. Manual
+    code review was combined with LLM-assisted analysis and the results
+    were verified by the respective maintainers on GitHub.
+  \item A reproducible experimental framework built on Nix, NixOS,
+    and the Clan deployment system. All dependencies are pinned,
+    system configuration is declarative, and cryptographic material
+    is generated deterministically, so every result can be
+    independently replicated.
+  \item A performance analysis showing that Tailscale outperforms the
+    Linux kernel's default networking stack under degraded conditions,
+    and that kernel parameter tuning (Reno congestion control in place
+    of CUBIC, with RACK disabled) yields measurable throughput
+    improvements.
  \item The discovery of several security vulnerabilities across
    the evaluated VPN implementations.
-  \item An automated benchmarking framework designed for public
-    leaderboard generation, intended to encourage ongoing
-    optimization by VPN developers.
+  \item An automated benchmarking framework that produces a public
+    leaderboard, giving VPN developers a target to optimize
+    against.
 \end{enumerate}

 \begin{figure}[H]
@@ -7,11 +7,9 @@
 This chapter describes the methodology used to benchmark and analyze
 peer-to-peer mesh VPN implementations. The evaluation combines
 performance benchmarking under controlled network conditions with a
-structured source code analysis of each implementation. The
-benchmarking framework prioritizes reproducibility at every layer,
-from pinned dependencies and declarative system configuration to
-automated test orchestration, enabling independent verification of
-results and facilitating future comparative studies.
+structured source code analysis of each implementation. All
+dependencies, system configurations, and test procedures are pinned
+or declared so that the experiments can be independently reproduced.

 \section{Experimental Setup}

@@ -29,19 +27,19 @@ identical specifications:
    RDRAND, SSE4.2
 \end{itemize}

-The presence of hardware cryptographic acceleration is relevant because
-many VPN implementations use AES-NI for encryption, and the results
-may differ on systems without these features.
+Results may differ on systems without hardware cryptographic
+acceleration, since many of the tested VPNs use AES-NI for
+encryption.

 \subsection{Network Topology}

 The three machines are connected via a direct 1 Gbps LAN on the same
 network segment. Each machine has a publicly reachable IPv4 address,
-which is used to deploy configuration changes via Clan. This baseline
-topology provides a controlled environment with minimal latency and no
-packet loss, allowing the overhead introduced by each VPN implementation
-to be measured in isolation. Figure~\ref{fig:mesh_topology} illustrates
-the full-mesh connectivity between the three machines.
+which is used to deploy configuration changes via Clan. On this
+baseline topology, latency is sub-millisecond and there is no packet
+loss, so measured overhead can be attributed to the VPN itself.
+Figure~\ref{fig:mesh_topology} illustrates the full-mesh connectivity
+between the three machines.

 \begin{figure}[H]
  \centering
@@ -74,8 +72,8 @@ double the per-machine values.

 \subsection{Configuration Methodology}

-Each VPN is built from source within the Nix flake, ensuring that all
-dependencies are pinned to exact versions. VPNs not packaged in nixpkgs
+Each VPN is built from source within the Nix flake, with all
+dependencies pinned to exact versions. VPNs not packaged in nixpkgs
 (Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
 under \texttt{pkgs/} in the flake.

@@ -85,13 +83,14 @@ system.

 Generated keys are stored in version control under
 \texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
-making key material part of the reproducible configuration.
+so key material is part of the reproducible configuration.

 \section{Benchmark Suite}

-The benchmark suite includes both synthetic throughput tests and
-real-world workloads. This combination addresses a limitation of prior
-work that relied exclusively on iperf3.
+The benchmark suite includes synthetic throughput tests and
+application-level workloads. Prior comparative work relied exclusively
+on iperf3; the additional benchmarks here capture behavior that
+iperf3 alone misses.
 Table~\ref{tab:benchmark_suite} summarises each benchmark.

 \begin{table}[H]
@@ -114,8 +113,8 @@ Table~\ref{tab:benchmark_suite} summarises each benchmark.
  \end{tabular}
 \end{table}

-The first four benchmarks use well-known network testing tools;
-the remaining three target workloads closer to real-world usage.
+The first four benchmarks use standard network testing tools;
+the remaining three test application-level workloads.
 The subsections below describe configuration details that the table
 does not capture.

@@ -133,48 +132,49 @@ counters.

 \subsection{Parallel iPerf3}

-Runs TCP streams on all three machines simultaneously in a circular
-pattern (A$\rightarrow$B, B$\rightarrow$C, C$\rightarrow$A) for
-60 seconds with zero-copy (\texttt{-Z}). This creates contention
-across the overlay network, stressing shared resources that
-single-stream tests leave idle.
+Runs one bidirectional TCP stream on all three machine pairs
+simultaneously in a circular pattern (A$\rightarrow$B,
+B$\rightarrow$C, C$\rightarrow$A) for 60 seconds with zero-copy
+(\texttt{-Z}). The three concurrent bidirectional links produce six
+unidirectional flows in total. This contention stresses shared
+resources that single-stream tests leave idle.
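
The circular pattern can be sketched as follows. This is a minimal illustration, not the orchestrator's actual code: the helper names are hypothetical, the machine names are borrowed from the results chapter, and using iperf3's `--bidir` flag to request the bidirectional stream is an assumption. Only the 60-second duration and `-Z` come from the description above.

```python
def ring_pairs(machines):
    """Pair each machine with its successor, wrapping around at the end."""
    count = len(machines)
    return [(machines[i], machines[(i + 1) % count]) for i in range(count)]


def iperf3_client_argv(server, seconds=60):
    """Build (but do not run) an iperf3 client command line.

    -t sets the duration and -Z enables zero-copy, as stated in the text.
    --bidir is an assumption about how the bidirectional stream is requested.
    """
    return ["iperf3", "-c", server, "-t", str(seconds), "-Z", "--bidir"]


machines = ["lom", "yuki", "luna"]
pairs = ring_pairs(machines)

# One client command per machine, each targeting its ring successor.
commands = {client: iperf3_client_argv(server) for client, server in pairs}

# Each bidirectional link carries two unidirectional flows.
unidirectional_flows = 2 * len(pairs)
```

With three machines this yields three client commands and six unidirectional flows, matching the count given above.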

 \subsection{QPerf}

 Spawns one qperf process per CPU core, each running for 30 seconds.
-Per-core bandwidth is summed per second. Unlike the iPerf3 tests,
-QPerf targets QUIC connection-level performance, capturing time to
-first byte and connection establishment time alongside throughput.
+Per-core bandwidth is summed per second. In addition to throughput,
+QPerf reports time to first byte and connection establishment time,
+which iPerf3 does not measure.

 \subsection{RIST Video Streaming}

 Generates a 4K ($3840\times2160$) H.264 test pattern at 30\,fps
-(ultrafast preset, zerolatency tuning, 25\,Mbps target bitrate) with
-ffmpeg and transmits it over the RIST protocol for 30 seconds. RIST
-(Reliable Internet Stream Transport) is designed for low-latency
-video contribution over unreliable networks, making it a realistic
-test of VPN behavior under multimedia workloads. In addition to
-standard network metrics, the benchmark records encoding-side
-statistics (actual bitrate, frame rate, dropped frames) and
-RIST-specific counters (packets recovered via retransmission, quality
-score).
+(ultrafast preset, zerolatency tuning, 25\,Mbps bitrate cap) with
+ffmpeg and transmits it over the RIST protocol for 30 seconds. Because
+the synthetic test pattern is highly compressible, the actual encoding
+bitrate is approximately 3.3\,Mbps, well below the configured cap. RIST
+(Reliable Internet Stream Transport) is a protocol for low-latency
+video contribution over unreliable networks. The benchmark records
+encoding-side statistics (actual bitrate, frame rate, dropped frames)
+and RIST-specific counters (packets recovered via retransmission,
+quality score).

 \subsection{Nix Cache Download}

 A Harmonia Nix binary cache server on the target machine serves the
 Firefox package. The client downloads it via \texttt{nix copy}
-through the VPN, exercising many small HTTP requests rather than a
-single bulk transfer. Benchmarked with hyperfine (1 warmup run,
-2 timed runs); the local Nix store and SQLite metadata are cleared
-between runs.
+through the VPN. Unlike the iPerf3 tests, this workload issues many
+short-lived HTTP requests instead of a single bulk transfer.
+Benchmarked with hyperfine (1 warmup run, 2 timed runs); the local
+Nix store and SQLite metadata are cleared between runs.

 \section{Network Impairment Profiles}

-To evaluate VPN performance under different network conditions, four
-impairment profiles are defined, ranging from an unmodified baseline
-to a severely degraded link. All impairments are injected with Linux
-traffic control (\texttt{tc netem}) on the egress side of every
-machine's primary interface.
+Four impairment profiles simulate progressively worse network
+conditions, from an unmodified baseline to a severely degraded link.
+All impairments are injected with Linux traffic control
+(\texttt{tc netem}) on the egress side of every machine's primary
+interface.
 Table~\ref{tab:impairment_profiles} lists the per-machine values.
 Because impairments are applied on both ends of a connection, the
 effective round-trip impact is roughly double the listed values.
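
As a rough sketch of how one profile could be turned into a `tc netem` invocation: the function names, the interface name, and all numeric values below are hypothetical placeholders, not the values from Table~\ref{tab:impairment_profiles}. The command is only built, not executed.

```python
def netem_argv(interface, delay_ms=0, jitter_ms=0, loss_pct=0, reorder_pct=0):
    """Build a `tc qdisc replace ... netem` command for one egress interface."""
    if reorder_pct > 0 and delay_ms <= 0:
        # netem can only reorder packets when a delay is configured.
        raise ValueError("reorder requires a non-zero delay")
    argv = ["tc", "qdisc", "replace", "dev", interface, "root", "netem"]
    if delay_ms > 0:
        argv += ["delay", f"{delay_ms}ms", f"{jitter_ms}ms"]
    if loss_pct > 0:
        argv += ["loss", f"{loss_pct}%"]
    if reorder_pct > 0:
        argv += ["reorder", f"{reorder_pct}%"]
    return argv


def round_trip_delay_ms(per_machine_delay_ms):
    """Egress delay is applied on both ends, so a round trip pays it twice."""
    return 2 * per_machine_delay_ms


# Hypothetical profile values, for illustration only.
example = netem_argv("eth0", delay_ms=10, jitter_ms=2, loss_pct=1)
```

Here `round_trip_delay_ms(10)` gives 20, which is the "roughly double" effect described above for a per-machine delay of 10\,ms.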
@@ -222,14 +222,14 @@ aspect of the simulated degradation:
 \end{itemize}

 A 30-second stabilization period follows TC application before
-measurements begin, allowing queuing disciplines to settle.
+measurements begin so that queuing disciplines can settle.

 \section{Experimental Procedure}

 \subsection{Automation}

-The benchmark suite is fully automated via a Python orchestrator
-(\texttt{vpn\_bench/}). For each VPN under test, the orchestrator:
+A Python orchestrator (\texttt{vpn\_bench/}) automates the full
+benchmark suite. For each VPN under test, it:

 \begin{enumerate}
  \item Cleans all state directories from previous VPN runs
@@ -327,11 +327,12 @@ Each metric is summarized as a statistics dictionary containing:
 Aggregation differs by benchmark type. Benchmarks that execute
 multiple discrete runs, ping (3 runs of 100 packets each) and
 nix-cache (2 timed runs via hyperfine), first compute statistics
-within each run, then average the resulting statistics across runs.
-Concretely, if ping produces three runs with mean RTTs of
-5.1, 5.3, and 5.0\,ms, the reported average is the mean of
-those three values (5.13\,ms). The reported minimum is the
-single lowest RTT observed across all three runs.
+within each run, then aggregate across runs: averages and percentiles
+are averaged, while the reported minimum and maximum are the global
+extremes across all runs. Concretely, if ping produces three runs
+with mean RTTs of 5.1, 5.3, and 5.0\,ms, the reported average is
+the mean of those three values (5.13\,ms). The reported minimum is
+the single lowest RTT observed across all three runs.
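
The cross-run aggregation can be sketched as below. The dictionary keys and the function name are assumptions made for illustration. Only the three mean RTTs (5.1, 5.3, 5.0\,ms) come from the text, and the remaining numbers are invented to complete the example.

```python
from statistics import mean

# Keys whose per-run values are averaged across runs (assumed names).
AVERAGED_KEYS = ("average", "p25", "p50", "p75")


def aggregate_runs(runs):
    """Combine per-run statistics dictionaries into one summary.

    Averages and percentiles are averaged across runs, while min and max
    are the global extremes over all runs.
    """
    summary = {key: mean(run[key] for run in runs) for key in AVERAGED_KEYS}
    summary["min"] = min(run["min"] for run in runs)
    summary["max"] = max(run["max"] for run in runs)
    return summary


ping_runs = [
    {"average": 5.1, "p25": 4.9, "p50": 5.0, "p75": 5.2, "min": 4.7, "max": 6.0},
    {"average": 5.3, "p25": 5.0, "p50": 5.2, "p75": 5.5, "min": 4.8, "max": 6.4},
    {"average": 5.0, "p25": 4.8, "p50": 5.0, "p75": 5.1, "min": 4.6, "max": 5.9},
]
summary = aggregate_runs(ping_runs)
```

For these inputs the averaged mean RTT rounds to 5.13\,ms, while the reported minimum (4.6) and maximum (6.4) are taken from whichever run contained them.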

 Benchmarks that produce continuous per-second samples, qperf and
 RIST streaming for example, pool all per-second measurements from a single
@@ -340,9 +341,9 @@ bandwidth is first summed across CPU cores for each second, and
 statistics are then computed over the resulting time series.
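
A minimal sketch of this pooling step for qperf, together with the empirical summary statistics the analysis reports. The function names and sample values are hypothetical, and the percentile interpolation method (`inclusive`) is an assumption, since the text does not say which one is used.

```python
from statistics import quantiles


def sum_per_second(per_core_samples):
    """Sum bandwidth across cores for each second.

    per_core_samples holds one list of per-second values per core.
    zip() stops at the shortest list, so trailing seconds that are
    missing on any core are dropped.
    """
    return [sum(second) for second in zip(*per_core_samples)]


def summarize(series):
    """Empirical quartiles plus min/max bounds over a time series."""
    p25, p50, p75 = quantiles(series, n=4, method="inclusive")
    return {"min": min(series), "p25": p25, "p50": p50,
            "p75": p75, "max": max(series)}


# Two hypothetical cores, five seconds each (arbitrary bandwidth units).
per_core = [
    [100, 110, 120, 130, 140],
    [200, 210, 220, 230, 240],
]
series = sum_per_second(per_core)
stats = summarize(series)
```

Summing before computing statistics matters here: percentiles of the summed series describe aggregate bandwidth per second, which is not the same as summing per-core percentiles.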

 The analysis reports empirical percentiles (p25, p50, p75) alongside
-min/max bounds rather than parametric confidence intervals. This
-choice is deliberate: benchmark latency and throughput distributions
-are often skewed or multimodal, making assumptions of normality
+min/max bounds rather than parametric confidence intervals.
+Benchmark latency and throughput distributions are often skewed or
+multimodal, so parametric assumptions of normality would be
 unreliable. The interquartile range (p25--p75) conveys the spread of
 typical observations, while min and max capture outlier behavior.
 The nix-cache benchmark additionally reports standard deviation via
@@ -350,9 +351,8 @@ hyperfine's built-in statistical output.

 \section{Source Code Analysis}

-To complement the performance benchmarks with architectural
-understanding, we conducted a structured source code analysis of
-all ten VPN implementations. The analysis followed three phases.
+We also conducted a structured source code analysis of all ten VPN
+implementations. The analysis followed three phases.

 \subsection{Repository Collection and LLM-Assisted Overview}

@@ -378,23 +378,23 @@ aspects:
 \end{itemize}

 Each agent was required to reference the specific file and line
-range supporting every claim, enabling direct verification.
+range supporting every claim so that outputs could be verified
+against the source.

 \subsection{Manual Verification}

 The LLM-generated overviews served as a navigational aid rather than
 a trusted source. The most important code paths identified in each
 overview were manually read and verified against the actual source
-code, correcting inaccuracies and deepening the analysis where the
-automated summaries remained superficial.
+code. Where the automated summaries were inaccurate or superficial,
+they were corrected and expanded.

 \subsection{Feature Matrix and Maintainer Review}

-The findings from both the automated and manual analysis were
-consolidated into a feature matrix cataloguing 131 features across
-all ten VPN implementations. The matrix covers
-protocol characteristics, cryptographic primitives, NAT traversal
-strategies, routing behavior, and security properties.
+The findings from both phases were consolidated into a feature matrix
+of 131 features across all ten VPN implementations, covering protocol
+characteristics, cryptographic primitives, NAT traversal strategies,
+routing behavior, and security properties.

 The completed feature matrix was published and sent to the respective
 VPN maintainers for review. We incorporated their feedback as
@@ -402,7 +402,7 @@ corrections and clarifications to the final classification.

 \section{Reproducibility}

-The experimental stack pins or declares every variable that could
+The experimental stack pins or declares the variables that could
 affect results.

 \subsection{Dependency Pinning}
@@ -412,8 +412,8 @@ cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
 Key pinned inputs include:

 \begin{itemize}
-    \bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, ensuring a
-    single version across the dependency graph
+    \bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, so a single
+    version is used across the dependency graph
    \bitem{clan-core:} The Clan framework, pinned to a specific commit
    \bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
    exact commits
@@ -527,9 +527,8 @@ VPNs were selected based on:
    \bitem{Linux support:} All VPNs must run on Linux.
 \end{itemize}

-Ten VPN implementations were selected for evaluation, spanning a range
-of architectures from centralized coordination to fully decentralized
-mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
+Table~\ref{tab:vpn_selection} lists the ten VPN implementations
+selected for evaluation.

 \begin{table}[H]
  \centering
@@ -556,7 +555,7 @@ mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
  \end{tabular}
 \end{table}

-WireGuard is included as a reference point despite not being a mesh VPN.
-Its minimal overhead and widespread adoption make it a useful comparison
-for understanding the cost of mesh coordination and NAT traversal logic.
+WireGuard is not a mesh VPN but is included as a reference point.
+Comparing its overhead to the mesh VPNs isolates the cost of mesh
+coordination and NAT traversal.

@@ -10,8 +10,9 @@ follows the impairment profiles from ideal to degraded:
 Section~\ref{sec:baseline} establishes overhead under ideal
 conditions, then subsequent sections examine how each VPN responds to
 increasing network impairment. The chapter concludes with findings
-from the source code analysis. A recurring theme throughout is that
-no single metric captures VPN performance; the rankings shift
+from the source code analysis. A recurring theme is that no single
+metric captures VPN
+performance; the rankings shift
 depending on whether one measures throughput, latency, retransmit
 behavior, or real-world application performance.

@@ -184,7 +185,7 @@ opposite extreme: brute-force retransmission can still yield high
 throughput (814\,Mbps with 1\,163 retransmits), at the cost of wasted
 bandwidth and unstable flow behavior.

-VpnCloud warrants specific attention: its sender reports 538.8\,Mbps
+VpnCloud stands out: its sender reports 538.8\,Mbps
 but the receiver measures only 413.4\,Mbps, leaving a 23\,\% gap (the largest
 in the dataset). This suggests significant in-tunnel packet loss or
 buffering at the VpnCloud layer that the retransmit count (857)
@@ -256,10 +257,10 @@ times, which cluster into three distinct ranges.
 \end{table}

 Six VPNs stay below 1.3\,ms, comfortably close to the bare-metal
-0.60\,ms.  VpnCloud is a notable result: it posts the lowest latency
-of any VPN (1.13\,ms), edging out WireGuard (1.20\,ms), yet its
-throughput tops out at only 539\,Mbps.  Low per-packet latency does
-not guarantee high bulk throughput.  A second group (Headscale,
+0.60\,ms.  VpnCloud posts the lowest latency of any VPN (1.13\,ms), below
+WireGuard (1.20\,ms), yet its throughput tops out at only 539\,Mbps.
+Low per-packet latency does not guarantee high bulk throughput.  A
+second group (Headscale,
 Hyprspace, Yggdrasil) lands in the 1.5--2.2\,ms range, representing
 moderate overhead.  Then there is Mycelium at 34.9\,ms, so far
 removed from the rest that Section~\ref{sec:mycelium_routing} gives
@@ -289,8 +290,8 @@ the CPU, not the network, is the bottleneck.
 Figure~\ref{fig:latency_throughput} makes this disconnect easy to
 spot.

-Looking at CPU efficiency more broadly, the qperf measurements
-reveal a wide spread.  Hyprspace (55.1\,\%) and Yggdrasil
+The qperf measurements also reveal a wide spread in CPU usage.
+Hyprspace (55.1\,\%) and Yggdrasil
 (52.8\,\%) consume 5--6$\times$ as much CPU as Internal's
 9.7\,\%.  WireGuard sits at 30.8\,\%, surprisingly high for a
 kernel-level implementation, though much of that goes to
@@ -318,20 +319,21 @@ The single-stream benchmark tests one link direction at a time.  The
 parallel benchmark changes this setup: all three link directions
 (lom$\rightarrow$yuki, yuki$\rightarrow$luna,
 luna$\rightarrow$lom) run simultaneously in a circular pattern for
-60~seconds, each carrying ten TCP streams.  Because three independent
+60~seconds, each carrying one bidirectional TCP stream (six
+unidirectional flows in total).  Because three independent
 link pairs now compete for shared tunnel resources at once, the
 aggregate throughput is naturally higher than any single direction
 alone, which is why even Internal reaches 1.50$\times$ its
 single-stream figure.  The scaling factor (parallel throughput
-divided by single-stream throughput) therefore captures two effects:
-the benefit of utilizing multiple link pairs in parallel, and how
+divided by single-stream throughput) captures two effects:
+the benefit of using multiple link pairs in parallel, and how
 well the VPN handles the resulting contention.
 Table~\ref{tab:parallel_scaling} lists the results.
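
Written out, with $T_{\mathrm{parallel}}$ and $T_{\mathrm{single}}$ as shorthand introduced here (not notation used in the tables) for the aggregate parallel and the single-stream throughput of one VPN, the scaling factor is
\[
  s = \frac{T_{\mathrm{parallel}}}{T_{\mathrm{single}}}.
\]
Read against Internal's $s = 1.50$, a value near $1.50$ means the VPN scales like the bare link, a larger value points to a VPN that was constrained in single-stream mode, and $s < 1$ means aggregate throughput fell under contention.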

 \begin{table}[H]
  \centering
  \caption{Parallel TCP scaling at baseline. Scaling factor is the
-    ratio of ten-stream to single-stream throughput. Internal's
+    ratio of parallel to single-stream throughput. Internal's
  1.50$\times$ represents the expected scaling on this hardware.}
  \label{tab:parallel_scaling}
  \begin{tabular}{lrrr}
@@ -357,7 +359,7 @@ Table~\ref{tab:parallel_scaling} lists the results.
 The VPNs that gain the most are those most constrained in
 single-stream mode.  Mycelium's 34.9\,ms RTT means a lone TCP stream
 can never fill the pipe: the bandwidth-delay product demands a window
-larger than any single flow maintains, so ten streams collectively
+larger than any single flow maintains, so multiple concurrent flows
 compensate for that constraint and push throughput to 2.20$\times$
 the single-stream figure.  Hyprspace scales almost as well
 (2.18$\times$) but for a
@@ -379,8 +381,8 @@ streams: throughput drops from 706\,Mbps to 648\,Mbps
 streams are clearly fighting each other for resources inside the
 tunnel.

-More streams also amplify existing retransmit problems across the
-board.  Hyprspace climbs from 4\,965 to 17\,426~retransmits;
+More streams also amplify existing retransmit problems.  Hyprspace
+climbs from 4\,965 to 17\,426~retransmits;
 VpnCloud from 857 to 6\,023.  VPNs that were clean in single-stream
 mode stay clean under load, while the stressed ones only get worse.

@@ -702,8 +704,8 @@ propagate.
 \label{sec:pathological}

 Three VPNs exhibit behaviors that the aggregate numbers alone cannot
-explain.  The following subsections synthesize observations from the
-preceding benchmarks into per-VPN diagnoses.
+explain.  The following subsections piece together observations from
+earlier benchmarks into per-VPN diagnoses.

 \paragraph{Hyprspace: Buffer Bloat.}
 \label{sec:hyprspace_bloat}
@@ -8,21 +8,21 @@
  \addchaptertocentry{Zusammenfassung}

  Diese Arbeit evaluiert zehn Peer-to-Peer-Mesh-VPN-Implementierungen
-  unter kontrollierten Netzwerkbedingungen mithilfe eines
+  unter kontrollierten Netzwerkbedingungen mithilfe eines
  reproduzierbaren, Nix-basierten Benchmark-Frameworks, das auf einem
  Deployment-System namens Clan aufbaut. Die Implementierungen reichen
-  von Kernel-Protokollen (WireGuard, als Referenz-Baseline) bis zu
+  von Kernel-Protokollen (WireGuard, als Referenz-Baseline) bis zu
  Userspace-Overlays (Tinc, Yggdrasil, Nebula, Hyprspace und
-  weitere). Jede wird unter vier Beeinträchtigungsprofilen mit
+  weitere). Jede wird unter vier Beeinträchtigungsprofilen mit
  variierendem Paketverlust, Paketumsortierung, Latenz und Jitter
  getestet, was über 300 Messungen in sieben Benchmarks ergibt, von
  reinem TCP- und UDP-Durchsatz bis zu Video-Streaming und
  Anwendungs-Downloads.

-  Ein zentrales Ergebnis ist, dass keine einzelne Metrik die
+  Ein zentrales Ergebnis ist, dass keine einzelne Metrik die
  VPN-Leistung vollständig erfasst: Die Rangfolge verschiebt sich je
  nachdem, ob Durchsatz, Latenz, Retransmit-Verhalten oder
-  Transferzeit auf Anwendungsebene gemessen wird. Unter
+  Transferzeit auf Anwendungsebene gemessen wird. Unter
  Netzwerkbeeinträchtigung übertrifft Tailscale (über Headscale) den
  Standard-Netzwerkstack des Linux-Kernels, eine Anomalie, die wir
  auf die optimierten Congestion-Control- und Pufferparameter seines