Improved writing: made it less verbose

2026-03-18 21:50:15 +01:00
parent 6b32967f32
commit 64aeeb5772
4 changed files with 177 additions and 195 deletions


@@ -7,11 +7,9 @@
This chapter describes the methodology used to benchmark and analyze
peer-to-peer mesh VPN implementations. The evaluation combines
performance benchmarking under controlled network conditions with a
structured source code analysis of each implementation. All
dependencies, system configurations, and test procedures are pinned
or declared so that the experiments can be independently reproduced.
\section{Experimental Setup}
@@ -29,19 +27,19 @@ identical specifications:
RDRAND, SSE4.2
\end{itemize}
Results may differ on systems without hardware cryptographic
acceleration, since most of the tested VPNs offload encryption to
AES-NI.
\subsection{Network Topology}
The three machines are connected via a direct 1 Gbps LAN on the same
network segment. Each machine has a publicly reachable IPv4 address,
which is used to deploy configuration changes via Clan. On this
baseline topology, latency is sub-millisecond and there is no packet
loss, so measured overhead can be attributed to the VPN itself.
Figure~\ref{fig:mesh_topology} illustrates the full-mesh connectivity
between the three machines.
\begin{figure}[H]
\centering
@@ -74,8 +72,8 @@ double the per-machine values.
\subsection{Configuration Methodology}
Each VPN is built from source within the Nix flake, with all
dependencies pinned to exact versions. VPNs not packaged in nixpkgs
(Hyprspace, EasyTier, VpnCloud) have dedicated build expressions
under \texttt{pkgs/} in the flake.
@@ -85,13 +83,14 @@ system.
Generated keys are stored in version control under
\texttt{vars/per-machine/\{name\}/} and read at NixOS evaluation time,
so key material is part of the reproducible configuration.
\section{Benchmark Suite}
The benchmark suite includes synthetic throughput tests and
application-level workloads. Prior comparative work relied exclusively
on iperf3; the additional benchmarks here capture behavior that
iperf3 alone misses.
Table~\ref{tab:benchmark_suite} summarizes each benchmark.
\begin{table}[H]
@@ -114,8 +113,8 @@ Table~\ref{tab:benchmark_suite} summarises each benchmark.
\end{tabular}
\end{table}
The first four benchmarks use standard network testing tools;
the remaining three test application-level workloads.
The subsections below describe configuration details that the table
does not capture.
@@ -133,48 +132,49 @@ counters.
\subsection{Parallel iPerf3}
Runs one bidirectional TCP stream on all three machine pairs
simultaneously in a circular pattern (A$\rightarrow$B,
B$\rightarrow$C, C$\rightarrow$A) for 60 seconds with zero-copy
(\texttt{-Z}). The three concurrent bidirectional links produce six
unidirectional flows in total. This contention stresses shared
resources that single-stream tests leave idle.
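The circular pairing can be sketched as follows; the machine names and helper functions are illustrative placeholders, not the actual \texttt{vpn\_bench} orchestrator code.

```python
def circular_pairs(machines):
    """Pair each machine with its successor, wrapping around: A->B, B->C, C->A."""
    return [(machines[i], machines[(i + 1) % len(machines)])
            for i in range(len(machines))]

def unidirectional_flows(machines):
    """Each pair carries one bidirectional stream, i.e. two unidirectional flows."""
    return 2 * len(circular_pairs(machines))
```

With three machines this yields three pairs and six unidirectional flows, matching the description above.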
\subsection{QPerf}
Spawns one qperf process per CPU core, each running for 30 seconds.
Per-core bandwidth is summed per second. In addition to throughput,
QPerf reports time to first byte and connection establishment time,
which iPerf3 does not measure.
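The per-core summation amounts to the following sketch; the data layout (one per-second sample list per qperf process) is an assumption for illustration, not taken from the orchestrator.

```python
def total_bandwidth_per_second(per_core_series):
    """Sum per-core qperf bandwidth samples into one aggregate per-second series.

    per_core_series: one list of per-second bandwidth samples per qperf
    process (one process per CPU core); all lists have equal length.
    """
    return [sum(second) for second in zip(*per_core_series)]
```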
\subsection{RIST Video Streaming}
Generates a 4K ($3840\times2160$) H.264 test pattern at 30\,fps
(ultrafast preset, zerolatency tuning, 25\,Mbps bitrate cap) with
ffmpeg and transmits it over the RIST protocol for 30 seconds. Because
the synthetic test pattern is highly compressible, the actual encoding
bitrate is approximately 3.3\,Mbps, well below the configured cap. RIST
(Reliable Internet Stream Transport) is a protocol for low-latency
video contribution over unreliable networks. The benchmark records
encoding-side statistics (actual bitrate, frame rate, dropped frames)
and RIST-specific counters (packets recovered via retransmission,
quality score).
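An ffmpeg invocation along these lines would produce such a stream; the receiver URL and the exact flags are assumptions for illustration, not the benchmark's verbatim command.

```python
def rist_sender_cmd(receiver_url, seconds=30):
    """Approximate ffmpeg command for the described workload: a 4K 30 fps
    test pattern, x264 ultrafast/zerolatency, 25 Mbps cap, sent over RIST.
    The receiver URL and flag set are illustrative assumptions."""
    return ["ffmpeg",
            "-f", "lavfi", "-i", "testsrc2=size=3840x2160:rate=30",  # synthetic 4K pattern
            "-t", str(seconds),
            "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
            "-b:v", "25M", "-maxrate", "25M",                        # 25 Mbps cap
            "-f", "mpegts", receiver_url]
```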
\subsection{Nix Cache Download}
A Harmonia Nix binary cache server on the target machine serves the
Firefox package. The client downloads it via \texttt{nix copy}
through the VPN. Unlike the iPerf3 tests, this workload issues many
short-lived HTTP requests instead of a single bulk transfer.
Benchmarked with hyperfine (1 warmup run, 2 timed runs); the local
Nix store and SQLite metadata are cleared between runs.
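The invocation looks roughly like the following sketch; the cache URL, installable, and the state-clearing script are placeholders, not the benchmark's actual values.

```python
def nix_cache_benchmark_cmd(cache_url, installable):
    """hyperfine invocation matching the description: 1 warmup run, 2 timed
    runs, local Nix state cleared between runs via --prepare. The cache URL,
    installable, and clear script are illustrative placeholders."""
    return ["hyperfine", "--warmup", "1", "--runs", "2",
            "--prepare", "./clear-nix-state.sh",  # placeholder reset script
            f"nix copy --from {cache_url} {installable}"]
```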
\section{Network Impairment Profiles}
Four impairment profiles simulate progressively worse network
conditions, from an unmodified baseline to a severely degraded link.
All impairments are injected with Linux traffic control
(\texttt{tc netem}) on the egress side of every machine's primary
interface.
Table~\ref{tab:impairment_profiles} lists the per-machine values.
Because impairments are applied on both ends of a connection, the
effective round-trip impact is roughly double the listed values.
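The injection step can be sketched as command construction for \texttt{tc netem}; the parameter values shown in the test are illustrative, not the profile values from Table~\ref{tab:impairment_profiles}.

```python
def netem_cmd(interface, delay_ms=0, loss_pct=0.0, rate_mbit=0):
    """Build the egress `tc netem` command for one machine's primary interface.
    Parameter values are illustrative, not the thesis's profile values."""
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    if delay_ms:
        cmd += ["delay", f"{delay_ms}ms"]    # added one-way latency
    if loss_pct:
        cmd += ["loss", f"{loss_pct}%"]      # random packet loss
    if rate_mbit:
        cmd += ["rate", f"{rate_mbit}mbit"]  # bandwidth cap
    return cmd
```

Because each machine applies its own egress qdisc, delay and loss act on both directions of a connection, which is why the round-trip impact is roughly double the per-machine values.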
@@ -222,14 +222,14 @@ aspect of the simulated degradation:
\end{itemize}
A 30-second stabilization period follows TC application before
measurements begin so that queuing disciplines can settle.
\section{Experimental Procedure}
\subsection{Automation}
A Python orchestrator (\texttt{vpn\_bench/}) automates the full
benchmark suite. For each VPN under test, it:
\begin{enumerate}
\item Cleans all state directories from previous VPN runs
@@ -327,11 +327,12 @@ Each metric is summarized as a statistics dictionary containing:
Aggregation differs by benchmark type. Benchmarks that execute
multiple discrete runs, ping (3 runs of 100 packets each) and
nix-cache (2 timed runs via hyperfine), first compute statistics
within each run, then aggregate across runs: averages and percentiles
are averaged, while the reported minimum and maximum are the global
extremes across all runs. Concretely, if ping produces three runs
with mean RTTs of 5.1, 5.3, and 5.0\,ms, the reported average is
the mean of those three values (5.13\,ms). The reported minimum is
the single lowest RTT observed across all three runs.
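This aggregation rule can be sketched as follows (a simplified illustration, not the orchestrator's code):

```python
from statistics import mean

def aggregate_discrete_runs(runs):
    """Per-run statistics first, then cross-run aggregation: averages are
    averaged, while min/max are the global extremes across all runs."""
    return {
        "avg": mean(mean(run) for run in runs),        # mean of per-run means
        "min": min(min(run) for run in runs),          # global minimum
        "max": max(max(run) for run in runs),          # global maximum
    }
```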
Benchmarks that produce continuous per-second samples, qperf and
RIST streaming for example, pool all per-second measurements from a single
@@ -340,9 +341,9 @@ bandwidth is first summed across CPU cores for each second, and
statistics are then computed over the resulting time series.
The analysis reports empirical percentiles (p25, p50, p75) alongside
min/max bounds rather than parametric confidence intervals.
Benchmark latency and throughput distributions are often skewed or
multimodal, so parametric assumptions of normality would be
unreliable. The interquartile range (p25--p75) conveys the spread of
typical observations, while min and max capture outlier behavior.
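A minimal sketch of such a five-number summary, using Python's standard library rather than the analysis code itself:

```python
from statistics import quantiles

def five_number_summary(samples):
    """Empirical summary reported in place of parametric confidence intervals."""
    p25, p50, p75 = quantiles(samples, n=4, method="inclusive")
    return {"min": min(samples), "p25": p25, "p50": p50,
            "p75": p75, "max": max(samples)}
```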
The nix-cache benchmark additionally reports standard deviation via
@@ -350,9 +351,8 @@ hyperfine's built-in statistical output.
\section{Source Code Analysis}
We also conducted a structured source code analysis of all ten VPN
implementations. The analysis followed three phases.
\subsection{Repository Collection and LLM-Assisted Overview}
@@ -378,23 +378,23 @@ aspects:
\end{itemize}
Each agent was required to reference the specific file and line
range supporting every claim so that outputs could be verified
against the source.
\subsection{Manual Verification}
The LLM-generated overviews served as a navigational aid rather than
a trusted source. The most important code paths identified in each
overview were manually read and verified against the actual source
code. Where the automated summaries were inaccurate or superficial,
they were corrected and expanded.
\subsection{Feature Matrix and Maintainer Review}
The findings from both phases were consolidated into a feature matrix
of 131 features across all ten VPN implementations, covering protocol
characteristics, cryptographic primitives, NAT traversal strategies,
routing behavior, and security properties.
The completed feature matrix was published and sent to the respective
VPN maintainers for review. We incorporated their feedback as
@@ -402,7 +402,7 @@ corrections and clarifications to the final classification.
\section{Reproducibility}
The experimental stack pins or declares the variables that could
affect results.
\subsection{Dependency Pinning}
@@ -412,8 +412,8 @@ cryptographic hashes (\texttt{narHash}) and commit SHAs for each input.
Key pinned inputs include:
\begin{itemize}
\bitem{nixpkgs:} Follows \texttt{clan-core/nixpkgs}, so a single
version is used across the dependency graph
\bitem{clan-core:} The Clan framework, pinned to a specific commit
\bitem{VPN sources:} Hyprspace, EasyTier, Nebula locked to
exact commits
@@ -527,9 +527,8 @@ VPNs were selected based on:
\bitem{Linux support:} All VPNs must run on Linux.
\end{itemize}
Table~\ref{tab:vpn_selection} lists the ten VPN implementations
selected for evaluation.
\begin{table}[H]
\centering
@@ -556,7 +555,7 @@ mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.
\end{tabular}
\end{table}
WireGuard is not a mesh VPN but is included as a reference point.
Comparing its overhead to the mesh VPNs isolates the cost of mesh
coordination and NAT traversal.