and reordering. These impairments are applied symmetrically on all
machines, meaning effective round-trip impairment is approximately
double the per-machine values.

\subsection{Configuration Methodology}

Each VPN is built from source within the Nix flake, ensuring that all

making key material part of the reproducible configuration.

The benchmark suite includes both synthetic throughput tests and
real-world workloads. This combination addresses a limitation of prior
work that relied exclusively on iperf3.
Table~\ref{tab:benchmark_suite} summarises each benchmark.

\begin{table}[H]
\centering
\caption{Benchmark suite overview}
\label{tab:benchmark_suite}
\begin{tabular}{llll}
\hline
\textbf{Benchmark} & \textbf{Protocol} & \textbf{Duration} &
\textbf{Key Metrics} \\
\hline
Ping & ICMP & 3 runs $\times$ 100 pkts & RTT, packet loss \\
TCP iPerf3 & TCP & 30 s & Throughput, retransmits, CPU \\
UDP iPerf3 & UDP & 30 s & Throughput, jitter, packet loss \\
Parallel iPerf3 & TCP & 60 s & Throughput under contention \\
QPerf & QUIC & 30 s & Bandwidth, TTFB, conn. time \\
RIST Streaming & RIST & 30 s & Bitrate, dropped frames, RTT \\
Nix Cache Download & HTTP & 2 runs & Download duration \\
\hline
\end{tabular}
\end{table}

The first four benchmarks use well-known network testing tools.
The remaining three target workloads that are closer to real-world
usage. The subsections below describe the configuration details
that the table does not capture.

\subsection{Ping}

Measures ICMP round-trip latency and packet delivery reliability.

\begin{itemize}
\bitem{Method:} 100 ICMP echo requests at 200\,ms intervals,
1-second per-packet timeout, repeated for 3 runs.
\bitem{Metrics:} RTT (min, avg, max, mdev), packet loss percentage,
per-packet RTTs.
\end{itemize}
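The summary line that ping prints can be reproduced from the per-packet
RTTs; a short sketch (the mdev formula matches the iputils
implementation, i.e. a population standard deviation; the sample values
are illustrative, not measured data):

```python
import math

def ping_summary(rtts_ms):
    """Summarise per-packet RTTs the way ping's closing line does.

    mdev follows ping's definition: sqrt(E[rtt^2] - E[rtt]^2),
    i.e. the population standard deviation of the samples.
    """
    n = len(rtts_ms)
    avg = sum(rtts_ms) / n
    mdev = math.sqrt(sum(r * r for r in rtts_ms) / n - avg * avg)
    return {"min": min(rtts_ms), "avg": avg, "max": max(rtts_ms), "mdev": mdev}

# Illustrative values only, not measured data.
print(ping_summary([10.0, 12.0, 11.0, 13.0]))
```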

\subsection{TCP iPerf3}

Measures bulk TCP throughput with iperf3, a tool commonly used in
networking research to measure network performance.

\begin{itemize}
\bitem{Method:} 30-second bidirectional test in zero-copy mode
(\texttt{-Z}) to minimize CPU overhead.
\bitem{Metrics:} Throughput (bits/s), retransmits, congestion
window and CPU utilization.
\end{itemize}
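The headline metrics are read from iperf3's machine-readable output; a
minimal parsing sketch (field names follow iperf3's \texttt{--json}
schema; the sample numbers are illustrative, not measured results):

```python
def parse_iperf3_tcp(result):
    """Extract the headline TCP metrics from an iperf3 --json result.

    Field names follow iperf3's JSON output (end.sum_sent,
    end.sum_received, end.cpu_utilization_percent); adjust if your
    iperf3 version differs.
    """
    end = result["end"]
    return {
        "throughput_bps": end["sum_received"]["bits_per_second"],
        "retransmits": end["sum_sent"]["retransmits"],
        "cpu_host_pct": end["cpu_utilization_percent"]["host_total"],
    }

# Minimal stand-in for a real iperf3 result (illustrative numbers).
sample = {"end": {
    "sum_received": {"bits_per_second": 9.4e8},
    "sum_sent": {"retransmits": 12},
    "cpu_utilization_percent": {"host_total": 35.2},
}}
print(parse_iperf3_tcp(sample))
```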

\subsection{UDP iPerf3}

Measures bulk UDP throughput with the same flags as the TCP iPerf3
benchmark.

\begin{itemize}
\bitem{Method:} Same as the TCP test, plus unlimited target bandwidth
(\texttt{-b 0}) and 64-bit counters.
\bitem{Metrics:} Throughput (bits/s), jitter, packet loss and CPU
utilization.
\end{itemize}
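iperf3 reports UDP jitter using the smoothed interarrival estimator
from RFC 3550; a sketch of that computation over per-packet transit
times:

```python
def rfc3550_jitter(transit_times_ms):
    """Smoothed interarrival jitter (RFC 3550), the estimator iperf3
    reports for UDP: for each consecutive packet pair, J += (|D| - J)/16,
    where D is the difference in one-way transit time."""
    j = 0.0
    for prev, cur in zip(transit_times_ms, transit_times_ms[1:]):
        d = abs(cur - prev)
        j += (d - j) / 16.0
    return j

# Constant transit time -> zero jitter; a single 16 ms step -> 1 ms.
print(rfc3550_jitter([5.0] * 10), rfc3550_jitter([0.0, 16.0]))
```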

\subsection{Parallel iPerf3}

Tests concurrent overlay traffic by running TCP streams on all three
machines simultaneously in a circular pattern (A$\rightarrow$B,
B$\rightarrow$C, C$\rightarrow$A) for 60 seconds. This creates
contention across the overlay network, stressing shared resources
that single-stream tests leave idle.

\begin{itemize}
\bitem{Method:} 60-second bidirectional test in zero-copy mode
(\texttt{-Z}) to minimize CPU overhead.
\bitem{Metrics:} Throughput (bits/s), retransmits, congestion
window and CPU utilization.
\end{itemize}
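The circular send pattern generalizes to any number of machines; a
sketch of the pairing logic (illustrative, not the benchmark's actual
orchestration code):

```python
def circular_pairs(machines):
    """Pair each machine with its successor, wrapping around, so every
    node sends exactly one stream and receives exactly one stream
    (A->B, B->C, C->A)."""
    return [(m, machines[(i + 1) % len(machines)])
            for i, m in enumerate(machines)]

print(circular_pairs(["A", "B", "C"]))  # [('A', 'B'), ('B', 'C'), ('C', 'A')]
```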

\subsection{QPerf}

Measures connection-level QUIC performance rather than bulk UDP or
TCP throughput.

\begin{itemize}
\bitem{Method:} One qperf process per CPU core in parallel, each
running for 30 seconds. Bandwidth from all cores is summed per second.
\bitem{Metrics:} Total bandwidth (Mbps), CPU usage, time to first
byte (TTFB), connection establishment time.
\end{itemize}
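Aggregating the per-core qperf results amounts to a per-second sum
across processes; a sketch (assuming one list of per-second Mbps
samples per core, with cores that finish early padded with zero):

```python
from itertools import zip_longest

def total_bandwidth(per_core_series):
    """Sum per-second bandwidth samples across all qperf processes.

    per_core_series: one list of per-second Mbps samples per core;
    shorter series are padded with 0.0 (core finished early)."""
    return [sum(samples) for samples in
            zip_longest(*per_core_series, fillvalue=0.0)]

# Two cores, illustrative numbers: second core ran one second longer.
print(total_bandwidth([[100.0, 110.0], [90.0, 95.0, 80.0]]))
```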

\subsection{RIST Video Streaming}

Measures real-time multimedia streaming performance. RIST (Reliable
Internet Stream Transport) is a protocol designed for low-latency
video contribution over unreliable networks, making it a realistic
test of VPN behavior under multimedia workloads.

\begin{itemize}
\bitem{Method:} The sender generates a 4K ($3840\times2160$) test
pattern at 30 fps using ffmpeg with H.264 encoding (ultrafast preset,
zerolatency tuning) at a 25 Mbps target bitrate. The stream is
transmitted over the RIST protocol to a receiver on the target machine
for 30 seconds.
\bitem{Encoding metrics:} Actual bitrate, frame rate, dropped frames.
\bitem{Network metrics:} Packets dropped, packets recovered via
RIST retransmission, RTT, quality score (0--100), received bitrate.
\end{itemize}
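A hypothetical sketch of the sender invocation described above, built
from standard ffmpeg options (the exact flags and the RIST URL used in
the benchmark may differ):

```python
def rist_sender_cmd(target_url, seconds=30):
    """Sketch of the ffmpeg sender command: 4K 30 fps test pattern,
    H.264 ultrafast/zerolatency at 25 Mbps, muxed to MPEG-TS over RIST.
    target_url is a placeholder; the benchmark's actual invocation may
    differ in detail."""
    return [
        "ffmpeg",
        "-f", "lavfi", "-i", "testsrc2=size=3840x2160:rate=30",
        "-t", str(seconds),
        "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
        "-b:v", "25M",
        "-f", "mpegts", target_url,
    ]

print(" ".join(rist_sender_cmd("rist://receiver.example:1968")))
```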

\subsection{Nix Cache Download}

Measures sustained HTTP download performance of many small files
using a real-world workload.

\begin{itemize}
\bitem{Method:} A Harmonia Nix binary cache server on the target
machine serves the Firefox package. The client downloads it via
\texttt{nix copy} through the VPN, exercising many small HTTP requests
rather than a single bulk transfer. Benchmarked with hyperfine:
1 warmup run followed by 2 timed runs. The local cache and Nix's
SQLite metadata are cleared between runs.
\bitem{Metrics:} Mean duration (seconds), standard deviation,
min/max duration.
\end{itemize}
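A hypothetical sketch of the hyperfine invocation described above (the
store path, cache URL, and cleanup command are placeholders, not the
benchmark's actual values):

```python
def nix_cache_benchmark_cmd(store_path, cache_url):
    """Sketch of the hyperfine run: 1 warmup + 2 timed runs of
    `nix copy`, clearing local state via --prepare between runs.
    store_path and cache_url are illustrative placeholders."""
    return [
        "hyperfine",
        "--warmup", "1",
        "--runs", "2",
        "--prepare", "nix store delete {} || true".format(store_path),
        "nix copy --from {} {}".format(cache_url, store_path),
    ]

print(nix_cache_benchmark_cmd("/nix/store/example-firefox", "http://cache:5000"))
```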

\section{Network Impairment Profiles}

Four impairment profiles simulate a range of network conditions, from
an unmodified baseline to a severely degraded link. All impairments
are injected with Linux traffic control (\texttt{tc netem}) on the
egress side of every machine's primary interface.
Table~\ref{tab:impairment_profiles} lists the per-machine values.
Because impairments are applied on both ends of a connection, the
effective round-trip impact is roughly double the listed values.

\begin{table}[H]
\centering
\caption{Network impairment profiles (per-machine values)}
\label{tab:impairment_profiles}
\begin{tabular}{llllll}
\hline
\textbf{Profile} & \textbf{Latency} & \textbf{Jitter} & \textbf{Loss} &
\textbf{Reorder} & \textbf{Correlation} \\
\hline
\end{tabular}
\end{table}

Each column in Table~\ref{tab:impairment_profiles} controls one
aspect of the simulated degradation:

\begin{itemize}
\item \textbf{Latency} is a constant delay added to every outgoing
packet. For example, 2\,ms on each machine adds roughly 4\,ms to
the round trip.
\item \textbf{Jitter} introduces random variation on top of the
fixed latency. A packet on the Low profile may see anywhere
between 0 and 4\,ms of total added delay instead of exactly
2\,ms.
\item \textbf{Loss} is the fraction of packets that are silently
dropped. At 0.25\,\% (Low profile), roughly 1 in 400 packets is
discarded.
\item \textbf{Reorder} is the fraction of packets that arrive out
of sequence. \texttt{tc netem} achieves this by giving selected
packets a shorter delay than their predecessors, so they overtake
earlier packets.
\item \textbf{Correlation} determines whether impairment events are
independent or bursty. At 0\,\%, each packet's fate is decided
independently. At higher values, a packet that was lost or
reordered raises the probability that the next packet suffers the
same fate, producing the burst patterns typical of real networks.
\end{itemize}
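Because every machine impairs its egress independently, a round trip
crosses the impairment twice; a small sketch of how the per-machine
values combine (loss compounding assumes the two directions are
independent):

```python
def round_trip_effect(delay_ms, loss_pct, hops=2):
    """Combine per-machine netem impairments over both directions of a
    round trip: fixed delays add, while loss compounds as
    1 - (1 - p)^hops under independence."""
    total_delay = delay_ms * hops
    combined_loss = 1.0 - (1.0 - loss_pct / 100.0) ** hops
    return total_delay, combined_loss * 100.0

# Low profile: 2 ms delay and 0.25 % loss per machine.
delay, loss = round_trip_effect(2.0, 0.25)
print(delay, round(loss, 4))  # 4.0 ms added RTT, ~0.4994 % round-trip loss
```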

A 30-second stabilization period follows TC application before
measurements begin, allowing queuing disciplines to settle.

typical observations, while min and max capture outlier behavior.
The nix-cache benchmark additionally reports standard deviation via
hyperfine's built-in statistical output.

\section{Source Code Analysis}

To complement the performance benchmarks with architectural

wall-clock duration, number of attempts, VPN restart count and
duration, connectivity wait time, source and target machine names,
and on failure, the relevant service logs.

\section{VPNs Under Test}

VPNs were selected based on:

\begin{itemize}
\bitem{Linux support:} All VPNs must run on Linux.
\end{itemize}

Ten VPN implementations were selected for evaluation, spanning a range
of architectures from centralized coordination to fully decentralized
mesh topologies. Table~\ref{tab:vpn_selection} summarizes the selection.