added motivation

2024-12-02 02:33:45 +01:00
parent 0bfd7e8291
commit dab2c4011e
8 changed files with 312 additions and 351 deletions
--- a/Chapters/Methodology.tex
+++ b/Chapters/Methodology.tex
@@ -0,0 +1,238 @@
+% Chapter Template
+
+\chapter{Methodology} % Main chapter title
+
+\label{Methodology} % Change X to a consecutive number; for
+% referencing this chapter elsewhere, use \ref{ChapterX}
+
+%----------------------------------------------------------------------------------------
+%  SECTION 1
+%----------------------------------------------------------------------------------------
+
+This chapter describes the methodology used to evaluate and analyze
+the Clan framework. A summary of the logical flow of this research is
+depicted in Figure \ref{fig:clan_thesis_argumentation_tree}.
+
+\begin{figure}[H]
+  \centering
+  \includesvg[width=1\textwidth,
+  keepaspectratio]{Figures/clan_thesis_argumentation_tree.drawio.svg}
+  \caption{Argumentation Tree for the Clan Thesis}
+  \label{fig:clan_thesis_argumentation_tree}
+\end{figure}
+
+The structure of this study adopts a multi-faceted approach,
+addressing several interrelated challenges in enhancing the
+reliability and manageability of \ac{P2P} networks.
+The primary objective is to assess how effectively the Clan framework
+addresses these challenges.
+
+The research methodology consists of two main components:
+\begin{enumerate}
+  \item \textbf{Development of a Theoretical Model} \\
+    A theoretical model of the Clan framework will be constructed.
+    This includes a formal specification of the system's foundational
+    axioms, outlining the principles and properties that guide its
+    design. From these axioms, key theorems will be derived, along
+    with their boundary conditions. The aim is to understand the
+    mechanisms underpinning the framework and establish a basis for
+    its evaluation.
+
+  \item \textbf{Empirical Validation of the Theoretical Model} \\
+    Practical experiments will be conducted to validate the
+    predictions of the theoretical model. These experiments will
+    evaluate how well the model aligns with observed performance in
+    real-world settings. This step is crucial to identifying the
+    model’s strengths and limitations.
+\end{enumerate}
+
+The methodology will particularly examine three core components of
+the Clan framework:
+\begin{itemize}
+  \item \textbf{Clan Deployment System} \\
+    The deployment system is the core of the Clan framework, enabling
+    the configuration and management of distributed software
+    components. It simplifies complex configurations through Python
+    code, which abstracts the intricacies of the Nix language.
+    Central to this system is the "inventory," a mergeable data
+    structure designed to ensure consistent service configurations
+    across nodes without conflicts. This component will be analyzed
+    for its design, functionality, efficiency, scalability, and fault
+    resilience.
+
+  \item \textbf{Overlay Networks / Mesh VPNs} \\
+    Overlay networks, also known as "Mesh VPNs," are critical for
+    secure communication in Clan’s \ac{P2P} deployment. The study
+    will evaluate their performance in terms of security,
+    scalability, and resilience to network disruptions. Specifically,
+    the assessment will include how well these networks handle
+    traffic in environments where no device has a public IP address,
+    as well as the impact of node failures on overall
+    connectivity. The analysis will focus on:
+    \begin{itemize}
+      \item \textbf{ZeroTier}: A globally distributed "Ethernet Switch".
+      \item \textbf{Mycelium}: An end-to-end encrypted IPv6 overlay network.
+      \item \textbf{Hyprspace}: A lightweight VPN leveraging IPFS and libp2p.
+    \end{itemize}
+
+  \item \textbf{Data Mesher} \\
+    The Data Mesher is responsible for data synchronization across
+    nodes, ensuring eventual consistency in Clan’s decentralized network. This
+    component will be evaluated for synchronization speed, fault
+    tolerance, and conflict resolution mechanisms. Additionally, it
+    will be analyzed for its resilience in scenarios involving
+    malicious nodes, measuring how effectively it prevents and
+    mitigates manipulation or integrity violations during data
+    replication and distribution.
+\end{itemize}
+
+\section{Related Work}
+
+The Clan framework operates at the intersection of software deployment
+and peer-to-peer networking,
+so evaluating it requires an understanding of existing approaches in
+both areas.
+This section discusses related work on system
+deployment, peer data management,
+and low-maintenance structured peer-to-peer overlays, which informs
+the development and positioning of the Clan framework.
+
+\subsection{Nix: A Safe and Policy-Free System for Software Deployment}
+
+Nix addresses significant issues in software deployment by using
+cryptographic hashes to give every component instance a unique
+path in the file system \cite{dolstra_nix_2004}.
+The system is distinguished by its features, such as concurrent
+installation of multiple versions and variants,
+atomic upgrades, and safe garbage collection.
+These capabilities lead to a flexible deployment system that
+harmonizes source and binary deployments.
+Nix conceptualizes deployment without imposing rigid policies,
+thereby offering adaptable strategies for component management.
+This contrasts with many prevailing systems that are constrained by
+policy-specific designs,
+making Nix an easily extensible, safe and versatile deployment solution
+for configuration files and software.
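+
+As a simplified illustration of the hashing scheme (the notation is ours
+and omits details of the actual Nix implementation), the path of a
+component $c$ can be written as
+\[
+  \mathit{path}(c) = \texttt{/nix/store/}\, h(\mathit{inputs}(c))\,
+  \texttt{-}\, \mathit{name}(c),
+\]
+where $h$ is a cryptographic hash function and $\mathit{inputs}(c)$
+denotes everything that goes into building $c$, such as sources,
+build scripts, and dependencies. Two instances that differ in any
+input therefore receive different paths (up to hash collisions)
+and can coexist.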
+
+As Clan makes extensive use of Nix for deployment, understanding the
+foundations and principles of Nix is crucial for evaluating Clan's
+inner workings.
+
+\subsection{NixOS: A Purely Functional Linux Distribution}
+
+NixOS is an extension of the principles established by Nix,
+presenting a Linux distribution that manages system configurations
+using purely functional methods \cite{dolstra_nixos_2008}. This model
+ensures that system
+configurations are reproducible and isolated
+from stateful interactions typical in imperative models of package management.
+Because NixOS configurations are built by pure functions, changes can be
+rolled back easily, multiple package versions can be deployed side by side,
+and configurations can be reproduced deterministically.
+This approach is particularly compelling in environments that require rigorous
+reproducibility and minimal configuration drift, a valuable property
+for distributed networks.
+
+Clan also leverages NixOS for system configuration and deployment,
+making it essential to understand how NixOS's functional model works.
+
+\subsection{Disnix: A Toolset for Distributed Deployment}
+
+Disnix extends the Nix philosophy to the challenge of distributed
+deployment, offering a toolset that enables system administrators and
+developers to perform automatic deployment of service-oriented
+systems across a network of machines \cite{van_der_burg_disnix_2014}.
+Disnix leverages the features of Nix to manage complex inter-dependencies,
+that is, dependencies that exist at the network level rather than at the
+binary level.
+The overlap with the Clan framework is evident in the shared focus on
+deployment; how the two differ will be explored in the evaluation of
+Clan's deployment system.
+
+\subsection{State of the Art in Software Defined Networking}
+
+The work by Bakhshi \cite{bakhshi_state_2017} surveys the
+foundational principles and recent developments in Software Defined
+Networking (SDN). It describes SDN as a paradigm that separates the
+control plane from the data plane, enabling centralized, programmable
+control over network behavior. The paper focuses on the architectural
+components of SDN, including the three-layer abstraction model—the
+application layer, control layer, and data layer—and highlights the
+role of SDN controllers such as OpenDaylight, Floodlight, and Ryu.
+
+A key contribution of the paper is its identification of challenges
+and open research questions in SDN. These include issues related to
+scalability, fault tolerance, and the security risks introduced by
+centralized control.
+
+This work is relevant for evaluating Clan’s role as an
+SDN deployment tool and serves as a
+comparison point against the state of the art.
+
+\subsection{Low Maintenance Peer-to-Peer Overlays}
+
+Structured Peer-to-Peer (P2P) overlay networks offer scalability and
+efficiency but often require significant maintenance to handle
+challenges such as peer churn and mismatched logical and physical
+topologies. Shukla et al. propose a novel approach to designing
+Distributed Hash Table (DHT)-based P2P overlays by integrating
+Software Defined Networks (SDNs) to dynamically adjust
+application-specific network policies and rules
+\cite{shukla_towards_2021}. This method reduces maintenance overhead
+by aligning overlay topology with the underlying physical network,
+thus improving performance and reducing communication costs.
+
+This work is relevant to Clan because it addresses the
+operational complexity of managing P2P networks.
+
+\subsection{Full-Mesh VPN Performance Evaluation}
+
+The work by Kjorveziroski et al. \cite{kjorveziroski_full-mesh_2024}
+provides a comprehensive evaluation of full-mesh VPN solutions,
+specifically focusing on their use as underlay networks for
+distributed systems, such as Kubernetes clusters. Their benchmarks
+analyze the performance of VPNs with built-in NAT traversal
+capabilities, including ZeroTier, emphasizing throughput, reliability
+under packet loss, and behavior when relay mechanisms are used. For
+the Clan framework, these insights are particularly relevant in
+assessing the performance and scalability of its Overlay Networks
+component. By benchmarking ZeroTier alongside its peers, the paper
+offers an established reference point for evaluating how Mesh VPN
+solutions like ZeroTier perform under conditions similar to those
+found in the peer-to-peer systems managed by Clan.
+
+\subsection{AMC: Towards Trustworthy and Explorable CRDT Applications}
+
+Jeffery and Mortier \cite{jeffery_amc_2023} present the Automerge
+Model Checker (AMC), a tool aimed at verifying and dynamically
+exploring the correctness of applications built on Conflict-Free
+Replicated Data Types (CRDTs). Their work addresses critical
+challenges associated with implementing and optimizing
+operation-based (op-based) CRDTs, particularly emphasizing how these
+optimizations can inadvertently introduce subtle bugs in distributed
+systems despite rigorous testing methods like fuzz testing. As part
+of their contributions, they implemented the "Automerge" library in
+Rust, an op-based CRDT framework that exposes a JSON-like API and
+supports local-first and asynchronous collaborative operations.
+
+This paper is particularly relevant to the development and evaluation
+of the Data Mesher component of the Clan framework, which utilizes
+state-based (or value-based) CRDTs for synchronizing distributed data
+across peer-to-peer nodes. While Automerge addresses issues pertinent
+to op-based CRDTs, the discussion of verification techniques, edge
+case handling, and model-checking methodologies provides
+insights into the complexities of op-based CRDTs and supports the
+case for using simpler state-based CRDTs.
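+
+To make the comparison concrete, the standard textbook formulation of
+a state-based CRDT is sketched here (it is not a specification of
+the Data Mesher). Replica states are drawn from a set $S$ with a
+merge operator $\sqcup$ that is commutative, associative, and
+idempotent:
+\[
+  a \sqcup b = b \sqcup a, \qquad
+  (a \sqcup b) \sqcup c = a \sqcup (b \sqcup c), \qquad
+  a \sqcup a = a .
+\]
+For example, a grow-only counter shared by $n$ replicas stores a
+vector $a = (a_1, \dots, a_n)$ of natural numbers, where replica $i$
+only increments $a_i$, and merges by taking the element-wise maximum:
+\[
+  (a \sqcup b)_i = \max(a_i, b_i), \qquad
+  \mathit{value}(a) = \sum_{i=1}^{n} a_i .
+\]
+Because these laws make merging insensitive to message reordering and
+duplication, replicas that have received the same set of updates
+hold the same state.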
+
+\subsection{Keep CALM and CRDT On}
+
+The work by Laddad et al. \cite{laddad_keep_2022} complements and
+expands upon concepts presented in the AMC paper. By revisiting the
+foundations of CRDTs, the authors address limitations related to
+reliance on eventual consistency and propose techniques to
+distinguish between safe and unsafe queries using monotonicity
+results derived from the CALM Theorem. This inquiry is highly
+relevant for the Data Mesher component of Clan, as it delves into
+operational and observable consistency guarantees that can optimize
+both efficiency and safety in distributed query execution.
+Specifically, the insights on query models and coordination-free
+approaches advance the understanding of how CRDT-based systems, like
+the Data Mesher, manage distributed state effectively without
+compromising safety guarantees.
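+
+The distinction between safe and unsafe queries can be stated compactly
+under the usual assumption that replica states are ordered by
+$\sqsubseteq$ and only grow over time (our notation, not that of the
+paper). A query $q$ is monotone if
+\[
+  s \sqsubseteq s' \;\Longrightarrow\; q(s) \sqsubseteq q(s') .
+\]
+For a grow-only counter with value $v$, the threshold query
+$v \geq k$ is monotone when results are ordered as
+$\mathit{false} \sqsubseteq \mathit{true}$: once it returns
+$\mathit{true}$ on some replica, no later merge can make it
+$\mathit{false}$, so it can be answered without coordination. The
+exact-value query $v = k$ is not monotone, because a later
+increment can invalidate an earlier answer.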