
+++
title = "Introducing Data-Mesher: A CRDT-Based Peer-to-Peer Database"
subline = "DNS was designed for resilience, built to function even during catastrophic failures. But despite its distributed nature, it places control in the hands of..."
date = 2024-11-18T09:08:10+02:00
draft = false
authors = [ "Lassulus", "Qubasa" ]
tags = ['Dev Report']
+++

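DNS was designed for resilience, built to function even during catastrophic failures. But despite its distributed nature, it places control in the hands of one authority per domain. This centralization limits flexibility in networks where multiple, independent nodes need to contribute and manage data collaboratively.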

Data-Mesher changes this. Running on every node in a peer-to-peer network, Data-Mesher functions as a fully decentralized database that uses CRDTs (Conflict-Free Replicated Data Types) to resolve conflicts. What makes it unique is that any node, trusted or untrusted, can add data without compromising the data contributed by others. Multiple authorities can coexist, allowing decentralized control of key information such as DNS-like hostnames and application settings.

In Data-Mesher, the network, not a single authority, resolves conflicts, making it ideal for systems that need true multi-party collaboration.

How Does Data-Mesher Work?

At its core, Data-Mesher runs on every participating node (host) within a distributed system, allowing each node to independently store, update, and sync data with other nodes. It uses a CRDT-based approach to resolve conflicts, ensuring that all nodes eventually converge on the same dataset.

What makes Data-Mesher unique is its capability to allow untrusted nodes to contribute data without compromising the integrity of other nodes' contributions. This is particularly useful in peer-to-peer environments, where nodes might not have complete trust in each other but still require collaborative data sharing.

The Basic Structure of a dm-network

The data in Data-Mesher is grouped into what we call a dm-network, which is primarily a key-value structure. In a very basic form, the dm-network is simply a group of hosts (nodes) and settings bundled under a shared public key (an ed25519 key). Nodes in the dm-network collaborate by announcing DNS hostnames and other settings relevant to application configuration.

Here are key elements of a dm-network:

Settings
Settings are signed, and only admins holding the network's private key can produce a valid signature for them. Settings control policy, which can include anything from banning specific keys to establishing rules for hostname overrides.
Hosts
Stores the hostnames that nodes contribute. Each hostname entry is signed with the announcing node's ed25519 key pair. If two hosts try to register the same hostname, Data-Mesher selects the entry that was signed earliest.
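The "earliest signed entry wins" rule for hostnames can be sketched in a few lines of Python. This is an illustration under assumptions, not Data-Mesher's schema: the `HostEntry` fields (`signer`, `timestamp`) are hypothetical, and the tie-break on the signer key is added here only so that every node picks the same winner when two timestamps are equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostEntry:
    hostname: str
    ip: str
    signer: str       # hypothetical: the announcing node's ed25519 public key
    timestamp: float  # signing time, seconds since the epoch


def resolve_claim(a: HostEntry, b: HostEntry) -> HostEntry:
    """Pick the winner when two entries claim the same hostname.

    The earlier signed entry wins; comparing the signer key as well (an
    assumption) makes the choice deterministic when timestamps tie.
    """
    if a.hostname != b.hostname:
        raise ValueError("entries claim different hostnames")
    return min(a, b, key=lambda e: (e.timestamp, e.signer))


first = HostEntry("mors.nether", "fdcc::1", "key-A", 1_700_000_000.0)
later = HostEntry("mors.nether", "fdcc::2", "key-B", 1_700_000_500.0)
print(resolve_claim(later, first).signer)  # key-A
```

Because the result does not depend on argument order, two nodes that see the competing claims in different orders still agree on the owner.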

Key Mechanisms of Data Synchronization

Node Communication and Data Sync
When two nodes communicate, they exchange their entire dataset in the form of a `data.json` file. This file contains all known settings and hosts, and every entry is signed and timestamped. The merge function ensures that both nodes end up with identical data, regardless of the order in which changes are received. The merge strategy for configuration settings is simple: whichever entry has the most recent `last_updated` timestamp wins.
Timestamp Attestation
To prevent timestamp manipulation (such as backdating a record so that it takes precedence over an older, legitimate one), Data-Mesher relies on timestamp attestation servers. Options include RFC 3161 time-stamping authorities such as FreeTSA, or OpenTimestamps, which anchors timestamps in the Bitcoin blockchain. These services receive a hash of the payload (e.g., a CRDT entry) and return a cryptographic proof of the time at which they saw it, so the timestamp cannot be forged after the fact.
Handling Invalid Hostnames
If a node submits a hostname without a valid timestamp, this hostname will be stored but marked as inactive until it can be verified by a timestamp attestation service. Only once validation is complete and the hostname carries a valid signed timestamp will it be activated within the network.
Invalid Signatures
Hostnames or timestamp entries backed by invalid signatures are rejected upfront and do not propagate through the network, adding a layer of security to prevent malicious data from infiltrating the network.
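The synchronization rules above can be sketched as two small CRDTs: settings behave like a last-writer-wins map keyed by setting name, and host records are kept as a grow-only set, with the active owner of each hostname computed at read time. Everything here is an assumption for illustration: the field names, the `attested` flag, the value-based tie-break, the `ttl` setting key, and the `verify` callback that stands in for real ed25519 signature checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Setting:
    key: str
    value: str
    last_updated: float  # seconds since the epoch


@dataclass(frozen=True)
class Host:
    hostname: str
    ip: str
    signer: str      # hypothetical: the announcing node's public key
    timestamp: float
    attested: bool   # hypothetical: True once a timestamp authority vouched for `timestamp`


def merge(a: dict, b: dict, verify: Callable[[object], bool]) -> dict:
    """Merge datasets shaped like {"settings": {key: Setting}, "hosts": {Host, ...}}."""
    settings = {}
    for entry in [*a["settings"].values(), *b["settings"].values()]:
        if not verify(entry):
            continue  # invalid signature: rejected, never stored or forwarded
        current = settings.get(entry.key)
        # Last writer wins; comparing the value too breaks ties deterministically.
        if current is None or (entry.last_updated, entry.value) > (current.last_updated, current.value):
            settings[entry.key] = entry
    # Host records form a grow-only set; set union does not depend on order.
    hosts = {h for h in a["hosts"] | b["hosts"] if verify(h)}
    return {"settings": settings, "hosts": hosts}


def active_hosts(dataset: dict) -> dict:
    """Unattested records stay stored but inactive; among attested ones the earliest wins."""
    winners = {}
    for h in dataset["hosts"]:
        if not h.attested:
            continue
        current = winners.get(h.hostname)
        if current is None or (h.timestamp, h.signer) < (current.timestamp, current.signer):
            winners[h.hostname] = h
    return winners


def accept_all(entry: object) -> bool:
    return True  # demo only: a real node would check the ed25519 signature here


node1 = {"settings": {"ttl": Setting("ttl", "300", 10.0)},
         "hosts": {Host("mors.nether", "fdcc::1", "key-A", 5.0, True)}}
node2 = {"settings": {"ttl": Setting("ttl", "600", 20.0)},
         "hosts": {Host("mors.nether", "fdcc::2", "key-B", 3.0, False)}}
merged = merge(node1, node2, accept_all)
assert merged == merge(node2, node1, accept_all)          # exchange order does not matter
print(merged["settings"]["ttl"].value)                     # 600
print(active_hosts(merged)["mors.nether"].signer)          # key-A (key-B is not attested yet)
```

Keeping unattested records in the set rather than discarding them means a claim can become active later, once its attestation arrives, without any node having to re-request it.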

Use Case: Distributed DNS and Hostname Management

One of the main applications of Data-Mesher is managing DNS-like functionality for decentralized networks. Nodes in the dm-network can announce DNS hostnames, which then propagate across the peer-to-peer system without requiring centralized management.

Each node registers its own hostnames.
Host entries are shared between nodes using a gossip-style protocol, which keeps them up to date without saturating the network. Each node exchanges updates only with the peers it is connected to, so changes spread gradually through the network rather than being broadcast to every node at once.
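A small simulation shows why this gradual spreading still converges quickly. It assumes push-pull gossip in which every node, once per round, exchanges its full state with one uniformly random peer; the post does not specify how peers are chosen, so that part is an assumption. Sets of hostnames stand in for full datasets, since union is the merge for a grow-only set.

```python
import random


def gossip_round(states: list, rng: random.Random) -> None:
    """One round: every node syncs its full state with one random peer (push-pull)."""
    n = len(states)
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # pick a peer other than node i
        merged = states[i] | states[j]  # both sides end up with the union
        states[i] = set(merged)
        states[j] = set(merged)


def rounds_until_converged(n_nodes: int, seed: int = 0, max_rounds: int = 100) -> int:
    """Return how many rounds it takes until every node knows every hostname."""
    rng = random.Random(seed)
    # Initially each node knows only the hostname it registered itself.
    states = [{f"node{i}.nether"} for i in range(n_nodes)]
    for r in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(len(s) == n_nodes for s in states):
            return r
    raise RuntimeError("did not converge within max_rounds")


print(rounds_until_converged(32))
```

No node ever talks to more than a handful of peers per round, yet information reaches all nodes in roughly a logarithmic number of rounds, which is what makes gossip attractive for keeping bandwidth low.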

Example `dns.json` Output

As part of its functionality, Data-Mesher periodically generates a dns.json file containing hostname-to-IP mappings, one JSON object per line. Multiple services can use this information to route traffic within the network.

Example:

```json
{"hostname": "mors.nether", "ip": "fdcc:c5da:5295:c853:d499:93e9:c5fc:c8b5"}
{"hostname": "green.nether", "ip": "fdcc:c5da:5295:c853:d499:937c:31a2:1e86"}
```
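As an illustration of how a service might consume this file, the sketch below parses it into a dictionary and renders it in `/etc/hosts` format. It assumes each line is a standalone JSON object, as in the example; the helper names and the `/etc/hosts` rendering are hypothetical and not part of Data-Mesher.

```python
import json


def load_dns_json(text: str) -> dict:
    """Parse dns.json content: one JSON object per line with "hostname" and "ip"."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        mapping[record["hostname"]] = record["ip"]
    return mapping


def to_hosts_file(mapping: dict) -> str:
    """Render the mapping in /etc/hosts order: address first, then the name."""
    return "".join(f"{ip} {name}\n" for name, ip in sorted(mapping.items()))


sample = (
    '{"hostname": "mors.nether", "ip": "fdcc:c5da:5295:c853:d499:93e9:c5fc:c8b5"}\n'
    '{"hostname": "green.nether", "ip": "fdcc:c5da:5295:c853:d499:937c:31a2:1e86"}\n'
)
print(to_hosts_file(load_dns_json(sample)), end="")
```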

Security and Accuracy Mechanisms

One inherent challenge in any distributed network is ensuring accuracy and security without compromising decentralization. Data-Mesher balances these demands with the following mechanisms:

Signature Validation
Each data entry (whether it's a setting or hostname) must have a valid signature to be trusted and propagated across nodes. This allows nodes to validate the origin and authenticity of every piece of shared data.
Reachability Checks
When a node announces a new hostname, other nodes verify that the target IP and port are reachable before incorporating the hostname into their working configuration. The verification protocol ensures that the machine providing the hostname entry can correctly respond to a challenge tied to the relevant private key.
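The reachability check follows the usual challenge-response pattern: send a fresh random nonce to the announced address and accept the entry only if the answer proves possession of the key. In Data-Mesher the claimant would sign with its ed25519 private key and verifiers would check against the public key in the host entry, so nothing secret is shared. This sketch swaps in HMAC-SHA256 (which requires both sides to know the key) purely so it runs with the Python standard library; the `send` transport callback is also hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Callable


def make_challenge(hostname: str) -> bytes:
    # A fresh random nonce bound to the hostname, so old answers cannot be replayed.
    return hostname.encode() + b"|" + secrets.token_bytes(32)


def answer_challenge(challenge: bytes, key: bytes) -> bytes:
    # Claimant side. Stand-in: HMAC-SHA256 instead of an ed25519 signature.
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify_host(hostname: str, send: Callable[[bytes], bytes], key: bytes) -> bool:
    """Send a challenge to the announced address and check the answer.

    `send` (hypothetical) delivers the challenge to the host's IP and port and
    returns the reply, raising OSError if the host cannot be reached.
    """
    challenge = make_challenge(hostname)
    try:
        response = send(challenge)
    except OSError:
        return False  # unreachable: the entry is not incorporated
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)  # constant-time comparison


key = b"demo-key"
print(verify_host("mors.nether", lambda c: answer_challenge(c, key), key))  # True
```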

Joining Multiple DM-Networks

Data-Mesher supports joining multiple DM-Networks, and nodes can prioritize which network has control in cases of conflicts. Here’s an example of how you can configure Data-Mesher in NixOS to join two different DM-Networks.

In this setup, the .qubasa.clan domain gets higher priority than the .clan domain.

What does this mean?

If both networks contain the same hostname (e.g., home.qubasa.clan), the entry from .qubasa.clan takes precedence over the one from .clan. In case of a conflict, the network with the lower priority number (closer to 0) wins.

Here’s a Nix configuration to demonstrate this:

```nix
services.data-mesher = {
  enable = true;  # Enable Data-Mesher service
  interface = "<mesh_vpn>";  # The network interface Data-Mesher will use
  openFirewall = true;  # Ensure the firewall allows Data-Mesher traffic

  # Define the DM-Networks to join
  networks = {
    "qubasa.clan" = {
      pubkey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKTi2h4X56CzjeY4L1INl1d5JvYwh7HpaSuUlD33RhnY";  # Public key for the qubasa.clan network
      priority = 1;  # Higher priority (lower number = higher priority)
      bootstrapPeers = [
        "http://[fd27:bb88:dbef:737b:3799:9318:aa77:ec12]:7331"  # A peer within this network to bootstrap from
      ];
    };
    "clan" = {
      pubkey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJ1UM2Cza+GIRyuB9C3NqY0pSWnGC4DzmQOcWOa4SafV";  # Public key for the clan network
      priority = 2;  # Lower priority (higher number = lower priority)
      bootstrapPeers = [
        "http://[fd16:aa77:dbef:737b:3799:9316:aa77:dbef]:7331"  # A peer within this network to bootstrap from
      ];
    };
  };
};
```

Key Points:

Priority System: qubasa.clan has a priority of 1, while clan has priority 2. If there’s a conflicting hostname, Data-Mesher will resolve it in favor of the qubasa.clan network since it has a lower priority number.
Bootstrap Peers: Each network is associated with one or more bootstrap peers, which help your node join that network by sharing an initial dataset. In this example, each network has one bootstrap peer: one for qubasa.clan and one for clan.
Public Keys: Each network you join requires a valid public key. The key is used to verify that the data received belongs to that specific network and has not been tampered with.
Floating Point Priorities: Data-Mesher uses floating-point numbers for priorities. This allows you to insert new networks at any level of priority without reworking the entire priority hierarchy. For example, you could add a network with priority 1.5 between qubasa.clan (priority 1) and clan (priority 2).
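The priority rule can be expressed as a lookup that walks the joined networks from the lowest priority number upward and returns the first match. This is a sketch, not Data-Mesher's resolver: the per-network tables, the addresses, the `other.clan` network, and the name-based tie-break for equal priorities are all hypothetical.

```python
from typing import Optional


def resolve(hostname: str, tables: dict, priorities: dict) -> Optional[str]:
    """Return the address from the lowest-numbered (highest-priority) network that knows `hostname`."""
    # Sorting by (priority, name) keeps the result deterministic if two networks share a priority.
    for network in sorted(tables, key=lambda n: (priorities[n], n)):
        address = tables[network].get(hostname)
        if address is not None:
            return address
    return None


priorities = {"qubasa.clan": 1, "clan": 2}
tables = {
    "qubasa.clan": {"home.qubasa.clan": "fd27::1"},
    "clan": {"home.qubasa.clan": "fd16::1", "web.clan": "fd16::2"},
}
print(resolve("home.qubasa.clan", tables, priorities))  # fd27::1

# A new network slotted in between, without renumbering the existing ones.
priorities["other.clan"] = 1.5
tables["other.clan"] = {"web.clan": "fd99::1"}
print(resolve("web.clan", tables, priorities))  # fd99::1 (beats "clan" at priority 2)
```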

This setup allows nodes to coordinate and sync across multiple DM-Networks seamlessly, while ensuring that conflicts are handled predictably based on each network’s priority.

Bootstrapping and Network Resurrection

If a node becomes isolated from the rest of the dm-network (for example, due to a network partition or because its known peers have gone offline), it can bootstrap itself by pulling data from a list of preconfigured URLs. These bootstrap URLs can point either to other nodes in the dm-network or to a simple web server serving a static data.json file.
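A bootstrap fallback could look like the sketch below: try each configured URL in order and use the first one that returns a parseable dataset. The post does not say in which order URLs are tried or how failures are handled, so both are assumptions here, as are the function names. Whatever is fetched would still need to pass the usual signature checks before being merged.

```python
import json
import urllib.request
from typing import Callable, Optional


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download a URL's body with the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def bootstrap(urls: list, fetch: Callable[[str], bytes] = fetch_url) -> Optional[dict]:
    """Return the first dataset that can be fetched and parsed, or None if all fail."""
    for url in urls:
        try:
            return json.loads(fetch(url))
        except (OSError, ValueError):
            # OSError covers unreachable peers (urllib's URLError subclasses it);
            # ValueError covers a malformed data.json. Either way, try the next URL.
            continue
    return None
```

For testing without a network, `fetch` can be replaced by any function that maps a URL to bytes.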

What’s Next? Future Enhancements

Customizable Data Schemas

While the current focus is on hostnames and settings, we foresee broader use cases such as network configurations or custom application settings. This would be achieved by extending the current schema definitions to allow categories beyond just "hosts" and "settings".

This could involve creating more flexible merge criteria (newer-wins, append-only, etc.) and defining writers and permissions for specific data keys.
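One way such per-key merge criteria and writer permissions might be organized is a registry that maps each key to a merge function and a set of allowed signers. This is purely a hypothetical sketch of a future feature: the keys `motd` and `allowed_keys`, the `admin-key` signer, and the table layout are invented for illustration. Only the two strategy names, newer-wins and append-only, come from the text above.

```python
from typing import Callable


def newer_wins(a, b):
    # Values are (timestamp, payload) pairs; the later one wins, ties broken on payload.
    return max(a, b)


def append_only(a, b):
    # Values are frozensets; nothing is ever removed, so union is the merge.
    return a | b


STRATEGIES: dict = {"motd": newer_wins, "allowed_keys": append_only}  # hypothetical keys
WRITERS: dict = {"motd": {"admin-key"}, "allowed_keys": {"admin-key"}}  # hypothetical permissions


def may_write(key: str, signer: str) -> bool:
    """Only signers listed for a key may update it; unknown keys accept no writers."""
    return signer in WRITERS.get(key, set())


def merge_keys(local: dict, remote: dict) -> dict:
    """Merge two key-value maps, applying each key's own strategy on conflict."""
    merged = dict(local)
    for key, value in remote.items():
        merged[key] = STRATEGIES[key](merged[key], value) if key in merged else value
    return merged


a = {"motd": (1.0, "hello"), "allowed_keys": frozenset({"k1"})}
b = {"motd": (2.0, "hi"), "allowed_keys": frozenset({"k2"})}
print(merge_keys(a, b) == merge_keys(b, a))  # True
```

Both strategies are commutative, associative and idempotent, which is what lets each key still behave as a CRDT.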

Flexible Host Object Definitions

At the moment, the "host" object in the schema is somewhat inflexible, focusing primarily on DNS information. A future enhancement might include expanding this definition into a more abstract schema, allowing for various other types of node-related information beyond DNS settings.

From Theory to Practice

Data-Mesher isn't just a theoretical construct; it can be a practical solution for networks that require decentralized peer coordination, conflict-free updates, and scalable distribution of discrete configurations like DNS names.

For those interested in deploying decentralized networks whose configuration entries expire automatically over time (via a TTL), Data-Mesher offers a foundational approach that prioritizes security, consistency, and flexibility in equal measure.
