[{"content":"iroh-rings is a Rust library I am writing to add intuitive ring-based access control for sharing resources over iroh, an awesome QUIC/TLS peer-to-peer networking stack I love. A ring is a named group of peers. Resources are associated with rings, and every member of a ring gets access to those resources. The library was originally built as the access layer for the ringdrop file-sharing application, so the first version of the access model was simple: a peer either had access to a resource or not.\nThat binary model worked well for downloading. But as the scope of iroh-rings grew beyond file downloads, the question became obvious: what kind of access to resources?\nThe old model In v0.4, the registry answered a single question:\nfn is_allowed(\u0026amp;self, peer: \u0026amp;EndpointId, resource_id: \u0026amp;ResId) -\u0026gt; Result\u0026lt;bool, Error\u0026gt;; A peer was allowed or it wasn\u0026rsquo;t. This was enough for reading — but a peer with write intent or delete intent would get the exact same true answer as a read-only peer. Any finer distinction had to be pushed into application code, outside the library. That boundary was leaking — a sign that the two components weren\u0026rsquo;t respecting single responsibility.\nThinking about the right operations The first instinct was to lift the CRUD model from REST APIs - I\u0026rsquo;ve written so many classic web services with that pattern that it came naturally: create, read, update, delete. But a P2P context is not a REST API.\nIn iroh-rings, \u0026ldquo;create\u0026rdquo; and \u0026ldquo;update\u0026rdquo; both mean granting Write permission for a (potentially new) resource ID on a ring. 
The payload transfer is handled at the application layer via the Transfer implementation — the protocol itself only tracks IDs and permissions.\nHere is the permission model:\nPermission What it authorises Read Get the resource (download bytes) Write Push a new resource or overwrite an existing one, but only into rings the peer already belongs to Delete Remove the ring–resource association (the underlying data is untouched) Notes on modifying permissions (Write and Delete) A peer in a ring with Write permission on a resource can push updates to that resource (the peer\u0026rsquo;s ring membership is required).\nDelete is narrower than it sounds. Removing a ring–resource association does not destroy the data. True deletion is a local-only operation reserved for the registry owner. From a remote peer\u0026rsquo;s perspective, Delete means \u0026ldquo;you may ask me to stop serving this resource from this ring\u0026rdquo;.\nWhere permissions live Permissions are attached to each ring–resource association independently. Ring A can grant Read on resource X but Read + Write on resource Y — there is no global \u0026ldquo;ring A grants Read\u0026rdquo;; the permission is always scoped to a specific (ring, resource) pair.\nA peer\u0026rsquo;s effective permissions on a resource are the union across all rings they belong to that are associated with that resource.\nI explicitly chose not to support per-peer permission overrides. The argument for them seemed reasonable at first glance: \u0026ldquo;what if I want one peer in a ring to have less access than the others?\u0026rdquo; But per-peer overrides introduce a second access-control dimension that interacts with ring membership in subtle ways. This pushes the system toward a full RBAC or ABAC model, with all the corresponding implementation complexity and audit surface. For this library, the cognitive overhead did not seem worth it. 
If different peers need different permissions on the same resources, they can be simply put in different rings containing them.\nThe open ring constraint iroh-rings has a built-in ring called \u0026quot;open\u0026quot;. Any resource associated with the open ring is readable by any peer, regardless of ring membership. Think of it as the public-read ACL of the library.\nThe open ring only makes sense for Read. Allowing Write or Delete via the open ring would mean any peer on the network could modify or deassociate resources — that is not a permission, it is an absence of access control. The library enforces this at the validation layer:\npub(crate) fn validate_ring_permissions( ring_name: \u0026amp;str, permissions: \u0026amp;[Permission], ) -\u0026gt; Result\u0026lt;(), Error\u0026gt; { if permissions.is_empty() { return Err(Error::EmptyPermissionSet); } if ring_name == OPEN_RING_NAME \u0026amp;\u0026amp; permissions.iter().any(|p| !matches!(p, Permission::Read)) { return Err(Error::OpenRingReadOnly); } Ok(()) } The open ring and private rings may coexist on the same resource — a resource can be publicly readable via the open ring while remaining writable or deletable only by members of a private ring.\nThe wire protocol Giving permissions meaning requires transmitting intent. The iroh-rings wire protocol is minimal: the initiating peer sends a length-prefixed resource id, and the gate replies with a single status byte. 
To carry the permission, we added one byte after the resource id — the operation byte — and bumped the ALPN from /iroh-rings/1 to /iroh-rings/2:\n[2B resource_id_len_LE][N bytes resource_id][1B operation] → [1B status] The gate reads the operation byte, maps it to a Permission, checks the registry, and either grants or denies the stream:\npub fn encode_request(resource_id: \u0026amp;[u8], permission: Permission) -\u0026gt; Result\u0026lt;Vec\u0026lt;u8\u0026gt;, Error\u0026gt;; How permissions are stored Each ring–resource association carries its own permission set, encoded as a compact bitfield: bit 0 = Read, bit 1 = Write, bit 2 = Delete. A single byte can represent any combination. Checking a permission is a bitwise AND — no allocation, no iteration over a Vec\u0026lt;Permission\u0026gt;:\nlet pbit = permission_bit(permission); // 0b001 / 0b010 / 0b100 if bits \u0026amp; pbit == 0 { continue; // this ring does not grant the requested permission } Both the in-memory and redb backends share the same ring-computation logic via compute_resource_rings, so any change to the model propagates to both automatically. In the redb backend specifically, the bitfield is stored in a dedicated RESOURCE_RING_PERMS table keyed by a composite of resource id and ring name:\n[2B resource_id_len_LE][resource_id bytes][ring_name bytes] → u8 bitfield The 2-byte length prefix makes the key unambiguous regardless of the content of the resource id.\nWhat the model does not do iroh-rings enforces what is allowed. It does not authenticate who is speaking — that is the job of the transport. QUIC/TLS 1.3 completes a handshake before any application byte is exchanged; each peer\u0026rsquo;s EndpointId is their Ed25519 public key, and completing the handshake proves possession of the corresponding private key. 
By the time the gate consults the registry, the caller\u0026rsquo;s identity is cryptographically guaranteed.\nThe registry also does not model content-level access (e.g., byte ranges, file paths, version numbers). That belongs in the Transfer sub-protocol, which runs after the gate has granted access. iroh-rings draws a clean line between who may do what on which resource and what is actually exchanged.\niroh-rings is deliberately a simple group-based access control library. There is no delegation, no \u0026ldquo;I temporarily grant you access.\u0026rdquo; A peer either belongs to a ring or it doesn\u0026rsquo;t. If you need signed capability delegation, rcan is the composable layer for that: it implements attenuated capability tokens with delegation chains and expiration. The two can coexist cleanly — a ring member could issue a time-bound Write token via rcan to an external peer, without ever adding that peer to the ring.\niroh-rings remains the authority on group membership and static permissions; rcan handles the ephemeral, delegated layer on top.\nConclusion The move from binary is_allowed to permission-typed has_permission is a small API surface change with meaningful consequences for what iroh-rings can express. The three-permission model covers the operations that genuinely matter, at the registry layer, in many use-cases.\nCheck the sources for this new iroh-rings v0.5, available on crates.io. Feedback welcome.\n","permalink":"https://rikettsie.github.io/posts/iroh-rings-access-permission-data-model/","summary":"iroh-rings started with a binary ALLOWED/DENIED access model. Reasoning here the transition to a READ/WRITE/DELETE permission model.","title":"Extending the resource access permission model on iroh-rings"},{"content":"The access control logic in iroh-rings did not start as a standalone library. It was born inside this project, tightly coupled to ringdrop\u0026rsquo;s own types and assumptions. 
Only later, once the model had stabilized, did I extract it into a separate crate through a generalization refactor — and that separation turned out to be one of the better decisions I made along the way.\nThis post is about the project that came first.\nThat is ringdrop.\nThe interesting part is probably how three independent layers compose together into something useful and coherent. Each layer does one thing and hands off to the next.\nThree layers, one node The stack looks like this:\n┌──────────────────────────────────────────────────┐ │ iroh-rings (who can access what) │ ├──────────────────────────────────────────────────┤ │ iroh-blobs (what is stored, how to transfer) │ ├──────────────────────────────────────────────────┤ │ iroh (how peers connect) │ └──────────────────────────────────────────────────┘ iroh handles connectivity: QUIC transport, NAT hole punching, peer discovery, connection migration. Two nodes behind separate NATs can establish a direct data path — the relay is used for initial signaling, but once the hole is punched traffic flows peer-to-peer, as I described in the hole punching post.\niroh-blobs handles storage: content-addressed blobs identified by their BLAKE3 hash, encoded with BAO for efficient partial verification. I wrote about BAO earlier — the short version is that you can verify any chunk of a file without downloading the whole thing first.\niroh-rings handles access control: ring membership, resource associations, and the authorization check before any blob is served.\nNode\u0026lt;R\u0026gt; is the struct that owns all three. It is generic over R: Registry, meaning you can swap the backend without touching anything else — the access control policy is entirely determined by whatever Registry you pass in.\nRingGate: enforcement at the right layer The RingGate is where access control is enforced. 
It sits between the incoming blob request and the blob store: before a blob is served to a remote peer, the gate checks the registry.\nThe check happens at request time, not at import time. This is intentional — ring membership can change after a file is imported. If I add a peer to a ring, they should immediately be able to access all resources associated with that ring, without re-importing anything. The gate reads the current state of the registry on every request.\nA failed check is silent from the requester\u0026rsquo;s perspective: the request is dropped, no error detail is returned. Leaking why access was denied would tell an unauthorized peer which resources exist — which is information you probably do not want to give out.\nShareTicket: out-of-band sharing To download a file, a peer needs a ShareTicket. It encodes everything required: the blob hash and the address of the node serving it.\nThe idea is similar to sendme, the file transfer tool by the iroh team: a self-contained ticket is all a receiver needs to locate and request the content. The difference is what happens when the request arrives — ringdrop\u0026rsquo;s RingGate checks the registry before serving anything, silently rejecting peers that do not belong to an authorized ring. The ticket gets you to the door; the ring decides whether it opens.\nThe ticket is handed out of band — over a chat message, an email, whatever. The design is deliberate: ringdrop does not have a discovery mechanism. If you have the ticket, you know the resource exists and where to get it. If you do not have the ticket, you cannot even ask. Combined with the ring check, this gives two independent layers of access control: possession of the ticket, and membership in the right ring.\nDaemon and IPC A file drop node needs to keep running between operations. 
I did not want to re-initialize the iroh endpoint and reload the blob store on every CLI invocation — that would be slow and would lose in-memory connection state.\nSo ringdrop runs as a daemon: a background process that owns the Node and listens on a local TCP socket. The CLI tool (rdrop shell command) connects to it, sends a single newline-terminated JSON request, and reads back a stream of JSON events until the operation completes or fails.\nThe protocol is intentionally thin. Each Op variant maps to one operation — import a file, create a ring, add a peer, download a blob. The daemon dispatches the request to the node and streams progress events back. The CLI renders them.\nI chose TCP over a Unix socket for platform portability (ringdrop runs on Linux, macOS, and Windows), and JSON over a binary protocol because the message volume is low and debuggability matters more than throughput in this phase. Running nc localhost \u0026lt;port\u0026gt; and typing a request by hand is my escape hatch during development :)\nCLI A typical session with rdrop looks like this:\n# Start the daemon (once, keeps running) rdrop daemon start # Create a ring named \u0026#34;team\u0026#34; and add a peer to it rdrop ring create team rdrop ring add team \u0026lt;peer-endpoint-id\u0026gt; # Import a file and associate it with the ring rdrop import ./report.pdf --ring team # The share ticket is printed — send it out of band # rdrop://abcdef20... # On the other side, a peer with the ticket downloads the file rdrop receive \u0026lt;ticket\u0026gt; The peer on the other side must be a member of the team ring on the serving node, otherwise the download is silently rejected.\nWhat\u0026rsquo;s next There is plenty still to build. 
If you want to see what\u0026rsquo;s coming or have a use case to propose, the issue tracker is the right place.\nContributions are very welcome — whether that\u0026rsquo;s code, documentation, or just opening an issue with something you\u0026rsquo;d find useful.\nRefs GitHub repo: https://github.com/rikettsie/ringdrop\nCrate: https://crates.io/crates/ringdrop\nDocs: https://docs.rs/iroh-rings/latest/ringdrop/\niroh-rings GitHub repo: https://github.com/rikettsie/iroh-rings\n","permalink":"https://rikettsie.github.io/posts/ringdrop-p2p-file-drop/","summary":"How to compose iroh\u0026rsquo;s transport, content-addressed blob storage, and ring-based access control into a permission-aware P2P file drop.","title":"ringdrop: composing iroh, iroh-blobs, and iroh-rings into a P2P file drop"},{"content":"After spending time with iroh\u0026rsquo;s transport layer — hole punching, QUIC connections, NAT traversal — I wanted to build something on top of it. The connectivity problem was solved. The next question was: who gets to access what?\nIn a centralised system this is trivial: you have a server, the server checks credentials, done. In a P2P system there is no server to ask. Every node needs to decide for itself whether to serve a resource to a remote peer. I needed an access control model that was simple to reason about, did not require a central authority, and could plug into iroh\u0026rsquo;s existing identity model cleanly.\nThat library is iroh-rings.\nThe ring model The central concept is a ring: a named group of peers. A resource is associated with one or more rings. A peer can access a resource if and only if it is a member of at least one of those rings.\nThat is the entire model. No roles, no capabilities, no ACL lists. Just named groups and membership.\nThere is one built-in ring: the open ring (OPEN_RING_NAME). Resources associated with it are accessible to any peer, with no membership check. 
It is the public ring — useful for resources you want to share with the world without managing a list of peers.\nThe mental model maps naturally onto real access patterns:\nA file shared with your team -\u0026gt; associate it with a team ring containing your colleagues\u0026rsquo; endpoint IDs. A file shared publicly -\u0026gt; associate it with the open ring. A file shared with a specific peer -\u0026gt; create a ring with just that peer in it. The Registry trait The Registry trait is the core abstraction. It manages three things: rings, their peer membership, and the association between resources and rings. Any struct that implements it can back the system — the access control logic doesn\u0026rsquo;t change.\npub trait Registry { fn create_ring(\u0026amp;self, ring_name: \u0026amp;str) -\u0026gt; Result\u0026lt;()\u0026gt;; fn add_peer_to_ring( \u0026amp;self, ring_name: \u0026amp;str, peer: EndpointId, nickname: Option\u0026lt;\u0026amp;str\u0026gt;, ) -\u0026gt; Result\u0026lt;()\u0026gt;; fn add_ring_to_resource\u0026lt;ResId: ResourceId\u0026gt;( \u0026amp;self, resource_id: ResId, ring_name: \u0026amp;str, ) -\u0026gt; Result\u0026lt;()\u0026gt;; fn is_allowed\u0026lt;ResId: ResourceId\u0026gt;( \u0026amp;self, peer: \u0026amp;EndpointId, resource_id: \u0026amp;ResId, ) -\u0026gt; Result\u0026lt;bool\u0026gt;; // ... list_rings, list_ring_peers, remove_peer_from_ring, list_resource_rings } A few things worth noting in this signature. All methods take \u0026amp;self, even the ones that write — create_ring, add_peer_to_ring, add_ring_to_resource all mutate state, yet none takes \u0026amp;mut self.\nThis is a deliberate choice: requiring \u0026amp;mut self would mean callers need exclusive access to the registry for every write, which becomes awkward when the registry is shared across threads or owned behind an Arc. By using \u0026amp;self, the trait leaves the synchronization strategy to the implementor. 
RedbRegistry, for example, wraps a redb Database which opens its own write transactions internally — no external \u0026amp;mut needed. An implementor that wraps a HashMap would use a Mutex\u0026lt;HashMap\u0026gt; or RwLock instead. The trait does not prescribe which; it only says \u0026ldquo;you need shared access, figure out the rest\u0026rdquo;.\nMethods that touch resources are generic over ResId: ResourceId rather than using an associated type, so you can use different resource ID types with the same registry instance. The authorization rule is: a peer is allowed if it belongs to at least one ring associated with the resource, or if the open ring is associated with it.\nOne design decision I am happy with: the Registry operates at the policy layer only. It does not authenticate peers — it trusts the identity it receives. That is intentional. Authentication is already handled by the transport: iroh uses QUIC with TLS 1.3, which means every connection already carries a verified peer identity. By the time the registry is consulted, we know who is asking. The registry only needs to decide whether they are allowed.\nThis separation kept the trait clean and made it easy to implement and test.\nPluggable backends The crate ships two ready-made implementations.\nInMemoryRegistry stores everything in memory. It is fast, has no dependencies, and disappears when the process exits. Useful for tests, short-lived sessions, or cases where you reconstruct state from another source on startup.\nRedbRegistry persists to disk using redb, a pure-Rust embedded key-value store. Ring membership and resource associations survive restarts. This is what I use in ringdrop for a node that needs to remember what it is sharing between sessions.\nSwapping backends is a one-line change — the rest of the application code is generic over R: Registry.\nContract tests How do you ensure a custom Registry implementation behaves correctly? 
The crate ships a contract test suite alongside the trait in crate::registry. Any implementor can run the full suite against their backend by calling registry_contract(\u0026amp;my_impl) from their own test module. The tests cover the authorization logic, edge cases around the open ring, membership changes, and resource association.\nI have seen this pattern used in a few well-known Rust projects, and I like it a lot. For example tower ships tower-test alongside its Service trait so that custom middleware can be tested against a consistent harness; embedded-hal ships mock implementations alongside its hardware abstraction traits for the same reason. When you define a trait that has meaningful semantic invariants — not just method signatures — shipping the tests for those invariants as part of the crate is much better than leaving implementors to rediscover them on their own.\nWire protocol iroh\u0026rsquo;s documentation is genuinely good and the protocol extension model is straightforward: you define your own ALPN identifier, implement a protocol handler, and register it on the endpoint. iroh takes care of the rest.\nMy approach was to define /iroh-rings/X as the ALPN for this protocol (where X is the incremental version number). The FsTransfer implementation uses it to serve filesystem resources to authorized peers: a remote peer requests a resource by ID, the local node checks the registry, and if the peer is authorized the resource is streamed back. 
Because QUIC multiplexes multiple streams over a single connection, /iroh-rings/1 can coexist with other protocols — iroh-blobs, custom application protocols — on the same connection without any extra work.\nRefs GitHub repo: https://github.com/rikettsie/iroh-rings\nCrate: https://crates.io/crates/iroh-rings\nDocs: https://docs.rs/iroh-rings/latest/iroh_rings/\n","permalink":"https://rikettsie.github.io/posts/iroh-rings-p2p-access-control/","summary":"How I designed a ring-based access control library on top of iroh — and why the Registry trait is the most important design decision in it.","title":"iroh-rings: ring-based access control for P2P resources"},{"content":"Inventing new ad-hoc protocols can be very fun: making two remote machines communicate over a channel, exchanging meaningful messages with each other.\nProfessional tides recently brought me to build a raw custom protocol from scratch, also ensuring that the traffic flowing in the wire was encrypted end-to-end.\nWhen I tackled the problem at first, I naively thought that what counts overall is the protocol part itself — clearly defining the exchanged message structures, the error codes in structured enums, and ultimately having the protocol handshakes well designed with all related callbacks in the application layer.\nThe encryption part would \u0026ldquo;just\u0026rdquo; make the flowing data opaque to any reader in the middle. And for that part, I told myself, I won\u0026rsquo;t invent anything — I\u0026rsquo;ll integrate the best in the art for asymmetric key exchange. We don\u0026rsquo;t rewrite crypto; we use it from established standards.\nI was not so wrong… but neither was I totally right. 
I learned it\u0026rsquo;s not as simple as it reads.\nLet\u0026rsquo;s step back to a clear-text data transfer first.\nA clear-text framing scenario The naive delimiter approach When two peers share knowledge of the protocol, they can simply agree on one special character (or a sequence of characters) to send on the wire to separate messages from each other, and make the receiving peer split the stream accordingly.\nFor example, many line-oriented protocols use \\n (newline) or \\r\\n as a frame delimiter — HTTP/1.1 headers, SMTP, Redis RESP. Others use a dedicated control byte, like 0x7E in HDLC or PPP.\nThe receiver accumulates bytes into a buffer, and every time the delimiter appears it slices off the buffer up to that point as a complete frame. Another component — usually called the protocol handler — then deserializes each byte chunk into a message structure recognized by the protocol and passes it to the application callback for action. If the chunk is unrecognized or malformed, it discards it and takes other side actions (error notification, silent ignore, and so on).\nReadable and easy to debug with a packet sniffer\u0026hellip; but\u0026hellip;\nThe problem: delimiter collision in the payload There is a catch. What if the payload itself contains the delimiter byte?\nSay both peers agree that 0x00 (null byte) terminates every frame. The sender builds a frame and writes it to the wire followed by 0x00. The receiver reads bytes until it sees 0x00 and calls that a frame. All good — until one frame\u0026rsquo;s payload legitimately contains a 0x00 byte somewhere in the middle.\nThe receiver sees that byte, thinks the frame is done, and slices the buffer prematurely. Downstream, the parsing fails because the data is malformed. The two peers are now out of sync and the protocol is broken.\nThis is not a theoretical edge case! It happens all the time with binary payloads: in files, numerical data or serialized structs, zero bytes are common. 
You cannot just \u0026ldquo;pick a byte value supposing it won\u0026rsquo;t appear in the data\u0026rdquo; without restricting what data you want to transmit.\nWith encrypted data this heuristic disappears entirely. Good encryption produces output statistically indistinguishable from a uniform distribution — every byte value from 0x00 to 0xFF is equally likely. A multi-byte delimiter only buys your parser a bit more time before crashing: a 4-byte sequence has a 1-in-4-billion chance of appearing at any given offset, but across millions of frames that probability accumulates toward certainty.\nByte stuffing solves it The standard fix is byte stuffing (also called byte escaping). The idea is:\nPick two special byte values: a frame delimiter (e.g. 0xC0) and an escape byte (e.g. 0xDB). Before sending to the wire, scan the payload. Every time the delimiter byte appears, replace it with the two-byte sequence [ESC, FLAG]. Every time the escape byte itself appears, replace it with [ESC, ESC]. Then append the real delimiter. On the receiving end, reverse the substitution as bytes arrive: after an ESC, the next byte is always FLAG or ESC. Now an unescaped delimiter byte can only ever mean \u0026ldquo;end of frame\u0026rdquo;: any occurrence of the delimiter value inside the payload has been escaped before hitting the wire (i.e. sent as [ESC, FLAG]). The two peers stay in sync regardless of payload content.\nThis is the principle behind byte stuffing in PPP (Point-to-Point Protocol), defined in RFC 1662, which additionally transforms each escaped byte rather than sending it unchanged.\nWhy encrypted data makes this harder Ciphertext looks like random bytes Good symmetric encryption — AES in CTR mode, ChaCha20, anything AEAD — produces output that is computationally indistinguishable from uniformly random bytes. This is not an accident; it is a hard security requirement. 
If ciphertext had detectable structure, an attacker could exploit that structure to infer information about the plaintext.\nThe practical consequence is that in the encrypted stream, every byte value from 0x00 to 0xFF appears with roughly equal probability. Over a session with many frames, any specific byte value will appear in ciphertext constantly.\nThe framing layer must operate on ciphertext Framing and encryption are two independent concerns, but they interact:\nEncryption makes the message content opaque. Framing tells the receiver where a message starts and ends. The receiver must be able to find frame boundaries before it can decrypt anything. So the framing mechanism has to work at the ciphertext level, on bytes that look random.\nTwo approaches to encrypted framing Length-prefix framing The simpler and more widely used approach: prepend each frame with a fixed-size field stating the exact byte length of the payload that follows.\n[ length : 4 bytes | ciphertext : N bytes ] The receiver reads exactly 4 bytes to know the length N, then reads exactly N bytes for the payload ciphertext. No delimiter, no scanning, no stuffing needed. The structure is self-delimiting.\nThis is what TLS 1.3 does: the TLS record layer wraps each encrypted fragment with a 5-byte header that includes a 2-byte length field. The receiver reads the header, learns how many bytes to expect, and reads them at once without any scanning.\nSignal follows the same principle, but delegates framing to the WebSocket layer rather than implementing it at the application level. Encrypted payloads are sent as binary WebSocket frames; WebSocket\u0026rsquo;s own frame header (RFC 6455) carries the length, so the receiver always knows exactly how many bytes to read before handing the blob to the decryption layer. 
The application code in libsignal never writes a length prefix explicitly: it encrypts with the Noise protocol or the Double Ratchet, then passes the ciphertext to the WebSocket client, which handles the rest.\nWireGuard does the same: each UDP packet carries an encrypted payload whose length is implicit from the UDP datagram length — the IP/UDP stack provides the framing for free at the network layer, and WireGuard just decrypts the contents.\nThe main design decision is the size of the length field: 2 bytes caps frames at 65 535 bytes, 4 bytes at ~4 GB. Choose the size that best fits your protocol, and explicitly reject larger frames.\nDelimiter + byte stuffing On character-oriented or byte-stream channels where a length prefix is impractical — serial lines, legacy embedded links, some radio protocols (which was my case) — byte stuffing remains the right approach: I picked a frame delimiter, stuffed any occurrence of that delimiter (and the escape byte) in the ciphertext before transmitting, then appended the delimiter, reversing the process on the other end.\nThe tradeoff compared to length-prefix framing:\nLength-prefix Byte stuffing Overhead Fixed (e.g. 4 bytes/frame) Variable (0–100% of payload) Scanning None — read exact N bytes Scan every byte Resync on error Scan for next valid header Scan for next delimiter byte Typical use TCP-based encrypted protocols Serial / character-oriented links For a TCP-based custom protocol, length-prefix is almost always the right choice. TCP is a byte stream with no message boundaries — a single read() can return half a message or three merged together. The length field lets the receiver issue two bounded reads (header, then exactly N bytes) with no scanning. Byte stuffing shines when you are working on a byte-at-a-time channel where you cannot buffer ahead.\nByte stuffing in detail The algorithm uses exactly two reserved byte values:\nFLAG (frame delimiter): e.g. 0xC0 ESC (escape byte): e.g. 
0xDB Before sending a frame:\nWrite FLAG to the wire (marks the frame start). Walk every byte b of the payload: If b == FLAG: emit ESC, then FLAG. If b == ESC: emit ESC, then ESC. Otherwise: emit b unchanged. Write a final FLAG (marks the frame end). The rule is simple: every FLAG or ESC that belongs to the payload is preceded by an ESC. A bare FLAG — one not preceded by ESC — is always and only a frame boundary.\nExample — stuffing the 4-byte payload [0xC0, 0x41, 0xDB, 0x42]:\nInput: C0 41 DB 42 Stuffed: DB C0 41 DB DB 42 Framed: C0 DB C0 41 DB DB 42 C0 ^ ^^^^ ^^^^ ^ | ESC+FLAG ESC+ESC end FLAG start FLAG Worst-case overhead is 2× (every byte in the payload is either FLAG or ESC). In practice, for random ciphertext, overhead is around 2/256 ≈ 0.8% per frame on average — each reserved value appears roughly once every 256 bytes, so one of the two shows up about once every 128 bytes and costs one extra byte.\nByte unstuffing The receiver runs a simple two-state machine as bytes arrive:\nNORMAL state: accumulate bytes into the frame buffer. FLAG received -\u0026gt; the buffer holds a complete frame; deliver it to the upper layer and reset. ESC received -\u0026gt; transition to the ESCAPE state. Anything else -\u0026gt; append to buffer. ESCAPE state: the next byte is an escaped value. FLAG received -\u0026gt; append FLAG to buffer; transition back to the NORMAL state. ESC received -\u0026gt; append ESC to buffer; transition back to the NORMAL state. Anything else -\u0026gt; protocol error (invalid escape sequence); discard the current frame, back to the NORMAL state. ┌─────────────────────────────────────┐ │ │ ▼ │ other byte -\u0026gt; buffer ┌─────────┐ ESC ┌──────────┐ │ │ NORMAL │─────────▶│ ESCAPE │─────────┘ └─────────┘ └──────────┘ │ │ FLAG │ FLAG │ -\u0026gt; buffer FLAG │ ESC │ -\u0026gt; buffer ESC ▼ other │ -\u0026gt; protocol error deliver frame On any error the machine returns to NORMAL and discards the in-progress frame. 
The next FLAG it encounters, one not preceded by ESC, resets the buffer and starts a fresh frame. That is the resync: no out-of-band signal needed, the stream self-corrects at the next delimiter.\nError conditions worth handling explicitly:\nUnknown escape sequence: discard the in-progress frame, log a protocol error, remain synchronized — the next FLAG will start a fresh frame. Frame too large: if the buffer grows beyond your declared maximum frame size before seeing a FLAG, something is wrong (stuffing error or desync). Better to discard and reset the communication. Truncated frame at connection close: if the connection drops in the middle of a frame, discard the partial buffer. Frame structure design Whether you use length-prefix or byte stuffing for framing, the content of each frame deserves a careful design. A reasonable structure for an encrypted custom protocol:\n┌────────────┬────────────┬─────────────────────┬────────────┐ │ magic │ seq_no │ ciphertext │ auth_tag │ │ 2 bytes │ 4 bytes │ N bytes │ 16 bytes │ └────────────┴────────────┴─────────────────────┴────────────┘ Magic word — a fixed 2-byte value (e.g. 0xBEEF) at the start of the frame, used for resynchronization with length-prefix framing, where there is no delimiter to scan for after losing sync: after a desync event, the receiver can scan the incoming stream for the magic value and attempt to resume parsing from there. With byte stuffing, the magic word is largely redundant: FLAG already provides resync and AEAD authentication will reject any garbage frame.\nSequence number — a monotonically incrementing counter per sender. The receiver uses it to detect replayed frames and reordered delivery. The sender must increment it on every frame; the receiver must reject any frame whose sequence number is not strictly greater than the last accepted one.\nCiphertext — the encrypted message payload.
I used an AEAD cipher (ChaCha20-Poly1305). The authentication tag it produces covers both the ciphertext and any associated data you want to authenticate, typically the sequence number, so a frame whose sequence number has been altered fails verification.\nAuth tag — 16 bytes produced by the AEAD cipher. The receiver verifies this before doing anything with the payload. If verification fails, the frame is silently discarded (without sending back any detailed error, as that can leak information to an active attacker).\nEncrypt-then-frame vs frame-then-encrypt The order of operations matters and was surprisingly easy to get wrong the first time.\nFrame-then-encrypt: this means building the full plaintext frame (header + payload), then encrypting the entire thing. The problem here is that the receiver cannot know the length of the ciphertext before decrypting, so it would need an outer framing mechanism on the ciphertext anyway.\n\u0026hellip;instead\u0026hellip;\nEncrypt-then-frame: this means encrypting the payload first, then wrapping the ciphertext in a frame (length prefix, or stuffing + delimiter). The framing fields (length, magic, sequence number) travel in the clear and are used to delimit frames and to resync after an error.\nThis is what TLS 1.3, WireGuard, and Signal all do. It works very well because:\nThe receiver reads the clear-text framing fields to know how many ciphertext bytes to expect. It reads exactly that many bytes. It decrypts and verifies the auth tag. It delivers the plaintext to the application. The clear-text framing fields leak one piece of information: the length field reveals the approximate plaintext size, even if the content is opaque. This is a known side channel. TLS 1.3 addresses it with optional record padding — the sender can pad the plaintext to a round size before encrypting, hiding the true payload length. Signal pads certain message types for the same reason.
If payload length is sensitive in your protocol, consider doing the same.\nOne frame, one message — or more? So far I\u0026rsquo;ve implicitly assumed a 1:1 relationship: one frame carries exactly one serialized message. The sender never starts a new message mid-frame; the receiver dispatches exactly one message per successfully decrypted frame. This is the simplest design and a perfectly valid choice for a custom protocol.\nBut it leaks information. Of course, an observer watching the wire cannot read the content, but they can still see frame sizes and timing. If each frame maps to one message, frame size correlates directly with message size, and message timing is exposed in the clear. For many protocols that is acceptable. For a privacy-sensitive one, it is not.\nCoalescing messages inside the ciphertext By the way, the encrypt-then-frame ordering opens a useful opportunity: the plaintext that goes into the AEAD cipher does not have to be a single message. It can be a batch of several concatenated messages, or a real message padded with dummy bytes, or both.\n[ message A | message B | message C | padding ] \u0026lt;- plaintext │ ▼ [ encrypt with AEAD ] │ ▼ [ length | ciphertext + auth tag ] \u0026lt;- wire frame An observer sees one frame of a certain size. Whether that encodes one large message, three small ones, or a single real message with noise appended is invisible from the outside. This technique — sometimes called message coalescing or batching — hides both message size and message count within a time window.\nSignal does exactly this for attachments: PaddingInputStream wraps the plaintext stream and appends zero bytes up to the next bucket in a geometric progression (1.05^n, minimum 541 bytes) before the stream is passed to encryption in SignalServiceMessageSender. 
An observer can tell which bucket an attachment falls in, but not its exact size.\nInner framing inside the plaintext The consequence is that the decrypted plaintext is now a stream, not a single blob. The receiver needs to know where each message starts and ends within it. That requires a second, inner framing layer — and this one operates on trusted, already-authenticated data. There is no adversary at this layer, so it can be simple: for example a small length prefix per message.\n[ len_A : 2 bytes | message A | len_B : 2 bytes | message B | ... ] Thus, the receiver decrypts the outer frame, verifies the auth tag, then walks the inner stream parsing messages until the buffer is exhausted. Padding bytes at the end can be marked with a reserved message type (e.g. type 0x00 for padding) so the receiver knows to ignore them.\nThe two framing layers serve different purposes and can be designed independently:\nLayer Operates on Purpose Outer (wire) Ciphertext Frame boundary detection, resync Inner (app) Plaintext Message boundary, batching, padding Error detection and recovery The primary desync signal is an AEAD authentication failure, i.e. the decrypted bytes do not authenticate. A secondary indicator is a malformed header: an unrecognized magic word, a sequence number wildly out of range, or a declared length exceeding the maximum frame size. Either should trigger a desync response, not a transient error handler.\nFor recovery, it\u0026rsquo;s useful maintaining a counter of consecutive desync events. Below the threshold, attempting local resync (scanning for the next FLAG or magic word) is valuable. 
Above it, it is better to reset and rehandshake, as something is systematically wrong (scanning forward just burns CPU while the peers probably talk past each other).\nThe whole picture Putting it all together, the pipeline is composed of three layers, each with a distinct responsibility (described here from the receiver\u0026rsquo;s side):\n1- Outer framing: find the ciphertext blob on the wire, surviving desynchronization.\n2- Encryption: authenticate and decrypt, rejecting anything tampered with.\n3- Inner framing: recover individual messages from the plaintext batch, hiding their sizes and counts from the outside.\nThe sender\u0026rsquo;s pipeline runs them from the inside out:\n── LAYER 3: inner framing ─────────────────────────────────── message A message B message C │ │ │ └─────┬─────┘ │ │ (+ padding) │ ▼ │ [ prefix each message with its length ] │ ── LAYER 2: encryption ────────────────────────────────────── ▼ [ encrypt plaintext batch with AEAD -\u0026gt; ciphertext + auth tag ] │ ── LAYER 1: outer framing ─────────────────────────────────── ▼ [ build outer frame: magic | seq_no | ciphertext | auth_tag ] │ ▼ [ apply length prefix OR byte stuff ] │ ▼ wire And the receiver\u0026rsquo;s pipeline, symmetrically:\nwire │ ── LAYER 1: outer framing ─────────────────────────────────── ▼ [ read length field + N bytes OR scan for FLAG + unstuff ] │ ▼ [ check magic, extract seq_no, reject replays ] │ ── LAYER 2: encryption ────────────────────────────────────── ▼ [ decrypt + verify auth tag -\u0026gt; on failure: discard, count ] │ ── LAYER 3: inner framing ─────────────────────────────────── ▼ [ walk inner stream: extract messages by length prefix, discard padding ] │ ┌─────┴─────┐ ▼ ▼ message A message B ... │ │ ▼ ▼ [ dispatch each to protocol handler ] Getting all three layers right, in that order, is what finally gave me a robust encrypted protocol on the wire.\nSummarizing the solution in Rust The protocol had to run over a serial port.\nI instantiated two dedicated threads, one for sending, one for receiving.
Each thread owns its half of the port.\nMessages flow through std::sync::mpsc channels between the application and the two I/O loops.\nTokio would have been overkill here: the target was a resource-constrained embedded Linux board, and in this context two blocking threads with standard channels keep dependencies minimal.\nimpl Read + impl Write abstracts the wire. In production the port was backed by a UART-connected transceiver, exposed by the OS as a serial device. The same setup would apply to narrow-bandwidth RF links where the radio module exposes a transparent serial interface (this is common with LoRa and FSK transceivers, for example).\nThe frame layout is [ seq: 4 bytes | ciphertext + auth_tag ]. The sequence number doubles as the AEAD nonce (padded to 12 bytes) and as authenticated associated data, so a replayed or reordered frame fails the sequence check, and a frame whose sequence number was tampered with fails the auth tag verification.\nuse chacha20poly1305::{aead::{Aead, KeyInit, Payload}, ChaCha20Poly1305, Nonce}; use std::{io::{Read, Write}, sync::mpsc}; const FLAG: u8 = 0xC0; // the chosen inter-frame separator const ESC: u8 = 0xDB; // the chosen escape character fn byte_stuff(src: \u0026amp;[u8]) -\u0026gt; Vec\u0026lt;u8\u0026gt; { let mut out = vec![FLAG]; for \u0026amp;b in src { match b { FLAG =\u0026gt; out.extend_from_slice(\u0026amp;[ESC, FLAG]), ESC =\u0026gt; out.extend_from_slice(\u0026amp;[ESC, ESC]), _ =\u0026gt; out.push(b), } } out.push(FLAG); out } fn byte_unstuff(src: \u0026amp;[u8]) -\u0026gt; Option\u0026lt;Vec\u0026lt;u8\u0026gt;\u0026gt; { let mut out = Vec::new(); let mut i = 0; while i \u0026lt; src.len() { match src[i] { ESC =\u0026gt; match src.get(i + 1) { Some(\u0026amp;b @ (FLAG | ESC)) =\u0026gt; { out.push(b); i += 2; } _ =\u0026gt; return None, }, b =\u0026gt; { out.push(b); i += 1; } } } Some(out) } fn nonce(seq: u32) -\u0026gt; Nonce { let mut n = [0u8; 12]; // ChaCha20-Poly1305 nonce is 96 bits n[8..].copy_from_slice(\u0026amp;seq.to_be_bytes()); // seq in the
last 4 bytes Nonce::from(n) } fn decrypt_frame( frame: \u0026amp;[u8], cipher: \u0026amp;ChaCha20Poly1305, last_seq: \u0026amp;mut Option\u0026lt;u32\u0026gt; ) -\u0026gt; Option\u0026lt;Vec\u0026lt;u8\u0026gt;\u0026gt; { if frame.len() \u0026lt; 4 + 16 { return None; } // 4 seq + 16 AEAD auth tag (minimum) let seq = u32::from_be_bytes( frame[..4].try_into().ok()? ); // first 4 bytes: seq if last_seq.is_some_and(|p| seq \u0026lt;= p) { return None; } let pt = cipher.decrypt( \u0026amp;nonce(seq), Payload { msg: \u0026amp;frame[4..], aad: \u0026amp;frame[..4] } ).ok()?; // frame[4..] = ciphertext + auth tag | frame[..4] = seq as AAD *last_seq = Some(seq); Some(pt) } fn send_loop(mut w: impl Write, rx: mpsc::Receiver\u0026lt;Vec\u0026lt;u8\u0026gt;\u0026gt;, cipher: ChaCha20Poly1305) { let mut seq: u32 = 0; for msg in rx { let aad = seq.to_be_bytes(); let ct = cipher.encrypt(\u0026amp;nonce(seq), Payload { msg: \u0026amp;msg, aad: \u0026amp;aad }).unwrap(); let mut frame = aad.to_vec(); frame.extend_from_slice(\u0026amp;ct); w.write_all(\u0026amp;byte_stuff(\u0026amp;frame)).unwrap(); seq = seq.wrapping_add(1); } } fn recv_loop(mut r: impl Read, tx: mpsc::Sender\u0026lt;Vec\u0026lt;u8\u0026gt;\u0026gt;, cipher: ChaCha20Poly1305) { let (mut buf, mut b) = (Vec::new(), [0u8; 1]); let mut last_seq: Option\u0026lt;u32\u0026gt; = None; let mut desync_count: u32 = 0; // resync: count consecutive bad frames const DESYNC_LIMIT: u32 = 5; loop { if r.read_exact(\u0026amp;mut b).is_err() { break; } match b[0] { FLAG if buf.is_empty() =\u0026gt; {} FLAG =\u0026gt; { let ok = byte_unstuff(\u0026amp;buf) .and_then(|frame| decrypt_frame(\u0026amp;frame, \u0026amp;cipher, \u0026amp;mut last_seq)) .map(|pt| { let _ = tx.send(pt); }) .is_some(); // resync: escalate to rehandshake after repeated failures if ok { desync_count = 0; } else { desync_count += 1; if desync_count \u0026gt;= DESYNC_LIMIT { break; } } buf.clear(); } _ =\u0026gt; buf.push(b[0]), } } } In main(), the two loops are 
started with thread::spawn(move || send_loop(...)) and thread::spawn(move || recv_loop(...)), each taking ownership of its port half and cipher instance via the move closure.\nCode refs rustls — encrypt-then-frame in TLS 1.3 The clearest entry point is the send pipeline in rustls/src/conn/send.rs (lines 126–128):\nself.sendable_tls.append( self.encrypt_state.encrypt_outgoing(m).encode() ); encrypt_outgoing(m) seals the plaintext fragment with AEAD and returns an EncodedMessage\u0026lt;OutboundOpaque\u0026gt; — ciphertext only, no header yet. .encode() then writes the 5-byte TLS record header, including the 2-byte big-endian length of the now-encrypted payload.\nThat length-writing step lives in rustls/src/crypto/cipher/messages.rs (lines 191–197):\npub fn encode(self) -\u0026gt; Vec\u0026lt;u8\u0026gt; { let length = self.payload.len() as u16; let mut encoded_payload = self.payload.0; encoded_payload[0] = self.typ.into(); encoded_payload[1..3].copy_from_slice(\u0026amp;self.version.to_array()); encoded_payload[3..5].copy_from_slice(\u0026amp;(length).to_be_bytes()); encoded_payload } The 5-byte header slot is pre-allocated by OutboundOpaque::with_capacity() (lines 271–275) and stays zeroed until encode() fills it in — after encryption, never before.\nSignal — WebSocket as the framing layer Signal does not write a length prefix at the application level. Instead it encrypts with the Noise protocol (for attestation connections) or the Double Ratchet (for messages), then hands the ciphertext to the WebSocket client as a binary frame:\n// rust/net/infra/src/ws/attested.rs pub async fn send_bytes(\u0026amp;mut self, plaintext: \u0026amp;[u8]) -\u0026gt; Result\u0026lt;(), AttestedConnectionError\u0026gt; { let message = client_connection.send(plaintext)?; Ok(ws_client.write(message).await?) } (rust/net/infra/src/ws/attested.rs)\nThe framing is handled by the WebSocket layer itself. 
Each WebSocket frame carries a length field in its header per RFC 6455 §5.2, so the receiver always knows how many bytes to read before passing the blob to decryption. The encrypt-then-frame principle holds — it is just the WebSocket protocol that provides the \u0026ldquo;frame\u0026rdquo; part.\n","permalink":"https://rikettsie.github.io/posts/encrypted-protocols-three-layers/","summary":"How two peers exchanging encrypted frames can synchronize with each other — and why byte stuffing exists.","title":"Why encrypted protocols need three layers to stay in sync"},{"content":"Content addressability means an object\u0026rsquo;s address is its hash. For example you ask for af1349b9f5f9a1a6a..., you get exactly the bytes this hash points to and any deviation is immediately detectable.\nThe interesting engineering problem is verifying a stream of bytes as they arrive, chunk by chunk, without buffering the whole thing first.\nBAO does this. It\u0026rsquo;s an implementation of BLAKE3 hash verified streaming.\nFrom BLAKE3 to BAO The BLAKE3 tree structure BLAKE3 hashes a file by splitting it into 1 KiB chunks, hashing each chunk independently, then combining chunk hashes in a binary tree (a Merkle tree), reducing pairs at each level until a single 32-byte root hash remains.\nThis tree is implicit: it is not stored anywhere, it is derived from the file length and a deterministic chunking scheme. The root hash is what BLAKE3 reports.\nWhat BAO adds BAO (invented by Jack O\u0026rsquo;Connor) makes the Merkle tree explicit and transferable. A .bao file contains:\nThe file length (8 bytes) All interior node hashes, in a defined pre-order layout The raw file data, in 1 KiB chunks This layout lets a parser verify any chunk as it arrives: walk the stored tree from the root down to the chunk\u0026rsquo;s position, check each parent hash against its children, check the leaf against the chunk data. 
No chunk is accepted until its ancestry back to the trusted root is valid.\nThis means that before a chunk\u0026rsquo;s bytes are handed to the application, the parser walks up the Merkle tree from that chunk\u0026rsquo;s leaf node all the way to the root, recomputing each parent hash from its two children and checking it matches the stored value. Only if every hash on that path matches, right up to the root hash you already trust, is the chunk considered genuine.\nIf any hash on the path is wrong, the chunk is rejected, even if the chunk data itself looks fine.\nSlice extraction The more powerful feature is slices: BAO can produce a proof for an arbitrary byte range [start, end). The proof is the subset of interior nodes needed to verify that range, plus the chunks themselves. A client can request bytes [512 KiB, 768 KiB) and verify them with ~2 KiB of proof data, without downloading the rest of the file.\nThis is what makes BAO useful for streaming large files over untrusted channels: video players, package managers and p2p networks can start processing data immediately and reject corruption at the chunk level.\nWhy it\u0026rsquo;s fast The proof for any range touches only O(log n) interior nodes, where n is the total number of chunks. A 1 GiB file has roughly 1 million 1 KiB chunks, so the proof path is at most ~20 hashes deep. Each hash is 32 bytes, verifying a path costs only a few BLAKE3 compressions, and the tree pre-order layout means those nodes (the hashes) are close together on disk (no random seeks required!).\nBAO in the wild iroh-blobs, the blob transfer crate in the iroh ecosystem, builds its transfer strategy on BAO. Blobs are identified by their BLAKE3 hash and transferred as BAO streams. A receiver can verify each chunk in-flight, request only the ranges it needs, and resume an interrupted transfer without re-downloading validated chunks.
The result is a verified streaming layer that works over QUIC, with no central index required.\nsendme, iroh\u0026rsquo;s file-sending tool, uses the same stack: a sender hashes a file with BLAKE3, advertises the hash, and the receiver fetches and verifies it as a BAO stream.\nringdrop is a streamed p2p file transfer tool with ring-based access control. Peers form authorized groups (rings) and share files via rdrop:// tickets that only ring members can redeem. It is built on iroh for QUIC transport and NAT traversal, and on bao-tree for verified streaming: a BLAKE3 bitfield tracks which chunks have been verified, so an interrupted download resumes exactly where it left off without re-transferring anything.\nCode refs bao — original spec and reference implementation by Jack O\u0026rsquo;Connor; src/encode.rs and src/decode.rs bao-tree — production library used by iroh; src/io/ and src/tree.rs BLAKE3 — the underlying hash; src/lib.rs iroh-blobs — blob transfer built on BAO; src/store/ and src/get.rs Bibliography O\u0026rsquo;Connor, J. (2019). BAO: An incremental hash format built on BLAKE3. Specification. https://github.com/oconnor663/bao/blob/master/docs/spec.md O\u0026rsquo;Connor, J., Aumasson, J.-P., Neves, S., \u0026amp; Wilcox-O\u0026rsquo;Hearn, Z. (2020). BLAKE3: One function, fast everywhere. https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf Merkle, R. C. (1988). A digital signature based on a conventional encryption function. Advances in Cryptology — CRYPTO \u0026lsquo;87, LNCS vol. 293. Springer. ","permalink":"https://rikettsie.github.io/posts/bao-verified-streaming/","summary":"How BAO makes BLAKE3 hashes useful for streaming: verifying data without buffering the whole file.","title":"BAO: verified streaming over content-addressed blobs"},{"content":"Most devices on the internet sit behind a NAT: the router assigns them a private address and rewrites packet headers on the way out.
Two peers behind separate NATs cannot reach each other directly: neither knows the other\u0026rsquo;s public address, and neither NAT will forward unsolicited packets.\nHole punching is the technique that makes direct connections possible despite this.\nHow NAT works (briefly) When a device behind NAT sends a packet out, the NAT creates a mapping: (private_ip:private_port) -\u0026gt; (public_ip:public_port). Packets returning to that public endpoint within the mapping lifetime are forwarded inward. The NAT does not forward packets arriving at a public port that has no existing mapping, i.e. one that has not been opened by earlier outbound traffic.\nThe hole-punching sequence Both peers connect to a relay server (also called signalling server) with a known public address. The relay can see each peer\u0026rsquo;s public IP and port (usually called the \u0026ldquo;reflexive address\u0026rdquo;) and forwards each peer\u0026rsquo;s reflexive address to the other. Now:\nPeer A sends a packet to peer B\u0026rsquo;s reflexive address. B\u0026rsquo;s NAT drops it (no mapping exists), but A\u0026rsquo;s NAT creates a mapping (an \u0026ldquo;OK, GO\u0026rdquo; at A\u0026rsquo;s side of the pipe!) for future incoming traffic from B\u0026rsquo;s public address.\nPeer B simultaneously (or almost) sends a packet to A\u0026rsquo;s reflexive address. A\u0026rsquo;s NAT at that point leverages the previously established mapping for B\u0026rsquo;s address and forwards the packet. B\u0026rsquo;s NAT creates its own mapping for A.\nBoth sides of the pipe are now open, at roughly the same time. The hole is made! Subsequent packets flow through without the relay.\nLimitations of TCP and plain UDP Hole punching over TCP is theoretically possible via simultaneous open — both peers send SYN packets to each other at exactly the same time, so each NAT creates a mapping before the other\u0026rsquo;s SYN arrives. In practice this is brittle on two counts.
First, some NATs track TCP state and actively block an incoming SYN that arrives at a port with only an outgoing SYN and no completed handshake — treating it as a potential SYN flood rather than a legitimate simultaneous open. Second, symmetric NATs assign a different external port per destination, so the reflexive address learned via the relay is useless for the direct SYN. The result is that TCP hole punching success rates vary too widely across NAT implementations to rely on.\nPlain UDP hole punching works more reliably, but leaves everything else to the application. NAT mappings for UDP typically expire after 30–300 seconds of inactivity, so the application must send keepalives or risk the hole closing. More critically, if the local IP or port changes — a mobile device switching from Wi-Fi to cellular, for instance — the mapping is gone and the whole punching sequence must restart from scratch. There is also no built-in encryption, ordering, or retransmission: the application owns all of that.\nWhy QUIC helps UDP hole punching existed long before QUIC, but QUIC makes life after the connection is established much cleaner. QUIC connections are identified by a connection ID embedded in the packet, not by the 4-tuple (src_ip, src_port, dst_ip, dst_port). This means:\nIf a NAT remaps the port after connection (common on mobile), the QUIC connection can survive via a connection migration leveraging this ID. You can attempt multiple paths simultaneously and promote whichever succeeds first without restarting the handshake. QUIC collapses the transport and TLS 1.3 crypto handshakes into 1 RTT (versus 2 RTTs for TCP + TLS). Since hole punching often involves multiple retries, a faster handshake directly reduces the latency cost of each failed attempt. Resumed sessions can be even faster, sending data in 0-RTT.
iroh\u0026rsquo;s approach iroh combines three layers:\nRelay servers: an always-on fallback, primarily used to exchange reflexive addresses, but in extreme cases to route data through as well. Direct path upgrade: iroh continuously attempts UDP hole punching; once a direct path is confirmed it is used for all data. Path monitoring: if the direct path goes silent or stale (NAT mapping expired), iroh falls back to the relay and reattempts punching. The result is a connection that starts instantly via relay and silently upgrades to direct, with transparent fallback, without the application layer needing to know.\nSymmetric NAT: the hard case Symmetric NAT assigns a different public port for each destination. The reflexive address seen by the relay is not the address a direct packet from the other side would arrive on. iroh tries to handle this with port-prediction heuristics, but there is no general guarantee of direct connectivity through symmetric NAT.\nCode refs iroh-net — src/magicsock/ — hole punching and path management iroh relay server — the relay used to exchange reflexive addresses pion/stun — STUN implementation in Go Bibliography Ford, B., Srisuresh, P., \u0026amp; Kegel, D. (2005). Peer-to-peer communication across network address translators. Proceedings of the USENIX Annual Technical Conference (USENIX ATC \u0026lsquo;05). Iyengar, J., \u0026amp; Thomson, M. (2021). QUIC: A UDP-based multiplexed and secure transport. RFC 9000. IETF. Rosenberg, J., et al. (2008). Session Traversal Utilities for NAT (STUN). RFC 5389. IETF. (updated by RFC 8489, 2020) Rosenberg, J., et al. (2010). Traversal Using Relays around NAT (TURN). RFC 5766. IETF. Seemann, M., \u0026amp; Huitema, C. (2024). Implementing NAT hole punching with QUIC. arXiv:2408.01791.
https://arxiv.org/pdf/2408.01791 ","permalink":"https://rikettsie.github.io/posts/nat-traversal-hole-punching-iroh/","summary":"How hole punching works, why TCP and plain UDP fall short, and how QUIC and iroh make direct P2P connections reliable.","title":"NAT traversal and hole punching: from UDP to QUIC with iroh"},{"content":"I recently had to parse a huge structured log file containing more than 100 million lines. Each line had a timestamp, a URL and other fields. The goal was to compute URL statistics fast, using only command line tools (a Python script or a small Rust program), without an aggregation database like Cassandra or ClickHouse.\nThe first question I needed to answer was: have I seen this URL before? A hash set would be the natural answer — no counts needed, just membership. But with 100 million distinct URLs, even a hash set is too large: each URL string is 50–100 bytes, putting the set somewhere between 5 and 10 GB — well beyond the available RAM in my situation.\nA Bloom filter is the right tool for this: a compact probabilistic structure that answers membership queries — \u0026ldquo;have I seen this key?\u0026rdquo; — using a fraction of the memory a hash table would require.\nHow it works A Bloom filter is an m-bit array, initialised to all zeros, paired with k independent hash functions, each mapping any key to a position in [0, m).\nInsertion: hash the key k times, set all k bits.\nQuery: hash the key k times, check all k bits. If any is 0, the key is definitely absent. If all are 1, the key is probably present.\nThere are no false negatives. False positives are possible: a queried key might hash to positions all set by other keys. That rate is tunable.\nBut how is this false-positive rate tuned?\nChoosing the best m and k For n expected insertions and a desired false-positive rate p:\nm = -n ln(p) / (ln 2)² k = (m / n) ln 2 At 1% false positives, each element costs ~9.6 bits — roughly 1.2 bytes regardless of key size.
A hash set holding the SHA-256 digests of 100 million URLs would need ~3 GB (32 bytes each, before any table overhead); the equivalent Bloom filter needs ~120 MB.\nWhat\u0026rsquo;s not covered No deletion in the basic structure. Deleting an element might clear bits shared with other elements, causing false negatives. To support deletion you must use a counting Bloom filter, in which presence bits are replaced with counters (at the cost of 3× to 4× more space). No enumeration: you cannot iterate over inserted elements. The bit array only records which positions were set by the hash functions, not the elements themselves. The mapping is one-way, i.e. from element to bit positions, but never the reverse. Moreover, multiple elements can set overlapping bits, so you cannot even tell which bits belong to which element. False positives are permanent: once a bit is set it stays set. For the use case I had to tackle, none of these were a problem.\nImplemented solutions In Python For this problem I used pybloomfiltermmap3, a Python library that backs the bit array with a memory-mapped file — which means the filter can be persisted across restarts and isn\u0026rsquo;t constrained by available RAM alone. The log was read as a stream through a Python generator, keeping memory use flat regardless of file size.
For each URL, the filter answered my initial question in constant time; unseen URLs were written to an output file and added to the filter.\nThe filter is the only data structure that grows, reaching roughly 120 MB for 100 million URLs, well within budget and orders of magnitude cheaper than a hash set.\nfrom collections.abc import Generator from pathlib import Path from pybloomfilter import BloomFilter def stream_urls(log_path: Path) -\u0026gt; Generator[str, None, None]: with open(log_path) as f: for line in f: fields = line.split() yield fields[1] # position of URL token in the line def deduplicate_urls(log_path: Path, out_path: Path) -\u0026gt; None: # Initialize the Bloom filter with: # - how many expected elements, # - the desired false-positive rate (1%) # - path to the memory-mapped file backing the bit array bf = BloomFilter(100_000_000, 0.01, \u0026#34;urls.bloom\u0026#34;) with open(out_path, \u0026#34;w\u0026#34;) as out: for url in stream_urls(log_path): if url not in bf: bf.add(url) out.write(url + \u0026#34;\\n\u0026#34;) In Rust Edit [May 24, 2026]: adding an \u0026ldquo;a posteriori\u0026rdquo; implementation for this use-case in Rust, with fastbloom, a crate I\u0026rsquo;ve recently discovered:\nuse std::fs::File; use std::io::{BufRead, BufReader, BufWriter, Write}; use std::path::Path; use fastbloom::BloomFilter; fn stream_urls(log_path: \u0026amp;Path) -\u0026gt; impl Iterator\u0026lt;Item = String\u0026gt; { BufReader::new(File::open(log_path).expect(\u0026#34;failed to open log\u0026#34;)) .lines() .filter_map(|line| { let line = line.ok()?; Some(line.split_whitespace().nth(1)?.to_owned()) }) } fn deduplicate_urls(log_path: \u0026amp;Path, out_path: \u0026amp;Path) { let mut bf = BloomFilter::with_false_pos(0.01).expected_items(100_000_000); let mut out = BufWriter::new(File::create(out_path).expect(\u0026#34;failed to create output\u0026#34;)); for url in stream_urls(log_path) { if !bf.contains(\u0026amp;url) { bf.insert(\u0026amp;url);
writeln!(out, \u0026#34;{url}\u0026#34;).expect(\u0026#34;failed to write\u0026#34;); } } } One notable difference between Python\u0026rsquo;s pybloomfiltermmap3 and Rust\u0026rsquo;s fastbloom is that the latter lives only in memory: it has no mmap persistence. If the process dies, you must start over.\nCode refs prashnts/pybloomfiltermmap3 — Python Bloom filter backed by a memory-mapped file tomtomwombat/fastbloom RocksDB — bloom_impl.h — production Bloom filter used per SSTable Bibliography Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422–426. Broder, A., \u0026amp; Mitzenmacher, M. (2004). Network applications of Bloom filters: A survey. Internet Mathematics, 1(4), 485–509. Fan, B., Andersen, D. G., Kaminsky, M., \u0026amp; Mitzenmacher, M. (2014). Cuckoo filter: Practically better than Bloom. Proceedings of CoNEXT \u0026lsquo;14. ACM. Graf, T., \u0026amp; Lemire, D. (2019). Xor filters: Faster and smaller than Bloom and Cuckoo filters. ACM Journal of Experimental Algorithmics, 25. (arXiv:1912.08258) ","permalink":"https://rikettsie.github.io/posts/bloom-filters/","summary":"How to use a Bloom filter to check membership across 100 million URLs without blowing up RAM — and why a hash set wouldn\u0026rsquo;t cut it.","title":"Bloom filters: probabilistic membership at scale"},{"content":"Imagine your dataset is 200 GB, your RAM is 16 GB and you need it sorted.\nI learned the standard approach, which has been unchanged in its essentials since the 1960s!\nPhase 1 — Generate sorted runs Read as much data as fits in memory, sort it, write it back to disk. Repeat until the whole input is covered. Each chunk written out is called a run. In our example, with 16 GB of RAM you get roughly ceil(200 / 16) = 13 initial runs — so 13 sorted files on disk, each up to 16 GB, that together cover the full dataset but are not yet merged into a single sorted sequence.
That is what Phase 2 does.\nA variant called \u0026ldquo;Replacement selection\u0026rdquo; This is a slight improvement on Phase 1. Start by filling a min-heap with the first 16 GB read sequentially from disk. Then, instead of writing the whole heap out and starting over (which would be equivalent to the previous naive solution), proceed element by element in the following way:\npop the smallest value from the heap, write it to the current run, and immediately read the next element from the disk input;\nif that new element is greater than or equal to the last value written, it is inserted into the heap and will eventually be written out as part of the same run;\nif it is smaller, it is deferred to the next run. The heap stays full, acting as a sliding window over the input rather than a fixed chunk.\nOn random data this variant produces runs roughly twice the memory size, halving the number of initial runs. But let\u0026rsquo;s continue with Phase 2.\nPhase 2 — k-way merge Now you have N sorted runs on disk. Open k of them simultaneously, one read buffer per run, and merge them into a single output stream using a min-heap of size k. Each heap pop gives you the globally smallest remaining element.\nThe choice of k is a hardware trade-off:\nMore runs merged at once -\u0026gt; fewer passes over the data But each run needs a read buffer, and disk I/O is efficient only when buffers are large enough to amortize seek overhead In practice k is chosen so that k × buffer_size fits in available memory while leaving room for the output buffer.\nCounting the passes With N initial runs and merge fan-in k, sorting completes in:\npasses = ceil(log_k(N)) So, with 13 runs and k = 4, that is 2 passes. With k = 13, a single merge pass is enough.\nWhat databases do PostgreSQL\u0026rsquo;s external sort used replacement selection for run generation until version 11, which dropped it in favour of quicksorted runs, followed by a k-way heap merge. SQLite\u0026rsquo;s sorter follows the same runs-then-merge structure. 
MapReduce\u0026rsquo;s shuffle phase is a distributed external sort too: mappers produce sorted partitions (runs), reducers merge them.\nReferences PostgreSQL — tuplesort.c — replacement selection and k-way heap merge\nSQLite — vdbesort.c — external sort used by the virtual database engine\nApache Hadoop — MapReduce — distributed external sort in the shuffle phase\nKnuth, D. E. (1973). The Art of Computer Programming, Vol. 3: Sorting and Searching (1st ed.), Ch. 5.4 External Sorting. Addison-Wesley. (2nd ed. 1998) — the canonical reference that formalized external merge sort; the algorithms themselves trace back to IBM and Bell Labs tape-sorting work circa 1959–1963.\nVitter, J. S. (2008). Algorithms and data structures for external memory. Foundations and Trends in Theoretical Computer Science, 2(4), 305–474.\nSalzberg, B. (1989). Merging sorted runs using large main memory. Acta Informatica, 27(3), 195–215.\n","permalink":"https://rikettsie.github.io/posts/external-merge-sort/","summary":"How external merge sort handles datasets larger than RAM, using sorted runs and k-way merging.","title":"External merge sort: sorting datasets that don't fit in RAM"},{"content":"First of all: is Signal p2p? Signal is not peer-to-peer at the transport layer. Messages are routed through Signal\u0026rsquo;s central servers, which hold prekeys, queue messages for offline recipients. They are a required intermediary, i.e. you cannot use Signal without Signal\u0026rsquo;s infrastructure.\nAt the cryptographic layer, however, the server is a blind relay. It never sees plaintext, and with sealed sender it does not learn who is talking to whom. 
The trust model is end-to-end: the server is in the path but not in the circle of trust.\nThus, in contrast with decentralised (federated) messaging systems like Matrix, Signal trades decentralisation for simplicity and a tighter security model.\nThe rest of this post is about how this security model is built; it has been very interesting to me to learn all these bits.\nSignal\u0026rsquo;s security model Signal\u0026rsquo;s security goal is strict, and it targets three distinct properties.\nForward secrecy: even if current keys are compromised today, past messages stay private, because old message keys are deleted and cannot be reconstructed.\nBreak-in recovery: even if a client is compromised and current keys are stolen, the attacker loses access again after the next ratchet step.\nDeniability: Signal produces no unforgeable proof of authorship. Even if someone records your messages, they cannot prove to a third party that you wrote them — the other party could have forged any message, unlike with PGP signatures which bind a message irrevocably to a key.\nReaching this goal requires layering three cryptographic ideas.\n1. Asynchronous key agreement 1a. X3DH — Extended Triple Diffie-Hellman The X3DH protocol lets one peer (Alice) start encrypting messages to the other (Bob) even when Bob is offline.\nBob registers three kinds of keys with the server:\nIK_B — long-term identity key SPK_B — medium-term signed prekey (rotated weekly), signed by IK_B OPK_B — a batch of one-time prekeys (used once, ever) Alice generates an ephemeral key EK_A and computes four DH operations:\nDH1 = DH(IK_A, SPK_B) DH2 = DH(EK_A, IK_B) DH3 = DH(EK_A, SPK_B) DH4 = DH(EK_A, OPK_B) # omitted if no OPK available The shared secret is KDF(DH1 || DH2 || DH3 || DH4). Following the X3DH specification, DH1 and DH2 provide mutual authentication, since each involves one party\u0026rsquo;s identity key, while DH3 and DH4, which involve only ephemeral keys and prekeys, provide forward secrecy.\n1b. 
PQXDH — X3DH\u0026rsquo;s post-quantum successor In September 2023 Signal replaced X3DH with PQXDH (Post-Quantum Extended Diffie-Hellman), which was released in the client application version 6.35. The reason for the change is the harvest now, decrypt later threat: an adversary can record encrypted traffic today and store it until a large-scale quantum computer becomes available. Shor\u0026rsquo;s algorithm would then break the elliptic curve Diffie-Hellman at the heart of X3DH, retroactively exposing every past session.\nPQXDH addresses this by adding a post-quantum KEM (Key Encapsulation Mechanism), CRYSTALS-Kyber, alongside the existing X25519 exchange. Bob publishes an additional Kyber prekey; Alice encapsulates a secret under it and mixes the result into the shared secret alongside the four classical Diffie-Hellman (DH) outputs. An attacker must break both the classical and the post-quantum component to compromise the session, which is a much higher bar.\nAll of X3DH\u0026rsquo;s goals are preserved: asynchronous key agreement, forward secrecy, deniability, and the same prekey registration flow. The Double Ratchet that follows is unchanged.\n2. The Double Ratchet — per-message key evolution Once the key agreement (X3DH or PQXDH) establishes a root key, the Double Ratchet takes over for the conversation. It combines two ratchets:\nSymmetric-key ratchet (KDF chain): each message key is derived by passing the current chain key through a KDF, which also yields the next chain key. Deleting used keys gives forward secrecy — a stolen device cannot decrypt past messages.\nDiffie-Hellman ratchet: every time the other party replies, both sides perform a new DH exchange and inject the result into the root chain. This gives break-in recovery — if an attacker steals your current keys, they lose access as soon as the next DH ratchet step occurs.\n3. Sealed sender Without further protection, the Signal server would learn who is messaging whom. 
The sealed sender feature (introduced in 2018) wraps the sender\u0026rsquo;s identity inside the encrypted payload so the server sees only the recipient. Combined with the server\u0026rsquo;s lack of metadata logging, this makes traffic analysis considerably harder.\nDesign rationale The three layers (key agreement, double ratchet, sealed sender) each close a specific gap, targeting distinct threats:\nPQXDH: bootstrap without a live session Double Ratchet: key compromise containment over time Sealed sender: minimizes what the infrastructure knows about peer metadata The protocol\u0026rsquo;s security was formally analysed by Cohn-Gordon et al. (2016), who proved the Double Ratchet achieves \u0026ldquo;message secrecy\u0026rdquo; and \u0026ldquo;break-in recovery\u0026rdquo; under standard assumptions.\nReferences libsignal — rust/protocol/src/ — official implementation; ratchet/ and keys/ are the core\nSignal-Android — app/src/main/java/org/thoughtcrime/securesms/crypto/\npython-doubleratchet — pedagogical standalone Double Ratchet\nMarlinspike, M., \u0026amp; Perrin, T. (2016). The X3DH key agreement protocol. Signal Foundation. https://signal.org/docs/specifications/x3dh/\nPerrin, T. (2023). The PQXDH key agreement protocol. Signal Foundation. https://signal.org/docs/specifications/pqxdh/\nMarlinspike, M., \u0026amp; Perrin, T. (2016). The Double Ratchet algorithm. Signal Foundation. https://signal.org/docs/specifications/doubleratchet/\nCohn-Gordon, K., Cremers, C., Dowling, B., Garratt, L., \u0026amp; Stebila, D. (2016). A formal security analysis of the Signal messaging protocol. IEEE European Symposium on Security and Privacy (EuroS\u0026amp;P 2017). (ePrint 2016/1013)\nUnger, N., et al. (2015). SoK: Secure messaging. 
IEEE Symposium on Security and Privacy (S\u0026amp;P 2015).\n","permalink":"https://rikettsie.github.io/posts/signal-x3dh-pqxdh-double-ratchet/","summary":"How Signal achieves forward secrecy, break-in recovery, and deniability — from X3DH to PQXDH and the Double Ratchet.","title":"The Signal protocol: X3DH vs PQXDH, and the Double Ratchet"},{"content":"Distributed systems need a way to find things without a central index. Kademlia, introduced in 2002 by Petar Maymounkov and David Mazières, solves this with a deceptively simple idea: treat node IDs as points in a binary space, and measure distance with XOR.\nWhy XOR? XOR has properties that make it well suited to routing: for any two points A and B, the distance d(A, B) = A XOR B is:\nSymmetric: d(A, B) = d(B, A) Zero iff equal: d(A, B) = 0 if and only if A = B Triangle inequality holds (as a metric over GF(2)^n) Unidirectional: for a given point A and distance d, there is exactly one point B such that d(A, B) = d. This means lookups converge monotonically — you never overshoot. One space for everything A foundational decision in Kademlia is that node IDs and content keys live in the same space. This is a very important point that I didn\u0026rsquo;t precisely catch at the beginning of my study of Kademlia, and precisely what makes Kademlia work :).\nA node is assigned a random 160-bit ID; a file or value is stored under a key that is also a 160-bit identifier (typically the SHA-1 hash of the content; systems with a 256-bit keyspace, such as libp2p\u0026rsquo;s DHT, use SHA-256 instead). Both are just points in the same binary space, and XOR distance applies equally to both.\nThis means routing to a value works exactly like routing to a node: you treat the content key as if it were a node ID and walk toward it. No separate lookup mechanism is needed.\nThe simplicity this implies is considerable: one algorithm, one metric, one routing table, for both node discovery and content retrieval.\nThe routing table: k-buckets Each node maintains a routing table structured as a binary tree. 
For each distance range [2^i, 2^(i+1)), with i from 0 to 159, it keeps at most k contacts (typically k = 20). These are the k-buckets.\nWhen a bucket is full and a new contact arrives, Kademlia checks whether the least-recently seen node is still alive. If it responds, the new contact is dropped; if it has gone silent, it is evicted and the new contact takes its place. This bias toward long-lived nodes makes the routing table robust — nodes that have been up for an hour are statistically more likely to stay up than nodes that just joined.\nNode lookup To find the node closest to a target key t, you pick the α closest nodes you know (α, alpha, is the concurrency parameter, typically 3) and send them parallel FIND_NODE(t) requests. Each responds with its own closest known nodes.\nYou recurse, always advancing toward the target, until the closest node you\u0026rsquo;ve found stops changing.\nThe algorithm terminates in O(log n) rounds.\nWhy?\nBecause each step gets you at least 1 bit closer to the target in the XOR ordering, i.e. you halve the remaining distance in the XOR sense. There are at most 160 bits of prefix to exhaust, but with n nodes the effective depth is log(n) since the tree is sparse beyond that.\nStoring and retrieving values STORE(key, value) publishes to the k nodes closest to the key.\nFIND_VALUE(key) runs like a node lookup but stops as soon as any node returns the stored value. Publishers re-announce every hour so content stays alive even as nodes churn.\nThus the k closest nodes all store a copy of the value — k (typically 20) is chosen to be large enough that even with significant churn (many nodes coming and going in a short time), at least one node holding the value is likely to remain online. 
This is deliberate replication for availability.\nThe re-announcement every hour (publishers re-publish the value periodically) is a complementary mechanism for resilience: as the set of closest nodes shifts due to churn, the value gets replicated to newly-closest nodes before old ones disappear.\nSome trailing words about the XOR metric The XOR metric is an example of how the right mathematical abstraction collapses complexity: one operator unifies node lookup and content retrieval, routing and storage.\nMoreover, for any point and any distance there is exactly one point at that XOR distance, which is what makes convergence inevitable!\nComparison with the \u0026ldquo;Chord\u0026rdquo; alternative: Chord (by Stoica et al., 2001) is another Distributed Hash Table (DHT), which organizes nodes on a one-dimensional ring, ordered by ID. Routing hops along the ring using finger tables that jump by powers of two. It achieves O(log n) lookups like Kademlia, but the ring metric is asymmetric, i.e. the distance from A to B is not the same as from B to A, so routing tables require careful directional logic and lookups have edge cases at ring boundaries. XOR is symmetric by definition, which implies a node\u0026rsquo;s routing table covers both directions simultaneously and the lookup algorithm needs no special cases.\nThe deeper point is that XOR is not just a convenience — it is an unusually good fit for this problem. It maps naturally onto binary tries, it guarantees monotonic convergence, and it makes the keyspace for nodes and content truly interchangeable. Chord achieves similar complexity but with more \u0026ldquo;machinery\u0026rdquo;. Kademlia finds the elegant path: one binary operator, and the rest follows.\nReferences bmuller/kademlia — clean Python reference; routing.py for the bucket logic\ngo-libp2p-kad-dht — IPFS\u0026rsquo;s implementation; routing_table.go\nMaymounkov, P., \u0026amp; Mazières, D. (2002). 
Kademlia: A peer-to-peer information system based on the XOR metric. Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS \u0026lsquo;02). Lecture Notes in Computer Science, vol 2429. Springer.\nBaumgart, I., \u0026amp; Mies, S. (2007). S/Kademlia: A practicable approach towards secure key-based routing. 13th International Conference on Parallel and Distributed Systems (ICPADS \u0026lsquo;07). IEEE.\nStoica, I., Morris, R., Karger, D., Kaashoek, M. F., \u0026amp; Balakrishnan, H. (2001). Chord: A scalable peer-to-peer lookup service for internet applications. Proceedings of ACM SIGCOMM \u0026lsquo;01.\n","permalink":"https://rikettsie.github.io/posts/kademlia-xor-distance/","summary":"The working principle of Kademlia, in particular relating to the XOR distance metric.","title":"The beauty of Kademlia: the XOR distance"}]