
murmer — distributed actors for rust


Introduction

Murmer is a distributed actor framework for Rust, built on tokio and QUIC.

It provides typed, location-transparent actors that communicate through message passing. Whether an actor lives in the same process or on a remote node across the network, you interact with it through the same Endpoint<A> API.

Why I built this

I’ve spent years working with Elixir and the BEAM VM, and the actor model there is something I’ve grown deeply fond of — the simplicity of processes, message passing, and supervision just works. When I looked at bringing that experience to Rust, I studied existing implementations like Actix, Telepathy, and Akka (on the JVM side). They’re impressive systems, but I kept running into the same friction: getting a basic actor up and running was complex, and adding remote communication across nodes was even more so.

Murmer is an experiment in answering a simple question: can you build a robust distributed actor system in Rust that’s actually simple to use?

The answer, it seems, is yes.

The design draws heavy inspiration from BEAM OTP’s supervision and process model, Akka’s clustering approach, and Apple’s Swift Distributed Actors for the typed, location-transparent endpoint API. The goal is to combine these ideas with Rust’s performance and safety guarantees — zero-cost local dispatch, compile-time message type checking, and automatic serialization over encrypted QUIC connections when actors span nodes.

Murmer in 1 minute

Install:

[dependencies]
murmer = "0.1"
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }

Write an actor, send it messages:

use murmer::prelude::*;

// ① Define your actor — state lives separately
#[derive(Debug)]
struct Counter;
struct CounterState { count: i64 }

impl Actor for Counter {
    type State = CounterState;
}

// ② Handlers become the actor's API
#[handlers]
impl Counter {
    #[handler]
    fn increment(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut CounterState,
        amount: i64,
    ) -> i64 {
        state.count += amount;
        state.count
    }

    #[handler]
    fn get_count(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut CounterState,
    ) -> i64 {
        state.count
    }
}

#[tokio::main]
async fn main() {
    // ③ Create a local actor system
    let system = System::local();

    // ④ Start an actor — returns a typed Endpoint<Counter>
    let counter = system.start("counter/main", Counter, CounterState { count: 0 });

    // ⑤ Send messages via auto-generated extension methods
    let result = counter.increment(5).await.unwrap();
    println!("Count: {result}"); // → Count: 5

    // ⑥ Look up actors by label — works for local and remote
    let found = system.lookup::<Counter>("counter/main").unwrap();
    let count = found.get_count().await.unwrap();
    println!("Looked up: {count}"); // → Looked up: 5
}

cargo run

That’s it — a complete, working actor system. The rest of this page explains what’s happening under the hood. The Getting Started chapter goes deeper into each component.

What it gives you

  • Send messages without caring where the actor lives. counter.increment(5) (line ⑤) works identically whether the actor is local or on a remote node — the Endpoint<A> API abstracts the difference away.
  • Test distributed systems from a single process. System::local() (line ③) runs everything in-memory. Swap to System::clustered() when you’re ready for real networking — your actor code stays identical.
  • Define actors with minimal boilerplate. The #[handlers] macro (line ②) auto-generates message structs (Increment, GetCount), dispatch tables, serialization, and the extension methods you call on line ⑤.
  • Get networking and encryption handled for you. QUIC transport with automatic TLS, SWIM-based cluster membership, and mDNS discovery — all configured, not hand-rolled.
  • Supervise actors like OTP. Restart policies (Temporary, Transient, Permanent) with configurable limits and exponential backoff keep your system running through failures.
  • Orchestrate applications across a cluster. The app module adds placement strategies, leader election, and crash recovery — so you can declare what should run and where, and the framework handles the rest.

What’s happening: line by line

① Actor + State — Counter is a zero-sized struct. All mutable state lives in CounterState, passed as &mut to every handler. This keeps the actor lightweight and the state threading explicit.

② #[handlers] — The macro reads your method signatures and generates:

  • Message structs — Increment { pub amount: i64 } and GetCount (unit struct)
  • Handler<Increment> and Handler<GetCount> trait implementations
  • RemoteDispatch — a wire-format dispatch table so remote nodes know how to route messages to the right handler
  • CounterExt — an extension trait on Endpoint<Counter> that gives you .increment(amount) and .get_count() methods

③ System::local() — Creates the actor runtime and boots the Receptionist — the internal actor registry that tracks all actors by label and type.

④ system.start(...) — Wraps Counter in a Supervisor that manages its lifecycle, mailbox, and restart behavior. Registers it with the Receptionist under the label "counter/main". Returns an Endpoint<Counter> — your typed send handle.

⑤ counter.increment(5) — The extension method constructs an Increment { amount: 5 } message and sends it through the Endpoint. Since this is a local actor, the message is dispatched as a zero-copy envelope through the Supervisor’s mailbox to the handler. The result comes back through a oneshot channel.
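Conceptually, the local path is the classic channel-plus-reply pattern. Here is a stripped-down sketch using plain std threads and channels (illustrative only; murmer's real envelope, mailbox, and oneshot types differ):

```rust
use std::sync::mpsc;
use std::thread;

// A request envelope: the message fields plus a reply channel
// (a std mpsc channel stands in for a oneshot here).
struct Increment {
    amount: i64,
    reply: mpsc::Sender<i64>,
}

// Spawn a "counter actor", send each amount, return the final count.
fn run_counter(amounts: &[i64]) -> i64 {
    let (tx, rx) = mpsc::channel::<Increment>();

    // The "actor": owns its state, processes one envelope at a time.
    let actor = thread::spawn(move || {
        let mut count = 0i64;
        for msg in rx {
            count += msg.amount;            // handler body
            let _ = msg.reply.send(count);  // respond via the reply channel
        }
    });

    // The "endpoint" side: send a message, block until the reply arrives.
    let mut last = 0;
    for &amount in amounts {
        let (reply_tx, reply_rx) = mpsc::channel();
        tx.send(Increment { amount, reply: reply_tx }).unwrap();
        last = reply_rx.recv().unwrap();
    }
    drop(tx); // close the mailbox so the actor loop ends
    actor.join().unwrap();
    last
}

fn main() {
    println!("Count: {}", run_counter(&[5])); // prints "Count: 5"
}
```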

⑥ system.lookup(...) — Queries the Receptionist for an actor of type Counter at label "counter/main". Returns the same Endpoint<Counter>. In a clustered system, this could return a proxy endpoint that transparently serializes messages over QUIC to a remote node.

Core concepts

| Concept | In the example | Purpose |
|---|---|---|
| Actor | Counter + CounterState | Stateful message processor. Actor has no fields — state lives separately. |
| Message | Generated Increment, GetCount | Defines a request and its response type. |
| RemoteMessage | Generated by #[handlers] | A message that can cross the wire (serializable + TYPE_ID). |
| Endpoint | counter from system.start(...) | Opaque send handle. Abstracts local vs remote — callers never know which. |
| Receptionist | Powers system.lookup(...) | Type-erased actor registry. Start, lookup, and subscribe to actors. |
| Router | Not shown — see Discovery | Distributes messages across a pool of endpoints (round-robin, broadcast). |
| Listing | Not shown — see Discovery | Async stream of endpoints matching a ReceptionKey. |

Architecture

Architecture: System → Receptionist → Supervisor → Endpoint → Cluster

Every layer in this diagram is touched by the example code:

  • System — created at line ③, runs the entire runtime
  • Receptionist — populated at line ④ (start), queried at line ⑥ (lookup)
  • Supervisor Layer — wraps Counter at line ④, manages its mailbox, would handle restarts if configured
  • Endpoint‹A› — returned at line ④, used at line ⑤ to send messages; local dispatch here, but swap to System::clustered() and the same endpoint transparently routes over QUIC
  • Cluster Layer — not active in System::local(), but requires zero code changes to enable (see Clustering)

Key design decisions

  • Endpoint<A> is the only API — callers never know if an actor is local or remote.
  • Receptionist is non-generic — stores type-erased entries internally, uses TypeId guards for safe downcasts at lookup time.
  • Supervisors are flat — each actor has its own supervisor, no parent-child hierarchy.
  • Labels are paths — "cache/user", "worker/0", "thumbnail/processor/3". Hierarchical naming for organizational clarity.
  • Fail-fast networking — if a QUIC stream fails, all pending responses error immediately instead of hanging.

From primitives to applications

Murmer works at two levels:

The core (Actors, Discovery, Supervision, Clustering) gives you the building blocks — everything you saw in the example above. You can build complete services with just these primitives.

The application layer (Application Orchestration) builds on top of the core to manage real, running applications across a cluster. You declare what actors should run, where they should be placed (with constraints like “must have GPU” or “must be a Worker node”), and what happens when a node fails — and the Coordinator handles placement, spawning, and crash recovery automatically.

murmer core primitives flowing into app module orchestration

Learn more

Getting Started

This chapter goes deeper into the components you saw in the introduction. If you haven’t seen the 1-minute example yet, start there — this chapter assumes you’ve seen the basics and want to understand more.

Dependencies

[dependencies]
murmer = "0.1"
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }

murmer is the core framework. The macros feature (on by default) re-exports #[handlers], #[handler], and #[derive(Message)] from murmer-macros — no separate dependency needed. Both serde and tokio are required — serde for message serialization (even in local mode, the types need to be Serialize + Deserialize for remote readiness), and tokio as the async runtime.

The Actor trait

Every actor implements the Actor trait, which has one required associated type: State.

use murmer::prelude::*;

#[derive(Debug)]
struct ChatRoom;

struct ChatRoomState {
    room_name: String,
    messages: Vec<ChatEntry>,
    max_messages: usize,
}

struct ChatEntry {
    from: String,
    text: String,
    timestamp: u64,
}

impl Actor for ChatRoom {
    type State = ChatRoomState;
}

Why state lives separately

This is a deliberate design choice. In many actor frameworks, state lives directly on the actor struct. In murmer, the actor struct is typically empty (zero-sized) and all mutable state lives in the associated State type.

This gives you:

  • Explicit state threading — every handler receives &mut State, making it clear what data is being read and modified.
  • Clean restarts — when a supervisor restarts an actor, the factory creates a fresh (Actor, State) pair. No hidden state carried over from a crashed instance.
  • Separation of identity and data — the actor struct can carry configuration or immutable context (like a database pool handle), while State holds the mutable per-instance data.

You can put fields on the actor struct — they just won’t be part of the restart cycle:

struct ChatRoom {
    db: DatabasePool,  // immutable, shared across restarts
}

struct ChatRoomState {
    messages: Vec<ChatEntry>,  // mutable, reset on restart
}

Defining handlers

Handlers are methods on the actor that process incoming messages. The #[handlers] macro on the impl block and #[handler] on individual methods does the heavy lifting:

#[handlers]
impl ChatRoom {
    #[handler]
    fn post_message(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
        from: String,
        text: String,
    ) -> usize {
        state.messages.push(ChatEntry {
            from,
            text,
            timestamp: now(),
        });
        // Trim if over limit
        if state.messages.len() > state.max_messages {
            state.messages.remove(0);
        }
        state.messages.len()
    }

    #[handler]
    fn get_history(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
    ) -> Vec<String> {
        state.messages.iter()
            .map(|e| format!("[{}] {}: {}", e.timestamp, e.from, e.text))
            .collect()
    }

    #[handler]
    fn room_name(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
    ) -> String {
        state.room_name.clone()
    }
}

Handler signature rules

Every handler method must follow this pattern:

fn method_name(
    &mut self,                          // always &mut self
    ctx: &ActorContext<Self>,           // always second — the actor's context
    state: &mut YourStateType,         // always third — mutable state access
    // ... additional parameters become message fields
) -> ReturnType {
    // ...
}

  • &mut self — the actor instance.
  • ctx: &ActorContext<Self> — provides access to the system, receptionist, the actor’s own label, and methods like ctx.watch() for actor monitoring. Prefix with _ if unused.
  • state: &mut State — the actor’s mutable state.
  • Additional parameters — each one becomes a field on the generated message struct. fn increment(... amount: i64) generates Increment { pub amount: i64 }.
  • Return type — becomes the message’s Result type. Handlers must return a value (not ()).

What gets generated

For the ChatRoom example above, the macro generates:

// Message structs
struct PostMessage { pub from: String, pub text: String }
struct GetHistory;  // unit struct — no extra params
struct RoomName;

// Trait implementations
impl Handler<PostMessage> for ChatRoom { /* ... */ }
impl Handler<GetHistory> for ChatRoom { /* ... */ }
impl Handler<RoomName> for ChatRoom { /* ... */ }

// Remote dispatch table
impl RemoteDispatch for ChatRoom { /* ... */ }

// Extension trait for ergonomic sends
trait ChatRoomExt {
    fn post_message(&self, from: String, text: String) -> impl Future<Output = Result<usize>>;
    fn get_history(&self) -> impl Future<Output = Result<Vec<String>>>;
    fn room_name(&self) -> impl Future<Output = Result<String>>;
}

impl ChatRoomExt for Endpoint<ChatRoom> { /* ... */ }

This means you can call endpoint.post_message("alice".into(), "hello".into()) directly on any Endpoint<ChatRoom>, without ever constructing a message struct yourself.

Async handlers

For handlers that need to perform async work (I/O, database queries, HTTP calls), use async fn:

#[handlers]
impl ChatRoom {
    #[handler]
    async fn fetch_and_store(
        &mut self,
        ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
        url: String,
    ) -> Result<usize, String> {
        // You can .await here — the supervisor handles scheduling
        let response = reqwest::get(&url).await
            .map_err(|e| e.to_string())?;
        let body = response.text().await
            .map_err(|e| e.to_string())?;

        state.messages.push(ChatEntry {
            from: "system".into(),
            text: body,
            timestamp: now(),
        });
        Ok(state.messages.len())
    }
}

Async handlers generate AsyncHandler<FetchAndStore> instead of Handler<FetchAndStore>. The supervisor processes async handlers cooperatively — while one handler is awaiting, no other messages are processed for that actor (preserving the single-writer invariant).

Explicit message types

The auto-generated messages from #[handlers] cover most cases, but sometimes you want to define a message type explicitly — for example, when multiple actors handle the same message:

use murmer::Message;
use serde::{Serialize, Deserialize};

#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = String)]
struct Ping {
    payload: String,
}

To use an explicit message in a handler, name the parameter msg:

#[handlers]
impl ChatRoom {
    #[handler]
    fn ping(
        &mut self,
        _ctx: &ActorContext<Self>,
        _state: &mut ChatRoomState,
        msg: Ping,          // "msg" signals: use this type directly
    ) -> String {
        format!("pong: {}", msg.payload)
    }
}

The msg parameter name tells the macro to use Ping as-is instead of generating a new message struct.

For messages that need to cross the network, add remote:

#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = String, remote = "my_app::Ping")]
struct Ping {
    payload: String,
}

The remote value is a unique string identifier used for wire-format routing. See the Proc Macro Reference for the full details.

The System

The System is the entry point for everything — it runs the actor runtime, manages the receptionist, and optionally handles clustering.

// Local mode — in-memory, no networking
let system = System::local();

// Clustered mode — QUIC networking, SWIM membership
let system = System::clustered_auto(config).await?;

Both modes expose the same API. Your actor code doesn’t change.

Starting actors

// Returns Endpoint<ChatRoom>
let room = system.start(
    "room/general",                   // label — unique within the cluster
    ChatRoom,                         // actor instance
    ChatRoomState {                   // initial state
        room_name: "general".into(),
        messages: vec![],
        max_messages: 1000,
    },
);

The label "room/general" is a path-like string that uniquely identifies this actor in the system. Labels are how actors find each other through the receptionist.

Looking up actors

// Type-safe lookup — returns Option<Endpoint<ChatRoom>>
let room = system.lookup::<ChatRoom>("room/general");

if let Some(ep) = room {
    let history = ep.get_history().await?;
}

Lookups are typed: the endpoint's message API is checked at compile time, while the label-to-type match is verified at runtime. If the label exists but the type doesn’t match, None is returned.

Endpoints in depth

Endpoint<A> is the central abstraction. It’s:

  • Typed — Endpoint<ChatRoom> can only send messages that ChatRoom handles
  • Cloneable — lightweight handle, share freely across tasks
  • Location-transparent — local endpoints dispatch through in-memory channels; remote endpoints serialize over QUIC

// All of these work identically
let room: Endpoint<ChatRoom> = system.start("room/1", ChatRoom, state);
let room: Endpoint<ChatRoom> = system.lookup::<ChatRoom>("room/1").unwrap();

// Clone and pass to another task
let room2 = room.clone();
tokio::spawn(async move {
    room2.post_message("bot".into(), "background task".into()).await.unwrap();
});

// Send via extension methods (ergonomic)
room.post_message("alice".into(), "hello".into()).await?;

// Or send a message struct directly
room.send(PostMessage { from: "alice".into(), text: "hello".into() }).await?;

Both .post_message(...) (extension method) and .send(PostMessage { ... }) (direct) do the same thing. The extension methods are more ergonomic for the common case.

Actor watches

Monitor other actors and get notified when they terminate — inspired by Erlang’s monitor/2:

struct Watchdog;
struct WatchdogState { terminated: Vec<String> }

impl Actor for Watchdog {
    type State = WatchdogState;

    fn on_actor_terminated(
        &mut self,
        state: &mut WatchdogState,
        terminated: &ActorTerminated,
    ) {
        tracing::warn!(
            "Actor {} terminated: {:?}",
            terminated.label,
            terminated.reason
        );
        state.terminated.push(terminated.label.clone());
    }
}

#[handlers]
impl Watchdog {
    #[handler]
    fn watch(
        &mut self,
        ctx: &ActorContext<Self>,
        _state: &mut WatchdogState,
        label: String,
    ) -> bool {
        ctx.watch(&label);
        true
    }
}

The on_actor_terminated callback on the Actor trait fires when any watched actor stops, crashes, or is killed. The ActorTerminated struct tells you which actor and why.

Going from local to clustered

The entire point of murmer’s design is that this transition requires zero changes to your actor code. Only the system construction changes:

// Before: local
let system = System::local();

// After: clustered
let config = ClusterConfig::builder()
    .name("my-node")
    .listen("0.0.0.0:7100".parse()?)
    .cookie("my-cluster-secret")
    .seed_nodes(["192.168.1.1:7100".parse()?])
    .build()?;

let system = System::clustered_auto(config).await?;

Everything else — system.start(...), system.lookup(...), endpoint.send(...) — stays identical. Actors on remote nodes appear in your local receptionist automatically via registry replication.

See the Clustering chapter for the full walkthrough, including Docker deployment and the interactive cluster_chat example.

Build and test

cargo build
cargo nextest run
cargo clippy -- -D warnings

Next steps

Now that you understand the components, dive into the specific areas:

Actors and Messages

This chapter covers the core building blocks: actors, state, messages, handlers, and endpoints.

Actors

An actor in murmer is a small server: a Rust struct with a set of handler methods, its own persistent state, and message types. You define an actor by implementing the Actor trait, then adding a handler method for each message type it can handle.

use murmer::prelude::*;

#[derive(Debug)]
struct ChatRoom;

struct ChatRoomState {
    room_name: String,
    messages: Vec<ChatEntry>,
}

impl Actor for ChatRoom {
    type State = ChatRoomState;
}

Key points:

  • The actor struct itself is typically a zero-sized type (no fields). All mutable state lives in the associated State type.
  • State is passed as &mut State to every handler, keeping the actor struct itself lightweight and the state explicitly threaded.
  • Each actor runs inside a supervisor that manages its lifecycle, mailbox, and restart behavior.

Messages

A message is a type that can be sent to an actor. Every message defines a result type and can optionally be serializable for remote delivery.

There are two ways to define messages:

When using #[handlers] with #[handler], the macro generates message structs automatically from your method signatures:

#[handlers]
impl ChatRoom {
    #[handler]
    fn post_message(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
        from: String,
        text: String,
    ) -> usize {
        state.messages.push(ChatEntry { from, text });
        state.messages.len()
    }

    #[handler]
    fn get_history(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
    ) -> Vec<String> {
        state.messages.iter()
            .map(|e| format!("{}: {}", e.from, e.text))
            .collect()
    }
}

This generates PostMessage { pub from: String, pub text: String } and a GetHistory unit struct, plus all the trait implementations and an extension trait ChatRoomExt on Endpoint<ChatRoom>.

Explicit messages

For messages shared across multiple actors, define them manually with #[derive(Message)]:

use murmer::Message;
use serde::{Serialize, Deserialize};

#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = Vec<String>, remote = "orchestrator::ListDir")]
struct ListDir {
    path: String,
}

Then reference it in a handler with the msg parameter name:

#[handlers]
impl StorageAgent {
    #[handler]
    fn list_dir(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut StorageState,
        msg: ListDir,
    ) -> Vec<String> {
        state.dirs.get(&msg.path).cloned().unwrap_or_default()
    }
}

The remote = "..." attribute is optional — omit it for local-only messages that don’t need wire serialization.

Async handlers

Handlers that need to await use async fn:

#[handlers]
impl MyActor {
    #[handler]
    async fn fetch_data(
        &mut self,
        ctx: &ActorContext<Self>,
        state: &mut MyState,
        url: String,
    ) -> Vec<u8> {
        some_async_call(&url).await
    }
}

This generates an AsyncHandler<FetchData> implementation instead of Handler<FetchData>.

Endpoints

Endpoints are opaque handles to actors used to send messages and receive responses. The Endpoint<A> type abstracts where the actor lives — whether local or remote — and handles serialization in the background.

// Start returns a typed endpoint
let counter = system.start("counter/main", Counter, CounterState { count: 0 });

// Send via auto-generated extension trait
let result = counter.increment(5).await.unwrap();

// Or send a message struct directly
let result = counter.send(Increment { amount: 5 }).await.unwrap();

Location transparency

The key design principle: Endpoint<A> hides whether the actor is local or remote.

Endpoint‹A› forking into local (zero-copy) and remote (QUIC) paths

  • Local actors use the envelope pattern — zero serialization cost, direct in-memory dispatch through a type-erased trait object.
  • Remote actors serialize messages with bincode, send them over QUIC streams, and deserialize responses on return.

The caller’s code is identical in both cases:

let result = endpoint.send(Increment { amount: 5 }).await?;

How endpoints work under the hood

Under the hood, an endpoint wraps a tokio channel that sends messages to the actor’s supervisor:

  • Local endpoint: The channel connects directly to the actor’s supervisor (mpsc). Messages are dispatched as type-erased envelopes with zero serialization cost.
  • Remote endpoint: The channel connects to a proxy supervisor that serializes the message in bincode format, sends it through a QUIC stream, awaits the response, deserializes it, and returns it to the caller.

Endpoints are lightweight and Clone — share them freely across tasks.

Actor watches

Erlang-style actor monitoring — get notified when a watched actor terminates:

impl Actor for Monitor {
    type State = MonitorState;

    fn on_actor_terminated(
        &mut self,
        state: &mut MonitorState,
        terminated: &ActorTerminated,
    ) {
        match &terminated.reason {
            TerminationReason::Panicked(msg) => {
                tracing::error!("{} panicked: {}", terminated.label, msg);
            }
            _ => {}
        }
    }
}

#[handlers]
impl Monitor {
    #[handler]
    fn watch_actor(
        &mut self,
        ctx: &ActorContext<Self>,
        _state: &mut MonitorState,
        label: String,
    ) {
        ctx.watch(&label);
    }
}

The ActorContext provides the watch() method, and termination notifications arrive via the on_actor_terminated callback on the Actor trait.

Discovery

Murmer provides a unified actor discovery system through the Receptionist — a type-erased registry that handles registration, lookup, and subscription for both local and remote actors.

Labels

Actors are identified by path-like labels: "cache/user", "worker/0", "thumbnail/processor/3".

let ep = system.start("service/auth", AuthActor, AuthState::new());
let ep = system.lookup::<AuthActor>("service/auth").unwrap();

Labels serve as the primary routing key in the system:

  • No two actors can have the same label in the clustered system.
  • The path structure ("group/subgroup/instance") enables hierarchical organization of actors.
  • Wildcards can be used in subscriptions on the receptionist (e.g., subscribe to all actors under "worker/*").
  • Labels are only used for actor discovery and routing — they are a feature of the receptionist, not intrinsic to actors themselves.
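The wildcard semantics can be illustrated with a small prefix-matching function over path-like labels (an illustrative sketch, not murmer's implementation):

```rust
// Match a path-like label against a pattern where a trailing "/*"
// matches any remaining path segments. Illustrative sketch only.
fn label_matches(pattern: &str, label: &str) -> bool {
    match pattern.strip_suffix("/*") {
        // "worker/*" matches "worker/0" and "worker/pool/3",
        // but not "workers/0" (the next char must be a separator).
        Some(prefix) => label
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with('/')),
        // No wildcard: require an exact match.
        None => pattern == label,
    }
}

fn main() {
    assert!(label_matches("worker/*", "worker/0"));
    assert!(label_matches("worker/*", "worker/pool/3"));
    assert!(!label_matches("worker/*", "workers/0"));
    assert!(!label_matches("worker/*", "cache/user"));
    assert!(label_matches("cache/user", "cache/user"));
    println!("ok");
}
```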

The Receptionist

The Receptionist is a special internal actor started with every actor system. It manages the lookup and registration of both local and remote actors.

When an actor is started, it registers itself with the receptionist. The receptionist maintains a mapping of labels and actor types to their corresponding endpoints. When you want to communicate with an actor, you query the receptionist with the actor type and label to receive an endpoint.

Typed lookup

Lookups are type-safe — you specify the actor type and get back a typed endpoint:

// Returns Option<Endpoint<AuthActor>>
let endpoint = system.lookup::<AuthActor>("service/auth");

If the label doesn’t exist or the type doesn’t match, None is returned. For remote actors, the receptionist returns an endpoint that transparently handles connection management when accessed.
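This type-guarded lookup can be modeled with std::any: a toy registry stores type-erased entries and recovers them with a checked downcast at lookup time (a simplified model, not the actual Receptionist):

```rust
use std::any::Any;
use std::collections::HashMap;

// A minimal type-erased registry: values go in as Box<dyn Any>
// and come out via a checked downcast, so a label with the wrong
// type yields None rather than a panic.
struct Registry {
    entries: HashMap<String, Box<dyn Any>>,
}

impl Registry {
    fn new() -> Self {
        Registry { entries: HashMap::new() }
    }

    fn register<T: Any>(&mut self, label: &str, value: T) {
        self.entries.insert(label.to_string(), Box::new(value));
    }

    // Some only if the label exists AND the stored type matches T.
    fn lookup<T: Any>(&self, label: &str) -> Option<&T> {
        self.entries.get(label)?.downcast_ref::<T>()
    }
}

fn main() {
    let mut reg = Registry::new();
    reg.register("counter/main", 42i64);

    assert_eq!(reg.lookup::<i64>("counter/main"), Some(&42)); // right type
    assert!(reg.lookup::<String>("counter/main").is_none());  // wrong type
    assert!(reg.lookup::<i64>("missing").is_none());          // no such label
    println!("ok");
}
```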

State observation

When an actor is registered, the receptionist becomes an observer of its state. If the actor stops or crashes, the receptionist is notified and removes the actor from its registry. This ensures the receptionist always has an up-to-date view of the actors in the system.

If an actor restarts (via supervision), the receptionist is notified but the actor remains available — its state is non-terminal (alive or restarting, not dead), so the registration is kept.

Reception keys and listings

Group actors by type and subscribe to changes:

let worker_key = ReceptionKey::<Worker>::new("workers");

// Check actors into the group
receptionist.check_in("worker/0", worker_key.clone());
receptionist.check_in("worker/1", worker_key.clone());

// Subscribe — get existing actors immediately + live updates
let mut listing = receptionist.listing(worker_key);
while let Some(endpoint) = listing.next().await {
    endpoint.send(Work { task: "process".into() }).await?;
}

A Listing<A> is an async stream that yields endpoints as actors register and deregister against a ReceptionKey. It provides both backfill (existing actors) and live updates (new registrations), making it ideal for dynamic pool management.
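The backfill-plus-live behavior can be modeled as a snapshot chained with a channel of updates (a conceptual sketch using sync std channels, not the real async Listing type):

```rust
use std::sync::mpsc;

// Yield a snapshot of existing entries first, then live updates from
// the channel, mirroring Listing's backfill + live-update semantics.
fn collect_listing(backfill: Vec<&str>, live: mpsc::Receiver<String>) -> Vec<String> {
    backfill
        .into_iter()
        .map(String::from)      // backfill: actors already registered
        .chain(live.into_iter()) // live: registrations arriving later
        .collect()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send("worker/2".to_string()).unwrap(); // a late registration
    drop(tx); // close the live stream for this sketch

    let all = collect_listing(vec!["worker/0", "worker/1"], rx);
    assert_eq!(all, vec!["worker/0", "worker/1", "worker/2"]);
    println!("{all:?}");
}
```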

Lifecycle events

Subscribe to all actor registrations and deregistrations across the system:

let mut events = receptionist.subscribe_events();
while let Some(event) = events.recv().await {
    match event {
        ActorEvent::Registered { label, actor_type } => { /* ... */ }
        ActorEvent::Deregistered { label, actor_type } => { /* ... */ }
    }
}

Other actors can subscribe to the receptionist to receive notifications about actors being added, removed, or updated. Subscriptions can be broad (all actor updates, useful for clustering) or specific (a particular actor type or label).

Routing

Distribute messages across actor pools:

let router = Router::new(
    vec![ep1, ep2, ep3],
    RoutingStrategy::RoundRobin,
);

// Each send goes to the next endpoint in sequence
router.send(Increment { amount: 1 }).await?;

// Or broadcast to all
let results = router.broadcast(GetCount).await;

The Router<A> takes a set of endpoints and a RoutingStrategy to distribute messages. Current strategies include round-robin and broadcast.
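Round-robin selection itself is just a shared counter taken modulo the pool size. A sketch of the strategy (not murmer's Router internals):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Pick the next endpoint index in round-robin order.
// fetch_add makes this safe to call from many tasks concurrently.
struct RoundRobin {
    next: AtomicUsize,
    len: usize,
}

impl RoundRobin {
    fn new(len: usize) -> Self {
        RoundRobin { next: AtomicUsize::new(0), len }
    }

    fn pick(&self) -> usize {
        self.next.fetch_add(1, Ordering::Relaxed) % self.len
    }
}

fn main() {
    let rr = RoundRobin::new(3); // pool of 3 endpoints
    let picks: Vec<usize> = (0..5).map(|_| rr.pick()).collect();
    assert_eq!(picks, vec![0, 1, 2, 0, 1]); // wraps around the pool
    println!("{picks:?}");
}
```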

How discovery works across nodes

When running in clustered mode, the receptionist automatically synchronizes actor registrations across nodes:

Cross-node discovery: register → replicate → lookup → proxy endpoint → QUIC

  1. A local actor registers with its node’s receptionist.
  2. The node broadcasts an ActorAdd notification to all connected peers.
  3. Remote nodes register the actor in their local receptionists with a lazy endpoint factory.
  4. When a client looks up the remote actor, the endpoint factory creates a proxy that handles network transport.
  5. If the actor’s node fails, all remote registrations are cleaned up automatically.

This means system.lookup::<MyActor>("some/label") works identically whether the actor is local or on a remote node — the receptionist handles the difference transparently.

Supervision

Every actor in murmer runs inside a supervisor. The supervisor manages the actor’s lifecycle — starting it, processing its mailbox, and restarting it when things go wrong. This model is directly inspired by Erlang/OTP’s supervision trees, adapted for Rust’s ownership and type system.

How supervisors work

Each actor gets its own supervisor. The supervisor is responsible for:

  • Starting the actor and registering it with the receptionist.
  • Mailbox processing — ingesting messages and passing them to the actor’s handlers in order of arrival.
  • Crash detection — catching panics and deciding what to do next based on the restart policy.
  • Restarting the actor using a factory if the policy allows it.
  • State notifications — informing the receptionist of state changes (started, stopped, dead, etc.).
  • Context — providing the actor with access to the system, receptionist, and other actors via ActorContext.

Supervisors are flat — there is no parent-child hierarchy between actors. Each actor is independent and can be stopped or restarted without affecting others.

Actor lifecycle

The supervisor manages an actor through a well-defined set of states:

Actor lifecycle: Starting → Running → Crashed/Stopped → Restarting or Dead

Restart policies

Actors can be started with restart policies that control behavior on failure:

| Policy    | Restart on panic? | Restart on clean stop? |
|-----------|-------------------|------------------------|
| Temporary | No                | No                     |
| Transient | Yes               | No                     |
| Permanent | Yes               | Yes                    |

  • Temporary — the actor runs once. If it panics or stops, it’s gone. This is the default.
  • Transient — the actor restarts if it panics, but a clean shutdown is respected. Use this for actors that should survive crashes but can be intentionally stopped.
  • Permanent — the actor always restarts, whether it panicked or stopped cleanly. Use this for critical services that must always be running.

Configuration

To use restart policies, you provide an ActorFactory (which knows how to create fresh instances) and a RestartConfig:

use murmer::{RestartPolicy, RestartConfig, BackoffConfig, ActorFactory};
use std::time::Duration;

struct MyFactory;
impl ActorFactory for MyFactory {
    type Actor = Counter;
    fn create(&mut self) -> (Counter, CounterState) {
        (Counter, CounterState { count: 0 })
    }
}

let endpoint = receptionist.start_with_config(
    "counter/resilient",
    MyFactory,
    RestartConfig {
        policy: RestartPolicy::Permanent,  // Always restart
        max_restarts: 5,                   // Max 5 restarts...
        window: Duration::from_secs(60),   // ...within 60 seconds
        backoff: BackoffConfig {
            initial: Duration::from_millis(100),
            max: Duration::from_secs(30),
            multiplier: 2.0,
        },
    },
);

Restart limits

The max_restarts and window fields prevent infinite restart loops. If the actor exceeds the restart limit within the time window, the supervisor gives up and the actor is permanently stopped. This prevents a persistent bug from consuming all your resources.
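
As a sketch, the limit can be modeled as a sliding window of restart timestamps. The `RestartWindow` type below is a hypothetical helper for illustration, not murmer's internal implementation:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Illustrative sliding-window restart limiter (not murmer's actual type).
struct RestartWindow {
    max_restarts: usize,
    window: Duration,
    timestamps: VecDeque<Instant>,
}

impl RestartWindow {
    fn new(max_restarts: usize, window: Duration) -> Self {
        Self { max_restarts, window, timestamps: VecDeque::new() }
    }

    /// Record a restart at `now`; returns false once the limit is exceeded
    /// within the window, i.e. the supervisor should give up.
    fn allow_restart(&mut self, now: Instant) -> bool {
        // Drop restarts that fell out of the window.
        while let Some(&t) = self.timestamps.front() {
            if now.duration_since(t) > self.window {
                self.timestamps.pop_front();
            } else {
                break;
            }
        }
        if self.timestamps.len() >= self.max_restarts {
            return false;
        }
        self.timestamps.push_back(now);
        true
    }
}

fn main() {
    let mut w = RestartWindow::new(5, Duration::from_secs(60));
    let start = Instant::now();
    for i in 0u64..5 {
        assert!(w.allow_restart(start + Duration::from_secs(i)));
    }
    // Sixth restart within 60 seconds: the supervisor gives up.
    assert!(!w.allow_restart(start + Duration::from_secs(5)));
    // Much later, the window has cleared and restarts are allowed again.
    assert!(w.allow_restart(start + Duration::from_secs(120)));
}
```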

Exponential backoff

The BackoffConfig controls the delay between restarts:

  • initial — delay before the first restart attempt.
  • max — maximum delay (the backoff caps here).
  • multiplier — each subsequent restart delay is multiplied by this factor.

For example, with initial: 100ms, max: 30s, multiplier: 2.0, restarts happen at 100ms, 200ms, 400ms, 800ms, … up to 30s.
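
The schedule can be sketched as a small function; this is an illustration of the configured behavior, not murmer's internal code:

```rust
use std::time::Duration;

/// Delay before restart attempt `attempt` (zero-based): attempt 0 waits
/// `initial`, each subsequent attempt multiplies by `multiplier`, capped at `max`.
fn backoff_delay(initial: Duration, max: Duration, multiplier: f64, attempt: u32) -> Duration {
    let secs = initial.as_secs_f64() * multiplier.powi(attempt as i32);
    Duration::from_secs_f64(secs.min(max.as_secs_f64()))
}

fn main() {
    let (initial, max) = (Duration::from_millis(100), Duration::from_secs(30));
    // 100ms, 200ms, 400ms, 800ms, ... capped at 30s
    for attempt in 0..12 {
        println!("attempt {attempt}: {:?}", backoff_delay(initial, max, 2.0, attempt));
    }
}
```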

Actor factories

The ActorFactory trait gives the supervisor a way to create fresh actor instances for restarts:

trait ActorFactory {
    type Actor: Actor;
    fn create(&mut self) -> (Self::Actor, <Self::Actor as Actor>::State);
}

The factory is called each time the supervisor needs a new instance. It can carry its own state if needed — for example, incrementing a generation counter or loading configuration from disk.

Interaction with the receptionist

When a supervised actor restarts:

  1. The old actor instance is dropped.
  2. The supervisor creates a new instance via the factory.
  3. The new instance is registered with the receptionist under the same label.
  4. Any actors watching the old instance receive a termination notification, but the label remains routable.

This means endpoints held by other actors remain valid through restarts — messages sent during the brief restart window are queued in the supervisor’s mailbox and delivered to the new instance.

Clustering

One of murmer’s core design goals is that your actor code doesn’t change when you go from a single process to a multi-node cluster. The same Endpoint<A> API works in both cases.

Step 1: Run everything locally

Create a System::local() — no networking, no config. Your actors communicate through in-memory channels with zero serialization cost:

use murmer::prelude::*;

let system = System::local();

let room = system.start("room/general", ChatRoom, ChatRoomState {
    room_name: "general".into(),
    messages: vec![],
});

// Send messages via extension trait — works instantly
room.post_message("alice".into(), "Hello!".into()).await?;

// Look up actors by label
let ep = system.lookup::<ChatRoom>("room/general").unwrap();
let history = ep.get_history().await?;

Step 2: Go distributed

When you’re ready for real networking, swap System::local() for System::clustered(). Your actor code stays identical — only the system construction changes:

use murmer::prelude::*;
use murmer::cluster::config::ClusterConfig;

let config = ClusterConfig::builder()
    .name("alpha")
    .listen("0.0.0.0:7100".parse()?)
    .advertise("192.168.1.5:7100".parse()?)
    .cookie("my-cluster-secret")
    .seed_nodes(["192.168.1.1:7100".parse()?])
    .build()?;

// clustered_auto() discovers all #[handlers]-annotated actor types automatically
let system = System::clustered_auto(config).await?;

// Same API as local — start, lookup, send
let room = system.start("room/alpha", ChatRoom, state);
room.post_message("alice".into(), "Hello!".into()).await?;

// Actors on other nodes appear automatically via registry replication
let remote_room = system.lookup::<ChatRoom>("room/beta").unwrap();
remote_room.get_history().await?;  // transparently serialized over QUIC

Each node gets a single QUIC connection to every peer, multiplexed over per-actor streams. The OpLog replication protocol uses version vectors for efficient, idempotent sync.

Step 3: Test it interactively

The cluster_chat example lets you try both modes with an interactive CLI:

# Local mode — all actors in one process
cargo run -p murmer-examples --bin cluster_chat -- --local
=== murmer cluster_chat (local mode) ===
  Started room: #general
  Started room: #random

> post general alice Hello everyone!
  [1 messages in #general]
> post general bob Hey alice!
  [2 messages in #general]
> history general
  --- #general ---
  alice: Hello everyone!
  bob: Hey alice!
> rooms
  Known rooms:
    #general — 2 messages
    #random — 0 messages

Same binary, same commands — just add cluster config:

# Terminal 1 — seed node
cargo run -p murmer-examples --bin cluster_chat -- --node alpha --port 7100

# Terminal 2 — joins via seed
cargo run -p murmer-examples --bin cluster_chat -- --node beta --port 7200 --seed 127.0.0.1:7100

Step 4: Deploy with Docker

The docker-compose.yml in the repo runs a 3-node cluster:

docker compose up --build

This starts three containers — alpha, beta, and gamma — each running the cluster_chat example:

services:
  alpha:
    build: .
    command: ["--node", "alpha", "--port", "7100"]

  beta:
    build: .
    command: ["--node", "beta", "--port", "7100", "--seed", "alpha:7100"]

  gamma:
    build: .
    command: ["--node", "gamma", "--port", "7100", "--seed", "alpha:7100"]

Beta and gamma seed from alpha and automatically mesh together.

How clustering works

Cluster mesh formation: seed → handshake → gossip → full mesh

Auto-discovery

When an actor system starts in clustered mode, it runs a server that listens for incoming connections. New nodes connect to existing ones via seed nodes and begin exchanging information about their actors. Nodes can be configured to gossip this information, allowing the network to mesh together organically.

Networking layer

The networking layer is built on QUIC (via the quinn crate) and SWIM (via the foca crate):

  • QUIC provides a reliable, low-latency transport with built-in TLS encryption. Each node pair shares a single QUIC connection, multiplexed over per-actor streams.
  • SWIM handles cluster membership — failure detection, protocol-level heartbeats, and member state dissemination.
  • mDNS provides optional zero-configuration discovery for LAN environments.

Stream architecture

QUIC stream multiplexing: Node Alpha → per-actor streams → Node Beta

When a remote actor’s endpoint is accessed:

  1. A dedicated QUIC stream is opened to the remote node for that actor.
  2. The stream stays open as long as it’s active (not idle).
  3. On the receiving end, a stream handler deserializes incoming messages, looks up the target actor via the receptionist, and forwards them.
  4. Each stream binds to a single actor — messages for other actors result in an error and stream closure.
  5. The stream subscribes to the actor’s lifecycle via the receptionist. If the actor enters a negative state (stopped, dead), the stream closes with an error.

An actor on a node might have multiple inbound streams, but the mailbox system ensures messages are processed in order of arrival.

Registry replication

Actor registrations are replicated across the cluster using an OpLog with version vectors:

  • When a local actor registers, its node broadcasts an ActorAdd operation to all peers.
  • Remote nodes create lazy endpoint factories in their local receptionists.
  • Version vectors ensure operations are idempotent and ordering is preserved.
  • When a node leaves, its registrations are pruned from all other nodes.

This gives every node an eventually consistent view of the entire cluster’s actor topology.
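
The idempotence property can be sketched with a minimal version vector. These are illustrative types, not murmer's actual OpLog implementation, and they assume each node's ops are numbered sequentially:

```rust
use std::collections::HashMap;

/// Each receptionist remembers the highest op sequence it has applied per
/// peer node; replaying an op it has already seen is a no-op, which is
/// what makes replication idempotent.
#[derive(Default, Debug)]
struct VersionVector(HashMap<String, u64>);

impl VersionVector {
    /// Returns true if op (node, seq) is new and should be applied.
    fn observe(&mut self, node: &str, seq: u64) -> bool {
        let seen = self.0.entry(node.to_string()).or_insert(0);
        if seq > *seen {
            *seen = seq;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut vv = VersionVector::default();
    assert!(vv.observe("alpha", 1));  // new op: apply the ActorAdd
    assert!(!vv.observe("alpha", 1)); // duplicate delivery: ignored
    assert!(vv.observe("alpha", 2));
    assert!(vv.observe("beta", 1));   // ops from different nodes are independent
}
```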

Edge Clients

Not everything that talks to a murmer cluster needs to be a cluster member. A REST API gateway, a CLI tool, a monitoring dashboard, or an integration test runner just needs to call actors — it doesn’t need to run any, participate in SWIM gossip, or store a registry.

Edge clients fill that role. A MurmerClient connects to any cluster node via QUIC, pulls the set of public actors, and exposes the same Endpoint<A> API you already know — without any of the cluster machinery behind it.

use murmer::MurmerClient;
use std::time::Duration;

let client = MurmerClient::connect("10.0.0.5:9000".parse()?, "cluster-cookie").await?;
let ep = client.lookup::<UserService>("api/users").unwrap();
let user = ep.send(GetUser { id: 42 }).await?;
client.disconnect().await;

Visibility: controlling what Edge clients see

Every actor has a visibility that controls who can discover it. You set this at startup time on the server side:

// Public — visible to Edge clients and all cluster members
let api = system.start_public("api/users", UserService, state);

// Internal — visible to cluster members only (default)
let router = system.start("routing/shard-0", ShardRouter, state);

// Private — node-local only, never replicated
let metrics = system.start_private("node/metrics", MetricsCollector, state);

| Visibility         | Edge clients | Cluster members | Replicated via OpLog |
|--------------------|--------------|-----------------|----------------------|
| Public             | Yes          | Yes             | Yes                  |
| Internal (default) | No           | Yes             | Yes                  |
| Private            | No           | No              | No                   |

Private is a zero-overhead choice: the actor is never written to the OpLog, never serialized, and never sent over the wire. Use it for utility actors that are purely node-local implementation details — connection managers, per-node caches, local metrics collectors.

Connecting

Edge clients connect to any node in the cluster. The node you connect to acts as your sync source — it responds to pull requests with the current set of public actors.

// Short-lived: connect, call, disconnect
let client = MurmerClient::connect(
    "10.0.0.5:9000".parse()?,
    "cluster-cookie",
).await?;

The cluster cookie must match the server’s cookie or the handshake will be rejected.

Requirements

The server must be started with System::clustered() — Edge clients connect via QUIC and need a listening endpoint. System::local() has no network layer and cannot accept Edge client connections.

Looking up actors

lookup — instant, returns None if not synced yet

if let Some(ep) = client.lookup::<UserService>("api/users") {
    let user = ep.send(GetUser { id: 42 }).await?;
}

Returns None if the actor hasn’t been synced to the client yet. Use this after an initial sync has had time to complete.

lookup_wait — blocks until the actor appears

let ep = client
    .lookup_wait::<UserService>("api/users", Duration::from_secs(5))
    .await?;

Triggers an immediate pull, then waits for the actor to appear — either from that pull’s response or a subsequent one. Re-polls the server every 500 ms. Returns ClusterError::Timeout if the actor doesn’t appear within the deadline.

Fast path: if the actor is already synced, lookup_wait returns after one pull round-trip (typically sub-millisecond on LAN).
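
The polling loop can be sketched synchronously. The `lookup` closure below is a hypothetical stand-in for the client's local registry check (the real client also triggers a network pull, which this sketch omits):

```rust
use std::time::{Duration, Instant};

/// Try the lookup, re-poll every 500 ms, give up at the deadline.
fn lookup_wait<T>(
    mut lookup: impl FnMut() -> Option<T>,
    timeout: Duration,
) -> Result<T, &'static str> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(found) = lookup() {
            return Ok(found);
        }
        if Instant::now() >= deadline {
            return Err("timeout");
        }
        // Sleep until the next poll or the deadline, whichever is sooner.
        std::thread::sleep(Duration::from_millis(500).min(deadline - Instant::now()));
    }
}

fn main() {
    // Fast path: already synced, returns immediately.
    assert_eq!(lookup_wait(|| Some("endpoint"), Duration::from_secs(5)), Ok("endpoint"));
    // Never appears: times out at the deadline.
    assert_eq!(lookup_wait(|| None::<&str>, Duration::from_millis(10)), Err("timeout"));
}
```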

Usage patterns

Pattern 1: Short-lived client (pull once)

Ideal for CLI tools, integration tests, and one-off queries. Pulls on connect, uses the snapshot, disconnects.

let client = MurmerClient::connect(addr, cookie).await?;

// Give the initial pull a moment to arrive
tokio::time::sleep(Duration::from_millis(50)).await;

let ep = client.lookup::<UserService>("api/users").unwrap();
let result = ep.send(GetUser { id: 1 }).await?;

client.disconnect().await;

Pattern 2: Long-lived gateway (periodic pull)

Ideal for REST/gRPC API servers, dashboards, and proxies. Use sync_interval to re-pull periodically and pick up new actor registrations as the cluster evolves.

use murmer::ClientOptions;

let client = MurmerClient::connect_with_options(
    addr,
    cookie.into(),
    ClientOptions {
        sync_interval: Some(Duration::from_secs(30)),
        ..Default::default()
    },
).await?;

// client.lookup() stays fresh — re-pulled every 30 seconds

Pattern 3: Wait for a specific actor

Useful when you connect before the target actor is registered — for example, a gateway that starts before the cluster has finished placing its actors.

let ep = client
    .lookup_wait::<PaymentService>("payments/processor", Duration::from_secs(10))
    .await?;

How sync works

Edge clients use pull-based sync — the server never pushes unsolicited updates. The client sends a RegistrySyncRequest with its current version vector; the server responds with only the delta (new public actor registrations since that version).

Edge client                     Cluster node
    │                                │
    │── RegistrySyncRequest(vv) ────▶│
    │◀── RegistrySync(delta ops) ────│
    │                                │
    │  ... time passes ...           │
    │                                │
    │── RegistrySyncRequest(vv') ───▶│  (periodic or lookup_wait re-poll)
    │◀── RegistrySync(new ops) ──────│

After the first sync, subsequent pulls return only the delta — O(new ops), not O(all ops). 1000 idle Edge clients add near-zero server overhead: no SWIM membership, no server-initiated fan-out, no per-client state.
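
The server-side delta computation can be sketched as a filter over the op log against the client's version vector. The `Op` type here is illustrative, not murmer's wire format:

```rust
use std::collections::HashMap;

/// Illustrative replicated operation: which node produced it, its sequence
/// number on that node, and the actor label it registers.
struct Op {
    node: String,
    seq: u64,
    label: String,
}

/// Return only the ops the client has not yet seen — O(new ops).
fn delta<'a>(log: &'a [Op], client_vv: &HashMap<String, u64>) -> Vec<&'a Op> {
    log.iter()
        .filter(|op| op.seq > *client_vv.get(&op.node).unwrap_or(&0))
        .collect()
}

fn main() {
    let log = vec![
        Op { node: "alpha".into(), seq: 1, label: "api/users".into() },
        Op { node: "alpha".into(), seq: 2, label: "api/orders".into() },
        Op { node: "beta".into(), seq: 1, label: "room/general".into() },
    ];
    // Client has seen alpha up to seq 1, nothing from beta.
    let vv = HashMap::from([("alpha".to_string(), 1u64)]);
    let new_ops = delta(&log, &vv);
    assert_eq!(new_ops.len(), 2);
    assert_eq!(new_ops[0].label, "api/orders");
}
```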

Scalability characteristics

| Property                | Behavior                                                           |
|-------------------------|--------------------------------------------------------------------|
| SWIM membership         | Edge clients are not added to SWIM — no failure detection overhead |
| Server-initiated sync   | Skipped for Edge clients — they pull on their own schedule         |
| Disconnect              | Silent — no cluster alarm, no actor pruning, no SWIM event         |
| Server state per client | None — the server is stateless with respect to each Edge client    |
| Wire overhead (idle)    | Zero — the server never initiates contact                          |

Full example

The edge_client example demonstrates all three patterns:

cargo run -p murmer-examples --bin edge_client

It covers:

  1. Public actors visible to Edge clients, internal actors hidden
  2. lookup_wait resolving when an actor registers after connect
  3. Long-lived client with periodic sync_interval

Monitoring

Murmer provides production-grade metrics for your actor systems via the monitor feature. It follows a facade pattern — your actor code records metrics through thin instrumentation calls, and you choose the backend (Prometheus, StatsD, etc.) at startup.

Enabling monitoring

Add the monitor feature to your Cargo.toml:

[dependencies]
murmer = { version = "0.1", features = ["monitor"] }

When the feature is off, all instrumentation compiles to nothing — zero overhead, zero dependencies.

Quick start with Prometheus

One call installs the metrics recorder and starts an HTTP endpoint for scraping:

use murmer::monitor::start_prometheus;

#[tokio::main]
async fn main() {
    // Start serving metrics on :9000/metrics
    start_prometheus(9000).expect("failed to start prometheus exporter");

    // Now start your actor system as usual
    let system = System::local();
    let counter = system.start("counter/main", Counter, CounterState { count: 0 });

    // Every message send, handler invocation, and lifecycle event
    // is automatically recorded. Scrape with:
    //   curl http://localhost:9000/metrics
}

What gets measured

Murmer instruments five categories of metrics automatically. You don’t need to add any code — everything is recorded as actors run.

Actor lifecycle

| Metric                                       | Type    | Labels               | Description                                                              |
|----------------------------------------------|---------|----------------------|--------------------------------------------------------------------------|
| murmer_actors_active                         | gauge   | actor_type           | Currently running actors                                                 |
| murmer_actors_started_total                  | counter | actor_type           | Total actors started                                                     |
| murmer_actors_stopped_total                  | counter | actor_type, reason   | Total actors stopped (reason: stopped, panicked, restart_limit_exceeded) |
| murmer_actors_restarts_total                 | counter | actor_type           | Total actor restarts                                                     |
| murmer_actors_restart_limit_exceeded_total   | counter | actor_type           | Times restart limits were hit                                            |

Message processing

| Metric                                     | Type      | Labels     | Description                    |
|--------------------------------------------|-----------|------------|--------------------------------|
| murmer_messages_processed_total            | counter   | actor_type | Messages successfully handled  |
| murmer_messages_failed_total               | counter   | actor_type | Messages that caused a panic   |
| murmer_message_processing_duration_seconds | histogram | actor_type | Handler execution time         |

Endpoint sends

| Metric                                   | Type      | Labels                 | Description                      |
|------------------------------------------|-----------|------------------------|----------------------------------|
| murmer_sends_total                       | counter   | actor_type, locality   | Total sends (local or remote)    |
| murmer_send_errors_total                 | counter   | actor_type, error_kind | Send failures by error type      |
| murmer_network_roundtrip_duration_seconds | histogram | actor_type            | End-to-end remote call latency   |

Networking

| Metric                              | Type    | Description                               |
|-------------------------------------|---------|-------------------------------------------|
| murmer_network_connections_active   | gauge   | Active QUIC connections to peer nodes     |
| murmer_network_streams_active       | gauge   | Active QUIC streams for actor messaging   |
| murmer_network_bytes_sent_total     | counter | Total bytes sent over actor streams       |
| murmer_network_bytes_received_total | counter | Total bytes received over actor streams   |
| murmer_network_inflight_calls       | gauge   | In-flight remote calls awaiting responses |
| murmer_network_dead_letters_total   | counter | Failed in-flight calls (connection lost)  |

Cluster membership

| Metric                                  | Type    | Labels     | Description                                |
|-----------------------------------------|---------|------------|--------------------------------------------|
| murmer_cluster_nodes                    | gauge   | status     | Number of nodes in the cluster             |
| murmer_cluster_membership_changes_total | counter | event_type | Membership events (joined, failed, left)   |

Receptionist

| Metric                                  | Type    | Description           |
|-----------------------------------------|---------|-----------------------|
| murmer_receptionist_lookups_total       | counter | Actor lookups         |
| murmer_receptionist_registrations_total | counter | Actor registrations   |
| murmer_receptionist_deregistrations_total | counter | Actor deregistrations |

Label cardinality

All actor metrics use actor_type (the Rust type name, e.g. my_app::ChatRoom) rather than actor_label (e.g. "room/general"). This keeps cardinality bounded — you typically have fewer than 20 actor types, but could have thousands of labels.

Architecture: how it works

The instrumentation uses a facade pattern inspired by how tracing works:

  1. instrument.rs (always compiled) — Contains thin pub(crate) functions like instrument::message_processed(actor_type). When monitor is on, these call metrics::counter!(...). When off, they’re empty #[inline(always)] functions that the compiler eliminates entirely.

  2. Call sites (supervisor, receptionist, endpoint, etc.) — Call instrument functions unconditionally. No #[cfg] attributes scattered across the codebase.

  3. Your application — Installs a metrics recorder at startup (e.g., start_prometheus(9000)). All recorded metrics flow to the backend you chose.

This means adding a new metric requires touching exactly two places: the instrument function and the call site. The facade keeps the #[cfg] logic in one file.
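
The facade shape can be sketched as follows. This is an illustration of the pattern, not murmer's actual instrument.rs; the metric name is taken from the tables above, and the `metrics::counter!` call reflects the metrics crate's macro API:

```rust
/// With the feature on, record to the installed recorder; with it off,
/// the empty #[inline(always)] body is erased entirely by the compiler.
#[cfg(feature = "monitor")]
pub fn message_processed(actor_type: &'static str) {
    metrics::counter!("murmer_messages_processed_total", "actor_type" => actor_type).increment(1);
}

#[cfg(not(feature = "monitor"))]
#[inline(always)]
pub fn message_processed(_actor_type: &'static str) {}

fn main() {
    // Call sites invoke the facade unconditionally — no #[cfg] here.
    message_processed("my_app::ChatRoom");
}
```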

ClusterMonitor actor

In addition to Prometheus metrics, murmer provides a ClusterMonitor actor that maintains a queryable in-memory view of cluster health:

use murmer::monitor::{ClusterMonitor, ClusterMonitorState, run_monitor_bridge, GetClusterHealth};

// Start the monitor actor
let monitor = system.start("murmer/monitor", ClusterMonitor, ClusterMonitorState::new());

// Bridge cluster events into the monitor
tokio::spawn(run_monitor_bridge(&cluster_system, monitor.clone()));

// Query health at any time
let health = monitor.send(GetClusterHealth).await?;
println!("Alive: {}, Joins: {}, Failures: {}",
    health.alive_nodes, health.total_joins, health.total_failures);

The ClusterMonitor tracks:

  • Which nodes are alive and when they joined
  • Cumulative counters for joins, failures, and departures
  • Per-node uptime

Grafana dashboard

A typical Prometheus + Grafana setup might query:

# Message throughput by actor type
rate(murmer_messages_processed_total[5m])

# 99th percentile handler latency
histogram_quantile(0.99, rate(murmer_message_processing_duration_seconds_bucket[5m]))

# Actor crash rate
rate(murmer_messages_failed_total[5m])

# Remote call latency
histogram_quantile(0.95, rate(murmer_network_roundtrip_duration_seconds_bucket[5m]))

# Cluster size over time
murmer_cluster_nodes{status="active"}

Proc Macro Reference

Murmer provides two proc macros to reduce boilerplate when defining actors: #[handlers] + #[handler] for handler generation, and #[derive(Message)] for explicit message types.

#[handlers] + #[handler]

Place #[handlers] on an impl block containing actor message handlers. Mark each handler method with #[handler].

#[handlers]
impl MyActor {
    #[handler]
    fn do_thing(
        &mut self,
        ctx: &ActorContext<Self>,
        state: &mut MyState,
        name: String,
        count: u32,
    ) -> String {
        format!("{name}: {count}")
    }

    #[handler]
    fn get_status(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut MyState,
    ) -> bool {
        state.is_active
    }

    #[handler]
    async fn fetch_data(
        &mut self,
        ctx: &ActorContext<Self>,
        state: &mut MyState,
        url: String,
    ) -> Vec<u8> {
        some_async_call(&url).await
    }
}

What gets generated

From the above, the macro produces:

Message structs — method name converted to PascalCase, parameters after ctx and state become fields:

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DoThing { pub name: String, pub count: u32 }

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct GetStatus;  // no extra params → unit struct

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FetchData { pub url: String }

Message + RemoteMessage impls — each struct implements Message (with the handler’s return type as Result) and RemoteMessage (with a TYPE_ID of "ActorName::method_name"):

impl Message for DoThing { type Result = String; }
impl RemoteMessage for DoThing { const TYPE_ID: &'static str = "MyActor::do_thing"; }

Handler / AsyncHandler impls — dispatches message fields as method arguments:

impl Handler<DoThing> for MyActor {
    fn handle(&mut self, ctx: &ActorContext<Self>, state: &mut MyState, message: DoThing) -> String {
        self.do_thing(ctx, state, message.name, message.count)
    }
}

// async fn → AsyncHandler
impl AsyncHandler<FetchData> for MyActor { /* ... */ }

RemoteDispatch — a wire-format dispatch table that routes serialized messages by their TYPE_ID. This enables cross-node delivery without the sender knowing the concrete handler:

impl RemoteDispatch for MyActor {
    async fn dispatch_remote(&mut self, ctx, state, message_type: &str, payload: &[u8])
        -> Result<Vec<u8>, DispatchError>
    {
        match message_type {
            "MyActor::do_thing" => { /* deserialize DoThing, call handler, serialize result */ }
            "MyActor::get_status" => { /* ... */ }
            "MyActor::fetch_data" => { /* ... */ }
            _ => Err(DispatchError::UnknownMessageType(..))
        }
    }
}

Extension trait — ergonomic methods directly on Endpoint<MyActor>:

pub trait MyActorExt {
    fn do_thing(&self, name: String, count: u32) -> impl Future<Output = Result<String, SendError>>;
    fn get_status(&self) -> impl Future<Output = Result<bool, SendError>>;
    fn fetch_data(&self, url: String) -> impl Future<Output = Result<Vec<u8>, SendError>>;
}

impl MyActorExt for Endpoint<MyActor> { /* ... */ }

This lets you call handlers directly:

let result = endpoint.do_thing("hello".into(), 42).await?;
let status = endpoint.get_status().await?;
let data = endpoint.fetch_data("https://...".into()).await?;

Auto-registration — a linkme distributed slice entry for the TypeRegistry. At cluster startup, TypeRegistry::from_auto() collects all #[handlers]-annotated actor types automatically, enabling the cluster to route messages to the correct deserializer without manual registration.

Handler method signature

Each #[handler] method must follow this pattern:

fn method_name(&mut self, ctx: &ActorContext<Self>, state: &mut State, ...params) -> ReturnType

  • &mut self — the actor instance.
  • ctx: &ActorContext<Self> — provides access to the system, receptionist, and lifecycle operations like watch().
  • state: &mut State — the actor’s mutable state.
  • Remaining parameters become message struct fields.
  • Use async fn for handlers that need to .await.

Explicit messages (backward compatible)

For messages shared across multiple actors, name the last parameter msg (or _msg) and the macro will use the type directly instead of generating a struct:

#[handlers]
impl MyActor {
    #[handler]
    fn increment(
        &mut self,
        ctx: &ActorContext<Self>,
        state: &mut MyState,
        msg: Increment,
    ) -> i64 {
        state.count += msg.amount;
        state.count
    }
}

Here Increment must already exist and implement Message. The extension trait method will take the message as a parameter: endpoint.increment(msg).

#[derive(Message)]

Derives Message (and optionally RemoteMessage) for a struct or enum.

Basic usage (local only)

#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = i64)]
struct Increment { amount: i64 }

This implements Message with type Result = i64.

With remote support

Add remote = "..." to also implement RemoteMessage with a wire-stable type ID:

#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = i64, remote = "counter::Increment")]
struct Increment { amount: i64 }

The TYPE_ID string is used for wire dispatch — it must be unique across all message types in the cluster and stable across code changes (don’t use std::any::type_name which can change between compiler versions).

Attributes

| Attribute     | Required | Description                            |
|---------------|----------|----------------------------------------|
| result = Type | Yes      | The response type for this message     |
| remote = "id" | No       | Wire-stable type ID for RemoteMessage  |

Application Orchestration

Murmer’s core gives you actors, messages, and clustering primitives. The app module (enabled via the app feature flag) builds on top of these to provide the application-level abstractions you need for real, working distributed applications: placement strategies, leader election, crash recovery, and a Coordinator actor that ties them all together.

Think of it this way: murmer gives you the building blocks, and the app module helps you assemble them into a running system that manages actor lifecycles across a cluster automatically.

Enable the feature in your Cargo.toml:

[dependencies]
murmer = { version = "0.1", features = ["app"] }

Overview

Orchestration flow: submit spec → coordinate → place → spawn → crash recovery

The orchestration layer answers three questions:

  1. Where should this actor run? — Placement strategies score nodes based on load, capabilities, metadata, and constraints.
  2. Who decides? — Leader election picks one node to run the Coordinator, which makes all placement decisions.
  3. What happens when a node fails? — Crash strategies define recovery behavior: redistribute immediately, wait for the node to return, or let the actor die.

Actor specifications

An ActorSpec describes an actor that the orchestrator should place and manage. It captures what to run, how to recover from crashes, and where to place it.

use murmer::app::spec::{ActorSpec, CrashStrategy, PlacementConstraints};
use murmer::cluster::config::NodeClass;
use std::time::Duration;

let spec = ActorSpec::new("storage/photos", "orchestrator::StorageAgent")
    .with_state(serialized_state)
    .with_crash_strategy(CrashStrategy::WaitForReturn(Duration::from_secs(30)))
    .with_constraints(PlacementConstraints {
        required_classes: vec![NodeClass::Worker],
        required_metadata: [("volume".into(), "photos".into())].into(),
        ..Default::default()
    });

Fields

| Field           | Purpose                                                              |
|-----------------|----------------------------------------------------------------------|
| label           | Actor label (e.g., "storage/photos") — unique across the cluster     |
| actor_type_name | Key into the SpawnRegistry — identifies what type of actor to create |
| initial_state   | Serialized initial state (bincode bytes) sent to the target node     |
| crash_strategy  | What to do when the hosting node fails                               |
| placement       | Constraints that filter which nodes are eligible                     |

Crash strategies

| Strategy                | Behavior                                                             |
|-------------------------|----------------------------------------------------------------------|
| Redistribute            | Move to another eligible node immediately (default)                  |
| WaitForReturn(Duration) | Wait for the failed node to rejoin; fall back to Redistribute on timeout |
| Abandon                 | Let the actor die with the node — no recovery                        |

Placement constraints

Constraints filter eligible nodes before the placement strategy scores them:

PlacementConstraints {
    required_classes: vec![NodeClass::Worker],        // must be a Worker node
    required_metadata: [("gpu".into(), "true".into())].into(), // must have gpu=true
    anti_affinity_labels: vec!["db/primary".into()],  // don't co-locate with this actor
    ..Default::default()
}

  • required_classes — empty means any class is acceptable.
  • required_metadata — the node must have all specified key-value pairs.
  • anti_affinity_labels — avoid placing on nodes already running these actors.

Placement strategies

The PlacementStrategy trait defines a fitness function that scores nodes for hosting a given actor spec. The Coordinator evaluates all eligible nodes (after constraint filtering) and picks the highest scorer.

trait PlacementStrategy {
    fn fitness(&self, node: &NodeInfo, spec: &ActorSpec, view: &ClusterView) -> f64;
}

  • Return 0.0 or negative to indicate “do not place here”.
  • Higher values mean stronger preference.
  • The full ClusterView is available for global-aware decisions (e.g., load balancing).
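
A fitness function in this style can be sketched over simplified types. The `NodeInfo` and scoring here are illustrative (a LeastLoaded-flavored heuristic), not murmer's actual definitions:

```rust
/// Illustrative stand-in for the real NodeInfo.
struct NodeInfo {
    alive: bool,
    actor_count: usize,
}

fn fitness(node: &NodeInfo) -> f64 {
    if !node.alive {
        return 0.0; // 0.0 means "do not place here"
    }
    1.0 / (1.0 + node.actor_count as f64) // fewer actors => higher score
}

/// Pick the index of the highest-scoring eligible node, if any scores above zero.
fn place(nodes: &[NodeInfo]) -> Option<usize> {
    nodes
        .iter()
        .enumerate()
        .map(|(i, n)| (i, fitness(n)))
        .filter(|&(_, score)| score > 0.0)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}

fn main() {
    let nodes = [
        NodeInfo { alive: true, actor_count: 3 },
        NodeInfo { alive: true, actor_count: 1 }, // least loaded => wins
        NodeInfo { alive: false, actor_count: 0 }, // dead => score 0.0, filtered out
    ];
    assert_eq!(place(&nodes), Some(1));
}
```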

Built-in strategies

| Strategy        | Behavior                                            |
|-----------------|-----------------------------------------------------|
| LeastLoaded     | Place on the node running the fewest actors         |
| RandomPlacement | Uniform random selection across eligible nodes      |
| Pinned(node_id) | Always prefer a specific node; fall back if unavailable |

Leader election

The LeaderElection trait is pluggable. The Coordinator only runs on the elected leader node.

trait LeaderElection {
    fn elect(&self, view: &ClusterView) -> Option<String>;
}

The default implementation, OldestNode, picks the node with the lowest incarnation counter. This is deterministic — all nodes independently compute the same answer without a consensus round.

use murmer::app::election::OldestNode;
use murmer::cluster::config::NodeClass;

// Any alive node can be leader
let election = OldestNode::any();

// Only Edge nodes can be leader
let election = OldestNode::with_class(NodeClass::Edge);
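
The deterministic-election idea can be sketched over illustrative data: every node computes min-by-incarnation over the same membership view, so all nodes agree without a consensus round. (A tie on incarnation needs a deterministic tiebreak; this sketch uses the node name.)

```rust
/// Pick the leader from a membership view of (node name, incarnation) pairs.
fn elect(view: &[(String, u64)]) -> Option<&str> {
    view.iter()
        .min_by_key(|(name, incarnation)| (*incarnation, name.clone()))
        .map(|(name, _)| name.as_str())
}

fn main() {
    let view = vec![
        ("beta".to_string(), 7),
        ("alpha".to_string(), 3), // lowest incarnation => leader
        ("gamma".to_string(), 5),
    ];
    // Every node running this over the same view picks the same leader.
    assert_eq!(elect(&view), Some("alpha"));
    assert_eq!(elect(&[]), None);
}
```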

The Coordinator

The Coordinator is itself a murmer actor (dogfooding the framework). It maintains a ClusterView, accepts SubmitSpec messages, and handles crash recovery when nodes fail.

Lifecycle

  1. The Coordinator starts on the elected leader node.
  2. It subscribes to cluster events to track node joins and failures.
  3. Users send SubmitSpec messages to declare what actors should run.
  4. The Coordinator evaluates placement constraints and strategies, then sends SpawnActor control messages to target nodes.
  5. When a node fails, the Coordinator re-places affected actors according to each spec’s CrashStrategy.

The cluster event bridge

The bridge (murmer::app::bridge) connects the raw cluster machinery to the Coordinator. It subscribes to ClusterEvents and translates them into Coordinator messages (NotifyNodeJoined, NotifyNodeFailed, NotifyNodeLeft, NotifySpawnAck). This keeps the Coordinator decoupled from the transport layer.

The recommended setup uses bridge::start_coordinator():

use murmer::app::bridge;
use murmer::app::coordinator::CoordinatorState;
use murmer::app::placement::LeastLoaded;
use murmer::app::election::OldestNode;

let cluster = system.cluster_system().unwrap();
let state = CoordinatorState::new(
    cluster.identity().node_id_string(),
    Box::new(LeastLoaded),
    Box::new(OldestNode::with_class(NodeClass::Edge)),
);

let coordinator = bridge::start_coordinator(cluster, state);

This wires up everything: the Coordinator actor, the event bridge loop, and the spawn drain loop (which sends placement decisions to the transport layer).

The cluster view

The ClusterView is the Coordinator’s world model — a snapshot of all nodes with their capabilities and running actors:

// Query the Coordinator's view
let view = coordinator.send(GetClusterView).await?;
println!("Alive nodes: {}", view.alive_count);
println!("Total nodes: {}", view.total_count);

// Query managed specs
let specs = coordinator.send(GetSpecs).await?;
for spec in &specs {
    println!("{}: {:?} on {}", spec.label, spec.state, spec.node_id);
}

Each node in the view carries:

  • Identity — name, host, port, incarnation counter
  • Class — Worker, Edge, Coordinator, etc.
  • Metadata — arbitrary key-value pairs (e.g., "volume" = "photos")
  • Running actors — labels of actors currently hosted
  • Liveness — whether the node is reachable
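These fields are exactly what placement needs. As a sketch (with NodeInfo as a hypothetical stand-in for one entry in the view), checking whether a node satisfies a spec's class and metadata constraints looks like:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for one node entry in the ClusterView; murmer's
// real type differs, but carries the same kinds of fields listed above.
pub struct NodeInfo {
    pub class: String,
    pub metadata: HashMap<String, String>,
    pub alive: bool,
}

// Would this node satisfy a spec's class + metadata constraints?
pub fn satisfies(node: &NodeInfo, class: &str, meta: &[(&str, &str)]) -> bool {
    node.alive
        && node.class == class
        && meta
            .iter()
            .all(|(k, v)| node.metadata.get(*k).map(String::as_str) == Some(*v))
}
```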

Full example: Filesystem RPC

The orchestrator example demonstrates the full orchestration loop:

  1. Three nodes form a cluster: a gateway (Edge class) and two workers (store-a, store-b).
  2. Each worker advertises capabilities via metadata ("volume" = "photos" or "volume" = "docs").
  3. The gateway runs a Coordinator that places StorageAgent actors on workers matching their placement constraints.
  4. Clients query storage agents for directory listings and file reads — transparently routed across the cluster.
  5. Node failure triggers crash strategy handling.

// Submit a spec with placement constraints
let result = coordinator.send(SubmitSpec {
    spec: ActorSpec::new("storage/photos", "orchestrator::StorageAgent")
        .with_state(photos_state_bytes)
        .with_crash_strategy(CrashStrategy::WaitForReturn(Duration::from_secs(30)))
        .with_constraints(PlacementConstraints {
            required_classes: vec![NodeClass::Worker],
            required_metadata: [("volume".into(), "photos".into())].into(),
            ..Default::default()
        }),
}).await?;

// The Coordinator placed the actor — now query it from any node
let photos = system.lookup::<StorageAgent>("storage/photos").unwrap();
let entries = photos.send(ListDir { path: "/".into() }).await?;

Run the example:

cargo test -p murmer-examples --test orchestrator