
murmer — distributed actors for rust


Introduction

Murmer is a distributed actor framework for Rust, built on tokio and QUIC.

It provides typed, location-transparent actors that communicate through message passing. Whether an actor lives in the same process or on a remote node across the network, you interact with it through the same Endpoint<A> API.

Why I built this

I’ve spent years working with Elixir and the BEAM VM, and the actor model there is something I’ve grown deeply fond of — the simplicity of processes, message passing, and supervision just works. When I looked at bringing that experience to Rust, I studied existing implementations like Actix, Telepathy, and Akka (on the JVM side). They’re impressive systems, but I kept running into the same friction: getting a basic actor up and running was complex, and adding remote communication across nodes was even more so.

Murmer is an experiment in answering a simple question: can you build a robust distributed actor system in Rust that’s actually simple to use?

The answer, it seems, is yes.

The design draws heavy inspiration from BEAM OTP’s supervision and process model, Akka’s clustering approach, and Apple’s Swift Distributed Actors for the typed, location-transparent endpoint API. The goal is to combine these ideas with Rust’s performance and safety guarantees — zero-cost local dispatch, compile-time message type checking, and automatic serialization over encrypted QUIC connections when actors span nodes.

Murmer in 1 minute

Install:

[dependencies]
murmer = "0.1"
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }

Write an actor, send it messages:

use murmer::prelude::*;

// ① Define your actor — state lives separately
#[derive(Debug)]
struct Counter;
struct CounterState { count: i64 }

impl Actor for Counter {
    type State = CounterState;
}

// ② Handlers become the actor's API
#[handlers]
impl Counter {
    #[handler]
    fn increment(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut CounterState,
        amount: i64,
    ) -> i64 {
        state.count += amount;
        state.count
    }

    #[handler]
    fn get_count(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut CounterState,
    ) -> i64 {
        state.count
    }
}

#[tokio::main]
async fn main() {
    // ③ Create a local actor system
    let system = System::local();

    // ④ Start an actor — returns a typed Endpoint<Counter>
    let counter = system.start("counter/main", Counter, CounterState { count: 0 });

    // ⑤ Send messages via auto-generated extension methods
    let result = counter.increment(5).await.unwrap();
    println!("Count: {result}"); // → Count: 5

    // ⑥ Look up actors by label — works for local and remote
    let found = system.lookup::<Counter>("counter/main").unwrap();
    let count = found.get_count().await.unwrap();
    println!("Looked up: {count}"); // → Looked up: 5
}

cargo run

That’s it — a complete, working actor system. The rest of this page explains what’s happening under the hood. The Getting Started chapter goes deeper into each component.

What it gives you

  • Send messages without caring where the actor lives. counter.increment(5) (line ⑤) works identically whether the actor is local or on a remote node — the Endpoint<A> API abstracts the difference away.
  • Test distributed systems from a single process. System::local() (line ③) runs everything in-memory. Swap to System::clustered() when you’re ready for real networking — your actor code stays identical.
  • Define actors with minimal boilerplate. The #[handlers] macro (line ②) auto-generates message structs (Increment, GetCount), dispatch tables, serialization, and the extension methods you call on line ⑤.
  • Get networking and encryption handled for you. QUIC transport with automatic TLS, SWIM-based cluster membership, and mDNS discovery — all configured, not hand-rolled.
  • Supervise actors like OTP. Restart policies (Temporary, Transient, Permanent) with configurable limits and exponential backoff keep your system running through failures.
  • Orchestrate applications across a cluster. The app module adds placement strategies, leader election, and crash recovery — so you can declare what should run and where, and the framework handles the rest.

What’s happening: line by line

① Actor + State — Counter is a zero-sized struct. All mutable state lives in CounterState, passed as &mut to every handler. This keeps the actor lightweight and the state threading explicit.

② #[handlers] — The macro reads your method signatures and generates:

  • Message structs — Increment { pub amount: i64 } and GetCount (unit struct)
  • Handler<Increment> and Handler<GetCount> trait implementations
  • RemoteDispatch — a wire-format dispatch table so remote nodes know how to route messages to the right handler
  • CounterExt — an extension trait on Endpoint<Counter> that gives you .increment(amount) and .get_count() methods

③ System::local() — Creates the actor runtime and boots the Receptionist — the internal actor registry that tracks all actors by label and type.

④ system.start(...) — Wraps Counter in a Supervisor that manages its lifecycle, mailbox, and restart behavior. Registers it with the Receptionist under the label "counter/main". Returns an Endpoint<Counter> — your typed send handle.

⑤ counter.increment(5) — The extension method constructs an Increment { amount: 5 } message and sends it through the Endpoint. Since this is a local actor, the message is dispatched as a zero-copy envelope through the Supervisor’s mailbox to the handler. The result comes back through a oneshot channel.
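Conceptually, the local path is the classic channel-plus-reply pattern. Here is a stripped-down sketch using plain std threads and channels (illustrative only; murmer's real envelope, mailbox, and oneshot types differ):

```rust
use std::sync::mpsc;
use std::thread;

// A request envelope: the message fields plus a reply channel
// (a std mpsc channel stands in for a oneshot here).
struct Increment {
    amount: i64,
    reply: mpsc::Sender<i64>,
}

// Spawn a "counter actor", send each amount, return the final count.
fn run_counter(amounts: &[i64]) -> i64 {
    let (tx, rx) = mpsc::channel::<Increment>();

    // The "actor": owns its state, processes one envelope at a time.
    let actor = thread::spawn(move || {
        let mut count = 0i64;
        for msg in rx {
            count += msg.amount;            // handler body
            let _ = msg.reply.send(count);  // respond via the reply channel
        }
    });

    // The "endpoint" side: send a message, block until the reply arrives.
    let mut last = 0;
    for &amount in amounts {
        let (reply_tx, reply_rx) = mpsc::channel();
        tx.send(Increment { amount, reply: reply_tx }).unwrap();
        last = reply_rx.recv().unwrap();
    }
    drop(tx); // close the mailbox so the actor loop ends
    actor.join().unwrap();
    last
}

fn main() {
    println!("Count: {}", run_counter(&[5])); // prints "Count: 5"
}
```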

⑥ system.lookup(...) — Queries the Receptionist for an actor of type Counter at label "counter/main". Returns the same Endpoint<Counter>. In a clustered system, this could return a proxy endpoint that transparently serializes messages over QUIC to a remote node.

Core concepts

| Concept | In the example | Purpose |
|---|---|---|
| Actor | Counter + CounterState | Stateful message processor. Actor has no fields — state lives separately. |
| Message | Generated Increment, GetCount | Defines a request and its response type. |
| RemoteMessage | Generated by #[handlers] | A message that can cross the wire (serializable + TYPE_ID). |
| Endpoint | counter from system.start(...) | Opaque send handle. Abstracts local vs remote — callers never know which. |
| Receptionist | Powers system.lookup(...) | Type-erased actor registry. Start, lookup, and subscribe to actors. |
| Router | Not shown — see Discovery | Distributes messages across a pool of endpoints (round-robin, broadcast). |
| Listing | Not shown — see Discovery | Async stream of endpoints matching a ReceptionKey. |

Architecture

Architecture: System → Receptionist → Supervisor → Endpoint → Cluster

Every layer in this diagram is touched by the example code:

  • System — created at line ③, runs the entire runtime
  • Receptionist — populated at line ④ (start), queried at line ⑥ (lookup)
  • Supervisor Layer — wraps Counter at line ④, manages its mailbox, would handle restarts if configured
  • Endpoint‹A› — returned at line ④, used at line ⑤ to send messages; local dispatch here, but swap to System::clustered() and the same endpoint transparently routes over QUIC
  • Cluster Layer — not active in System::local(), but requires zero code changes to enable (see Clustering)

Key design decisions

  • Endpoint<A> is the only API — callers never know if an actor is local or remote.
  • Receptionist is non-generic — stores type-erased entries internally, uses TypeId guards for safe downcasts at lookup time.
  • Supervisors are flat — each actor has its own supervisor, no parent-child hierarchy.
  • Labels are paths — "cache/user", "worker/0", "thumbnail/processor/3". Hierarchical naming for organizational clarity.
  • Fail-fast networking — if a QUIC stream fails, all pending responses error immediately instead of hanging.

From primitives to applications

Murmer works at two levels:

The core (Actors, Discovery, Supervision, Clustering) gives you the building blocks — everything you saw in the example above. You can build complete services with just these primitives.

The application layer (Application Orchestration) builds on top of the core to manage real, running applications across a cluster. You declare what actors should run, where they should be placed (with constraints like “must have GPU” or “must be a Worker node”), and what happens when a node fails — and the Coordinator handles placement, spawning, and crash recovery automatically.

murmer core primitives flowing into app module orchestration

Learn more

Getting Started

This chapter goes deeper into the components you saw in the introduction. If you haven’t seen the 1-minute example yet, start there — this chapter assumes you’ve seen the basics and want to understand more.

Dependencies

[dependencies]
murmer = "0.1"
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }

murmer is the core framework. The macros feature (on by default) re-exports #[handlers], #[handler], and #[derive(Message)] from murmer-macros — no separate dependency needed. Both serde and tokio are required — serde for message serialization (even in local mode, the types need to be Serialize + Deserialize for remote readiness), and tokio as the async runtime.

The Actor trait

Every actor implements the Actor trait, which has one required associated type: State.

use murmer::prelude::*;

#[derive(Debug)]
struct ChatRoom;

struct ChatRoomState {
    room_name: String,
    messages: Vec<ChatEntry>,
    max_messages: usize,
}

struct ChatEntry {
    from: String,
    text: String,
    timestamp: u64,
}

impl Actor for ChatRoom {
    type State = ChatRoomState;
}

Why state lives separately

This is a deliberate design choice. In many actor frameworks, state lives directly on the actor struct. In murmer, the actor struct is typically empty (zero-sized) and all mutable state lives in the associated State type.

This gives you:

  • Explicit state threading — every handler receives &mut State, making it clear what data is being read and modified.
  • Clean restarts — when a supervisor restarts an actor, the factory creates a fresh (Actor, State) pair. No hidden state carried over from a crashed instance.
  • Separation of identity and data — the actor struct can carry configuration or immutable context (like a database pool handle), while State holds the mutable per-instance data.

You can put fields on the actor struct — they just won’t be part of the restart cycle:

struct ChatRoom {
    db: DatabasePool,  // immutable, shared across restarts
}

struct ChatRoomState {
    messages: Vec<ChatEntry>,  // mutable, reset on restart
}

Defining handlers

Handlers are methods on the actor that process incoming messages. The #[handlers] macro on the impl block and #[handler] on individual methods does the heavy lifting:

#[handlers]
impl ChatRoom {
    #[handler]
    fn post_message(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
        from: String,
        text: String,
    ) -> usize {
        state.messages.push(ChatEntry {
            from,
            text,
            timestamp: now(),
        });
        // Trim if over limit
        if state.messages.len() > state.max_messages {
            state.messages.remove(0);
        }
        state.messages.len()
    }

    #[handler]
    fn get_history(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
    ) -> Vec<String> {
        state.messages.iter()
            .map(|e| format!("[{}] {}: {}", e.timestamp, e.from, e.text))
            .collect()
    }

    #[handler]
    fn room_name(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
    ) -> String {
        state.room_name.clone()
    }
}

Handler signature rules

Every handler method must follow this pattern:

fn method_name(
    &mut self,                          // always &mut self
    ctx: &ActorContext<Self>,           // always second — the actor's context
    state: &mut YourStateType,         // always third — mutable state access
    // ... additional parameters become message fields
) -> ReturnType {
    // ...
}

  • &mut self — the actor instance.
  • ctx: &ActorContext<Self> — provides access to the system, receptionist, the actor’s own label, and methods like ctx.watch() for actor monitoring. Prefix with _ if unused.
  • state: &mut State — the actor’s mutable state.
  • Additional parameters — each one becomes a field on the generated message struct. fn increment(... amount: i64) generates Increment { pub amount: i64 }.
  • Return type — becomes the message’s Result type. Handlers must return a value (not ()).

What gets generated

For the ChatRoom example above, the macro generates:

// Message structs
struct PostMessage { pub from: String, pub text: String }
struct GetHistory;  // unit struct — no extra params
struct RoomName;

// Trait implementations
impl Handler<PostMessage> for ChatRoom { /* ... */ }
impl Handler<GetHistory> for ChatRoom { /* ... */ }
impl Handler<RoomName> for ChatRoom { /* ... */ }

// Remote dispatch table
impl RemoteDispatch for ChatRoom { /* ... */ }

// Extension trait for ergonomic sends
trait ChatRoomExt {
    fn post_message(&self, from: String, text: String) -> impl Future<Output = Result<usize>>;
    fn get_history(&self) -> impl Future<Output = Result<Vec<String>>>;
    fn room_name(&self) -> impl Future<Output = Result<String>>;
}

impl ChatRoomExt for Endpoint<ChatRoom> { /* ... */ }

This means you can call endpoint.post_message("alice".into(), "hello".into()) directly on any Endpoint<ChatRoom>, without ever constructing a message struct yourself.

Async handlers

For handlers that need to perform async work (I/O, database queries, HTTP calls), use async fn:

#[handlers]
impl ChatRoom {
    #[handler]
    async fn fetch_and_store(
        &mut self,
        ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
        url: String,
    ) -> Result<usize, String> {
        // You can .await here — the supervisor handles scheduling
        let response = reqwest::get(&url).await
            .map_err(|e| e.to_string())?;
        let body = response.text().await
            .map_err(|e| e.to_string())?;

        state.messages.push(ChatEntry {
            from: "system".into(),
            text: body,
            timestamp: now(),
        });
        Ok(state.messages.len())
    }
}

Async handlers generate AsyncHandler<FetchAndStore> instead of Handler<FetchAndStore>. The supervisor processes async handlers cooperatively — while one handler is awaiting, no other messages are processed for that actor (preserving the single-writer invariant).

Explicit message types

The auto-generated messages from #[handlers] cover most cases, but sometimes you want to define a message type explicitly — for example, when multiple actors handle the same message:

use murmer::Message;
use serde::{Serialize, Deserialize};

#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = String)]
struct Ping {
    payload: String,
}

To use an explicit message in a handler, name the parameter msg:

#[handlers]
impl ChatRoom {
    #[handler]
    fn ping(
        &mut self,
        _ctx: &ActorContext<Self>,
        _state: &mut ChatRoomState,
        msg: Ping,          // "msg" signals: use this type directly
    ) -> String {
        format!("pong: {}", msg.payload)
    }
}

The msg parameter name tells the macro to use Ping as-is instead of generating a new message struct.

For messages that need to cross the network, add remote:

#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = String, remote = "my_app::Ping")]
struct Ping {
    payload: String,
}

The remote value is a unique string identifier used for wire-format routing. See the Proc Macro Reference for the full details.

The System

The System is the entry point for everything — it runs the actor runtime, manages the receptionist, and optionally handles clustering.

// Local mode — in-memory, no networking
let system = System::local();

// Clustered mode — QUIC networking, SWIM membership
let system = System::clustered_auto(config).await?;

Both modes expose the same API. Your actor code doesn’t change.

Starting actors

// Returns Endpoint<ChatRoom>
let room = system.start(
    "room/general",                   // label — unique within the cluster
    ChatRoom,                         // actor instance
    ChatRoomState {                   // initial state
        room_name: "general".into(),
        messages: vec![],
        max_messages: 1000,
    },
);

The label "room/general" is a path-like string that uniquely identifies this actor in the system. Labels are how actors find each other through the receptionist.

Looking up actors

// Type-safe lookup — returns Option<Endpoint<ChatRoom>>
let room = system.lookup::<ChatRoom>("room/general");

if let Some(ep) = room {
    let history = ep.get_history().await?;
}

Lookups are typed: the endpoint's message API is checked at compile time, while the label-to-type match is verified at runtime. If the label exists but the type doesn’t match, None is returned.

Endpoints in depth

Endpoint<A> is the central abstraction. It’s:

  • Typed — Endpoint<ChatRoom> can only send messages that ChatRoom handles
  • Cloneable — lightweight handle, share freely across tasks
  • Location-transparent — local endpoints dispatch through in-memory channels; remote endpoints serialize over QUIC

// All of these work identically
let room: Endpoint<ChatRoom> = system.start("room/1", ChatRoom, state);
let room: Endpoint<ChatRoom> = system.lookup::<ChatRoom>("room/1").unwrap();

// Clone and pass to another task
let room2 = room.clone();
tokio::spawn(async move {
    room2.post_message("bot".into(), "background task".into()).await.unwrap();
});

// Send via extension methods (ergonomic)
room.post_message("alice".into(), "hello".into()).await?;

// Or send a message struct directly
room.send(PostMessage { from: "alice".into(), text: "hello".into() }).await?;

Both .post_message(...) (extension method) and .send(PostMessage { ... }) (direct) do the same thing. The extension methods are more ergonomic for the common case.

Actor watches

Monitor other actors and get notified when they terminate — inspired by Erlang’s monitor/2:

struct Watchdog;
struct WatchdogState { terminated: Vec<String> }

impl Actor for Watchdog {
    type State = WatchdogState;

    fn on_actor_terminated(
        &mut self,
        state: &mut WatchdogState,
        terminated: &ActorTerminated,
    ) {
        tracing::warn!(
            "Actor {} terminated: {:?}",
            terminated.label,
            terminated.reason
        );
        state.terminated.push(terminated.label.clone());
    }
}

#[handlers]
impl Watchdog {
    #[handler]
    fn watch(
        &mut self,
        ctx: &ActorContext<Self>,
        _state: &mut WatchdogState,
        label: String,
    ) -> bool {
        ctx.watch(&label);
        true
    }
}

The on_actor_terminated callback on the Actor trait fires when any watched actor stops, crashes, or is killed. The ActorTerminated struct tells you which actor and why.

Going from local to clustered

The entire point of murmer’s design is that this transition requires zero changes to your actor code. Only the system construction changes:

// Before: local
let system = System::local();

// After: clustered
let config = ClusterConfig::builder()
    .name("my-node")
    .listen("0.0.0.0:7100".parse()?)
    .cookie("my-cluster-secret")
    .seed_nodes(["192.168.1.1:7100".parse()?])
    .build()?;

let system = System::clustered_auto(config).await?;

Everything else — system.start(...), system.lookup(...), endpoint.send(...) — stays identical. Actors on remote nodes appear in your local receptionist automatically via registry replication.

See the Clustering chapter for the full walkthrough, including Docker deployment and the interactive cluster_chat example.

Build and test

cargo build
cargo nextest run
cargo clippy -- -D warnings

Next steps

Now that you understand the components, dive into the specific areas:

Actors and Messages

This chapter covers the core building blocks: actors, state, messages, handlers, and endpoints.

Actors

An actor in murmer is a small server: a Rust struct with a set of handler methods, its own persistent state, and message types. You define an actor by implementing the Actor trait, then adding a handler method for each message type it can handle.

use murmer::prelude::*;

#[derive(Debug)]
struct ChatRoom;

struct ChatRoomState {
    room_name: String,
    messages: Vec<ChatEntry>,
}

impl Actor for ChatRoom {
    type State = ChatRoomState;
}

Key points:

  • The actor struct itself is typically a zero-sized type (no fields). All mutable state lives in the associated State type.
  • State is passed as &mut State to every handler, keeping the actor struct itself lightweight and the state explicitly threaded.
  • Each actor runs inside a supervisor that manages its lifecycle, mailbox, and restart behavior.

Messages

A message is a type that can be sent to an actor. Every message defines a result type and can optionally be serializable for remote delivery.

There are two ways to define messages:

When using #[handlers] with #[handler], the macro generates message structs automatically from your method signatures:

#[handlers]
impl ChatRoom {
    #[handler]
    fn post_message(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
        from: String,
        text: String,
    ) -> usize {
        state.messages.push(ChatEntry { from, text });
        state.messages.len()
    }

    #[handler]
    fn get_history(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut ChatRoomState,
    ) -> Vec<String> {
        state.messages.iter()
            .map(|e| format!("{}: {}", e.from, e.text))
            .collect()
    }
}

This generates PostMessage { pub from: String, pub text: String } and a GetHistory unit struct, plus all the trait implementations and an extension trait ChatRoomExt on Endpoint<ChatRoom>.

Explicit messages

For messages shared across multiple actors, define them manually with #[derive(Message)]:

use murmer::Message;
use serde::{Serialize, Deserialize};

#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = Vec<String>, remote = "orchestrator::ListDir")]
struct ListDir {
    path: String,
}

Then reference it in a handler with the msg parameter name:

#[handlers]
impl StorageAgent {
    #[handler]
    fn list_dir(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut StorageState,
        msg: ListDir,
    ) -> Vec<String> {
        state.dirs.get(&msg.path).cloned().unwrap_or_default()
    }
}

The remote = "..." attribute is optional — omit it for local-only messages that don’t need wire serialization.

Async handlers

Handlers that need to await use async fn:

#[handlers]
impl MyActor {
    #[handler]
    async fn fetch_data(
        &mut self,
        ctx: &ActorContext<Self>,
        state: &mut MyState,
        url: String,
    ) -> Vec<u8> {
        some_async_call(&url).await
    }
}

This generates an AsyncHandler<FetchData> implementation instead of Handler<FetchData>.

Endpoints

Endpoints are opaque handles to actors used to send messages and receive responses. The Endpoint<A> type abstracts where the actor lives — whether local or remote — and handles serialization in the background.

// Start returns a typed endpoint
let counter = system.start("counter/main", Counter, CounterState { count: 0 });

// Send via auto-generated extension trait
let result = counter.increment(5).await.unwrap();

// Or send a message struct directly
let result = counter.send(Increment { amount: 5 }).await.unwrap();

Location transparency

The key design principle: Endpoint<A> hides whether the actor is local or remote.

Endpoint‹A› forking into local (zero-copy) and remote (QUIC) paths

  • Local actors use the envelope pattern — zero serialization cost, direct in-memory dispatch through a type-erased trait object.
  • Remote actors serialize messages with bincode, send them over QUIC streams, and deserialize responses on return.

The caller’s code is identical in both cases:

let result = endpoint.send(Increment { amount: 5 }).await?;

How endpoints work under the hood

Under the hood, an endpoint wraps a tokio channel that sends messages to the actor’s supervisor:

  • Local endpoint: The channel connects directly to the actor’s supervisor (mpsc). Messages are dispatched as type-erased envelopes with zero serialization cost.
  • Remote endpoint: The channel connects to a proxy supervisor that serializes the message in bincode format, sends it through a QUIC stream, awaits the response, deserializes it, and returns it to the caller.

Endpoints are lightweight and Clone — share them freely across tasks.

Actor watches

Erlang-style actor monitoring — get notified when a watched actor terminates:

impl Actor for Monitor {
    type State = MonitorState;

    fn on_actor_terminated(
        &mut self,
        state: &mut MonitorState,
        terminated: &ActorTerminated,
    ) {
        match &terminated.reason {
            TerminationReason::Panicked(msg) => {
                tracing::error!("{} panicked: {}", terminated.label, msg);
            }
            _ => {}
        }
    }
}

#[handlers]
impl Monitor {
    #[handler]
    fn watch_actor(
        &mut self,
        ctx: &ActorContext<Self>,
        _state: &mut MonitorState,
        label: String,
    ) {
        ctx.watch(&label);
    }
}

The ActorContext provides the watch() method, and termination notifications arrive via the on_actor_terminated callback on the Actor trait.

Discovery

Murmer provides a unified actor discovery system through the Receptionist — a type-erased registry that handles registration, lookup, and subscription for both local and remote actors.

Labels

Actors are identified by path-like labels: "cache/user", "worker/0", "thumbnail/processor/3".

let ep = system.start("service/auth", AuthActor, AuthState::new());
let ep = system.lookup::<AuthActor>("service/auth").unwrap();

Labels serve as the primary routing key in the system:

  • No two actors can have the same label in the clustered system.
  • The path structure ("group/subgroup/instance") enables hierarchical organization of actors.
  • Wildcards can be used in subscriptions on the receptionist (e.g., subscribe to all actors under "worker/*").
  • Labels are only used for actor discovery and routing — they are a feature of the receptionist, not intrinsic to actors themselves.
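The wildcard semantics can be illustrated with a small prefix-matching function over path-like labels (an illustrative sketch, not murmer's implementation):

```rust
// Match a path-like label against a pattern where a trailing "/*"
// matches any remaining path segments. Illustrative sketch only.
fn label_matches(pattern: &str, label: &str) -> bool {
    match pattern.strip_suffix("/*") {
        // "worker/*" matches "worker/0" and "worker/pool/3",
        // but not "workers/0" (the next char must be a separator).
        Some(prefix) => label
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with('/')),
        // No wildcard: require an exact match.
        None => pattern == label,
    }
}

fn main() {
    assert!(label_matches("worker/*", "worker/0"));
    assert!(label_matches("worker/*", "worker/pool/3"));
    assert!(!label_matches("worker/*", "workers/0"));
    assert!(!label_matches("worker/*", "cache/user"));
    assert!(label_matches("cache/user", "cache/user"));
    println!("ok");
}
```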

The Receptionist

The Receptionist is a special internal actor started with every actor system. It manages the lookup and registration of both local and remote actors.

When an actor is started, it registers itself with the receptionist. The receptionist maintains a mapping of labels and actor types to their corresponding endpoints. When you want to communicate with an actor, you query the receptionist with the actor type and label to receive an endpoint.

Typed lookup

Lookups are type-safe — you specify the actor type and get back a typed endpoint:

// Returns Option<Endpoint<AuthActor>>
let endpoint = system.lookup::<AuthActor>("service/auth");

If the label doesn’t exist or the type doesn’t match, None is returned. For remote actors, the receptionist returns an endpoint that transparently handles connection management when accessed.
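This type-guarded lookup can be modeled with std::any: a toy registry stores type-erased entries and recovers them with a checked downcast at lookup time (a simplified model, not the actual Receptionist):

```rust
use std::any::Any;
use std::collections::HashMap;

// A minimal type-erased registry: values go in as Box<dyn Any>
// and come out via a checked downcast, so a label with the wrong
// type yields None rather than a panic.
struct Registry {
    entries: HashMap<String, Box<dyn Any>>,
}

impl Registry {
    fn new() -> Self {
        Registry { entries: HashMap::new() }
    }

    fn register<T: Any>(&mut self, label: &str, value: T) {
        self.entries.insert(label.to_string(), Box::new(value));
    }

    // Some only if the label exists AND the stored type matches T.
    fn lookup<T: Any>(&self, label: &str) -> Option<&T> {
        self.entries.get(label)?.downcast_ref::<T>()
    }
}

fn main() {
    let mut reg = Registry::new();
    reg.register("counter/main", 42i64);

    assert_eq!(reg.lookup::<i64>("counter/main"), Some(&42)); // right type
    assert!(reg.lookup::<String>("counter/main").is_none());  // wrong type
    assert!(reg.lookup::<i64>("missing").is_none());          // no such label
    println!("ok");
}
```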

State observation

When an actor is registered, the receptionist becomes an observer of its state. If the actor stops or crashes, the receptionist is notified and removes the actor from its registry. This ensures the receptionist always has an up-to-date view of the actors in the system.

If an actor restarts (via supervision), the receptionist is notified but the actor remains available — its state is non-terminal (alive or restarting, not dead), so the registration is kept.

Reception keys and listings

Group actors by type and subscribe to changes:

let worker_key = ReceptionKey::<Worker>::new("workers");

// Check actors into the group
receptionist.check_in("worker/0", worker_key.clone());
receptionist.check_in("worker/1", worker_key.clone());

// Subscribe — get existing actors immediately + live updates
let mut listing = receptionist.listing(worker_key);
while let Some(endpoint) = listing.next().await {
    endpoint.send(Work { task: "process".into() }).await?;
}

A Listing<A> is an async stream that yields endpoints as actors register and deregister against a ReceptionKey. It provides both backfill (existing actors) and live updates (new registrations), making it ideal for dynamic pool management.
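The backfill-plus-live behavior can be modeled as a snapshot chained with a channel of updates (a conceptual sketch using sync std channels, not the real async Listing type):

```rust
use std::sync::mpsc;

// Yield a snapshot of existing entries first, then live updates from
// the channel, mirroring Listing's backfill + live-update semantics.
fn collect_listing(backfill: Vec<&str>, live: mpsc::Receiver<String>) -> Vec<String> {
    backfill
        .into_iter()
        .map(String::from)      // backfill: actors already registered
        .chain(live.into_iter()) // live: registrations arriving later
        .collect()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send("worker/2".to_string()).unwrap(); // a late registration
    drop(tx); // close the live stream for this sketch

    let all = collect_listing(vec!["worker/0", "worker/1"], rx);
    assert_eq!(all, vec!["worker/0", "worker/1", "worker/2"]);
    println!("{all:?}");
}
```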

Lifecycle events

Subscribe to all actor registrations and deregistrations across the system:

let mut events = receptionist.subscribe_events();
while let Some(event) = events.recv().await {
    match event {
        ActorEvent::Registered { label, actor_type } => { /* ... */ }
        ActorEvent::Deregistered { label, actor_type } => { /* ... */ }
    }
}

Other actors can subscribe to the receptionist to receive notifications about actors being added, removed, or updated. Subscriptions can be broad (all actor updates, useful for clustering) or specific (a particular actor type or label).

Routing

Distribute messages across actor pools:

let router = Router::new(
    vec![ep1, ep2, ep3],
    RoutingStrategy::RoundRobin,
);

// Each send goes to the next endpoint in sequence
router.send(Increment { amount: 1 }).await?;

// Or broadcast to all
let results = router.broadcast(GetCount).await;

The Router<A> takes a set of endpoints and a RoutingStrategy to distribute messages. Current strategies include round-robin and broadcast.
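Round-robin selection itself is just a shared counter taken modulo the pool size. A sketch of the strategy (not murmer's Router internals):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Pick the next endpoint index in round-robin order.
// fetch_add makes this safe to call from many tasks concurrently.
struct RoundRobin {
    next: AtomicUsize,
    len: usize,
}

impl RoundRobin {
    fn new(len: usize) -> Self {
        RoundRobin { next: AtomicUsize::new(0), len }
    }

    fn pick(&self) -> usize {
        self.next.fetch_add(1, Ordering::Relaxed) % self.len
    }
}

fn main() {
    let rr = RoundRobin::new(3); // pool of 3 endpoints
    let picks: Vec<usize> = (0..5).map(|_| rr.pick()).collect();
    assert_eq!(picks, vec![0, 1, 2, 0, 1]); // wraps around the pool
    println!("{picks:?}");
}
```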

How discovery works across nodes

When running in clustered mode, the receptionist automatically synchronizes actor registrations across nodes:

Cross-node discovery: register → replicate → lookup → proxy endpoint → QUIC

  1. A local actor registers with its node’s receptionist.
  2. The node broadcasts an ActorAdd notification to all connected peers.
  3. Remote nodes register the actor in their local receptionists with a lazy endpoint factory.
  4. When a client looks up the remote actor, the endpoint factory creates a proxy that handles network transport.
  5. If the actor’s node fails, all remote registrations are cleaned up automatically.

This means system.lookup::<MyActor>("some/label") works identically whether the actor is local or on a remote node — the receptionist handles the difference transparently.

Supervision

Every actor in murmer runs inside a supervisor. The supervisor manages the actor’s lifecycle — starting it, processing its mailbox, and restarting it when things go wrong. This model is directly inspired by Erlang/OTP’s supervision trees, adapted for Rust’s ownership and type system.

How supervisors work

Each actor gets its own supervisor. The supervisor is responsible for:

  • Starting the actor and registering it with the receptionist.
  • Mailbox processing — ingesting messages and passing them to the actor’s handlers in order of arrival.
  • Crash detection — catching panics and deciding what to do next based on the restart policy.
  • Restarting the actor using a factory if the policy allows it.
  • State notifications — informing the receptionist of state changes (started, stopped, dead, etc.).
  • Context — providing the actor with access to the system, receptionist, and other actors via ActorContext.

Supervisors are flat — there is no parent-child hierarchy between actors. Each actor is independent and can be stopped or restarted without affecting others.

Actor lifecycle

The supervisor manages an actor through a well-defined set of states:

Actor lifecycle: Starting → Running → Crashed/Stopped → Restarting or Dead

Restart policies

Actors can be started with restart policies that control behavior on failure:

| Policy    | Restart on panic? | Restart on clean stop? |
|-----------|-------------------|------------------------|
| Temporary | No                | No                     |
| Transient | Yes               | No                     |
| Permanent | Yes               | Yes                    |

  • Temporary — the actor runs once. If it panics or stops, it’s gone. This is the default.
  • Transient — the actor restarts if it panics, but a clean shutdown is respected. Use this for actors that should survive crashes but can be intentionally stopped.
  • Permanent — the actor always restarts, whether it panicked or stopped cleanly. Use this for critical services that must always be running.

Configuration

To use restart policies, you provide an ActorFactory (which knows how to create fresh instances) and a RestartConfig:

use murmer::{RestartPolicy, RestartConfig, BackoffConfig, ActorFactory};
use std::time::Duration;

struct MyFactory;
impl ActorFactory for MyFactory {
    type Actor = Counter;
    fn create(&mut self) -> (Counter, CounterState) {
        (Counter, CounterState { count: 0 })
    }
}

let endpoint = receptionist.start_with_config(
    "counter/resilient",
    MyFactory,
    RestartConfig {
        policy: RestartPolicy::Permanent,  // Always restart
        max_restarts: 5,                   // Max 5 restarts...
        window: Duration::from_secs(60),   // ...within 60 seconds
        backoff: BackoffConfig {
            initial: Duration::from_millis(100),
            max: Duration::from_secs(30),
            multiplier: 2.0,
        },
    },
);

Restart limits

The max_restarts and window fields prevent infinite restart loops. If the actor exceeds the restart limit within the time window, the supervisor gives up and the actor is permanently stopped. This prevents a persistent bug from consuming all your resources.
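
As a sketch, the limit can be modeled as a sliding window of restart timestamps. The `RestartWindow` type below is a hypothetical helper for illustration, not murmer's internal implementation:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Illustrative sliding-window restart limiter (not murmer's actual type).
struct RestartWindow {
    max_restarts: usize,
    window: Duration,
    timestamps: VecDeque<Instant>,
}

impl RestartWindow {
    fn new(max_restarts: usize, window: Duration) -> Self {
        Self { max_restarts, window, timestamps: VecDeque::new() }
    }

    /// Record a restart at `now`; returns false once the limit is exceeded
    /// within the window, i.e. the supervisor should give up.
    fn allow_restart(&mut self, now: Instant) -> bool {
        // Drop restarts that fell out of the window.
        while let Some(&t) = self.timestamps.front() {
            if now.duration_since(t) > self.window {
                self.timestamps.pop_front();
            } else {
                break;
            }
        }
        if self.timestamps.len() >= self.max_restarts {
            return false;
        }
        self.timestamps.push_back(now);
        true
    }
}

fn main() {
    let mut w = RestartWindow::new(5, Duration::from_secs(60));
    let start = Instant::now();
    for i in 0u64..5 {
        assert!(w.allow_restart(start + Duration::from_secs(i)));
    }
    // Sixth restart within 60 seconds: the supervisor gives up.
    assert!(!w.allow_restart(start + Duration::from_secs(5)));
    // Much later, the window has cleared and restarts are allowed again.
    assert!(w.allow_restart(start + Duration::from_secs(120)));
}
```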

Exponential backoff

The BackoffConfig controls the delay between restarts:

  • initial — delay before the first restart attempt.
  • max — maximum delay (the backoff caps here).
  • multiplier — each subsequent restart delay is multiplied by this factor.

For example, with initial: 100ms, max: 30s, multiplier: 2.0, restarts happen at 100ms, 200ms, 400ms, 800ms, … up to 30s.
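
The schedule can be sketched as a small function; this is an illustration of the configured behavior, not murmer's internal code:

```rust
use std::time::Duration;

/// Delay before restart attempt `attempt` (zero-based): attempt 0 waits
/// `initial`, each subsequent attempt multiplies by `multiplier`, capped at `max`.
fn backoff_delay(initial: Duration, max: Duration, multiplier: f64, attempt: u32) -> Duration {
    let secs = initial.as_secs_f64() * multiplier.powi(attempt as i32);
    Duration::from_secs_f64(secs.min(max.as_secs_f64()))
}

fn main() {
    let (initial, max) = (Duration::from_millis(100), Duration::from_secs(30));
    // 100ms, 200ms, 400ms, 800ms, ... capped at 30s
    for attempt in 0..12 {
        println!("attempt {attempt}: {:?}", backoff_delay(initial, max, 2.0, attempt));
    }
}
```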

Actor factories

The ActorFactory trait gives the supervisor a way to create fresh actor instances for restarts:

trait ActorFactory {
    type Actor: Actor;
    fn create(&mut self) -> (Self::Actor, <Self::Actor as Actor>::State);
}

The factory is called each time the supervisor needs a new instance. It can carry its own state if needed — for example, incrementing a generation counter or loading configuration from disk.

Interaction with the receptionist

When a supervised actor restarts:

  1. The old actor instance is dropped.
  2. The supervisor creates a new instance via the factory.
  3. The new instance is registered with the receptionist under the same label.
  4. Any actors watching the old instance receive a termination notification, but the label remains routable.

This means endpoints held by other actors remain valid through restarts — messages sent during the brief restart window are queued in the supervisor’s mailbox and delivered to the new instance.

Clustering

One of murmer’s core design goals is that your actor code doesn’t change when you go from a single process to a multi-node cluster. The same Endpoint<A> API works in both cases.

Step 1: Run everything locally

Create a System::local() — no networking, no config. Your actors communicate through in-memory channels with zero serialization cost:

use murmer::prelude::*;

let system = System::local();

let room = system.start("room/general", ChatRoom, ChatRoomState {
    room_name: "general".into(),
    messages: vec![],
});

// Send messages via extension trait — works instantly
room.post_message("alice".into(), "Hello!".into()).await?;

// Look up actors by label
let ep = system.lookup::<ChatRoom>("room/general").unwrap();
let history = ep.get_history().await?;

Step 2: Go distributed

When you’re ready for real networking, swap System::local() for System::clustered(). Your actor code stays identical — only the system construction changes:

use murmer::prelude::*;
use murmer::cluster::config::ClusterConfig;

let config = ClusterConfig::builder()
    .name("alpha")
    .listen("0.0.0.0:7100".parse()?)
    .advertise("192.168.1.5:7100".parse()?)
    .cookie("my-cluster-secret")
    .seed_nodes(["192.168.1.1:7100".parse()?])
    .build()?;

// clustered_auto() discovers all #[handlers]-annotated actor types automatically
let system = System::clustered_auto(config).await?;

// Same API as local — start, lookup, send
let room = system.start("room/alpha", ChatRoom, state);
room.post_message("alice".into(), "Hello!".into()).await?;

// Actors on other nodes appear automatically via registry replication
let remote_room = system.lookup::<ChatRoom>("room/beta").unwrap();
remote_room.get_history().await?;  // transparently serialized over QUIC

Each node gets a single QUIC connection to every peer, multiplexed over per-actor streams. The OpLog replication protocol uses version vectors for efficient, idempotent sync.

Step 3: Test it interactively

The cluster_chat example lets you try both modes with an interactive CLI:

# Local mode — all actors in one process
cargo run -p murmer-examples --bin cluster_chat -- --local
=== murmer cluster_chat (local mode) ===
  Started room: #general
  Started room: #random

> post general alice Hello everyone!
  [1 messages in #general]
> post general bob Hey alice!
  [2 messages in #general]
> history general
  --- #general ---
  alice: Hello everyone!
  bob: Hey alice!
> rooms
  Known rooms:
    #general — 2 messages
    #random — 0 messages

Same binary, same commands — just add cluster config:

# Terminal 1 — seed node
cargo run -p murmer-examples --bin cluster_chat -- --node alpha --port 7100

# Terminal 2 — joins via seed
cargo run -p murmer-examples --bin cluster_chat -- --node beta --port 7200 --seed 127.0.0.1:7100

Step 4: Deploy with Docker

The docker-compose.yml in the repo runs a 3-node cluster:

docker compose up --build

This starts three containers — alpha, beta, and gamma — each running the cluster_chat example:

services:
  alpha:
    build: .
    command: ["--node", "alpha", "--port", "7100"]

  beta:
    build: .
    command: ["--node", "beta", "--port", "7100", "--seed", "alpha:7100"]

  gamma:
    build: .
    command: ["--node", "gamma", "--port", "7100", "--seed", "alpha:7100"]

Beta and gamma seed from alpha and automatically mesh together.

How clustering works

Cluster mesh formation: seed → handshake → gossip → full mesh

Auto-discovery

When an actor system starts in clustered mode, it runs a server that listens for incoming connections. New nodes connect to existing ones via seed nodes and begin exchanging information about their actors. Nodes can be configured to gossip this information, allowing the network to mesh together organically.

Networking layer

The networking layer is built on QUIC (via the quinn crate) and SWIM (via the foca crate):

  • QUIC provides a reliable, low-latency transport with built-in TLS encryption. Each node pair shares a single QUIC connection, multiplexed over per-actor streams.
  • SWIM handles cluster membership — failure detection, protocol-level heartbeats, and member state dissemination.
  • mDNS provides optional zero-configuration discovery for LAN environments.

Stream architecture

QUIC stream multiplexing: Node Alpha → per-actor streams → Node Beta

When a remote actor’s endpoint is accessed:

  1. A dedicated QUIC stream is opened to the remote node for that actor.
  2. The stream stays open as long as it’s active (not idle).
  3. On the receiving end, a stream handler deserializes incoming messages, looks up the target actor via the receptionist, and forwards them.
  4. Each stream binds to a single actor — messages for other actors result in an error and stream closure.
  5. The stream subscribes to the actor’s lifecycle via the receptionist. If the actor enters a negative state (stopped, dead), the stream closes with an error.

An actor on a node might have multiple inbound streams, but the mailbox system ensures messages are processed in order of arrival.

Registry replication

Actor registrations are replicated across the cluster using an OpLog with version vectors:

  • When a local actor registers, its node broadcasts an ActorAdd operation to all peers.
  • Remote nodes create lazy endpoint factories in their local receptionists.
  • Version vectors ensure operations are idempotent and ordering is preserved.
  • When a node leaves, its registrations are pruned from all other nodes.

This gives every node an eventually consistent view of the entire cluster’s actor topology.
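
The idempotence property can be sketched with a minimal version vector. These are illustrative types, not murmer's actual OpLog implementation, and they assume each node's ops are numbered sequentially:

```rust
use std::collections::HashMap;

/// Each receptionist remembers the highest op sequence it has applied per
/// peer node; replaying an op it has already seen is a no-op, which is
/// what makes replication idempotent.
#[derive(Default, Debug)]
struct VersionVector(HashMap<String, u64>);

impl VersionVector {
    /// Returns true if op (node, seq) is new and should be applied.
    fn observe(&mut self, node: &str, seq: u64) -> bool {
        let seen = self.0.entry(node.to_string()).or_insert(0);
        if seq > *seen {
            *seen = seq;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut vv = VersionVector::default();
    assert!(vv.observe("alpha", 1));  // new op: apply the ActorAdd
    assert!(!vv.observe("alpha", 1)); // duplicate delivery: ignored
    assert!(vv.observe("alpha", 2));
    assert!(vv.observe("beta", 1));   // ops from different nodes are independent
}
```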

Edge Clients

Not everything that talks to a murmer cluster needs to be a cluster member. A REST API gateway, a CLI tool, a monitoring dashboard, or an integration test runner just needs to call actors — it doesn’t need to run any, participate in SWIM gossip, or store a registry.

Edge clients fill that role. A MurmerClient connects to any cluster node via QUIC, pulls the set of public actors, and exposes the same Endpoint<A> API you already know — without any of the cluster machinery behind it.

use murmer::MurmerClient;
use std::time::Duration;

let client = MurmerClient::connect("10.0.0.5:9000".parse()?, "cluster-cookie").await?;
let ep = client.lookup::<UserService>("api/users").unwrap();
let user = ep.send(GetUser { id: 42 }).await?;
client.disconnect().await;

Visibility: controlling what Edge clients see

Every actor has a visibility that controls who can discover it. You set this at startup time on the server side:

// Public — visible to Edge clients and all cluster members
let api = system.start_public("api/users", UserService, state);

// Internal — visible to cluster members only (default)
let router = system.start("routing/shard-0", ShardRouter, state);

// Private — node-local only, never replicated
let metrics = system.start_private("node/metrics", MetricsCollector, state);

| Visibility         | Edge clients | Cluster members | Replicated via OpLog |
|--------------------|--------------|-----------------|----------------------|
| Public             | Yes          | Yes             | Yes                  |
| Internal (default) | No           | Yes             | Yes                  |
| Private            | No           | No              | No                   |

Private is a zero-overhead choice: the actor is never written to the OpLog, never serialized, and never sent over the wire. Use it for utility actors that are purely node-local implementation details — connection managers, per-node caches, local metrics collectors.

Connecting

Edge clients connect to any node in the cluster. The node you connect to acts as your sync source — it responds to pull requests with the current set of public actors.

// Short-lived: connect, call, disconnect
let client = MurmerClient::connect(
    "10.0.0.5:9000".parse()?,
    "cluster-cookie",
).await?;

The cluster cookie must match the server’s cookie or the handshake will be rejected.

Requirements

The server must be started with System::clustered() — Edge clients connect via QUIC and need a listening endpoint. System::local() has no network layer and cannot accept Edge client connections.

Looking up actors

lookup — instant, returns None if not synced yet

if let Some(ep) = client.lookup::<UserService>("api/users") {
    let user = ep.send(GetUser { id: 42 }).await?;
}

Returns None if the actor hasn’t been synced to the client yet. Use this after an initial sync has had time to complete.

lookup_wait — blocks until the actor appears

let ep = client
    .lookup_wait::<UserService>("api/users", Duration::from_secs(5))
    .await?;

Triggers an immediate pull, then waits for the actor to appear — either from that pull’s response or a subsequent one. Re-polls the server every 500 ms. Returns ClusterError::Timeout if the actor doesn’t appear within the deadline.

Fast path: if the actor is already synced, lookup_wait returns after one pull round-trip (typically sub-millisecond on LAN).
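
The polling loop can be sketched synchronously. The `lookup` closure below is a hypothetical stand-in for the client's local registry check (the real client also triggers a network pull, which this sketch omits):

```rust
use std::time::{Duration, Instant};

/// Try the lookup, re-poll every 500 ms, give up at the deadline.
fn lookup_wait<T>(
    mut lookup: impl FnMut() -> Option<T>,
    timeout: Duration,
) -> Result<T, &'static str> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(found) = lookup() {
            return Ok(found);
        }
        if Instant::now() >= deadline {
            return Err("timeout");
        }
        // Sleep until the next poll or the deadline, whichever is sooner.
        std::thread::sleep(Duration::from_millis(500).min(deadline - Instant::now()));
    }
}

fn main() {
    // Fast path: already synced, returns immediately.
    assert_eq!(lookup_wait(|| Some("endpoint"), Duration::from_secs(5)), Ok("endpoint"));
    // Never appears: times out at the deadline.
    assert_eq!(lookup_wait(|| None::<&str>, Duration::from_millis(10)), Err("timeout"));
}
```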

Usage patterns

Pattern 1: Short-lived client (pull once)

Ideal for CLI tools, integration tests, and one-off queries. Pulls on connect, uses the snapshot, disconnects.

let client = MurmerClient::connect(addr, cookie).await?;

// Give the initial pull a moment to arrive
tokio::time::sleep(Duration::from_millis(50)).await;

let ep = client.lookup::<UserService>("api/users").unwrap();
let result = ep.send(GetUser { id: 1 }).await?;

client.disconnect().await;

Pattern 2: Long-lived gateway (periodic pull)

Ideal for REST/gRPC API servers, dashboards, and proxies. Use sync_interval to re-pull periodically and pick up new actor registrations as the cluster evolves.

use murmer::ClientOptions;

let client = MurmerClient::connect_with_options(
    addr,
    cookie.into(),
    ClientOptions {
        sync_interval: Some(Duration::from_secs(30)),
        ..Default::default()
    },
).await?;

// client.lookup() stays fresh — re-pulled every 30 seconds

Pattern 3: Wait for a specific actor

Useful when you connect before the target actor is registered — for example, a gateway that starts before the cluster has finished placing its actors.

let ep = client
    .lookup_wait::<PaymentService>("payments/processor", Duration::from_secs(10))
    .await?;

How sync works

Edge clients use pull-based sync — the server never pushes unsolicited updates. The client sends a RegistrySyncRequest with its current version vector; the server responds with only the delta (new public actor registrations since that version).

Edge client                     Cluster node
    │                                │
    │── RegistrySyncRequest(vv) ────▶│
    │◀── RegistrySync(delta ops) ────│
    │                                │
    │  ... time passes ...           │
    │                                │
    │── RegistrySyncRequest(vv') ───▶│  (periodic or lookup_wait re-poll)
    │◀── RegistrySync(new ops) ──────│

After the first sync, subsequent pulls return only the delta — O(new ops), not O(all ops). 1000 idle Edge clients add near-zero server overhead: no SWIM membership, no server-initiated fan-out, no per-client state.
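
The server-side delta computation can be sketched as a filter over the op log against the client's version vector. The `Op` type here is illustrative, not murmer's wire format:

```rust
use std::collections::HashMap;

/// Illustrative replicated operation: which node produced it, its sequence
/// number on that node, and the actor label it registers.
struct Op {
    node: String,
    seq: u64,
    label: String,
}

/// Return only the ops the client has not yet seen — O(new ops).
fn delta<'a>(log: &'a [Op], client_vv: &HashMap<String, u64>) -> Vec<&'a Op> {
    log.iter()
        .filter(|op| op.seq > *client_vv.get(&op.node).unwrap_or(&0))
        .collect()
}

fn main() {
    let log = vec![
        Op { node: "alpha".into(), seq: 1, label: "api/users".into() },
        Op { node: "alpha".into(), seq: 2, label: "api/orders".into() },
        Op { node: "beta".into(), seq: 1, label: "room/general".into() },
    ];
    // Client has seen alpha up to seq 1, nothing from beta.
    let vv = HashMap::from([("alpha".to_string(), 1u64)]);
    let new_ops = delta(&log, &vv);
    assert_eq!(new_ops.len(), 2);
    assert_eq!(new_ops[0].label, "api/orders");
}
```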

Scalability characteristics

| Property                | Behavior                                                           |
|-------------------------|--------------------------------------------------------------------|
| SWIM membership         | Edge clients are not added to SWIM — no failure detection overhead |
| Server-initiated sync   | Skipped for Edge clients — they pull on their own schedule         |
| Disconnect              | Silent — no cluster alarm, no actor pruning, no SWIM event         |
| Server state per client | None — the server is stateless with respect to each Edge client    |
| Wire overhead (idle)    | Zero — the server never initiates contact                          |

Full example

The edge_client example demonstrates all three patterns:

cargo run -p murmer-examples --bin edge_client

It covers:

  1. Public actors visible to Edge clients, internal actors hidden
  2. lookup_wait resolving when an actor registers after connect
  3. Long-lived client with periodic sync_interval

Monitoring

Murmer provides production-grade metrics for your actor systems via the monitor feature. It follows a facade pattern — your actor code records metrics through thin instrumentation calls, and you choose the backend (Prometheus, StatsD, etc.) at startup.

Enabling monitoring

Add the monitor feature to your Cargo.toml:

[dependencies]
murmer = { version = "0.1", features = ["monitor"] }

When the feature is off, all instrumentation compiles to nothing — zero overhead, zero dependencies.

Quick start with Prometheus

One call installs the metrics recorder and starts an HTTP endpoint for scraping:

use murmer::monitor::start_prometheus;

#[tokio::main]
async fn main() {
    // Start serving metrics on :9000/metrics
    start_prometheus(9000).expect("failed to start prometheus exporter");

    // Now start your actor system as usual
    let system = System::local();
    let counter = system.start("counter/main", Counter, CounterState { count: 0 });

    // Every message send, handler invocation, and lifecycle event
    // is automatically recorded. Scrape with:
    //   curl http://localhost:9000/metrics
}

What gets measured

Murmer instruments five categories of metrics automatically. You don’t need to add any code — everything is recorded as actors run.

Actor lifecycle

| Metric                                       | Type    | Labels               | Description                                                              |
|----------------------------------------------|---------|----------------------|--------------------------------------------------------------------------|
| murmer_actors_active                         | gauge   | actor_type           | Currently running actors                                                 |
| murmer_actors_started_total                  | counter | actor_type           | Total actors started                                                     |
| murmer_actors_stopped_total                  | counter | actor_type, reason   | Total actors stopped (reason: stopped, panicked, restart_limit_exceeded) |
| murmer_actors_restarts_total                 | counter | actor_type           | Total actor restarts                                                     |
| murmer_actors_restart_limit_exceeded_total   | counter | actor_type           | Times restart limits were hit                                            |

Message processing

| Metric                                     | Type      | Labels     | Description                    |
|--------------------------------------------|-----------|------------|--------------------------------|
| murmer_messages_processed_total            | counter   | actor_type | Messages successfully handled  |
| murmer_messages_failed_total               | counter   | actor_type | Messages that caused a panic   |
| murmer_message_processing_duration_seconds | histogram | actor_type | Handler execution time         |

Endpoint sends

| Metric                                   | Type      | Labels                 | Description                      |
|------------------------------------------|-----------|------------------------|----------------------------------|
| murmer_sends_total                       | counter   | actor_type, locality   | Total sends (local or remote)    |
| murmer_send_errors_total                 | counter   | actor_type, error_kind | Send failures by error type      |
| murmer_network_roundtrip_duration_seconds | histogram | actor_type            | End-to-end remote call latency   |

Networking

| Metric                              | Type    | Description                               |
|-------------------------------------|---------|-------------------------------------------|
| murmer_network_connections_active   | gauge   | Active QUIC connections to peer nodes     |
| murmer_network_streams_active       | gauge   | Active QUIC streams for actor messaging   |
| murmer_network_bytes_sent_total     | counter | Total bytes sent over actor streams       |
| murmer_network_bytes_received_total | counter | Total bytes received over actor streams   |
| murmer_network_inflight_calls       | gauge   | In-flight remote calls awaiting responses |
| murmer_network_dead_letters_total   | counter | Failed in-flight calls (connection lost)  |

Cluster membership

| Metric                                  | Type    | Labels     | Description                                |
|-----------------------------------------|---------|------------|--------------------------------------------|
| murmer_cluster_nodes                    | gauge   | status     | Number of nodes in the cluster             |
| murmer_cluster_membership_changes_total | counter | event_type | Membership events (joined, failed, left)   |

Receptionist

| Metric                                  | Type    | Description           |
|-----------------------------------------|---------|-----------------------|
| murmer_receptionist_lookups_total       | counter | Actor lookups         |
| murmer_receptionist_registrations_total | counter | Actor registrations   |
| murmer_receptionist_deregistrations_total | counter | Actor deregistrations |

Label cardinality

All actor metrics use actor_type (the Rust type name, e.g. my_app::ChatRoom) rather than actor_label (e.g. "room/general"). This keeps cardinality bounded — you typically have fewer than 20 actor types, but could have thousands of labels.

Architecture: how it works

The instrumentation uses a facade pattern inspired by how tracing works:

  1. instrument.rs (always compiled) — Contains thin pub(crate) functions like instrument::message_processed(actor_type). When monitor is on, these call metrics::counter!(...). When off, they’re empty #[inline(always)] functions that the compiler eliminates entirely.

  2. Call sites (supervisor, receptionist, endpoint, etc.) — Call instrument functions unconditionally. No #[cfg] attributes scattered across the codebase.

  3. Your application — Installs a metrics recorder at startup (e.g., start_prometheus(9000)). All recorded metrics flow to the backend you chose.

This means adding a new metric requires touching exactly two places: the instrument function and the call site. The facade keeps the #[cfg] logic in one file.
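
The facade shape can be sketched as follows. This is an illustration of the pattern, not murmer's actual instrument.rs; the metric name is taken from the tables above, and the `metrics::counter!` call reflects the metrics crate's macro API:

```rust
/// With the feature on, record to the installed recorder; with it off,
/// the empty #[inline(always)] body is erased entirely by the compiler.
#[cfg(feature = "monitor")]
pub fn message_processed(actor_type: &'static str) {
    metrics::counter!("murmer_messages_processed_total", "actor_type" => actor_type).increment(1);
}

#[cfg(not(feature = "monitor"))]
#[inline(always)]
pub fn message_processed(_actor_type: &'static str) {}

fn main() {
    // Call sites invoke the facade unconditionally — no #[cfg] here.
    message_processed("my_app::ChatRoom");
}
```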

ClusterMonitor actor

In addition to Prometheus metrics, murmer provides a ClusterMonitor actor that maintains a queryable in-memory view of cluster health:

use murmer::monitor::{ClusterMonitor, ClusterMonitorState, run_monitor_bridge, GetClusterHealth};

// Start the monitor actor
let monitor = system.start("murmer/monitor", ClusterMonitor, ClusterMonitorState::new());

// Bridge cluster events into the monitor
tokio::spawn(run_monitor_bridge(&cluster_system, monitor.clone()));

// Query health at any time
let health = monitor.send(GetClusterHealth).await?;
println!("Alive: {}, Joins: {}, Failures: {}",
    health.alive_nodes, health.total_joins, health.total_failures);

The ClusterMonitor tracks:

  • Which nodes are alive and when they joined
  • Cumulative counters for joins, failures, and departures
  • Per-node uptime

Grafana dashboard

A typical Prometheus + Grafana setup might query:

# Message throughput by actor type
rate(murmer_messages_processed_total[5m])

# 99th percentile handler latency
histogram_quantile(0.99, rate(murmer_message_processing_duration_seconds_bucket[5m]))

# Actor crash rate
rate(murmer_messages_failed_total[5m])

# Remote call latency
histogram_quantile(0.95, rate(murmer_network_roundtrip_duration_seconds_bucket[5m]))

# Cluster size over time
murmer_cluster_nodes{status="active"}

Proc Macro Reference

Murmer provides two proc macros to reduce boilerplate when defining actors: #[handlers] + #[handler] for handler generation, and #[derive(Message)] for explicit message types.

#[handlers] + #[handler]

Place #[handlers] on an impl block containing actor message handlers. Mark each handler method with #[handler].

#[handlers]
impl MyActor {
    #[handler]
    fn do_thing(
        &mut self,
        ctx: &ActorContext<Self>,
        state: &mut MyState,
        name: String,
        count: u32,
    ) -> String {
        format!("{name}: {count}")
    }

    #[handler]
    fn get_status(
        &mut self,
        _ctx: &ActorContext<Self>,
        state: &mut MyState,
    ) -> bool {
        state.is_active
    }

    #[handler]
    async fn fetch_data(
        &mut self,
        ctx: &ActorContext<Self>,
        state: &mut MyState,
        url: String,
    ) -> Vec<u8> {
        some_async_call(&url).await
    }
}

What gets generated

From the above, the macro produces:

Message structs — method name converted to PascalCase, parameters after ctx and state become fields:

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DoThing { pub name: String, pub count: u32 }

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct GetStatus;  // no extra params → unit struct

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FetchData { pub url: String }

Message + RemoteMessage impls — each struct implements Message (with the handler’s return type as Result) and RemoteMessage (with a TYPE_ID of "ActorName::method_name"):

impl Message for DoThing { type Result = String; }
impl RemoteMessage for DoThing { const TYPE_ID: &'static str = "MyActor::do_thing"; }

Handler / AsyncHandler impls — dispatches message fields as method arguments:

impl Handler<DoThing> for MyActor {
    fn handle(&mut self, ctx: &ActorContext<Self>, state: &mut MyState, message: DoThing) -> String {
        self.do_thing(ctx, state, message.name, message.count)
    }
}

// async fn → AsyncHandler
impl AsyncHandler<FetchData> for MyActor { /* ... */ }

RemoteDispatch — a wire-format dispatch table that routes serialized messages by their TYPE_ID. This enables cross-node delivery without the sender knowing the concrete handler:

impl RemoteDispatch for MyActor {
    async fn dispatch_remote(&mut self, ctx, state, message_type: &str, payload: &[u8])
        -> Result<Vec<u8>, DispatchError>
    {
        match message_type {
            "MyActor::do_thing" => { /* deserialize DoThing, call handler, serialize result */ }
            "MyActor::get_status" => { /* ... */ }
            "MyActor::fetch_data" => { /* ... */ }
            _ => Err(DispatchError::UnknownMessageType(..))
        }
    }
}

Extension trait — ergonomic methods directly on Endpoint<MyActor>:

pub trait MyActorExt {
    fn do_thing(&self, name: String, count: u32) -> impl Future<Output = Result<String, SendError>>;
    fn get_status(&self) -> impl Future<Output = Result<bool, SendError>>;
    fn fetch_data(&self, url: String) -> impl Future<Output = Result<Vec<u8>, SendError>>;
}

impl MyActorExt for Endpoint<MyActor> { /* ... */ }

This lets you call handlers directly:

let result = endpoint.do_thing("hello".into(), 42).await?;
let status = endpoint.get_status().await?;
let data = endpoint.fetch_data("https://...".into()).await?;

Auto-registration — a linkme distributed slice entry for the TypeRegistry. At cluster startup, TypeRegistry::from_auto() collects all #[handlers]-annotated actor types automatically, enabling the cluster to route messages to the correct deserializer without manual registration.

Handler method signature

Each #[handler] method must follow this pattern:

fn method_name(&mut self, ctx: &ActorContext<Self>, state: &mut State, ...params) -> ReturnType

  • &mut self — the actor instance.
  • ctx: &ActorContext<Self> — provides access to the system, receptionist, and lifecycle operations like watch().
  • state: &mut State — the actor’s mutable state.
  • Remaining parameters become message struct fields.
  • Use async fn for handlers that need to .await.

Explicit messages (backward compatible)

For messages shared across multiple actors, name the last parameter msg (or _msg) and the macro will use the type directly instead of generating a struct:

#[handlers]
impl MyActor {
    #[handler]
    fn increment(
        &mut self,
        ctx: &ActorContext<Self>,
        state: &mut MyState,
        msg: Increment,
    ) -> i64 {
        state.count += msg.amount;
        state.count
    }
}

Here Increment must already exist and implement Message. The extension trait method will take the message as a parameter: endpoint.increment(msg).

#[derive(Message)]

Derives Message (and optionally RemoteMessage) for a struct or enum.

Basic usage (local only)

#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = i64)]
struct Increment { amount: i64 }

This implements Message with type Result = i64.

With remote support

Add remote = "..." to also implement RemoteMessage with a wire-stable type ID:

#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = i64, remote = "counter::Increment")]
struct Increment { amount: i64 }

The TYPE_ID string is used for wire dispatch — it must be unique across all message types in the cluster and stable across code changes (don’t use std::any::type_name which can change between compiler versions).

Attributes

| Attribute     | Required | Description                            |
|---------------|----------|----------------------------------------|
| result = Type | Yes      | The response type for this message     |
| remote = "id" | No       | Wire-stable type ID for RemoteMessage  |

Application Orchestration

Murmer’s core gives you actors, messages, and clustering primitives. The app module (enabled via the app feature flag) builds on top of these to provide the application-level abstractions you need for real, working distributed applications: placement strategies, leader election, crash recovery, and a Coordinator actor that ties them all together.

Think of it this way: murmer gives you the building blocks, and the app module helps you assemble them into a running system that manages actor lifecycles across a cluster automatically.

Enable the feature in your Cargo.toml:

[dependencies]
murmer = { version = "0.1", features = ["app"] }

Overview

Orchestration flow: submit spec → coordinate → place → spawn → crash recovery

The orchestration layer answers three questions:

  1. Where should this actor run? — Placement strategies score nodes based on load, capabilities, metadata, and constraints.
  2. Who decides? — Leader election picks one node to run the Coordinator, which makes all placement decisions.
  3. What happens when a node fails? — Crash strategies define recovery behavior: redistribute immediately, wait for the node to return, or let the actor die.

Actor specifications

An ActorSpec describes an actor that the orchestrator should place and manage. It captures what to run, how to recover from crashes, and where to place it.

use murmer::app::spec::{ActorSpec, CrashStrategy, PlacementConstraints};
use murmer::cluster::config::NodeClass;
use std::time::Duration;

let spec = ActorSpec::new("storage/photos", "orchestrator::StorageAgent")
    .with_state(serialized_state)
    .with_crash_strategy(CrashStrategy::WaitForReturn(Duration::from_secs(30)))
    .with_constraints(PlacementConstraints {
        required_classes: vec![NodeClass::Worker],
        required_metadata: [("volume".into(), "photos".into())].into(),
        ..Default::default()
    });

Fields

| Field           | Purpose                                                              |
|-----------------|----------------------------------------------------------------------|
| label           | Actor label (e.g., "storage/photos") — unique across the cluster     |
| actor_type_name | Key into the SpawnRegistry — identifies what type of actor to create |
| initial_state   | Serialized initial state (bincode bytes) sent to the target node     |
| crash_strategy  | What to do when the hosting node fails                               |
| placement       | Constraints that filter which nodes are eligible                     |

Crash strategies

| Strategy                | Behavior                                                             |
|-------------------------|----------------------------------------------------------------------|
| Redistribute            | Move to another eligible node immediately (default)                  |
| WaitForReturn(Duration) | Wait for the failed node to rejoin; fall back to Redistribute on timeout |
| Abandon                 | Let the actor die with the node — no recovery                        |

Placement constraints

Constraints filter eligible nodes before the placement strategy scores them:

PlacementConstraints {
    required_classes: vec![NodeClass::Worker],        // must be a Worker node
    required_metadata: [("gpu".into(), "true".into())].into(), // must have gpu=true
    anti_affinity_labels: vec!["db/primary".into()],  // don't co-locate with this actor
    ..Default::default()
}

  • required_classes — empty means any class is acceptable.
  • required_metadata — the node must have all specified key-value pairs.
  • anti_affinity_labels — avoid placing on nodes already running these actors.

Placement strategies

The PlacementStrategy trait defines a fitness function that scores nodes for hosting a given actor spec. The Coordinator evaluates all eligible nodes (after constraint filtering) and picks the highest scorer.

trait PlacementStrategy {
    fn fitness(&self, node: &NodeInfo, spec: &ActorSpec, view: &ClusterView) -> f64;
}

  • Return 0.0 or negative to indicate “do not place here”.
  • Higher values mean stronger preference.
  • The full ClusterView is available for global-aware decisions (e.g., load balancing).
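
A fitness function in this style can be sketched over simplified types. The `NodeInfo` and scoring here are illustrative (a LeastLoaded-flavored heuristic), not murmer's actual definitions:

```rust
/// Illustrative stand-in for the real NodeInfo.
struct NodeInfo {
    alive: bool,
    actor_count: usize,
}

fn fitness(node: &NodeInfo) -> f64 {
    if !node.alive {
        return 0.0; // 0.0 means "do not place here"
    }
    1.0 / (1.0 + node.actor_count as f64) // fewer actors => higher score
}

/// Pick the index of the highest-scoring eligible node, if any scores above zero.
fn place(nodes: &[NodeInfo]) -> Option<usize> {
    nodes
        .iter()
        .enumerate()
        .map(|(i, n)| (i, fitness(n)))
        .filter(|&(_, score)| score > 0.0)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}

fn main() {
    let nodes = [
        NodeInfo { alive: true, actor_count: 3 },
        NodeInfo { alive: true, actor_count: 1 }, // least loaded => wins
        NodeInfo { alive: false, actor_count: 0 }, // dead => score 0.0, filtered out
    ];
    assert_eq!(place(&nodes), Some(1));
}
```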

Built-in strategies

| Strategy        | Behavior                                            |
|-----------------|-----------------------------------------------------|
| LeastLoaded     | Place on the node running the fewest actors         |
| RandomPlacement | Uniform random selection across eligible nodes      |
| Pinned(node_id) | Always prefer a specific node; fall back if unavailable |

Leader election

The LeaderElection trait is pluggable. The Coordinator only runs on the elected leader node.

trait LeaderElection {
    fn elect(&self, view: &ClusterView) -> Option<String>;
}

The default implementation, OldestNode, picks the node with the lowest incarnation counter. This is deterministic — all nodes independently compute the same answer without a consensus round.

use murmer::app::election::OldestNode;
use murmer::cluster::config::NodeClass;

// Any alive node can be leader
let election = OldestNode::any();

// Only Edge nodes can be leader
let election = OldestNode::with_class(NodeClass::Edge);
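
The deterministic-election idea can be sketched over illustrative data: every node computes min-by-incarnation over the same membership view, so all nodes agree without a consensus round. (A tie on incarnation needs a deterministic tiebreak; this sketch uses the node name.)

```rust
/// Pick the leader from a membership view of (node name, incarnation) pairs.
fn elect(view: &[(String, u64)]) -> Option<&str> {
    view.iter()
        .min_by_key(|(name, incarnation)| (*incarnation, name.clone()))
        .map(|(name, _)| name.as_str())
}

fn main() {
    let view = vec![
        ("beta".to_string(), 7),
        ("alpha".to_string(), 3), // lowest incarnation => leader
        ("gamma".to_string(), 5),
    ];
    // Every node running this over the same view picks the same leader.
    assert_eq!(elect(&view), Some("alpha"));
    assert_eq!(elect(&[]), None);
}
```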

The Coordinator

The Coordinator is itself a murmer actor (dogfooding the framework). It maintains a ClusterView, accepts SubmitSpec messages, and handles crash recovery when nodes fail.

Lifecycle

  1. The Coordinator starts on the elected leader node.
  2. It subscribes to cluster events to track node joins and failures.
  3. Users send SubmitSpec messages to declare what actors should run.
  4. The Coordinator evaluates placement constraints and strategies, then sends SpawnActor control messages to target nodes.
  5. When a node fails, the Coordinator re-places affected actors according to each spec’s CrashStrategy.

The cluster event bridge

The bridge (murmer::app::bridge) connects the raw cluster machinery to the Coordinator. It subscribes to ClusterEvents and translates them into Coordinator messages (NotifyNodeJoined, NotifyNodeFailed, NotifyNodeLeft, NotifySpawnAck). This keeps the Coordinator decoupled from the transport layer.

The recommended setup uses bridge::start_coordinator():

use murmer::app::bridge;
use murmer::app::coordinator::CoordinatorState;
use murmer::app::placement::LeastLoaded;
use murmer::app::election::OldestNode;

let cluster = system.cluster_system().unwrap();
let state = CoordinatorState::new(
    cluster.identity().node_id_string(),
    Box::new(LeastLoaded),
    Box::new(OldestNode::with_class(NodeClass::Edge)),
);

let coordinator = bridge::start_coordinator(cluster, state);

This wires up everything: the Coordinator actor, the event bridge loop, and the spawn drain loop (which sends placement decisions to the transport layer).

The cluster view

The ClusterView is the Coordinator’s world model — a snapshot of all nodes with their capabilities and running actors:

// Query the Coordinator's view
let view = coordinator.send(GetClusterView).await?;
println!("Alive nodes: {}", view.alive_count);
println!("Total nodes: {}", view.total_count);

// Query managed specs
let specs = coordinator.send(GetSpecs).await?;
for spec in &specs {
    println!("{}: {:?} on {}", spec.label, spec.state, spec.node_id);
}

Each node in the view carries:

  • Identity — name, host, port, incarnation counter
  • Class — Worker, Edge, Coordinator, etc.
  • Metadata — arbitrary key-value pairs (e.g., "volume" = "photos")
  • Running actors — labels of actors currently hosted
  • Liveness — whether the node is reachable
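These fields are exactly what placement needs. As a sketch (with NodeInfo as a hypothetical stand-in for one entry in the view), checking whether a node satisfies a spec's class and metadata constraints looks like:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for one node entry in the ClusterView; murmer's
// real type differs, but carries the same kinds of fields listed above.
pub struct NodeInfo {
    pub class: String,
    pub metadata: HashMap<String, String>,
    pub alive: bool,
}

// Would this node satisfy a spec's class + metadata constraints?
pub fn satisfies(node: &NodeInfo, class: &str, meta: &[(&str, &str)]) -> bool {
    node.alive
        && node.class == class
        && meta
            .iter()
            .all(|(k, v)| node.metadata.get(*k).map(String::as_str) == Some(*v))
}
```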

Full example: Filesystem RPC

The orchestrator example demonstrates the full orchestration loop:

  1. Three nodes form a cluster: a gateway (Edge class) and two workers (store-a, store-b).
  2. Each worker advertises capabilities via metadata ("volume" = "photos" or "volume" = "docs").
  3. The gateway runs a Coordinator that places StorageAgent actors on workers matching their placement constraints.
  4. Clients query storage agents for directory listings and file reads — transparently routed across the cluster.
  5. Node failure triggers crash strategy handling.

// Submit a spec with placement constraints
let result = coordinator.send(SubmitSpec {
    spec: ActorSpec::new("storage/photos", "orchestrator::StorageAgent")
        .with_state(photos_state_bytes)
        .with_crash_strategy(CrashStrategy::WaitForReturn(Duration::from_secs(30)))
        .with_constraints(PlacementConstraints {
            required_classes: vec![NodeClass::Worker],
            required_metadata: [("volume".into(), "photos".into())].into(),
            ..Default::default()
        }),
}).await?;

// The Coordinator placed the actor — now query it from any node
let photos = system.lookup::<StorageAgent>("storage/photos").unwrap();
let entries = photos.send(ListDir { path: "/".into() }).await?;

Run the example:

cargo test -p murmer-examples --test orchestrator