Introduction
Murmer is a distributed actor framework for Rust, built on tokio and QUIC.
It provides typed, location-transparent actors that communicate through message passing. Whether an actor lives in the same process or on a remote node across the network, you interact with it through the same Endpoint<A> API.
Why I built this
I’ve spent years working with Elixir and the BEAM VM, and the actor model there is something I’ve grown deeply fond of — the simplicity of processes, message passing, and supervision just works. When I looked at bringing that experience to Rust, I studied existing implementations like Actix, Telepathy, and Akka (on the JVM side). They’re impressive systems, but I kept running into the same friction: getting a basic actor up and running was complex, and adding remote communication across nodes was even more so.
Murmer is an experiment in answering a simple question: can you build a robust distributed actor system in Rust that’s actually simple to use?
The answer, it seems, is yes.
The design draws heavy inspiration from BEAM OTP’s supervision and process model, Akka’s clustering approach, and Apple’s Swift Distributed Actors for the typed, location-transparent endpoint API. The goal is to combine these ideas with Rust’s performance and safety guarantees — zero-cost local dispatch, compile-time message type checking, and automatic serialization over encrypted QUIC connections when actors span nodes.
Murmer in 1 minute
Install:
[dependencies]
murmer = "0.1"
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }
Write an actor, send it messages:
use murmer::prelude::*;
// ① Define your actor — state lives separately
#[derive(Debug)]
struct Counter;
struct CounterState { count: i64 }
impl Actor for Counter {
type State = CounterState;
}
// ② Handlers become the actor's API
#[handlers]
impl Counter {
#[handler]
fn increment(
&mut self,
_ctx: &ActorContext<Self>,
state: &mut CounterState,
amount: i64,
) -> i64 {
state.count += amount;
state.count
}
#[handler]
fn get_count(
&mut self,
_ctx: &ActorContext<Self>,
state: &mut CounterState,
) -> i64 {
state.count
}
}
#[tokio::main]
async fn main() {
// ③ Create a local actor system
let system = System::local();
// ④ Start an actor — returns a typed Endpoint<Counter>
let counter = system.start("counter/main", Counter, CounterState { count: 0 });
// ⑤ Send messages via auto-generated extension methods
let result = counter.increment(5).await.unwrap();
println!("Count: {result}"); // → Count: 5
// ⑥ Look up actors by label — works for local and remote
let found = system.lookup::<Counter>("counter/main").unwrap();
let count = found.get_count().await.unwrap();
println!("Looked up: {count}"); // → Looked up: 5
}
cargo run
That’s it — a complete, working actor system. The rest of this page explains what’s happening under the hood. The Getting Started chapter goes deeper into each component.
What it gives you
- Send messages without caring where the actor lives. counter.increment(5) (line ⑤) works identically whether the actor is local or on a remote node — the Endpoint<A> API abstracts the difference away.
- Test distributed systems from a single process. System::local() (line ③) runs everything in-memory. Swap to System::clustered() when you’re ready for real networking — your actor code stays identical.
- Define actors with minimal boilerplate. The #[handlers] macro (line ②) auto-generates message structs (Increment, GetCount), dispatch tables, serialization, and the extension methods you call on line ⑤.
- Get networking and encryption handled for you. QUIC transport with automatic TLS, SWIM-based cluster membership, and mDNS discovery — all configured, not hand-rolled.
- Supervise actors like OTP. Restart policies (Temporary, Transient, Permanent) with configurable limits and exponential backoff keep your system running through failures.
- Orchestrate applications across a cluster. The app module adds placement strategies, leader election, and crash recovery — so you can declare what should run and where, and the framework handles the rest.
What’s happening: line by line
① Actor + State — Counter is a zero-sized struct. All mutable state lives in CounterState, passed as &mut to every handler. This keeps the actor lightweight and the state threading explicit.
② #[handlers] — The macro reads your method signatures and generates:
- Message structs — Increment { pub amount: i64 } and GetCount (unit struct)
- Handler<Increment> and Handler<GetCount> trait implementations
- RemoteDispatch — a wire-format dispatch table so remote nodes know how to route messages to the right handler
- CounterExt — an extension trait on Endpoint<Counter> that gives you .increment(amount) and .get_count() methods
③ System::local() — Creates the actor runtime and boots the Receptionist — the internal actor registry that tracks all actors by label and type.
④ system.start(...) — Wraps Counter in a Supervisor that manages its lifecycle, mailbox, and restart behavior. Registers it with the Receptionist under the label "counter/main". Returns an Endpoint<Counter> — your typed send handle.
⑤ counter.increment(5) — The extension method constructs an Increment { amount: 5 } message and sends it through the Endpoint. Since this is a local actor, the message is dispatched as a zero-copy envelope through the Supervisor’s mailbox to the handler. The result comes back through a oneshot channel.
⑥ system.lookup(...) — Queries the Receptionist for an actor of type Counter at label "counter/main". Returns the same Endpoint<Counter>. In a clustered system, this could return a proxy endpoint that transparently serializes messages over QUIC to a remote node.
Core concepts
| Concept | In the example | Purpose |
|---|---|---|
| Actor | Counter + CounterState | Stateful message processor. Actor has no fields — state lives separately. |
| Message | Generated Increment, GetCount | Defines a request and its response type. |
| RemoteMessage | Generated by #[handlers] | A message that can cross the wire (serializable + TYPE_ID). |
| Endpoint | counter from system.start(...) | Opaque send handle. Abstracts local vs remote — callers never know which. |
| Receptionist | Powers system.lookup(...) | Type-erased actor registry. Start, lookup, and subscribe to actors. |
| Router | Not shown — see Discovery | Distributes messages across a pool of endpoints (round-robin, broadcast). |
| Listing | Not shown — see Discovery | Async stream of endpoints matching a ReceptionKey. |
Architecture
Every layer in this diagram is touched by the example code:
- System — created at line ③, runs the entire runtime
- Receptionist — populated at line ④ (start), queried at line ⑥ (lookup)
- Supervisor Layer — wraps Counter at line ④, manages its mailbox, would handle restarts if configured
- Endpoint‹A› — returned at line ④, used at line ⑤ to send messages; local dispatch here, but swap to System::clustered() and the same endpoint transparently routes over QUIC
- Cluster Layer — not active in System::local(), but requires zero code changes to enable (see Clustering)
Key design decisions
- Endpoint<A> is the only API — callers never know if an actor is local or remote.
- Receptionist is non-generic — stores type-erased entries internally, uses TypeId guards for safe downcasts at lookup time.
- Supervisors are flat — each actor has its own supervisor, no parent-child hierarchy.
- Labels are paths — "cache/user", "worker/0", "thumbnail/processor/3". Hierarchical naming for organizational clarity.
- Fail-fast networking — if a QUIC stream fails, all pending responses error immediately instead of hanging.
From primitives to applications
Murmer works at two levels:
The core (Actors, Discovery, Supervision, Clustering) gives you the building blocks — everything you saw in the example above. You can build complete services with just these primitives.
The application layer (Application Orchestration) builds on top of the core to manage real, running applications across a cluster. You declare what actors should run, where they should be placed (with constraints like “must have GPU” or “must be a Worker node”), and what happens when a node fails — and the Coordinator handles placement, spawning, and crash recovery automatically.
Learn more
- Getting Started — deeper walkthrough of each component
- API Reference on docs.rs
- Source on GitHub
Getting Started
This chapter goes deeper into the components you saw in the introduction. If you haven’t seen the 1-minute example yet, start there — this chapter assumes you’ve seen the basics and want to understand more.
Dependencies
[dependencies]
murmer = "0.1"
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }
murmer is the core framework. The macros feature (on by default) re-exports #[handlers], #[handler], and #[derive(Message)] from murmer-macros — no separate dependency needed. Both serde and tokio are required — serde for message serialization (even in local mode, the types need to be Serialize + Deserialize for remote readiness), and tokio as the async runtime.
The Actor trait
Every actor implements the Actor trait, which has one required associated type: State.
use murmer::prelude::*;
#[derive(Debug)]
struct ChatRoom;
struct ChatRoomState {
room_name: String,
messages: Vec<ChatEntry>,
max_messages: usize,
}
struct ChatEntry {
from: String,
text: String,
timestamp: u64,
}
impl Actor for ChatRoom {
type State = ChatRoomState;
}
Why state lives separately
This is a deliberate design choice. In many actor frameworks, state lives directly on the actor struct. In murmer, the actor struct is typically empty (zero-sized) and all mutable state lives in the associated State type.
This gives you:
- Explicit state threading — every handler receives &mut State, making it clear what data is being read and modified.
- Clean restarts — when a supervisor restarts an actor, the factory creates a fresh (Actor, State) pair. No hidden state carried over from a crashed instance.
- Separation of identity and data — the actor struct can carry configuration or immutable context (like a database pool handle), while State holds the mutable per-instance data.
You can put fields on the actor struct — they just won’t be part of the restart cycle:
struct ChatRoom {
db: DatabasePool, // immutable, shared across restarts
}
struct ChatRoomState {
messages: Vec<ChatEntry>, // mutable, reset on restart
}
Defining handlers
Handlers are methods on the actor that process incoming messages. The #[handlers] macro on the impl block and #[handler] on individual methods do the heavy lifting:
#[handlers]
impl ChatRoom {
#[handler]
fn post_message(
&mut self,
_ctx: &ActorContext<Self>,
state: &mut ChatRoomState,
from: String,
text: String,
) -> usize {
state.messages.push(ChatEntry {
from,
text,
timestamp: now(),
});
// Trim if over limit
if state.messages.len() > state.max_messages {
state.messages.remove(0);
}
state.messages.len()
}
#[handler]
fn get_history(
&mut self,
_ctx: &ActorContext<Self>,
state: &mut ChatRoomState,
) -> Vec<String> {
state.messages.iter()
.map(|e| format!("[{}] {}: {}", e.timestamp, e.from, e.text))
.collect()
}
#[handler]
fn room_name(
&mut self,
_ctx: &ActorContext<Self>,
state: &mut ChatRoomState,
) -> String {
state.room_name.clone()
}
}
Handler signature rules
Every handler method must follow this pattern:
fn method_name(
&mut self, // always &mut self
ctx: &ActorContext<Self>, // always second — the actor's context
state: &mut YourStateType, // always third — mutable state access
// ... additional parameters become message fields
) -> ReturnType {
// ...
}
- &mut self — the actor instance.
- ctx: &ActorContext<Self> — provides access to the system, receptionist, the actor’s own label, and methods like ctx.watch() for actor monitoring. Prefix with _ if unused.
- state: &mut State — the actor’s mutable state.
- Additional parameters — each one becomes a field on the generated message struct. fn increment(... amount: i64) generates Increment { pub amount: i64 }.
- Return type — becomes the message’s Result type. Handlers must return a value (not ()).
What gets generated
For the ChatRoom example above, the macro generates:
// Message structs
struct PostMessage { pub from: String, pub text: String }
struct GetHistory; // unit struct — no extra params
struct RoomName;
// Trait implementations
impl Handler<PostMessage> for ChatRoom { /* ... */ }
impl Handler<GetHistory> for ChatRoom { /* ... */ }
impl Handler<RoomName> for ChatRoom { /* ... */ }
// Remote dispatch table
impl RemoteDispatch for ChatRoom { /* ... */ }
// Extension trait for ergonomic sends
trait ChatRoomExt {
fn post_message(&self, from: String, text: String) -> impl Future<Output = Result<usize>>;
fn get_history(&self) -> impl Future<Output = Result<Vec<String>>>;
fn room_name(&self) -> impl Future<Output = Result<String>>;
}
impl ChatRoomExt for Endpoint<ChatRoom> { /* ... */ }
This means you can call endpoint.post_message("alice".into(), "hello".into()) directly on any Endpoint<ChatRoom>, without ever constructing a message struct yourself.
Async handlers
For handlers that need to perform async work (I/O, database queries, HTTP calls), use async fn:
#[handlers]
impl ChatRoom {
#[handler]
async fn fetch_and_store(
&mut self,
ctx: &ActorContext<Self>,
state: &mut ChatRoomState,
url: String,
) -> Result<usize, String> {
// You can .await here — the supervisor handles scheduling
let response = reqwest::get(&url).await
.map_err(|e| e.to_string())?;
let body = response.text().await
.map_err(|e| e.to_string())?;
state.messages.push(ChatEntry {
from: "system".into(),
text: body,
timestamp: now(),
});
Ok(state.messages.len())
}
}
Async handlers generate AsyncHandler<FetchAndStore> instead of Handler<FetchAndStore>. The supervisor processes async handlers cooperatively — while one handler is awaiting, no other messages are processed for that actor (preserving the single-writer invariant).
Explicit message types
The auto-generated messages from #[handlers] cover most cases, but sometimes you want to define a message type explicitly — for example, when multiple actors handle the same message:
use murmer::Message;
use serde::{Serialize, Deserialize};
#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = String)]
struct Ping {
payload: String,
}
To use an explicit message in a handler, name the parameter msg:
#[handlers]
impl ChatRoom {
#[handler]
fn ping(
&mut self,
_ctx: &ActorContext<Self>,
_state: &mut ChatRoomState,
msg: Ping, // "msg" signals: use this type directly
) -> String {
format!("pong: {}", msg.payload)
}
}
The msg parameter name tells the macro to use Ping as-is instead of generating a new message struct.
For messages that need to cross the network, add remote:
#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = String, remote = "my_app::Ping")]
struct Ping {
payload: String,
}
The remote value is a unique string identifier used for wire-format routing. See the Proc Macro Reference for the full details.
The System
The System is the entry point for everything — it runs the actor runtime, manages the receptionist, and optionally handles clustering.
// Local mode — in-memory, no networking
let system = System::local();
// Clustered mode — QUIC networking, SWIM membership
let system = System::clustered_auto(config).await?;
Both modes expose the same API. Your actor code doesn’t change.
Starting actors
// Returns Endpoint<ChatRoom>
let room = system.start(
"room/general", // label — unique within the cluster
ChatRoom, // actor instance
ChatRoomState { // initial state
room_name: "general".into(),
messages: vec![],
max_messages: 1000,
},
);
The label "room/general" is a path-like string that uniquely identifies this actor in the system. Labels are how actors find each other through the receptionist.
Looking up actors
// Type-safe lookup — returns Option<Endpoint<ChatRoom>>
let room = system.lookup::<ChatRoom>("room/general");
if let Some(ep) = room {
let history = ep.get_history().await?;
}
Lookups are type-safe: the returned endpoint is typed at compile time, while the registry check happens at lookup time — if the label exists but the stored type doesn’t match, None is returned.
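The type guard behind this can be pictured with std::any downcasting. The following is an illustrative sketch, not murmer’s actual internals: a registry stores type-erased entries, and lookup returns None on a type mismatch (Registry, register, and lookup are hypothetical names).

```rust
use std::any::Any;
use std::collections::HashMap;

// Hypothetical simplified registry: labels map to type-erased entries.
struct Registry {
    entries: HashMap<String, Box<dyn Any>>,
}

impl Registry {
    fn new() -> Self {
        Registry { entries: HashMap::new() }
    }

    fn register<T: Any>(&mut self, label: &str, entry: T) {
        self.entries.insert(label.to_string(), Box::new(entry));
    }

    // Lookup succeeds only if the label exists AND the stored type
    // matches T — otherwise None, never a panic.
    fn lookup<T: Any>(&self, label: &str) -> Option<&T> {
        self.entries.get(label)?.downcast_ref::<T>()
    }
}

fn main() {
    struct AuthEndpoint;
    struct ChatEndpoint;

    let mut reg = Registry::new();
    reg.register("service/auth", AuthEndpoint);

    assert!(reg.lookup::<AuthEndpoint>("service/auth").is_some());
    assert!(reg.lookup::<ChatEndpoint>("service/auth").is_none()); // wrong type
    assert!(reg.lookup::<AuthEndpoint>("service/none").is_none()); // unknown label
}
```

The same shape explains why a mismatched lookup is a quiet None rather than an error: the downcast simply fails.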
Endpoints in depth
Endpoint<A> is the central abstraction. It’s:
- Typed —
Endpoint<ChatRoom>can only send messages thatChatRoomhandles - Cloneable — lightweight handle, share freely across tasks
- Location-transparent — local endpoints dispatch through in-memory channels; remote endpoints serialize over QUIC
// All of these work identically
let room: Endpoint<ChatRoom> = system.start("room/1", ChatRoom, state);
let room: Endpoint<ChatRoom> = system.lookup::<ChatRoom>("room/1").unwrap();
// Clone and pass to another task
let room2 = room.clone();
tokio::spawn(async move {
room2.post_message("bot".into(), "background task".into()).await.unwrap();
});
// Send via extension methods (ergonomic)
room.post_message("alice".into(), "hello".into()).await?;
// Or send a message struct directly
room.send(PostMessage { from: "alice".into(), text: "hello".into() }).await?;
Both .post_message(...) (extension method) and .send(PostMessage { ... }) (direct) do the same thing. The extension methods are more ergonomic for the common case.
Actor watches
Monitor other actors and get notified when they terminate — inspired by Erlang’s monitor/2:
struct Watchdog;
struct WatchdogState { terminated: Vec<String> }
impl Actor for Watchdog {
type State = WatchdogState;
fn on_actor_terminated(
&mut self,
state: &mut WatchdogState,
terminated: &ActorTerminated,
) {
tracing::warn!(
"Actor {} terminated: {:?}",
terminated.label,
terminated.reason
);
state.terminated.push(terminated.label.clone());
}
}
#[handlers]
impl Watchdog {
#[handler]
fn watch(
&mut self,
ctx: &ActorContext<Self>,
_state: &mut WatchdogState,
label: String,
) -> bool {
ctx.watch(&label);
true
}
}
The on_actor_terminated callback on the Actor trait fires when any watched actor stops, crashes, or is killed. The ActorTerminated struct tells you which actor and why.
Going from local to clustered
The entire point of murmer’s design is that this transition requires zero changes to your actor code. Only the system construction changes:
// Before: local
let system = System::local();
// After: clustered
let config = ClusterConfig::builder()
.name("my-node")
.listen("0.0.0.0:7100".parse()?)
.cookie("my-cluster-secret")
.seed_nodes(["192.168.1.1:7100".parse()?])
.build()?;
let system = System::clustered_auto(config).await?;
Everything else — system.start(...), system.lookup(...), endpoint.send(...) — stays identical. Actors on remote nodes appear in your local receptionist automatically via registry replication.
See the Clustering chapter for the full walkthrough, including Docker deployment and the interactive cluster_chat example.
Build and test
cargo build
cargo nextest run
cargo clippy -- -D warnings
Next steps
Now that you understand the components, dive into the specific areas:
- Actors and Messages — the complete actor model: state, handlers, endpoints, location transparency
- Discovery — labels, reception keys, listings, and routing
- Supervision — restart policies, backoff, actor factories
- Clustering — QUIC networking, SWIM membership, multi-node deployment
- Proc Macro Reference — everything #[handlers] and #[derive(Message)] generate
- Application Orchestration — placement, leader election, crash recovery
Actors and Messages
This chapter covers the core building blocks: actors, state, messages, handlers, and endpoints.
Actors
An actor in murmer is a small server: a Rust struct with a set of handler methods, an associated state type that persists across messages, and the message types it accepts. You define an actor by implementing the Actor trait, then implementing Handler methods for each message type it can handle.
use murmer::prelude::*;
#[derive(Debug)]
struct ChatRoom;
struct ChatRoomState {
room_name: String,
messages: Vec<ChatEntry>,
}
impl Actor for ChatRoom {
type State = ChatRoomState;
}
Key points:
- The actor struct itself is typically a zero-sized type (no fields). All mutable state lives in the associated State type.
- State is passed as &mut State to every handler, keeping the actor struct lightweight and the state explicitly threaded.
- Each actor runs inside a supervisor that manages its lifecycle, mailbox, and restart behavior.
Messages
A message is a type that can be sent to an actor. Every message defines a result type and can optionally be serializable for remote delivery.
There are two ways to define messages:
Auto-generated messages (recommended)
When using #[handlers] with #[handler], the macro generates message structs automatically from your method signatures:
#[handlers]
impl ChatRoom {
#[handler]
fn post_message(
&mut self,
_ctx: &ActorContext<Self>,
state: &mut ChatRoomState,
from: String,
text: String,
) -> usize {
state.messages.push(ChatEntry { from, text });
state.messages.len()
}
#[handler]
fn get_history(
&mut self,
_ctx: &ActorContext<Self>,
state: &mut ChatRoomState,
) -> Vec<String> {
state.messages.iter()
.map(|e| format!("{}: {}", e.from, e.text))
.collect()
}
}
This generates PostMessage { pub from: String, pub text: String } and the GetHistory unit struct, plus all the trait implementations and an extension trait ChatRoomExt on Endpoint<ChatRoom>.
Explicit messages
For messages shared across multiple actors, define them manually with #[derive(Message)]:
use murmer::Message;
use serde::{Serialize, Deserialize};
#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = Vec<String>, remote = "orchestrator::ListDir")]
struct ListDir {
path: String,
}
Then reference it in a handler with the msg parameter name:
#[handlers]
impl StorageAgent {
#[handler]
fn list_dir(
&mut self,
_ctx: &ActorContext<Self>,
state: &mut StorageState,
msg: ListDir,
) -> Vec<String> {
state.dirs.get(&msg.path).cloned().unwrap_or_default()
}
}
The remote = "..." attribute is optional — omit it for local-only messages that don’t need wire serialization.
Async handlers
Handlers that need to await use async fn:
#[handlers]
impl MyActor {
#[handler]
async fn fetch_data(
&mut self,
ctx: &ActorContext<Self>,
state: &mut MyState,
url: String,
) -> Vec<u8> {
some_async_call(&url).await
}
}
This generates an AsyncHandler<FetchData> implementation instead of Handler<FetchData>.
Endpoints
Endpoints are opaque handles to actors used to send messages and receive responses. The Endpoint<A> type abstracts where the actor lives — whether local or remote — and handles serialization in the background.
// Start returns a typed endpoint
let counter = system.start("counter/main", Counter, CounterState { count: 0 });
// Send via auto-generated extension trait
let result = counter.increment(5).await.unwrap();
// Or send a message struct directly
let result = counter.send(Increment { amount: 5 }).await.unwrap();
Location transparency
The key design principle: Endpoint<A> hides whether the actor is local or remote.
- Local actors use the envelope pattern — zero serialization cost, direct in-memory dispatch through a type-erased trait object.
- Remote actors serialize messages with bincode, send them over QUIC streams, and deserialize responses on return.
The caller’s code is identical in both cases:
let result = endpoint.send(Increment { amount: 5 }).await?;
How endpoints work under the hood
Under the hood, an endpoint wraps a tokio channel that sends messages to the actor’s supervisor:
- Local endpoint: The channel connects directly to the actor’s supervisor (mpsc). Messages are dispatched as type-erased envelopes with zero serialization cost.
- Remote endpoint: The channel connects to a proxy supervisor that serializes the message in bincode format, sends it through a QUIC stream, awaits the response, deserializes it, and returns it to the caller.
Endpoints are lightweight and Clone — share them freely across tasks.
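The local-endpoint path described above can be modeled in a few lines. This sketch substitutes std::sync::mpsc and a plain thread for tokio and murmer’s supervisor; Envelope, Endpoint, and spawn_counter are illustrative names, not the framework’s actual types.

```rust
use std::sync::mpsc;
use std::thread;

// Simplified model of local dispatch: an "envelope" pairs a message
// payload with a one-shot reply channel.
struct Envelope {
    amount: i64,
    reply: mpsc::Sender<i64>, // stands in for tokio's oneshot
}

// A hand-rolled "endpoint": just a clonable sender into the mailbox.
#[derive(Clone)]
struct Endpoint {
    mailbox: mpsc::Sender<Envelope>,
}

impl Endpoint {
    fn increment(&self, amount: i64) -> i64 {
        let (tx, rx) = mpsc::channel();
        self.mailbox.send(Envelope { amount, reply: tx }).unwrap();
        rx.recv().unwrap() // wait for the response
    }
}

fn spawn_counter() -> Endpoint {
    let (tx, rx) = mpsc::channel::<Envelope>();
    // The "supervisor" loop: drains the mailbox one message at a time,
    // mutating state and replying through the envelope's channel.
    thread::spawn(move || {
        let mut count = 0i64;
        for env in rx {
            count += env.amount;
            let _ = env.reply.send(count);
        }
    });
    Endpoint { mailbox: tx }
}

fn main() {
    let counter = spawn_counter();
    assert_eq!(counter.increment(5), 5);
    assert_eq!(counter.increment(2), 7);
    // Endpoints are just channel senders, so cloning is cheap.
    let counter2 = counter.clone();
    assert_eq!(counter2.increment(3), 10);
}
```

The single mailbox loop is also what gives the single-writer guarantee: messages are applied to state strictly one at a time.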
Actor watches
Erlang-style actor monitoring — get notified when a watched actor terminates:
impl Actor for Monitor {
type State = MonitorState;
fn on_actor_terminated(
&mut self,
state: &mut MonitorState,
terminated: &ActorTerminated,
) {
match &terminated.reason {
TerminationReason::Panicked(msg) => {
tracing::error!("{} panicked: {}", terminated.label, msg);
}
_ => {}
}
}
}
#[handlers]
impl Monitor {
#[handler]
fn watch_actor(
&mut self,
ctx: &ActorContext<Self>,
_state: &mut MonitorState,
label: String,
) {
ctx.watch(&label);
}
}
The ActorContext provides the watch() method, and termination notifications arrive via the on_actor_terminated callback on the Actor trait.
Discovery
Murmer provides a unified actor discovery system through the Receptionist — a type-erased registry that handles registration, lookup, and subscription for both local and remote actors.
Labels
Actors are identified by path-like labels: "cache/user", "worker/0", "thumbnail/processor/3".
let ep = system.start("service/auth", AuthActor, AuthState::new());
let ep = system.lookup::<AuthActor>("service/auth").unwrap();
Labels serve as the primary routing key in the system:
- No two actors can have the same label in the clustered system.
- The path structure ("group/subgroup/instance") enables hierarchical organization of actors.
- Wildcards can be used in subscriptions on the receptionist (e.g., subscribe to all actors under "worker/*").
- Labels are only used for actor discovery and routing — they are a feature of the receptionist, not intrinsic to actors themselves.
The Receptionist
The Receptionist is a special internal actor started with every actor system. It manages the lookup and registration of both local and remote actors.
When an actor is started, it registers itself with the receptionist. The receptionist maintains a mapping of labels and actor types to their corresponding endpoints. When you want to communicate with an actor, you query the receptionist with the actor type and label to receive an endpoint.
Typed lookup
Lookups are type-safe — you specify the actor type and get back a typed endpoint:
// Returns Option<Endpoint<AuthActor>>
let endpoint = system.lookup::<AuthActor>("service/auth");
If the label doesn’t exist or the type doesn’t match, None is returned. For remote actors, the receptionist returns an endpoint that transparently handles connection management when accessed.
State observation
When an actor is registered, the receptionist becomes an observer of its state. If the actor stops or crashes, the receptionist is notified and removes the actor from its registry. This ensures the receptionist always has an up-to-date view of the actors in the system.
If an actor restarts (via supervision), the receptionist is notified but the actor remains available — its observed state is alive or restarting, not dead, so lookups keep working.
Reception keys and listings
Group actors by type and subscribe to changes:
let worker_key = ReceptionKey::<Worker>::new("workers");
// Check actors into the group
receptionist.check_in("worker/0", worker_key.clone());
receptionist.check_in("worker/1", worker_key.clone());
// Subscribe — get existing actors immediately + live updates
let mut listing = receptionist.listing(worker_key);
while let Some(endpoint) = listing.next().await {
endpoint.send(Work { task: "process".into() }).await?;
}
A Listing<A> is an async stream that yields endpoints as actors register and deregister against a ReceptionKey. It provides both backfill (existing actors) and live updates (new registrations), making it ideal for dynamic pool management.
Lifecycle events
Subscribe to all actor registrations and deregistrations across the system:
let mut events = receptionist.subscribe_events();
while let Some(event) = events.recv().await {
match event {
ActorEvent::Registered { label, actor_type } => { /* ... */ }
ActorEvent::Deregistered { label, actor_type } => { /* ... */ }
}
}
Other actors can subscribe to the receptionist to receive notifications about actors being added, removed, or updated. Subscriptions can be broad (all actor updates, useful for clustering) or specific (a particular actor type or label).
Routing
Distribute messages across actor pools:
let router = Router::new(
vec![ep1, ep2, ep3],
RoutingStrategy::RoundRobin,
);
// Each send goes to the next endpoint in sequence
router.send(Increment { amount: 1 }).await?;
// Or broadcast to all
let results = router.broadcast(GetCount).await;
The Router<A> takes a set of endpoints and a RoutingStrategy to distribute messages. Current strategies include round-robin and broadcast.
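Round-robin selection can be sketched as follows; RoundRobin and pick are illustrative names, not the Router’s internals:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative round-robin selection: an atomic counter picks the
// next target on every send, wrapping around the pool.
struct RoundRobin<T> {
    targets: Vec<T>,
    next: AtomicUsize,
}

impl<T> RoundRobin<T> {
    fn new(targets: Vec<T>) -> Self {
        RoundRobin { targets, next: AtomicUsize::new(0) }
    }

    fn pick(&self) -> &T {
        // fetch_add gives each caller a unique ticket, so concurrent
        // sends still spread evenly across the pool.
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.targets.len();
        &self.targets[i]
    }
}

fn main() {
    let router = RoundRobin::new(vec!["ep1", "ep2", "ep3"]);
    assert_eq!(*router.pick(), "ep1");
    assert_eq!(*router.pick(), "ep2");
    assert_eq!(*router.pick(), "ep3");
    assert_eq!(*router.pick(), "ep1"); // wraps around
}
```

Broadcast is the degenerate case: skip the counter and send to every target, collecting the results.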
How discovery works across nodes
When running in clustered mode, the receptionist automatically synchronizes actor registrations across nodes:
- A local actor registers with its node’s receptionist.
- The node broadcasts an ActorAdd notification to all connected peers.
- Remote nodes register the actor in their local receptionists with a lazy endpoint factory.
- When a client looks up the remote actor, the endpoint factory creates a proxy that handles network transport.
- If the actor’s node fails, all remote registrations are cleaned up automatically.
This means system.lookup::<MyActor>("some/label") works identically whether the actor is local or on a remote node — the receptionist handles the difference transparently.
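The lazy endpoint factory in step 3 can be sketched like this (illustrative only: a String stands in for the proxy endpoint, and LazyRegistry is a hypothetical name):

```rust
use std::collections::HashMap;

// Illustrative lazy registry: remote registrations store a closure,
// and the proxy is only built on first lookup, so remote actors that
// are never looked up cost nothing.
struct LazyRegistry {
    factories: HashMap<String, Box<dyn Fn() -> String>>,
    cache: HashMap<String, String>, // already-built proxies
}

impl LazyRegistry {
    fn new() -> Self {
        LazyRegistry { factories: HashMap::new(), cache: HashMap::new() }
    }

    fn register_remote(&mut self, label: &str, factory: impl Fn() -> String + 'static) {
        self.factories.insert(label.to_string(), Box::new(factory));
    }

    fn lookup(&mut self, label: &str) -> Option<&String> {
        if !self.cache.contains_key(label) {
            let factory = self.factories.get(label)?;
            let proxy = factory(); // build the proxy on first use
            self.cache.insert(label.to_string(), proxy);
        }
        self.cache.get(label)
    }
}

fn main() {
    let mut reg = LazyRegistry::new();
    reg.register_remote("room/general", || "proxy->node-b".to_string());
    assert_eq!(reg.lookup("room/general").unwrap(), "proxy->node-b");
    assert!(reg.lookup("room/missing").is_none());
}
```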
Supervision
Every actor in murmer runs inside a supervisor. The supervisor manages the actor’s lifecycle — starting it, processing its mailbox, and restarting it when things go wrong. This model is directly inspired by Erlang/OTP’s supervision trees, adapted for Rust’s ownership and type system.
How supervisors work
Each actor gets its own supervisor. The supervisor is responsible for:
- Starting the actor and registering it with the receptionist.
- Mailbox processing — ingesting messages and passing them to the actor’s handlers in order of arrival.
- Crash detection — catching panics and deciding what to do next based on the restart policy.
- Restarting the actor using a factory if the policy allows it.
- State notifications — informing the receptionist of state changes (started, stopped, dead, etc.).
- Context — providing the actor with access to the system, receptionist, and other actors via ActorContext.
Supervisors are flat — there is no parent-child hierarchy between actors. Each actor is independent and can be stopped or restarted without affecting others.
Actor lifecycle
The supervisor manages an actor through a well-defined set of states.
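As a rough sketch of such a state set (variant names here are illustrative, drawn from the states this chapter mentions; the actual enum in the source may differ):

```rust
// Illustrative lifecycle states — approximate names only, based on the
// states mentioned elsewhere in this chapter (started, restarting,
// stopped, dead).
#[allow(dead_code)]
enum ActorLifecycle {
    Starting,   // factory ran, actor not yet processing messages
    Started,    // processing its mailbox normally
    Restarting, // crashed; waiting out the backoff delay
    Stopped,    // clean shutdown
    Dead,       // restart limit exceeded, or Temporary policy
}

impl ActorLifecycle {
    // Terminal states: the receptionist drops the actor's registration.
    fn is_terminal(&self) -> bool {
        matches!(self, ActorLifecycle::Stopped | ActorLifecycle::Dead)
    }
}

fn main() {
    assert!(!ActorLifecycle::Restarting.is_terminal());
    assert!(ActorLifecycle::Dead.is_terminal());
}
```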
Restart policies
Actors can be started with restart policies that control behavior on failure:
| Policy | Restart on panic? | Restart on clean stop? |
|---|---|---|
| Temporary | No | No |
| Transient | Yes | No |
| Permanent | Yes | Yes |
- Temporary — the actor runs once. If it panics or stops, it’s gone. This is the default.
- Transient — the actor restarts if it panics, but a clean shutdown is respected. Use this for actors that should survive crashes but can be intentionally stopped.
- Permanent — the actor always restarts, whether it panicked or stopped cleanly. Use this for critical services that must always be running.
Configuration
To use restart policies, you provide an ActorFactory (which knows how to create fresh instances) and a RestartConfig:
use murmer::{RestartPolicy, RestartConfig, BackoffConfig, ActorFactory};
use std::time::Duration;
struct MyFactory;
impl ActorFactory for MyFactory {
type Actor = Counter;
fn create(&mut self) -> (Counter, CounterState) {
(Counter, CounterState { count: 0 })
}
}
let endpoint = receptionist.start_with_config(
"counter/resilient",
MyFactory,
RestartConfig {
policy: RestartPolicy::Permanent, // Always restart
max_restarts: 5, // Max 5 restarts...
window: Duration::from_secs(60), // ...within 60 seconds
backoff: BackoffConfig {
initial: Duration::from_millis(100),
max: Duration::from_secs(30),
multiplier: 2.0,
},
},
);
Restart limits
The max_restarts and window fields prevent infinite restart loops. If the actor exceeds the restart limit within the time window, the supervisor gives up and the actor is permanently stopped. This prevents a persistent bug from consuming all your resources.
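The sliding-window check can be sketched as follows (an illustrative model, not murmer’s implementation; RestartWindow is a hypothetical name):

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Illustrative sliding-window restart limiter: record each restart's
// timestamp and refuse once more than `max_restarts` fall inside `window`.
struct RestartWindow {
    max_restarts: usize,
    window: Duration,
    history: VecDeque<Instant>,
}

impl RestartWindow {
    fn new(max_restarts: usize, window: Duration) -> Self {
        RestartWindow { max_restarts, window, history: VecDeque::new() }
    }

    // Returns false when the limit is exceeded — the supervisor should
    // then give up and stop the actor permanently.
    fn allow_restart(&mut self, now: Instant) -> bool {
        // Drop restarts that have aged out of the window.
        while let Some(&t) = self.history.front() {
            if now.duration_since(t) > self.window {
                self.history.pop_front();
            } else {
                break;
            }
        }
        if self.history.len() >= self.max_restarts {
            return false;
        }
        self.history.push_back(now);
        true
    }
}

fn main() {
    let mut w = RestartWindow::new(2, Duration::from_secs(60));
    let t0 = Instant::now();
    assert!(w.allow_restart(t0));
    assert!(w.allow_restart(t0 + Duration::from_secs(1)));
    assert!(!w.allow_restart(t0 + Duration::from_secs(2)));   // limit hit
    assert!(w.allow_restart(t0 + Duration::from_secs(120)));  // window passed
}
```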
Exponential backoff
The BackoffConfig controls the delay between restarts:
- `initial` — delay before the first restart attempt.
- `max` — maximum delay (the backoff caps here).
- `multiplier` — each subsequent restart delay is multiplied by this factor.
For example, with initial: 100ms, max: 30s, multiplier: 2.0, restarts happen at 100ms, 200ms, 400ms, 800ms, … up to 30s.
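That schedule is just capped exponential arithmetic. A minimal standalone sketch (not murmer's internal scheduler):

```rust
use std::time::Duration;

/// Delay before the nth restart attempt: initial * multiplier^attempt,
/// saturating at `max`.
fn backoff_delay(initial: Duration, max: Duration, multiplier: f64, attempt: u32) -> Duration {
    initial.mul_f64(multiplier.powi(attempt as i32)).min(max)
}

fn main() {
    let (initial, max) = (Duration::from_millis(100), Duration::from_secs(30));
    assert_eq!(backoff_delay(initial, max, 2.0, 0), Duration::from_millis(100));
    assert_eq!(backoff_delay(initial, max, 2.0, 3), Duration::from_millis(800));
    assert_eq!(backoff_delay(initial, max, 2.0, 20), max); // capped at 30s
}
```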
Actor factories
The ActorFactory trait gives the supervisor a way to create fresh actor instances for restarts:
trait ActorFactory {
type Actor: Actor;
fn create(&mut self) -> (Self::Actor, <Self::Actor as Actor>::State);
}
The factory is called each time the supervisor needs a new instance. It can carry its own state if needed — for example, incrementing a generation counter or loading configuration from disk.
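For instance, a factory that carries its own state might tag each restart with a generation number. A standalone sketch with the traits stubbed locally to match the shapes shown above (in a real project they come from murmer):

```rust
// Stubs mirroring the Actor / ActorFactory shapes above.
trait Actor { type State; }

trait ActorFactory {
    type Actor: Actor;
    fn create(&mut self) -> (Self::Actor, <Self::Actor as Actor>::State);
}

struct Counter;
struct CounterState { count: i64, generation: u32 }
impl Actor for Counter { type State = CounterState; }

/// A stateful factory: each restart gets fresh state tagged with an
/// incrementing generation number.
struct GenerationFactory { generation: u32 }

impl ActorFactory for GenerationFactory {
    type Actor = Counter;
    fn create(&mut self) -> (Counter, CounterState) {
        self.generation += 1;
        (Counter, CounterState { count: 0, generation: self.generation })
    }
}

fn main() {
    let mut factory = GenerationFactory { generation: 0 };
    let (_, s1) = factory.create();
    let (_, s2) = factory.create(); // simulated restart
    assert_eq!((s1.generation, s2.generation), (1, 2));
    assert_eq!(s2.count, 0); // state starts fresh each time
}
```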
Interaction with the receptionist
When a supervised actor restarts:
- The old actor instance is dropped.
- The supervisor creates a new instance via the factory.
- The new instance is registered with the receptionist under the same label.
- Any actors watching the old instance receive a termination notification, but the label remains routable.
This means endpoints held by other actors remain valid through restarts — messages sent during the brief restart window are queued in the supervisor’s mailbox and delivered to the new instance.
Clustering
One of murmer’s core design goals is that your actor code doesn’t change when you go from a single process to a multi-node cluster. The same Endpoint<A> API works in both cases.
Step 1: Run everything locally
Create a System::local() — no networking, no config. Your actors communicate through in-memory channels with zero serialization cost:
use murmer::prelude::*;
let system = System::local();
let room = system.start("room/general", ChatRoom, ChatRoomState {
room_name: "general".into(),
messages: vec![],
});
// Send messages via extension trait — works instantly
room.post_message("alice".into(), "Hello!".into()).await?;
// Look up actors by label
let ep = system.lookup::<ChatRoom>("room/general").unwrap();
let history = ep.get_history().await?;
Step 2: Go distributed
When you’re ready for real networking, swap System::local() for System::clustered(). Your actor code stays identical — only the system construction changes:
use murmer::prelude::*;
use murmer::cluster::config::ClusterConfig;
let config = ClusterConfig::builder()
.name("alpha")
.listen("0.0.0.0:7100".parse()?)
.advertise("192.168.1.5:7100".parse()?)
.cookie("my-cluster-secret")
.seed_nodes(["192.168.1.1:7100".parse()?])
.build()?;
// clustered_auto() discovers all #[handlers]-annotated actor types automatically
let system = System::clustered_auto(config).await?;
// Same API as local — start, lookup, send
let room = system.start("room/alpha", ChatRoom, state);
room.post_message("alice".into(), "Hello!".into()).await?;
// Actors on other nodes appear automatically via registry replication
let remote_room = system.lookup::<ChatRoom>("room/beta").unwrap();
remote_room.get_history().await?; // transparently serialized over QUIC
Each node gets a single QUIC connection to every peer, multiplexed over per-actor streams. The OpLog replication protocol uses version vectors for efficient, idempotent sync.
Step 3: Test it interactively
The cluster_chat example lets you try both modes with an interactive CLI:
# Local mode — all actors in one process
cargo run -p murmer-examples --bin cluster_chat -- --local
=== murmer cluster_chat (local mode) ===
Started room: #general
Started room: #random
> post general alice Hello everyone!
[1 messages in #general]
> post general bob Hey alice!
[2 messages in #general]
> history general
--- #general ---
alice: Hello everyone!
bob: Hey alice!
> rooms
Known rooms:
#general — 2 messages
#random — 0 messages
Same binary, same commands — just add cluster config:
# Terminal 1 — seed node
cargo run -p murmer-examples --bin cluster_chat -- --node alpha --port 7100
# Terminal 2 — joins via seed
cargo run -p murmer-examples --bin cluster_chat -- --node beta --port 7200 --seed 127.0.0.1:7100
Step 4: Deploy with Docker
The docker-compose.yml in the repo runs a 3-node cluster:
docker compose up --build
This starts three containers — alpha, beta, and gamma — each running the cluster_chat example:
services:
alpha:
build: .
command: ["--node", "alpha", "--port", "7100"]
beta:
build: .
command: ["--node", "beta", "--port", "7100", "--seed", "alpha:7100"]
gamma:
build: .
command: ["--node", "gamma", "--port", "7100", "--seed", "alpha:7100"]
Beta and gamma seed from alpha and automatically mesh together.
How clustering works
Auto-discovery
When an actor system starts in clustered mode, it runs a server that listens for incoming connections. New nodes connect to existing ones via seed nodes and begin exchanging information about their actors. Nodes can be configured to gossip this information, allowing the network to mesh together organically.
Networking layer
The networking layer is built on QUIC (via the quinn crate) and SWIM (via the foca crate):
- QUIC provides a reliable, low-latency transport with built-in TLS encryption. Each node pair shares a single QUIC connection, multiplexed over per-actor streams.
- SWIM handles cluster membership — failure detection, protocol-level heartbeats, and member state dissemination.
- mDNS provides optional zero-configuration discovery for LAN environments.
Stream architecture
When a remote actor’s endpoint is accessed:
- A dedicated QUIC stream is opened to the remote node for that actor.
- The stream stays open as long as it’s active (not idle).
- On the receiving end, a stream handler deserializes incoming messages, looks up the target actor via the receptionist, and forwards them.
- Each stream binds to a single actor — messages for other actors result in an error and stream closure.
- The stream subscribes to the actor’s lifecycle via the receptionist. If the actor enters a negative state (stopped, dead), the stream closes with an error.
An actor on a node might have multiple inbound streams, but the mailbox system ensures messages are processed in order of arrival.
Registry replication
Actor registrations are replicated across the cluster using an OpLog with version vectors:
- When a local actor registers, its node broadcasts an `ActorAdd` operation to all peers.
- Remote nodes create lazy endpoint factories in their local receptionists.
- Version vectors ensure operations are idempotent and ordering is preserved.
- When a node leaves, its registrations are pruned from all other nodes.
This gives every node an eventually consistent view of the entire cluster’s actor topology.
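The idempotency that version vectors provide can be sketched in isolation. Stand-in types only; murmer's actual OpLog wire format is richer, and this sketch assumes each node stamps its ops with an increasing per-node sequence number:

```rust
use std::collections::HashMap;

/// Minimal version vector: per-node high-water mark of applied operations.
#[derive(Default)]
struct VersionVector(HashMap<String, u64>);

impl VersionVector {
    /// Apply an op stamped (node, seq). Returns true if it was new
    /// (and records it), false if it was a duplicate delivery.
    fn apply(&mut self, node: &str, seq: u64) -> bool {
        let seen = self.0.entry(node.to_string()).or_insert(0);
        if seq > *seen {
            *seen = seq;
            true
        } else {
            false // already applied — re-delivery is a no-op
        }
    }
}

fn main() {
    let mut vv = VersionVector::default();
    assert!(vv.apply("alpha", 1));  // first delivery applies
    assert!(!vv.apply("alpha", 1)); // duplicate is ignored
    assert!(vv.apply("alpha", 2));  // next op applies
}
```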
Edge Clients
Not everything that talks to a murmer cluster needs to be a cluster member. A REST API gateway, a CLI tool, a monitoring dashboard, or an integration test runner just needs to call actors — it doesn’t need to run any, participate in SWIM gossip, or store a registry.
Edge clients fill that role. A MurmerClient connects to any cluster node via QUIC, pulls the set of public actors, and exposes the same Endpoint<A> API you already know — without any of the cluster machinery behind it.
use murmer::MurmerClient;
use std::time::Duration;
let client = MurmerClient::connect("10.0.0.5:9000".parse()?, "cluster-cookie").await?;
let ep = client.lookup::<UserService>("api/users").unwrap();
let user = ep.send(GetUser { id: 42 }).await?;
client.disconnect().await;
Visibility: controlling what Edge clients see
Every actor has a visibility that controls who can discover it. You set this at startup time on the server side:
// Public — visible to Edge clients and all cluster members
let api = system.start_public("api/users", UserService, state);
// Internal — visible to cluster members only (default)
let router = system.start("routing/shard-0", ShardRouter, state);
// Private — node-local only, never replicated
let metrics = system.start_private("node/metrics", MetricsCollector, state);
| Visibility | Edge clients | Cluster members | Replicated via OpLog |
|---|---|---|---|
| Public | ✓ | ✓ | ✓ |
| Internal (default) | ✗ | ✓ | ✓ |
| Private | ✗ | ✗ | ✗ |
Private is a zero-overhead choice: the actor is never written to the OpLog, never serialized, and never sent over the wire. Use it for utility actors that are purely node-local implementation details — connection managers, per-node caches, local metrics collectors.
Connecting
Edge clients connect to any node in the cluster. The node you connect to acts as your sync source — it responds to pull requests with the current set of public actors.
// Short-lived: connect, call, disconnect
let client = MurmerClient::connect(
"10.0.0.5:9000".parse()?,
"cluster-cookie",
).await?;
The cluster cookie must match the server’s cookie or the handshake will be rejected.
Requirements
The server must be started with System::clustered() — Edge clients connect via QUIC and need a listening endpoint. System::local() has no network layer and cannot accept Edge client connections.
Looking up actors
lookup — instant, returns None if not synced yet
if let Some(ep) = client.lookup::<UserService>("api/users") {
let user = ep.send(GetUser { id: 42 }).await?;
}
Returns None if the actor hasn’t been synced to the client yet. Use this after an initial sync has had time to complete.
lookup_wait — blocks until the actor appears
let ep = client
.lookup_wait::<UserService>("api/users", Duration::from_secs(5))
.await?;
Triggers an immediate pull, then waits for the actor to appear — either from that pull’s response or a subsequent one. Re-polls the server every 500 ms. Returns ClusterError::Timeout if the actor doesn’t appear within the deadline.
Fast path: if the actor is already synced, lookup_wait returns after one pull round-trip (typically sub-millisecond on LAN).
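The shape of that loop is simple enough to sketch synchronously. Here `pull` stands in for one RegistrySyncRequest round-trip followed by a local lookup; murmer's real client does this asynchronously on its 500 ms schedule:

```rust
use std::time::{Duration, Instant};

// Hypothetical poll-until-found loop in the spirit of lookup_wait.
fn lookup_wait<T>(
    mut pull: impl FnMut() -> Option<T>,
    timeout: Duration,
    poll_every: Duration,
) -> Result<T, &'static str> {
    let deadline = Instant::now() + timeout;
    loop {
        // Fast path: if the actor is already synced, the first pull returns it.
        if let Some(endpoint) = pull() {
            return Ok(endpoint);
        }
        if Instant::now() + poll_every > deadline {
            return Err("timeout"); // actor never appeared within the deadline
        }
        std::thread::sleep(poll_every); // re-poll interval
    }
}

fn main() {
    // The actor "registers" on the third pull.
    let mut calls = 0;
    let found = lookup_wait(
        || { calls += 1; (calls >= 3).then_some("endpoint") },
        Duration::from_secs(1),
        Duration::from_millis(5),
    );
    assert_eq!(found, Ok("endpoint"));

    // An actor that never appears times out.
    let missing: Result<(), _> = lookup_wait(
        || None,
        Duration::from_millis(20),
        Duration::from_millis(5),
    );
    assert_eq!(missing, Err("timeout"));
}
```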
Usage patterns
Pattern 1: Short-lived client (pull once)
Ideal for CLI tools, integration tests, and one-off queries. Pulls on connect, uses the snapshot, disconnects.
let client = MurmerClient::connect(addr, cookie).await?;
// Give the initial pull a moment to arrive
tokio::time::sleep(Duration::from_millis(50)).await;
let ep = client.lookup::<UserService>("api/users").unwrap();
let result = ep.send(GetUser { id: 1 }).await?;
client.disconnect().await;
Pattern 2: Long-lived gateway (periodic pull)
Ideal for REST/gRPC API servers, dashboards, and proxies. Use sync_interval to re-pull periodically and pick up new actor registrations as the cluster evolves.
use murmer::ClientOptions;
let client = MurmerClient::connect_with_options(
addr,
cookie.into(),
ClientOptions {
sync_interval: Some(Duration::from_secs(30)),
..Default::default()
},
).await?;
// client.lookup() stays fresh — re-pulled every 30 seconds
Pattern 3: Wait for a specific actor
Useful when you connect before the target actor is registered — for example, a gateway that starts before the cluster has finished placing its actors.
let ep = client
.lookup_wait::<PaymentService>("payments/processor", Duration::from_secs(10))
.await?;
How sync works
Edge clients use pull-based sync — the server never pushes unsolicited updates. The client sends a RegistrySyncRequest with its current version vector; the server responds with only the delta (new public actor registrations since that version).
Edge client Cluster node
│ │
│── RegistrySyncRequest(vv) ────▶│
│◀── RegistrySync(delta ops) ────│
│ │
│ ... time passes ... │
│ │
│── RegistrySyncRequest(vv') ───▶│ (periodic or lookup_wait re-poll)
│◀── RegistrySync(new ops) ──────│
After the first sync, subsequent pulls return only the delta — O(new ops), not O(all ops). 1000 idle Edge clients add near-zero server overhead: no SWIM membership, no server-initiated fan-out, no per-client state.
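The delta computation can be sketched with stand-in types: the server keeps an ordered op log and filters out everything the client's version vector already covers.

```rust
use std::collections::HashMap;

// Stand-in op: one registration stamped with its origin node and sequence.
#[derive(Clone)]
struct Op { node: String, seq: u64, label: String }

/// Return only the ops the client hasn't seen, per its version vector.
fn delta(log: &[Op], client_vv: &HashMap<String, u64>) -> Vec<Op> {
    log.iter()
        .filter(|op| op.seq > *client_vv.get(&op.node).unwrap_or(&0))
        .cloned()
        .collect()
}

fn main() {
    let log = vec![
        Op { node: "alpha".into(), seq: 1, label: "api/users".into() },
        Op { node: "alpha".into(), seq: 2, label: "room/beta".into() },
    ];
    // The client has already synced up to alpha:1 ...
    let mut vv = HashMap::new();
    vv.insert("alpha".to_string(), 1u64);
    // ... so the next pull returns only the newer op.
    let d = delta(&log, &vv);
    assert_eq!(d.len(), 1);
    assert_eq!(d[0].label, "room/beta");
}
```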
Scalability characteristics
| Property | Behavior |
|---|---|
| SWIM membership | Edge clients are not added to SWIM — no failure detection overhead |
| Server-initiated sync | Skipped for Edge clients — they pull on their own schedule |
| Disconnect | Silent — no cluster alarm, no actor pruning, no SWIM event |
| Server state per client | None — the server is stateless with respect to each Edge client |
| Wire overhead (idle) | Zero — the server never initiates contact |
Full example
The edge_client example demonstrates all three patterns:
cargo run -p murmer-examples --bin edge_client
It covers:
- Public actors visible to Edge clients, internal actors hidden
- `lookup_wait` resolving when an actor registers after connect
- Long-lived client with periodic `sync_interval`
Monitoring
Murmer provides production-grade metrics for your actor systems via the monitor feature. It follows a facade pattern — your actor code records metrics through thin instrumentation calls, and you choose the backend (Prometheus, StatsD, etc.) at startup.
Enabling monitoring
Add the monitor feature to your Cargo.toml:
[dependencies]
murmer = { version = "0.1", features = ["monitor"] }
When the feature is off, all instrumentation compiles to nothing — zero overhead, zero dependencies.
Quick start with Prometheus
One call installs the metrics recorder and starts an HTTP endpoint for scraping:
use murmer::monitor::start_prometheus;
#[tokio::main]
async fn main() {
// Start serving metrics on :9000/metrics
start_prometheus(9000).expect("failed to start prometheus exporter");
// Now start your actor system as usual
let system = System::local();
let counter = system.start("counter/main", Counter, CounterState { count: 0 });
// Every message send, handler invocation, and lifecycle event
// is automatically recorded. Scrape with:
// curl http://localhost:9000/metrics
}
What gets measured
Murmer instruments five categories of metrics automatically. You don’t need to add any code — everything is recorded as actors run.
Actor lifecycle
| Metric | Type | Labels | Description |
|---|---|---|---|
| murmer_actors_active | gauge | actor_type | Currently running actors |
| murmer_actors_started_total | counter | actor_type | Total actors started |
| murmer_actors_stopped_total | counter | actor_type, reason | Total actors stopped (reason: stopped, panicked, restart_limit_exceeded) |
| murmer_actors_restarts_total | counter | actor_type | Total actor restarts |
| murmer_actors_restart_limit_exceeded_total | counter | actor_type | Times restart limits were hit |
Message processing
| Metric | Type | Labels | Description |
|---|---|---|---|
| murmer_messages_processed_total | counter | actor_type | Messages successfully handled |
| murmer_messages_failed_total | counter | actor_type | Messages that caused a panic |
| murmer_message_processing_duration_seconds | histogram | actor_type | Handler execution time |
Endpoint sends
| Metric | Type | Labels | Description |
|---|---|---|---|
| murmer_sends_total | counter | actor_type, locality | Total sends (local or remote) |
| murmer_send_errors_total | counter | actor_type, error_kind | Send failures by error type |
| murmer_network_roundtrip_duration_seconds | histogram | actor_type | End-to-end remote call latency |
Networking
| Metric | Type | Description |
|---|---|---|
| murmer_network_connections_active | gauge | Active QUIC connections to peer nodes |
| murmer_network_streams_active | gauge | Active QUIC streams for actor messaging |
| murmer_network_bytes_sent_total | counter | Total bytes sent over actor streams |
| murmer_network_bytes_received_total | counter | Total bytes received over actor streams |
| murmer_network_inflight_calls | gauge | In-flight remote calls awaiting responses |
| murmer_network_dead_letters_total | counter | Failed in-flight calls (connection lost) |
Cluster membership
| Metric | Type | Labels | Description |
|---|---|---|---|
| murmer_cluster_nodes | gauge | status | Number of nodes in the cluster |
| murmer_cluster_membership_changes_total | counter | event_type | Membership events (joined, failed, left) |
Receptionist
| Metric | Type | Description |
|---|---|---|
| murmer_receptionist_lookups_total | counter | Actor lookups |
| murmer_receptionist_registrations_total | counter | Actor registrations |
| murmer_receptionist_deregistrations_total | counter | Actor deregistrations |
Label cardinality
All actor metrics use actor_type (the Rust type name, e.g. my_app::ChatRoom) rather than actor_label (e.g. "room/general"). This keeps cardinality bounded — you typically have fewer than 20 actor types, but could have thousands of labels.
Architecture: how it works
The instrumentation uses a facade pattern inspired by how tracing works:
- `instrument.rs` (always compiled) — contains thin `pub(crate)` functions like `instrument::message_processed(actor_type)`. When `monitor` is on, these call `metrics::counter!(...)`. When off, they're empty `#[inline(always)]` functions that the compiler eliminates entirely.
- Call sites (supervisor, receptionist, endpoint, etc.) — call instrument functions unconditionally. No `#[cfg]` attributes scattered across the codebase.
- Your application — installs a metrics recorder at startup (e.g., `start_prometheus(9000)`). All recorded metrics flow to the backend you chose.
This means adding a new metric requires touching exactly two places: the instrument function and the call site. The facade keeps the #[cfg] logic in one file.
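A standalone sketch of that facade shape, with a stand-in `monitor` feature flag (murmer's real `instrument.rs` is internal; when the feature is absent, as in this sketch, only the no-op branch compiles):

```rust
mod instrument {
    // Compiled when the `monitor` feature is enabled: forward to the backend.
    #[cfg(feature = "monitor")]
    pub fn message_processed(actor_type: &str) {
        metrics::counter!(
            "murmer_messages_processed_total",
            "actor_type" => actor_type.to_string()
        )
        .increment(1);
    }

    // Compiled when the feature is off: an empty function the optimizer erases.
    #[cfg(not(feature = "monitor"))]
    #[inline(always)]
    pub fn message_processed(_actor_type: &str) {}
}

fn main() {
    // Call sites stay unconditional — no #[cfg] outside the facade module.
    instrument::message_processed("ChatRoom");
    assert!(!cfg!(feature = "monitor")); // this sketch compiles the no-op path
}
```

The `#[cfg]` pair lives in exactly one module, which is what keeps new metrics a two-line change.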
ClusterMonitor actor
In addition to Prometheus metrics, murmer provides a ClusterMonitor actor that maintains a queryable in-memory view of cluster health:
use murmer::monitor::{ClusterMonitor, ClusterMonitorState, run_monitor_bridge, GetClusterHealth};
// Start the monitor actor
let monitor = system.start("murmer/monitor", ClusterMonitor, ClusterMonitorState::new());
// Bridge cluster events into the monitor
tokio::spawn(run_monitor_bridge(&cluster_system, monitor.clone()));
// Query health at any time
let health = monitor.send(GetClusterHealth).await?;
println!("Alive: {}, Joins: {}, Failures: {}",
health.alive_nodes, health.total_joins, health.total_failures);
The ClusterMonitor tracks:
- Which nodes are alive and when they joined
- Cumulative counters for joins, failures, and departures
- Per-node uptime
Grafana dashboard
A typical Prometheus + Grafana setup might query:
# Message throughput by actor type
rate(murmer_messages_processed_total[5m])
# 99th percentile handler latency
histogram_quantile(0.99, rate(murmer_message_processing_duration_seconds_bucket[5m]))
# Actor crash rate
rate(murmer_messages_failed_total[5m])
# Remote call latency
histogram_quantile(0.95, rate(murmer_network_roundtrip_duration_seconds_bucket[5m]))
# Cluster size over time
murmer_cluster_nodes{status="active"}
Proc Macro Reference
Murmer provides two proc macros to reduce boilerplate when defining actors: #[handlers] + #[handler] for handler generation, and #[derive(Message)] for explicit message types.
#[handlers] + #[handler]
Place #[handlers] on an impl block containing actor message handlers. Mark each handler method with #[handler].
Auto-generated messages (recommended)
#[handlers]
impl MyActor {
#[handler]
fn do_thing(
&mut self,
ctx: &ActorContext<Self>,
state: &mut MyState,
name: String,
count: u32,
) -> String {
format!("{name}: {count}")
}
#[handler]
fn get_status(
&mut self,
_ctx: &ActorContext<Self>,
state: &mut MyState,
) -> bool {
state.is_active
}
#[handler]
async fn fetch_data(
&mut self,
ctx: &ActorContext<Self>,
state: &mut MyState,
url: String,
) -> Vec<u8> {
some_async_call(&url).await
}
}
What gets generated
From the above, the macro produces:
Message structs — method name converted to PascalCase, parameters after ctx and state become fields:
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DoThing { pub name: String, pub count: u32 }
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct GetStatus; // no extra params → unit struct
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FetchData { pub url: String }
Message + RemoteMessage impls — each struct implements Message (with the handler’s return type as Result) and RemoteMessage (with a TYPE_ID of "ActorName::method_name"):
impl Message for DoThing { type Result = String; }
impl RemoteMessage for DoThing { const TYPE_ID: &'static str = "MyActor::do_thing"; }
Handler / AsyncHandler impls — dispatches message fields as method arguments:
impl Handler<DoThing> for MyActor {
fn handle(&mut self, ctx: &ActorContext<Self>, state: &mut MyState, message: DoThing) -> String {
self.do_thing(ctx, state, message.name, message.count)
}
}
// async fn → AsyncHandler
impl AsyncHandler<FetchData> for MyActor { /* ... */ }
RemoteDispatch — a wire-format dispatch table that routes serialized messages by their TYPE_ID. This enables cross-node delivery without the sender knowing the concrete handler:
impl RemoteDispatch for MyActor {
async fn dispatch_remote(&mut self, ctx, state, message_type: &str, payload: &[u8])
-> Result<Vec<u8>, DispatchError>
{
match message_type {
"MyActor::do_thing" => { /* deserialize DoThing, call handler, serialize result */ }
"MyActor::get_status" => { /* ... */ }
"MyActor::fetch_data" => { /* ... */ }
_ => Err(DispatchError::UnknownMessageType(..))
}
}
}
Extension trait — ergonomic methods directly on Endpoint<MyActor>:
pub trait MyActorExt {
fn do_thing(&self, name: String, count: u32) -> impl Future<Output = Result<String, SendError>>;
fn get_status(&self) -> impl Future<Output = Result<bool, SendError>>;
fn fetch_data(&self, url: String) -> impl Future<Output = Result<Vec<u8>, SendError>>;
}
impl MyActorExt for Endpoint<MyActor> { /* ... */ }
This lets you call handlers directly:
let result = endpoint.do_thing("hello".into(), 42).await?;
let status = endpoint.get_status().await?;
let data = endpoint.fetch_data("https://...".into()).await?;
Auto-registration — a linkme distributed slice entry for the TypeRegistry. At cluster startup, TypeRegistry::from_auto() collects all #[handlers]-annotated actor types automatically, enabling the cluster to route messages to the correct deserializer without manual registration.
Handler method signature
Each #[handler] method must follow this pattern:
fn method_name(&mut self, ctx: &ActorContext<Self>, state: &mut State, ...params) -> ReturnType
- `&mut self` — the actor instance.
- `ctx: &ActorContext<Self>` — provides access to the system, receptionist, and lifecycle operations like `watch()`.
- `state: &mut State` — the actor's mutable state.
- Remaining parameters become message struct fields.
- Use `async fn` for handlers that need to `.await`.
Explicit messages (backward compatible)
For messages shared across multiple actors, name the last parameter msg (or _msg) and the macro will use the type directly instead of generating a struct:
#[handlers]
impl MyActor {
#[handler]
fn increment(
&mut self,
ctx: &ActorContext<Self>,
state: &mut MyState,
msg: Increment,
) -> i64 {
state.count += msg.amount;
state.count
}
}
Here Increment must already exist and implement Message. The extension trait method will take the message as a parameter: endpoint.increment(msg).
#[derive(Message)]
Derives Message (and optionally RemoteMessage) for a struct or enum.
Basic usage (local only)
#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = i64)]
struct Increment { amount: i64 }
This implements Message with type Result = i64.
With remote support
Add remote = "..." to also implement RemoteMessage with a wire-stable type ID:
#[derive(Debug, Clone, Serialize, Deserialize, Message)]
#[message(result = i64, remote = "counter::Increment")]
struct Increment { amount: i64 }
The TYPE_ID string is used for wire dispatch — it must be unique across all message types in the cluster and stable across code changes (don’t use std::any::type_name which can change between compiler versions).
Attributes
| Attribute | Required | Description |
|---|---|---|
| result = Type | Yes | The response type for this message |
| remote = "id" | No | Wire-stable type ID for RemoteMessage |
Application Orchestration
Murmer’s core gives you actors, messages, and clustering primitives. The app module (enabled via the app feature flag) builds on top of these to provide the application-level abstractions you need for real, working distributed applications: placement strategies, leader election, crash recovery, and a Coordinator actor that ties them all together.
Think of it this way: murmer gives you the building blocks, and the app module helps you assemble them into a running system that manages actor lifecycles across a cluster automatically.
Enable the feature in your Cargo.toml:
[dependencies]
murmer = { version = "0.1", features = ["app"] }
Overview
The orchestration layer answers three questions:
- Where should this actor run? — Placement strategies score nodes based on load, capabilities, metadata, and constraints.
- Who decides? — Leader election picks one node to run the Coordinator, which makes all placement decisions.
- What happens when a node fails? — Crash strategies define recovery behavior: redistribute immediately, wait for the node to return, or let the actor die.
Actor specifications
An ActorSpec describes an actor that the orchestrator should place and manage. It captures what to run, how to recover from crashes, and where to place it.
use murmer::app::spec::{ActorSpec, CrashStrategy, PlacementConstraints};
use murmer::cluster::config::NodeClass;
use std::time::Duration;
let spec = ActorSpec::new("storage/photos", "orchestrator::StorageAgent")
.with_state(serialized_state)
.with_crash_strategy(CrashStrategy::WaitForReturn(Duration::from_secs(30)))
.with_constraints(PlacementConstraints {
required_classes: vec![NodeClass::Worker],
required_metadata: [("volume".into(), "photos".into())].into(),
..Default::default()
});
Fields
| Field | Purpose |
|---|---|
| label | Actor label (e.g., "storage/photos") — unique across the cluster |
| actor_type_name | Key into the SpawnRegistry — identifies what type of actor to create |
| initial_state | Serialized initial state (bincode bytes) sent to the target node |
| crash_strategy | What to do when the hosting node fails |
| placement | Constraints that filter which nodes are eligible |
Crash strategies
| Strategy | Behavior |
|---|---|
| Redistribute | Move to another eligible node immediately (default) |
| WaitForReturn(Duration) | Wait for the failed node to rejoin; fall back to Redistribute on timeout |
| Abandon | Let the actor die with the node — no recovery |
Placement constraints
Constraints filter eligible nodes before the placement strategy scores them:
PlacementConstraints {
required_classes: vec![NodeClass::Worker], // must be a Worker node
required_metadata: [("gpu".into(), "true".into())].into(), // must have gpu=true
anti_affinity_labels: vec!["db/primary".into()], // don't co-locate with this actor
..Default::default()
}
- `required_classes` — empty means any class is acceptable.
- `required_metadata` — the node must have all specified key-value pairs.
- `anti_affinity_labels` — avoid placing on nodes already running these actors.
Placement strategies
The PlacementStrategy trait defines a fitness function that scores nodes for hosting a given actor spec. The Coordinator evaluates all eligible nodes (after constraint filtering) and picks the highest scorer.
trait PlacementStrategy {
fn fitness(&self, node: &NodeInfo, spec: &ActorSpec, view: &ClusterView) -> f64;
}
- Return `0.0` or negative to indicate "do not place here".
- Higher values mean stronger preference.
- The full `ClusterView` is available for global-aware decisions (e.g., load balancing).
Built-in strategies
| Strategy | Behavior |
|---|---|
| LeastLoaded | Place on the node running the fewest actors |
| RandomPlacement | Uniform random selection across eligible nodes |
| Pinned(node_id) | Always prefer a specific node; fall back if unavailable |
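To illustrate the fitness-function shape, here is a standalone sketch of a least-loaded strategy. `NodeInfo`, `ActorSpec`, and `ClusterView` are stubbed down to the minimum; murmer's real types carry more fields:

```rust
// Stubs carrying only what this sketch needs.
struct NodeInfo { running_actors: usize }
struct ActorSpec;
struct ClusterView;

trait PlacementStrategy {
    fn fitness(&self, node: &NodeInfo, spec: &ActorSpec, view: &ClusterView) -> f64;
}

/// Fewer running actors → higher score; the emptiest node wins.
struct LeastLoaded;

impl PlacementStrategy for LeastLoaded {
    fn fitness(&self, node: &NodeInfo, _spec: &ActorSpec, _view: &ClusterView) -> f64 {
        1.0 / (1.0 + node.running_actors as f64)
    }
}

fn main() {
    let busy = NodeInfo { running_actors: 8 };
    let idle = NodeInfo { running_actors: 0 };
    let strategy = LeastLoaded;
    // The Coordinator would pick `idle`: it scores higher.
    assert!(
        strategy.fitness(&idle, &ActorSpec, &ClusterView)
            > strategy.fitness(&busy, &ActorSpec, &ClusterView)
    );
}
```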
Leader election
The LeaderElection trait is pluggable. The Coordinator only runs on the elected leader node.
trait LeaderElection {
fn elect(&self, view: &ClusterView) -> Option<String>;
}
The default implementation, OldestNode, picks the node with the lowest incarnation counter. This is deterministic — all nodes independently compute the same answer without a consensus round.
use murmer::app::election::OldestNode;
use murmer::cluster::config::NodeClass;
// Any alive node can be leader
let election = OldestNode::any();
// Only Edge nodes can be leader
let election = OldestNode::with_class(NodeClass::Edge);
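The determinism is easy to see in a standalone sketch (stand-in `Node` type): every node filters the shared membership view to alive members and independently picks the same minimum, so no consensus round is needed.

```rust
struct Node { id: String, incarnation: u64, alive: bool }

/// Deterministic "oldest node" election: lowest incarnation wins,
/// with the node id as a stable tie-breaker.
fn elect(view: &[Node]) -> Option<String> {
    view.iter()
        .filter(|n| n.alive)
        .min_by_key(|n| (n.incarnation, n.id.clone()))
        .map(|n| n.id.clone())
}

fn main() {
    let view = vec![
        Node { id: "beta".into(), incarnation: 7, alive: true },
        Node { id: "alpha".into(), incarnation: 3, alive: true },
        Node { id: "gamma".into(), incarnation: 1, alive: false }, // down — skipped
    ];
    assert_eq!(elect(&view), Some("alpha".to_string()));
}
```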
The Coordinator
The Coordinator is itself a murmer actor (dogfooding the framework). It maintains a ClusterView, accepts SubmitSpec messages, and handles crash recovery when nodes fail.
Lifecycle
- The Coordinator starts on the elected leader node.
- It subscribes to cluster events to track node joins and failures.
- Users send `SubmitSpec` messages to declare what actors should run.
- The Coordinator evaluates placement constraints and strategies, then sends `SpawnActor` control messages to target nodes.
- When a node fails, the Coordinator re-places affected actors according to each spec's `CrashStrategy`.
The cluster event bridge
The bridge (murmer::app::bridge) connects the raw cluster machinery to the Coordinator. It subscribes to ClusterEvents and translates them into Coordinator messages (NotifyNodeJoined, NotifyNodeFailed, NotifyNodeLeft, NotifySpawnAck). This keeps the Coordinator decoupled from the transport layer.
The recommended setup uses bridge::start_coordinator():
use murmer::app::bridge;
use murmer::app::coordinator::CoordinatorState;
use murmer::app::placement::LeastLoaded;
use murmer::app::election::OldestNode;
let cluster = system.cluster_system().unwrap();
let state = CoordinatorState::new(
cluster.identity().node_id_string(),
Box::new(LeastLoaded),
Box::new(OldestNode::with_class(NodeClass::Edge)),
);
let coordinator = bridge::start_coordinator(cluster, state);
This wires up everything: the Coordinator actor, the event bridge loop, and the spawn drain loop (which sends placement decisions to the transport layer).
The cluster view
The ClusterView is the Coordinator’s world model — a snapshot of all nodes with their capabilities and running actors:
// Query the Coordinator's view
let view = coordinator.send(GetClusterView).await?;
println!("Alive nodes: {}", view.alive_count);
println!("Total nodes: {}", view.total_count);
// Query managed specs
let specs = coordinator.send(GetSpecs).await?;
for spec in &specs {
println!("{}: {:?} on {}", spec.label, spec.state, spec.node_id);
}
Each node in the view carries:
- Identity — name, host, port, incarnation counter
- Class — `Worker`, `Edge`, `Coordinator`, etc.
- Metadata — arbitrary key-value pairs (e.g., `"volume" = "photos"`)
- Running actors — labels of actors currently hosted
- Liveness — whether the node is reachable
Full example: Filesystem RPC
The orchestrator example demonstrates the full orchestration loop:
- Three nodes form a cluster: a gateway (Edge class) and two workers (store-a, store-b).
- Each worker advertises capabilities via metadata (`"volume" = "photos"` or `"volume" = "docs"`).
- The gateway runs a Coordinator that places `StorageAgent` actors on workers matching their placement constraints.
- Clients query storage agents for directory listings and file reads — transparently routed across the cluster.
- Node failure triggers crash strategy handling.
// Submit a spec with placement constraints
let result = coordinator.send(SubmitSpec {
spec: ActorSpec::new("storage/photos", "orchestrator::StorageAgent")
.with_state(photos_state_bytes)
.with_crash_strategy(CrashStrategy::WaitForReturn(Duration::from_secs(30)))
.with_constraints(PlacementConstraints {
required_classes: vec![NodeClass::Worker],
required_metadata: [("volume".into(), "photos".into())].into(),
..Default::default()
}),
}).await?;
// The Coordinator placed the actor — now query it from any node
let photos = system.lookup::<StorageAgent>("storage/photos").unwrap();
let entries = photos.send(ListDir { path: "/".into() }).await?;
Run the example:
cargo test -p murmer-examples --test orchestrator