
Monitoring

Murmer provides production-grade metrics for your actor systems via the monitor feature. It follows a facade pattern — your actor code records metrics through thin instrumentation calls, and you choose the backend (Prometheus, StatsD, etc.) at startup.

Enabling monitoring

Add the monitor feature to your Cargo.toml:

[dependencies]
murmer = { version = "0.1", features = ["monitor"] }

When the feature is off, all instrumentation compiles to nothing — zero overhead, zero dependencies.

Quick start with Prometheus

One call installs the metrics recorder and starts an HTTP endpoint for scraping:

use murmer::monitor::start_prometheus;

#[tokio::main]
async fn main() {
    // Start serving metrics on :9000/metrics
    start_prometheus(9000).expect("failed to start prometheus exporter");

    // Now start your actor system as usual
    let system = System::local();
    let counter = system.start("counter/main", Counter, CounterState { count: 0 });

    // Every message send, handler invocation, and lifecycle event
    // is automatically recorded. Scrape with:
    //   curl http://localhost:9000/metrics
}

What gets measured

Murmer instruments six categories of metrics automatically. You don’t need to add any code — everything is recorded as actors run.

Actor lifecycle

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| murmer_actors_active | gauge | actor_type | Currently running actors |
| murmer_actors_started_total | counter | actor_type | Total actors started |
| murmer_actors_stopped_total | counter | actor_type, reason | Total actors stopped (reason: stopped, panicked, restart_limit_exceeded) |
| murmer_actors_restarts_total | counter | actor_type | Total actor restarts |
| murmer_actors_restart_limit_exceeded_total | counter | actor_type | Times restart limits were hit |

Message processing

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| murmer_messages_processed_total | counter | actor_type | Messages successfully handled |
| murmer_messages_failed_total | counter | actor_type | Messages that caused a panic |
| murmer_message_processing_duration_seconds | histogram | actor_type | Handler execution time |

Endpoint sends

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| murmer_sends_total | counter | actor_type, locality | Total sends (local or remote) |
| murmer_send_errors_total | counter | actor_type, error_kind | Send failures by error type |
| murmer_network_roundtrip_duration_seconds | histogram | actor_type | End-to-end remote call latency |

Networking

| Metric | Type | Description |
| --- | --- | --- |
| murmer_network_connections_active | gauge | Active QUIC connections to peer nodes |
| murmer_network_streams_active | gauge | Active QUIC streams for actor messaging |
| murmer_network_bytes_sent_total | counter | Total bytes sent over actor streams |
| murmer_network_bytes_received_total | counter | Total bytes received over actor streams |
| murmer_network_inflight_calls | gauge | In-flight remote calls awaiting responses |
| murmer_network_dead_letters_total | counter | Failed in-flight calls (connection lost) |

Cluster membership

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| murmer_cluster_nodes | gauge | status | Number of nodes in the cluster |
| murmer_cluster_membership_changes_total | counter | event_type | Membership events (joined, failed, left) |

Receptionist

| Metric | Type | Description |
| --- | --- | --- |
| murmer_receptionist_lookups_total | counter | Actor lookups |
| murmer_receptionist_registrations_total | counter | Actor registrations |
| murmer_receptionist_deregistrations_total | counter | Actor deregistrations |

Label cardinality

All actor metrics use actor_type (the Rust type name, e.g. my_app::ChatRoom) rather than actor_label (e.g. "room/general"). This keeps cardinality bounded: a system typically has fewer than 20 actor types, but can run thousands of labeled actor instances.

Architecture: how it works

The instrumentation uses a facade pattern inspired by how tracing works:

  1. instrument.rs (always compiled) — Contains thin pub(crate) functions like instrument::message_processed(actor_type). When monitor is on, these call metrics::counter!(...). When off, they’re empty #[inline(always)] functions that the compiler eliminates entirely.

  2. Call sites (supervisor, receptionist, endpoint, etc.) — Call instrument functions unconditionally. No #[cfg] attributes scattered across the codebase.

  3. Your application — Installs a metrics recorder at startup (e.g., start_prometheus(9000)). All recorded metrics flow to the backend you chose.

This means adding a new metric requires touching exactly two places: the instrument function and the call site. The facade keeps the #[cfg] logic in one file.

ClusterMonitor actor

In addition to Prometheus metrics, murmer provides a ClusterMonitor actor that maintains a queryable in-memory view of cluster health:

use murmer::monitor::{ClusterMonitor, ClusterMonitorState, run_monitor_bridge, GetClusterHealth};

// Start the monitor actor
let monitor = system.start("murmer/monitor", ClusterMonitor, ClusterMonitorState::new());

// Bridge cluster events into the monitor
tokio::spawn(run_monitor_bridge(&cluster_system, monitor.clone()));

// Query health at any time
let health = monitor.send(GetClusterHealth).await?;
println!("Alive: {}, Joins: {}, Failures: {}",
    health.alive_nodes, health.total_joins, health.total_failures);

The ClusterMonitor tracks:

  • Which nodes are alive and when they joined
  • Cumulative counters for joins, failures, and departures
  • Per-node uptime

Grafana dashboard

A typical Prometheus + Grafana setup might query:

# Message throughput by actor type
rate(murmer_messages_processed_total[5m])

# 99th percentile handler latency
histogram_quantile(0.99, rate(murmer_message_processing_duration_seconds_bucket[5m]))

# Actor crash rate
rate(murmer_messages_failed_total[5m])

# Remote call latency
histogram_quantile(0.95, rate(murmer_network_roundtrip_duration_seconds_bucket[5m]))

# Cluster size over time
murmer_cluster_nodes{status="active"}