
Overview

This section provides comprehensive performance information for Rumi, including benchmarking results, test methodologies, and performance analysis.

Performance Documentation

Rumi performance documentation is organized into two main sections:

1. Canonical Benchmark

The Canonical Benchmark section presents results from the official end-to-end performance benchmark used to measure Rumi's core capabilities.

What it measures: Complete Receive-Process-Send flow of a clustered microservice, including messaging, state management, persistence, cluster replication, and consensus protocol.

Key metrics:

  • Wire-to-Wire Latency: Time from inbound message arrival to outbound message transmission (50th, 99th, 99.9th percentiles)

  • Maximum Throughput: Messages processed per second under saturated load
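Both metrics reduce to simple arithmetic over per-message measurements. The sketch below assumes wire-to-wire latencies have already been captured in microseconds, one sample per message, and uses the nearest-rank percentile definition. This page does not describe the benchmark's own percentile method or data collection, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: summarizing wire-to-wire latency samples (microseconds)
 * into the 50th, 99th and 99.9th percentiles, and computing throughput.
 * Uses nearest-rank percentiles; not the benchmark suite's own code.
 */
public class LatencySummary {

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static double percentile(double[] sortedMicros, double p) {
        if (sortedMicros.length == 0) throw new IllegalArgumentException("no samples");
        // Multiply before dividing to keep p * n exact for common values such as 99.9 * 1000.
        int rank = (int) Math.ceil(p * sortedMicros.length / 100.0);
        return sortedMicros[Math.max(rank, 1) - 1];
    }

    /** Messages per second over a measurement window given in nanoseconds. */
    static double throughput(long messages, long elapsedNanos) {
        return messages * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        // Illustrative input only (1..1000 us), not benchmark data.
        double[] samples = new double[1000];
        for (int i = 0; i < samples.length; i++) samples[i] = i + 1;
        Arrays.sort(samples);
        System.out.printf("p50=%.1f us  p99=%.1f us  p99.9=%.1f us%n",
                percentile(samples, 50), percentile(samples, 99), percentile(samples, 99.9));
        System.out.printf("throughput=%.0f msg/s%n", throughput(1_000_000, 2_000_000_000L));
    }
}
```

Nearest-rank always returns a value that was actually observed; interpolating definitions can report latencies that never occurred, which is most noticeable at the 99.9th percentile where few samples lie above the cutoff.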

Test program: ESProcessor from the Rumi Performance Benchmark Suite

Results available: Performance metrics for Rumi 4.0 releases

➡️ View Canonical Benchmark Results

2. Performance Benchmark Suite

The Performance Benchmark Suite section documents the collection of benchmarking tools that measure individual Rumi runtime components.

What it includes: 7 modules, each focusing on a specific component:

  • Time Module: Time API overhead

  • Serialization Module: Message encoding/decoding performance

  • Link Module: Cluster replication transport

  • Messaging Module: Pub/sub messaging layer

  • Persistence Module: Message and data persistence

  • Storage Module: Object store operations

  • AEP Module: End-to-end canonical benchmark (described above)
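To give a sense of what "Time API overhead" means, the sketch below times the JDK's `System.nanoTime()` in a tight loop. It is a stand-in only: it is not the Rumi Time API or the Time Module's harness. A naive loop like this is also sensitive to JIT warm-up and dead-code elimination, so a dedicated harness such as JMH gives more trustworthy numbers.

```java
/**
 * Illustration only: average cost of one clock read, using the JDK's
 * System.nanoTime() as a stand-in for a time API (NOT the Rumi Time API).
 */
public class ClockOverhead {

    // Written after the loop so the JIT cannot treat the clock reads as dead code.
    static volatile long blackhole;

    /** Average nanoseconds per System.nanoTime() call over the given number of calls. */
    static double averageNanosPerCall(int calls) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        averageNanosPerCall(1_000_000); // warm-up pass so the loop is JIT-compiled before measuring
        double avg = averageNanosPerCall(10_000_000);
        System.out.printf("~%.1f ns per System.nanoTime() call%n", avg);
    }
}
```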

Purpose: Isolate and measure the performance of individual components, characterize their behavior, and validate configurations

Source code: github.com/neeveresearch/nvx-rumi-perf

➡️ Explore Benchmark Suite

Quick Start

View Latest Results

See the latest canonical benchmark results:

Run Your Own Tests

Download and run the benchmark suite:

  1. Download distribution from the Neeve artifact repository:

  2. Extract and run:

  3. See documentation for detailed configuration and parameters:

Understanding Performance

Latency Characteristics

Rumi is optimized for ultra-low latency:

  • Typical latency: 27–30 µs median (wire-to-wire, including network)

  • Tail latency: 99.9th percentile within 1.5x of the median

  • Configuration: Performance varies by CPU configuration and optimization mode
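As a worked example of the tail-latency bound: a median at the top of the quoted range (30 µs) allows a 99.9th percentile of up to 1.5 × 30 = 45 µs, and a 27 µs median allows up to 40.5 µs. A hypothetical check over a measured pair of percentiles might look like this (class and method names are illustrative):

```java
/**
 * Hypothetical sketch: checks the "99.9th percentile within 1.5x of the median"
 * property for a pair of measured percentiles (values in microseconds).
 */
public class TailRatio {

    static final double MAX_TAIL_TO_MEDIAN = 1.5;

    /** Ratio of the 99.9th percentile to the median. */
    static double tailRatio(double p50Micros, double p999Micros) {
        return p999Micros / p50Micros;
    }

    /** True when the 99.9th percentile is at most 1.5x the median. */
    static boolean withinBound(double p50Micros, double p999Micros) {
        return tailRatio(p50Micros, p999Micros) <= MAX_TAIL_TO_MEDIAN;
    }

    public static void main(String[] args) {
        double p50 = 30.0; // top of the quoted median range
        System.out.println("p99.9 bound = " + p50 * MAX_TAIL_TO_MEDIAN + " us");
        System.out.println(withinBound(30.0, 44.0)); // hypothetical measurement
    }
}
```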

Throughput Characteristics

Rumi supports high-volume scenarios:

  • Typical throughput: 280K+ messages/second per microservice instance

  • Scaling: Linear scaling with message complexity and CPU resources

  • Configuration: Lightweight handlers reach their best throughput with the minimal CPU (MinCPU) configuration
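The throughput figure also implies an average per-message time budget: at 280,000 messages per second, one message completes on average every 1/280,000 s ≈ 3.57 µs. The helper below only performs that unit conversion; its name is illustrative.

```java
/** Illustrative sketch: average time budget per message implied by a sustained throughput. */
public class MessageBudget {

    /** Average nanoseconds available per message at the given messages/second. */
    static double nanosPerMessage(double messagesPerSecond) {
        return 1_000_000_000.0 / messagesPerSecond;
    }

    public static void main(String[] args) {
        double budget = nanosPerMessage(280_000); // the "280K+ messages/second" figure
        System.out.printf("%.0f ns (about %.2f us) per message%n", budget, budget / 1000.0);
    }
}
```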

Performance Factors

Key factors affecting Rumi performance:

  1. Message Access Method: Direct access (serializer/deserializer) vs indirect access (POJO); roughly a 10% latency difference and a 2.4x throughput difference

  2. CPU Configuration: MinCPU, Default, or MaxCPU; determines the balance between parallelization and coordination overhead

  3. Optimization Mode: Latency or Throughput - different JVM and runtime tuning

  4. Hardware: CPU, memory, storage (NVMe vs SSD), network (InfiniBand vs Ethernet)

  5. Network Tuning: Enabling VMA or RDMA (neither is enabled in the baseline tests)

Next Steps
