# Test Description

This page describes the complete methodology for the canonical performance benchmark used to measure Rumi performance across releases.

## Test Program

The benchmark uses the **ESProcessor** (Event Sourcing Processor) from the [Rumi Performance Benchmark Suite](https://github.com/rumidata/nvx-rumi-perf). The ESProcessor exercises the complete Receive-Process-Send flow of a clustered microservice that uses the Event Sourcing HA policy.

**Documentation**: See the [AEP Module](/performance/benchmark-suite/modules/aep-module.md) documentation for complete details on the test program, parameters, and configuration options.

## Test Flow

The benchmark exercises a complete message flow through a clustered microservice consisting of a primary and backup instance:

![Test Flow](/files/UF6lIylTIcjQaSq0mZAn)

### Primary Microservice

The primary microservice executes the following steps:

1. **Decode Inbound Message** - Deserialize incoming message from wire format
2. **Dispatch to Handler** - Route message to appropriate business logic handler
3. **Read all fields from message** - Business logic accesses message data
4. **Create and send message** - Business logic creates response message
5. **Replicate** - Replicate state change to backup
   * **5.2. Persist** - (Concurrent) Persist state change to disk on primary
6. **Consensus ACK** - Receive acknowledgment from backup
7. **Encode Outbound Message** - Serialize response message to wire format
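The ordering of the primary's steps can be sketched in plain Java. This is a hypothetical illustration, not Rumi code: every class and method name is invented, a UTF-8 string stands in for the wire format, and the backup round trip is simulated by an async task. The step list above states only that the send follows the backup's consensus ACK; as an assumption, the sketch also waits for the concurrent local persist before sending.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of the primary's step ordering; none of these names are Rumi APIs.
final class PrimaryFlowSketch {
    // Records when each step runs, for illustration.
    static final List<String> trace = new CopyOnWriteArrayList<>();

    static byte[] process(byte[] wireIn) {
        // 1. Decode: a UTF-8 string stands in for the real wire format.
        String inbound = new String(wireIn, StandardCharsets.UTF_8);
        trace.add("decode");

        // 2-4. Dispatch to the handler, read the fields, create the response.
        String response = "ack:" + inbound;
        trace.add("handle");

        // 5. Replicate to the backup while 5.2 persists locally, concurrently.
        CompletableFuture<Void> replicated = replicateToBackup(inbound);
        CompletableFuture<Void> persisted = CompletableFuture.runAsync(() -> trace.add("persist"));

        // 6. Assumption: wait for both the backup's consensus ACK and the local persist.
        CompletableFuture.allOf(replicated, persisted).join();

        // 7. Encode the outbound message for the wire.
        byte[] wireOut = response.getBytes(StandardCharsets.UTF_8);
        trace.add("encode");
        return wireOut;
    }

    // Stand-in for the round trip to the backup; completes when the simulated ACK "arrives".
    private static CompletableFuture<Void> replicateToBackup(String state) {
        return CompletableFuture.runAsync(() -> trace.add("backup-ack:" + state));
    }

    public static void main(String[] args) {
        byte[] out = process("order-1".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(out, StandardCharsets.UTF_8)); // ack:order-1
        System.out.println(trace); // "persist" and "backup-ack:order-1" may appear in either order
    }
}
```

Because replication and persistence overlap, the cost on the critical path is roughly the longer of the two rather than their sum.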

### Backup Microservice

The backup microservice maintains consistency through the following steps, which run concurrently with the primary's step 5.2:

* **5.1. Replicate** - Receive replicated state from primary
* **5.3. Persist** - Persist replicated state to disk
* **5.4. Dispatch to Handler** - Process replicated message in business logic
* **5.5. Replay business logic** - Execute business logic for consistency
* **5.6. Consensus ACK** - Send acknowledgment back to primary
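Under event sourcing, the backup reaches the same state as the primary by persisting each replicated message and re-running the same deterministic business logic over it. The minimal sketch below assumes a hypothetical handler (a running total) and uses an in-memory list as a stand-in for the on-disk log; none of the names are Rumi APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of event-sourcing replay on the backup; not Rumi's API.
final class BackupReplaySketch {
    // Deterministic business logic shared by primary and backup (assumption: a running total).
    static long apply(long state, long amount) {
        return state + amount;
    }

    static final class Backup {
        final List<Long> log = new ArrayList<>(); // stand-in for the on-disk log (5.3)
        long state;                               // state rebuilt purely by replay

        /** Handles one replicated message (5.1) and returns the sequence number it ACKs (5.6). */
        long onReplicated(long seq, long amount) {
            log.add(amount);              // 5.3 persist
            state = apply(state, amount); // 5.4/5.5 dispatch and replay the business logic
            return seq;                   // 5.6 consensus ACK back to the primary
        }
    }

    public static void main(String[] args) {
        long primaryState = 0;
        Backup backup = new Backup();
        long[] amounts = {10, -3, 5};
        for (int i = 0; i < amounts.length; i++) {
            primaryState = apply(primaryState, amounts[i]);
            System.out.println("ack " + backup.onReplicated(i + 1, amounts[i]));
        }
        System.out.println(primaryState == backup.state); // true
    }
}
```

The design depends on the handler being deterministic: given the same inputs in the same order, replay on the backup must produce the primary's state.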

## Test Message

### Message Characteristics

* **Type**: Full-featured message exercising the complete Rumi data model
* **Serialized Size**: \~200 bytes
* **Encoding**: Xbuf2 (Rumi's high-performance binary encoding)
* **Structure**: Contains all standard data types (primitives, strings, nested entities, arrays)

### Code Paths Exercised

The benchmark tests the following Rumi capabilities:

✅ **Exercised Paths:**

* Message serialization/deserialization
* Handler dispatch
* Persistence
* Cluster replication
* Threading
* Consensus protocol

❌ **Not Exercised:**

* Message logging
* ICR (Inter-Cluster Replication)

## Primary Metric: Wire-to-Wire (w2w) Latency

The w2w metric measures the time from when an inbound message is received ("post-wire") to when the corresponding outbound message is sent ("pre-wire").

### What is Included

The w2w latency encompasses:

* Inbound message deserialization (wire format to POJO)
* Message handoff to business logic thread
* Handler dispatch
* Message data access by business logic
* State persistence
* Cluster replication to backup
* Replication acknowledgment from backup
* Outbound message creation
* Outbound message serialization (POJO to wire format)
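A single w2w sample is the difference between two timestamps taken on the primary: one just after the inbound bytes arrive and one just before the outbound bytes are written. The sketch below is a hypothetical illustration rather than Rumi's instrumentation. It uses `System.nanoTime()` because that clock is monotonic and suited to measuring elapsed time.

```java
// Hypothetical sketch of taking one w2w sample; these names are illustrative, not Rumi's API.
final class W2wTimer {
    private long postWireNanos;

    /** Call when the inbound bytes have arrived, before deserialization starts. */
    void markPostWire() {
        postWireNanos = System.nanoTime();
    }

    /** Call after the outbound message is serialized, just before it is written; returns µs. */
    double markPreWire() {
        return w2wMicros(postWireNanos, System.nanoTime());
    }

    /** Converts two System.nanoTime() readings into a w2w latency in microseconds. */
    static double w2wMicros(long postWireNanos, long preWireNanos) {
        return (preWireNanos - postWireNanos) / 1_000.0;
    }

    public static void main(String[] args) {
        W2wTimer timer = new W2wTimer();
        timer.markPostWire();
        // ... decode, dispatch, persist, replicate, await ACK, create and encode the response ...
        System.out.printf("w2w = %.1f us%n", timer.markPreWire());
        System.out.println(w2wMicros(1_000L, 51_000L)); // 50.0
    }
}
```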

### Latency Percentiles

Results are reported as:

* **50th percentile (median)** - Typical latency
* **99th percentile** - Tail latency under normal conditions
* **99.9th percentile** - Extreme tail latency, relevant to high-percentile SLAs
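This page does not say which percentile estimator produces the reported numbers. As an assumption, the sketch below uses the common nearest-rank definition: the p-th percentile is the smallest sample such that at least p percent of all samples are at or below it.

```java
import java.util.Arrays;

// Nearest-rank percentiles; the benchmark's actual estimator is not stated, so this is an assumption.
final class Percentiles {
    /** Smallest sample such that at least p percent of all samples are <= it. */
    static long nearestRank(long[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (!(p > 0 && p <= 100)) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Multiply before dividing: p / 100 * n can round to just above an integer
        // (for example p = 99.9 with n = 1000), which would push ceil() one rank too high.
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] w2wMicros = new long[1000];
        for (int i = 0; i < w2wMicros.length; i++) w2wMicros[i] = i + 1; // synthetic samples 1..1000
        System.out.println(nearestRank(w2wMicros, 50));   // 500
        System.out.println(nearestRank(w2wMicros, 99));   // 990
        System.out.println(nearestRank(w2wMicros, 99.9)); // 999
    }
}
```

Note that at 10,000 samples the 99.9th percentile is determined by just the 10 slowest messages, so high percentiles need long runs to be stable.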

## Test Variables

The benchmark measures performance across multiple configuration dimensions:

### Runtime Optimization Mode

| Value          | Description                                |
| -------------- | ------------------------------------------ |
| **Latency**    | Container optimized for lowest latency     |
| **Throughput** | Container optimized for highest throughput |

### Message Population/Extraction Method

| Value        | Description                                               | Performance Characteristic           |
| ------------ | --------------------------------------------------------- | ------------------------------------ |
| **Indirect** | Message data accessed via POJO setter/getter methods      | Standard object-oriented access      |
| **Direct**   | Message data accessed via serializer/deserializer objects | Zero-copy access, higher performance |
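The difference between the two access styles can be shown with a plain `java.nio.ByteBuffer`. This is a generic illustration, not Rumi's generated message code or the actual Xbuf2 layout. The fixed-offset layout and every name below are hypothetical. Indirect access materializes the whole message as a POJO first. Direct access reads only the needed field from the buffer in place.

```java
import java.nio.ByteBuffer;

// Generic illustration of indirect vs direct access; not Rumi's generated code or the Xbuf2 layout.
final class AccessStyles {
    // Hypothetical fixed layout: orderId (long) at 0, qty (int) at 8, price (double) at 12.
    static final int ORDER_ID = 0, QTY = 8, PRICE = 12, SIZE = 20;

    // Indirect: decode the whole message into a POJO, then read it through getters.
    static final class Order {
        private final long orderId;
        private final int qty;
        private final double price;

        Order(long orderId, int qty, double price) {
            this.orderId = orderId;
            this.qty = qty;
            this.price = price;
        }

        static Order decode(ByteBuffer buf) {
            return new Order(buf.getLong(ORDER_ID), buf.getInt(QTY), buf.getDouble(PRICE));
        }

        long getOrderId() { return orderId; }
        int getQty() { return qty; }
        double getPrice() { return price; }
    }

    // Direct: read only the needed field straight from the buffer; no object is allocated.
    static int readQty(ByteBuffer buf) {
        return buf.getInt(QTY);
    }

    static ByteBuffer encode(long orderId, int qty, double price) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        buf.putLong(ORDER_ID, orderId).putInt(QTY, qty).putDouble(PRICE, price);
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer wire = encode(42L, 7, 99.5);
        System.out.println(Order.decode(wire).getQty()); // 7 (indirect)
        System.out.println(readQty(wire));               // 7 (direct)
    }
}
```

The direct style avoids allocating and copying into an intermediate object. The trade-off is that the code depends on the serialized layout instead of a plain object.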

### CPU Configuration

The **# CPUs** value represents the number of system CPUs actually utilized by the test configuration.

| Value       | # CPUs | Threads                                                                                       | Description                                                                                                                                                                                                                      |
| ----------- | ------ | --------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **MinCPU**  | 1      | Business logic thread (affinitized, hot) + Cluster replication reader (affinitized, not hot)  | Minimal CPU footprint. Only the business logic thread runs "hot" (spinning). The replication reader is affinitized but consumes minimal CPU time, so total utilization is closer to 1 CPU than 2.                                |
| **Default** | 2-3    | Rumi decides thread allocation                                                                | Balanced configuration. Rumi automatically determines optimal thread count and affinitization. Typically uses 2 CPUs for latency-optimized configurations and 3 CPUs for throughput-optimized configurations with Direct access. |
| **MaxCPU**  | 6      | Default threads + detached sender (affinitized, hot) + detached dispatcher (affinitized, hot) | Maximum parallelization with additional hot threads for sending and dispatching.                                                                                                                                                 |

**Note on "Hot" threads**: A "hot" thread runs in a tight spin loop, continuously consuming a full CPU core for maximum responsiveness. Non-hot threads are affinitized (pinned to specific cores) but block when idle, consuming minimal CPU.
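The two wait strategies can be contrasted in standard Java. This is a hypothetical sketch and not how Rumi implements its threads. A hot thread polls in a tight loop and never gives up its core. `Thread.onSpinWait()` (Java 9+) only hints to the CPU that this is a spin loop. A non-hot thread parks in a blocking call until work arrives. Core pinning is not shown because standard Java has no affinity API. It is typically done with OS tools such as `taskset` or with a third-party library.

```java
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of hot (spinning) vs non-hot (blocking) consumers; not Rumi code.
final class WaitStrategies {
    // Hot: polls in a tight loop, keeping a full core busy even when idle.
    static <T> T spinPoll(Queue<T> queue) {
        T item;
        while ((item = queue.poll()) == null) {
            Thread.onSpinWait(); // CPU hint only; the thread keeps running
        }
        return item;
    }

    // Non-hot: parks the thread until an item arrives, so an idle thread uses almost no CPU.
    static <T> T blockingTake(BlockingQueue<T> queue) throws InterruptedException {
        return queue.take();
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<String> hotQueue = new ConcurrentLinkedQueue<>();
        Thread hot = new Thread(() -> System.out.println("hot got " + spinPoll(hotQueue)));
        hot.start();
        hotQueue.add("m1");
        hot.join();

        BlockingQueue<String> coldQueue = new LinkedBlockingQueue<>();
        Thread cold = new Thread(() -> {
            try {
                System.out.println("blocking got " + blockingTake(coldQueue));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        cold.start();
        coldQueue.add("m2");
        cold.join();
    }
}
```

Spinning removes the wake-up delay of a parked thread at the cost of a dedicated core. This is why MinCPU keeps only one hot thread and MaxCPU adds more.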

## Test Hardware

### Servers

* **Model**: Supermicro SYS-110P-WTR
* **CPU**: 1 x Intel Xeon Gold 6334 (8 cores, 3.6 GHz)
* **Memory**: 128 GB (4 x 32 GB)
* **Network**: NVIDIA/Mellanox ConnectX-6 InfiniBand dual-port
* **Storage**: NVMe M.2, 2 TB

### Network

* **Switch**: NVIDIA Quantum InfiniBand Switch
* **Configuration**: Standard TCP/IP (VMA not enabled, unoptimized)
* **Round-trip wire latency**: \~23µs (unoptimized network)

### Hardware Tuning

The servers are configured for low-latency operation:

✅ **Applied:**

* Dynamic power management = OFF
* Hyperthreading = OFF
* Linux performance profile = latency-performance

❌ **Not Applied:**

* VMA (Mellanox kernel bypass) = OFF
* RDMA (Remote Direct Memory Access) = OFF

**Notes:**

* VMA is Mellanox's kernel-bypass networking library, comparable to Solarflare's OpenOnload
* RDMA is supported in Rumi but not enabled for this benchmark
* The \~23µs baseline is for an unoptimized network configuration
* Enabling RDMA can reduce replication network latency from \~23µs (unoptimized) to low single-digit microseconds, and in some cases to sub-microsecond latency

### Software Configuration

* **CPU affinitization**: ON (threads pinned to specific cores)
* **Test driver**: Custom in-process messaging driver (no network overhead on the inbound/outbound message path; only cluster replication crosses the network)

## Test Execution

### Latency Tests

* **Message Rate**: 10,000 messages/second (sustained)
* **Duration**: Sufficient for statistical significance
* **Measurement**: Percentile latencies (50th, 99th, 99.9th)
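At 10,000 messages/second, one message is due every 1,000,000,000 / 10,000 = 100,000 ns (100 µs). The page does not describe how the test driver paces its sends. As an assumption, the sketch below computes each send time from the start of the run rather than from the previous send, so that a late send does not shift the rest of the schedule. All names are hypothetical.

```java
// Hypothetical fixed-rate pacing sketch; the real test driver's pacing is not documented here.
final class RatePacer {
    /** Nanoseconds between sends, e.g. 10,000 msg/s -> 100,000 ns (100 µs). */
    static long intervalNanos(long messagesPerSecond) {
        return 1_000_000_000L / messagesPerSecond;
    }

    /** Intended send time of message i, computed from the start so delays don't accumulate. */
    static long scheduledNanos(long startNanos, long i, long intervalNanos) {
        return startNanos + i * intervalNanos;
    }

    public static void main(String[] args) {
        long interval = intervalNanos(10_000);
        long start = System.nanoTime();
        for (long i = 0; i < 5; i++) {
            long due = scheduledNanos(start, i, interval);
            while (System.nanoTime() - due < 0) {
                Thread.onSpinWait(); // busy-wait until this message is due
            }
            // send(message) would go here (hypothetical)
        }
        System.out.println("interval ns = " + interval); // interval ns = 100000
    }
}
```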

### Throughput Tests

* **Message Rate**: As fast as possible (saturated load)
* **Duration**: Sufficient to reach steady state
* **Measurement**: Messages processed per second
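Throughput is the number of messages completed divided by the elapsed time of the measured window. The minimal sketch below assumes that a warm-up phase (JIT compilation, cache warming) runs before timing starts, in keeping with measuring at steady state. The loop body is a hypothetical stand-in for processing one message.

```java
// Hypothetical throughput calculation over a timed window that starts after warm-up.
final class ThroughputMeter {
    /** Messages per second for a message count processed in elapsedNanos. */
    static double messagesPerSecond(long messages, long elapsedNanos) {
        if (elapsedNanos <= 0) throw new IllegalArgumentException("elapsedNanos must be positive");
        return messages * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        long sink = 0;
        for (int i = 0; i < 1_000_000; i++) sink += i; // warm-up phase: not timed
        long start = System.nanoTime();
        long processed = 0;
        for (int i = 0; i < 10_000_000; i++) {         // stand-in for handling one message
            sink += i;
            processed++;
        }
        long elapsed = Math.max(1, System.nanoTime() - start);
        System.out.printf("%.0f msg/s (sink=%d)%n", messagesPerSecond(processed, elapsed), sink);
    }
}
```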

## Interpreting Results

### Latency Results

* All latency numbers are in **microseconds (µs)**
* **The round-trip wire latency of replication to the backup (\~23µs on the unoptimized network) is included** in all results
* Lower numbers indicate better performance
* Tail latencies (99th, 99.9th percentile) indicate consistency

### Throughput Results

* Measured in **messages per second**
* Higher numbers indicate better performance
* Represents maximum sustained throughput under saturation

### Configuration Trade-offs

* **MinCPU**: Lowest resource usage, may limit throughput
* **Default**: Balanced latency and throughput
* **MaxCPU**: Highest parallelization, may increase coordination overhead
* **Direct access**: Best performance, requires more careful coding
* **Indirect access**: Easier to use, slightly lower performance

## Next Steps

* [Test Results](/performance/canonical-benchmark/test-results.md) - View performance results by release
* [Canonical Benchmark Overview](/performance/canonical-benchmark.md) - Return to canonical benchmark overview
* [Performance Benchmark Suite](/performance/benchmark-suite.md) - Full benchmark suite documentation

