Link Module

The Link module benchmarks the latency and throughput of the cluster replication link, the low-level transport used to replicate state between cluster members.

Overview

The Link layer provides the foundation for cluster replication in Rumi, offering:

  • Direct point-to-point communication between cluster members

  • Minimal overhead transport for state replication

  • Support for various network protocols (TCP, UDP, InfiniBand, etc.)

This benchmark measures the raw performance capabilities of the replication link layer.

Test Programs

The Link module provides multiple test program variants:

Streaming Tests

Measure unidirectional throughput:

Blocking Variant:

  • Sender: com.neeve.perf.link.BlockingStreamingSender

  • Receiver: com.neeve.perf.link.BlockingStreamingReceiver

Non-Blocking Variant:

  • Receiver: com.neeve.perf.link.NonBlockingStreamingReceiver (there is no non-blocking streaming sender)

RDMA Variant:

  • Sender: com.neeve.perf.link.RdmaStreamingSender

  • Receiver: com.neeve.perf.link.RdmaStreamingReceiver

The sender continuously sends messages to the receiver as fast as possible, measuring maximum sustained throughput.
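Sustained throughput is normally computed only after a warmup period, which is what the sender's --warmupTime flag suggests. The following is a minimal Java sketch of that bookkeeping, assuming each message is timestamped with System.nanoTime(). The ThroughputMeter class is hypothetical and is not part of Rumi.

```java
/**
 * Hypothetical helper (not part of Rumi): counts messages and reports the
 * sustained rate, ignoring everything observed during a warmup window.
 */
final class ThroughputMeter {
    private final long warmupEndNanos;
    private long firstMeasuredNanos = -1;
    private long lastMeasuredNanos = -1;
    private long measuredCount = 0;

    ThroughputMeter(long startNanos, long warmupSeconds) {
        this.warmupEndNanos = startNanos + warmupSeconds * 1_000_000_000L;
    }

    /** Records one message observed at a System.nanoTime()-style timestamp. */
    void onMessage(long nowNanos) {
        if (nowNanos < warmupEndNanos) {
            return; // still warming up (JIT compilation, cold caches): don't count
        }
        if (firstMeasuredNanos < 0) {
            firstMeasuredNanos = nowNanos;
        }
        lastMeasuredNanos = nowNanos;
        measuredCount++;
    }

    /** Messages per second over the measured (post-warmup) interval. */
    double messagesPerSecond() {
        if (measuredCount < 2) {
            return 0.0; // need at least two samples to span an interval
        }
        double elapsedSeconds = (lastMeasuredNanos - firstMeasuredNanos) / 1e9;
        // n messages span n - 1 inter-arrival gaps
        return (measuredCount - 1) / elapsedSeconds;
    }

    public static void main(String[] args) {
        ThroughputMeter meter = new ThroughputMeter(0L, 2);
        // Synthetic arrivals: one message every 500 ns for 3 seconds.
        for (long t = 0; t <= 3_000_000_000L; t += 500) {
            meter.onMessage(t);
        }
        System.out.printf("%.0f msgs/sec%n", meter.messagesPerSecond());
    }
}
```

With the synthetic arrivals in main (one message every 500 ns, 2 s of warmup discarded), the meter prints 2000000 msgs/sec.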

Ping-Pong Tests

Measure round-trip latency:

Blocking Variant:

  • Sender: com.neeve.perf.link.BlockingPingPongSender

  • Receiver: com.neeve.perf.link.BlockingPingPongReceiver

Non-Blocking Variant:

  • Sender: com.neeve.perf.link.NonBlockingPingPongSender

  • Receiver: com.neeve.perf.link.NonBlockingPingPongReceiver

The sender sends a message and waits for a response from the receiver, measuring round-trip time.
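The round-trip pattern can be sketched with plain java.net sockets over loopback. A receiver thread echoes each fixed-size message, and the sender timestamps every send/receive pair with System.nanoTime(). This is a hypothetical illustration of the measurement technique, not Rumi's Link API. It sends back-to-back (no --testRate pacing) and reports nearest-rank percentiles.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;

/** Illustrative loopback ping-pong; NOT Rumi's Link API. */
final class PingPongSketch {

    /** Sends `count` pings of `size` bytes and returns each round-trip time in ns. */
    static long[] run(int count, int size) throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Receiver: echo every message back unchanged.
            Thread receiver = new Thread(() -> {
                try (Socket s = server.accept()) {
                    s.setTcpNoDelay(true);
                    DataInputStream in = new DataInputStream(s.getInputStream());
                    OutputStream out = s.getOutputStream();
                    byte[] buf = new byte[size];
                    for (int i = 0; i < count; i++) {
                        in.readFully(buf);
                        out.write(buf);
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            receiver.start();

            long[] rtts = new long[count];
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                s.setTcpNoDelay(true); // analogous to tcpnodelay=true in a descriptor
                DataInputStream in = new DataInputStream(s.getInputStream());
                OutputStream out = s.getOutputStream();
                byte[] msg = new byte[size];
                byte[] reply = new byte[size];
                for (int i = 0; i < count; i++) {
                    long t0 = System.nanoTime();
                    out.write(msg);
                    in.readFully(reply); // block until the echo arrives
                    rtts[i] = System.nanoTime() - t0;
                }
            }
            receiver.join();
            return rtts;
        }
    }

    /** Nearest-rank percentile of an already sorted array, p in (0, 100]. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) throws Exception {
        long[] rtts = run(10_000, 256);
        Arrays.sort(rtts);
        System.out.printf("median=%.1fus p99=%.1fus%n",
                percentile(rtts, 50) / 1e3, percentile(rtts, 99) / 1e3);
    }
}
```

Loopback numbers only exercise the local TCP stack; the benchmark proper runs sender and receiver on separate machines.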

Command-Line Parameters

Blocking Streaming Sender

Short  Long                        Default    Description
-d     --descriptor                -          Connection descriptor (e.g., tcp://192.168.1.7:12000&tcpnodelay=true)
-m     --messageSize               256        Message size in bytes
-b     --bufferSize                256        Write buffer size
-t     --testCount                 100000000  Number of messages to send
-r     --testRate                  10000000   Send rate (messages/sec)
-w     --warmupTime                2          Warmup time in seconds
-c     --cpuAffinityMask           -          CPU affinity mask
-i     --printIntervalStats        false      Output periodic interval stats
-f     --dontWriteLatenciesToFile  false      Suppress latency file output
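The descriptor example packs a transport scheme, an address, and an option into one string. A hypothetical parser for that shape, inferred only from the example above (options appended with & as key=value pairs), might look like this. Rumi's real descriptor grammar is not documented here and may accept other forms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical parser for descriptors shaped like
 * "tcp://192.168.1.7:12000&tcpnodelay=true". The grammar is an assumption
 * inferred from that single example, not Rumi's actual descriptor syntax.
 */
final class LinkDescriptor {
    final String scheme;
    final String host;
    final int port;
    final Map<String, String> options;

    private LinkDescriptor(String scheme, String host, int port, Map<String, String> options) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
        this.options = options;
    }

    static LinkDescriptor parse(String descriptor) {
        int schemeEnd = descriptor.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("missing scheme: " + descriptor);
        }
        String scheme = descriptor.substring(0, schemeEnd);
        // Assumed: the address comes first, then '&'-separated options.
        String[] parts = descriptor.substring(schemeEnd + 3).split("&");
        String address = parts[0];
        int colon = address.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing port: " + descriptor);
        }
        String host = address.substring(0, colon);
        int port = Integer.parseInt(address.substring(colon + 1));

        Map<String, String> options = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].split("=", 2);
            options.put(kv[0], kv.length == 2 ? kv[1] : "");
        }
        return new LinkDescriptor(scheme, host, port, options);
    }

    public static void main(String[] args) {
        LinkDescriptor d = parse("tcp://192.168.1.7:12000&tcpnodelay=true");
        // prints: tcp 192.168.1.7 12000 {tcpnodelay=true}
        System.out.println(d.scheme + " " + d.host + " " + d.port + " " + d.options);
    }
}
```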

Blocking Streaming Receiver

Short  Long                        Default    Description
-d     --descriptor                -          Connection descriptor
-m     --messageSize               256        Message size in bytes
-c     --cpuAffinityMask           -          CPU affinity mask
-s     --stats                     false      Output incremental throughput stats
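A CPU affinity mask is conventionally a bitmask in which bit i set means "may run on CPU i". The sketch below decodes such a mask into CPU indices. Whether --cpuAffinityMask uses exactly this encoding, and whether it accepts hexadecimal input, is an assumption; the AffinityMask class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical decoder for a CPU affinity bitmask (bit i => CPU i).
 * The encoding Rumi expects for --cpuAffinityMask is assumed, not confirmed.
 */
final class AffinityMask {
    static List<Integer> cpus(String mask) {
        long bits = Long.decode(mask); // accepts "5", "0x5", "#5" (a leading 0 means octal)
        List<Integer> cpus = new ArrayList<>();
        for (int cpu = 0; cpu < Long.SIZE; cpu++) {
            if ((bits & (1L << cpu)) != 0) {
                cpus.add(cpu);
            }
        }
        return cpus;
    }

    public static void main(String[] args) {
        System.out.println(cpus("0x5")); // prints [0, 2]
        System.out.println(cpus("12"));  // prints [2, 3]
    }
}
```

Pinning sender and receiver threads to dedicated cores this way is a common technique for reducing latency jitter from scheduler migrations.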

Blocking Ping-Pong Sender

Short  Long                        Default    Description
-d     --descriptor                -          Connection descriptor
-m     --messageSize               256        Message size in bytes
-c     --testCount                 300000     Number of messages to send
-r     --testRate                  10000      Send rate (messages/sec)
-a     --cpuAffinityMask           -          CPU affinity mask
-p     --spinRead                  false      Busy-spin on reads instead of blocking
-o     --oneWayLatency             false      Calculate one-way latency
-i     --printIntervalStats        false      Output periodic interval stats
-f     --dontWriteLatenciesToFile  false      Suppress latency file output
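The --testRate defaults translate directly into send intervals. 10,000 messages/sec for the ping-pong sender is one message every 100 µs, and 10,000,000 messages/sec for the streaming sender is one every 100 ns. A fixed-schedule pacer, one common way to honor such a rate, is sketched below. How Rumi's senders actually pace is not documented here, so the RatePacer class is an assumption.

```java
/**
 * Hypothetical fixed-rate pacer: message i is due at start + i * (1e9 / rate) ns,
 * so short stalls are caught up instead of accumulating drift.
 */
final class RatePacer {
    private final long startNanos;
    private final double intervalNanos;

    RatePacer(long startNanos, long messagesPerSecond) {
        if (messagesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        this.startNanos = startNanos;
        this.intervalNanos = 1_000_000_000.0 / messagesPerSecond;
    }

    /** Target send time of the i-th message (0-based). */
    long scheduledNanos(long i) {
        return startNanos + (long) (i * intervalNanos);
    }

    /** Busy-waits until message i is due; spinning avoids sleep() jitter at microsecond scale. */
    void awaitTurn(long i) {
        long due = scheduledNanos(i);
        while (System.nanoTime() - due < 0) { // overflow-safe nanoTime comparison
            Thread.onSpinWait();
        }
    }

    public static void main(String[] args) {
        RatePacer pingPong = new RatePacer(0L, 10_000);      // ping-pong default rate
        System.out.println(pingPong.scheduledNanos(1));      // prints 100000 (100 us)
        RatePacer streaming = new RatePacer(0L, 10_000_000); // streaming default rate
        System.out.println(streaming.scheduledNanos(1));     // prints 100 (100 ns)
    }
}
```

With a fixed schedule, a round trip that takes longer than the interval makes the following sends due immediately, so the achieved rate can fall below --testRate when latency exceeds the interval.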

Blocking Ping-Pong Receiver

Short  Long                        Default    Description
-d     --descriptor                -          Connection descriptor
-m     --messageSize               256        Message size in bytes
-c     --cpuAffinityMask           -          CPU affinity mask
-p     --spinRead                  false      Busy-spin on reads instead of blocking
-s     --stats                     false      Output incremental throughput stats
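The --spinRead choice is the classic trade between busy-polling and parking a thread in a blocking read. Spinning burns a core but avoids the wake-up delay of a blocked thread. The sketch below contrasts the two styles using a java.nio Pipe as a stand-in for the link connection. It illustrates the general technique only and is not Rumi's implementation; SpinReadSketch is a hypothetical name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

/** Hypothetical illustration of spin vs. blocking reads; not Rumi's code. */
final class SpinReadSketch {

    /** Reads until buf is full, either busy-polling or blocking. */
    static void readFully(Pipe.SourceChannel source, ByteBuffer buf, boolean spinRead)
            throws IOException {
        source.configureBlocking(!spinRead); // non-blocking channel when spinning
        while (buf.hasRemaining()) {
            int n = source.read(buf);
            if (n < 0) {
                throw new IOException("channel closed");
            }
            if (n == 0 && spinRead) {
                Thread.onSpinWait(); // no data yet: keep the core busy instead of parking
            }
        }
    }

    public static void main(String[] args) throws IOException {
        for (boolean spin : new boolean[] {false, true}) {
            Pipe pipe = Pipe.open();
            pipe.sink().write(ByteBuffer.wrap(new byte[256]));
            ByteBuffer buf = ByteBuffer.allocate(256);
            readFully(pipe.source(), buf, spin);
            System.out.println("spinRead=" + spin + " read " + buf.position() + " bytes");
            pipe.sink().close();
            pipe.source().close();
        }
    }
}
```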

Use Cases

Streaming Test

Purpose: Measure maximum throughput
Setup: Two machines connected via a high-speed network
Use Case: Validate network configuration and capacity

Ping-Pong Test

Purpose: Measure minimum latency
Setup: Two machines connected via a low-latency network
Use Case: Validate network tuning and baseline latency

Running Benchmarks

Streaming Throughput Test

On Receiver Machine:

On Sender Machine:

Ping-Pong Latency Test

On Receiver Machine:

On Sender Machine:

Interpreting Results

Throughput Results

Latency Results

Network Configurations

TCP over 10GbE

Typical Results:

  • Throughput: 1-2M messages/second

  • Latency: 15-25µs round-trip
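A quick sanity check relates these message rates to link capacity. Assuming the default 256-byte --messageSize (the message size behind these typical figures is not stated), 1-2M messages/second is about 2.0-4.1 Gbit/s of payload, well below the 10GbE line rate. The hypothetical helper below does this back-of-the-envelope calculation and ignores TCP/IP and Ethernet framing overhead.

```java
/** Hypothetical helper: payload bandwidth implied by a message rate (framing overhead ignored). */
final class LinkBandwidth {
    static double payloadGbps(double messagesPerSecond, int messageSizeBytes) {
        return messagesPerSecond * messageSizeBytes * 8 / 1e9; // bytes -> bits -> Gbit
    }

    public static void main(String[] args) {
        // 256-byte messages, the default --messageSize
        System.out.printf("1M msgs/s -> %.3f Gbit/s%n", payloadGbps(1_000_000, 256)); // 2.048
        System.out.printf("2M msgs/s -> %.3f Gbit/s%n", payloadGbps(2_000_000, 256)); // 4.096
    }
}
```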

TCP over InfiniBand

Typical Results:

  • Throughput: 2-4M messages/second

  • Latency: 8-15µs round-trip

RDMA over InfiniBand

Typical Results:

  • Throughput: 5-10M messages/second

  • Latency: 2-5µs round-trip

Comparison with Higher Layers

The Link layer is the foundation for cluster replication; each layer above it adds functionality and latency:

  • Link Layer: ~10µs (raw replication transport)

  • Messaging Layer: ~15µs (adds SMA abstractions)

  • AEP Engine: ~27µs (adds transactions, persistence, full clustering with consensus)
