Running Benchmarks¶
This guide explains how to run Spikard benchmarks locally, interpret results, and add new framework implementations.
Prerequisites¶
Required Tools¶
- Rust toolchain (1.70+)
- Load generator (oha or bombardier)
# Preferred: oha (Rust-based)
cargo install oha
# Alternative: bombardier (Go-based)
go install github.com/codesenberg/bombardier@latest
- Language runtimes (for frameworks you want to test)
- Python 3.10+ with uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
- Node 20+ with pnpm:
curl -fsSL https://get.pnpm.io/install.sh | sh
- Ruby 3.2-4.x with rbenv:
brew install rbenv ruby-build
- PHP 8.2+ with composer:
brew install php composer
- Bun 1.x:
curl -fsSL https://bun.sh/install | bash
Building the Harness¶
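Build the release binary with Cargo from the harness directory:
cd tools/benchmark-harness
cargo build --release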
The compiled binary will be at target/release/benchmark-harness.
Quick Start¶
Running a Single Framework¶
Test one framework with default settings (30s duration, 100 concurrency):
cd tools/benchmark-harness
# Python framework
./target/release/benchmark-harness run \
--framework spikard-python \
--app-dir apps/spikard-python
# Node framework
./target/release/benchmark-harness run \
--framework fastify \
--app-dir apps/fastify
# Ruby framework
./target/release/benchmark-harness run \
--framework roda \
--app-dir apps/roda
Profile Mode with Workload Suites¶
Run multiple workloads systematically:
# All workloads (18 endpoints)
./target/release/benchmark-harness profile \
--framework spikard-python \
--app-dir apps/spikard-python \
--suite all \
--output results/spikard-python-profile.json
# Just JSON workloads
./target/release/benchmark-harness profile \
--framework fastapi \
--app-dir apps/fastapi \
--suite json-bodies \
--output results/fastapi-json.json
# Path parameter workloads
./target/release/benchmark-harness profile \
--framework spikard-node \
--app-dir apps/spikard-node \
--suite path-params \
--output results/spikard-node-paths.json
Compare Mode¶
Compare multiple frameworks side-by-side:
# Compare Python frameworks
./target/release/benchmark-harness compare \
--frameworks spikard-python,fastapi,litestar,robyn \
--suite all \
--output results/python-comparison
# Compare with statistical significance
./target/release/benchmark-harness compare \
--frameworks spikard-python,fastapi \
--suite json-bodies \
--significance 0.05 \
--output results/statistical-comparison
# Generate markdown report
./target/release/benchmark-harness compare \
--frameworks express,fastify,hono \
--suite all \
--report results/nodejs-report.md
Command Reference¶
Common Options¶
--duration <SECS> Benchmark duration per workload [default: 30]
--concurrency <N> Concurrent connections [default: 100]
--warmup <SECS> Warmup duration [default: 3]
--output <FILE> JSON output file
--format <FORMAT> Output format: json, json-pretty, table [default: json-pretty]
Run Mode¶
Single framework benchmark:
benchmark-harness run [OPTIONS]
--framework <NAME> Framework name (required)
--app-dir <PATH> App directory path (required)
--workload <NAME> Specific workload (default: json-small)
--port <PORT> Server port [default: 8000]
--host <HOST> Server host [default: 127.0.0.1]
Example:
./target/release/benchmark-harness run \
--framework spikard-python \
--app-dir apps/spikard-python \
--workload json-medium \
--duration 60 \
--concurrency 200
Profile Mode¶
Deep analysis with workload suites:
benchmark-harness profile [OPTIONS]
--framework <NAME> Framework to profile (required)
--app-dir <PATH> App directory (required)
--suite <SUITE> Workload suite: all, json-bodies, path-params, etc.
--profiler <TYPE> Language profiler: python, node, ruby
--baseline <PATH> Baseline results for comparison
--flamegraph Generate flamegraph (requires profiler)
Example with profiling:
# Python with py-spy profiling
./target/release/benchmark-harness profile \
--framework spikard-python \
--app-dir apps/spikard-python \
--suite all \
--profiler python \
--flamegraph \
--output results/profiled.json
# Compare against Rust baseline
./target/release/benchmark-harness profile \
--framework spikard-python \
--app-dir apps/spikard-python \
--suite json-bodies \
--baseline results/spikard-rust-baseline.json
Compare Mode¶
Statistical framework comparison:
benchmark-harness compare [OPTIONS]
--frameworks <LIST> Comma-separated framework names (required)
--suite <SUITE> Workload suite
--significance <VAL> p-value threshold [default: 0.05]
--report <FILE> Generate markdown report
--output <PREFIX> Output file prefix
Example:
./target/release/benchmark-harness compare \
--frameworks spikard-python,fastapi,litestar,robyn \
--suite all \
--duration 60 \
--concurrency 200 \
--significance 0.05 \
--report results/python-frameworks.md \
--output results/python-comparison
Consolidate Mode¶
Aggregate multiple profile results into a single consolidated report:
./target/release/benchmark-harness consolidate \
--input results \
--pattern "**/profile.json" \
--output results/consolidated-profile.json
This generates aggregated stats across runs (mean/median/stddev/CI) per framework and per workload.
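For example, one way to produce several runs that match the default pattern (a sketch; the results/run-N directory layout is just an assumption, any layout that matches --pattern works):
# Run the same profile three times, each into its own directory
for i in 1 2 3; do
  mkdir -p "results/run-$i"
  ./target/release/benchmark-harness profile \
    --framework spikard-python \
    --app-dir apps/spikard-python \
    --suite json-bodies \
    --output "results/run-$i/profile.json"
done
# Aggregate the runs into a single report
./target/release/benchmark-harness consolidate \
  --input results \
  --pattern "**/profile.json" \
  --output results/consolidated-profile.json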
Workload Suites¶
Available suites (use with `--suite` flag):
| Suite | Workloads | Description |
|-------|-----------|-------------|
| `all` | 18 | All workloads across categories |
| `json-bodies` | 4 | JSON serialization (small, medium, large, very-large) |
| `path-params` | 6 | Path parameter extraction variants |
| `query-params` | 3 | Query string parsing (few, medium, many) |
| `forms` | 2 | URL-encoded form data |
| `multipart` | 3 | File upload handling |
Interpreting Results¶
JSON Output Structure¶
{
"mode": "profile",
"metadata": {
"framework": "spikard-python",
"version": "0.1.0",
"timestamp": "2024-11-29T10:00:00Z",
"git_commit": "abc123",
"host": {
"cpu_model": "Apple M2 Pro",
"cpu_cores": 12,
"memory_gb": 32
}
},
"suites": [
{
"name": "json-bodies",
"workloads": [
{
"name": "json-small",
"results": {
"throughput": {
"requests_per_sec": 59358.24,
"success_rate": 1.0
},
"latency": {
"mean_ms": 0.84,
"p99_ms": 2.87
},
"resources": {
"cpu": { "avg_percent": 78.5 },
"memory": { "avg_mb": 45.3 }
}
}
}
]
}
]
}
Key Metrics¶
Requests per second (RPS):
- Primary performance indicator
- Higher is better
- Compare within language ecosystems
Success rate:
- Must be 1.0 (100%) for valid results
- Lower values indicate errors or crashes
Latency percentiles:
- p50 (median): Typical request latency
- p99: 99% of requests complete within this time
- p99.9: Extreme outliers
Resource usage:
- cpu.avg_percent: Average CPU utilization
- memory.avg_mb: Average memory consumption
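To pull these metrics out of a saved profile, a jq query over the JSON structure shown above works (a sketch, assuming jq is installed and the results/spikard-python-profile.json file from the earlier example):
# Print name, requests/sec, p99 latency, and success rate for every workload
jq -r '.suites[].workloads[]
  | [.name, .results.throughput.requests_per_sec, .results.latency.p99_ms, .results.throughput.success_rate]
  | @tsv' results/spikard-python-profile.json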
Compare Mode Output¶
{
"mode": "compare",
"frameworks": [
{"name": "spikard-python", "version": "0.1.0"},
{"name": "fastapi", "version": "0.115.0"}
],
"suites": [
{
"name": "json-bodies",
"workloads": [
{
"name": "json-small",
"results": [
{"framework": "spikard-python", "throughput": {"requests_per_sec": 59358}},
{"framework": "fastapi", "throughput": {"requests_per_sec": 21456}}
],
"comparison": {
"winner": "spikard-python",
"performance_ratios": {"spikard-python_vs_fastapi": 2.77},
"statistical_significance": {"p_value": 0.001, "significant": true}
}
}
]
}
]
}
Best Practices¶
Preparation¶
- Close unnecessary applications: Browsers, IDEs, and background services can skew results
- Disable CPU throttling: Set the CPU governor to performance mode (see the example after this list)
- Monitor temperature: Ensure CPU doesn't thermal throttle during benchmarks
- Use dedicated hardware: Avoid running benchmarks on development machines
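On Linux, assuming the cpupower utility is installed, the governor can be switched like this (macOS exposes no equivalent setting):
# Switch all cores to the performance governor (Linux only)
sudo cpupower frequency-set -g performance
# Verify the active governor
cpupower frequency-info | grep governor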
Running Benchmarks¶
- Use longer durations: 30-60 seconds for stable results
- Multiple iterations: Run 3-5 times and report median
- Check success rate: Always verify 100% success before comparing
- Warm up properly: Default 3s warmup is usually sufficient
Comparing Results¶
- Same hardware: Run all comparisons on the same machine
- Same configuration: Use identical duration and concurrency settings
- Check statistical significance: Use p-value < 0.05 threshold
- Consider effect size: Large performance differences are more meaningful
- Compare within ecosystem: Don't compare Python to Rust directly
Adding New Frameworks¶
1. Create App Directory¶
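Add a directory for your app alongside the existing ones (my-framework below is a placeholder name):
mkdir -p tools/benchmark-harness/apps/my-framework
cd tools/benchmark-harness/apps/my-framework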
2. Implement Required Endpoints¶
Your framework must implement all workload endpoints. See tools/benchmark-harness/src/schema/workload.rs for definitions.
Minimum required endpoints (a quick smoke test over these follows the list):
POST /json/small - 86 byte JSON response
POST /json/medium - 5 KB JSON response
POST /json/large - 52 KB JSON response
POST /json/very-large - 150 KB JSON response
GET /path/simple/{id} - Extract path parameter
GET /path/multiple/{org}/{repo}
GET /path/deep/{a}/{b}/{c}
GET /path/int/{value} - Parse integer
GET /path/uuid/{id} - Validate UUID
GET /path/date/{date} - Parse date
GET /query/few?page=1&limit=20&sort=name
GET /query/medium?... - 8 parameters
GET /query/many?... - 15+ parameters
POST /form/simple - URL-encoded form (4 fields)
POST /form/complex - URL-encoded form (12 fields)
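Once your app is running locally, the GET endpoints can be smoke-tested with a small loop (a sketch assuming the default port 8000; the UUID and date values are illustrative, and the POST endpoints need the request bodies defined in workload.rs):
for path in \
  "/path/simple/123" \
  "/path/multiple/acme/widgets" \
  "/path/deep/a/b/c" \
  "/path/int/42" \
  "/path/uuid/550e8400-e29b-41d4-a716-446655440000" \
  "/path/date/2024-11-29" \
  "/query/few?page=1&limit=20&sort=name"; do
  # Print the HTTP status for each endpoint; anything other than 200 deserves a look
  echo "$path -> $(curl -s -o /dev/null -w '%{http_code}' "http://localhost:8000$path")"
done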
3. Configuration File¶
Create app.json or framework-specific config:
{
"name": "my-framework",
"version": "1.0.0",
"language": "python",
"runtime": "CPython 3.12",
"validation": "custom-validator",
"start_command": "python main.py",
"port": 8000,
"readiness_path": "/health"
}
4. Test Locally¶
# Start your app manually
cd tools/benchmark-harness/apps/my-framework
python main.py # or equivalent
# In another terminal, test endpoints
curl -X POST http://localhost:8000/json/small
curl http://localhost:8000/path/simple/123
5. Run Benchmark¶
cd tools/benchmark-harness
./target/release/benchmark-harness run \
--framework my-framework \
--app-dir apps/my-framework
6. Create Raw Variant (Optional)¶
For validation overhead analysis, create a -raw variant of the app (for example apps/my-framework-raw) that serves the same endpoints without request validation.
7. Document¶
Add entry to apps/README.md with:
- Framework description
- Dependencies and versions
- Setup instructions
- Special notes
Troubleshooting¶
"Port already in use"¶
# Find process using port 8000
lsof -ti:8000 | xargs kill -9
# Or use different port
./target/release/benchmark-harness run \
--framework my-framework \
--port 8001
"Framework failed to start"¶
Check the app's log output for errors.
Common causes:
- Missing dependencies
- Wrong Python/Node version
- Port already in use
- File permissions
"Success rate < 100%"¶
The framework is returning errors. Check the following, then try the low-concurrency run sketched after this list:
- Endpoint implementations match expected schemas
- Validation isn't rejecting valid requests
- Server isn't crashing under load
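To isolate failures, rerun a single workload briefly at low concurrency and watch the server's output (only documented flags are used here; substitute your framework name):
./target/release/benchmark-harness run \
  --framework my-framework \
  --app-dir apps/my-framework \
  --workload json-small \
  --duration 5 \
  --concurrency 1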
"Inconsistent results between runs"¶
Multiple factors can cause variance:
- CPU thermal throttling
- Background processes
- Network buffering
- Garbage collection timing
Solutions:
- Run longer benchmarks (60s+)
- Close background apps
- Run multiple iterations
- Use dedicated benchmark hardware
Environment Variables¶
Control benchmark behavior:
# Override load generator
export LOAD_GENERATOR=bombardier # or oha
# Profiler paths
export PY_SPY_PATH=/path/to/py-spy
export CLINIC_PATH=/path/to/clinic
# Output verbosity
export RUST_LOG=debug
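The variables can also be set inline for a single invocation:
RUST_LOG=debug LOAD_GENERATOR=bombardier \
  ./target/release/benchmark-harness run \
  --framework spikard-python \
  --app-dir apps/spikard-python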
CI Integration¶
For automated benchmarking in CI:
# .github/workflows/benchmark.yml
- name: Run benchmarks
run: |
cd tools/benchmark-harness
cargo build --release
./target/release/benchmark-harness compare \
--frameworks spikard-python,fastapi \
--suite all \
--output results/ci-benchmark.json
- name: Upload results
uses: actions/upload-artifact@v3
with:
name: benchmark-results
path: tools/benchmark-harness/results/