Capsa is experimental software. APIs may change without notice.

Testing, Benchmarking, and Fuzzing

This guide covers running tests, benchmarks, and fuzz targets for Capsa development.

All commands require the Nix devenv shell. Enter it with `nix develop` (or `devenv shell`), or use direnv for automatic activation.

Building Test VMs

Most integration tests and benchmarks require test VMs. Build them first:

```bash
# For aarch64 (Apple Silicon)
nix-build nix -A vms.aarch64 -o result-vms

# For x86_64 (Intel/AMD Linux, Intel Mac)
nix-build nix -A vms.x86_64 -o result-vms
```

This creates a `result-vms` symlink containing the kernel, initrd, and disk image used by tests and benchmarks.

Legacy path

`nix-build nix/test-vms -A x86_64 -o result-vms` still works, but the `nix -A vms.*` path is preferred.

Available Test VM Configurations

| VM | Description |
| --- | --- |
| `default` | Universal VM with networking, disk, and vsock support |
| `with-disk` | Default + pre-created disk image |
| `uefi` | UEFI-bootable VM for testing UEFI boot |

Testing

Running All Tests

```bash
cargo test
```

This runs unit tests, doc tests, and integration tests. Always run the full suite before committing.

Running Specific Tests

```bash
# Tests for a specific crate
cargo test -p capsa
cargo test -p capsa-net
cargo test -p capsa-core

# A specific integration test file
cargo test --test boot_test
cargo test --test sandbox_test

# Tests matching a name
cargo test test_console
```

Unit and Doc Tests Only

```bash
cargo test --lib          # Unit tests only
cargo test --doc          # Doc tests only
```
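As a reminder of what `cargo test --doc` runs: every fenced example inside a `///` doc comment is compiled and executed as its own test. A minimal illustration (the `double` function and `mycrate` name are invented for the example, not Capsa APIs):

```rust
/// Doubles a value.
///
/// ```
/// assert_eq!(mycrate::double(2), 4);
/// ```
pub fn double(x: i32) -> i32 {
    x * 2
}
```

If the assertion in the doc comment fails or the example does not compile, `cargo test --doc` reports it as a failing test.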

Linting

```bash
cargo fmt --all -- --check    # Check formatting
cargo clippy -- -D warnings   # Run clippy
```

Test Organization

Tests follow a two-layer model:

Layer 0 — Primitive Tests (`*_primitives_test.rs`): Verify foundational VM capabilities (boot, vsock, virtio-fs, disk). These use raw VMs via `test_vm()` + `wait_for_agent()`. If these fail, Layer 1 tests will fail too.

| Test file | What it verifies |
| --- | --- |
| `direct_boot_primitives_test` | VM boots, agent responds, command execution |
| `vsock_primitives_test` | Vsock connectivity via agent |
| `virtio_fs_primitives_test` | Virtio-fs mount/read/write via agent |
| `disk_primitives_test` | Disk attachment |
| `console_test` | Console I/O without agent |

Layer 1 — Feature Tests (`sandbox_test.rs`, `sandbox_pool_test.rs`, etc.): Verify higher-level features using `capsa::sandbox()` with the agent automatically available.

Using Test Utilities in Code

```rust
use capsa::test_utils::{test_vm, wait_for_agent};

let vm = test_vm("default").build().await?;
let agent = wait_for_agent(&vm).await?;

let result = agent.exec("echo").arg("hello").run().await?;
assert_eq!(result.stdout.trim(), "hello");
```

Benchmarking

Benchmarks use Criterion.rs and produce statistical analysis with HTML reports in `target/criterion/`.
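Criterion's core loop is simple: time a closure many times and summarize the distribution. A toy std-only sketch of that idea (this is neither Criterion nor Capsa code; Criterion adds warm-up, outlier detection, and reporting on top):

```rust
use std::time::Instant;

/// Time `f` over `samples` runs and return the median in nanoseconds.
/// A toy stand-in for what a benchmark harness automates.
fn median_ns(samples: usize, mut f: impl FnMut()) -> u128 {
    let mut times: Vec<u128> = (0..samples)
        .map(|_| {
            let start = Instant::now();
            f(); // Run the workload under measurement.
            start.elapsed().as_nanos()
        })
        .collect();
    times.sort_unstable();
    times[times.len() / 2] // Median is robust against one-off spikes.
}
```

Reporting the median (rather than a single run) is what makes results comparable across runs; Criterion goes further with confidence intervals and regression detection.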

Sandbox Benchmarks

Measures boot time, agent exec latency, and virtio-fs throughput through the full VM stack.

```bash
# Run all sandbox benchmarks
cargo bench -p capsa --bench sandbox

# Quick correctness check (runs each benchmark once, no timing)
cargo bench -p capsa --bench sandbox -- --test

# Run a specific benchmark by name
cargo bench -p capsa --bench sandbox -- "boot_cold"
cargo bench -p capsa --bench sandbox -- "boot_from_pool"
cargo bench -p capsa --bench sandbox -- "agent_exec"
cargo bench -p capsa --bench sandbox -- "virtio_fs"
cargo bench -p capsa --bench sandbox -- "virtio_fs_raw"
```

What's measured:

| Benchmark | Description |
| --- | --- |
| `boot_cold` | Cold boot a sandbox from scratch |
| `boot_from_pool` | Reserve a sandbox from a pre-warmed pool |
| `agent_exec/echo` | Execute `echo` via the agent |
| `agent_exec/ls` | Execute `ls -la /` via the agent |
| `virtio_fs/write`, `virtio_fs/read` | Read/write files through RPC + agent (1KB, 4KB, 64KB) |
| `virtio_fs_raw/write`, `virtio_fs_raw/read` | Raw `dd` throughput through virtio-fs (10MB, 100MB at various block sizes) |

Network Benchmarks

End-to-end `iperf3` throughput through the full virtualization networking stack (guest TCP → virtio-net → NAT → host TCP). Requires `iperf3` on the host and in the guest VM.

```bash
# Run all network benchmarks
cargo bench -p capsa --bench network

# Quick correctness check
cargo bench -p capsa --bench network -- --test

# Just throughput benchmarks
cargo bench -p capsa --bench network -- "iperf3_throughput"
```

Linux only

The network benchmark currently uses the `ip` command to discover the host IP and only works on Linux.

Net Crate Throughput Benchmarks

Lower-level TCP download benchmarks through the userspace NAT stack. These don't require a VM — they use a simulated guest TCP stack to measure the networking layers in isolation.

```bash
# Run all net throughput benchmarks
cargo bench -p capsa-net --bench throughput --features bench
```

What's measured (at 100MB and 1GB transfer sizes):

| Benchmark | Description |
| --- | --- |
| `tcp_download` | Channel-based frame I/O (baseline) |
| `tcp_download_socketpair` | Unix socketpair transport |
| `tcp_download_full` | Full stack: socketpair → bridge → switch → gateway |
| `tcp_download_delayed_ack` | Channel baseline with delayed ACK (simulates Linux guest) |
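Conceptually, the `tcp_download` baseline measures how fast byte frames move through an in-process channel before any NAT or TCP logic is layered on top. A simplified std-only illustration of that kind of measurement (the 1500-byte frame size and the `channel_transfer` helper are made up for the sketch, not Capsa's actual values):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

/// Push `total` bytes of frame-sized chunks through an mpsc channel
/// and return (bytes received, elapsed seconds).
fn channel_transfer(total: usize) -> (usize, f64) {
    const FRAME: usize = 1500; // Ethernet-ish MTU, for illustration.
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let producer = thread::spawn(move || {
        let mut sent = 0;
        while sent < total {
            let n = FRAME.min(total - sent);
            tx.send(vec![0u8; n]).unwrap();
            sent += n;
        }
        // Dropping `tx` here closes the channel and ends the receive loop.
    });
    let start = Instant::now();
    let mut received = 0;
    for frame in rx {
        received += frame.len();
    }
    let secs = start.elapsed().as_secs_f64();
    producer.join().unwrap();
    (received, secs)
}
```

Dividing bytes by seconds gives the throughput figure; the real benchmarks do the same through progressively more of the stack (socketpair, bridge, switch, gateway) to show where time goes.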

Viewing Benchmark Reports

Criterion generates HTML reports with plots and comparison against previous runs:

```bash
open target/criterion/report/index.html
```

Fuzzing

Capsa uses two fuzzing approaches: cargo-fuzz (libFuzzer) for protocol-level fuzzing and integration test-based fuzz tests for end-to-end coverage.

cargo-fuzz Targets (capsa-net)

The capsa-net crate has libFuzzer-based fuzz targets for the networking stack:

| Target | What it fuzzes |
| --- | --- |
| `fuzz_nat_process_frame` | NAT table frame processing with arbitrary Ethernet frames |
| `fuzz_policy_checker` | Network policy checker with arbitrary packets |
| `fuzz_dns_response` | DNS response parsing and caching |

Running fuzz targets:

```bash
# Install cargo-fuzz (one-time)
cargo install cargo-fuzz

# Run a fuzz target (runs indefinitely until stopped with Ctrl+C)
cd crates/net
cargo fuzz run fuzz_nat_process_frame
cargo fuzz run fuzz_policy_checker
cargo fuzz run fuzz_dns_response

# Run with a time limit
cargo fuzz run fuzz_nat_process_frame -- -max_total_time=300  # 5 minutes

# Run with the existing corpus only (quick regression check)
cargo fuzz run fuzz_nat_process_frame -- -runs=0
```

Seed corpora are checked in under `crates/net/fuzz/corpus/` to give the fuzzer a head start with valid packet structures. Crashes are saved to `crates/net/fuzz/artifacts/`.
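The targets above follow the standard libFuzzer shape: a function that takes arbitrary bytes and must never panic, wrapped in `fuzz_target!` from `libfuzzer-sys`. A std-only sketch of that property, with a made-up minimal Ethernet header parse standing in for the real NAT frame processing:

```rust
/// Hypothetical stand-in for NAT frame processing: parse an Ethernet
/// header defensively, returning None instead of panicking on bad
/// input. A fuzz target would call this on every generated input.
fn process_frame(data: &[u8]) -> Option<(u16, &[u8])> {
    // An Ethernet header is 14 bytes: dst MAC, src MAC, EtherType.
    if data.len() < 14 {
        return None; // Too short: reject rather than index out of bounds.
    }
    let ethertype = u16::from_be_bytes([data[12], data[13]]);
    Some((ethertype, &data[14..])) // EtherType plus the payload slice.
}
```

With cargo-fuzz, the body would simply be `fuzz_target!(|data: &[u8]| { let _ = process_frame(data); });` and libFuzzer searches for inputs that make it crash.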

Integration Test-Based Fuzz Tests

These run as regular cargo test and exercise edge cases through actual VMs:

| Test file | What it fuzzes |
| --- | --- |
| `virtiofs_fuzz_test` | Virtio-fs with unusual filenames, symlinks, and edge cases |
| `virtiofs_protocol_fuzz_test` | FUSE protocol handling with uncommon flag combinations |
| `vsock_fuzz_test` | Vsock port scanning, rapid connect/disconnect, data integrity |

```bash
# Run all fuzz-style integration tests
cargo test -p capsa --test virtiofs_fuzz_test
cargo test -p capsa --test virtiofs_protocol_fuzz_test
cargo test -p capsa --test vsock_fuzz_test
```
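The rapid connect/disconnect pattern that the vsock fuzz test exercises can be illustrated without a VM. The sketch below uses loopback TCP as a stand-in for vsock (the `rapid_cycles` helper and every detail here are hypothetical, not Capsa test code):

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

/// Open, use, and tear down many short-lived connections, verifying
/// that each payload survives an echo round trip intact.
fn rapid_cycles(cycles: usize) -> usize {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    let server = thread::spawn(move || {
        for _ in 0..cycles {
            let (mut conn, _) = listener.accept().unwrap();
            let mut buf = Vec::new();
            conn.read_to_end(&mut buf).unwrap();
            conn.write_all(&buf).unwrap(); // Echo back for the check.
        }
    });
    let mut ok = 0;
    for i in 0..cycles {
        let payload = vec![i as u8; 64];
        let mut conn = TcpStream::connect(addr).unwrap();
        conn.write_all(&payload).unwrap();
        conn.shutdown(Shutdown::Write).unwrap(); // Signal end of send.
        let mut echoed = Vec::new();
        conn.read_to_end(&mut echoed).unwrap();
        if echoed == payload {
            ok += 1; // Data survived the round trip intact.
        }
    }
    server.join().unwrap();
    ok
}
```

The real test does the equivalent over vsock through a booted VM, which is why it runs as an integration test rather than a unit test.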

Platform Requirements

Linux

  • `/dev/kvm` access is required for integration tests and VM benchmarks
  • If KVM is unavailable: `sudo modprobe kvm_intel` (or `kvm_amd`), then `sudo chmod 666 /dev/kvm` or add your user to the `kvm` group

macOS

  • Tests require `codesign-run` for Virtualization.framework entitlements (automatically available in the dev shell)
  • If you see `capsa-vmm not found`, either set `CAPSA_VMM_PATH` or build it with `cargo build -p capsa-vmm`
  • Integration tests require actual hardware — GitHub runners lack nested virtualization

Troubleshooting

Test VMs Not Found

If tests fail with missing kernel/initrd errors, rebuild the test VMs:

```bash
nix-build nix -A vms.aarch64 -o result-vms   # Apple Silicon
nix-build nix -A vms.x86_64 -o result-vms    # x86_64
```

Entitlement Errors (macOS)

```
Error: The operation couldn't be completed. (Virtualization error -1)
```

Make sure you're running inside the dev shell, which provides codesign-run for automatic code signing.

Benchmark Noise

For more stable benchmark results:

  • Close other applications
  • Avoid running on battery power
  • Use `--sample-size` and `--measurement-time` to increase statistical confidence
  • Compare runs using Criterion's built-in comparison (it automatically compares against the last run)

Released under the MIT License.