Testing, Benchmarking, and Fuzzing
This guide covers running tests, benchmarks, and fuzz targets for Capsa development.
All commands require the Nix devenv shell. Enter it with `nix develop` (or `devenv shell`), or use direnv for automatic activation.
Building Test VMs
Most integration tests and benchmarks require test VMs. Build them first:
```shell
# For aarch64 (Apple Silicon)
nix-build nix -A vms.aarch64 -o result-vms

# For x86_64 (Intel/AMD Linux, Intel Mac)
nix-build nix -A vms.x86_64 -o result-vms
```

This creates a `result-vms` symlink containing the kernel, initrd, and disk image used by tests and benchmarks.
**Legacy path:** `nix-build nix/test-vms -A x86_64 -o result-vms` still works, but the `nix -A vms.*` path is preferred.
Available Test VM Configurations
| VM | Description |
|---|---|
| `default` | Universal VM with networking, disk, and vsock support |
| `with-disk` | Default + pre-created disk image |
| `uefi` | UEFI-bootable VM for testing UEFI boot |
Testing
Running All Tests
```shell
cargo test
```

This runs unit tests, doc tests, and integration tests. Always run the full suite before committing.
Running Specific Tests
```shell
# Tests for a specific crate
cargo test -p capsa
cargo test -p capsa-net
cargo test -p capsa-core

# A specific integration test file
cargo test --test boot_test
cargo test --test sandbox_test

# Tests matching a name
cargo test test_console
```

Unit and Doc Tests Only
```shell
cargo test --lib    # Unit tests only
cargo test --doc    # Doc tests only
```

Linting
```shell
cargo fmt --all -- --check    # Check formatting
cargo clippy -- -D warnings   # Run clippy
```

Test Organization
Tests follow a two-layer model:
Layer 0 — Primitive Tests (`*_primitives_test.rs`): Verify foundational VM capabilities (boot, vsock, virtio-fs, disk). These use raw VMs via `test_vm()` + `wait_for_agent()`. If these fail, Layer 1 tests will fail too.
| Test file | What it verifies |
|---|---|
| `direct_boot_primitives_test` | VM boots, agent responds, command execution |
| `vsock_primitives_test` | Vsock connectivity via agent |
| `virtio_fs_primitives_test` | Virtio-fs mount/read/write via agent |
| `disk_primitives_test` | Disk attachment |
| `console_test` | Console I/O without agent |
Layer 1 — Feature Tests (`sandbox_test.rs`, `sandbox_pool_test.rs`, etc.): Verify higher-level features using `capsa::sandbox()` with the agent automatically available.
Using Test Utilities in Code
```rust
use capsa::test_utils::{test_vm, wait_for_agent};

let vm = test_vm("default").build().await?;
let agent = wait_for_agent(&vm).await?;

let result = agent.exec("echo").arg("hello").run().await?;
assert_eq!(result.stdout.trim(), "hello");
```

Benchmarking
Benchmarks use Criterion.rs and produce statistical analysis with HTML reports in `target/criterion/`.
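Criterion benches follow a standard shape: a bench function registered via the `criterion_group!`/`criterion_main!` macros. A minimal skeleton for orientation (the names and the measured closure are illustrative, not Capsa's actual bench code; the file would live under `benches/` with `harness = false` in `Cargo.toml`):

```rust
// Minimal Criterion benchmark skeleton (illustrative only).
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn bench_example(c: &mut Criterion) {
    // `iter` runs the closure many times and records per-iteration timing;
    // `black_box` keeps the compiler from optimizing the work away.
    c.bench_function("example_sum", |b| {
        b.iter(|| black_box((0u64..1_000).sum::<u64>()))
    });
}

criterion_group!(benches, bench_example);
criterion_main!(benches);
```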
Sandbox Benchmarks
Measures boot time, agent exec latency, and virtio-fs throughput through the full VM stack.
```shell
# Run all sandbox benchmarks
cargo bench -p capsa --bench sandbox

# Quick correctness check (runs each benchmark once, no timing)
cargo bench -p capsa --bench sandbox -- --test

# Run a specific benchmark by name
cargo bench -p capsa --bench sandbox -- "boot_cold"
cargo bench -p capsa --bench sandbox -- "boot_from_pool"
cargo bench -p capsa --bench sandbox -- "agent_exec"
cargo bench -p capsa --bench sandbox -- "virtio_fs"
cargo bench -p capsa --bench sandbox -- "virtio_fs_raw"
```

What's measured:
| Benchmark | Description |
|---|---|
| `boot_cold` | Cold boot a sandbox from scratch |
| `boot_from_pool` | Reserve a sandbox from a pre-warmed pool |
| `agent_exec/echo` | Execute `echo` via the agent |
| `agent_exec/ls` | Execute `ls -la /` via the agent |
| `virtio_fs/write`, `virtio_fs/read` | Read/write files through RPC + agent (1KB, 4KB, 64KB) |
| `virtio_fs_raw/write`, `virtio_fs_raw/read` | Raw `dd` throughput through virtio-fs (10MB, 100MB at various block sizes) |
Network Benchmarks
End-to-end iperf3 throughput through the full virtualization networking stack (guest TCP → virtio-net → NAT → host TCP). Requires iperf3 on the host and in the guest VM.
```shell
# Run all network benchmarks
cargo bench -p capsa --bench network

# Quick correctness check
cargo bench -p capsa --bench network -- --test

# Just throughput benchmarks
cargo bench -p capsa --bench network -- "iperf3_throughput"
```

**Linux only:** the network benchmark uses the `ip` command to discover the host IP, so it currently works only on Linux.
Net Crate Throughput Benchmarks
Lower-level TCP download benchmarks through the userspace NAT stack. These don't require a VM — they use a simulated guest TCP stack to measure the networking layers in isolation.
```shell
# Run all net throughput benchmarks
cargo bench -p capsa-net --bench throughput --features bench
```

What's measured (at 100MB and 1GB transfer sizes):
| Benchmark | Description |
|---|---|
| `tcp_download` | Channel-based frame I/O (baseline) |
| `tcp_download_socketpair` | Unix socketpair transport |
| `tcp_download_full` | Full stack: socketpair → bridge → switch → gateway |
| `tcp_download_delayed_ack` | Channel baseline with delayed ACK (simulates a Linux guest) |
Viewing Benchmark Reports
Criterion generates HTML reports with plots and comparison against previous runs:
```shell
open target/criterion/report/index.html
```

Fuzzing
Capsa uses two fuzzing approaches: cargo-fuzz (libFuzzer) for protocol-level fuzzing and integration test-based fuzz tests for end-to-end coverage.
cargo-fuzz Targets (capsa-net)
The capsa-net crate has libFuzzer-based fuzz targets for the networking stack:
| Target | What it fuzzes |
|---|---|
| `fuzz_nat_process_frame` | NAT table frame processing with arbitrary Ethernet frames |
| `fuzz_policy_checker` | Network policy checker with arbitrary packets |
| `fuzz_dns_response` | DNS response parsing and caching |
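All three targets check the same basic property: processing arbitrary bytes must never panic. A self-contained toy illustration of that harness style, using a hypothetical minimal frame parser (the real targets live in `crates/net/fuzz` and receive their input from libFuzzer):

```rust
// Toy illustration of the "never panic on arbitrary input" property that
// libFuzzer targets check. `parse_frame` is hypothetical, not capsa-net's API.
fn parse_frame(data: &[u8]) -> Option<(u16, usize)> {
    // Ethernet-like layout: 12 bytes of addresses, 2-byte ethertype, payload.
    if data.len() < 14 {
        return None; // reject short input instead of indexing out of bounds
    }
    let ethertype = u16::from_be_bytes([data[12], data[13]]);
    Some((ethertype, data.len() - 14))
}

fn main() {
    // A real fuzz target gets `data` from the fuzzer; here we hand-pick a few
    // adversarial inputs and only check that no call panics.
    let inputs: [&[u8]; 3] = [&[], &[0xff; 13], &[0u8; 64]];
    for data in inputs {
        let _ = parse_frame(data);
    }
    println!("no panics");
}
```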
Running fuzz targets:
```shell
# Install cargo-fuzz (one-time)
cargo install cargo-fuzz

# Run a fuzz target (runs indefinitely until stopped with Ctrl+C)
cd crates/net
cargo fuzz run fuzz_nat_process_frame
cargo fuzz run fuzz_policy_checker
cargo fuzz run fuzz_dns_response

# Run with a time limit (here: 5 minutes)
cargo fuzz run fuzz_nat_process_frame -- -max_total_time=300

# Run with the existing corpus only (quick regression check)
cargo fuzz run fuzz_nat_process_frame -- -runs=0
```

Seed corpora are checked in under `crates/net/fuzz/corpus/` to give the fuzzer a head start with valid packet structures. Crashes are saved to `crates/net/fuzz/artifacts/`.
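New seeds can be added by dropping files into the corpus directory. A sketch (the seed filename and its byte content are made up for illustration; real seeds should be structurally valid frames):

```shell
# Add a hand-made seed to the NAT fuzz target's corpus. The 14 bytes below
# form a minimal Ethernet header: broadcast dst, a made-up src MAC, and
# ethertype 0x0800 (IPv4), written as octal escapes.
mkdir -p crates/net/fuzz/corpus/fuzz_nat_process_frame
printf '\377\377\377\377\377\377\000\021\042\063\104\125\010\000' \
  > crates/net/fuzz/corpus/fuzz_nat_process_frame/seed_min_ipv4
ls crates/net/fuzz/corpus/fuzz_nat_process_frame
```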
Integration Test-Based Fuzz Tests
These run as regular `cargo test` and exercise edge cases through actual VMs:
| Test file | What it fuzzes |
|---|---|
| `virtiofs_fuzz_test` | Virtio-fs with unusual filenames, symlinks, and edge cases |
| `virtiofs_protocol_fuzz_test` | FUSE protocol handling with uncommon flag combinations |
| `vsock_fuzz_test` | Vsock port scanning, rapid connect/disconnect, data integrity |
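The first of these stresses filename handling in particular. A self-contained sketch of the kind of edge-case name generation such a test might use (hypothetical; not the actual code in `virtiofs_fuzz_test`):

```rust
// Sketch of edge-case filename generation for a virtio-fs fuzz-style test
// (hypothetical names; the real test lives in the capsa crate).
fn edge_case_names() -> Vec<String> {
    vec![
        "plain.txt".to_string(),
        "with space.txt".to_string(),
        "unicode-héllo-ファイル".to_string(),
        ".hidden".to_string(),
        "a".repeat(255), // NAME_MAX on most Linux filesystems
    ]
}

fn main() {
    for name in edge_case_names() {
        // The real test would round-trip each file through the agent and
        // compare contents; here we only sanity-check the generated names.
        assert!(!name.is_empty() && !name.contains('/'));
    }
    println!("{} names ok", edge_case_names().len());
}
```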
```shell
# Run all fuzz-style integration tests
cargo test -p capsa --test virtiofs_fuzz_test
cargo test -p capsa --test virtiofs_protocol_fuzz_test
cargo test -p capsa --test vsock_fuzz_test
```

Platform Requirements
Linux
- `/dev/kvm` access is required for integration tests and VM benchmarks
- If KVM is unavailable: `sudo modprobe kvm_intel` (or `kvm_amd`), then `sudo chmod 666 /dev/kvm`, or add your user to the `kvm` group
macOS
- Tests require `codesign-run` for Virtualization.framework entitlements (automatically available in the dev shell)
- If you see `capsa-vmm not found`, either set `CAPSA_VMM_PATH` or build it with `cargo build -p capsa-vmm`
- Integration tests require actual hardware — GitHub runners lack nested virtualization
Troubleshooting
Test VMs Not Found
If tests fail with missing kernel/initrd errors, rebuild the test VMs:
```shell
nix-build nix -A vms.aarch64 -o result-vms   # Apple Silicon
nix-build nix -A vms.x86_64 -o result-vms    # x86_64
```

Entitlement Errors (macOS)
```
Error: The operation couldn't be completed. (Virtualization error -1)
```

Make sure you're running inside the dev shell, which provides `codesign-run` for automatic code signing.
Benchmark Noise
For more stable benchmark results:
- Close other applications
- Avoid running on battery power
- Use `--sample-size` and `--measurement-time` to increase statistical confidence (e.g. `cargo bench -p capsa --bench sandbox -- --sample-size 200 --measurement-time 30`)
- Compare runs using Criterion's built-in comparison (it automatically compares against the last run)