Testing, Benchmarking, and Fuzzing
This guide covers running tests, benchmarks, and fuzz targets for Capsa development.
All commands require the Nix devenv shell. Enter it with `nix develop` (or `devenv shell`), or use direnv for automatic activation.
Building Test VMs
Most integration tests and benchmarks require test VMs. Build them first:
```shell
# For aarch64 (Apple Silicon)
nix-build nix -A vms.aarch64 -o result-vms

# For x86_64 (Intel/AMD Linux, Intel Mac)
nix-build nix -A vms.x86_64 -o result-vms
```

This creates a `result-vms` symlink containing the kernel, initrd, and disk image used by tests and benchmarks.
**Legacy path:** `nix-build nix/test-vms -A x86_64 -o result-vms` still works, but the `nix -A vms.*` path is preferred.
Available Test VM Configurations
| VM | Description |
|---|---|
| `default` | Universal VM with networking, disk, and vsock support |
| `with-disk` | Default + pre-created disk image |
| `uefi` | UEFI-bootable VM for testing UEFI boot |
Testing
Running All Tests
```shell
cargo test
```

This runs unit tests, doc tests, and integration tests. Always run the full suite before committing.
Running Specific Tests
```shell
# Tests for a specific crate
cargo test -p capsa
cargo test -p capsa-net
cargo test -p capsa-core

# A specific integration test file
cargo test --test boot_test
cargo test --test sandbox_test

# Tests matching a name
cargo test test_console
```

Unit and Doc Tests Only
```shell
cargo test --lib    # Unit tests only
cargo test --doc    # Doc tests only
```

Linting
```shell
cargo fmt --all -- --check    # Check formatting
cargo clippy -- -D warnings   # Run clippy
```

Test Organization
Tests follow a two-layer model:
Layer 0 — Primitive Tests (`*_primitives_test.rs`): Verify foundational VM capabilities (boot, vsock, virtio-fs, disk). These use raw VMs via `test_vm()` + `wait_for_agent()`. If these fail, Layer 1 tests will fail too.
| Test file | What it verifies |
|---|---|
| `direct_boot_primitives_test` | VM boots, agent responds, command execution |
| `vsock_primitives_test` | Vsock connectivity via agent |
| `virtio_fs_primitives_test` | Virtio-fs mount/read/write via agent |
| `disk_primitives_test` | Disk attachment |
| `console_test` | Console I/O without agent |
Layer 1 — Feature Tests (`sandbox_test.rs`, `sandbox_pool_test.rs`, etc.): Verify higher-level features using `capsa::sandbox()` with the agent automatically available.
Using Test Utilities in Code
```rust
use capsa::test_utils::{test_vm, wait_for_agent};

let vm = test_vm("default").build().await?;
let agent = wait_for_agent(&vm).await?;

let result = agent.exec("echo").arg("hello").run().await?;
assert_eq!(result.stdout.trim(), "hello");
```

Benchmarking
Benchmarks use Criterion.rs and produce statistical analysis with HTML reports in `target/criterion/`.
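Criterion benches follow a standard shape: a bench function registered via the `criterion_group!`/`criterion_main!` macros. A minimal skeleton for orientation (the names and the measured closure are illustrative, not Capsa's actual bench code; the file would live under `benches/` with `harness = false` in `Cargo.toml`):

```rust
// Minimal Criterion benchmark skeleton (illustrative only).
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn bench_example(c: &mut Criterion) {
    // `iter` runs the closure many times and records per-iteration timing;
    // `black_box` keeps the compiler from optimizing the work away.
    c.bench_function("example_sum", |b| {
        b.iter(|| black_box((0u64..1_000).sum::<u64>()))
    });
}

criterion_group!(benches, bench_example);
criterion_main!(benches);
```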
Sandbox Benchmarks
Measures boot time, agent exec latency, and virtio-fs throughput through the full VM stack.
```shell
# Run all sandbox benchmarks
cargo bench -p capsa --bench sandbox

# Quick correctness check (runs each benchmark once, no timing)
cargo bench -p capsa --bench sandbox -- --test

# Run a specific benchmark by name
cargo bench -p capsa --bench sandbox -- "boot_cold"
cargo bench -p capsa --bench sandbox -- "boot_from_pool"
cargo bench -p capsa --bench sandbox -- "agent_exec"
cargo bench -p capsa --bench sandbox -- "virtio_fs"
cargo bench -p capsa --bench sandbox -- "virtio_fs_raw"
```

What's measured:
| Benchmark | Description |
|---|---|
| `boot_cold` | Cold boot a sandbox from scratch |
| `boot_from_pool` | Reserve a sandbox from a pre-warmed pool |
| `agent_exec/echo` | Execute `echo` via the agent |
| `agent_exec/ls` | Execute `ls -la /` via the agent |
| `virtio_fs/write`, `virtio_fs/read` | Read/write files through RPC + agent (1KB, 4KB, 64KB) |
| `virtio_fs_raw/write`, `virtio_fs_raw/read` | Raw `dd` throughput through virtio-fs (10MB, 100MB at various block sizes) |
Network Benchmarks
End-to-end iperf3 throughput through the full virtualization networking stack (guest TCP → virtio-net → NAT → host TCP). Requires iperf3 on the host and in the guest VM.
```shell
# Run all network benchmarks
cargo bench -p capsa --bench network

# Quick correctness check
cargo bench -p capsa --bench network -- --test

# Just throughput benchmarks
cargo bench -p capsa --bench network -- "iperf3_throughput"
```

**Linux only:** the network benchmark uses the `ip` command to discover the host IP, so it currently works only on Linux.
Net Crate Throughput Benchmarks
Lower-level TCP download benchmarks through the userspace NAT stack. These don't require a VM — they use a simulated guest TCP stack to measure the networking layers in isolation.
```shell
# Run all net throughput benchmarks
cargo bench -p capsa-net --bench throughput --features bench
```

What's measured (at 100MB and 1GB transfer sizes):
| Benchmark | Description |
|---|---|
| `tcp_download` | Channel-based frame I/O (baseline) |
| `tcp_download_socketpair` | Unix socketpair transport |
| `tcp_download_full` | Full stack: socketpair → bridge → switch → gateway |
| `tcp_download_delayed_ack` | Channel baseline with delayed ACK (simulates a Linux guest) |
Viewing Benchmark Reports
Criterion generates HTML reports with plots and comparison against previous runs:
```shell
open target/criterion/report/index.html
```

Fuzzing
Capsa uses two fuzzing approaches: cargo-fuzz (libFuzzer) for protocol-level fuzzing and integration test-based fuzz tests for end-to-end coverage.
cargo-fuzz Targets (capsa-net)
The capsa-net crate has libFuzzer-based fuzz targets for the networking stack:
| Target | What it fuzzes |
|---|---|
| `fuzz_nat_process_frame` | NAT table frame processing with arbitrary Ethernet frames |
| `fuzz_policy_checker` | Network policy checker with arbitrary packets |
| `fuzz_dns_response` | DNS response parsing and caching |
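All three targets check the same basic property: processing arbitrary bytes must never panic. A self-contained toy illustration of that harness style, using a hypothetical minimal frame parser (the real targets live in `crates/net/fuzz` and receive their input from libFuzzer):

```rust
// Toy illustration of the "never panic on arbitrary input" property that
// libFuzzer targets check. `parse_frame` is hypothetical, not capsa-net's API.
fn parse_frame(data: &[u8]) -> Option<(u16, usize)> {
    // Ethernet-like layout: 12 bytes of addresses, 2-byte ethertype, payload.
    if data.len() < 14 {
        return None; // reject short input instead of indexing out of bounds
    }
    let ethertype = u16::from_be_bytes([data[12], data[13]]);
    Some((ethertype, data.len() - 14))
}

fn main() {
    // A real fuzz target gets `data` from the fuzzer; here we hand-pick a few
    // adversarial inputs and only check that no call panics.
    let inputs: [&[u8]; 3] = [&[], &[0xff; 13], &[0u8; 64]];
    for data in inputs {
        let _ = parse_frame(data);
    }
    println!("no panics");
}
```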
Running fuzz targets:
```shell
# Install cargo-fuzz (one-time)
cargo install cargo-fuzz

# Run a fuzz target (runs indefinitely until stopped with Ctrl+C)
cd crates/net
cargo fuzz run fuzz_nat_process_frame
cargo fuzz run fuzz_policy_checker
cargo fuzz run fuzz_dns_response

# Run with a time limit (here: 5 minutes)
cargo fuzz run fuzz_nat_process_frame -- -max_total_time=300

# Run with the existing corpus only (quick regression check)
cargo fuzz run fuzz_nat_process_frame -- -runs=0
```

Seed corpora are checked in under `crates/net/fuzz/corpus/` to give the fuzzer a head start with valid packet structures. Crashes are saved to `crates/net/fuzz/artifacts/`.
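New seeds can be added by dropping files into the corpus directory. A sketch (the seed filename and its byte content are made up for illustration; real seeds should be structurally valid frames):

```shell
# Add a hand-made seed to the NAT fuzz target's corpus. The 14 bytes below
# form a minimal Ethernet header: broadcast dst, a made-up src MAC, and
# ethertype 0x0800 (IPv4), written as octal escapes.
mkdir -p crates/net/fuzz/corpus/fuzz_nat_process_frame
printf '\377\377\377\377\377\377\000\021\042\063\104\125\010\000' \
  > crates/net/fuzz/corpus/fuzz_nat_process_frame/seed_min_ipv4
ls crates/net/fuzz/corpus/fuzz_nat_process_frame
```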
Integration Test-Based Fuzz Tests
These run as regular `cargo test` and exercise edge cases through actual VMs:
| Test file | What it fuzzes |
|---|---|
| `virtiofs_fuzz_test` | Virtio-fs with unusual filenames, symlinks, and edge cases |
| `virtiofs_protocol_fuzz_test` | FUSE protocol handling with uncommon flag combinations |
| `vsock_fuzz_test` | Vsock port scanning, rapid connect/disconnect, data integrity |
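The first of these stresses filename handling in particular. A self-contained sketch of the kind of edge-case name generation such a test might use (hypothetical; not the actual code in `virtiofs_fuzz_test`):

```rust
// Sketch of edge-case filename generation for a virtio-fs fuzz-style test
// (hypothetical names; the real test lives in the capsa crate).
fn edge_case_names() -> Vec<String> {
    vec![
        "plain.txt".to_string(),
        "with space.txt".to_string(),
        "unicode-héllo-ファイル".to_string(),
        ".hidden".to_string(),
        "a".repeat(255), // NAME_MAX on most Linux filesystems
    ]
}

fn main() {
    for name in edge_case_names() {
        // The real test would round-trip each file through the agent and
        // compare contents; here we only sanity-check the generated names.
        assert!(!name.is_empty() && !name.contains('/'));
    }
    println!("{} names ok", edge_case_names().len());
}
```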
```shell
# Run all fuzz-style integration tests
cargo test -p capsa --test virtiofs_fuzz_test
cargo test -p capsa --test virtiofs_protocol_fuzz_test
cargo test -p capsa --test vsock_fuzz_test
```

Platform Requirements
Linux
- `/dev/kvm` access is required for integration tests and VM benchmarks
- If KVM is unavailable: `sudo modprobe kvm_intel` (or `kvm_amd`), then `sudo chmod 666 /dev/kvm`, or add your user to the `kvm` group
macOS
- Tests require `codesign-run` for Virtualization.framework entitlements (automatically available in the dev shell)
- If you see `capsa-vmm not found`, either set `CAPSA_VMM_PATH` or build it with `cargo build -p capsa-vmm`
- Integration tests require actual hardware — GitHub runners lack nested virtualization
Troubleshooting
Test VMs Not Found
If tests fail with missing kernel/initrd errors, rebuild the test VMs:
```shell
nix-build nix -A vms.aarch64 -o result-vms   # Apple Silicon
nix-build nix -A vms.x86_64 -o result-vms    # x86_64
```

Entitlement Errors (macOS)
```
Error: The operation couldn't be completed. (Virtualization error -1)
```

Make sure you're running inside the dev shell, which provides `codesign-run` for automatic code signing.
Benchmark Noise
For more stable benchmark results:
- Close other applications
- Avoid running on battery power
- Use `--sample-size` and `--measurement-time` to increase statistical confidence (e.g. `cargo bench -p capsa --bench sandbox -- --sample-size 200 --measurement-time 30`)
- Compare runs using Criterion's built-in comparison (it automatically compares against the last run)