# Reproducing RQ2b — ProFuzzBench (exploratory)
SPECA application results on ProFuzzBench, the benchmark derived from the ChatAFL paper (NDSS 2024). Six text-based protocol implementations; the ground truth is 9 zero-day bugs. This is an exploratory track not included in the paper.
## Status
⚠️ Exploratory: the figure-generation pipeline is in place, but the SPECA-side raw traces are exploratory and incomplete. Strict reproduction scripts against the ground truth are still being developed.
## Dataset
- Source: ProFuzzBench, ChatAFL paper
- Targets: 6 protocol implementations (SMTP / DNS / TLS / DTLS / SSH / RTSP families)
- Ground truth: 9 zero-days. Details: `rq2b/ground_truth_bugs.yaml`
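The ground-truth file itself is not reproduced here. As a rough illustration only, an entry might look like the following sketch; every field name and value below is an assumption, not the repository's actual schema or one of the actual 9 zero-days:

```yaml
# Hypothetical shape of ground_truth_bugs.yaml; all names are illustrative.
- bug_id: GT-01                      # placeholder ID, not a real entry
  target: live555                    # one of the 6 protocol implementations
  bug_type: heap-buffer-overflow
  crash_keywords:                    # strings expected in a matching report
    - AddressSanitizer
    - parseRTSPRequestString
```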
## How to reproduce
### Figures only (baseline)
```shell
uv run python3 benchmarks/rq2b/visualize.py
```

Artifacts: `benchmarks/results/rq2b/figures/rq2b_*.png` and `rq2b_table.tex`.
### Overlay SPECA results (if output data is available)
```shell
# Restore raw traces from the release tag (currently only figures are released)
bash benchmarks/scripts/restore-results.sh bench-rq2b-<date>-figures

uv run python3 benchmarks/rq2b/visualize.py \
  --speca-results benchmarks/results/rq2b/speca/speca_rq2b.json
```
`speca/speca_rq2b.json` itself is treated as exploratory; no separate release tag has been issued for it at this time.
## CI workflows
- `rq2b-01-setup-dataset.yml` — shallow clone of the ProFuzzBench repository + metadata extraction
- `rq2b-02-visualize.yml` — automated figure generation
## Known issues
- Ground-truth coverage measurement is manual. We plan to add an automated matcher in `benchmarks/rq2b/evaluate.py`.
- When integrating an LLM matcher, we want to reuse `matchers.py` from RQ1 (with protocol-specific vocabulary added).
- End-to-end SPECA runs against all 6 implementations have not yet been performed.
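As a starting point for the planned automated matcher in `evaluate.py`, one simple rule is keyword matching: a crash report counts as a ground-truth hit when all of a bug's expected keywords appear in the report. The record shape, the matching rule, and the sample entries below are assumptions for illustration, not the repository's actual schema or the actual zero-days:

```python
# Hypothetical keyword-based ground-truth matcher sketch.
# GroundTruthBug fields mirror the assumed ground_truth_bugs.yaml layout.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class GroundTruthBug:
    bug_id: str
    target: str
    keywords: tuple[str, ...]  # strings expected in a matching crash report

def match_report(report: str, bugs: list[GroundTruthBug]) -> str | None:
    """Return the bug_id whose keywords all appear in the report, else None."""
    text = report.lower()
    for bug in bugs:
        if all(k.lower() in text for k in bug.keywords):
            return bug.bug_id
    return None

# Illustrative entries only; not the actual 9 zero-days.
BUGS = [
    GroundTruthBug("GT-01", "live555", ("heap-buffer-overflow", "parseRTSPRequestString")),
    GroundTruthBug("GT-02", "dnsmasq", ("use-after-free", "extract_name")),
]

print(match_report(
    "AddressSanitizer: heap-buffer-overflow in parseRTSPRequestString", BUGS))
# → GT-01
```

An LLM matcher reused from RQ1's `matchers.py` could replace the `all(...)` predicate while keeping the same return contract, so `evaluate.py` would not need to change.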