Pipeline overview

SPECA is a six-stage pipeline. The first three stages convert a specification into typed properties (Knowledge Structuring); the last three audit those properties against an implementation (Systematic Auditing).

SPECA pipeline

Phase numbering — paper vs. internal IDs

The two SPECA papers number the stages Phase 1–6. The codebase uses internal IDs that reflect the original development order (01a, 01b, 01e, 02c, 03, 04). Both refer to the same stages:

Paper	Internal ID	Plain-English name	Page
Phase 1	`01a`	Spec Discovery	01a
Phase 2	`01b`	Subgraph Extraction	01b
Phase 3	`01e`	Property Generation	01e
Phase 4	`02c`	Code Pre-resolution	02c
Phase 5	`03`	Property Audit (Map / Prove / Stress-Test)	03
Phase 6	`04`	Severity Review (3-gate FP filter)	04

The CLI and orchestrator everywhere use the internal IDs (e.g. speca run --target 04).

Dependency chain

01a (Spec Discovery)
  ↓
01b (Subgraph Extraction)
  ↓
01e (Property Generation)        ← BUG_BOUNTY_SCOPE.json required
  ↓
02c (Code Pre-resolution)        ← TARGET_INFO.json required
  ↓
03 (Audit Map: Map → Prove → Stress-Test)
  ↓
04 (Review: Dead Code → Trust Boundary → Scope)

The first three phases (01a / 01b / 01e) depend only on the specification and the scope rubric — they can be run once and cached across implementations. The last three (02c / 03 / 04) depend on the target codebase and run per implementation.

Inputs and outputs

ID	Name	Input	Output
01a	Spec Discovery	`SPEC_URLS` env	`01a_STATE.json`
01b	Subgraph Extraction	`01a_STATE.json`	Mermaid `.mmd` + `01b_PARTIAL_*.json`
01e	Property Generation	Subgraphs + STRIDE/CWE Top 25	`01e_PARTIAL_*.json`
02c	Code Resolution	Properties + source	`02c_PARTIAL_*.json`
03	Audit Map	Properties + code	`03_PARTIAL_*.json`
04	Review	Phase 03 findings	`04_PARTIAL_*.json` (six verdicts)

Data flow conventions

Partial files: outputs/<phase_id>_PARTIAL_W{worker}B{batch}_{timestamp}.json
Queue files: outputs/<phase_id>_QUEUE_{worker}.json
Logs: outputs/logs/{phase_id}_*.jsonl

Each phase consumes upstream partial files via glob patterns, skips already-processed items (resume), and writes results immediately after each batch — partial progress is never blocked on validation.

Cross-cutting harness features

The orchestrator provides four features that apply to every phase. They are described in detail under Agent design — Harness:

Circuit Breaker — shared across all workers in a phase. Trips on consecutive failures, total retries, or repeated empty results.
Cost Tracker — per-phase USD budget; raises BudgetExceeded and hard-stops the phase when crossed.
Resume Manager — scans *_PARTIAL_*.json to identify already-processed items so re-runs skip them by default.
Log Watcher — tails the stream-json log in real time and forwards events to the TUI dashboard.

For the model assignment per phase and the prompt / skill split, see Prompts and skills.

Phase numbering — paper vs. internal IDs​

Dependency chain​

Inputs and outputs​

Data flow conventions​

Cross-cutting harness features​

Phase numbering — paper vs. internal IDs

Dependency chain

Inputs and outputs

Data flow conventions

Cross-cutting harness features