
# Troubleshooting (manual recovery)

When the pipeline, Web UI, or multi-runtime path gets stuck, this page is the manual-recovery cheat sheet. The goal is to make it obvious which log, state file, or env var to inspect — and what to edit to get back to a healthy state.

:::tip Order of inspection

1. Last 50 lines of `uv run speca-web` output or the CLI stderr
2. Tail of the latest `outputs/logs/<phase>_*.jsonl`
3. `.speca/runs/<run_id>/state.json` for the supervisor's view
4. `outputs/<phase>_PARTIAL_*.json` for what is already saved

90% of problems can be diagnosed from those four.

:::
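The first two inspection steps can be scripted. A minimal sketch, assuming the `outputs/logs/<phase>_*.jsonl` naming documented on this page; the helper names are hypothetical, not part of SPECA:

```python
from pathlib import Path


def newest_log(log_dir="outputs/logs", phase="03"):
    """Most recently modified JSONL log for a phase, or None if there are none."""
    logs = sorted(Path(log_dir).glob(f"{phase}_*.jsonl"),
                  key=lambda p: p.stat().st_mtime)
    return logs[-1] if logs else None


def tail(path, n=50):
    """Last n lines of a (small) text file."""
    return Path(path).read_text().splitlines()[-n:]
```
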


## A. Setup

### `uv sync` fails

```text
error: Could not find Python 3.12 ...
```

```bash
uv python install 3.12
uv sync
```

Still stuck? Nuke the venv and retry:

```bash
rm -rf .venv
uv sync
```

### `npm install` (web/frontend) hangs

Usually a Node version mismatch.

```bash
node -v   # must be v20+
npm cache clean --force
rm -rf web/frontend/node_modules web/frontend/package-lock.json
cd web/frontend && npm install
```

### `claude` / `codex` / `gemini` / `gh` CLI not found

Check PATH:

```bash
where claude   # Windows
which claude   # macOS / Linux
```

Add to PATH (example for `~/.bashrc` / `~/.zshrc`):

```bash
export PATH="$HOME/.npm-global/bin:$PATH"
```

## B. Authentication

### Web UI stays on the login screen even though `claude auth status` is OK

Cause: credentials path mismatch. SPECA reads `~/.claude/.credentials.json` (leading dot) as the primary source and `~/.claude/credentials.json` (no dot) as a legacy fallback.

Check:

```bash
ls -la ~/.claude/.credentials.json
head -c 80 ~/.claude/.credentials.json   # expect a `claudeAiOauth` field
```
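The same lookup order can be checked programmatically. A sketch that assumes only the two paths and the `claudeAiOauth` marker documented above; the `find_credentials` helper is hypothetical, not part of SPECA:

```python
import json
from pathlib import Path


def find_credentials(home=None):
    """Return (path, kind) for the first readable credentials file,
    trying the primary dotted name before the legacy one."""
    home = Path(home) if home else Path.home()
    for name in (".credentials.json", "credentials.json"):
        path = home / ".claude" / name
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text())
        except json.JSONDecodeError:
            continue  # corrupt file: try the next candidate
        kind = "oauth" if "claudeAiOauth" in data else "api_key"
        return path, kind
    return None, None
```

If this reports the legacy path while the Web UI expects the dotted one, a fresh `claude auth login` rewrites the primary file.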

Manual fix:

```bash
claude auth logout
claude auth login
```

Or paste an API key directly:

```bash
# Either in the Web UI login form, or:
echo '{"apiKey":"sk-ant-..."}' > ~/.claude/credentials.json
```

### Chat panel gets 429 (rate-limited)

Cause: a claude.ai OAuth token forwarded to the Anthropic SDK as an API key hits the subscription throttle.

Manual fix: with PR #63 merged, the server automatically falls back to the `claude` CLI subprocess for OAuth tokens. If you still see 429:

```bash
# 1. Other concurrent claude processes can share the subscription quota
ps -ef | grep claude        # macOS / Linux
tasklist | findstr claude   # Windows

# 2. Re-login if SPECA is seeing stale creds
claude auth logout && claude auth login

# 3. Or switch runtime via Settings (Ollama / Codex)
```

### Codex / Gemini env vars are ignored

```bash
echo $OPENAI_API_KEY
echo $GEMINI_API_KEY
```

If the variables were set after the Web server started, restart the server. PowerShell:

```powershell
$env:OPENAI_API_KEY = "sk-..."
uv run speca-web --port 7411 --serve-frontend
```

## C. Pipeline runs

### Phase 01a returns "Empty results"

Cause: `outputs/BUG_BOUNTY_SCOPE.json` is missing, or its `in_scope_assets` list is empty.

```bash
cat outputs/BUG_BOUNTY_SCOPE.json
```

Expected:

```json
{
  "url": "https://example.com/bug-bounty",
  "in_scope_assets": ["contracts/MyContract.sol"],
  "spec_urls": ["https://example.com/spec.html"]
}
```
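Before rerunning, the file can be sanity-checked against that shape. A minimal sketch; the required keys are taken from the example above, and `check_scope` is a hypothetical helper, not part of SPECA:

```python
import json
from pathlib import Path


def check_scope(path="outputs/BUG_BOUNTY_SCOPE.json"):
    """Return a list of problems; an empty list means the file looks usable."""
    p = Path(path)
    if not p.is_file():
        return [f"{path} is missing"]
    try:
        data = json.loads(p.read_text())
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    problems = []
    if not data.get("in_scope_assets"):
        problems.append("in_scope_assets is missing or empty")
    if not isinstance(data.get("spec_urls"), list):
        problems.append("spec_urls should be a list (may be empty)")
    return problems
```
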

Manual fix: re-run the Web UI wizard, hand-edit the file, or rerun with explicit env:

```bash
export SPEC_URLS="https://geth.ethereum.org/docs"
uv run python scripts/run_phase.py --phase 01a --force
```

### Phase 02c MCP tree-sitter error

Symptom: `mcp__tree_sitter__get_symbols` fails and `code_scope` comes back mostly empty.

Cause: MCP server not registered, or the runtime is not `claude` (only ClaudeRunner drives MCP today).

```bash
bash scripts/setup_mcp.sh --verify
bash scripts/setup_mcp.sh   # re-register if needed
```

Split-run workaround:

```bash
# 02c with claude:
ORCHESTRATOR_RUNNER=claude uv run python scripts/run_phase.py --phase 02c --force

# Rest with another runtime:
uv run python scripts/run_phase.py --phase 03 04 --runtime api --force
```

### Phase 03 / 04: all batches fail

Symptom: circuit breaker tripped (exit 65), every batch retry exhausted.

```bash
ls -t outputs/logs/03_*.jsonl | head -3
tail -50 outputs/logs/03_W0B0_<latest>.jsonl
```

Look for `tool_use` loops, Anthropic API timeouts, and overload errors.
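To spot those patterns without reading the whole file, the JSONL log can be summarised. A sketch that assumes only one JSON object per line with a top-level `type` field (that field name is an assumption; check your logs):

```python
import json
from collections import Counter


def summarise_log(path):
    """Count event types in a stream-json log; malformed lines are tallied too."""
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                counts["<malformed>"] += 1
                continue
            counts[event.get("type", "<untyped>")] += 1
    return counts
```

A `tool_use` count that dwarfs everything else is a quick signal of the loop described above.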

Manual fixes:

```bash
# 1. Transient API issue — resume with lower concurrency
uv run python scripts/run_phase.py --phase 03 --force --workers 2 --max-concurrent 4

# 2. Regenerate queues (prompt-derived hang)
rm outputs/03_QUEUE_*.json
uv run python scripts/run_phase.py --phase 03 --force

# 3. Delete one broken PARTIAL, keep the rest, resume
rm outputs/03_PARTIAL_W0B5_*.json
uv run python scripts/run_phase.py --phase 03
```

### Phase 03 exits with "BUG_BOUNTY_SCOPE.json missing"

Phase 01e requires it. Minimal valid content:

```json
{
  "url": null,
  "in_scope_assets": ["src/**/*.sol"],
  "spec_urls": []
}
```

## D. Run state

### `/api/runs/<id>` returns 404 (state.json exists)

Cause: a `state.json` without a sibling `manifest.json` (run cancelled before finalize). Fixed in PR #62, which falls back to `state.json`.

```bash
ls .speca/runs/
ls .speca/runs/<id>/
python -m json.tool .speca/runs/<id>/state.json
```

If the `run_id` inside `state.json` does not match the directory name:

```bash
python -c "
import json
from pathlib import Path
p = Path('.speca/runs/<id>/state.json')
d = json.loads(p.read_text())
d['run_id'] = '<id>'
p.write_text(json.dumps(d, indent=2))
"
```

### Clean up "orphaned running" runs

```bash
python -c "
import json
from pathlib import Path
for p in Path('.speca/runs').glob('*/state.json'):
    d = json.loads(p.read_text())
    print(p.parent.name, d.get('status'), d.get('owner_pid'))
"
```

If `owner_pid` is not present, the run is an orphan. A Web server restart auto-relabels it as `crashed`. To do it manually:

```bash
python -c "
import json
from pathlib import Path
for p in Path('.speca/runs').glob('*/state.json'):
    d = json.loads(p.read_text())
    if d.get('status') == 'running':
        d['status'] = 'crashed'
        d['cancel_requested'] = True
        p.write_text(json.dumps(d, indent=2))
"
```

### `.speca/workspaces/` is huge

```bash
du -sh .speca/workspaces/*
rm -rf .speca/workspaces/<target_slug>
# regenerated on next run
```

## E. Web UI

### Settings runtime switch has no effect

```bash
curl http://127.0.0.1:7411/api/runtime
cat ~/.speca/runtime.json
```

Both should agree. Validate by issuing a direct SSE turn:

```bash
CID=$(python -c "import uuid; print(uuid.uuid4())")
curl -N -X POST http://127.0.0.1:7411/api/chat/conversations/$CID/messages \
  -H "Content-Type: application/json" \
  -d '{"text":"hello"}'
```

Manual override:

```bash
cat > ~/.speca/runtime.json <<'EOF'
{
  "runtime": "ollama",
  "ollama_host": "http://localhost:11434",
  "ollama_model": "llama3.2"
}
EOF
```

### Chat SSE renders nothing (Network tab shows bytes)

Cause (fixed in PR #62): on Windows, sse-starlette emits `\r\n\r\n` frame terminators, while the SPA parser only split on `\n\n`.

```bash
CID=$(python -c "import uuid; print(uuid.uuid4())")
curl -N -X POST http://127.0.0.1:7411/api/chat/conversations/$CID/messages \
  -H "Content-Type: application/json" \
  -d '{"text":"hi"}' | xxd | head -10
```

If you see `0d 0a 0d 0a` separators, ensure `web/frontend/src/features/chat/useChatStream.ts` normalises CRLF → LF:

```ts
buffer += decoder.decode(value, { stream: true }).replace(/\r\n/g, "\n");
```

### Chat history disappeared

```bash
ls ~/.speca/chat/ 2>/dev/null || ls .speca/chat/
```

Conversations are plain JSON — recover or hand-write:

```json
{
  "conversation_id": "...",
  "messages": [
    {"role": "user", "content": [{"type": "text", "text": "..."}], "timestamp": "2026-05-15T..."},
    {"role": "assistant", "content": [{"type": "text", "text": "..."}], "timestamp": "2026-05-15T..."}
  ],
  "created_at": "...",
  "last_message_at": "..."
}
```
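If a conversation file is gone for good, one matching that schema can be hand-written. A sketch; the `write_conversation` helper and its arguments are hypothetical, only the JSON shape comes from this page:

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def write_conversation(directory, user_text, assistant_text):
    """Write a minimal two-message conversation file and return its path."""
    now = datetime.now(timezone.utc).isoformat()
    cid = str(uuid.uuid4())
    doc = {
        "conversation_id": cid,
        "messages": [
            {"role": "user",
             "content": [{"type": "text", "text": user_text}],
             "timestamp": now},
            {"role": "assistant",
             "content": [{"type": "text", "text": assistant_text}],
             "timestamp": now},
        ],
        "created_at": now,
        "last_message_at": now,
    }
    path = Path(directory) / f"{cid}.json"
    path.write_text(json.dumps(doc, indent=2))
    return path
```
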

## F. Multi-runtime

### `--runtime codex` exits 2

```text
ERROR: runtime 'codex' cannot drive the orchestrator.
```

Until PR #67 merges, `codex` / `gemini` / `ollama` are orchestrator-side stubs. Use `api` against the same provider in the meantime:

```bash
export API_RUNNER_API_KEY=$OPENAI_API_KEY
export API_RUNNER_BASE_URL=https://api.openai.com/v1
export API_RUNNER_MODEL=gpt-4o
uv run python scripts/run_phase.py --target 04 --runtime api
```

### Ollama self-hosted not responding

```bash
curl http://localhost:11434/api/tags
# Should list pulled models.
```

If not:

```bash
ollama serve &
ollama pull llama3.2
```

OpenAI-compatible endpoint sanity check:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2","messages":[{"role":"user","content":"hi"}]}'
```

## G. Archive / reproducibility

### Inspect `.speca/runs/<id>/`

```bash
tree .speca/runs/<id>/
python -m json.tool .speca/runs/<id>/manifest.json
ls .speca/runs/<id>/phases/
```

`manifest.json` carries the commit SHA, env snapshot, spec sources, and runtime — that's all you need to reproduce the run on another machine.

### Reproduce a past run elsewhere

```bash
# 1. Tarball
tar czf run-<id>.tar.gz .speca/runs/<id>/ outputs/<id>/

# 2. On the other host
tar xzf run-<id>.tar.gz

# 3. Replay env_snapshot
python -c "
import json
m = json.load(open('.speca/runs/<id>/manifest.json'))
for k, v in m['env_snapshot'].items():
    print(f'export {k}={v}')
"

# 4. Same commit + runtime
git checkout <sha>
ORCHESTRATOR_RUNNER=<runtime> uv run python scripts/run_phase.py --phase 03 04 --force
```

## H. Nuclear option

When nothing else works:

```bash
# 1. Wipe state
rm -rf .speca/ ~/.speca/
rm -rf outputs/

# 2. Reset credentials
rm ~/.claude/.credentials.json ~/.claude/credentials.json 2>/dev/null
claude auth login

# 3. Reinstall deps
rm -rf .venv web/frontend/node_modules
uv sync
cd web/frontend && npm install && cd ../..
```

If that still fails, open an issue at NyxFoundation/speca/issues with a reproducer.

## Log / state file map

| Path | Contents |
| --- | --- |
| `outputs/logs/<phase>_W<W>B<B>_<ts>.jsonl` | `claude` CLI / APIRunner stream-json log: `tool_use` history, cost, errors |
| `outputs/<phase>_PARTIAL_W<W>B<B>_<ts>.json` | Per-batch results (resume input) |
| `outputs/<phase>_QUEUE_<worker>.json` | Per-worker queue |
| `.speca/runs/<id>/state.json` | Supervisor's run state (status / owner_pid / phases / cancel_requested / max_budget_usd) |
| `.speca/runs/<id>/manifest.json` | Immutable run metadata (commit SHA / env snapshot / spec sources / runtime) |
| `.speca/workspaces/<target_slug>/` | Bare cache + worktree for the target |
| `~/.speca/runtime.json` | Web UI runtime preference |
| `~/.speca/chat/<conversation_id>.json` | Chat history |
| `~/.claude/.credentials.json` | `claude` CLI OAuth tokens (secret) |
| `~/.claude/credentials.json` | Legacy API-key location |