# How to reproduce (v2)

> **v2 update (May 2026):** The benchmark sweep now defaults to `N_TRIALS_PER_CFG=500` with 95% bootstrap CIs, and uses CUDA fp16 when a GPU is available (auto-detected). On the RTX 5070 Ti the full VEWC vs AsymVZK comparative sweep takes about 25-30 minutes for the complete grid (was 27 seconds for N=16 in v1). The PSI-LM stdlib benchmarks are unchanged.
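
The v2 sweep's 95% bootstrap CIs are a standard percentile bootstrap over per-trial measurements. The snippet below is a minimal stdlib sketch of that technique, not the repo's actual code; the function name and defaults are illustrative.

```python
import random
import statistics

def bootstrap_ci(samples, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of `samples` (illustrative)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(samples)
    # Resample with replacement, take the mean of each resample, and sort.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

lo, hi = bootstrap_ci([1.0, 1.2, 0.9, 1.1, 1.05] * 100)
```

With 500 samples the interval is tight around the sample mean; the actual benchmark harness may use a different resample count or estimator.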

Single-page reproduction guide. The full paper is `paper-final-en.pdf` (with
`paper-en.md` as the markdown source). All Python code is in `experiments/`,
all measured numbers are in `experiments/results/*.csv`.

## Environment

- Python 3.11+ (tested on 3.13).
- For PSI-LM: standard library only. No installs.
- For AsymVZK and VEWC: PyTorch CPU build + HuggingFace Transformers.
  Install PyTorch first (use the CPU index URL from pytorch.org), then run
  `python -m pip install -r experiments/requirements.txt`.

The Qwen models are downloaded automatically from HuggingFace on first run
(roughly 1 GB for Qwen 2.5-0.5B, 3 GB for Qwen 2.5-1.5B, 2 GB for Qwen 3.5-0.8B,
8 GB for Qwen 3.5-4B).

## Reproduction steps

```bash
# 1. PSI-LM sanity tests (~5 seconds, stdlib only)
cd experiments
python tests.py

# 2. PSI-LM cost + soundness sweep (~3-4 minutes, stdlib only)
python run_benchmark.py
# -> writes results/benchmark.csv and results/soundness.csv

# 3. PSI-LM end-to-end projection
python project_psi_lm_cost.py
# -> writes results/projected_costs.csv

# 4. AsymVZK smoke test on real Qwen 2.5 pair (~1 minute on CPU)
python -m asymvzk.smoke_test

# 5. AsymVZK full benchmark sweep (~20 minutes on CPU)
python -m asymvzk.run_benchmark
# -> writes results/asymvzk_*.csv

# 6. VEWC smoke test on Qwen 3.5 (~1 minute on CPU after model download)
python -m asymvzk.smoke_test_vewc

# 7. VEWC vs AsymVZK comparative benchmark
#    - With CUDA: ~25-30 min for the full grid at N=500.
#    - CPU fallback: first lower N to 20-50 in run_vewc_benchmark.py;
#      otherwise this will run for hours.
python -m asymvzk.run_vewc_benchmark
# -> writes results/vewc_vs_asymvzk.csv

# 8. Render paper figures from the CSVs (matplotlib).
python make_paper_graphs.py
# -> writes ../paper/assets/graphs/*.png
```

The CSV files reproduce the tables in §9 of the paper. To re-render the paper
itself from `paper-en.md`, run `python paper/md_to_docx_pandoc.py`.

## What is in each module

- `experiments/psi_lm/` — PSI-LM reference implementation (commit, sampling,
  Fiat-Shamir, protocol, cheating provers). Uses a hash-based toy LM.
- `experiments/asymvzk/` — AsymVZK and VEWC reference implementations on real
  HuggingFace models. Includes `entropy.py`, `weighted_challenge.py`,
  `protocol.py`, `protocol_vewc.py`, smoke tests, and benchmarks.
- `experiments/results/` — measured CSV data referenced in the paper.
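
The PSI-LM module's Fiat-Shamir step derives challenge positions deterministically from a transcript commitment. A minimal stdlib sketch of the idea (the function name, transcript layout, and duplicate handling are illustrative, not the module's actual interface):

```python
import hashlib

def fiat_shamir_positions(commitment: bytes, seq_len: int, k: int) -> list[int]:
    """Derive k distinct challenge positions from a commitment (illustrative).

    Counter-mode hashing yields as many pseudorandom draws as needed; a real
    implementation must also treat modulo bias according to the protocol spec.
    """
    positions: list[int] = []
    counter = 0
    while len(positions) < k:
        digest = hashlib.sha256(commitment + counter.to_bytes(8, "big")).digest()
        pos = int.from_bytes(digest[:8], "big") % seq_len
        if pos not in positions:  # skip duplicate draws
            positions.append(pos)
        counter += 1
    return positions

challenges = fiat_shamir_positions(b"example-commitment", seq_len=128, k=8)
```

Because the positions are a pure function of the commitment, the verifier can recompute them and reject any transcript whose challenges do not match.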

## What is stubbed

The per-position consistency proof (`Pi_M` in the paper) is implemented as
re-evaluation of the model at challenged positions. A production deployment
would replace this stub with a zkLLM-class single-position SNARK. The paper
notes this in §9.9 and substitutes published zkLLM cost figures into the
end-to-end projection in §9.8.
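
The stub's shape can be sketched in a few lines: the verifier re-runs the model at each challenged position and compares the result against the prover's commitment. Everything below (the toy model, the unsalted hash commitment, the function names) is illustrative, not the repo's actual interface.

```python
import hashlib

def commit_logits(logits: list[float]) -> bytes:
    """Hash-commit to one position's logits (toy: no salt, no hiding)."""
    payload = ",".join(f"{x:.6f}" for x in logits).encode()
    return hashlib.sha256(payload).digest()

def check_positions(model, tokens, commitments, challenged) -> bool:
    """Re-evaluate the model at challenged positions and compare commitments.

    A production deployment would replace this re-evaluation with a
    zkLLM-class single-position SNARK, as noted in the paper.
    """
    return all(
        commit_logits(model(tokens, i)) == commitments[i] for i in challenged
    )

# Deterministic toy "model" standing in for real inference.
def toy_model(tokens, i):
    return [float((tokens[i] * j) % 7) for j in range(4)]

tokens = [3, 1, 4, 1, 5, 9, 2, 6]
commitments = [commit_logits(toy_model(tokens, i)) for i in range(len(tokens))]
ok = check_positions(toy_model, tokens, commitments, challenged=[1, 4, 6])
```

A tampered commitment at any challenged position makes the check fail, which is the soundness lever the challenge sampling relies on.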

## Hardware notes

The numbers in the paper were measured in a single-threaded CPU run on a
laptop; the v2 comparative sweep (see the note at the top) additionally uses
CUDA fp16 when a GPU is detected. GPU acceleration dramatically lowers the
AsymVZK/VEWC prover times because plaintext inference dominates. The framework
overhead (commitments, Fiat-Shamir, opening checks) is negligible either way
(~0.005 ms/token).
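
The order of magnitude of the commitment part of that overhead is easy to sanity-check on your own machine with a stdlib micro-benchmark (illustrative; the repo's actual commitment format and payload size may differ):

```python
import hashlib
import time

def per_token_commit_cost_ms(n_tokens=100_000, logit_bytes=64):
    """Average wall-clock cost of one SHA-256 commitment, in milliseconds."""
    payload = b"\x01" * logit_bytes  # stand-in for a serialized logit record
    start = time.perf_counter()
    for _ in range(n_tokens):
        hashlib.sha256(payload).digest()
    return (time.perf_counter() - start) * 1000 / n_tokens

cost_ms = per_token_commit_cost_ms()
```

On any recent CPU this lands far below a millisecond per token, consistent with the claim that plaintext inference, not the proof framework, dominates.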
