Keystone
A desktop instrument for benchmarking post-quantum cryptography, composing hybrid encryption, and executing quantum workloads against simulators or IBM Quantum Cloud — built where the timer floor is nanoseconds and the runtime won't lie about it.
At a glance
Keystone is a cross-platform desktop application that integrates three normally-separate domains behind one UI: classical and post-quantum cryptographic benchmarking, hybrid-encryption composition, and quantum workload execution against simulators or IBM Quantum Cloud hardware. It exists because evaluating a PQC algorithm responsibly means doing four things together — benchmarking primitives against classical baselines on your hardware, comparing across security parameters in a single coherent view, running real or simulated quantum circuits to validate threat-model assumptions, and persisting results across long research sessions. Each of those has good tools in isolation. Nothing tied them together with an instrument-grade UI.
It is built for cryptographers, security engineers, and PQC researchers working in long, data-dense sessions. The brand brief puts it directly: they do not need hand-holding, but they do need clarity. The product target is defense-lab procurement, a CTO evaluating a crypto-agility roadmap, a graduate student running parameter sweeps for a thesis. It is not a startup demo room. The surface reads like a measurement instrument: precise type, controlled palette, monospaced numerals, amber as the single annunciator on an otherwise mostly-dark panel.
- Algorithm / parameter combinations: 39
- Commits across 14 months: 99
- Platforms packaged from clean clone: 3
| Family | Type | Parameter sets |
|---|---|---|
| Kyber (ML-KEM) | KEM | 3 |
| Dilithium (ML-DSA) | Signature | 3 |
| Falcon | Signature | 4 |
| SPHINCS+ | Signature (hash-based) | 12 |
| Classic McEliece | KEM (code-based) | 4 |
| AES | Symmetric | 3 |
| RSA | Public key | 4 |
| ECDH | Key agreement | 3 |
| ECDSA | Signature | 3 |
Capture a benchmark dashboard screenshot showing a populated KEM run (keygen / encaps / decaps) alongside a Dilithium signature run (keygen / sign / verify), on the macOS arm64 build of commit 98a3c6a or later, against the dark surface.
One shell, not three
The constraint
PQC evaluation needs three different runtimes. Crypto benchmarks live in C and C++ behind liboqs and OpenSSL. Quantum execution lives in Python behind Qiskit and Cirq. The comparison surface lives in JavaScript because that is where the data visualisation is. A browser cannot host the first two with the timer guarantees the measurement engine needs: web high-resolution time is deliberately coarsened as a Spectre mitigation, and that resolution limit is not negotiable when the readings gate protocol decisions.
The choice
One Electron 35 application, cross-platform, with full filesystem access and native-addon capability. The renderer is React 19 plus TypeScript 5.8 on Tailwind 3 and Material UI 6. The main process orchestrates three execution paths: in-process C++ via N-API for the hot crypto path, spawned benchmark executables for measurement-isolated runs, and a bundled Python distribution (~150 MB) for Qiskit and Cirq workloads against Aer simulators or IBM Quantum Cloud hardware. The shell is a thin window over a measurement engine, not the engine itself.
The tradeoff
Three build worlds — CMake for the native side, node-gyp for the addons, Webpack for the bundle — orchestrated by per-platform npm scripts. Electron's memory cost. A ~150 MB Python bundle on every release. The decision is correct for a single-user, long-session lab tool. It would be wrong for a background daemon.
What it cost at build time
Building liboqs at full CMake parallelism on macos-14 GitHub Actions runners exhausted posix_spawn process slots. The fix capped CMAKE_BUILD_PARALLEL_LEVEL at 2 and threaded the env var through the build script so local developer machines keep full parallelism while CI stays bounded. Small fix, real signal: running three build worlds on a constrained runner exposes contention that a single runtime would not.
package.json holds package-mac-prod / package-win-prod / package-linux-prod scripts. external/ vendors liboqs and OpenSSL. The Python bundle build lives under build/ and the Qiskit / Cirq integration under src/infrastructure/quantum/. The parallelism cap landed in PR #5 (ci/cap-cmake-parallel), merge commit 76a8c4e, in commits e2ebe6a and 30a51c7.
Risk: the parallel-level cap is a CI-only override. If runner specs change — more cores, different scheduler, looser process limits — the cap becomes pessimistic. Worth revisiting on every runner image bump.
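One way to thread a CI-only cap through a build script is sketched below. This is an illustration under stated assumptions, not the project's build script: the helper name cmakeBuildEnv and the use of CI=true as the detection signal are hypothetical. CMAKE_BUILD_PARALLEL_LEVEL itself is the standard CMake environment variable named above.

```typescript
// Hypothetical helper; the real build script is not shown in this document.
type Env = Record<string, string | undefined>;

/** Returns the environment to hand to the CMake build step. */
function cmakeBuildEnv(env: Env, ciCap: number = 2): Env {
  // An explicit setting always wins, on CI or on a developer machine.
  if (env.CMAKE_BUILD_PARALLEL_LEVEL !== undefined) {
    return { ...env };
  }
  // Assumption: the runner exports CI=true. Bound parallelism only there.
  if (env.CI === "true") {
    return { ...env, CMAKE_BUILD_PARALLEL_LEVEL: String(ciCap) };
  }
  // Local machines: leave the variable unset so the local defaults apply.
  return { ...env };
}
```

Keeping the cap as a default parameter rather than a literal makes it a one-line change when a runner image bump calls for revisiting it.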
DDD for asymmetric change rates
The constraint
Three domains living in one app that change at very different rates. Crypto changes every few months — new ADRs, new algorithm families, a vendored dependency bump. The UI shell changes weekly: a chart resize fix, a copy tweak, a dark-mode adjustment. Persistence barely changes at all. A flat source tree would mean every UI tweak forces a mental tour of the crypto code on the way in, and every crypto bump risks dragging UI files into its diff.
The choice
Four-layer DDD: domain/, application/, infrastructure/, interfaces/. The domain layer holds entities like Algorithm, BenchmarkResult, and SecurityParameters with zero outward dependencies. Application orchestrates use cases. Infrastructure holds the native bindings, the Python adapter, the lowdb repositories. Interfaces is the Electron main process plus the React renderer. Each layer owns its concerns; layers do not reach across.
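The dependency direction can be sketched in a few lines. The entity names Algorithm, BenchmarkResult, and SecurityParameters come from the text above; every field, the RunBenchmark use case, and the two stand-in adapters are hypothetical, and the calls are synchronous only to keep the sketch short.

```typescript
// domain/: entities with no outward dependencies (fields are illustrative).
interface SecurityParameters { parameterSet: string; nistLevel: number; }
interface Algorithm { family: string; parameters: SecurityParameters; }
interface BenchmarkResult { algorithm: Algorithm; metrics: { [key: string]: number }; }

// application/: the use case depends on ports, never on concrete adapters.
interface BenchmarkRunner { run(algorithm: Algorithm): { [key: string]: number }; }
interface BenchmarkRepository { save(result: BenchmarkResult): void; }

class RunBenchmark {
  private runner: BenchmarkRunner;
  private repository: BenchmarkRepository;
  constructor(runner: BenchmarkRunner, repository: BenchmarkRepository) {
    this.runner = runner;
    this.repository = repository;
  }
  execute(algorithm: Algorithm): BenchmarkResult {
    const result: BenchmarkResult = { algorithm, metrics: this.runner.run(algorithm) };
    this.repository.save(result);
    return result;
  }
}

// infrastructure/: adapters implement the ports. These in-memory stand-ins
// take the place of the native bindings and the lowdb repositories.
class FixedRunner implements BenchmarkRunner {
  private metrics: { [key: string]: number };
  constructor(metrics: { [key: string]: number }) { this.metrics = metrics; }
  run(_algorithm: Algorithm): { [key: string]: number } { return { ...this.metrics }; }
}
class InMemoryRepository implements BenchmarkRepository {
  saved: BenchmarkResult[] = [];
  save(result: BenchmarkResult): void { this.saved.push(result); }
}
```

Swapping FixedRunner for a native binding, or InMemoryRepository for a lowdb-backed one, touches only the infrastructure side; the use case and the entities stay as they are.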
The tradeoff
More files than a flat src/, and more boilerplate per use case. Newcomers face the layering before they face the code. The friction is real, and it is the price for keeping the three domains' change rates from interfering with each other.
The src/ tree under domain / application / infrastructure / interfaces. ADRs in docs/ADR.md cover the layering and adjacent decisions (process model, repository pattern, dependency direction).
Risk: DDD earns its overhead when domains have different change rates. If all three stabilised (no new algorithms, no UI refresh, no persistence migration), the seams would become pure tax. That is not the current state, and not the foreseeable one.
In-process vs process-isolated
The constraint
Two competing requirements pull in opposite directions. The call path for the hot crypto algorithms needs to be fast: microseconds matter, and IPC overhead between the JS surface and the crypto engine would swamp the measurement. But process isolation removes shared-runtime contention from the timing window, which is the only honest way to benchmark algorithms with very different runtime profiles in the same UI.
The choice
Split by algorithm family. Kyber and Dilithium, the two NIST primary picks for KEM and signature, run as in-process N-API native Node addons, built per-platform via node-gyp. The hot path lives in C++ with a stable ABI to the JS surface. Falcon, SPHINCS+, Classic McEliece, and the four classical algorithms (AES, RSA, ECDH, ECDSA) run as standalone benchmark executables, spawned per run, each timed against the OS-level high-resolution clock (QueryPerformanceCounter on Windows, CLOCK_MONOTONIC elsewhere). Process startup is paid once and excluded from the measurement window.
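The idea of a measurement window that excludes startup can be shown with a TypeScript analogue. This is a sketch, not the benchmark source: the real executables are native and read the OS clock directly, and the names measure, Clock, and TimingSummary are hypothetical. The clock is injected so the harness itself can be checked deterministically.

```typescript
// Monotonic clock returning nanoseconds; injected so it can be faked in tests.
type Clock = () => number;

interface TimingSummary { samples: number; minNs: number; medianNs: number; }

function measure(op: () => void, iterations: number, warmup: number, now: Clock): TimingSummary {
  if (iterations < 1) throw new Error("iterations must be at least 1");
  // Warm-up runs sit outside the window: one-time setup cost is paid here.
  for (let i = 0; i < warmup; i++) op();
  const durations: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = now(); // window opens immediately before the operation
    op();
    durations.push(now() - start); // and closes immediately after it
  }
  durations.sort((a, b) => a - b);
  return {
    samples: durations.length,
    minNs: durations[0],
    medianNs: durations[Math.floor(durations.length / 2)],
  };
}
```

In Node, a suitable clock would be `() => Number(process.hrtime.bigint())`. Reporting the median alongside the minimum keeps a single scheduler hiccup from defining the headline number.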
The tradeoff
Two different execution paths to maintain. Two different build chains: node-gyp for the addons, CMake plus the platform's toolchain for the executables. The standalone benchmarks are vendored from a private Crucible repo at a pinned commit — a sync burden that has to be carried deliberately.
What sloppiness looked like
Early on, the src/infrastructure/benchmarks/benchmark_* files were Linux x86-64 ELF binaries, committed once and forgotten. Packaging on macOS or Windows shipped non-functional benchmarks that failed silently on first run. The fix moved binaries out of git, vendored source from Crucible at a pinned commit, wired per-platform builds into the packaging script, and added a test-benchmark-dlls smoke test as a gate. The boundary only stays clean if it has a check guarding it.
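The failure mode above, an ELF binary shipped inside a macOS or Windows package, is detectable from the first few bytes of the file. The sketch below shows that one check under stated assumptions: it does not describe what test-benchmark-dlls actually does, and the function names are hypothetical. The magic numbers are the standard ELF, Mach-O, and PE signatures.

```typescript
type BinaryFormat = "elf" | "mach-o" | "pe" | "unknown";

/** Classifies an executable by its leading magic bytes. */
function detectBinaryFormat(head: Uint8Array): BinaryFormat {
  const startsWith = (magic: number[]) => magic.every((byte, i) => head[i] === byte);
  if (startsWith([0x7f, 0x45, 0x4c, 0x46])) return "elf"; // 0x7F 'E' 'L' 'F'
  // 64-bit Mach-O as stored little-endian, or a universal (fat) binary.
  if (startsWith([0xcf, 0xfa, 0xed, 0xfe]) || startsWith([0xca, 0xfe, 0xba, 0xbe])) return "mach-o";
  if (startsWith([0x4d, 0x5a])) return "pe"; // 'M' 'Z' DOS header of a PE file
  return "unknown";
}

// Keys follow Node's process.platform values.
const expectedFormat: Record<string, BinaryFormat> = {
  linux: "elf",
  darwin: "mach-o",
  win32: "pe",
};

/** True when the binary is in the native format for the given platform. */
function isNativeFor(platform: string, head: Uint8Array): boolean {
  return expectedFormat[platform] === detectBinaryFormat(head);
}
```

A format check like this only proves the file could load on the target platform. Actually executing each benchmark once remains the stronger gate.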
src/infrastructure/benchmarks/ holds the executable integration layer. package.json build.mac.binaries enumerates the nine benchmark binaries for individual signing. ADR-005 covers the algorithm-category split. The pinned Crucible commit is at benchmarks/src/CRUCIBLE-SOURCE.md (commit 15903430e21287ff4617b4975263e93073aee86e). PRs #3 and #4 (repair/crucible-benchmark-packaging). The smoke test lives at src/scripts/test-benchmark-dlls.cjs and runs on every packaging job.
Risk: pinning means manual sync on Crucible updates. The sync procedure needs to be documented in CRUCIBLE-SOURCE.md if it is not already. The smoke gate catches build-time breakage, not semantic drift in the upstream benchmark source.
Flexible-metrics schema
The constraint
Algorithm families do not emit the same metrics. KEMs report keygen, encapsulation, and decapsulation times. Signatures report keygen, signing, and verification times. Symmetric algorithms report throughput. A relational schema with one row per measurement would force either a least-common-denominator structure that loses information or an algorithm-family-per-table layout that fragments the comparison query into seven different SELECTs.
The choice
BenchmarkResult.metrics is a { [key: string]: number } map. Each algorithm family writes the keys that make sense for it. The UI consumes the map through a four-category taxonomy — KEM, Signature, Symmetric, ClassicalPubKey — that decides which keys to render in which panel. Persistence is JSON via lowdb 7. Inspecting a result during a research session means opening a file.
The tradeoff
Weakly typed by definition. The domain layer cannot enforce that an ML-KEM-768 result has exactly keygen / encaps / decaps and not something else. A typo in a key name on the writer side fails silently: the value is stored, but the UI never renders it.
src/domain/entities/benchmark.ts defines metrics as { [key: string]: number }. src/interfaces/renderer/utils/algorithm-categories.tsx defines the four-category UI taxonomy. ADR-005 records the decision and its rationale.
Risk: if a new algorithm family lands with a metric key that none of the four UI categories knows about, the metric is present in storage but invisible in the comparison view. An integration test that asserts every supported metric key is rendered somewhere is on the work list. Until that lands, the test is "manually scan the category file when adding a family."
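The check on the work list could take roughly the following shape. The four category names and the { [key: string]: number } map come from the text above; the per-category key names and the function unrenderedKeys are assumptions made for illustration, not the contents of algorithm-categories.tsx.

```typescript
type Category = "KEM" | "Signature" | "Symmetric" | "ClassicalPubKey";
type Metrics = { [key: string]: number };

// Hypothetical key names: the real lists live in the renderer's category file.
const renderedKeys: Record<Category, string[]> = {
  KEM: ["keygen", "encaps", "decaps"],
  Signature: ["keygen", "sign", "verify"],
  Symmetric: ["throughput"],
  ClassicalPubKey: ["keygen", "encrypt", "decrypt"],
};

/** Returns the stored metric keys that no panel in the category would render. */
function unrenderedKeys(metrics: Metrics, category: Category): string[] {
  const known = renderedKeys[category];
  return Object.keys(metrics)
    .filter((key) => known.indexOf(key) === -1)
    .sort();
}
```

For example, `unrenderedKeys({ keygen: 1, encaps: 2, decap: 3 }, "KEM")` returns `["decap"]`, surfacing the writer-side typo that would otherwise be stored and never shown.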
Vendor with pinned commits
The constraint
Keystone needs three external source trees present at build time: liboqs for the PQC primitives, OpenSSL 3.0 for the classical baselines and the hybrid composition, and a private Crucible repo for the standalone benchmark executables. Two of these are public; one is private. Treating them as git submodules means anyone running npm install needs credentials for the private repo. Treating them as runtime-fetched dependencies means non-deterministic builds and a network dependency at install time.
The choice
All three are vendored under external/ at pinned commits. The pinned commit for Crucible is recorded at benchmarks/src/CRUCIBLE-SOURCE.md. npm install does not touch the network for these sources. Builds are reproducible from any clean clone, and the dependency boundary — what version of liboqs is in this build, what OpenSSL patches are applied — is explicit in the tree rather than implicit in a lock file.
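A pin recorded in a markdown file can be verified mechanically. The sketch below assumes only that the file contains the full 40-character commit hash somewhere in its text; the actual layout of CRUCIBLE-SOURCE.md is not shown here, and both function names are hypothetical.

```typescript
/**
 * Extracts the pinned commit from a source-record file.
 * Assumption: the file contains exactly one full 40-character SHA-1.
 */
function extractPinnedCommit(markdown: string): string | null {
  const matches = markdown.match(/\b[0-9a-f]{40}\b/g);
  // Zero or several candidates is ambiguous, so report no pin at all.
  return matches !== null && matches.length === 1 ? matches[0] : null;
}

/** Compares the recorded pin against the commit the vendored tree was taken from. */
function pinMatches(markdown: string, vendoredCommit: string): boolean {
  const pinned = extractPinnedCommit(markdown);
  return pinned !== null && pinned === vendoredCommit.trim().toLowerCase();
}
```

Returning null on ambiguity is deliberate: a pin file that lists two hashes should fail the check rather than let the first one win.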
The tradeoff
Manual sync when upstream changes. Security patches in liboqs or OpenSSL do not auto-flow. We have to watch the upstream changelogs and decide when to bump, then re-test the build matrix across all three platforms.
The external/ tree. benchmarks/src/CRUCIBLE-SOURCE.md records the Crucible pin (commit 15903430e21287ff4617b4975263e93073aee86e). Build scripts in package.json reference the vendored paths.
Risk: upstream security patches require active attention. The pin is a safety property, not a security one. Worth a quarterly review and a process for fast-tracking advisories.
Promoted to principle. Vendor with pinned commits over submodules when a dependency is private, when you want determinism over recency, or when the dependency boundary is part of the product's auditability story. Keystone hits all three.
Sign-pipeline-inert-by-default
The constraint
macOS distribution needs code signing and notarization through Apple's pipeline. That requires Developer Program credentials Keystone does not have today. The two obvious paths are both bad: wait until credentials arrive and write the integration code under enrollment-deadline pressure, or skip signing entirely and ship a Gatekeeper-flagged DMG.
The choice
Wire the pipeline complete and leave it inert. build/notarize.js runs as the afterSign hook. build/entitlements.mac.plist declares the hardened-runtime entitlements. package.json lists all nine benchmark executables under build.mac.binaries for individual signing — Apple's notarization requires every executable in the bundle to be signed, not just the top-level binary. forceCodeSigning: false keeps the path off today. When credentials arrive, three environment variables — APPLE_ID, APPLE_APP_SPECIFIC_PASSWORD, APPLE_TEAM_ID — activate it.
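The inert-by-default gate reduces to one early return. The sketch below is a simplification and not the contents of build/notarize.js: a real afterSign hook is asynchronous and receives a build context, whereas here the platform, app path, environment, and notarization call are passed in explicitly. The three environment variable names are the ones listed above; every function and type name is hypothetical.

```typescript
type Env = Record<string, string | undefined>;

interface NotarizeCredentials { appleId: string; appleIdPassword: string; teamId: string; }

/** Returns credentials only when all three variables are present and non-empty. */
function readNotarizeCredentials(env: Env): NotarizeCredentials | null {
  const appleId = env.APPLE_ID;
  const appleIdPassword = env.APPLE_APP_SPECIFIC_PASSWORD;
  const teamId = env.APPLE_TEAM_ID;
  if (!appleId || !appleIdPassword || !teamId) return null;
  return { appleId, appleIdPassword, teamId };
}

// Stand-in for whatever client performs the actual submission.
type Notarizer = (appPath: string, credentials: NotarizeCredentials) => void;

/** Simplified hook: stays inert unless on macOS with complete credentials. */
function afterSign(platform: string, appPath: string, env: Env, notarize: Notarizer): "skipped" | "submitted" {
  if (platform !== "darwin") return "skipped";
  const credentials = readNotarizeCredentials(env);
  if (credentials === null) return "skipped"; // inert path: no credentials, no attempt
  notarize(appPath, credentials);
  return "submitted";
}
```

Requiring all three variables together means a half-configured environment stays on the inert path instead of failing partway through a submission.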
The tradeoff
The pipeline is untested end-to-end. The first real run will almost certainly need debugging: entitlements mismatches, notarization timeouts, signing-identity selection problems. The honest claim is that the integration code is ready to attempt notarization without further work, not that it is ready to ship a notarized DMG today.
build/notarize.js, build/entitlements.mac.plist, package.json build.mac.binaries array. The activation runbook is at docs/superpowers/plans/2026-05-21-macos-distribution-hardening.md.
Risk: untested. The first real run will need debugging. The activation runbook is the safety net: if Step 4 hits an unexpected error, the runbook is the difference between "triaged in an hour" and "triaged over a weekend."
Promoted to principle. Any surface that needs a credential you don't have can be wired complete and left inert behind an env var. Sign-pipeline-inert-by-default is a pattern, not a hack. Adjacent surfaces this generalises to: third-party API keys, payment provider integrations, observability vendors with paid tiers, anywhere the integration shape is knowable but the credential is gated on an external clock.
State of play
- Commits across 14 months: 99
- PRs merged in the 22-day hardening sprint: 6
- ADRs in the design trail: 14
Keystone is in R&D. The measurement engine is functional. The algorithm coverage spans the four algorithms NIST selected for standardization (Kyber, Dilithium, Falcon, SPHINCS+), Classic McEliece, and four classical baselines. All three platforms (macOS arm64 DMG, Windows x64 NSIS, Linux x64 AppImage) package from a clean clone. Fourteen architectural decision records sit under docs/ADR.md covering the trail from process model through chart resize.
What it does not have, said directly:
No release tags yet. The project has not cut a versioned release. The CI gates are green; the release-cut discipline is the next step, not a different project.
macOS signing wired but inert — covered in Section 07 as a deliberate pattern, named again here as honest debt.
IBM Quantum Cloud channel migration written but not executed. IBM is deprecating the legacy ibm_quantum channel. The migration plan at docs/superpowers/plans/2026-04-30-ibm-qiskit-platform-maintenance.md details the move to the replacement channel in qiskit_ibm_runtime, the introduction of a shared runtime_config.py helper, making --api_token optional so Aer simulation works offline, Python version pinning in requirements.txt, and a regression test written before the migration runs. The scripts already import the new runtime client but still call the legacy channel (see shor_qiskit.py around line 1129). The discipline story is that the plan exists before IBM forced our hand. The execution is queued.
Hybrid encryption composition policy is a product-level claim, not yet captured in an ADR. The composition runs end-to-end in the app; the why and the alternatives considered are not recorded.
Two persistence layers coexist. infrastructure/db/ is the legacy lowdb wrapper; infrastructure/persistence/ is the newer repository pattern. An in-progress refactor consolidates onto the second one. The dual presence is intentional during the migration window.
None of these blocks the case for the product. The case study quotes the numbers the project can defend today, not the numbers it would prefer to quote in six months.
Landing page: keystone-landing-silk.vercel.app.
Evidence to attach
- Section 01 — Benchmark dashboard screenshot: populated KEM and signature runs side by side on macOS arm64 build of commit 98a3c6a or later.
- Section 02 — Process topology diagram: main / renderer / native addons / spawned benchmarks / Python, exported from one of the 18 drawio sources under keystone/docs/diagrams/.
- Section 04 — Instrumentation panel screenshot: speedometer plus per-run summary card. The dial / gauge SVG assets already exist at keystone/dist/dial_dark*.svg, gauge_dark*.svg, glow_*.svg, needle.svg — they can be assembled into the panel render.
- Section 05 — Comparison view screenshot: KEM + Signature + Symmetric + ClassicalPubKey panels populated simultaneously to show the four-category UI working against the flexible-metrics schema.
- Section 07 — Packaging output tri-panel: .dmg, .exe, .AppImage build artifacts side by side. Could also be a screenshot of the release/ directory listing on each platform.
- Three demo videos at public/videos/keystone_demo{1,2,3}.mp4 are now wired into the process-topology (Section 02), instrumentation-panel (Section 04), and packaging-output (Section 07) surfaces — they autoplay muted, loop, and are hidden for users with prefers-reduced-motion. Comparison view (Section 05) remains recon-pending because that surface wants a static screenshot.
- Optional: a green GitHub Actions run of the macos-arm64.yml workflow showing the test-benchmark-dlls smoke gate passing. Reinforces Section 04's claim that the in-process / process-isolated boundary has a check guarding it.