Formal verification of vector/SIMD compute kernels — mapping the territory
RVS Our work
Next Planned
Partial In progress
Exists Others' work
Empty Nobody
Skip Deferred
Overview
Verification Stack: Model → Silicon
L7
Model— Does the neural network converge? ML theory.
L6
Quantization— Does Q4_0 ≈ FP32? Error bounds.
L5
Compiler— C/LLVM IR → correct assembly? CompCert, Alive2. Vector codegen = open gap.
L4
Compute Kernel— Does assembly implement the spec? (the innermost vectorized hot loop, not OS kernel)★ RVS — 13 kernels proven.
L3
ISA Spec— Machine-readable formal spec. Sail (RISC-V), ASL (ARM), Wasm spec (W3C). Trusted.
L2
RTL— Does hardware implement ISA? riscv-formal, Arm CCA, Intel FV.
L1
Silicon— Does the chip match RTL? EDA / fab verification.
L1 = physical, L7 = application (cf. OSI model). Lower layers are trusted by upper layers. Click A1 for a concrete walkthrough using llama.cpp as running example.
Section 1
Formal Work by Layer × ISA
RISC-V
Wasm
ARM
x86
LoongArch
L5
CompCert (no RVV) Alive2 (opt only) Nextcodegen TV
Empty Wasm codegen unverified
Alive2 (opt only) EmptyNo NEON/SVE codegen
Alive2 (opt only) EmptyNo AVX codegen
Empty No LLVM backend FV
L4
★ RVS 13 kernels UNSAT 3-layer pipeline
★ RVS Cross-ISA proof RVV == Wasm SIMD128
Empty ASL spec exists, no symbolic SIMD compute kernel prover
Empty No Sail model. ACL2 partial.
Empty No formal ISA spec
L3
SailOfficial (Cambridge)
Wasm specW3C formal semantics
ASLArm Spec Language (ASLi)
PartialACL2 (incomplete)
EmptyManual only
L2
riscv-formalSymbiYosys
N/A (no hardware)
Arm CCAInternal FV
Intel FVInternal
Vendor internal
L7 (Model), L6 (Quantization), L1 (Silicon) omitted — not in our scope. Shown in overview above.
Why RISC-V and Wasm are center
Our pipeline requires both a formal L3 spec and a symbolic execution toolchain (the L3→L4 bridge). Only RISC-V and Wasm have both.
ISA
L3 Spec
L3→L4 Bridge
Status
RISC-V
Sail(complete, machine-readable)
Isla(symbolic executor → SMT)
★ Ready
Wasm
W3C spec(formal semantics)
Reference interp(+ simpler semantics)
★ Ready
ARM
ASL(complete, machine-readable)
ASLi = interpreter only(not symbolic)
No bridge
x86
Intel SAE = prose(ACL2 partial)
Nothing
No spec
LoongArch
Manual only
Nothing
Nothing
This is the technical reason RISC-V and Wasm are our primary targets. ARM has L3 but not the bridge; x86 has neither. Building an ASL→SMT symbolic executor (an "Isla for ARM") is a significant engineering effort and a potential future research direction — but not ours.
▽ zoom into L4: RISC-V ecosystem ▽
Section 2
RISC-V L4 Ecosystem: Extension × Concern
RISC-V is modular — each extension family has independent instructions. Within each, compute kernel verification decomposes into four concerns by solver technique.
Integer QBV (bitvector)
Memory QBV + Array
Control Flow Induction / BMC
Float FP theory
RV64I/MBase + Multiply
Trivial
Scalar ops are simple. Pipeline supports them; no research novelty. riscv-formal covers L2.
Trivial
Standard load/store
Trivial
Branch/jump
N/A
RV64VVector SIMD
★ RVS
13 kernels
Q4_0–Q6_K, vecadd, conv1d, ReLU
Partial
Loads axiomatized as fresh symbols. vse32 store trace captured (72 write events).
Partial
Single iteration verified. vsetvli, strip-mining loops, mask (vm=0) TBD.
Skip
FP16 scale factor in quant dot products. Different solver domain.
Zvk*Vector Crypto
Next
AES / SHA / GCM
Sail spec ready (2024). Reuse pipeline.
Empty
Crypto I/O simpler
Empty
Simpler control flow
N/A
Zb*Bitmanip
Empty
clz, ctz, rotate — straightforward BV
N/A
N/A
N/A
RV64F/DScalar Float
N/A
N/A
N/A
Skip
Berkeley FPU (L2). FP theory at L4.
RV64AAtomic
N/A
Empty
Needs memory model
N/A
N/A
same BV solver←——→new technique←————→different domain
"Trivial" = our pipeline handles it, but scalar ops lack the SIMD complexity that makes L4 verification a research contribution. The tool covers them; the paper does not.
▽ zoom into L4: Wasm ecosystem ▽
Section 3
Wasm L4 Ecosystem: Proposal × Concern
Wasm is also modular (proposals). The decomposition mirrors RISC-V — each row maps to a RISC-V extension.
Integer QBV
Memory Linear mem model
Control Flow Structured (simpler)
Float IEEE 754
SIMD128128-bit fixed vectors
★ RVS
Cross-ISA proof
RVV == Wasm SIMD128 (Q8_0). Maps to RV64V integer.
Empty
Linear memory is simpler than RV (no PMA/PMP). Maps to RV64V memory.
Empty
Structured CF (block/loop/if) — no arbitrary jumps. Simpler than RV.
Empty
f32x4, f64x2. No FP16 native. Maps to RV64V float.
Relaxed SIMDPlatform-dependent
Next
relaxed_dot_i8x16 — maps to different ISA backends (NEON/AVX/RVV). Cross-ISA proof opportunity.
Same as SIMD128
Same as SIMD128
relaxed_madd etc.
WebCrypto+ SIMD crypto kernels
Next
AES/SHA via SIMD128 ops. Cross-ISA with Zvk*. Maps to Zvk* integer.
Wasm is structurally simpler: no PMA/PMP (linear memory), no arbitrary jumps (structured CF), no variable-length vectors (fixed 128-bit). This makes cross-ISA proofs tractable — the hard part is always on the RISC-V side.
▽ where to expand next ▽
Section 4
Expansion Map: Adjacency-Based Territory Growth
Next targets are adjacent to conquered territory. Two parallel fronts: Compiler (L5) expands vertically, Crypto expands horizontally.
RVV Int
Wasm Int
RVV Crypto
Wasm Crypto
L5
wave 2
codegen TV
wave 4
wave 4
future
L4
★ done
13 kernels
★ done
cross-ISA
wave 2
Zvk* AES/SHA
wave 3
cross-ISA crypto
Wave 1(done) RVV integer + Wasm cross-ISAWave 2(next, parallel) L5 compiler TV + Zvk* cryptoWave 3Wasm crypto cross-ISAWave 4+Expand codegen to all targets
Why L5 first? We already hold both endpoints: MLIR/LLVM IR specs (top) and Isla ISA traces (bottom). The compiler codegen gap in between is the last unverified link — and nobody does it for any vector ISA. Vertically adjacent to our L4 territory, maximum infrastructure reuse, highest impact.
Smart contracts, zkWasm correctness. Natural extension after crypto cross-ISA.
Appendix A — Reference Only (not in scope)
ARM L4 Ecosystem: Extension × Concern
ARM has a mature ISA spec (ASL) and rich extension set. Included for landscape completeness; not in our research scope.
Integer QBV
Memory QBV + Array
Control Flow Induction / BMC
Float FP theory
AArch64Base (v8/v9)
Trivial
Scalar ops. ASL spec complete. Arm ISA FV team covers L2.
Trivial
Standard load/store. TrustZone address space.
Trivial
Branch/compare.
N/A
NEON128-bit SIMD (v7/v8)
Empty
ASL spec exists. ASLi can extract semantics but no symbolic kernel prover built. Fixed 128-bit vectors.
Empty
NEON load/store lanes.
Empty
No predication (unlike SVE).
Empty
FP16/BF16 in NEON. Nobody at L4.
SVE / SVE2Scalable Vector (v9)
Empty
Variable-length (128–2048 bit). VLA model like RVV. ASL spec exists. Highest research value for ARM L4.
Empty
Gather/scatter, first-faulting loads.
Empty
Predicate-first model (unique to SVE). More expressive than RVV masks.
Empty
SVE FP ops.
CE / AES / SHACrypto Extensions
Empty
AES/SHA/SM3/SM4 instructions. ASL spec complete. No L4 prover.
N/A
N/A
N/A
SMEScalable Matrix (v9.2)
Empty
Matrix tile operations. Very new (2023). Outer product + accumulate.
Empty
Streaming mode memory.
Empty
Streaming SVE mode transition.
Empty
BF16/FP16 matrix ops.
ARM's L3 (ASL) is strong — machine-readable, maintained by Arm Ltd. The gap is L4: nobody has built a symbolic kernel verifier on top of ASL for NEON/SVE/CE. ASLi (interpreter) exists but is not a symbolic executor. The closest analog to our Isla would be building an ASL-to-SMT pipeline, which is a significant engineering effort.
Appendix B — Reference Only (not in scope)
x86 L4 Ecosystem: Extension × Concern
x86 has the largest deployed SIMD surface area but the weakest formal spec. Included for completeness.
Integer QBV
Memory QBV + Array
Control Flow Induction / BMC
Float FP theory
x86-64Base scalar
Trivial
ACL2 model (Centaur/AMD). Intel FV internal. K Framework partial model.
Partial
Complex memory model (TSO). Intel internal verification.
Trivial
Branch/jump.
N/A
SSE / AVX2128/256-bit SIMD
Empty
No formal spec (Intel SAE is prose). ACL2 has some SSE models. Widest deployment, worst formalization.
Empty
Aligned/unaligned loads.
Empty
No masking (pre-AVX-512).
Empty
SSE/AVX FP ops.
AVX-512512-bit SIMD + mask
Empty
No Sail model. Most complex x86 SIMD. Opmask registers (like RVV vm). ~5000 instruction variants.
Empty
Gather/scatter.
Empty
Opmask predication.
Empty
FP16 (AVX-512 FP16).
AES-NI / SHACrypto
Empty
AESENC/AESDEC/SHA instructions. No formal spec. Intel whitepaper only.
N/A
N/A
N/A
AMXAdvanced Matrix (Sapphire Rapids+)
Empty
Tile-based matrix multiply. Very new. Analogous to ARM SME.
Empty
Tile load/store.
Empty
TILECFG state.
Empty
BF16/FP16 tiles.
x86's fundamental problem is L3: there is no machine-readable formal ISA spec. Intel's Software Architecture Manual (SAE) is prose; ACL2 models (Centaur) cover a subset. This makes x86 the hardest target for our pipeline — you'd need to build the ISA model first. Alive2 covers LLVM optimizations (L5 partial) but not codegen to specific SIMD extensions.
References
Cell References
Click any cell above to jump here. Click a cell ID below to jump back.
A — Overview
A1–A7
How L7→L1 works in practice: llama.cpp as a running example
ASL (Arm Architecture Specification Language) — machine-readable spec maintained by Arm Ltd. ASLi — ASL interpreter by Alastair Reid (Arm Research). Interprets ASL specs but does NOT produce symbolic SMT traces; it is a concrete interpreter, not a symbolic executor like Isla.
B14
L3 × x86
ACL2 x86 model (Centaur/AMD) — partial formal model for x86 ISA in ACL2 theorem prover. Covers scalar subset; SIMD incomplete. Intel SAE (Software Architecture Manual) is prose only — no machine-readable formal spec.
ASL — Arm Architecture Specification Language. Machine-readable, maintained by Arm Ltd. Published with each architecture release.
C8
ARM L3→L4 Bridge (missing)
ASLi is an interpreter, not a symbolic executor. It cannot produce SMT traces from ASL specs. Building an "Isla for ARM" (ASL→SMT) is an open engineering challenge.
L5 × RVV Integer — Compiler Codegen TV — next target
CompCert — compcert.org(verified C compiler, no RVV backend) Alive2 — alive2.llvm.org(LLVM IR opt only, not codegen) Gap: nobody verifies LLVM vector codegen (IR → RVV asm) for any ISA. We hold both endpoints — MLIR/LLVM IR specs (top) + Isla ISA traces (bottom). Translation validation per compilation is the most practical path.