Inside the Fabric VPU Architecture

TECHNOLOGY


Cryptography is evolving rapidly, much like AI workloads have evolved since the first neural nets. In the space of zero-knowledge proofs (ZKPs), new proof systems or tweaks to existing systems are invented every month.

This might seem to pose a challenge for custom cryptography hardware. Fixed-function ASICs, like Bitcoin mining chips, can only solve one type of problem and cannot keep up with evolving needs and diversified demand for different proof systems. Other more general-purpose chips like GPUs or FPGAs are built for everything from graphics to artificial intelligence, and as a result, can’t fully deliver on the performance requirements of next-gen cryptography.

This is why Fabric built the VPU: to achieve the high performance of custom silicon while also providing the flexibility to handle a wide range of cryptography workloads, now and in the future. The design draws on our collective centuries of experience in AI hardware architecture, a field that faced the same demands for performance and flexibility.

01/

PERFORMANCE MEETS VERSATILITY

Infinitely programmable


Our ISA enables infinite flexibility for the ever-changing world of advanced cryptography. Whether you’re using affine or projective elliptic curve points, whether you’re verifying EdDSA signatures or proving a zk-SNARK, whether you’re using Poseidon1, Poseidon2, or even SHA-256 for your hash function, our ISA has got you covered.

Extremely performant


How do we break the performance vs programmability tradeoff? 

The VPU’s patent-pending architecture blends ideas seen in GPUs and traditional CPUs, using an array of custom RISC-like processing engines that prioritize large-integer prime field math. In the domain of cryptography, its unique design transcends the conventional programmability vs performance tradeoff.

An advanced datapath performs a huge amount of computation in each engine, natively operating on numbers up to 384 bits wide instead of composing operations from int32. Built-in support for modular arithmetic provides convenient prime field instructions, and control flow instructions provide the flexibility to implement a range of cryptography kernels.
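To make "prime field instructions" concrete, here is a minimal Python sketch of the kind of operations they cover. Python's arbitrary-precision integers stand in for a wide datapath. The choice of the NIST P-384 field prime and the function names (fadd, fsub, fmul, finv) are assumptions made for illustration, not the VPU's actual instruction set.

```python
# Illustrative sketch only: P384 and the function names below are assumptions
# for this example, not VPU instructions. Python ints stand in for a datapath
# that handles 384-bit values natively.
P384 = 2**384 - 2**128 - 2**96 + 2**32 - 1  # NIST P-384 field prime


def fadd(a, b, p=P384):
    """Modular addition in the prime field."""
    return (a + b) % p


def fsub(a, b, p=P384):
    """Modular subtraction; the result is always in [0, p)."""
    return (a - b) % p


def fmul(a, b, p=P384):
    """Modular multiplication in the prime field."""
    return (a * b) % p


def finv(a, p=P384):
    """Multiplicative inverse via Fermat's little theorem: a^(p-2) mod p."""
    if a % p == 0:
        raise ZeroDivisionError("zero has no inverse in a prime field")
    return pow(a, p - 2, p)
```

Each function here is one line of arithmetic because the language handles wide integers for us. On hardware limited to 32-bit words, each of these would expand into a multi-step routine.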

02/

CRYPTOGRAPHY-NATIVE ARCHITECTURE

Built just for cryptography


The Fabric VPU’s architecture is designed 100% for cryptography. Like the GPU, the VPU is a huge array of parallel cores. Unlike the GPU, which mostly focuses on floating-point math for AI, the VPU architecture is dedicated to accelerating the big integer math and the data movement primitives required for cryptography.

To handle even the most intensive workloads


Our patent-pending architecture includes high-performance arithmetic, unique memory architectures, and innovative instructions optimized for cryptography. With high-speed access to gigabytes of memory and enormous bandwidth to move data around, the VPU can handle entire proof systems end-to-end, including witness generation and polynomial evaluation.

03/

MASSIVE SPEEDUPS OVER THE GPU

For SNARKs, STARKs and whatever else you create


Our architecture significantly accelerates all major cryptographic primitives over the GPU, including every part of the zk-SNARK and zk-STARK protocols in common use. The VPU ships with a software development kit that comes with built-in libraries for all of these primitives, from Pippenger’s algorithm to Merkle tree generation to NTT on any prime field.
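For readers less familiar with the primitives named above, here is a minimal Python sketch of one of them, the number-theoretic transform (NTT). It is a plain reference version, not the SDK's library code. The prime 998244353 with primitive root 3 is an assumed example of an NTT-friendly field, and the function name ntt is hypothetical.

```python
# Reference sketch of an iterative radix-2 NTT. The field (P, G) and the
# function name are assumptions for illustration, not part of any SDK.
P = 998244353  # 119 * 2**23 + 1, supports power-of-two sizes up to 2**23
G = 3          # a primitive root modulo P


def ntt(values, invert=False, p=P, g=G):
    """Return the NTT of `values` (or the inverse NTT when invert=True)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if (p - 1) % n:
        raise ValueError("field has no n-th root of unity")
    a = list(values)

    # Reorder the inputs by bit-reversed index.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # Butterfly passes over blocks of size 2, 4, ..., n.
    length = 2
    while length <= n:
        w_len = pow(g, (p - 1) // length, p)  # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, p - 2, p)      # use the inverse root instead
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % p
                a[k] = (u + v) % p
                a[k + half] = (u - v) % p
                w = w * w_len % p
        length <<= 1

    if invert:
        n_inv = pow(n, p - 2, p)
        a = [x * n_inv % p for x in a]
    return a
```

A typical use is polynomial multiplication: transform both zero-padded coefficient lists, multiply them pointwise modulo P, then apply the inverse transform. Every inner-loop step is a modular multiply, add, or subtract, which is why this kernel benefits from native prime field arithmetic.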

With revolutionary instruction efficiency


The VPU instruction set architecture (ISA) has native instructions for number theory. On the GPU, many int32 operations must be pieced together to complete a single modular arithmetic operation, which then takes multiple machine cycles. On the VPU, write a single line for your exact integer precision and experience the nanosecond of joy it takes for the VPU to run your instruction. It’s like magic!
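To show what "pieced together" means, here is a rough Python sketch that builds a 384-bit multiplication out of 32-bit limbs using schoolbook multiplication and counts the 32x32-bit multiplies involved. The helper names are hypothetical, and real GPU kernels typically use more refined schemes (Montgomery multiplication, for example), so the count is indicative rather than a benchmark.

```python
# Illustrative sketch: emulating a 384-bit multiply with 32-bit limbs.
# All names here are hypothetical helpers, not a GPU or VPU API.
LIMB_BITS = 32
LIMBS = 12  # 12 limbs x 32 bits = 384 bits
MASK = (1 << LIMB_BITS) - 1


def to_limbs(x, n=LIMBS):
    """Split an integer into n little-endian 32-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Reassemble little-endian 32-bit limbs into one integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def limb_mul(a, b):
    """Schoolbook product of two limb lists.

    Returns (product limbs, number of 32x32-bit multiplies performed).
    """
    out = [0] * (len(a) + len(b))
    mults = 0
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            t = out[i + j] + ai * bj + carry  # fits in 64 bits
            mults += 1
            out[i + j] = t & MASK
            carry = t >> LIMB_BITS
        out[i + len(b)] = carry
    return out, mults
```

For two 384-bit operands, limb_mul performs 12 x 12 = 144 limb multiplies plus the associated carry handling, and that is before any modular reduction. With a wide datapath, the same product is a single operation from the programmer's point of view.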

04/

POWERFUL SOFTWARE FROM DAY 1

Compiler stack co-designed from the start


Programmable silicon achieves its full potential only when it’s co-designed with a powerful compiler stack. The Fabric VPU’s software architecture and toolchain will allow users to implement state-of-the-art performance optimizations to core kernels and reuse them in higher-level workloads. Our kernel-level programming model and LLVM-based kernel compiler are designed to expose and exploit data parallelism while achieving high utilization of the VPU cores.

Control your own data flow


We also offer a higher-level, user-friendly programming model to quickly implement whole algorithms (such as proof systems) end-to-end. Our software allows our chip to work smarter, not just harder, by letting users express the flow of data through the algorithm so that our compiler and runtime can efficiently allocate memory and orchestrate execution while minimizing data movement in the system.

Seamless integration with the languages you love


Our frontend can be used natively in familiar languages like Rust, C++, and Python. This form of programming enables users to express the information and data that is known statically so that our compilation system can perform precomputation and exploit optimizations. Finally, our runtime is designed to be general purpose, which means that changing ZK circuits or even proof systems doesn’t require redeploying executables on a production fleet.