Zero-knowledge proof generation speed comparison

The following text is in Chinese and needs to be translated into English:

TPS-related factors of the zkVM project, 1-5 are related to zero knowledge.

From the perspective of EVM compatibility solutions, proof of computational complexity is required.

EVM compatibility (high-level language equivalence, zkVM) > EVM equivalence > ETH equivalence (PSE)

Starknet = Sin7Y > Zksync > PloygonZkevm > Scroll > Ploygon Zero > PSE
From the perspective of polynomial commitment, the speed of proof generation is considered.

Small field + FRI > FRI > FRI&KZG > KZG (the order of algorithm complexity may vary)

Ploygon Zero > Starknet > Sin7Y > PloygonZkevm > Zksync = Scroll = PSE
From the perspective of algorithm direction, the possibility of implementing hardware parallel proofs is considered.

Single algorithms are better for optimization, and MSM is currently better than FFT.

Pure MSM > Pure FFT > Hybrid

Sin7Y (Pure MSM) > Ploygon Zero (Pure FFT) > Starknet > PloygonZkevm > Zksync > Scroll > PSE
Recursive proof optimization of zero-knowledge proof systems

Plonky2 > Stark > Plonk = Halo2

Ploygon Zero > Sin7Y > Starknet > PloygonZkevm > Zksync = Scroll = PSE
Composability of zero-knowledge proof systems

Ploygon Zero = Sin7Y > PloygonZkevm > Starknet > Zksync = Scroll = PSE
Parallelism of VM itself (transaction parallelism)

OlaVM > zkVM > zkEVM

Sin7Y > Starknet > Zksync > PloygonZkevm > Scroll > Ploygon > PSE

The development language for implementing VM also has a certain impact on efficiency.

The timing for the mainnet to run a large number of applications (the sooner, the better)

Zksync > Starknet > PloygonZkevm > Scroll > Sin7y > Ploygon Zero > PSE
The timing for running L3 (the sooner, the better)

Starknet > Zksync > PloygonZkevm > Sin7y > Ploygon Zero > Scroll

Depending on the different proof systems, the proof generation process may vary, but the bottlenecks are:

Variable-base and fixed-base scalar multiplication (MSM)
Fast Fourier Transform (FFT) and inverse FFT

In systems where both FFT and MSM exist, approximately 70% of the time spent generating proofs is spent on MSM, while the rest is dominated by FFT.

Both MSM and FFT are slow, but there are methods to improve their performance:

MSM has embarrassingly parallelism, which can be accelerated by running them on multiple threads. However, even on hundreds of cores, multiplication still takes a lot of time. This means that the same operations are frequently repeated and will consume most of the available memory on the device. In short, MSM requires a large amount of memory and is still slow even when highly parallelized.
FFT heavily relies on frequent shuffling of data during algorithm runtime. Additionally, they require a large amount of bandwidth when running on hardware. Shuffling means you need to "randomly" load and unload elements, for example, >100GB datasets on hardware chips with 16GB or less memory. While operations on hardware are very fast, the time to load and unload data over the network eventually significantly slows down the operation speed.

In summary:

MSM has predictable memory access, allowing for a large amount of parallelization, but their cost is still high due to the required computational and memory resources.
FFT has random memory access, making them unfriendly to hardware and naturally difficult to run on distributed infrastructures.

In addressing the slowness of large-scale MSM and FFT, the most promising work we see is PipeZK. In their paper, the authors describe a method to make MSM cheaper by skipping redundant calculations using the Pippenger algorithm.

They also describe a method to "unroll" FFTs so that they can be performed without significant shuffling, improving speed due to the now predictable memory access pattern.

Recently, the Plonk family proposed a new solution that is more suitable for parallelization after optimizing the MSM direction.

Parallelization

Rust VM

Cairo 1.0

Regenesis