Allen

crypto seeker||starknet

The predicament of Ethereum - scalability

As the Ethereum ecosystem has grown more prosperous, transaction speed and fees have remained a long-standing pain point, which is why scalability is the most widely discussed topic around Ethereum. Here is a brief introduction to how the scaling effort has evolved.

The Road to Scalability#

PoS: the separation of block proposers from block validators. The PoS workflow is as follows:

  1. Transactions are submitted within a shard.
  2. Validators add the transactions to a shard block.
  3. The beacon chain selects validators to propose new blocks.
  4. The remaining validators form a random committee that validates the proposals in the shards.

Both proposing a block and attesting to a proposal must be completed within one slot, which lasts 12 seconds. Every 32 slots form an epoch, and each epoch shuffles the order of the validators and re-elects the committees.
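
To make the timing concrete, here is a tiny sketch of the slot and epoch arithmetic implied by these constants (12-second slots, 32 slots per epoch); the helper names are illustrative, not taken from any client.

```python
# Beacon-chain timing constants mentioned above; helper names are illustrative.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32

def slot_to_epoch(slot: int) -> int:
    """Epoch that a given slot belongs to."""
    return slot // SLOTS_PER_EPOCH

def epoch_start_time(genesis_time: int, epoch: int) -> int:
    """Unix timestamp at which an epoch begins."""
    return genesis_time + epoch * SLOTS_PER_EPOCH * SECONDS_PER_SLOT

print(slot_to_epoch(100))                  # slot 100 falls in epoch 3
print(SLOTS_PER_EPOCH * SECONDS_PER_SLOT)  # one epoch lasts 384 seconds (~6.4 min)
```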

After the Merge, Ethereum aims to achieve proposer-builder separation (PBS) at the consensus layer. Vitalik has argued that the endgame for all blockchains is centralized block production with decentralized block validation. Because Ethereum blocks will be dense with data after sharding, centralized block production is needed to meet the high requirements for data availability; at the same time, there must be a way to maintain a decentralized validator set that can validate blocks and perform data availability sampling.

What is Sharding? It is the process of horizontally partitioning a database to distribute the workload.#

Sharding is a way of partitioning that distributes computation and storage workloads across a P2P network. With this approach, each node no longer has to process the entire network's transaction load; it only needs to maintain the information related to its own partition (shard), and each shard has its own validator or node network. The security issue with sharding:

For example, if the entire network has 10 shard chains, disrupting the whole network requires 51% of the network's power, but disrupting a single shard requires only 5.1%.
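
That figure is just the network-wide threshold divided by the shard count, as in this back-of-the-envelope sketch (the constant names are made up):

```python
# Naive "1% attack" arithmetic: with validators statically split across shards,
# attacking one shard costs the network-wide threshold divided by the shard count.
NETWORK_ATTACK_THRESHOLD = 0.51
NUM_SHARDS = 10

per_shard_threshold = NETWORK_ATTACK_THRESHOLD / NUM_SHARDS
print(f"per-shard attack threshold: {per_shard_threshold:.1%}")  # 5.1%
```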

The beacon chain is responsible for generating random numbers, assigning nodes to shards, capturing snapshots of individual shards, handling handshakes, stakes, and other functions, and coordinating communication between shards to keep the network in sync.

A major issue with sharding is cross-shard transactions. Because each node group only processes transactions within its own shard, the shards are relatively independent of one another. So how is a transfer between users A and B who sit on different shards handled?

Blocks can also be discarded: if A and B are accepted for processing but transactions W and X are the ones selected in #2, the whole transfer cannot proceed, although the probability of such a fork is very small.
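
To illustrate the failure mode with a deliberately naive toy (the shard dictionaries and the `shard2_block_discarded` flag are hypothetical, not any real protocol): A's debit lands in a shard-1 block, but the shard-2 block carrying B's credit is dropped in a fork, leaving the transfer half-applied.

```python
# Toy model of the cross-shard atomicity problem: two shards apply the two halves
# of one transfer independently, and there is no mechanism to roll one half back.
balances = {1: {"A": 100}, 2: {"B": 0}}

def naive_cross_shard_transfer(amount: int, shard2_block_discarded: bool) -> None:
    balances[1]["A"] -= amount            # debit included in a shard-1 block
    if not shard2_block_discarded:
        balances[2]["B"] += amount        # credit included in a shard-2 block
    # if the shard-2 block is discarded, nothing undoes the debit on shard 1

naive_cross_shard_transfer(30, shard2_block_discarded=True)
print(balances)   # {1: {'A': 70}, 2: {'B': 0}} -- A paid 30 that B never received
```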

The earlier approach was to shard the data availability layer, with an independent proposer and committee for each shard. Within the validator set, each validator takes its turn to verify the data of a shard, downloading all of that shard's data for verification.

Disadvantages:

  1. It requires tight synchronization technology to ensure that validators can sync within one slot.
  2. Validators need to collect votes from all committees, which introduces delay.
  3. Having validators download all of the data also puts heavy pressure on them.

The second method is to give up complete data verification and adopt data availability sampling. There are two random sampling methods:

  1. Random block sampling: sample a portion of the shards and, if the checks pass, the validators sign. The problem is that unavailable transactions can be missed.
  2. Use erasure codes to reinterpret the data as a polynomial, then use the properties of polynomials to recover the data under specific conditions, ensuring complete data availability.

Property of polynomials: a polynomial of degree d is uniquely determined by any d + 1 evaluations; a cubic, for example, can be recovered from any four points.

With a 2x extension, as long as at least 50% of the encoded data is available, the entire data can be recovered.
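
Here is a minimal sketch of that idea over a toy prime field (not the field or encoding Ethereum actually uses): four data chunks define a cubic, eight evaluations are published as the 2x extension, and any four surviving points are enough to rebuild the original data.

```python
# Reed-Solomon-style 2x extension over a toy prime field. Four data chunks are
# treated as evaluations of a degree-3 polynomial; we publish 8 evaluations, and
# ANY 4 of them are enough to reconstruct the original chunks.
P = 2**31 - 1   # toy prime; real systems use the scalar field of BLS12-381

def lagrange_interpolate(points, x_eval, p=P):
    """Evaluate, at x_eval, the unique polynomial through `points` (mod p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x_eval - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

data = [3, 14, 15, 92]                              # original chunks = p(0..3)
base = list(enumerate(data))                        # chunks as (x, y) points
extended = [lagrange_interpolate(base, x) for x in range(8)]   # 2x extension

# Drop any half of the extended data; 4 surviving points still recover p(0..3).
survivors = [(x, extended[x]) for x in (1, 4, 6, 7)]
recovered = [lagrange_interpolate(survivors, x) for x in range(4)]
assert recovered == data
print("recovered:", recovered)
```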

When we perform n independent random samplings, the probability that unavailable data goes undetected is only 2^-n.
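
A rough sanity check of that bound, assuming a block where just under half of the extended chunks are available (all numbers here are illustrative only):

```python
import random

def undetected_probability(n_samples, chunks=512, available_fraction=0.49,
                           trials=100_000):
    """Monte Carlo: how often do n random samples ALL land on available chunks?"""
    available = set(random.sample(range(chunks), int(chunks * available_fraction)))
    hits = sum(
        all(random.randrange(chunks) in available for _ in range(n_samples))
        for _ in range(trials)
    )
    return hits / trials

for n in (5, 10, 15):
    print(f"{n:2d} samples: simulated {undetected_probability(n):.1e}, "
          f"bound 2^-{n} = {2.0 ** -n:.1e}")
```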

The logic is that we reinterpret the data as a polynomial via erasure coding and then extend it; the extended evaluations are what allow the data to be recovered.

The problem here is whether the extension is computed correctly during the polynomial expansion. If the published data or its extension is wrong, the data reconstructed from it will also be wrong. So how do we ensure that the extension is correct?

  1. Celestia uses fraud proofs, which come with a synchronization problem (see the toy check after this list).
  2. Ethereum and Polygon Avail use KZG commitments, which require neither an honest-minority assumption nor a synchrony window. However, KZG commitments are not resistant to quantum attacks, so in the future Ethereum may switch to quantum-resistant zero-knowledge proof technology such as zk-STARKs.
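
To show what a fraud proof is checking, here is a toy version under simplified assumptions (a small prime field and invented helper names, nothing Celestia-specific): a full node recomputes the 2x erasure extension from the first half of the chunks and flags any mismatch, which is the kind of claim a fraud proof would broadcast to light nodes.

```python
# Toy fraud-proof check for an erasure-coded extension (helper names invented).
# A full node re-derives the 2x extension from the first half of the chunks and,
# if the published extension does not match, broadcasts a fraud proof so that
# light nodes can reject the block.
P = 2**31 - 1   # toy prime field

def interpolate(points, x_eval, p=P):
    """Evaluate, at x_eval, the unique polynomial through `points` (mod p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x_eval - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def extension_is_valid(published):
    """Recompute the extension from the first half and compare chunk by chunk."""
    n = len(published) // 2
    base = list(enumerate(published[:n]))
    recomputed = [interpolate(base, x) for x in range(2 * n)]
    return recomputed == list(published)    # False -> emit a fraud proof

data = [3, 14, 15, 92]
honest = [interpolate(list(enumerate(data)), x) for x in range(8)]
print(extension_is_valid(honest))            # True: extension checks out
tampered = honest[:6] + [999] + honest[7:]
print(extension_is_valid(tampered))          # False: time for a fraud proof
```

The synchronization problem mentioned in item 1 comes from the fact that light nodes must wait long enough for such a proof to propagate before treating a block as valid.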

The most prominent projects in this area are zkSync and StarkWare, both built on zero-knowledge proofs; they will be discussed in detail later.

What is resistance to quantum attacks? It means the algorithm's security does not rely on mathematical hardness assumptions (such as the discrete logarithm problem) that a quantum computer could break.#

KZG commitment: Prove that the value of a polynomial at a specific position is consistent with the specified value.

A KZG commitment is just one type of polynomial commitment: it lets one verify claims about a message without revealing the message itself. The process is roughly as follows:
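
To make the commit / open / verify flow concrete, here is a minimal KZG sketch in Python using the bn128 pairing from the `py_ecc` library. It is an illustration under toy assumptions only: Ethereum's real scheme uses BLS12-381 with a ceremony-generated setup, and the secret `tau`, the example polynomial, and the function names below are all made up for this sketch.

```python
# Minimal KZG commit/open/verify sketch on py_ecc's bn128 pairing (toy setup).
from py_ecc.bn128 import G1, G2, add, neg, multiply, pairing, curve_order

def trusted_setup(tau: int, degree: int):
    """Powers of tau in G1 plus tau*G2. A real setup discards tau forever."""
    powers_g1 = [multiply(G1, pow(tau, i, curve_order)) for i in range(degree + 1)]
    return powers_g1, multiply(G2, tau)

def commit(coeffs, powers_g1):
    """C = sum(c_i * tau^i * G1), the commitment to p(X) = sum(c_i * X^i)."""
    acc = multiply(powers_g1[0], coeffs[0])
    for i in range(1, len(coeffs)):
        acc = add(acc, multiply(powers_g1[i], coeffs[i]))
    return acc

def evaluate(coeffs, z):
    return sum(c * pow(z, i, curve_order) for i, c in enumerate(coeffs)) % curve_order

def open_at(coeffs, z, powers_g1):
    """Proof that p(z) = y: a commitment to q(X) = (p(X) - y) / (X - z)."""
    y = evaluate(coeffs, z)
    c = list(coeffs)
    c[0] = (c[0] - y) % curve_order
    d = len(c) - 1
    q = [0] * d
    q[d - 1] = c[d]
    for i in range(d - 1, 0, -1):            # synthetic division by (X - z)
        q[i - 1] = (c[i] + z * q[i]) % curve_order
    return y, commit(q, powers_g1)

def verify(commitment, z, y, proof, tau_g2):
    """Check e(proof, [tau - z]_2) == e(C - [y]_1, G2)."""
    lhs = pairing(add(tau_g2, neg(multiply(G2, z))), proof)
    rhs = pairing(G2, add(commitment, neg(multiply(G1, y))))
    return lhs == rhs

coeffs = [3, 1, 4, 1, 5]                     # p(X) = 3 + X + 4X^2 + X^3 + 5X^4
powers_g1, tau_g2 = trusted_setup(tau=123456789, degree=len(coeffs) - 1)
C = commit(coeffs, powers_g1)
y, proof = open_at(coeffs, 7, powers_g1)
print(verify(C, 7, y, proof, tau_g2))        # True: p(7) really equals y
```

The pairing check works because q(X) * (X - z) = p(X) - y, so comparing both sides "in the exponent" at the secret point tau convinces the verifier that the claimed y really is p(z), without ever seeing p(X).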

Compared with Merkle Trees:#

A KZG opening proof is constant-size and verified with a single pairing check, whereas a Merkle proof grows logarithmically with the data and can only show that particular leaves are included; the trade-off is that KZG relies on a trusted setup and elliptic-curve assumptions.

The entire process is: use erasure coding to turn the data into a polynomial and extend it; use KZG commitments to guarantee that both the extension and the original data are valid; then use the extension to reconstruct the data when needed; and finally perform data availability sampling.

Celestia requires validators to download the entire block, while Danksharding uses data availability sampling technology.

Since a block may be only partially available, synchronization must be ensured whenever the block needs to be reconstructed: when a block really is only partially available, nodes communicate with one another to piece it back together.

Comparison between KZG commitments and fraud proofs:#

KZG commitments ensure that both the extension and the data are correct, while fraud proofs introduce a third-party observer. The most obvious difference is that fraud proofs require a time window for observers to react and report fraud, which in turn requires synchronization between nodes so that the whole network can receive a fraud proof in time. KZG is clearly faster than fraud proofs: it uses mathematics to guarantee the correctness of the data, with no waiting period.

Celestia's own drawback is its use of large blocks that validators must download in full, which is also true of Ethereum's proto-danksharding stage. To solve these potential problems, Celestia will also adopt data availability sampling, which requires the use of KZG commitments.

Whether it is KZG or fraud proofs, synchronization is required because there is a probability of block unavailability.
