Fraud proofs — Secure on-chain scalability – Hacker Noon

If we read the paper we can find that it talks about two kinds of fraud-proofs:

  • State transition fraud-proofs.
  • Data availability fraud proofs.

Each of them tries to construct a proof that some undesired situation has happened.

State transition Fraud Proofs

Let’s come back to the idea of blockchains as state-machines. New states are created after all transactions in a block are applied.

The first core concept described in the paper is including intermediate states between transactions. Instead of only having a single Merkle root at the end of each block, we could add some intermediate Merkle root between every transaction.

If a full-node detects that a new block has an invalid state transition, it could tell a light-node the following:

  1. Hey, I found that this block is wrong.
  2. After processing n-1 transactions from the beginning of the block, if I process the next transaction, the Merkle root of the state is different from the following intermediate state said by the miner.
  3. I’m sending you a bunch of Merkle proofs with the minimal data needed so you can check by yourself.

As in many parts of most blockchains, Merkle proofs are a crucial tool for proving that some data corresponds to a block. The light-node only have to trust the block header it already has.

We don’t need to save intermediate states between each transaction. An option is saving them every two, five or more transactions. Doing this would save space in the state tree, but make fraud proofs bigger since more Merkle proofs are necessary between states.

Fraud proofs like this are powerful since only one honest full-node can prove to any number of light-nodes that a block is invalid; no majorities are needed. If after some time threshold a light-node doesn’t receive a fraud-proof, it seems reasonable that the block is valid.

Including economic incentives and penalizations for valid and invalid fraud proofs respectively is essential for desired full-nodes behavior.

The Data Availability problem

Fraud proofs for state transitions are great, but they rely on a critical assumption: all the block data is available. If the block miner only publishes the block header without the corresponding correct data, then it can’t be proven wrong. The same way I can’t show that a sum of two numbers is wrong if I don’t know them.

Moreover, even if 99% the data is available, the remaining 1% may be essential to prove that a block is invalid. We need 100% data availability. For block validation, this is a strict requirement, since data could be unavailable for multiple reasons, not only intentionally by malicious nodes. The correct rationale is making data unavailability hard for a malicious node.

This converts a 100% data availability problem into a 50% data availability problem. — Vitalik Buterin

The proposed solution uses Erasure Codes, in particular, multi-dimensional Reed-Solomon codes. In a nutshell, this means transforming the block data from M chunks to N chunks (M

The paper provides a more helpful way to see this idea:

Page 13 of the paper

This powerful tool together with pulling data using random-sampling provides strong mathematical guarantees for the reconstruction of the original data. We can find a full mathematical justification of this in the paper, but here’s an extract of an important conclusion:

Page 21 of the paper

In plain English, say you have 1000 light-nodes doing the random-sampling of the extended chunks of data. Increasing the sample-size will ensure that, eventually, they will ask for a chunk that the malicious party won’t like to share. Thus giving more certainty about detecting a data-availability problem.

In the paper, they also discuss another kind of fraud proofs within data availability, in particular proving that the miner generated the Reed-Solomon codes incorrectly. Since we need M out of N chunks to reconstruct a partition of block data, this may be enough to detect if the N chunks match the corresponding Merkle root of the partition.

Conclusion

Fraud proofs and erasure codes are useful tools for scaling public blockchains. They empower light-nodes to have their own opinion in discarding blocks without trusting a majority of honest full-nodes.

Moreover, solving the data-availability problem not only allows fraud proofs to be constructed but also addresses other issues:

Even if succinct zero knowledge proofs can be used to verify correctness, an attacker getting away with publishing unavailable blocks and getting them included in the chain is still very bad, as such a thing happening denies all other validators the ability to fully calculate the state, or to make blocks that interact with the portion of the state that is no longer accessible.

Shortly,

  • Merkle proofs are essential for proving that data belongs to a block.
  • Intermediate states significantly reduce the amount of Merkle proofs needed to prove that state transitions are wrong.
  • Being able to construct the Merkle proofs naively implies 100% data availability guarantees, which isn’t desirable.
  • Erasure codes plus random-sampling give strong mathematical guarantees that all the data needed from the block will be available, in particular, if the miner isn’t malicious.
  • A miner trying to trick the network by sharing wrong information can be detected thanks to the multi-dimensional characteristic of Reed-Solomon codes plus Merkle roots of partial data.

The mentioned paper, as well as the paper-related post, also indicate that zk-SNARKS/zk-STARKS are other options for proving correct state transitions and data-availability. But those are different and exciting monsters.

Most of what I mentioned in this article is a simple summary of the full idea within those resources. I learned a lot from reading them, so I encourage you to do the same!

read original article here