June 21st 2020
Founder & CEO of QuarkChain
Among the existing scalability solutions, sharding is probably the most adopted solution to enable horizontal scalability.
The basic idea of sharding is to partition a global system state into multiple sub-states, i.e., shards, and to process the transactions in each shard relatively independently.
With an appropriate design of the sharding technique, the capacity of the system is able to increase as the numbers of shards and processors (nodes) increase, in other words, linear scale.
To apply sharding, there are a couple of key questions we need to answer:
What is the global system state and how to change the system state?
For example, in a distributed key-value (KV) store (e.g., BigTable, Cassandra), the system state is a map from arbitrary bytes (key) to arbitrary bytes (value), and the operations to change the system state are: create, read, update, and delete (CRUD).
Another example is a distributed append-only file system (e.g., Google File System (GFS), Hadoop Distributed File System (HDFS)), where the system state is a set of directories and files, and the operations are two sets: create, delete, and list operations in directories, and open, append, read, and close operations of a file.
How to partition the system state to shards so that all operations can be processed correctly and efficiently?
The way to partition is critical to system performance, and the system can perform poorly if the design of partition is inappropriate. To design partition, there are several key aspects we need to consider:
- State to partition: Which parts of the system state should be partitioned? To simplify system model, we should only partition the parts of the system state that either 1) is with size too large to be in a single node; or 2) requires high operations per second, while we may not partition the parts of the system state whose size is small enough to host in single machine and operations are infrequent. An example is that the early versions of GFS/HDFS only partition data in files, while they do not partition directories since the data size of directories is relatively small (compared to data in files), and the directory operations are infrequent.
- Ensure operation (transaction) semantics: How to partition the system state to meet the operation semantics? One key semantics is atomicity, and if an operation changes the states in multiple shards atomically, such operations need proper coordination between shards (e.g., via distributed lock), which can be costly. As a consequence, the performance of such operations can hardly benefit from sharding technology — sometimes the performance can be even worse. To avoid such issues, most modern sharding systems support atomic batch operations in one shard and let the upper layer applications handle the complicated multi-shard atomicity issue.
- Balanced load/size: How to partition the system state so that a), the loads to all shards are statistically evenly distributed; and b), the sizes of the partitioned system states are also statistically evenly distributed over all shards. Achieving these is the key prerequisite of linear scale — as long as the loads/sizes are evenly distributed after adding more shards, we are able to increase system capacity linearly by adding more nodes to process new shards. Note that the distributions are highly related to user operation patterns and may cause uneven load (both temporarily or permanently) if user operation patterns change a lot over time.
- Reshard: How to add more shards and how could new nodes to be able to serve the new shards. After adding more shards, the new shards will consist of some states from old shards, which will be migrated to the new nodes. The migration during resharding may take time and pause the existing services. In addition, we also need to ensure the semantics of supported operations are the same before- and after- reshard.
Before answering the aforementioned questions for QuarkChain, let us first introduce the system model and difficulties of sharding of the existing blockchains.
System State and Transactions of Existing Blockchains
We consider an account-based blockchain model similar to Ethereum, where the system state is basically a key-value map from an address to its account data. There are two types of addresses:
- User address; and
- Smart contract address;
and the account data consist of
where the code and storage of a user address are empty.
There are two types of transactions supported with various combinations of CRUD operations:
1, Transfer transaction between two user addresses, which basically update the balances of two addresses and nonce of the sender;
2, Smart contract transaction, which may
- Update the nonce of the sender;
- Update the balances of multiple user addresses;
- Update the balances and storage of multiple smart contracts via call and
- Create multiple user addresses and their account data;
- Create multiple smart contracts.
Challenges in Blockchain Sharding
Compared to the existing scalability solutions, the good thing is that the system state of the blockchain is exactly the same as a distributed KV store such as BigTable and Cassandra; however, the bad news is that the transaction semantics is much more complicated than just simple CRUD operations — a smart contract transaction could potentially perform any CRUD operations on any key-value pairs of the system state. If the state is partitioned to different sub-states (shards), ensuring atomicity across multiple shards will be extremely difficult (most of time impossible) to scale. How to partition the blockchain ledger is the fundamental problem of blockchain sharding.
In addition, more challenges come for a decentralized world as we need to build proper consensus to process transactions in all shards in a secure way. New sharding consensus opens new possibilities of attacks, and thus without comprehensive analysis on the thread model, a shard may be easily compromised and thus the whole network can be easily broken.
Besides the challenges of partitioning and consensus, another common problem of sharding is interoperability among shards, i.e., cross-shard transactions. The underlying logic is usability — a user should be able to access all resources including smart contracts and other user accounts across all shards. How to develop efficient and secure cross-shard transactions is a key topic.
In the following articles, we will discuss QuarkChain’s solutions to the challenges in blockchain sharding. In addition, we will compare QuarkChain with existing centralized systems such as Google’s BigTable — and illustrate similarities and differences with centralized counterparts.