PeerDAS (Peer Data Availability Sampling) is a networking protocol that allows beacon nodes to perform data availability sampling (DAS) to ensure that blob data has been made available while downloading only a subset of the data. PeerDAS utilizes gossip for distribution, discovery for finding peers of particular data custody, and peer requests for sampling.
Motivation
DAS is a method of scaling data availability beyond the levels of EIP-4844 by not requiring all nodes to download all data while still ensuring that all of the data has been made available.
Providing additional data availability helps bring scale to Ethereum users in the context of layer 2 systems called “roll-ups” whose dominant bottleneck is layer 1 data availability.
Specification
We extend the blobs introduced in EIP-4844 using a one-dimensional erasure coding extension. Each row consists of the blob data combined with its erasure code. It is subdivided into cells, which are the smallest units that can be authenticated with their respective blob’s KZG commitments. Each column, associated with a specific gossip subnet, consists of the cells from all rows for a specific index. Each node is responsible for maintaining and custodying a deterministic set of column subnets and data as a function of their node ID.
Nodes find and maintain a diverse peer set and sample columns from their peers to perform DAS every slot.
A node can reconstruct the entire data matrix if it acquires at least 50% of all the columns. If a node has less than 50%, it can request the necessary columns from the peer nodes.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 and RFC 8174.
Networking
This EIP introduces cell KZG proofs, which are used to prove that a KZG commitment opens to a cell at the given index. This allows downloading only specific cells from a blob, while still ensuring data integrity with respect to the corresponding KZG commitment, and is therefore a key component of data availability sampling. However, computing the cell proofs for a blob is an expensive operation, which a block producer would have to repeat for many blobs. Since proof verification is much cheaper than proof computation, and the proof size is negligible compared to cell size, we instead require blob transaction senders to compute the proofs themselves and include them in the EIP-4844 transaction pool wrapper for blob transactions.
To this end, during transaction gossip responses (PooledTransactions), the wrapper is modified to:
The tx_payload_body, blobs and commitments are as in EIP-4844, while the proofs field is replaced by cell_proofs, and a wrapper_version is added. These are defined as follows:
wrapper_version - one byte indicating which version of the wrapper is used. For the current one, it is set to 1.
cell_proofs - list of cell proofs for all blobs, including the proofs for the extension indices, for a total of CELLS_PER_EXT_BLOB proofs per blob (CELLS_PER_EXT_BLOB is the number of cells for an extended blob, defined in the consensus specs)
Note that, while cell_proofs contain the proofs for all cells, including the extension cells, the blobs themselves are sent without being extended (CELLS_PER_EXT_BLOB / 2 cells per blob). This is to avoid sending redundant data, which can quickly be computed by the receiving node.
In other words, cell_proofs[i * CELLS_PER_EXT_BLOB + j] is the proof for cell j of compute_cells(blobs[i]), where compute_cells(blob) outputs all cells of blob, including the extension cells.
The node MUST validate tx_payload_body and verify the wrapped data against it. To do so, ensure that:
There are an equal number of tx_payload_body.blob_versioned_hashes, blobs and commitments.
The KZG commitments hash to the versioned hashes, i.e. kzg_to_versioned_hash(commitments[i]) == tx_payload_body.blob_versioned_hashes[i]
The KZG commitments match the corresponding blobs and cell_proofs. This requires computing the extension cells for all blobs, and verifying all cell_proofs. (Note: all cell proofs can be batch verified at once)