Exploring chunked bytecode storage

Ethereum limits the size of contract code to 24 KiB. This is especially restrictive now that gas limits are being increased; however, bigger contracts can become a DoS vector.

When you deploy code, it’s stored under a hash. When you load that code, you read the whole thing from storage, and then gas is charged. If the blob is too big, this could blow up memory before you even get to the gas meter. That’s the core problem.

We must fix the underlying reason the limit exists before we increase it.

Storing code in chunks

Here’s the idea:

If a contract’s code is small (32 KiB or less), nothing changes. We store the code under its hash, just like today. This ensures backwards compatibility.

If the code is bigger, we don’t store the blob directly. We store a manifest. The manifest tells us how big the code is and where to find it, chunk by chunk. This way, we can read the code chunk by chunk and charge gas as we go (or charge upfront). No more atomic memory bombs. This is very similar to loading large files from a filesystem.

Each chunk is at most 32 KiB (the last chunk might be smaller). The manifest is stored under the hash of the full code.
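
As a rough illustration of the splitting rule above, here is a minimal sketch. The helper names split_into_chunks and needs_manifest are hypothetical and not part of the proposal; only the 32 KiB chunk size comes from the text.

```python
CHUNK_SIZE = 32 * 1024  # 32 KiB, as in the proposal


def split_into_chunks(code: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split code into consecutive slices of at most chunk_size bytes.

    Only the last slice can be shorter than chunk_size.
    """
    return [code[i:i + chunk_size] for i in range(0, len(code), chunk_size)]


def needs_manifest(code: bytes, chunk_size: int = CHUNK_SIZE) -> bool:
    """Code of chunk_size bytes or less is stored directly, as today."""
    return len(code) > chunk_size


# 80 KiB of zero bytes as a stand-in for real bytecode
code = bytes(80 * 1024)
chunks = split_into_chunks(code)
print([len(c) for c in chunks])  # [32768, 32768, 16384]
```

Concatenating the chunks in order gives back the original code, which is what makes reassembly on the read side straightforward.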

[Figure: code chunk]

The manifest

It’s just an RLP list prefixed with a magic byte:

0xfe || rlp([ total_length, chunk_hash_0, chunk_hash_1, ..., chunk_hash_n ])

0xfe is the magic byte that identifies this as a manifest rather than raw bytecode.
total_length is how big the full code is. This can be consumed directly by EXTCODESIZE.
Each chunk_hash_i is the keccak256 hash of the corresponding slice of code (32 KiB, except possibly the last one).
Each chunk is stored in the DB under its hash.
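
The write side of this format could look roughly like the sketch below. It is an illustration under stated assumptions, not a reference implementation: the RLP encoder is a hand-written minimal one covering only byte strings, integers, and a flat list; build_manifest and the other helper names are hypothetical; and stand_in_hash uses hashlib.sha3_256 purely so the example runs with the standard library. SHA3-256 is not keccak256 (the padding differs), so a real client would plug in an actual keccak256 function.

```python
import hashlib

MANIFEST_MAGIC_BYTE = 0xFE
CHUNK_SIZE = 32 * 1024  # 32 KiB


def stand_in_hash(data: bytes) -> bytes:
    # ASSUMPTION: stand-in only. NIST SHA3-256 gives different digests than
    # keccak256; swap in a real keccak256 implementation for actual use.
    return hashlib.sha3_256(data).digest()


def _rlp_length_prefix(length: int, offset: int) -> bytes:
    # Short form for payloads up to 55 bytes, long form otherwise.
    if length <= 55:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode_bytes(item: bytes) -> bytes:
    # A single byte below 0x80 is its own encoding.
    if len(item) == 1 and item[0] < 0x80:
        return item
    return _rlp_length_prefix(len(item), 0x80) + item


def rlp_encode_int(value: int) -> bytes:
    # Integers are encoded as minimal big-endian byte strings (0 -> empty).
    return rlp_encode_bytes(value.to_bytes((value.bit_length() + 7) // 8, "big"))


def build_manifest(code: bytes, hash_fn=stand_in_hash) -> tuple[bytes, dict[bytes, bytes]]:
    """Return (manifest, {chunk_hash: chunk}) for the given code."""
    chunks = [code[i:i + CHUNK_SIZE] for i in range(0, len(code), CHUNK_SIZE)]
    hashes = [hash_fn(c) for c in chunks]
    payload = rlp_encode_int(len(code)) + b"".join(rlp_encode_bytes(h) for h in hashes)
    manifest = bytes([MANIFEST_MAGIC_BYTE]) + _rlp_length_prefix(len(payload), 0xC0) + payload
    return manifest, dict(zip(hashes, chunks))


# 80 KiB of non-repeating filler bytes, so all three chunks differ
code = b"".join(i.to_bytes(4, "big") for i in range(20 * 1024))
manifest, chunk_store = build_manifest(code)
print(len(manifest), len(chunk_store))  # 106 3
```

The manifest for this 80 KiB example is 106 bytes: one magic byte, a two-byte list prefix, four bytes for total_length, and 33 bytes per chunk hash. The manifest itself would be stored under the hash of the full code, and each chunk under its own hash.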


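# Assumes the pyrlp and eth-utils packages; db is any store exposing get_key(key) -> bytes
import rlp
from eth_utils import keccak

CHUNK_SIZE = 32 * 1024  # 32 KiB
MANIFEST_MAGIC_BYTE = 0xfe  # INVALID opcode (see: EIP-141)

def get_code(db, code_hash):
    raw = db.get_key(code_hash)

    # Check for magic byte indicating a manifest
    if len(raw) > 0 and raw[0] == MANIFEST_MAGIC_BYTE:
        # Remove magic byte before decoding
        manifest = rlp.decode(raw[1:])
        # RLP yields byte strings, so convert the length to an integer
        total_len = int.from_bytes(manifest[0], "big")
        chunk_hashes = manifest[1:]

        # Reassemble the code chunk by chunk, in manifest order
        code = bytearray()
        for h in chunk_hashes:
            chunk = db.get_key(h)
            if len(chunk) > CHUNK_SIZE:
                raise ValueError("Invalid chunk size")
            code += chunk

        if len(code) != total_len:
            raise ValueError("Length mismatch")

        # The manifest is keyed by the keccak256 hash of the full code
        if keccak(bytes(code)) != code_hash:
            raise ValueError("Hash mismatch")

        return bytes(code)
    else:
        # It's just code, not a manifest
        return raw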

Pros

Backwards-compatible. Small contracts keep working exactly as before.
Content-addressable. You still look things up by code_hash.
Gas-safe. You can charge gas for each chunk as it’s read.
Minimal structure. The manifest is just a list: one length, N hashes.
Storage efficient. Shared code chunks across contracts can reuse storage.
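
To make the gas-safety point concrete, here is a minimal sketch of charging before each chunk is read. Everything in it is an assumption for illustration: GAS_PER_CHUNK is a made-up number, and GasMeter, OutOfGas, and load_chunks_metered are hypothetical names, with a plain dict standing in for the database.

```python
GAS_PER_CHUNK = 1000  # hypothetical cost; the proposal does not fix a price


class OutOfGas(Exception):
    pass


class GasMeter:
    def __init__(self, gas_left: int) -> None:
        self.gas_left = gas_left

    def charge(self, amount: int) -> None:
        if amount > self.gas_left:
            raise OutOfGas(f"need {amount}, have {self.gas_left}")
        self.gas_left -= amount


def load_chunks_metered(db: dict, chunk_hashes: list, meter: GasMeter) -> bytes:
    """Read chunks in order, paying for each one before it is loaded."""
    code = bytearray()
    for h in chunk_hashes:
        meter.charge(GAS_PER_CHUNK)  # pay first, so an unaffordable chunk is never read
        code += db[h]
    return bytes(code)


# Two tiny stand-in chunks keyed by placeholder hashes
db = {b"h0": b"\x00" * 4, b"h1": b"\x01" * 4}
meter = GasMeter(gas_left=1500)
try:
    load_chunks_metered(db, [b"h0", b"h1"], meter)
except OutOfGas:
    print("stopped after one chunk, gas left:", meter.gas_left)  # gas left: 500
```

With enough gas for only one chunk, the loader stops before touching the second one, so memory use is bounded by what the caller has paid for.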

Cons

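The manifest is not strictly content-addressable: it is stored under the hash of the full code rather than the hash of the manifest itself (hash(value) ≠ key), although the key is still deterministically derived from the content. This must be accounted for in applications relying on pure content-addressed storage.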

Possible enhancement: Dynamic chunk size

We could make the chunk size dynamic by including it in the manifest itself. This would allow the system to adjust chunk sizes over time or optimize for different contract sizes.

Enhanced manifest format:

0xfe || rlp([ total_length, chunk_size, chunk_hash_0, chunk_hash_1, ..., chunk_hash_n ])

With this change, the chunk_size would specify the maximum size of each chunk in this particular manifest. This adds flexibility:

Very large contracts could use bigger chunks for efficiency
The protocol could evolve chunk size parameters over time
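
A reader of the enhanced format might validate it along these lines. This is a sketch under assumptions: it starts from the already RLP-decoded list of byte strings, the Manifest dataclass and parse_dynamic_manifest are hypothetical names, and the chunk-count check assumes that every chunk except the last is exactly chunk_size bytes long.

```python
from dataclasses import dataclass


@dataclass
class Manifest:
    total_length: int
    chunk_size: int
    chunk_hashes: list


def parse_dynamic_manifest(fields: list) -> Manifest:
    """Interpret [total_length, chunk_size, hash_0, ...], each item given as bytes."""
    if len(fields) < 2:
        raise ValueError("manifest needs total_length and chunk_size")
    total_length = int.from_bytes(fields[0], "big")
    chunk_size = int.from_bytes(fields[1], "big")
    chunk_hashes = list(fields[2:])
    if chunk_size == 0:
        raise ValueError("chunk_size must be positive")
    # ASSUMPTION: all chunks but the last are full, so the count is a ceiling division
    expected = -(-total_length // chunk_size)
    if len(chunk_hashes) != expected:
        raise ValueError(f"expected {expected} chunk hashes, got {len(chunk_hashes)}")
    return Manifest(total_length, chunk_size, chunk_hashes)


# A 200 KiB contract with 64 KiB chunks needs four chunk hashes (placeholders here)
fields = [
    (200 * 1024).to_bytes(3, "big"),
    (64 * 1024).to_bytes(3, "big"),
    b"a" * 32, b"b" * 32, b"c" * 32, b"d" * 32,
]
m = parse_dynamic_manifest(fields)
print(m.total_length, m.chunk_size, len(m.chunk_hashes))  # 204800 65536 4
```

Checking the chunk count against total_length and chunk_size lets a client reject an inconsistent manifest before fetching any chunk.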
