Ethereum limits the size of contract code to 24 KiB. This is increasingly restrictive as gas limits rise, but bigger contracts can become a DoS vector.
When you deploy code, it’s stored under its hash. When you load that code, the whole blob is read from storage, and only then is gas charged. If the blob is too big, it could blow up memory before you even get to the gas meter. That’s the core problem.
We must fix the underlying reason the limit exists before we increase it.
Storing code in chunks
Here’s the idea:
If a contract’s code is small (32 KiB or less), nothing changes. We store the code under its hash, just like today. This ensures backwards compatibility.
If the code is bigger, we don’t store the blob directly. Instead, we store a manifest. The manifest records how big the code is and where to find it, chunk by chunk. This way, we can read the code chunk by chunk and charge gas as we go (or charge upfront). No more atomic memory bombs. This is very similar to loading large files from a filesystem.
Each chunk is at most 32 KiB (the last chunk might be smaller). The manifest is stored under the hash of the full code.
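A minimal sketch of the chunking step, assuming a plain Python dict stands in for the database. `split_into_chunks` and `store_chunks` are hypothetical names. Since `keccak256` is not in Python's standard library, `hashlib.sha3_256` stands in for it. SHA3-256 pads differently from Keccak-256, so these digests will not match Ethereum's.

```python
import hashlib

CHUNK_SIZE = 32 * 1024  # 32 KiB


def h(data: bytes) -> bytes:
    # Stand-in for keccak256; SHA3-256 is NOT Ethereum's Keccak-256
    return hashlib.sha3_256(data).digest()


def split_into_chunks(code: bytes) -> list[bytes]:
    # Fixed-size slices; only the last one may be shorter than CHUNK_SIZE
    return [code[i:i + CHUNK_SIZE] for i in range(0, len(code), CHUNK_SIZE)]


def store_chunks(db: dict, code: bytes) -> list[bytes]:
    # Store each chunk under its own hash; return the hashes in code order
    hashes = []
    for chunk in split_into_chunks(code):
        key = h(chunk)
        db[key] = chunk  # identical chunks share a key, so they are stored once
        hashes.append(key)
    return hashes


db = {}
hashes = store_chunks(db, bytes(100 * 1024))  # 100 KiB of zero bytes
print(len(hashes))  # 4 (three 32 KiB chunks plus one 4 KiB chunk)
print(len(db))      # 2 (the three identical 32 KiB chunks are stored once)
```

The deduplication shown here is what makes shared chunks cheap. It only applies when shared code happens to line up on the same 32 KiB boundaries, because chunks are cut at fixed offsets.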
The manifest
It’s just an RLP list prefixed with a magic byte:
0xfe || rlp([ total_length, chunk_hash_0, chunk_hash_1, ..., chunk_hash_n ])
- `0xfe` is the magic byte that identifies this as a manifest rather than raw bytecode.
- `total_length` is the size of the full code in bytes. `EXTCODESIZE` can consume it directly.
- Each `chunk_hash_i` is the `keccak256` hash of the corresponding 32 KiB slice of code.
- Each chunk is stored in the DB under its hash.
```python
import rlp  # pyrlp
from eth_hash.auto import keccak as keccak256

CHUNK_SIZE = 32 * 1024  # 32 KiB
MANIFEST_MAGIC_BYTE = 0xfe  # INVALID opcode (see: EIP-141)


def get_code(db, code_hash: bytes) -> bytes:
    # `db` is any key-value store exposing get_key(key) -> bytes
    raw = db.get_key(code_hash)

    # Check for the magic byte indicating a manifest.
    # Note: this assumes stored raw bytecode never begins with 0xfe.
    if len(raw) > 0 and raw[0] == MANIFEST_MAGIC_BYTE:
        # Remove the magic byte before RLP-decoding
        manifest = rlp.decode(raw[1:])
        # RLP encodes integers as minimal big-endian bytes
        total_len = int.from_bytes(manifest[0], "big")
        chunk_hashes = manifest[1:]

        code = bytearray()
        for h in chunk_hashes:
            chunk = db.get_key(h)
            if len(chunk) > CHUNK_SIZE:
                raise ValueError("Invalid chunk size")
            code += chunk

        if len(code) != total_len:
            raise ValueError("Length mismatch")
        if keccak256(bytes(code)) != code_hash:
            raise ValueError("Hash mismatch")
        return bytes(code)

    # It's just code, not a manifest
    return raw
```
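The snippet above covers reading. For the write side, here is a sketch that serializes a manifest as `0xfe || rlp([total_length, chunk_hash_0, ...])`. Instead of a full RLP library, it uses a minimal, hand-rolled RLP encoder that handles only a flat list of byte strings, which is all the manifest needs. `total_length` is encoded the way RLP encodes integers (minimal big-endian bytes). `encode_manifest` and the helper names are hypothetical, and the chunk hashes are passed in, so no hash function is involved.

```python
MANIFEST_MAGIC_BYTE = 0xfe


def _length_prefix(length: int, offset: int) -> bytes:
    # RLP prefix: short form for payloads under 56 bytes, long form otherwise
    if length < 56:
        return bytes([offset + length])
    len_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(len_bytes)]) + len_bytes


def rlp_encode_bytes(b: bytes) -> bytes:
    # A single byte below 0x80 is its own encoding
    if len(b) == 1 and b[0] < 0x80:
        return b
    return _length_prefix(len(b), 0x80) + b


def rlp_encode_list(items: list[bytes]) -> bytes:
    # Flat lists of byte strings only; nested lists are not supported
    payload = b"".join(rlp_encode_bytes(item) for item in items)
    return _length_prefix(len(payload), 0xC0) + payload


def encode_int(n: int) -> bytes:
    # RLP integers: minimal big-endian bytes, with 0 encoded as the empty string
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


def encode_manifest(total_length: int, chunk_hashes: list[bytes]) -> bytes:
    body = rlp_encode_list([encode_int(total_length)] + chunk_hashes)
    return bytes([MANIFEST_MAGIC_BYTE]) + body


# Dummy 32-byte hashes for a 100 KiB contract (four chunks)
manifest = encode_manifest(100 * 1024, [bytes([i]) * 32 for i in range(4)])
print(len(manifest))  # 139
```

The 139 bytes break down as the magic byte, a 2-byte list prefix (`0xf8 0x88`), 4 bytes for `total_length` (`0x83 0x01 0x90 0x00`), and 33 bytes per chunk hash.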
Pros
- Backwards-compatible. Small contracts keep working exactly as before.
- Content-addressable. You still look things up by code_hash.
- Gas-safe. You can charge gas for each chunk as it’s read.
- Minimal structure. The manifest is just a list: one length, N hashes.
- Storage efficient. Shared code chunks across contracts can reuse storage.
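To illustrate the gas-safety point, here is a sketch of a loader that charges before touching chunk data. It covers both options mentioned earlier: charging upfront from `total_length`, or charging per chunk as each one is read. It makes several assumptions: the manifest has already been decoded into `total_length` and a list of chunk hashes, the DB is a dict, and the flat per-chunk cost `GAS_PER_CHUNK` and the `GasMeter` class are hypothetical. None of these come from the proposal.

```python
CHUNK_SIZE = 32 * 1024  # 32 KiB
GAS_PER_CHUNK = 2_600   # hypothetical per-chunk cost, not a real protocol constant


class OutOfGas(Exception):
    pass


class GasMeter:
    def __init__(self, gas: int):
        self.gas = gas

    def charge(self, amount: int) -> None:
        # Fail without deducting anything if the remaining gas is insufficient
        if amount > self.gas:
            raise OutOfGas(f"need {amount}, have {self.gas}")
        self.gas -= amount


def load_chunks(db: dict, total_length: int, chunk_hashes: list[bytes],
                meter: GasMeter, upfront: bool = True) -> bytes:
    # total_length fixes how many chunks a well-formed manifest can list
    n_chunks = (total_length + CHUNK_SIZE - 1) // CHUNK_SIZE
    if len(chunk_hashes) != n_chunks:
        raise ValueError("Wrong number of chunks")
    if upfront:
        # Charge for all chunks before reading any of them from the DB
        meter.charge(n_chunks * GAS_PER_CHUNK)
    code = bytearray()
    for h in chunk_hashes:
        if not upfront:
            # Charge for this chunk before reading it
            meter.charge(GAS_PER_CHUNK)
        chunk = db[h]
        if len(chunk) > CHUNK_SIZE:
            raise ValueError("Invalid chunk size")
        code += chunk
    if len(code) != total_length:
        raise ValueError("Length mismatch")
    return bytes(code)
```

Either way, no chunk is read from the DB before it has been paid for.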
Cons
- The manifest is not strictly content-addressable (hash(value) ≠ key), but its key is deterministically derived from the content. Applications relying on pure content-addressed storage must account for this.
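Concretely, a store-wide integrity check of the form hash(value) == key would flag every manifest as corrupt. The sketch below verifies an entry directly when it hashes to its key. For a manifest, it instead reassembles the code and hashes that. It makes several assumptions: the DB is a dict, the RLP decoder is a minimal hand-rolled one for a flat list of byte strings, and `verify_entry` is a hypothetical name. `hashlib.sha3_256` stands in for `keccak256`; it pads differently, so the digests differ from Ethereum's.

```python
import hashlib

MANIFEST_MAGIC_BYTE = 0xfe


def h(data: bytes) -> bytes:
    # Stand-in for keccak256 (SHA3-256 pads differently from Keccak-256)
    return hashlib.sha3_256(data).digest()


def rlp_decode_flat_list(data: bytes) -> list[bytes]:
    # Minimal RLP decoder for a list of byte strings (no nested lists)
    def read_len(pos: int, short_base: int, long_base: int) -> tuple[int, int]:
        prefix = data[pos]
        if prefix <= long_base:  # short form: length is in the prefix byte
            return pos + 1, prefix - short_base
        n = prefix - long_base  # long form: the next n bytes hold the length
        return pos + 1 + n, int.from_bytes(data[pos + 1:pos + 1 + n], "big")

    if not data or data[0] < 0xC0:
        raise ValueError("not an RLP list")
    pos, length = read_len(0, 0xC0, 0xF7)
    end = pos + length
    if end != len(data):
        raise ValueError("bad list length")
    items = []
    while pos < end:
        if data[pos] < 0x80:  # a single byte below 0x80 encodes itself
            items.append(data[pos:pos + 1])
            pos += 1
        elif data[pos] < 0xC0:
            start, n = read_len(pos, 0x80, 0xB7)
            items.append(data[start:start + n])
            pos = start + n
        else:
            raise ValueError("nested lists not supported")
    if pos != end:
        raise ValueError("truncated item")
    return items


def verify_entry(db: dict, key: bytes) -> bool:
    value = db[key]
    if h(value) == key:
        return True  # ordinary content-addressed entry (raw code or a chunk)
    if value[:1] == bytes([MANIFEST_MAGIC_BYTE]):
        # A manifest: hash(value) != key, so verify the reassembled code instead
        items = rlp_decode_flat_list(value[1:])
        code = b"".join(db[c] for c in items[1:])
        return len(code) == int.from_bytes(items[0], "big") and h(code) == key
    return False
```

Checking hash(value) == key first also means raw bytecode that happens to start with 0xfe is never mistaken for a manifest. The cost is one extra hash per entry.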
Possible enhancement: Dynamic chunk size
We could make the chunk size dynamic by including it in the manifest itself. This would allow the system to adjust chunk sizes over time or optimize for different contract sizes.
Enhanced manifest format:
0xfe || rlp([ total_length, chunk_size, chunk_hash_0, chunk_hash_1, ..., chunk_hash_n ])
With this change, `chunk_size` specifies the maximum size of each chunk in that particular manifest. This adds flexibility:
- Very large contracts could use bigger chunks for efficiency
- The protocol could evolve chunk size parameters over time
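Here is a sketch of how a reader might validate the enhanced manifest. It assumes the RLP list has already been decoded into byte strings (`[total_length, chunk_size, h0, ..., hn]`) and that chunks live in a dict-backed DB. `MAX_CHUNK_SIZE` is a hypothetical protocol cap and purely our assumption: without some bound, a manifest could declare a single huge chunk and bring back the original memory problem. The final `keccak256` check from `get_code` is omitted for brevity.

```python
MAX_CHUNK_SIZE = 256 * 1024  # hypothetical protocol-wide cap on chunk_size


def load_dynamic(db: dict, items: list[bytes]) -> bytes:
    # items: decoded manifest fields, integers as minimal big-endian bytes
    total_length = int.from_bytes(items[0], "big")
    chunk_size = int.from_bytes(items[1], "big")
    chunk_hashes = items[2:]
    if not 0 < chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError("chunk_size out of range")
    # The chunk count is fully determined by total_length and chunk_size
    expected = (total_length + chunk_size - 1) // chunk_size
    if len(chunk_hashes) != expected:
        raise ValueError("Wrong number of chunks")
    code = bytearray()
    for i, h in enumerate(chunk_hashes):
        chunk = db[h]
        is_last = i == len(chunk_hashes) - 1
        # Every chunk except the last must be exactly chunk_size bytes
        if len(chunk) > chunk_size or (not is_last and len(chunk) != chunk_size):
            raise ValueError("Invalid chunk size")
        code += chunk
    if len(code) != total_length:
        raise ValueError("Length mismatch")
    return bytes(code)
```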