Possible futures of the Ethereum protocol, part 6: The Splurge

2024 Oct 29



Special thanks to Justin Drake and Tim Beiko for feedback and review

Some things are just not easy to put into a single category. There are lots of "little things" in Ethereum protocol design that are very valuable for Ethereum's success, but don't fit nicely into a larger sub-category. In practice, about half of this chapter has ended up being about EVM improvements of various kinds, and the rest is made up of various niche topics. This is what "the Splurge" is for.


The Splurge, 2023 roadmap



The Splurge: key goals

In this chapter

EVM improvements

What problem does it solve?

The EVM today is difficult to statically analyze, making it difficult to create highly efficient implementations, formally verify code, and make further extensions to it over time. Additionally, it is highly inefficient, making it difficult to implement many forms of advanced cryptography unless they are explicitly supported through precompiles.

What is it, and how does it work?

The first step in the current EVM improvement roadmap, scheduled to be included in the next hard fork, is the EVM Object Format (EOF). EOF is a series of EIPs specifying a new version of EVM code with a number of distinct features, most notably:


Structure of EOF code


Old-style contracts would continue to exist and be creatable, although there is a possible path to deprecate old-style contracts (and perhaps even force-convert them to EOF code) eventually. New-style contracts would benefit from efficiency gains created by EOF - first, from slightly smaller bytecode taking advantage of the subroutine feature, and later from new EOF-specific features, or EOF-specific gas cost decreases.

After EOF is introduced, it becomes easier to introduce further upgrades. The most well-developed today is the EVM Modular Arithmetic Extensions (EVM-MAX). EVM-MAX creates a new set of operations specifically designed for modular arithmetic, and puts them into a new memory space that cannot be accessed with other opcodes. This enables the use of optimizations, such as Montgomery multiplication.
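As a rough illustration (not part of any EIP, and with purely illustrative parameter choices), here is what Montgomery multiplication looks like. The point is that conversions into and out of Montgomery form are relatively expensive, so keeping values in a dedicated memory space between operations, as EVM-MAX does, is what makes the optimization pay off.

```python
# Sketch of Montgomery multiplication; modulus and bit-width are illustrative only.

def montgomery_setup(N, bits=256):
    """Precompute parameters for an odd modulus N."""
    R = 1 << bits                      # R is a power of two with gcd(R, N) = 1
    N_prime = -pow(N, -1, R) % R       # N * N_prime = -1 (mod R)
    return R, N_prime

def redc(T, N, R, N_prime):
    """Reduce T to T * R^(-1) mod N using only multiplies, adds, and an exact division by R."""
    m = ((T % R) * N_prime) % R
    t = (T + m * N) // R               # exact division: T + m*N = 0 (mod R)
    return t - N if t >= N else t

def mont_mul(a_bar, b_bar, N, R, N_prime):
    """Multiply two values that are already in Montgomery form (x * R mod N)."""
    return redc(a_bar * b_bar, N, R, N_prime)

# Convert into Montgomery form once, do many cheap multiplications, and convert
# back at the end. A separate memory space that keeps values in this form
# between opcodes avoids paying the conversion cost on every operation.
N = 2**255 - 19
R, N_prime = montgomery_setup(N)
a, b = 12345, 67890
a_bar, b_bar = a * R % N, b * R % N
result = redc(mont_mul(a_bar, b_bar, N, R, N_prime), N, R, N_prime)
assert result == a * b % N
```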

A newer idea is to combine EVM-MAX with a single-instruction-multiple-data (SIMD) feature. SIMD has been around as an idea for Ethereum for a long time starting with Greg Colvin's EIP-616. SIMD can be used to speed up many forms of cryptography, including hash functions, 32-bit STARKs, and lattice-based cryptography. EVM-MAX plus SIMD make for a natural pair of performance-oriented extensions to the EVM.

An approximate design for a combined EIP would be to take EIP-6690 as a starting point, and then:

This would be powerful enough to implement elliptic curve cryptography, small-field cryptography (eg. Poseidon, circle STARKs), conventional hash functions (eg. SHA256, KECCAK, BLAKE), and lattice-based cryptography.

Other EVM upgrades may also be possible, but so far they have seen much less attention.

What is left to do, and what are the tradeoffs?

Currently, EOF is scheduled to be included in the next hard fork. While there is always a possibility to remove it - features have been last-minute-removed from hard forks before - doing so would be an uphill battle. Removing EOF would imply making any future upgrades to the EVM without EOF, which can be done but may be more difficult.

The main tradeoff in EVM improvement is L1 complexity versus infrastructure complexity. EOF is a significant amount of code to add to EVM implementations, and the static code checks are pretty complex. In exchange, however, we get simplifications to higher-level languages, simplifications to EVM implementations, and other benefits. Arguably, a roadmap which prioritizes continued improvement to the Ethereum L1 would include and build on EOF.

One important piece of work to do is to implement something like EVM-MAX plus SIMD and benchmark how much gas various cryptographic operations would take.

How does it interact with other parts of the roadmap?

The L1 adjusting its EVM makes it easier for L2s to do the same. One adjusting without the other creates some incompatibilities, which has its own downsides. Additionally, EVM-MAX plus SIMD can reduce gas costs for many proof systems, enabling more efficient L2s. It also makes it easier to remove more precompiles, by replacing them with EVM code that can perform the same task, perhaps without a large penalty to efficiency.

Account abstraction

What problem does it solve?

Today, a transaction can only be verified in one way: an ECDSA signature. Originally, account abstraction was meant to expand beyond this, and allow an account's verification logic to be arbitrary EVM code. This could enable a range of applications:

Since account abstraction began in 2015, the goals have expanded to also include a large set of "convenience goals", such as an account that has no ETH but has some ERC20 being able to pay gas in that ERC20. One summary of these goals is the following chart:



MPC here is multi-party computation: a 40-year-old technique to split a key into multiple pieces that are stored on multiple devices, and use cryptographic techniques to generate a signature without combining the pieces of the key directly.
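To make the key-splitting idea concrete, here is a toy sketch (the modulus is an illustrative stand-in, and nothing here resembles a production protocol): the secret key is split into additive shares, and in real threshold-signing protocols each device contributes a partial signature, so only signatures, never key shares, are ever combined.

```python
import secrets

MODULUS = 2**255 - 19          # illustrative stand-in for a signature group's order

secret_key = secrets.randbelow(MODULUS)

# Split the key into two additive shares held on two different devices
share_a = secrets.randbelow(MODULUS)
share_b = (secret_key - share_a) % MODULUS

# Each share on its own is a uniformly random number that reveals nothing about
# the key; reconstructing the key requires both shares, and real MPC signing
# protocols never perform this reconstruction at all.
assert (share_a + share_b) % MODULUS == secret_key
```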

EIP-7702 is planned to be introduced in the next hard fork. It grew out of the recognition that the convenience benefits of account abstraction should be given to all users, including EOA users (externally-owned accounts, ie. accounts controlled by ECDSA signatures), to improve user experience for everyone in the short term, and in a way that avoids bifurcation into two ecosystems. This work started with EIP-3074 and culminated in EIP-7702, which makes the "convenience features" of account abstraction available to all users today.

As we can see from the chart, while some challenges (especially the "convenience" challenges) can be solved with incremental techniques such as multi-party computation or EIP-7702, the bulk of the security goals that motivated the original account abstraction proposal can only be solved by going back and solving the original problem: allowing smart contract code to control transaction verification. The reason why this has not been done so far is that implementing it safely is a challenge.

What is it, and how does it work?

At the core, account abstraction is simple: allow transactions to be initiated by smart contracts, and not just EOAs. The entire complexity comes from doing this in a way that is friendly to maintaining a decentralized network and protecting against denial of service attacks.

One illustrative example of a key challenge is the multi-invalidation problem:



If there are 1000 accounts whose validation functions all depend on some single value S, and there are transactions in the mempool that are valid given the current value of S, then one single transaction flipping the value of S could invalidate all of the other transactions in the mempool. This allows an attacker to spam the mempool, clogging up the resources of nodes on the network, at a very low cost.
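A toy model of the attack (all names here are hypothetical) makes the asymmetry clear: one cheap state change forces the network to throw away and re-validate a large amount of mempool work.

```python
# One shared value S that many accounts' validation functions depend on
shared_state = {"S": True}

def validate(tx):
    return shared_state["S"] == tx["requires_S"]

# 1000 pending transactions, all currently valid
mempool = [{"sender": f"account_{i}", "requires_S": True} for i in range(1000)]
assert all(validate(tx) for tx in mempool)

# A single transaction flips S...
shared_state["S"] = False

# ...and every pending transaction becomes invalid at once, so nodes must
# re-validate and discard work that cost the attacker almost nothing to cause.
assert not any(validate(tx) for tx in mempool)
```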

Years of effort trying to expand functionality while limiting DoS risks have led to convergence on one solution for how to implement "ideal account abstraction": ERC-4337.



ERC-4337 works by dividing processing of user operations into two phases: validation and execution. All validations are processed first, and all executions are processed second. In the mempool, a user operation is only accepted if its validation phase only touches its own account, and does not read environmental variables. This prevents multi-invalidation attacks. A strict gas limit on the validation step is also enforced.
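A simplified sketch of that mempool rule is below. The trace format, opcode set and gas constant are hypothetical; the production validation-scope rules are considerably more detailed.

```python
from dataclasses import dataclass

MAX_VALIDATION_GAS = 200_000                  # illustrative, not the spec value
ENV_OPCODES = {"TIMESTAMP", "NUMBER", "BLOCKHASH", "BASEFEE", "PREVRANDAO"}

@dataclass
class ValidationTrace:                        # hypothetical trace of the validation phase
    gas_used: int
    storage_accessed: set                     # accounts whose storage was touched
    opcodes_used: set

def accept_into_mempool(sender: str, trace: ValidationTrace) -> bool:
    """Accept a user operation only if its validation phase is self-contained."""
    if trace.gas_used > MAX_VALIDATION_GAS:
        return False
    # Validation may only touch the sender's own account...
    if any(account != sender for account in trace.storage_accessed):
        return False
    # ...and may not read environment values that change from block to block,
    # so a single state change cannot invalidate many pending operations.
    if trace.opcodes_used & ENV_OPCODES:
        return False
    return True
```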

ERC-4337 was designed as an extra-protocol standard (an ERC), because at the time the Ethereum client developers were focused on the Merge, and did not have any spare capacity to work on other features. This is why ERC-4337 uses its own object called user operations, instead of regular transactions. More recently, however, we have been realizing that there is a need to enshrine at least parts of it in the protocol. Two key reasons are:

  1. The inherent inefficiencies of the EntryPoint being a contract: a flat ~100k gas overhead per bundle and thousands extra per user operation.
  2. The need to make sure Ethereum properties such as inclusion guarantees created by inclusion lists carry over to account abstraction users.

Additionally, ERC-4337 has been extended by two features:

  1. Paymasters: the ability for one account to pay fees on behalf of another. This violates the rule that only the sender account itself may be accessed during the validation phase, so special handling is introduced to allow the paymaster mechanism while ensuring that it is safe.
  2. Aggregators: support for signature aggregation, such as BLS aggregation or SNARK-based aggregation. This is needed to achieve the highest possible data efficiency on rollups.

What is left to do, and what are the tradeoffs?

The main remaining thing to figure out is how to fully bring account abstraction into the protocol. A recently popular enshrined account abstraction EIP is EIP-7701, which implements account abstraction on top of EOF. An account can have a separate code section for validation, and if an account has that code section set, that code is executed during the validation step of a transaction from that account.


EOF code structure for an EIP-7701 account


What is fascinating about this approach is that it makes it clear that there are two equivalent ways to view native account abstraction:

  1. ERC-4337, but as part of the protocol
  2. A new type of EOA, where the signature algorithm is EVM code execution

If we start with strict bounds on the complexity of code that can be executed during validation - allowing no external state access, and even at first setting a gas limit too low to be useful for quantum-resistant or privacy-preserving applications - then the safety of this approach is very clear: it's just swapping out ECDSA verification for an EVM code execution that takes a similar amount of time. However, over time we would need to loosen these bounds, because allowing privacy-preserving applications to work without relayers, and quantum resistance, are both very important. And in order to do this, we do need to find ways to address the DoS risks in a more flexible way, without requiring the validation step to be ultra-minimalistic.

The main tradeoff seems to be "enshrine something that fewer people are happy with, sooner" versus "wait longer, and perhaps get a more ideal solution". The ideal path is likely some hybrid. One hybrid approach is to enshrine some use cases more quickly, and leave more time to figure out others. Another is to deploy more ambitious versions of account abstraction on L2s first. However, this has the challenge that for an L2 team to be willing to do the work to adopt a proposal, they need to be confident that L1 and/or other L2s will adopt something compatible later on.

Another application that we need to think about explicitly is keystore accounts, which store account-related state on either L1 or a dedicated L2, but can be used on both L1 and any compatible L2. Doing this effectively likely requires L2s to support opcodes such as L1SLOAD or REMOTESTATICCALL, though it also requires account abstraction implementations on L2 to support it.

How does it interact with other parts of the roadmap?

Inclusion lists need to support account abstracted transactions. In practice, the needs of inclusion lists and the needs of decentralized mempools end up being pretty similar, though there is slightly more flexibility for inclusion lists. Additionally, account abstraction implementations should ideally be harmonized on L1 and L2 as much as possible. If, in the future, we expect most users to be using keystore rollups, the account abstraction designs should be built with this in mind.

EIP-1559 improvements

What problem does it solve?

EIP-1559 activated on Ethereum in 2021, and led to significant improvements in average block inclusion time.



However, the current implementation of EIP-1559 is imperfect in several ways:

The formula later used for blobs (EIP-4844) was explicitly designed to address the first concern, and is overall cleaner. Neither EIP-1559 itself, nor EIP-4844, attempt to address the second problem. As a result, the status quo is a confusing halfway state involving two different mechanisms, and there is even a case that over time both will need to be improved.

In addition to this, there are other weaknesses of Ethereum resource pricing that are independent of EIP-1559, but which could be solved by tweaks to EIP-1559. A major one is average case vs worst case discrepancies: resource prices in Ethereum have to be set to be able to handle the worst case, where a block's entire gas consumption takes up one resource, but average-case use is much less than this, leading to inefficiencies.



What is it, and how does it work?

A solution to these inefficiencies is multidimensional gas: having separate prices and limits for separate resources. This concept is technically independent from EIP-1559, but EIP-1559 makes it easier: without EIP-1559, optimally packing a block with multiple resource constraints is a complicated multidimensional knapsack problem. With EIP-1559, most blocks are not at full capacity on any resource, and so the simple algorithm of "accept anything that pays a sufficient fee" suffices.
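A toy version of that greedy rule (resource names and numbers are made up) shows why it is so simple: each transaction is checked independently against each resource's basefee and remaining capacity, with no packing optimization needed.

```python
basefees = {"execution": 12, "blob": 3, "calldata": 1}                     # gwei per unit
limits   = {"execution": 30_000_000, "blob": 786_432, "calldata": 1_000_000}

def try_include(tx, used):
    """Include a transaction if it fits under every limit and pays every basefee."""
    for resource, amount in tx["usage"].items():
        if used[resource] + amount > limits[resource]:
            return False
        if tx["max_fee"][resource] < basefees[resource]:
            return False
    for resource, amount in tx["usage"].items():
        used[resource] += amount
    return True

used = {r: 0 for r in limits}
tx = {"usage":   {"execution": 21_000, "blob": 0, "calldata": 500},
      "max_fee": {"execution": 15,     "blob": 3, "calldata": 2}}
assert try_include(tx, used)    # fits under every limit and pays every basefee
```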

We have multidimensional gas for execution and blobs today; in principle, we could increase this to more dimensions: calldata, state reads/writes, and state size expansion.

EIP-7706 introduces a new gas dimension for calldata. At the same time, it streamlines the multidimensional gas mechanism by making all three types of gas fall under one (EIP-4844-style) framework, thus also solving the mathematical flaws with EIP-1559.

EIP-7623 is a more surgical solution to the average case vs worst case resource problem that more strictly bounds max calldata without introducing a whole new dimension.

A further direction would be to tackle the update rate problem: find a basefee calculation algorithm that adjusts faster, while preserving the key invariant introduced by the EIP-4844 mechanism (namely, that in the long run, average usage approaches exactly the target).
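For reference, the EIP-4844-style mechanism that such an algorithm would need to preserve looks roughly like this (the constants shown are the blob-gas values from EIP-4844; a generalized version would pick its own): the basefee is an exponential function of accumulated excess usage, which is what pushes long-run average usage toward the target.

```python
MIN_BASE_FEE = 1
TARGET_PER_BLOCK = 393_216           # EIP-4844's target blob gas per block
UPDATE_FRACTION = 3_338_477          # controls how fast the fee can move per block

def fake_exponential(factor, numerator, denominator):
    """Integer approximation of factor * e^(numerator / denominator), as in EIP-4844."""
    i, output, accum = 1, 0, factor * denominator
    while accum > 0:
        output += accum
        accum = accum * numerator // (denominator * i)
        i += 1
    return output // denominator

def next_excess(prev_excess, gas_used):
    # Excess accumulates usage above the target, floored at zero
    return max(prev_excess + gas_used - TARGET_PER_BLOCK, 0)

def base_fee(excess):
    return fake_exponential(MIN_BASE_FEE, excess, UPDATE_FRACTION)

# Sustained usage above target makes the excess (and hence the fee) grow without
# bound, and usage below target lets it fall back, so long-run average usage is
# forced toward exactly the target.
```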

What is left to do, and what are the tradeoffs?

Multidimensional gas has two primary tradeoffs:

  1. It increases protocol complexity.
  2. It increases the complexity of the optimal algorithm needed to fill a block to capacity.

Protocol complexity is a relatively small issue for calldata, but becomes a larger issue for gas dimensions that are "inside the EVM", such as storage reads and writes. The problem is that it's not just users that set gas limits: it's also contracts that set limits when they call other contracts. And today, the only way they have to set limits is one-dimensional.

One easy way to eliminate this problem is to make multidimensional gas only available inside EOF, because EOF does not allow contracts to set gas limits in calls to other contracts. Non-EOF contracts would have to pay a fee in all types of gas when making a storage operation (eg. if an SLOAD costs 0.03% of a block's storage access gas limit, the non-EOF user would also be charged 0.03% of the execution gas limit).
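As a quick worked version of that parenthetical example (with made-up limits): charging a non-EOF contract the same fraction of every gas dimension keeps one-dimensional gas limits meaningful for it.

```python
EXECUTION_GAS_LIMIT = 30_000_000
STORAGE_ACCESS_GAS_LIMIT = 100_000       # hypothetical new dimension

def non_eof_sload_charge(storage_access_gas):
    """Charge the same fraction of the execution dimension as of the storage dimension."""
    fraction = storage_access_gas / STORAGE_ACCESS_GAS_LIMIT
    return storage_access_gas, round(fraction * EXECUTION_GAS_LIMIT)

# An SLOAD worth 0.03% of the storage-access limit (30 units here) also burns
# 0.03% of the execution limit (9,000 gas here).
assert non_eof_sload_charge(30) == (30, 9_000)
```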

More research on multidimensional gas would be very helpful in understanding the tradeoffs and figuring out the ideal balance.

How does it interact with other parts of the roadmap?

A successful implementation of multidimensional gas can greatly reduce certain "worst-case" resource usages, and thus reduce pressure on the need to optimize performance in order to support eg. STARKed hash-based binary trees. Having a hard target for state size growth would make it much easier for client developers to plan and estimate their requirements going forward.

As described above, EOF makes more extreme versions of multidimensional gas significantly easier to implement due to its gas non-observability properties.

Verifiable delay functions (VDFs)

What problem does it solve?

Today, Ethereum uses RANDAO-based randomness to choose proposers. RANDAO-based randomness works by asking each proposer to reveal a secret that they committed to ahead of time, and mixing each revealed secret into the randomness. Each proposer thus has "1 bit of manipulation": they can change the randomness (at a cost) by not showing up. This is reasonably okay for finding proposers, because it's very rare that you can give yourself two new proposal opportunities by giving up one. But it's not okay for on-chain applications that need randomness. Ideally, we would find a more robust source of randomness.
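A stripped-down model of RANDAO mixing (the hashing and XOR here are only illustrative) shows where the "1 bit of manipulation" comes from: the last revealer can see both possible outcomes and choose between them only by revealing or withholding.

```python
from hashlib import sha256

def mix(randao, reveal):
    digest = sha256(reveal).digest()
    return bytes(a ^ b for a, b in zip(randao, digest))

randao = bytes(32)
for secret in [b"proposer-1-secret", b"proposer-2-secret"]:
    randao = mix(randao, secret)

# The final proposer committed to their secret long ago, so they cannot choose
# its value now, but they can still compare the two reachable outcomes and
# withhold the reveal (forfeiting rewards) if they prefer the unmixed one.
if_revealed = mix(randao, b"proposer-3-secret")
if_withheld = randao
```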

What is it, and how does it work?

Verifiable delay functions are a type of function that can only be computed sequentially, with no speedups from parallelization. A simple example is repeated hashing: compute for i in range(10**9): x = hash(x). The output, proven with a SNARK proof of correctness, could be used as a random value. The idea is that the input is selected based on information available at time T, and the output is not yet known at time T: it only becomes available some time after T, once someone fully runs the computation. Because anyone can run the computation, there is no possibility to withhold the result, and so there is no ability to manipulate the outcome.
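A minimal sketch of the repeated-hashing construction (with a small iteration count so it actually runs) is below; in a real deployment the output would come with a SNARK proof so that verifiers do not have to redo the work.

```python
from hashlib import sha256

def vdf(seed: bytes, iterations: int = 10**6) -> bytes:
    """Inherently sequential: each step needs the previous output, so no parallel speedup."""
    x = seed
    for _ in range(iterations):
        x = sha256(x).digest()
    return x

# The seed is fixed at time T (eg. from RANDAO). Nobody knows the output until
# roughly `iterations` hash-times later, and since anyone can finish the
# computation, the result can neither be withheld nor manipulated.
output = vdf(b"randao-mix-at-time-T")
```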

The main risk to a verifiable delay function is unexpected optimization: someone figures out how to run the function much faster than expected, allowing them to manipulate the information they reveal at time T based on the future output. Unexpected optimization can happen in two ways:

The task of creating a successful VDF is to avoid these two issues, while at the same time keeping efficiency practical (eg. one problem with the hash-based approach is that SNARK-proving over hashing in real time has heavy hardware requirements). Hardware acceleration is typically addressed by having a public-good actor itself create and distribute reasonably-close-to-optimal ASICs for the VDF.

What is left to do, and what are the tradeoffs?

Currently, there is no VDF construction that fully satisfies Ethereum researchers on all axes. More work is left to find such a function. If we have it, the main tradeoff is simply whether or not to include it: a simple tradeoff of functionality versus protocol complexity and risk to security. If we think a VDF is secure, but it ends up being insecure, then depending on how it's implemented security degrades to either the RANDAO assumption (1 bit of manipulation per attacker) or something slightly worse. Hence, even a broken VDF would not break the protocol, though it would break applications or any new protocol features that strongly depend on it.

How does it interact with other parts of the roadmap?

The VDF is a relatively self-contained ingredient of the Ethereum protocol, though in addition to increasing the security of proposer selection it also has uses in (i) onchain applications that depend on randomness, and potentially (ii) encrypted mempools, though making encrypted mempools based on a VDF still depends on additional cryptographic discoveries which have not yet happened.

One point to keep in mind is that given uncertainty in hardware, there will be some "slack" between when a VDF output is produced and when it becomes needed. This means that information will be accessible a few blocks ahead. This can be an acceptable cost, but should be taken into account in eg. single-slot finality or committee selection designs.

Obfuscation and one-shot signatures: the far future of cryptography

What problem does it solve?

One of Nick Szabo's most famous posts is a 1997 essay on "God protocols". In this essay, he points out that often, multi-party applications depend on a "trusted third party" to manage the interaction. The role of cryptography, in his view, is to create a simulated trusted third party that does the same job, without actually requiring any trust in any specific actor.


"Mathematically trustworthy protocol", diagram by Nick Szabo


So far, we have only been able to partially approach this ideal. If all we need is a transparent virtual computer, where the data and computation cannot be shut down, censored or tampered with, but privacy is not a goal, then blockchains can do it, though with limited scalability. If privacy is a goal, then up until recently we have only been able to make a few specific protocols for specific applications: digital signatures for basic authentication, ring signatures and linkable ring signatures for primitive forms of anonymity, identity-based encryption to enable more convenient encryption under specific assumptions about a trusted issuer, blind signatures for Chaumian e-cash, and so on. This approach requires lots of work for every new application.

In the 2010s, we saw the first glimpse of a different, and more powerful approach, based on programmable cryptography. Instead of creating a new protocol for each new application, we could use powerful new protocols - specifically, ZK-SNARKs - to add cryptographic guarantees to arbitrary programs. ZK-SNARKs allow a user to prove any arbitrary statement about data that they hold, in a way that the proof (i) is easy to verify, and (ii) does not leak any data other than the statement itself. This was a huge step forward for privacy and scalability at the same time, that I have likened to the effect of transformers in AI. Thousands of man-years of application-specific work were suddenly swept away by a general-purpose solution that you can just plug in to solve a surprisingly wide range of problems.

But ZK-SNARKs are only the first in a trio of similar extremely powerful general-purpose primitives. These protocols are so powerful that when I think of them, they remind me of a set of extremely powerful cards in Yu-Gi-Oh, a card game and a TV show that I used to play and watch when I was a young child: the Egyptian god cards. The Egyptian god cards are a trio of extremely powerful cards, which according to legend are potentially deadly to manufacture, and are so powerful that they are not allowed in duels. Similarly, in cryptography, we have the trio of Egyptian god protocols:



What is it, and how does it work?

ZK-SNARKs are one of these three protocols that we already have, to a high level of maturity. After large improvements to prover speed and developer-friendliness in the last five years, ZK-SNARKs have become the bedrock of Ethereum's scalability and privacy strategy. But ZK-SNARKs have an important limitation: you need to know the data to make proofs about it. Each piece of state in a ZK-SNARK application must have a single "owner", who must be around to approve any reads or writes to it.

The second protocol, which does not have this limitation, is fully homomorphic encryption (FHE). FHE lets you do any computation on encrypted data without seeing the data. This lets you do computations on a user's data for the user's benefit while keeping the data and the algorithm private. It also lets you extend voting systems such as MACI to have almost-perfect security and privacy guarantees. FHE was for a long time considered too inefficient for practical use, but now it's finally becoming efficient enough that we are starting to see applications.
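FHE itself is far too involved to sketch in a few lines, but a toy additively homomorphic scheme (Paillier, with insecurely small primes) shows the basic flavor of computing on data you cannot read; real FHE extends this from addition to arbitrary circuits.

```python
import random

p, q = 999_983, 1_000_003              # toy primes; far too small to be secure
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1)
g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    u = pow(c, lam, n2)
    return ((u - 1) // n) * pow(lam, -1, n) % n

# A party holding only ciphertexts can add the underlying plaintexts
# (multiplying Paillier ciphertexts adds the messages) without learning them.
a, b = encrypt(20), encrypt(22)
assert decrypt(a * b % n2) == 42
```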


Cursive, an application that uses two-party computation and FHE to do privacy-preserving discovery of common interests.


But FHE too has its limits: any FHE-based technology still requires someone to hold the decryption key. This could be an M-of-N distributed setup, and you can even use TEEs to add a second layer of defense, but it's still a limitation.

This gets us to the third protocol, which is more powerful than the other two combined: indistinguishability obfuscation. While it's still very far from maturity, as of 2020 we have theoretically valid protocols for it based on standard security assumptions, and work on implementations has recently begun. Indistinguishability obfuscation lets you create an "encrypted program" that performs an arbitrary computation, in such a way that all internal details of the program are hidden. As a simple example, you can put a private key into an obfuscated program which only lets you use it to sign prime numbers, and distribute this program to other people. They can use the program to sign any prime number, but cannot take the key out. But it's far more powerful than that: together with hashes, it can be used to implement any other cryptographic primitive, and more.

The only thing that an obfuscated program can't do is prevent itself from being copied. But for that, there is something even more powerful on the horizon, though it depends on everyone having quantum computers: quantum one-shot signatures.



With obfuscation and one-shot signatures together, we can build almost perfect trustless third parties. The only thing we can't do with cryptography alone, and that we would still need a blockchain for, is guaranteeing censorship resistance. These technologies would allow us to not only make Ethereum itself much more secure, but also build much more powerful applications on top of it.

To see how each of these primitives adds additional power, let us go through a key example: voting. Voting is a fascinating problem because it has so many tricky security properties that need to be satisfied, including very strong forms of both verifiability and privacy. While voting protocols with strong security properties have existed for decades, let us make the problem harder for ourselves by saying that we want a design that can handle arbitrary voting protocols: quadratic voting, pairwise-bounded quadratic funding, cluster-matching quadratic funding, and so on. That is, we want the "tallying" step to be an arbitrary program.

Indistinguishability obfuscation also allows for other powerful applications. For example:

With one-shot signatures, we can make blockchains immune to finality-reverting 51% attacks, though censorship attacks continue to be possible. Primitives similar to one-shot signatures enable quantum money, solving the double-spend problem without a blockchain, though many more complex applications would still require a chain.

If these primitives can be made efficient enough, then most applications in the world can be made decentralized. The main bottleneck would be verifying the correctness of implementations.

What is left to do, and what are the tradeoffs?

There is a heck of a lot left to do. Indistinguishability obfuscation is incredibly immature, and candidate constructions are millions of times too slow (if not more) to be usable in applications. Indistinguishability obfuscation is famous for having runtimes that are "theoretically" polynomial-time, but take longer than the lifetime of the universe to run in practice. More recent protocols have made runtimes less extreme, but the overhead is still far too high for regular use: one implementer expects a runtime of one year.

Quantum computers do not even exist: all constructions you might read about on the internet today are either prototypes not capable of doing any computation larger than 4 bits, or are not real quantum computers, in the sense that while they may have quantum parts in them, they cannot run actually-meaningful computations like Shor's algorithm or Grover's algorithm. Recently, there have been signs that "real" quantum computers are no longer that far away. However, even if "real" quantum computers come soon, the day when regular people have quantum computers on their laptops or phones may well be decades after the day when powerful institutions get one that can crack elliptic curve cryptography.

For indistinguishability obfuscation, one key tradeoff is in security assumptions. There are more aggressive designs that use exotic assumptions. These often have more realistic runtimes, but the exotic assumptions sometimes end up broken. Over time, we may end up understanding lattices enough to make assumptions that do not get broken. However, this path is more risky. The more conservative path is to insist on protocols whose security provably reduces to "standard" assumptions, but this may mean that it takes much longer until we get protocols that run fast enough.

How does it interact with other parts of the roadmap?

Extremely powerful cryptography could change the game completely. For example:

At first, the benefits will come on the application layer, because the Ethereum L1 inherently needs to be conservative on security assumptions. However, even application-layer use alone could be as game-changing as the advent of ZK-SNARKs has been.