Protocol support for Predicate Data continuity (and then ABI-aware block building)

I want to propose 3 related changes to the Fuel protocol:

  1. Attach predicate data to outputs, rather than storing it in the predicate code
  2. Allow outputs to contain multiple asset_ids
  3. (more ambitious) Allow the block builder to choose inputs and build outputs based on low-cost pre-execution Sway code

I am in favor of adding all of these, but #3 would require at least #1 to implement (and possibly #2 to implement ergonomically).

1. Data Outputs

This would be a new output type with a field for either arbitrary bytes or a hash of bytes attached to the tx that created it.

This allows many separate data “instances” to exist at a single predicate address, rather than creating a new address for each new piece of data. It also allows the data structure of outputs to be standardized (this is important for #3 of this proposal).

This is essentially isomorphic to the current data design for Fuel predicates; it just moves the data from the spending tx to the creation tx, which allows the predicate “instances” to be indexed on creation rather than on spend. Of course, the old pattern is still an option, it’s just opt-in instead of the default.

2. Multi-asset Outputs

Currently, we only allow each output to have one asset_id. This means that any correlated value in a dApp needs to be explicitly stated in the predicate data, otherwise the predicate can’t constrain the spending of that value.

For example: imagine a predicate that represents a swap pool between two asset types, ETH ↔ USDC. If each ETH value were a separate output from the USDC value, then each output would need its own datum to know with whom it could be spent:

enum SwapData {
    Parent {
        child_utxo_1: UTxO_Id,
        child_utxo_2: UTxO_Id,
        ratio: float,
    },
    Child {
        parent_nft: AssetId,
    },
}

This would allow each child to be spent only when the parent NFT is spent, and the parent would only be spendable if the two child UTxOs were spent.

This works!

An alternative: if the output could include multiple assets, then the data would just be:

struct SwapData {
    asset_1: AssetId,
    asset_2: AssetId,
    ratio: float,
}

This would mean fewer predicate executions, and the checks in the single execution could be about the amounts of each AssetId rather than the existence of specific UTxOs.

Example: Combining Data Output and Multi Asset Output

These two things can just be a single output type, since the distinction only really matters at predicate addresses. If the output is at a wallet address, it doesn’t need to have data, nor does it need to be bound to a single asset.

If we take a simple swap predicate as an example, the current code would look something like this:

Where the “continued data” is in the form of a new address (P(SwapData2)). This is difficult to index without also having the “SwapData2” data off-chain.

This would be solved simply by adding a Data Output:

Where the output address doesn’t change as the data changes. It’s always just P.

This introduces a little noise, because the associated UTxOs must have pointers to each other in the form of the algebraic data types Parent and Child. But at least it is now indexable by an off-chain worker without requiring any extra off-chain information besides the ABI.

We can clean this up more if the Data Output also had Multi-asset support:

Now all associated value is coupled together seamlessly. This is a very common scenario, so I would suggest we just add it to the protocol instead of expecting every Predicate dApp builder to introduce their own algebraic data types.

3. (more ambitious) ABI-aware Block Building for solving Predicate Parallel Execution

:bulb: I’m not sure if this is worth it economically, since predicates are already small and the filter and template execution might be more expensive than the actual predicate, but it’s a potential solution to the predicate input collision problem and is interesting to talk about. And it probably makes sense with the “right” designs or with some optimistic incentives!

Adding Data Outputs and Multi-asset Outputs would enable some other cool features. One would be giving the block builder access to the ABI of the Data field, along with rules by which it can select Inputs and construct Outputs on behalf of the tx submitter. This would also be possible without the Multi-asset Output, but as in the example above, it would increase quality of life without much extra work.

Problem Statement

Each output can only be spent once. This means that if multiple txs reference the same input, only one of those txs will execute. People may then have to rebuild and sign their txs multiple times, referencing the new outputs. If there is enough demand, you might never be able to get your tx included in a block!

There are a number of ways to structure your dApps that will mitigate this problem (such as off-chain batcher workers), but essentially this is a problem with the “deterministic” nature of predicates themselves. If that isn’t addressed in some way at a protocol level, then we will always be stuck with UTxO collision problems.

Proposed Solution

Predicates are “deterministic” assuming the tx has already passed certain checks: all of the inputs exist, the outputs’ value equals the total of the inputs, the tx is signed by the owner of the inputs, the correct predicate is included, etc. This is work that the tx pool and the block builder have to do without reward.

This raises the question: how much work are we willing to make the block builder do without reward?

In theory, we could relinquish some of the tx building to the block builder who has all the context needed to build successful txs. This would be split into two parts:

  • Filters for inputs, which would allow the block builder to search for an input with values and data that meet the requirements of the Predicate
  • Templates for creating new outputs with valid data

Of course, this could create a combinatorial explosion of possible work for the block builder. I think there are ways to solve these issues though.

For example, there could be millions of outputs associated with a given predicate address. However, the parallel-execution problem we are trying to solve only exists because inputs are front-run by another, recent tx. With that in mind, the block builder only needs to keep track of outputs from the last few (at most 10) blocks. This conservatively limits the search space to 10 blocks * 600 tx/block * 255 outputs/tx = 1,530,000 outputs, and probably far fewer, since txs that large would fill up more of the block. And since we already want creating outputs to carry a cost, flooding this window would be an unsustainable DoS vector.

Filters

Filters would be a new type of Sway code that consumes the same ABI as your predicate.

They could look roughly like this:

filter;

struct SwapData {
    asset_1: AssetId,
    asset_2: AssetId,
    ratio: float,
}

fn ratio_low_enough(ratio_limit: float) -> bool {
    let data: SwapData = Input::data();
    data.ratio <= ratio_limit
}

fn enough_from_asset(from: AssetId, amount: int) -> bool {
    let values: Values = Input::values();
    values.get(from) >= amount
}

fn main(from: AssetId, amount: int, ratio_limit: float) -> bool {
    ratio_low_enough(ratio_limit) && enough_from_asset(from, amount)
}

Output Templates

Templates would be a new type of Sway code that also consumes the predicate ABI and constructs the tx’s outputs. A template could look roughly like this:

template;

fn compute_new_ratio(new_from: int, new_to: int) -> float {
    ...
}

fn main(from: AssetId, to: AssetId, amount: int, total_user_from: int, user_address: Address) -> (Output, Output, DataOutput) {
    let chosen_input = Selected::input(0); // This needs some work, but for now select the first (only) "selected" input
    let input_data: SwapData = chosen_input.data();
    let old_ratio = input_data.ratio;
    let old_from_amount = chosen_input.get(from);
    let old_to_amount = chosen_input.get(to);
    let from_paid = old_ratio * amount;
    let new_from = old_from_amount + from_paid;
    let new_to = old_to_amount - amount;
    let new_ratio = compute_new_ratio(new_from, new_to);

    // data output
    let new_data = SwapData {
        ratio: new_ratio,
        ..input_data
    };
    let predicate_address = chosen_input.address();
    let new_data_output = DataOutput::new(predicate_address, new_data);
    new_data_output.push(from, new_from);
    new_data_output.push(to, new_to);

    // from output
    let new_user_from_amount = total_user_from - from_paid;
    let new_user_from_output = Output(user_address, from, new_user_from_amount);

    // to output
    let new_user_to_output = Output(user_address, to, amount);

    (new_user_from_output, new_user_to_output, new_data_output)
}

This is expensive!!!

There is plenty of room for improvements here, but the predicate execution of this swap would probably be cheaper than either the filter or the template written above. Is this worth it? Maybe? Probably? Sometimes?

In terms of the chain’s state growth, this is identical to normal predicates. Looking at which txs were added to blocks, you won’t see all this extra work captured. The wire tx sizes will be larger, but in theory we can use blobs for those too, like we did for predicates.

We can also come up with “optimistic” execution models. The simplest is adding a tip for the builder to the tx. The builder might waste execution on failing filters and templates, but that could be solved with rate limiting in theory. There could also be bonding solutions and potentially(?) zk proofs for failing txs to punish the submitter by taking money from their bond.

Conclusion

These are ideas I’ve thought about a lot since learning the Predicate model on Fuel. #1 and #2 are inspired by Cardano’s eUTxO model. I’ve wanted to code more Sway, and I prefer to code in Predicates since that is what I’m familiar with, but the lack of these features has scared me away. It creates much more work for the dApp writer, because they need to come up with bespoke solutions to the indexing problem.

#3 addresses a problem that hasn’t been solved on Cardano either. Even after years of research, they haven’t come up with a solution (they are working on it though, e.g. Validation zones). I don’t want to assume that it’s as simple as the “filters” and “templates” above, but I do want to make sure we are at least considering solutions that will make Fuel the best it can be.


Re: Data Outputs

I agree that indexing newly created predicates is a big hurdle right now. The original intent of the transaction model was to minimize the stateful lookups / random IO ops needed to validate the inputs. I.e., we aimed to be as stateless as possible, and the ideal outcome would be the capability to validate a transaction using only the data passed over the wire, without any storage access. This was somewhat broken by the move to support blobs in predicates, for which we didn’t really have another choice at the time.

There are big benefits to separating bytecode and data commitments from an indexing perspective; however, I’d want to be sure we don’t regress further towards stateful designs. I.e., the design we come up with here ideally doesn’t require the block builder to look up arbitrary datums that were previously stored in order to validate future inputs.

While this could be a separate output type, it may be more useful if the datum commitment were bundled into the predicate instance. For example, we could just modify the regular coin/variable/change outputs to include an optional predicate_data_hash. Then, when the output is later spent, the user would be required to provide the full predicate data that corresponds to the hash, similar to the predicate bytecode. For a distinct output type to work, we’d need a way to bind it to a particular predicate instance that also has an asset_id and amount tied together.

This partially solves the indexing issue, however when trying to index an orderbook you’d want to know more than just the hash of the datum when the predicate is created. I’m not convinced this really needs to be solved at the protocol level though. An application could include the datum as a witness, and use off-chain indexing to validate that the witness hash matches the output predicate_data_hash.


I agree. This was the motivation when I said:

a field for arbitrary bytes or for a hash of bytes

In theory, we can have a generalized indexer that looks for datums provided on the tx but not included on the UTxO directly; we just track the data hash. I know Cardano added support for adding the data directly to the Output, when originally they only had the hash (CIP-32 | Inline datums). That’s why I suggested including it directly. It would also enable my specific solution #3 (though since #3 happens off-chain, we have options to allow indexing for the block builder too).

For example, we could just modify the regular coin/variable/change outputs to include an optional predicate_data_hash.

Absolutely. I only suggested creating the new output type so we could also support #2, multi-asset. But as I illustrated above, we don’t strictly need multi-asset; it just cleans up the code.

I’m not convinced this really needs to be solved at the protocol level though. An application could include the datum as a witness, and use off-chain indexing to validate that the witness hash matches the output predicate_data_hash.

Yes, there are other generalized solutions. But those don’t necessarily enable #3, which, if we can solve it elegantly, is the holy grail of UTxO “smart contracts”.

Data on outputs will also allow something like this: CIP-31 | Reference inputs

Where you can have inputs that don’t get spent, just included in the tx so their data can be read (this is another solution to the predicate parallelism problem addressed in #3).

I liked this proposal. First of all, I think there is a problem with the pictures you added to #2; I can’t properly read the text, but it could be a problem on my end. With the 2nd one, you can make batch transactions with predicates, which is cool for an aggregator. Reading the 3rd one reminds me of a week ago, when I was trying to send a call from one contract to another. If you want to call a function on another contract, you have to import some requirements (functions etc.), and if the contract I want to call isn’t verified, there is no way I can implement the functions for the ABI. With this proposal, I could send calls to contracts even though I don’t know the functions; I would only need to know the address of the contract and the name of the function I want to call.
