Data Processor

Zero-knowledge off-chain compute for verifiable on-chain data, powered by zkVMs.

Repositories

What is the Herodotus Data Processor (HDP)?

  • HDP is a tool that allows you to easily define large sets of on-chain data and then run compute over them in a fully sound, proven environment, thanks to STARKs and storage proofs.

  • HDP comes with a framework for developers to express their own types of computation.

  • HDP can deliver the data feeds either directly to smart contracts or off-chain in the form of a raw STARK proof.

Practical use cases for HDP

  1. Proving TWAP

    A TWAP helps smooth out short-term price fluctuations. By averaging prices over a period, we obtain a more stable and representative value for an asset pair, useful for financial applications such as options.

  2. Proving Average Balance: Suppose you want to prove that an account maintained an average balance of 1 ETH over 1000 blocks. You'd set up a data lake to fetch the account balance for these blocks and use the avg function to compute the average. This would verify the consistency of the account's balance.

  3. Counting Balance Drops: To count how often the average balance of an account drops below 50 ETH, you'd use the count_if function. This helps in assessing the frequency of significant balance reductions.

  4. Ensuring that an address did not send ETH to an OFAC sanctioned address

    Often, for compliance reasons, you would like your smart contract to be invoked only by addresses that can prove they have never interacted with sanctioned addresses such as Tornado Cash. Doing this naively would require iterating through every single transaction made by the caller; with HDP, this is no longer a problem.

  5. Ensuring data integrity for zkML

    Projects such as Giza, Modulus Labs, or Ritual.net allow proving the integrity of performing an inference. However, such computation often needs on-chain data as an input, which in most cases as of today is trusted: the verifier can be convinced only of the proper execution of the inference, not of the correctness of the data fed into it.

    Thanks to HDP, this data can be injected into such an inference when wrapped into an HDP module.
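To make use case 1 concrete, here is a minimal Python sketch of a TWAP over (timestamp, price) samples, where each price is weighted by how long it was in effect. This is illustrative only; HDP's actual compute modules run in Cairo.

```python
def twap(samples):
    """Time-weighted average price over a list of (timestamp, price) samples.

    Each price is weighted by the duration until the next sample; the last
    sample only closes the final interval.
    """
    total = 0.0
    elapsed = 0.0
    for (t0, price), (t1, _) in zip(samples, samples[1:]):
        dt = t1 - t0
        total += price * dt
        elapsed += dt
    return total / elapsed

# Price held at 100 for 10s, 110 for 20s, 90 for 10s
prices = [(0, 100.0), (10, 110.0), (30, 90.0), (40, 90.0)]
print(twap(prices))  # 102.5
```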

HDP Terminology (Jargon)

Batch

One data-processing request (a batch) can contain multiple tasks. A task is the unit of one process flow.

Task

One unit of data processing is called a task. A task must contain the following:

  • Datalake: What exact type of large on-chain data set do you want to process in this request?

  • Compute module: How would you like to aggregate this data set with a specific computation? Also called an aggregated function.

For example, let’s say you defined the task: “I want to know the average value of this account’s balance over this block range.”

  • What is average → use the AVG compute module to aggregate the data set with an averaging operation.

  • This account’s balance → Property. Defines a specific property of the on-chain data set, such as fields in the header, an account, or a storage slot.

  • This block range → The relevant block range of the data set.

Note: This is an example of a BlockSampledDatalake.

Datalake

A datalake is an object that allows expressing a potentially large dataset, such as the collection of all block base_fees in the block range <x, z>.

Currently available datalakes:

Planned datalakes:

  • IterativeDynamicLayout: Allows iterating over dynamic-layout variables across multiple blocks or different smart contracts. The targeted variables, such as mappings or arrays, each have their own unique slot index. Read more about storage layouts.

Compute module

The purpose of a compute module is to aggregate large sets of data in off-chain computation. Each module should define the expected input value type, such as integer or string, and detail the operation to perform in both the preprocessing and proving steps. Because it is a defined function that aggregates a large set of data, we also call it an aggregated function.

Currently available compute modules:

  • AVG: Averages a list of values.

  • MAX: Find the biggest value in a set.

  • MIN: Find the smallest value in a set.

  • SUM: Computes the sum of a set of values.

  • COUNT_IF: Takes an additional context parameter that encodes the counting condition. The conditions are:

    • “00” → equality ==

    • “01” → non-equality !=

    • “02” → greater >

    • “03” → greater or equal >=

    • “04” → smaller <

    • “05” → smaller or equal <=
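A rough sketch of how these condition codes map to comparisons. This is illustrative Python, not HDP's actual (Cairo) implementation.

```python
import operator

# COUNT_IF condition codes mapped to Python comparison operators (sketch)
CONDITIONS = {
    "00": operator.eq,  # equality ==
    "01": operator.ne,  # non-equality !=
    "02": operator.gt,  # greater >
    "03": operator.ge,  # greater or equal >=
    "04": operator.lt,  # smaller <
    "05": operator.le,  # smaller or equal <=
}

def count_if(values, condition_code, threshold):
    """Count values satisfying the condition encoded by `condition_code`."""
    cmp = CONDITIONS[condition_code]
    return sum(1 for v in values if cmp(v, threshold))

balances = [40, 55, 49, 60, 30]
# Count balances strictly below 50 (condition "04")
print(count_if(balances, "04", 50))  # 3
```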

Planned compute modules:

  • MERKLE: Computes the Merkle root of a dataset.

  • BLOOM: Generates a bloom filter out of a dataset.

Define your own compute module

We encourage projects to define custom compute modules if needed. If you have off-chain computation logic that consumes on-chain data and want it to operate in a fully sound manner, it is worth considering connecting with HDP by defining a custom compute module.

Here are the codebases where the modules are defined:

Cairo1 <> Cairo0 compatibility

As of today, HDP is developed in Cairo0. This does not mean that external contributors need to develop their own HDP modules in Cairo0 as well.

Cairo is a CPU architecture able to execute Cairo bytecode, which can be generated by compiling either Cairo0 or Cairo1. Thankfully, the CairoVM allows for scoping and thus process management.

An example of this is the StarknetOS, which, despite being written in Cairo0, is able to execute smart contracts written in Cairo1. As long as a module implemented in Cairo0 or Cairo1 exposes the interface defined above, HDP will be able to communicate with it.

Contract addresses

Sepolia

HDP API

Base URL: https://hdp.api.herodotus.cloud

API docs: https://hdp.api.herodotus.cloud/docs/static/index.html

Submit Task

Route: /submit-task

Send a POST request with a request body like the examples below. One HDP request is considered a batch, and each batch can contain multiple tasks. Check out the full set of supported fixtures here.

Note that the increment field is optional and defaults to 1.

Block Sampled Data Lake Example:

{
  "chainId": 11155111,
  "tasks": [
    {
      "datalakeType": "block_sampled",
      "datalake": {
        "blockRangeStart": 5515000,
        "blockRangeEnd": 5515031,
        "sampledProperty": "header.base_fee_per_gas",
        "increment": 1
      },
      "aggregateFnId": "avg"
    },
    {
      "datalakeType": "block_sampled",
      "datalake": {
        "blockRangeStart": 5515000,
        "blockRangeEnd": 5515031,
        "sampledProperty": "account.0x7f2c6f930306d3aa736b3a6c6a98f512f74036d4.nonce"
      },
      "aggregateFnId": "min"
    }
  ]
}

Transactions In Block Data Lake Example:

{
  "chainId": 11155111,
  "tasks": [
    {
      "datalakeType": "transactions_in_block",
      "datalake": {
        "targetBlock": 5858987,
        "startIndex": 0,
        "endIndex": 10,
        "increment": 1,
        "includedTypes": {
          "legacy": true,
          "eip2930": true,
          "eip1559": true,
          "eip4844": true
        },
        "sampledProperty": "tx.nonce"
      },
      "aggregateFnId": "max"
    },
    {
      "datalakeType": "transactions_in_block",
      "datalake": {
        "targetBlock": 5858987,
        "startIndex": 91,
        "endIndex": 100,
        "increment": 6,
        "includedTypes": {
          "legacy": false,
          "eip2930": false,
          "eip1559": false,
          "eip4844": true
        },
        "sampledProperty": "tx.max_fee_per_blob_gas"
      },
      "aggregateFnId": "avg"
    }
  ]
}

If you want to use the COUNT function:

{
  "chainId": 11155111,
  "tasks": [
    {
      "datalakeType": "block_sampled",
      "datalake": {
        "blockRangeStart": 5515000,
        "blockRangeEnd": 5515029,
        "sampledProperty": "header.blob_gas_used"
      },
      "aggregateFnId": "count",
      "aggregateFnCtx": {
        "operatorId": "gt",
        "valueToCompare": 100000
      }
    }
  ]
}
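Requests like the ones above can be built client-side in a few lines. A minimal Python sketch using only the standard library; the request is constructed but not sent, and the payload mirrors the first block-sampled example above.

```python
import json
import urllib.request

def build_submit_task_request(base_url: str, tasks: list, chain_id: int = 11155111):
    """Build (but do not send) a POST request for the /submit-task route."""
    body = json.dumps({"chainId": chain_id, "tasks": tasks}).encode()
    return urllib.request.Request(
        base_url + "/submit-task",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_submit_task_request(
    "https://hdp.api.herodotus.cloud",
    [{
        "datalakeType": "block_sampled",
        "datalake": {
            "blockRangeStart": 5515000,
            "blockRangeEnd": 5515031,
            "sampledProperty": "header.base_fee_per_gas",
        },
        "aggregateFnId": "avg",
    }],
)
# To actually submit: urllib.request.urlopen(req)
```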

Finalized Value

The task hash is returned from /submit-task. Use this hash as an identifier to fetch your result after the job has finished. Once the task is finalized, you can use the task hash to query the result from getFinalizedTaskResult. Note that “task hash” and “task commitment” refer to the same thing.
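Client-side, fetching the finalized value usually amounts to polling until the task reaches a finalized status. A transport-agnostic Python sketch follows; the `get_status` callable and the "FINALIZED" status string are placeholders, not the API's actual schema.

```python
import time

def wait_for_finalized(get_status, task_hash, interval=5.0, max_attempts=60):
    """Poll `get_status(task_hash)` until it reports "FINALIZED".

    `get_status` is any callable returning the current status string,
    e.g. a thin wrapper around the HDP API's finalized-result endpoint.
    Returns True once finalized, False if max_attempts is exhausted.
    """
    for _ in range(max_attempts):
        if get_status(task_hash) == "FINALIZED":
            return True
        time.sleep(interval)
    return False

# Usage with a stubbed status source (no network needed):
statuses = iter(["SCHEDULED", "PENDING", "FINALIZED"])
print(wait_for_finalized(lambda h: next(statuses), "0xabc", interval=0))  # True
```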

Reminder

Note that this deployed API is not yet stable. If you face an error, try sending the tasks one by one.

FAQ

  • When should I use HDP instead of the original Herodotus API?

    • Depending on your use case, both products have pros and cons.

    If you intend to access data over large ranges of blocks, we recommend using HDP, as it is designed to handle large amounts of data at a much lower cost.

Architecture Diagram

HDP Processing Steps

1. Request with intended task serialization

First, define the datalake and task details you want HDP to process. The goal is to generate a bytes representation of the intended HDP request parameters.

Request on-chain

To define in Solidity:

You can also check more dynamic inputs using the hdp-solidity dynamic tests.

// HDP request
BlockSampledDatalake datalake =
    BlockSampledDatalake({
        blockRangeStart: 5260543,
        blockRangeEnd: 5260571,
        increment: 3,
        sampledProperty: BlockSampledDatalakeCodecs
            .encodeSampledPropertyForAccount(
                address(0x7f2C6f930306D3AA736B3A6C6A98f512F74036D4),
                uint8(1)
            )
    });

ComputationalTask computationalTask =
    ComputationalTask({
        aggregateFnId: AggregateFn.SUM,
        operatorId: Operator.NONE,
        valueToCompare: uint256(0)
    });

Request off-chain

To define using hdp-cli encoding:

hdp encode -a "sum" -b 5260543 5260571 "account.0x7f2c6f930306d3aa736b3a6c6a98f512f74036d4.balance" 3

To define using an API request:

Using the server, you can send a POST request with params like this. The server will process the input and serialize it into the same bytes representation as produced above with hdp-cli and the smart contract.

{
    "chainId": 11155111,
    "datalakeType": "block_sampled",
    "datalake": {
        "blockRangeStart": 5260543,
        "blockRangeEnd": 5260571,
        "sampledProperty": "account.0x7f2c6f930306d3aa736b3a6c6a98f512f74036d4.balance",
        "increment": 3
    },
    "aggregateFnId": "avg"
}

2. Schedule Task on-chain (request on-chain)

As an on-chain request cannot proactively move to the next step, the contract emits an event when its function is called, and an event watcher catches the event to move on to the next step of processing.

Pass the bytes representation of the HDP task to the HDP contract by calling the request function. This checks whether the task is being processed for the first time and schedules it; this is the entry step of the whole HDP process.

// HDP Server call [`requestExecutionOfTaskWithBlockSampledDatalake`] before processing
hdp.requestExecutionOfTaskWithBlockSampledDatalake(
    datalake,
    computationalTask
);

The function call emits an event, which is caught by the event watcher to proceed to the next step of data processing.

/// @notice emitted when a new task is scheduled
event TaskWithBlockSampledDatalakeScheduled(BlockSampledDatalake datalake, ComputationalTask task);

3. HDP Preprocess

To optimize proving cost and speed when handling large tasks, a preprocessing step is responsible for fetching the relevant MMR/MPT proofs for the intended headers/accounts/storage.

For more detail about how this preprocessing works under the hood, check out the repository. With the command below, the HDP preprocessor fetches all proofs related to the requested datalake, precomputes the result for the defined task, and generates a human-readable output file as well as a Cairo-compatible input file that contains all preprocessed results.

hdp run ${tasksSerialized} ${datalakesSerialized} ${rpc_url} -o ${generalFilePath} -c ${cairoInputFilePath}

When one request defines multiple tasks, we batch them using the Standard Merkle Tree format.

4. Cache MMR

Now, from the HDP preprocessing result, we know which MMRs are relevant to the datalakes. As Herodotus keeps aggregating new blocks and merging them into the MMR, the relevant MMR state (root, size) may have changed in the meantime. For more information on block aggregation, check out the blog. To ensure MMR proof validity before running the computation in Cairo, we cache the MMR state fetched during the HDP preprocessing step by simply calling this load function in the smart contract.

/// @notice Load MMR root from cache with given mmrId and mmrSize
function loadMmrRoot(
    uint256 mmrId,
    uint256 mmrSize
) public view returns (bytes32) {
    return cachedMMRsRoots[mmrId][mmrSize];
}

5. Run Cairo Program

The HDP Cairo program validates the results returned by the HDP preprocess step. The program runs in 3 stages.

  1. Verify Data: Currently, Cairo HDP supports three types of on-chain data: Headers, Accounts, and Storage Slots. In the verification stage, Cairo HDP performs the required inclusion proofs, ensuring the inputs are valid and can be found on-chain.

  2. Compute: Perform the specified computation over the verified data, e.g. the avg of a user's Ether balance.

  3. Pack: Once the results are computed, the task and result are added to a Merkle tree. This allows multiple tasks to be computed in one execution.

To run Cairo HDP locally, the following is required:

  • Generate Cairo-compatible inputs via HDP preprocessing. These can be generated with the -c flag.

  • Set up python VENV & install dependencies, as outlined in the readme.

  • Move inputs to src/hdp/hdp_input.json

  • Run Cairo HDP with make run, then select hdp.cairo

6. Check Fact Finalization (off-chain)

One Cairo program run is represented by one fact hash. After the Cairo program finishes running, if it is indeed valid, this fact hash is sent to SHARP's FactRegistry contract. So, before processing the final step of HDP, we need to check whether the fact has been registered in the Fact Registry.

We check fact finalization by simply running a cron job for the targeted tasks and calling the isValid method with the expected fact hash.

7. Authenticate Task Execution

Once the fact is registered, task execution can be authenticated on-chain in two steps. First, the contract reconstructs the program output from the MMR root, the MMR size, and the Merkle roots of results and tasks, computes the GPS fact hash, and checks that it is registered in the SHARP Facts Registry.

// Load MMRs root
bytes32 usedMmrRoot = loadMmrRoot(usedMmrId, usedMmrSize);

// Initialize an array of uint256 to store the program output
uint256[] memory programOutput = new uint256[](6);

// Assign values to the program output array
programOutput[0] = uint256(usedMmrRoot);
programOutput[1] = usedMmrSize;
programOutput[2] = resultMerkleRootLow;
programOutput[3] = resultMerkleRootHigh;
programOutput[4] = taskMerkleRootLow;
programOutput[5] = taskMerkleRootHigh;

// Compute program output hash
bytes32 programOutputHash = keccak256(abi.encodePacked(programOutput));

// Compute GPS fact hash
bytes32 gpsFactHash = keccak256(
    abi.encode(PROGRAM_HASH, programOutputHash)
);

// Ensure GPS fact is registered
require(
    SHARP_FACTS_REGISTRY.isValid(gpsFactHash),
    "HdpExecutionStore: GPS fact is not registered"
);

Second, given the proofs and roots of the Standard Merkle Trees of results and tasks, we can verify each Merkle proof against its root and check that the task and its result are included in the batch.

// Compute the Merkle leaf of the task
bytes32 taskCommitment = taskCommitments[i];
bytes32 taskMerkleLeaf = standardLeafHash(
    taskCommitment
);
// Ensure that the task is included in the batch, by verifying the Merkle proof
bool isVerifiedTask = taskInclusionProof.verify(
    scheduledTasksBatchMerkleRoot,
    taskMerkleLeaf
);

// Compute the Merkle leaf of the task result
bytes32 taskResultCommitment = keccak256(
    abi.encode(taskCommitment, computationalTaskResult)
);
bytes32 taskResultMerkleLeaf = standardLeafHash(
    taskResultCommitment
);
// Ensure that the task result is included in the batch, by verifying the Merkle proof
bool isVerifiedResult = resultInclusionProof.verify(
    batchResultsMerkleRoot,
    taskResultMerkleLeaf
);
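The two inclusion checks above follow the usual Merkle proof pattern: hash the leaf, fold in the proof's sibling hashes pair by pair (sorting each pair, OpenZeppelin-style), and compare against the stored root. A minimal Python sketch of this verification, using sha256 as a stand-in for the keccak256 used on-chain:

```python
import hashlib

def h(data: bytes) -> bytes:
    # sha256 as a stand-in for keccak256 (keccak is not in the stdlib)
    return hashlib.sha256(data).digest()

def hash_pair(a: bytes, b: bytes) -> bytes:
    # Commutative pair hashing: sort the pair before hashing,
    # so the verifier does not need left/right position flags
    return h(min(a, b) + max(a, b))

def verify(root: bytes, leaf: bytes, proof: list) -> bool:
    node = leaf
    for sibling in proof:
        node = hash_pair(node, sibling)
    return node == root

# Build a 4-leaf tree and verify leaf 0's inclusion proof
leaves = [h(bytes([i])) for i in range(4)]
n01 = hash_pair(leaves[0], leaves[1])
n23 = hash_pair(leaves[2], leaves[3])
root = hash_pair(n01, n23)
proof_for_leaf0 = [leaves[1], n23]
print(verify(root, leaves[0], proof_for_leaf0))  # True
```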

After these two validation steps, we can confirm that the targeted batched tasks have been verified through the whole HDP flow and store the result on-chain. You can then access, from an on-chain mapping, data that was computed and verified off-chain using a ZK proof.

// Store the task result
cachedTasksResult[taskCommitment] = TaskResult({
    status: TaskStatus.FINALIZED,
    result: computationalTaskResult
});

Using HDP synchronously

Turbo is under development; this section will be updated very soon.
