Architecture Flow of Processing Steps
This section provides an overview of the Data Processor architecture and its internal pipeline. It is not a guide to integrating with the Data Processor; if you are interested in API integration, please refer to the API documentation or Getting Started.
First, define the Datalake and Task details intended to obtain results from the Data Processor. The goal is to generate a byte representation of the intended Data Processor request parameters.
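As a purely illustrative sketch, the request can be thought of as a structured set of datalake and task parameters. Every field name below is a hypothetical placeholder; the actual schema and byte encoding are produced by hdp-cli or the contract, as described in the following sections.

```python
# Illustrative only: hypothetical field names for a datalake/task request.
# The real parameter schema and byte encoding are defined by hdp-cli and the
# Data Processor contract, not by this sketch.
request_parameters = {
    "datalake": {
        "type": "block_sampled",                 # hypothetical datalake type
        "block_range_start": 5_000_000,          # hypothetical field
        "block_range_end": 5_000_100,            # hypothetical field
        "sampled_property": "account.balance",   # hypothetical field
    },
    "task": {
        "aggregate_fn": "avg",                   # hypothetical compute function
    },
}
```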
Request Off-Chain API
Using the Data Processor server, you can send a POST request with parameters like the ones described above. The server processes the input and serializes it into the same byte representation, using hdp-cli and the smart contract. Please refer to the API documentation for more details.
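A minimal sketch of such a request, assuming the requests library; the endpoint URL and payload shape are placeholders (the same hypothetical shape as the sketch above), and the real schema is in the API documentation.

```python
# A minimal sketch: POST the request parameters to the Data Processor server.
# The endpoint URL and payload field names are placeholders; the real schema
# is documented in the API documentation.
import requests

payload = {
    "datalake": {
        "type": "block_sampled",
        "block_range_start": 5_000_000,
        "block_range_end": 5_000_100,
        "sampled_property": "account.balance",
    },
    "task": {"aggregate_fn": "avg"},
}

response = requests.post("https://hdp-server.example/api/request", json=payload, timeout=30)
response.raise_for_status()
print(response.json())  # the response shape is defined by the server
```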
Request On-Chain
Note: on-chain requests are not yet stable.
Defining in Solidity:
You can also check more dynamic inputs using the hdp-solidity dynamic test.
Initially, you need to pass the byte representation of the Data Processor task into the Data Processor contract by calling the request function, which checks whether the task is being processed for the first time and schedules it. Since an on-chain request does not proactively move to the next step, this function call emits an event; an event watcher catches it and proceeds to the next step of processing.
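A minimal sketch of such a watcher, assuming web3.py, a placeholder RPC URL and contract address, and a hypothetical TaskScheduled(bytes32) event signature standing in for the real event defined in hdp-solidity.

```python
# A minimal polling watcher (sketch). Assumptions: web3.py is installed, the RPC
# URL and contract address are placeholders, and "TaskScheduled(bytes32)" is a
# hypothetical event signature standing in for the real one in hdp-solidity.
import time
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://ethereum-rpc.example"))     # placeholder RPC URL
HDP_CONTRACT = "0x0000000000000000000000000000000000000000"      # placeholder address
TOPIC = Web3.to_hex(Web3.keccak(text="TaskScheduled(bytes32)"))  # hypothetical event

last_block = w3.eth.block_number
while True:
    head = w3.eth.block_number
    if head > last_block:
        logs = w3.eth.get_logs({
            "fromBlock": last_block + 1,
            "toBlock": head,
            "address": HDP_CONTRACT,
            "topics": [TOPIC],
        })
        for log in logs:
            # In the real pipeline, the watcher would now trigger preprocessing
            # for the task encoded in the log data.
            print("task scheduled in tx:", log["transactionHash"].hex())
        last_block = head
    time.sleep(5)
```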
For more detailed information on how this step works under the hood, please refer to the repository.
The following command executes the necessary steps:
1) Preprocess to Generate an input.json File
The Data Processor preprocessor fetches all the proofs related to the requested datalake and precomputes the result if it involves preprocessable tasks. It then generates a Cairo-compatible formatted input file.
2) Run Data Processor Cairo Program Over Cairo VM to Generate a PIE File
Using the input file generated in the first step, we run the compiled Data Processor Cairo program through cairo-run. This process generates a PIE file that can later be processed through SHARP. For a detailed explanation of the Cairo program, refer to the relevant section.
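A sketch of that invocation, wrapped in Python for consistency with the other examples; the program and input paths are assumptions, and the exact command lives in the repository.

```python
# A sketch of the cairo-run invocation (paths are assumptions; see the repository
# for the exact command). cairo-run is cairo-lang's Cairo VM runner.
import subprocess

subprocess.run(
    [
        "cairo-run",
        "--program=build/hdp_compiled.json",        # assumed path to the compiled program
        "--program_input=src/hdp/hdp_input.json",   # input generated by the preprocess step
        "--layout=starknet_with_keccak",            # assumed layout supported by SHARP
        "--cairo_pie_output=hdp_pie.zip",           # the PIE artifact later sent to SHARP
        "--print_output",
    ],
    check=True,
)
```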
3) Generate a Human-Readable Output File Containing All Results
If the batch contains a Cairo 1 custom compute module, we need to parse task results and the batched results root, then reformat the file to include proofs, tasks, and results that will be used on the server.
The Data Processor Cairo program validates the results returned by the Data Processor preprocess step. The program runs in three stages:
Verify Data: Currently, the Cairo Data Processor supports three types of on-chain data: Headers, Accounts, and Storage Slots. In the verification stage, the Cairo Data Processor performs the required inclusion proofs, ensuring the inputs are valid and can be found on-chain.
Compute: Performs the specified computation over the verified data (e.g., calculating the average of a user's Ether balance).
Pack: Once the results are computed, the task and result are added to a Merkle tree. This allows multiple tasks to be computed in one execution.
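To make the packing stage concrete, here is an illustrative sketch that commits a batch of task and result hashes to Merkle roots using keccak and sorted-pair hashing. The actual leaf encoding and tree layout are defined by the Cairo program and hdp-solidity, not by this sketch.

```python
# Illustrative only: commit a batch of task and result hashes to Merkle roots
# using keccak and sorted-pair hashing. The actual leaf encoding and tree layout
# are defined by the Cairo program and hdp-solidity, not by this sketch.
from web3 import Web3


def hash_pair(a: bytes, b: bytes) -> bytes:
    lo, hi = sorted((a, b))
    return Web3.keccak(lo + hi)


def merkle_root(leaves: list[bytes]) -> bytes:
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:        # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical commitments for two batched tasks and their computed results.
task_leaves = [Web3.keccak(text="task-1"), Web3.keccak(text="task-2")]
result_leaves = [Web3.keccak(text="result-1"), Web3.keccak(text="result-2")]

print("tasks root:  ", Web3.to_hex(merkle_root(task_leaves)))
print("results root:", Web3.to_hex(merkle_root(result_leaves)))
```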
To run the Cairo Data Processor locally, the following is required:
Generate Cairo-compatible inputs via the Data Processor preprocess. This can be generated with the -c flag.
Set up a Python virtual environment and install dependencies, as outlined in the README.
Move inputs to src/hdp/hdp_input.json.
Run the Cairo Data Processor with make run and then hdp.cairo.
The Data Processor preprocess result indicates which Merkle Mountain Range (MMR) contains the relevant datalakes needed for the task. As Herodotus keeps aggregating new blocks and merging them into the MMR, the relevant MMR state (root and size) may have changed in the meantime. For more information on block aggregation, check out this blog. To ensure the MMR used for the proofs is valid before running the computation in Cairo, we cache the MMR state fetched during the Data Processor preprocessing step by calling this load function in the smart contract.
The validity of a Cairo program run is represented as a fact hash. After the Cairo program finishes running, if it is indeed valid, this fact hash is sent to the FactRegistry contract of SHARP. Once the fact has been registered in the Fact Registry, the process reaches its final step.
To check fact finalization, we run a cron job for targeted tasks and call the isValid method for the expected fact hash.
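A sketch of the check the cron job performs, assuming web3.py and placeholder RPC URL, registry address, and fact hash; isValid(bytes32) is the SHARP Fact Registry view method referenced above.

```python
# A sketch of the fact-finality check the cron job performs. Assumptions: web3.py,
# placeholder RPC URL, Fact Registry address, and fact hash. isValid(bytes32) is
# the SHARP Fact Registry view method mentioned above.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://ethereum-rpc.example"))   # placeholder RPC URL
FACT_REGISTRY = "0x0000000000000000000000000000000000000000"   # placeholder address
FACT_REGISTRY_ABI = [{
    "name": "isValid",
    "type": "function",
    "stateMutability": "view",
    "inputs": [{"name": "fact", "type": "bytes32"}],
    "outputs": [{"name": "", "type": "bool"}],
}]

registry = w3.eth.contract(address=FACT_REGISTRY, abi=FACT_REGISTRY_ABI)
fact_hash = "0x" + "00" * 32                                   # placeholder fact hash

if registry.functions.isValid(fact_hash).call():
    print("fact registered; the task can be authenticated on-chain")
else:
    print("fact not registered yet; retry on the next cron tick")
```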
After verifying that the fact hash is registered in the Fact Registry, we proceed to authenticate the task execution. This involves verifying the outputs and ensuring they match the expected results.
Next, given the proofs and the root of the Standard Merkle Tree of results and tasks, we verify each proof against the root to check that the corresponding task and result are included in the batch.
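As an illustration, a generic sorted-pair Merkle inclusion check looks like the sketch below; the actual verification follows the Standard Merkle Tree leaf encoding used on-chain, and the leaf, sibling, and root values here are placeholders.

```python
# A generic sorted-pair Merkle inclusion check (sketch). The real verification
# follows the Standard Merkle Tree leaf encoding used on-chain; the leaf,
# sibling, and root values below are placeholders.
from web3 import Web3


def verify_proof(leaf: bytes, proof: list[bytes], root: bytes) -> bool:
    node = leaf
    for sibling in proof:
        lo, hi = sorted((node, sibling))
        node = Web3.keccak(lo + hi)
    return node == root


leaf = Web3.keccak(text="result-1")       # placeholder leaf commitment
sibling = Web3.keccak(text="result-2")    # placeholder sibling node
lo, hi = sorted((leaf, sibling))
root = Web3.keccak(lo + hi)               # toy two-leaf root

print(verify_proof(leaf, [sibling], root))  # True for this toy example
```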
After these two validation steps, we can confirm that the targeted batched tasks have been verified through all the Data Processor steps, and we can store the result on-chain. The result is then accessible in the on-chain mapping; it was computed off-chain and verified with a ZK proof.
Note: Turbo is under development. This section will be updated very soon.