What is a Module?
The Data Processor was first introduced in its basic form with data lakes, which you can read about elsewhere in this documentation. Despite its potential, the data lake type limits how flexibly a data set and its computation can be defined.

Data Processor Modules introduce significant advances in flexibility. These include custom modules, unrestricted access patterns, parallelization, caching, composition, private modules, and the ability to pull data from multiple chains simultaneously. A key technology underpinning the integrity of the Data Processor is Storage Proofs, which ensure that all data used is verified and cannot be manipulated.
Data lakes offer a structured framework for defining access patterns to retrieve data for computations. Currently, these patterns are static, allowing simple queries like extracting the balance from a specific address over a block range. However, more complex queries, such as retrieving transaction volumes for the ETH/USDC pair only during blocks where the price of ETH exceeds $3000, are not supported. Implementing such dynamic queries would require creating a custom data lake for each scenario, which is unsustainable long-term.
Compute modules are designed to process arrays of values derived from a data lake. For example, they can calculate the average balance of a specific address over a block range. Currently, integrating custom modules requires coordination with the Herodotus team, which is not a scalable solution. To address this, we aim to open up the capabilities of the underlying Cairo VM to developers, allowing them to write custom Cairo1 code while still accessing verifiable on-chain data.
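For intuition, here is a minimal Cairo1 sketch of such a computation; the function name and the use of u256 balances are our own illustration, not the module interface itself:

```cairo
// Average a pre-fetched, verified array of balances.
fn average_balance(mut balances: Span<u256>) -> u256 {
    assert(balances.len() > 0, 'empty input');
    // Widen the element count to u256 for the final division.
    let n_felt: felt252 = balances.len().into();
    let n: u256 = n_felt.into();
    let mut sum: u256 = 0;
    // Accumulate every balance in the span.
    loop {
        match balances.pop_front() {
            Option::Some(b) => { sum += *b; },
            Option::None => { break; },
        };
    };
    sum / n
}
```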
To overcome these challenges, we have developed a new Cairo1 runtime, available as a Data Processor compute module. This runtime enables developers to write Cairo1 code much as they would write smart contracts. By adhering to a predefined trait, developers can implement custom logic and access low-level Cairo1 system calls, such as call_contract_syscall. This syscall connects to a precompiled contract supporting arbitrary cross-chain and historical data queries, allowing developers to inject bytes directly from the verifier to ensure public inputs are appropriately passed through to computations.
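A minimal sketch of the predefined-trait idea; the trait and impl names below are illustrative assumptions, not the actual interface:

```cairo
// Hypothetical module trait: verified inputs in, results out, as felts.
trait CustomModule {
    fn main(inputs: Array<felt252>) -> Array<felt252>;
}

// A trivial module adhering to the trait: echo the inputs back.
impl EchoModule of CustomModule {
    fn main(inputs: Array<felt252>) -> Array<felt252> {
        inputs
    }
}
```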
We provide the module interface as a Cairo1 package for ease of use, while the backbone program is written in Cairo 0 for efficiency and interoperability.

This user-friendly Cairo1 package abstracts away Data Processor-specific operations, making it easier to interact with the system. A hypothetical sketch of what working with it could look like (the hdp_cairo package, the HDP handle, AccountTrait, AccountKey, and account_get_balance are illustrative names, not a confirmed API):
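```cairo
// Purely illustrative: hdp_cairo, HDP, AccountKey, and account_get_balance
// are assumed names; the real package API may differ.
use hdp_cairo::{HDP, evm::account::{AccountTrait, AccountKey}};

// One typed call replaces manual key derivation, the raw syscall, and
// decoding of the returned felt array.
fn mainnet_balance(hdp: HDP, block_number: felt252, address: felt252) -> u256 {
    hdp.evm.account_get_balance(
        AccountKey { chain_id: 1, block_number: block_number, address: address }
    )
}
```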
The Data Processor custom module is influenced by the design of the memorizer, which fetches data dynamically. It uses deterministic key generation for different data types and supports various access patterns for transactions.
Key derivation for different data types:
Header: h(chain_id, block_number)
Account: h(chain_id, block_number, address)
Storage Slot: h(chain_id, block_number, address, slot)
Transactions: Different access patterns, such as sender-based and block-based approaches.
The dry-run step evaluates state accesses dynamically, logging every piece of requested data so that inclusion proofs can be generated for it. This ensures the correct data is available for program execution.
Keys are hashed to a felt value for use in DictAccess. The hash function balances collision resistance and reduced step count.
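A sketch of this derivation, assuming Poseidon as the felt-friendly hash; the actual function only needs to balance collision resistance and step count, so Poseidon and the function names below are illustrative:

```cairo
use core::poseidon::poseidon_hash_span;

// Hash the identifying fields of each data type down to a single felt252
// that can serve as a DictAccess key.
fn header_key(chain_id: felt252, block_number: felt252) -> felt252 {
    poseidon_hash_span(array![chain_id, block_number].span())
}

fn account_key(chain_id: felt252, block_number: felt252, address: felt252) -> felt252 {
    poseidon_hash_span(array![chain_id, block_number, address].span())
}

fn storage_key(
    chain_id: felt252, block_number: felt252, address: felt252, slot: felt252
) -> felt252 {
    poseidon_hash_span(array![chain_id, block_number, address, slot].span())
}
```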
call_contract_syscall Specification

The Data Processor uses a custom call_contract_syscall under the hood of the abstracted Cairo1 library. The syscall reads multiple dictionaries at a specified key of type felt252, retrieving the data stored under that key as an Array<felt252>. A sketch of such a read, using the standard corelib signature for illustration (the memorizer address and entry-point selector are placeholders supplied by the runtime, not documented constants):
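```cairo
use starknet::{ContractAddress, SyscallResultTrait, syscalls::call_contract_syscall};

// Read the values stored under `key` from the memorizer precompile.
// `memorizer` and `selector` are placeholders supplied by the runtime;
// the layout of the returned felts depends on the data type being read.
fn memorizer_read(
    memorizer: ContractAddress, selector: felt252, key: felt252
) -> Span<felt252> {
    let calldata = array![key];
    call_contract_syscall(memorizer, selector, calldata.span()).unwrap_syscall()
}
```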
The end-to-end flow is:

1. Run the program in dry-run mode to log the requested state and pass it to the compute module; the proof is then generated through proper trace generation.
2. Generate inclusion proofs for uncached data and include them in an input.json file.
3. TrieCache handles dependencies between batches, ensuring correct state-transition proofs.
4. Workers generate the trace, which requires a valid input.json file. This computationally expensive task will eventually require a worker pool for parallel operations.