What is a Module?
Introduction
Data Processor was first introduced in its basic form with the data lake, which you can read about here. Despite its potential, the data lake type offers limited flexibility in how the data set and computation can be defined.
Data Processor Modules introduce significant flexibility advancements. These include the introduction of custom modules, unrestricted access patterns, parallelization, caching, composition, private modules, and the ability to pull data from multiple chains simultaneously. A key technology underpinning the integrity of the Data Processor is Storage Proofs, which ensure that all data used is verified and cannot be manipulated.
Why we built Modules
Limited Access Patterns
Data lakes offer a structured framework for defining access patterns to retrieve data for computations. Currently, these patterns are static, allowing simple queries like extracting the balance from a specific address over a block range. However, more complex queries, such as retrieving transaction volumes for the ETH/USDC pair only during blocks where the price of ETH exceeds $3000, are not supported. Implementing such dynamic queries would require creating a custom data lake for each scenario, which is unsustainable long-term.
Custom Compute Logic
Compute modules are designed to process arrays of values derived from a data lake. For example, they can calculate the average balance of a specific address over a block range. Currently, integrating custom modules requires coordination with the Herodotus team, which is not a scalable solution. To address this, we aim to open up the capabilities of the underlying Cairo VM to developers, allowing them to write custom Cairo1 code while still accessing verifiable on-chain data.
Data Processor Runtime
To overcome these challenges, we have developed a new Cairo1 runtime, available as a Data Processor compute module. This runtime enables developers to write Cairo1 code much as they would write smart contracts. By adhering to a predefined trait, developers can implement custom logic and access low-level Cairo1 system calls, such as call_contract_syscall. This syscall connects to a precompiled contract supporting arbitrary cross-chain and historical data queries, allowing developers to inject bytes directly from the verifier so that public inputs are properly passed through to computations.
Data Processor Modules Cairo1 Package
We provide the module interface as a Cairo1 package for developers' ease of use. The backbone program, however, is written in Cairo 0 for efficiency; you can find further reading on this interoperability here.
A user-friendly Cairo1 package abstracts away Data Processor-specific operations, making it easier to interact with the system:
Runtime Usage Example
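The following is a minimal sketch of what a custom module could look like, assuming a hypothetical entry-point trait and memorizer handle; every name below is illustrative, not the published package interface:

```cairo
// Hypothetical sketch only: trait and type names are illustrative,
// not the published Data Processor package interface.

// Key identifying an account at a given block on a given chain.
#[derive(Drop, Copy)]
struct AccountKey {
    chain_id: felt252,
    block_number: felt252,
    address: felt252,
}

// Stand-in for the memorizer handle injected by the runtime; in the
// real system, reads are served through the custom call_contract_syscall.
trait MemorizerTrait<T> {
    fn account_get_balance(self: @T, key: AccountKey) -> u256;
}

// A custom module adheres to a predefined trait; this entry point
// returns the balance of `address` at `block_number` on chain id 1.
fn main<T, +MemorizerTrait<T>>(
    memorizer: @T, block_number: felt252, address: felt252
) -> u256 {
    memorizer.account_get_balance(
        AccountKey { chain_id: 1, block_number: block_number, address: address }
    )
}
```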
Memorizer
The Data Processor Custom Module is influenced by the design of the memorizer, which dynamically fetches data. The memorizer uses deterministic key generation for different data types and supports various access patterns for transactions.
Key Derivation
Key derivation for different data types:
Header: h(chain_id, block_number)
Account: h(chain_id, block_number, address)
Storage Slot: h(chain_id, block_number, address, slot)
Transactions: different access patterns, such as sender-based and block-based approaches.
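As an illustration, a Poseidon-based derivation matching the scheme above might look as follows (the production hash function may differ; see Key Hashing below):

```cairo
use core::poseidon::poseidon_hash_span;

// Illustrative only: derives memorizer keys per the scheme above.
// The production hash function may differ (see "Key Hashing").
fn account_key(chain_id: felt252, block_number: felt252, address: felt252) -> felt252 {
    poseidon_hash_span(array![chain_id, block_number, address].span())
}

fn storage_slot_key(
    chain_id: felt252, block_number: felt252, address: felt252, slot: felt252
) -> felt252 {
    poseidon_hash_span(array![chain_id, block_number, address, slot].span())
}
```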
Dry-run Logic
The dry run evaluates state access dynamically, logging each piece of requested data so that inclusion proofs can be generated for it. This ensures the correct data is available for program execution.
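Roughly, each logged access can be thought of as a record like the following (the field names here are invented for illustration; the actual dry-run output format is internal):

```cairo
// Hypothetical shape of one dry-run log entry; the real format the
// dry run emits is internal to the Data Processor.
#[derive(Drop, Serde)]
struct StateAccess {
    chain_id: felt252,
    block_number: felt252,
    address: felt252,
    slot: Option<felt252>, // Some(..) for a storage read, None for an account read
}
```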
Key Hashing
Keys are hashed to a single felt value for use in DictAccess. The hash function balances collision resistance against a reduced step count.
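A minimal sketch of the resulting lookup pattern, using Cairo1's Felt252Dict as a stand-in for the underlying Cairo 0 DictAccess segment (the hash inputs are placeholder values):

```cairo
use core::dict::Felt252Dict;
use core::poseidon::poseidon_hash_span;

// Sketch: a derived felt key addresses the memorizer dictionary.
// Felt252Dict stands in for the Cairo 0 DictAccess segment here.
fn lookup_example() -> felt252 {
    let mut memorizer: Felt252Dict<felt252> = Default::default();
    // Placeholder inputs: chain_id = 1, a block number, and an address.
    let key = poseidon_hash_span(array![1, 20000000, 0xabc].span());
    memorizer.insert(key, 42);
    memorizer.get(key)
}
```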
Data Processor Modules Custom call_contract_syscall
Specification
The Data Processor uses a custom call_contract_syscall under the hood, provided through the abstracted Cairo1 library. The syscall reads multiple dictionaries at a specified key of type felt252, retrieving the data stored under that key as an Array<felt252>:
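The custom implementation keeps the standard corelib signature. A hedged usage sketch follows; the selector and the use of the contract address to select a dictionary are assumptions about how the runtime multiplexes reads, not documented behavior:

```cairo
use starknet::{ContractAddress, SyscallResultTrait};
use starknet::syscalls::call_contract_syscall;

// The call keeps the standard corelib signature, but the Data Processor
// runtime intercepts it instead of dispatching to a deployed contract.
// Assumption for illustration: `memorizer` selects which dictionary to
// read, and the calldata carries the derived felt252 key; the returned
// felts are the data stored under that key.
fn read_memorizer(memorizer: ContractAddress, key: felt252) -> Span<felt252> {
    call_contract_syscall(memorizer, selector!("get"), array![key].span())
        .unwrap_syscall()
}
```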
Architecture: Preprocess & Trace Generation
1. Evaluate State Access (Dry run)
Run the program in dry-run mode to log the requested state and pass it to the compute module. The logged state determines which proofs must be generated for proper trace generation.
2. Generate Inclusion Proofs (Preprocess)
Generate inclusion proofs for uncached data and include them in an input.json file.
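For illustration only, input.json might carry entries along these lines (the field names are invented; the real schema is internal to the Data Processor):

```json
{
  "proofs": [
    {
      "chain_id": 1,
      "block_number": 20000000,
      "address": "0x...",
      "account_proof": ["0x...", "0x..."],
      "storage_proofs": [
        { "slot": "0x0", "proof": ["0x...", "0x..."] }
      ]
    }
  ]
}
```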
3. TrieCache
TrieCache handles dependencies between batches, ensuring correct state transition proofs.
4. Generate Trace
Workers generate the trace, which requires a valid input.json file. This computationally expensive task will eventually require a worker pool for parallel operation.