What is a Module?
The Data Processor was first introduced in its basic form with data lakes, which you can read about elsewhere in this documentation. Despite its potential, the data lake type limits how flexibly a data set and its computation can be defined.

Data Processor Modules introduce significant advances in flexibility. These include custom modules, unrestricted access patterns, parallelization, caching, composition, private modules, and the ability to pull data from multiple chains simultaneously. A key technology underpinning the integrity of the Data Processor is Storage Proofs, which ensure that all data used is verified and cannot be manipulated.
Data lakes offer a structured framework for defining access patterns to retrieve data for computations. Currently, these patterns are static, allowing simple queries like extracting the balance from a specific address over a block range. However, more complex queries, such as retrieving transaction volumes for the ETH/USDC pair only during blocks where the price of ETH exceeds $3000, are not supported. Implementing such dynamic queries would require creating a custom data lake for each scenario, which is unsustainable long-term.
Compute modules are designed to process arrays of values derived from a data lake. For example, they can calculate the average balance of a specific address over a block range. Currently, integrating custom modules requires coordination with the Herodotus team, which is not a scalable solution. To address this, we aim to open up the capabilities of the underlying Cairo VM to developers, allowing them to write custom Cairo1 code while still accessing verifiable on-chain data.
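For intuition, here is a minimal Cairo1 sketch of such a computation; the function name and the use of u256 balances are our own illustration, not the module interface itself:

```cairo
// Average a pre-fetched, verified array of balances.
fn average_balance(mut balances: Span<u256>) -> u256 {
    assert(balances.len() > 0, 'empty input');
    // Widen the element count to u256 for the final division.
    let n_felt: felt252 = balances.len().into();
    let n: u256 = n_felt.into();
    let mut sum: u256 = 0;
    // Accumulate every balance in the span.
    loop {
        match balances.pop_front() {
            Option::Some(b) => { sum += *b; },
            Option::None => { break; },
        };
    };
    sum / n
}
```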
To overcome these challenges, we have developed a new Cairo1 runtime, available as a Data Processor compute module. This runtime enables developers to write Cairo1 code much as they would write smart contracts. By adhering to a predefined trait, developers can implement custom logic and access low-level Cairo1 system calls, such as call_contract_syscall. This syscall connects to a precompiled contract supporting arbitrary cross-chain and historical data queries, allowing developers to inject bytes directly from the verifier to ensure public inputs are appropriately passed through to computations.
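A minimal sketch of the predefined-trait idea; the trait and impl names below are illustrative assumptions, not the actual interface:

```cairo
// Hypothetical module trait: verified inputs in, results out, as felts.
trait CustomModule {
    fn main(inputs: Array<felt252>) -> Array<felt252>;
}

// A trivial module adhering to the trait: echo the inputs back.
impl EchoModule of CustomModule {
    fn main(inputs: Array<felt252>) -> Array<felt252> {
        inputs
    }
}
```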
We provide the module interface as a Cairo1 package for ease of use, while the backbone program is written in Cairo 0 for efficiency and interoperability.

This user-friendly Cairo1 package abstracts away Data Processor-specific operations, making it easier to interact with the system. A hypothetical sketch of what working with it could look like (the hdp_cairo package, the HDP handle, AccountTrait, AccountKey, and account_get_balance are illustrative names, not a confirmed API):
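```cairo
// Purely illustrative: hdp_cairo, HDP, AccountKey, and account_get_balance
// are assumed names; the real package API may differ.
use hdp_cairo::{HDP, evm::account::{AccountTrait, AccountKey}};

// One typed call replaces manual key derivation, the raw syscall, and
// decoding of the returned felt array.
fn mainnet_balance(hdp: HDP, block_number: felt252, address: felt252) -> u256 {
    hdp.evm.account_get_balance(
        AccountKey { chain_id: 1, block_number: block_number, address: address }
    )
}
```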
The Data Processor custom module is influenced by the design of the memorizer, which fetches data dynamically. It uses deterministic key generation for different data types and supports various access patterns for transactions.
Key derivation for different data types:
Header: h(chain_id, block_number)
Account: h(chain_id, block_number, address)
Storage Slot: h(chain_id, block_number, address, slot)
Transactions: Different access patterns, such as sender-based and block-based approaches.
The dry-run step evaluates state accesses dynamically, logging every piece of requested data so that inclusion proofs can be generated for it. This ensures the correct data is available for program execution.
Keys are hashed to a felt value for use in DictAccess. The hash function balances collision resistance and reduced step count.
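A sketch of this derivation, assuming Poseidon as the felt-friendly hash; the actual function only needs to balance collision resistance and step count, so Poseidon and the function names below are illustrative:

```cairo
use core::poseidon::poseidon_hash_span;

// Hash the identifying fields of each data type down to a single felt252
// that can serve as a DictAccess key.
fn header_key(chain_id: felt252, block_number: felt252) -> felt252 {
    poseidon_hash_span(array![chain_id, block_number].span())
}

fn account_key(chain_id: felt252, block_number: felt252, address: felt252) -> felt252 {
    poseidon_hash_span(array![chain_id, block_number, address].span())
}

fn storage_key(
    chain_id: felt252, block_number: felt252, address: felt252, slot: felt252
) -> felt252 {
    poseidon_hash_span(array![chain_id, block_number, address, slot].span())
}
```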
call_contract_syscall Specification

The Data Processor uses a custom call_contract_syscall under the hood of the abstracted Cairo1 library. The syscall reads multiple dictionaries at a specified key of type felt252, retrieving the data stored under that key as an Array<felt252>. A sketch of such a read, using the standard corelib signature for illustration (the memorizer address and entry-point selector are placeholders supplied by the runtime, not documented constants):
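```cairo
use starknet::{ContractAddress, SyscallResultTrait, syscalls::call_contract_syscall};

// Read the values stored under `key` from the memorizer precompile.
// `memorizer` and `selector` are placeholders supplied by the runtime;
// the layout of the returned felts depends on the data type being read.
fn memorizer_read(
    memorizer: ContractAddress, selector: felt252, key: felt252
) -> Span<felt252> {
    let calldata = array![key];
    call_contract_syscall(memorizer, selector, calldata.span()).unwrap_syscall()
}
```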
The end-to-end flow is:

1. Run the program in dry-run mode to log the requested state and pass it to the compute module; the proof is then generated through proper trace generation.
2. Generate inclusion proofs for uncached data and include them in an input.json file.
3. TrieCache handles dependencies between batches, ensuring correct state-transition proofs.
4. Workers generate the trace, which requires a valid input.json file. This computationally expensive task will eventually require a worker pool for parallel operations.