Documentation

General Documentation of Data Processor

Datalakes

A Datalake is a format used to describe a set of data points that are stored on-chain, somewhere in the blockchain's history. Essentially, it is how we define the data we want to run computations on. As there are different ways to describe on-chain data, there are different datalakes available.

BlockSampled Datalake

The BlockSampled datalake extracts a specific data point over a range of blocks. You define the block range along with the data point to extract from each block in that range. Data points can be extracted from the block header, an account, or a smart contract storage variable.

Structure:

  • blockRangeStart: Start block

  • blockRangeEnd: End block (inclusive)

  • sampledProperty: Specifies the exact field to sample. The following are available:

    • header: Sample a field from a header

      • Available fields: ParentHash, OmmerHash, Beneficiary, StateRoot, TransactionsRoot, ReceiptsRoot, LogsBloom, Difficulty, Number, GasLimit, GasUsed, Timestamp, ExtraData, MixHash, Nonce, BaseFeePerGas, WithdrawalsRoot, BlobGasUsed, ExcessBlobGas, ParentBeaconBlockRoot

    • account: Samples a field from a specific account

      • Available fields: Nonce, Balance, StorageRoot, CodeHash

    • storage: Sample a variable of a smart contract via address and storage slot

  • increment: Incremental step over the range from blockRangeStart to blockRangeEnd.
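As a rough illustration of how these fields interact, the sketch below models a BlockSampled datalake as a plain Python dataclass and lists the block numbers it would sample. The class, the method name, and the `"header.BaseFeePerGas"` string notation are hypothetical and chosen only for this example; they are not the Data Processor's actual encoding or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BlockSampledDatalake:
    """Illustrative container only; not the real encoding of a datalake."""

    block_range_start: int
    block_range_end: int   # inclusive, as stated in the structure above
    sampled_property: str  # hypothetical notation, e.g. "header.BaseFeePerGas"
    increment: int = 1

    def sampled_blocks(self) -> List[int]:
        """Return the block numbers this datalake would sample."""
        if self.increment < 1:
            raise ValueError("increment must be a positive integer")
        if self.block_range_end < self.block_range_start:
            raise ValueError("blockRangeEnd must not precede blockRangeStart")
        # +1 because blockRangeEnd is inclusive
        return list(
            range(self.block_range_start, self.block_range_end + 1, self.increment)
        )


lake = BlockSampledDatalake(100, 110, "header.BaseFeePerGas", increment=5)
print(lake.sampled_blocks())  # [100, 105, 110]
```

With an increment of 5, only every fifth block in the range contributes a data point, which reduces the number of values the aggregation function later has to process.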

TransactionsInBlock Datalake

The TransactionsInBlock datalake queries a specific field from all transactions of a specific block.

Structure:

  • targetBlock: The specific block number from which transactions are being sampled.

  • startIndex: The starting index of transactions within the block.

  • endIndex: The ending index of transactions within the block.

  • increment: Incremental step over the range from startIndex to endIndex.

  • includedTypes: The transaction types to include in the query

    • Available types: Legacy, EIP2930, EIP1559, EIP4844

  • sampledProperty: The field to sample. The available fields depend on the transaction type.

    • Use the includedTypes filter to prevent unavailable-field errors:

    Transaction Field | Legacy | EIP2930 | EIP1559 | EIP4844

    NONCE

    GAS_PRICE

    GAS_LIMIT

    RECEIVER

    VALUE

    INPUT

    V

    R

    S

    CHAIN_ID

    ACCESS_LIST

    MAX_FEE_PER_GAS

    MAX_PRIORITY_FEE_PER_GAS

    BLOB_VERSIONED_HASHES

    MAX_FEE_PER_BLOB_GAS
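The selection described by startIndex, endIndex, increment, and includedTypes can be sketched as follows. This is a minimal model under stated assumptions: transactions are plain dictionaries, the index range is walked first and the type filter applied afterwards, and because the text above does not say whether endIndex is inclusive, the sketch exposes that as a hypothetical `end_inclusive` flag. None of the names below come from the Data Processor itself.

```python
from typing import Dict, List, Set

TX_TYPES = {"Legacy", "EIP2930", "EIP1559", "EIP4844"}


def select_transactions(
    transactions: List[Dict],
    start_index: int,
    end_index: int,
    increment: int,
    included_types: Set[str],
    end_inclusive: bool = False,  # assumption: not specified in the text above
) -> List[Dict]:
    """Pick transactions by index range and step, then keep the included types."""
    unknown = included_types - TX_TYPES
    if unknown:
        raise ValueError(f"unknown transaction types: {sorted(unknown)}")
    if increment < 1:
        raise ValueError("increment must be a positive integer")
    stop = end_index + 1 if end_inclusive else end_index
    stop = min(stop, len(transactions))
    picked = [transactions[i] for i in range(start_index, stop, increment)]
    # Filtering by type avoids sampling a field that a type does not carry.
    return [tx for tx in picked if tx["type"] in included_types]


block_txs = [
    {"type": "Legacy", "nonce": 7},
    {"type": "EIP1559", "nonce": 1},
    {"type": "EIP4844", "nonce": 3},
    {"type": "EIP1559", "nonce": 9},
]
sampled = select_transactions(block_txs, 0, 4, 1, {"EIP1559", "EIP4844"})
print([tx["nonce"] for tx in sampled])  # [1, 3, 9]
```

Restricting includedTypes to the types that actually carry the sampled field (for example, only EIP4844 when sampling MAX_FEE_PER_BLOB_GAS) is what keeps the query from hitting an unavailable field.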

Aggregation Functions

To define the computation to run on the specified data, select an aggregation function. The function is applied to all data points extracted by the datalake.

Available Aggregation Functions:

  • AVG: Averages a list of values.

  • MAX: Finds the largest value in a list.

  • MIN: Finds the smallest value in a list.

  • SUM: Computes the sum of a list of values.

  • COUNT_IF: Takes an additional parameter, context, that encodes the counting condition.

    • The conditions are:

    • “00” → Equal ==

    • “01” → Not equal !=

    • “02” → Greater >

    • “03” → Greater or equal >=

    • “04” → Smaller <

    • “05” → Smaller or equal <=
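The aggregation functions listed above can be approximated in a few lines of Python. This is only a sketch: it assumes the COUNT_IF context can be split into a two-character operator code and a comparison value, and that each data point is the left-hand operand of the comparison. The real context encoding and the numeric representation used by the Data Processor are not described here, so treat `count_if` and `aggregate` as hypothetical helpers.

```python
import operator
from typing import Callable, Dict, List

# Operator codes as listed above, mapped to Python comparison functions.
COUNT_IF_OPERATORS: Dict[str, Callable[[int, int], bool]] = {
    "00": operator.eq,  # equal
    "01": operator.ne,  # not equal
    "02": operator.gt,  # greater
    "03": operator.ge,  # greater or equal
    "04": operator.lt,  # smaller
    "05": operator.le,  # smaller or equal
}


def count_if(values: List[int], op_code: str, compare_to: int) -> int:
    """Count the values for which `value <op> compare_to` holds."""
    if op_code not in COUNT_IF_OPERATORS:
        raise ValueError(f"unknown operator code: {op_code!r}")
    predicate = COUNT_IF_OPERATORS[op_code]
    return sum(1 for value in values if predicate(value, compare_to))


def aggregate(name: str, values: List[int]) -> float:
    """Apply one of AVG, MAX, MIN, SUM to a non-empty list of values."""
    if not values:
        raise ValueError("no data points to aggregate")
    if name == "AVG":
        return sum(values) / len(values)
    if name == "MAX":
        return max(values)
    if name == "MIN":
        return min(values)
    if name == "SUM":
        return sum(values)
    raise ValueError(f"unknown aggregation function: {name!r}")


datapoints = [10, 20, 30, 40]
print(aggregate("AVG", datapoints))      # 25.0
print(count_if(datapoints, "02", 20))    # 2 (30 and 40 are greater than 20)
```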

Available compute modules (Cairo1):

  • SLR: Simple Linear Regression. Takes an additional context to specify the target index.
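For readers unfamiliar with simple linear regression, the following sketch fits a least-squares line through (x, y) pairs and evaluates it at a target x. It interprets the module's "target index" as the x value at which to predict, which is an assumption, and it uses Python floats, whereas the Cairo1 module presumably works with a different numeric representation. The function names are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(xs: List[float], ys: List[float], target_x: float) -> float:
    """Evaluate the fitted line at target_x."""
    slope, intercept = fit_line(xs, ys)
    return slope * target_x + intercept


blocks = [100, 101, 102, 103]
balances = [10.0, 12.0, 14.0, 16.0]
print(predict(blocks, balances, 105))  # 20.0
```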

Planned compute modules:

  • MERKLE: Computes the Merkle root of a dataset.

Cairo1 <> Cairo0 Interoperability

A CASM program consists of two main components: bytecode (vec<bigint>) and Pythonic hints (vec<string>). The bytecode includes all Cairo VM instructions and all deterministic inputs to the program, while the Pythonic hints are pieces of Python code that run at specified program counter values during execution.


Cairo0

Cairo0 is a general-purpose language with features like memory manipulation, segment creation, and Cairo assembly code injection. These capabilities allow the creation of programs that can:

  • Dynamically load other programs' bytecode into a designated memory segment.

  • Load the corresponding hints into the parent's Pythonic hints list, relocating their program counters so they execute at the correct program counter values.

During the execution of a Cairo0 program, a jmp abs command can jump to the entry point of the externally loaded bytecode, making the Cairo VM execute those instructions normally. The jmp command inherits the frame pointer (fp) of the parent function frame and the return program counter (ret_pc), so when the function calls the ret opcode, it returns to the parent function in the call stack. Once the externally loaded program finishes, execution proceeds with the bytecode of the original Cairo0 program.
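The loading and relocation steps can be pictured with a toy model. The sketch below is not the actual bootloader: it represents a program as a flat list of integers plus a dictionary of hints keyed by program counter, and it appends the external bytecode to the end of the host's list instead of using a separate memory segment. The `Program` class, `load_external`, and the placeholder bytecode values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Program:
    """Toy stand-in for a CASM program: bytecode plus hints keyed by pc."""

    bytecode: List[int]
    hints: Dict[int, List[str]] = field(default_factory=dict)


def load_external(host: Program, external: Program) -> int:
    """Load `external` into `host` and return the external entry point.

    The external bytecode is placed after the host bytecode, and every
    external hint is re-keyed by the same offset so that it still fires
    at the instruction it was written for.
    """
    offset = len(host.bytecode)
    host.bytecode.extend(external.bytecode)
    for pc, hint_codes in external.hints.items():
        host.hints.setdefault(pc + offset, []).extend(hint_codes)
    return offset  # a `jmp abs` to this address would start the external program


# Placeholder integers, not real Cairo instructions.
host = Program(bytecode=[1, 2, 3], hints={0: ["host_hint()"]})
module = Program(bytecode=[7, 8], hints={1: ["module_hint()"]})

entry_point = load_external(host, module)
print(entry_point)         # 3
print(sorted(host.hints))  # [0, 4]
```

The hint that belonged to program counter 1 of the external module ends up at program counter 4 of the combined program, which is the relocation the list above refers to.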


Cairo1

In contrast, Cairo1 is a more user-friendly language, similar to Rust in terms of syntax. It is more abstract and less versatile, which simplifies the development of large projects with more readable code. Cairo1 offers structures like structs, traits, and implementations, and it is modular. However, it lacks the memory manipulation capabilities and CASM code injections of Cairo0, making it less flexible. Despite these limitations, Cairo1 provides advantages in terms of memory integrity and call stack validity, making it an easier and safer language for developers.


Combining Cairo0 and Cairo1

Imagine you have a Cairo0 program that heavily utilizes Cairo0's flexibility to efficiently manipulate, check, and interpret external data, as is the case with Data Processor. However, you want to load external program modules that use the data prepared and validated by the Cairo0 host (Data Processor). These external modules should not be able to modify the Cairo0 host runtime memory in an unauthorized way, whether by mistake or on purpose. Therefore, these modules need to be written in a safer, less versatile language. The perfect match for this requirement is Cairo1. Additionally, Cairo1 offers the familiar Rust syntax and all its development conveniences.


Interoperability

This creates a natural need for interoperability between Cairo0 and Cairo1, which is achieved by:

  • Using a Cairo0 bootloader: A program that dynamically loads external programs into itself and runs them.

  • Utilizing a Cairo1 compiler: Required to compile Cairo1 projects into CASM.

  • Employing Python utilities: To load Cairo1 programs into Cairo0 memory dynamically.

This approach combines the best of both worlds, leveraging Cairo0's flexibility and Cairo1's safety and ease of use.
