Data Processor API

Note that this deployed API is not yet stable.

The Data Processor API allows users to submit computational tasks combining data lakes and aggregate functions. This document outlines how to structure requests to the API, manage authentication, and interpret the parameters needed for successful data processing.

Authentication

Every call to the Data Processor API must include your API key, passed as the apiKey query parameter. You can create an API key from the Herodotus Dashboard.
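
For illustration, here is a minimal TypeScript sketch of constructing an authenticated request URL. The HDP_API_KEY environment variable name is an assumption, not part of the API:

// Minimal sketch: pass the API key as the apiKey query parameter.
// HDP_API_KEY is an assumed environment variable name for illustration.
const apiKey = process.env.HDP_API_KEY ?? "";
const submitUrl = `https://hdp.api.herodotus.cloud/submit-batch?apiKey=${apiKey}`;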

Submit Batch (/submit-batch)

  • URL: https://hdp.api.herodotus.cloud/submit-batch?apiKey={yourApiKey}

  • Method: POST

This endpoint accepts a JSON payload containing one or more computational tasks. Each task specifies a data lake configuration and an aggregate function to process the data.

Request Structure

Requests to the endpoint are organized into batches. A batch can contain multiple tasks, each defined by a combination of a data lake type and an aggregate function. The Request Parameters section below explains each request field in detail.

Example Request Body: Block Sampled Data Lake

Example: On the Ethereum Sepolia chain ("ETHEREUM_SEPOLIA"), calculate the average base_fee_per_gas over blocks 5,515,000 to 5,515,039.

{
  "destinationChainId": "ETHEREUM_SEPOLIA",
  "tasks": [
    {
      "type": "DatalakeCompute",
      "datalake": {
        "type": "BlockSampled",
        "chainId": "ETHEREUM_SEPOLIA",
        "blockRangeStart": 5515000,
        "blockRangeEnd": 5515039,
        "increment": 1,
        "sampledProperty": "header.base_fee_per_gas"
      },
      "compute": {
        "aggregateFnId": "avg"
      }
    }
  ]
}

Example Request Body: Transactions in Block Data Lake

Example: On the Ethereum Sepolia chain ("ETHEREUM_SEPOLIA"), determine the maximum nonce across transaction indices 10 to 40 (sampling every 10th index) in block 5,409,986.

{
  "destinationChainId": "ETHEREUM_SEPOLIA",
  "tasks": [
    {
      "type": "DatalakeCompute",
      "datalake": {
        "type": "TransactionsInBlock",
        "chainId": "ETHEREUM_SEPOLIA",
        "targetBlock": 5409986,
        "startIndex": 10,
        "endIndex": 40,
        "increment": 10,
        "includedTypes": {
          "legacy": true,
          "eip2930": true,
          "eip1559": true,
          "eip4844": true
        },
        "sampledProperty": "tx.nonce"
      },
      "compute": {
        "aggregateFnId": "max"
      }
    }
  ]
}

Example Request Body: Module Task

Example: Execute an uploaded program, identified by its programHash, with five public inputs.

{
  "destinationChainId": "ETHEREUM_SEPOLIA",
  "tasks": [
    {
      "type": "Module",
      "programHash": "0xaae117f9cdfa4fa4d004f84495c942adebf17f24aec8f86e5e6ea29956b47e",
      "inputs": [
        {
          "visibility": "public",
          "value": "0x3"
        },
        {
          "visibility": "public",
          "value": "0x5222A4"
        },
        {
          "visibility": "public",
          "value": "0x5222A7"
        },
        {
          "visibility": "public",
          "value": "0x5222C4"
        },
        {
          "visibility": "public",
          "value": "0x13cb6ae34a13a0977f4d7101ebc24b87bb23f0d5"
        }
      ]
    }
  ]
}

Response

The endpoint returns a JSON object containing the batchId and an array of taskHashes. The task hashes are required for fetching the task results from the result map smart contract.

Example Response:

{
  "batchId": "bq_01HZEHTJPR68JRTSKKFTPF1R5D",
  "taskHashes": [
    "0xf67f667be3ff4153a1ce7843b826a7d29db95b4a2c6076a16684db626b371a36"
  ]
}
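
Putting the pieces together, a submission might look like the following TypeScript sketch. The payload mirrors the block-sampled example above; the HDP_API_KEY environment variable name is an assumption and error handling is minimal:

// Sketch: submit a batch and read back the batchId and task hashes.
// HDP_API_KEY is an assumed environment variable holding a valid key.
const apiKey = process.env.HDP_API_KEY ?? "";
const response = await fetch(
  `https://hdp.api.herodotus.cloud/submit-batch?apiKey=${apiKey}`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      destinationChainId: "ETHEREUM_SEPOLIA",
      tasks: [
        {
          type: "DatalakeCompute",
          datalake: {
            type: "BlockSampled",
            chainId: "ETHEREUM_SEPOLIA",
            blockRangeStart: 5515000,
            blockRangeEnd: 5515039,
            increment: 1,
            sampledProperty: "header.base_fee_per_gas",
          },
          compute: { aggregateFnId: "avg" },
        },
      ],
    }),
  },
);
if (!response.ok) throw new Error(`submit-batch failed: ${response.status}`);
const { batchId, taskHashes } = await response.json();
console.log(batchId, taskHashes);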

Request Parameters

  • destinationChainId: Defines the specific chain to which the result of your computation is delivered.

  • tasks: An array allowing you to define multiple tasks in one request. Each task will be processed in the same batch.

Task Fields

Each task object includes the following fields:

  • type: Defines the task type. Currently, we support DatalakeCompute and Module.

For DatalakeCompute Tasks:

  • datalake: Detailed data definition to compute over.

    • type: The type of data lake. For block sampled data, set to BlockSampled; for transactions in block data, set to TransactionsInBlock.

    • chainId: The chain ID the data should be sourced from, e.g., "ETHEREUM_SEPOLIA" for Sepolia.

    • blockRangeStart: Starting block number of the range.

    • blockRangeEnd: Ending block number of the range (inclusive).

    • sampledProperty: Specific property to sample. There are three types you can utilize:

      • header: Use the format header.{specific_header_field}. All RLP-decoded fields from the block header are available.

      • account: Use the format account.{target_address}.{specific_account_field}. All RLP-decoded fields from the account are available.

      • storage: Use the format storage.{target_address}.{storage_slot}. Given the target contract address, the property points to the value from the given storage slot as the key.

    • increment: Step size between sampled blocks over the range from blockRangeStart to blockRangeEnd. The default is 1 (see the sketch after this list).

  • compute:

    • aggregateFnId: The computation function that the task will execute. Available functions are: avg, sum, min, max, count.
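
To make the sampling semantics concrete, here is an illustrative TypeScript sketch (the helper sampledBlocks is hypothetical, not part of the API) that enumerates the blocks a BlockSampled data lake covers and shows the three sampledProperty formats:

// Illustrative helper; not part of the API.
// Enumerate the block numbers a BlockSampled data lake will touch.
function sampledBlocks(start: number, end: number, increment = 1): number[] {
  const blocks: number[] = [];
  for (let block = start; block <= end; block += increment) blocks.push(block);
  return blocks;
}

// The three sampledProperty formats described above:
const headerProp = "header.base_fee_per_gas";
const accountProp = "account.0x7f2c6f930306d3aa736b3a6c6a98f512f74036d4.balance";
const storageProp =
  "storage.0x75cec1db9dceb703200eaa6595f66885c962b920.0x0000000000000000000000000000000000000000000000000000000000000001";

// With increment 10, blocks 5,515,000 to 5,515,039 sample only four blocks:
console.log(sampledBlocks(5515000, 5515039, 10)); // [5515000, 5515010, 5515020, 5515030]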

For Module Tasks:

  • programHash: The hash of the uploaded program to execute.

  • inputs: An array of input objects, each containing:

    • visibility: Specifies whether the input is public or private.

    • value: The value of the input parameter.

Matching Data Properties with Supported Functions

Note that not all RLP-decoded fields are compatible with all computations. Consult the function support matrix to ensure you are using a supported field.

Special Functions Requiring Context

count

The count function counts the sampled values that satisfy a comparison, so it requires an additional context object (aggregateFnCtx) with the following fields:

  • operator: Comparison operator used to filter the value set. Available operators are:

    • eq (equal to ==)

    • nq (not equal to !=)

    • gt (greater than >)

    • gteq (greater than or equal to >=)

    • lt (less than <)

    • lteq (less than or equal to <=)

  • valueToCompare: The value to compare against using the specified operator.

Example: Given the data lake, count the number of values greater than 1000000000000.

{
  "aggregateFnId": "count",
  "aggregateFnCtx": {
    "operator": "gt",
    "valueToCompare": "1000000000000"
  }
}
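
For context, this sketch (reusing the block-sampled data lake from earlier) shows how the count computation and its context slot into a complete task:

// Sketch: a complete DatalakeCompute task using count with its context.
// Data lake fields are reused from the block-sampled example above.
const countTask = {
  type: "DatalakeCompute",
  datalake: {
    type: "BlockSampled",
    chainId: "ETHEREUM_SEPOLIA",
    blockRangeStart: 5515000,
    blockRangeEnd: 5515039,
    increment: 1,
    sampledProperty: "header.base_fee_per_gas",
  },
  compute: {
    aggregateFnId: "count",
    aggregateFnCtx: {
      operator: "gt",
      valueToCompare: "1000000000000",
    },
  },
};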

Batch Status (/batch-query/{yourBatchId})

  • URL: https://hdp.api.herodotus.cloud/batch-query/{yourBatchId}

  • Method: GET

This endpoint allows you to query the current status of a submitted batch using the batchId.

Available Statuses

  1. Opened: The batch has been accepted and processing has started.

  2. ProofsFetched: Successfully fetched proofs from the preprocessor and generated the corresponding PIE object.

  3. CachedMmrRoot: Successfully cached the MMR root and MMR size used during the preprocessing step to the smart contract.

  4. PieSubmittedToSHARP: Successfully submitted the PIE to SHARP.

  5. FactRegisteredOnchain: The fact hash of the batch is registered in the fact registry contract.

  6. Finalized: Successfully authenticated the fact hash and batch, and finalized the valid result on the contract mapping.
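
One common pattern is to poll the endpoint until the batch reaches Finalized, as in this sketch. The status field name in the response body is an assumption; adjust it to the actual response shape:

// Sketch: poll batch status until Finalized.
// The `status` field name is an assumption about the response shape.
async function waitForFinalized(batchId: string): Promise<void> {
  const url = `https://hdp.api.herodotus.cloud/batch-query/${batchId}`;
  for (;;) {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`batch-query failed: ${res.status}`);
    const { status } = await res.json();
    console.log(`batch ${batchId}: ${status}`);
    if (status === "Finalized") return;
    await new Promise((r) => setTimeout(r, 30_000)); // poll every 30 s
  }
}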

Finalized Result

Task hashes are returned in the /submit-batch response; use them as identifiers to fetch your results once the job is finished. After the batch is finalized, pass a taskHash to the contract's getFinalizedTaskResult function to read the result.
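
As a sketch of the on-chain read (using ethers v6): the contract address below is a placeholder and the function signature is an assumption, so verify both against the deployed contract's ABI before use:

import { Contract, JsonRpcProvider } from "ethers";

// Sketch only: the address is a placeholder and the signature
// (bytes32 -> uint256) is an assumption; check the deployed contract.
const provider = new JsonRpcProvider("https://rpc.sepolia.org");
const hdpContract = new Contract(
  "0xYourHdpContractAddress",
  ["function getFinalizedTaskResult(bytes32 taskHash) view returns (uint256)"],
  provider,
);

const taskHash =
  "0xf67f667be3ff4153a1ce7843b826a7d29db95b4a2c6076a16684db626b371a36";
const result = await hdpContract.getFinalizedTaskResult(taskHash);
console.log(result.toString());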

More Examples

Compute the Maximum Balance of a Specific Account Over 100 Blocks

{
  "destinationChainId": "ETHEREUM_SEPOLIA",
  "tasks": [
    {
      "type": "DatalakeCompute",
      "datalake": {
        "type": "BlockSampled",
        "chainId": "ETHEREUM_SEPOLIA",
        "blockRangeStart": 5515000,
        "blockRangeEnd": 5515100,
        "sampledProperty": "account.0x7f2c6f930306d3aa736b3a6c6a98f512f74036d4.balance"
      },
      "compute": {
        "aggregateFnId": "max"
      }
    }
  ]
}

Query the Average Value of a Smart Contract's Variable Over 70 Blocks

{
  "destinationChainId": "ETHEREUM_SEPOLIA",
  "tasks": [
    {
      "type": "DatalakeCompute",
      "datalake": {
        "type": "BlockSampled",
        "chainId": "ETHEREUM_SEPOLIA",
        "blockRangeStart": 5515000,
        "blockRangeEnd": 5515070,
        "increment": 1,
        "sampledProperty": "storage.0x75cec1db9dceb703200eaa6595f66885c962b920.0x0000000000000000000000000000000000000000000000000000000000000001"
      },
      "compute": {
        "aggregateFnId": "avg"
      }
    }
  ]
}

Query the Average Value for Transaction Max Fee Per Blob Gas

{
  "destinationChainId": "ETHEREUM_SEPOLIA",
  "tasks": [
    {
      "type": "DatalakeCompute",
      "datalake": {
        "type": "TransactionsInBlock",
        "chainId": "ETHEREUM_SEPOLIA",
        "targetBlock": 5858987,
        "startIndex": 0,
        "endIndex": 100,
        "increment": 1,
        "includedTypes": {
          "legacy": false,
          "eip2930": false,
          "eip1559": false,
          "eip4844": true
        },
        "sampledProperty": "tx.max_fee_per_blob_gas"
      },
      "compute": {
        "aggregateFnId": "avg"
      }
    }
  ]
}
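
Because a batch may carry multiple tasks (see the tasks parameter above), the examples can also be combined into a single submission, sketched here as a TypeScript payload:

// Sketch: two of the tasks above combined into one batch submission.
const multiTaskBatch = {
  destinationChainId: "ETHEREUM_SEPOLIA",
  tasks: [
    {
      type: "DatalakeCompute",
      datalake: {
        type: "BlockSampled",
        chainId: "ETHEREUM_SEPOLIA",
        blockRangeStart: 5515000,
        blockRangeEnd: 5515100,
        sampledProperty:
          "account.0x7f2c6f930306d3aa736b3a6c6a98f512f74036d4.balance",
      },
      compute: { aggregateFnId: "max" },
    },
    {
      type: "DatalakeCompute",
      datalake: {
        type: "BlockSampled",
        chainId: "ETHEREUM_SEPOLIA",
        blockRangeStart: 5515000,
        blockRangeEnd: 5515039,
        increment: 1,
        sampledProperty: "header.base_fee_per_gas",
      },
      compute: { aggregateFnId: "avg" },
    },
  ],
};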

Access Data Cross-Chain

By setting the destination chain ID to an L2, you can access the Data Processor's computed results on that L2. This delivery is facilitated by the Storage Proof API.

To access computed data on Starknet, specify the destination chain as follows:

{
  "destinationChainId": "SN_SEPOLIA"
}

FAQ

  • When should I use the Data Processor instead of the original Herodotus API?

    Both products have trade-offs depending on your use case. If you intend to access data over large ranges of blocks, we recommend the Data Processor: it is designed to handle large amounts of data at a much lower cost.
