Data Processor API

Documentation of the Data Processor API

Note that this deployed API is not yet stable.

The Data Processor API allows users to submit computational tasks combining data lakes and aggregate functions. This document outlines how to structure requests to the API, manage authentication, and interpret the parameters needed for successful data processing.

Authentication

Every call to the Data Processor API must include your API key. You can create an API key from the Herodotus Dashboard.

Submit Batch (/submit-batch)

  • URL: https://hdp.api.herodotus.cloud/submit-batch?apiKey={yourApiKey}

  • Method: POST

This endpoint accepts a JSON payload containing one or more computational tasks. Each task specifies a data lake configuration and an aggregate function to process the data.

Request Type

Requests to the endpoint are organized into batches. A batch can contain multiple tasks, each defined by a combination of a data lake type and an aggregate function. For a detailed explanation of each request field, refer to this page.
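The batch shape described above can be sketched with Python type hints. The field names below are taken from the JSON payloads in this document; the types themselves are illustrative, not an official client library.

```python
# Sketch of the /submit-batch request shape, inferred from the example
# payloads in this document. Illustrative only, not an official schema.
from typing import TypedDict, Union

class BlockSampledDatalake(TypedDict, total=False):
    blockRangeStart: int
    blockRangeEnd: int      # inclusive
    increment: int          # defaults to 1 when omitted
    sampledProperty: str    # e.g. "header.base_fee_per_gas"

class TransactionsInBlockDatalake(TypedDict, total=False):
    targetBlock: int
    startIndex: int
    endIndex: int
    increment: int
    includedTypes: dict     # {"legacy": bool, "eip2930": bool, ...}
    sampledProperty: str    # e.g. "tx.nonce"

class Task(TypedDict, total=False):
    datalakeType: str       # "block_sampled" or "transactions_in_block"
    datalake: Union[BlockSampledDatalake, TransactionsInBlockDatalake]
    aggregateFnId: str      # "avg", "sum", "min", "max", "count", "slr"
    aggregateFnCtx: dict    # required only for "count" and "slr"

class Batch(TypedDict):
    deliveryChainId: int
    sourceChainId: int
    tasks: list[Task]
```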

Example Request Body

Example: On the Ethereum Sepolia (11155111) blockchain, calculate the average base_fee_per_gas for blocks 5515020 to 5515039, and determine the maximum nonce over transaction indices 10 to 40 (sampling every 10th index) in block 5409986.

{
  "deliveryChainId": 11155111,
  "sourceChainId": 11155111,
  "tasks": [
    {
      "datalakeType": "block_sampled",
      "datalake": {
        "blockRangeStart": 5515020,
        "blockRangeEnd": 5515039,
        "sampledProperty": "header.base_fee_per_gas"
      },
      "aggregateFnId": "avg"
    },
    {
      "datalakeType": "transactions_in_block",
      "datalake": {
        "targetBlock": 5409986,
        "startIndex": 10,
        "endIndex": 40,
        "increment": 10,
        "includedTypes": {
          "legacy": true,
          "eip2930": true,
          "eip1559": true,
          "eip4844": true
        },
        "sampledProperty": "tx.nonce"
      },
      "aggregateFnId": "max"
    }
  ]
}
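A request like the one above can be submitted with a few lines of standard-library Python. The endpoint URL and `apiKey` query parameter are as documented; `submit_batch` is a hypothetical helper name, not part of any official SDK.

```python
# Minimal /submit-batch client sketch using only the standard library.
# Replace the placeholder key with one from the Herodotus Dashboard.
import json
import urllib.parse
import urllib.request

HDP_BASE = "https://hdp.api.herodotus.cloud"

def build_submit_url(api_key: str) -> str:
    # The API key is passed as the apiKey query parameter.
    return f"{HDP_BASE}/submit-batch?" + urllib.parse.urlencode({"apiKey": api_key})

def submit_batch(api_key: str, batch: dict) -> dict:
    """POST a batch payload and return the parsed JSON response,
    i.e. {"batchId": ..., "taskHashes": [...]}."""
    req = urllib.request.Request(
        build_submit_url(api_key),
        data=json.dumps(batch).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keep the returned `taskHashes`: as described below, they are the identifiers used to look up finalized results on-chain.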

Return Type

The endpoint returns a JSON object containing the batchId and an array of taskHashes. The task hashes are required for fetching task results from the result map smart contract.

Example Response Value:

{
  "batchId": "01HZEHTJPR68JRTSKKFTPF1R5D",
  "taskHashes": [
    "0xf67f667be3ff4153a1ce7843b826a7d29db95b4a2c6076a16684db626b371a36"
  ]
}

Request Parameters

  • sourceChainId : Defines the chain from which the desired data is sourced. Your computation reads data from the chain specified in this field.

  • deliveryChainId : Defines the specific chain to which the result of your computation is delivered.

  • tasks : An array of tasks, allowing you to define multiple tasks in one request. All tasks in the array are processed in the same batch.

Now, let's dive into each field of a task:

  • datalakeType : Defines the type of data lake. Based on the type, the allowed specific fields will differ. Currently, we support block_sampled and transactions_in_block.

  • datalake : Detailed data definition to compute over.

    • block_sampled : You can specify the property you would like to compute over a large range of blocks.

      • blockRangeStart : Start block number of the range.

      • blockRangeEnd : End block number of the range. Note that this is an inclusive index.

      • sampledProperty : Specific property. There are three types you can utilize:

        • header: header.{specific_header_field} format. All RLP-decoded fields from the block header are available.

        • account: account.{target_address}.{specific_account_field}. All RLP-decoded fields from the account are available. The target address is where the data will be parsed from.

        • storage: storage.{target_address}.{storage_slot} format. Given the target contract address, the property points to the value stored at the given storage slot.

      • increment: Incremental step over the range from blockRangeStart to blockRangeEnd. The default is 1.

    • transactions_in_block : Specifies the property of transactions to compute over a range of transaction indices within a given block.

      • targetBlock : Block number containing the transactions to sample.

      • startIndex : Start transaction index of the range.

      • endIndex : End transaction index of the range.

      • increment : Incremental step over the transaction index range.

      • includedTypes : Transaction types to include in the sample (legacy, eip2930, eip1559, eip4844), each set to true or false.

      • sampledProperty : Specific transaction property, in tx.{specific_tx_field} format (e.g., tx.nonce).

  • aggregateFnId : The identifier of the computation the task will execute. Available functions are: avg, sum, min, max, count, slr.
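To make the field semantics concrete, the sketch below shows how the three sampledProperty formats are composed and which blocks an inclusive range with an increment actually covers. The helper names are hypothetical, for building payloads on the client side; they are not part of the API.

```python
# Hypothetical client-side helpers mirroring the sampledProperty formats
# and the block_sampled range semantics described above.
def header_property(field: str) -> str:
    # header.{specific_header_field}
    return f"header.{field}"

def account_property(address: str, field: str) -> str:
    # account.{target_address}.{specific_account_field}
    return f"account.{address}.{field}"

def storage_property(address: str, slot: str) -> str:
    # storage.{target_address}.{storage_slot}
    return f"storage.{address}.{slot}"

def sampled_blocks(start: int, end: int, increment: int = 1) -> list[int]:
    """Block numbers a block_sampled data lake covers (end is inclusive)."""
    return list(range(start, end + 1, increment))
```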

Matching Data Properties with Supported Functions

Note that not all RLP-decoded fields are compatible with all computations. Check out this matrix to ensure you are using a supported field.

Special Functions Requiring Context

Currently, two functions require additional values when defining the task.

1. COUNT: The function counts the values in the data lake that satisfy a comparison against a given value.

  • operatorId : Operation symbol to filter the value set. Available operations are eq(=), nq(!=), gt(>), gteq(>=), lt(<), lteq(<=).

  • valueToCompare: Value to compare against using the specified operator.

Example: Given the data lake, count the number of values greater than 100000.

{
  "aggregateFnId": "count",
  "aggregateFnCtx": {
    "operatorId": "gt",
    "valueToCompare": "100000"
  }
}
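Locally, the COUNT semantics amount to filtering the sampled values with the chosen operator and counting the matches. The sketch below mirrors that logic for intuition; the actual computation is executed and proven by the Data Processor, not by this code.

```python
# Local mirror of COUNT-with-context semantics: apply operatorId against
# valueToCompare and count the matching values. Illustrative only.
import operator

OPERATORS = {
    "eq": operator.eq,    # =
    "nq": operator.ne,    # !=
    "gt": operator.gt,    # >
    "gteq": operator.ge,  # >=
    "lt": operator.lt,    # <
    "lteq": operator.le,  # <=
}

def count_with_ctx(values: list[int], operator_id: str, value_to_compare: int) -> int:
    op = OPERATORS[operator_id]
    return sum(1 for v in values if op(v, value_to_compare))
```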

2. SLR: The function computes a simple linear regression over the given data lake and evaluates it at the given target index.

  • operatorId : In this case, set it to "none".

  • valueToCompare: Target index at which to evaluate the fitted regression.

Example: Given the data lake, compute the linear regression model when the index is 100000.

{
  "aggregateFnId": "slr",
  "aggregateFnCtx": {
    "operatorId": "none",
    "valueToCompare": "100000"
  }
}
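For intuition, simple linear regression fits a line y = a + b·x over the (index, value) pairs of the data lake and evaluates it at the target index. The sketch below shows that math in floating point; the on-chain computation may use different (e.g., fixed-point) arithmetic.

```python
# Illustrative SLR semantics: least-squares fit over (index, value) pairs,
# then evaluate the fitted line at target_index (the valueToCompare field).
def slr_predict(values: list[float], target_index: float) -> float:
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    # Least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return intercept + slope * target_index
```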

Batch Status (/status)

  • URL: https://hdp.api.herodotus.cloud/status?batchId={yourBatchId}

  • Method: GET

An endpoint for querying the current status of a submitted batch by batchId.

Available Statuses:

  1. Opened: When the batch is first accepted, it starts in the Opened status.

  2. ProofsFetched: Successfully fetched proofs from the preprocessor and generated the corresponding PIE object.

  3. CachedMmrRoot: Successfully cached the MMR root and MMR size used during the preprocessing step to the smart contract.

  4. PieSubmittedToSHARP: Successfully submitted the PIE to SHARP.

  5. FactRegisteredOnchain: The fact hash of the batch is registered in the fact registry contract.

  6. Finalized: Successfully authenticated the fact hash and batch, and finalized the valid result on the contract mapping.

Finalized Result

Task hash values are returned in the /submit-batch response. Use these hashes as identifiers to fetch your results once the job has finished. Once the batch is finalized, you can use the taskHash to query the result from getFinalizedTaskResult.
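A typical client polls /status until the batch reaches Finalized, then reads results on-chain. The sketch below assumes the status response carries the current status under a `status` field, which is an assumption about the response shape, not documented above.

```python
# Polling sketch for the /status endpoint, standard library only.
# ASSUMPTION: the JSON response exposes the current state in a "status" field.
import json
import time
import urllib.parse
import urllib.request

HDP_BASE = "https://hdp.api.herodotus.cloud"
TERMINAL_STATUS = "Finalized"

def status_url(batch_id: str) -> str:
    return f"{HDP_BASE}/status?" + urllib.parse.urlencode({"batchId": batch_id})

def wait_until_finalized(batch_id: str, poll_seconds: int = 30) -> dict:
    """Poll until the batch reports Finalized, then return the last response."""
    while True:
        with urllib.request.urlopen(status_url(batch_id)) as resp:
            payload = json.load(resp)
        if payload.get("status") == TERMINAL_STATUS:
            return payload
        time.sleep(poll_seconds)
```

After this returns, the taskHashes from submission can be used against getFinalizedTaskResult on the results contract.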

More examples:

Compute the maximum balance of a specific account over 100 blocks

{
  "deliveryChainId": 11155111,
  "sourceChainId": 11155111,
  "tasks": [
    {
      "datalakeType": "block_sampled",
      "datalake": {
        "blockRangeStart": 5515000,
        "blockRangeEnd": 5515101,
        "sampledProperty": "account.0x7f2c6f930306d3aa736b3a6c6a98f512f74036d4.balance"
      },
      "aggregateFnId": "max"
    }
  ]
}

Query the maximum value of a smart contract's storage variable over 70 blocks

{
  "deliveryChainId": 11155111,
  "sourceChainId": 11155111,
  "tasks": [
    {
      "datalakeType": "block_sampled",
      "datalake": {
        "blockRangeStart": 5515000,
        "blockRangeEnd": 5515071,
        "sampledProperty": "storage.0x75cec1db9dceb703200eaa6595f66885c962b920.0x0000000000000000000000000000000000000000000000000000000000000001"
      },
      "aggregateFnId": "max"
    }
  ]
}

Query the average max fee per blob gas over a range of transactions

{
  "deliveryChainId": 11155111,
  "sourceChainId": 11155111,
  "tasks": [
    {
      "datalakeType": "transactions_in_block",
      "datalake": {
        "targetBlock": 5858987,
        "startIndex": 91,
        "endIndex": 100,
        "increment": 6,
        "includedTypes": {
          "legacy": false,
          "eip2930": false,
          "eip1559": false,
          "eip4844": true
        },
        "sampledProperty": "tx.max_fee_per_blob_gas"
      },
      "aggregateFnId": "avg"
    }
  ]
}

Access Data Cross-Chain

By specifying an L2 as the delivery chain, you can access data computed with the Data Processor on that chain. This L2 delivery is facilitated by the Storage Proof API.

To access computed data on Starknet, specify the delivery chain as follows:

{
  "deliveryChainId": "SN_SEPOLIA",
  "sourceChainId": 11155111
}

FAQ

  • When should I use Data Processor instead of the original Herodotus API?

    • Depending on your use case, both products have pros and cons. If you intend to access data over large ranges of blocks, we recommend using Data Processor. It is designed to handle large amounts of data at a much lower cost.
