Data Processor API
Documentation of Data Processor API
Note that this deployed API is not yet stable.
The Data Processor API allows users to submit computational tasks combining data lakes and aggregate functions. This document outlines how to structure requests to the API, manage authentication, and interpret the parameters needed for successful data processing.
Authentication
Every call to the Data Processor API must include your API secret key. You can create your API key from the Herodotus Dashboard.
Submit Batch (/submit-batch)
URL: https://hdp.api.herodotus.cloud/submit-batch?apiKey={yourApiKey}
Method: POST
This endpoint accepts a JSON payload containing one or more computational tasks. Each task specifies a data lake configuration and an aggregate function to process the data.
Request Structure
Requests to the endpoint are organized into batches. A batch can contain multiple tasks, each defined by a combination of a data lake type and an aggregate function. For a detailed explanation of each request field, refer to this page.
Example Request Body: Block Sampled Data Lake
Example: In the Ethereum Sepolia ("ETHEREUM_SEPOLIA") blockchain, calculate the average base_fee_per_gas for blocks 5,515,000 to 5,515,039.
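A minimal sketch of this request in Python, assuming the field layout described under Request Parameters and Task Fields below; the destinationChainId value and the header.base_fee_per_gas property name are written here by assumption:

```python
import requests

API_KEY = "yourApiKey"  # created in the Herodotus Dashboard

# Average base_fee_per_gas over blocks 5,515,000-5,515,039 on Ethereum Sepolia.
payload = {
    "destinationChainId": "ETHEREUM_SEPOLIA",  # assumed destination; set to the chain the result should be delivered to
    "tasks": [
        {
            "type": "DatalakeCompute",
            "datalake": {
                "type": "BlockSampled",
                "chainId": "ETHEREUM_SEPOLIA",
                "blockRangeStart": 5515000,
                "blockRangeEnd": 5515039,
                "increment": 1,
                "sampledProperty": "header.base_fee_per_gas",
            },
            "compute": {"aggregateFnId": "avg"},
        }
    ],
}

response = requests.post(
    f"https://hdp.api.herodotus.cloud/submit-batch?apiKey={API_KEY}",
    json=payload,
)
print(response.json())  # expected to contain batchId and taskHashes
```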
Example Request Body: Transactions in Block Data Lake
Example: In the Ethereum Sepolia ("ETHEREUM_SEPOLIA") blockchain, determine the maximum nonce for transaction indices 10 to 40 in block 5,409,986.
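A hedged sketch of this request; this page only documents the BlockSampled field layout in detail, so the TransactionsInBlock field names used here (targetBlock, startIndex, endIndex) and the tx.nonce property format are assumptions:

```python
import requests

API_KEY = "yourApiKey"  # created in the Herodotus Dashboard

# Maximum nonce over transaction indices 10-40 in block 5,409,986 on Ethereum Sepolia.
# targetBlock/startIndex/endIndex and "tx.nonce" are assumed field names, not confirmed by this page.
payload = {
    "destinationChainId": "ETHEREUM_SEPOLIA",  # assumed destination
    "tasks": [
        {
            "type": "DatalakeCompute",
            "datalake": {
                "type": "TransactionsInBlock",
                "chainId": "ETHEREUM_SEPOLIA",
                "targetBlock": 5409986,
                "startIndex": 10,
                "endIndex": 40,
                "increment": 1,
                "sampledProperty": "tx.nonce",
            },
            "compute": {"aggregateFnId": "max"},
        }
    ],
}

response = requests.post(
    f"https://hdp.api.herodotus.cloud/submit-batch?apiKey={API_KEY}",
    json=payload,
)
print(response.json())
```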
Example Request Body: Module Task
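A hedged sketch of a Module task, using the programHash and inputs fields described under Task Fields below; the program hash and input values are placeholders:

```python
import requests

API_KEY = "yourApiKey"  # created in the Herodotus Dashboard

# Module task: run a previously uploaded program with one public and one private input.
# The programHash and input values below are placeholders.
payload = {
    "destinationChainId": "ETHEREUM_SEPOLIA",  # assumed destination
    "tasks": [
        {
            "type": "Module",
            "programHash": "0xYourProgramHash",
            "inputs": [
                {"visibility": "public", "value": "0x5"},
                {"visibility": "private", "value": "0x2a"},
            ],
        }
    ],
}

response = requests.post(
    f"https://hdp.api.herodotus.cloud/submit-batch?apiKey={API_KEY}",
    json=payload,
)
print(response.json())
```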
Response
The endpoint returns a JSON object containing the batchId and an array of taskHashes. The task hashes are required for fetching the task results from the result map smart contract.
Example Response:
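The response body is not reproduced on this page; a minimal sketch of reading it, assuming a flat JSON object with batchId and taskHashes as described above:

```python
# Continuing from one of the submit-batch calls above.
result = response.json()

# Assumed shape: {"batchId": "...", "taskHashes": ["0x...", ...]}
batch_id = result["batchId"]
task_hashes = result["taskHashes"]

print("batchId:", batch_id)
for task_hash in task_hashes:
    print("taskHash:", task_hash)
```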
Request Parameters
destinationChainId: Defines the specific chain to which the result of your computation is delivered.
tasks: An array allowing you to define multiple tasks in one request. Each task will be processed in the same batch.
Task Fields
Each task object includes the following fields:
type: Defines the task type. Currently, we support DatalakeCompute and Module.
For DatalakeCompute Tasks:
datalake: Detailed data definition to compute over.
type: The type of data lake. For block sampled data, set to BlockSampled; for transactions in block data, set to TransactionsInBlock.
chainId: The chain ID the data should be sourced from, e.g., "ETHEREUM_SEPOLIA" for Sepolia.
blockRangeStart: Starting block number of the range.
blockRangeEnd: Ending block number of the range (inclusive).
sampledProperty: Specific property to sample. There are three types you can utilize:
header: Use the format header.{specific_header_field}. All RLP-decoded fields from the block header are available.
account: Use the format account.{target_address}.{specific_account_field}. All RLP-decoded fields from the account are available.
storage: Use the format storage.{target_address}.{storage_slot}. Given the target contract address, the property points to the value stored at the given storage slot (used as the key).
increment: Incremental step over the range from blockRangeStart to blockRangeEnd. The default is 1.
compute:
aggregateFnId: The computation function that the task will execute. Available functions are: avg, sum, min, max, count.
For Module Tasks:
programHash: The hash of the uploaded program to execute.
inputs: An array of input objects, each containing:
visibility: Specifies whether the input is public or private.
value: The value of the input parameter.
Matching Data Properties with Supported Functions
Note that not all RLP-decoded fields are compatible with all computations. Check out this function support matrix to ensure you are using a supported field.
Special Functions Requiring Context
count
The count function compares each sampled value against a given value and counts how many satisfy the comparison. It requires two additional fields:
operatorId: Operation symbol to filter the value set. Available operations are:
eq (equal to ==)
nq (not equal to !=)
gt (greater than >)
gteq (greater than or equal to >=)
lt (less than <)
lteq (less than or equal to <=)
valueToCompare: The value to compare against using the specified operator.
Example: Given the data lake, count the number of values greater than 1000000000000.
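A hedged sketch of a count task over the block sampled data lake from the earlier example; the exact nesting of operatorId and valueToCompare inside the compute object, and whether valueToCompare is sent as a string or a number, are assumptions:

```python
# Count how many sampled values are greater than 1000000000000.
# The placement of operatorId/valueToCompare inside "compute" is assumed.
task = {
    "type": "DatalakeCompute",
    "datalake": {
        "type": "BlockSampled",
        "chainId": "ETHEREUM_SEPOLIA",
        "blockRangeStart": 5515000,
        "blockRangeEnd": 5515039,
        "increment": 1,
        "sampledProperty": "header.base_fee_per_gas",
    },
    "compute": {
        "aggregateFnId": "count",
        "operatorId": "gt",
        "valueToCompare": "1000000000000",
    },
}
```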
Batch Status (/batch-query/{yourBatchId})
URL: https://hdp.api.herodotus.cloud/batch-query/{yourBatchId}
Method: GET
This endpoint allows you to query the current status of a submitted batch using the batchId.
Available Statuses
Opened: The batch has been accepted and is initiated.
ProofsFetched: Successfully fetched proofs from the preprocessor and generated the corresponding PIE object.
CachedMmrRoot: Successfully cached the MMR root and MMR size used during the preprocessing step to the smart contract.
PieSubmittedToSHARP: Successfully submitted the PIE to SHARP.
FactRegisteredOnchain: The fact hash of the batch is registered in the fact registry contract.
Finalized: Successfully authenticated the fact hash and batch, and finalized the valid result on the contract mapping.
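A minimal polling sketch, assuming the endpoint returns a JSON object carrying one of the statuses above:

```python
import time

import requests

BATCH_ID = "yourBatchId"  # returned by /submit-batch

# Poll the batch status until the batch is finalized (assumed JSON response shape).
while True:
    response = requests.get(f"https://hdp.api.herodotus.cloud/batch-query/{BATCH_ID}")
    status = response.json()
    print(status)
    if "Finalized" in str(status):  # crude check; inspect the actual response shape in practice
        break
    time.sleep(30)
```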
Finalized Result
Task hash values are returned in the /submit-batch response. Use these hashes as identifiers to fetch your valid results after the job is finished. Once the task is finalized, you can use the taskHash to query the result from the getFinalizedTaskResult function of the contract.
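A hedged sketch of querying a finalized result with web3.py; the contract address, RPC URL, and the exact signature of getFinalizedTaskResult (assumed here to take a task hash and return a single value) are assumptions, not taken from this page:

```python
from web3 import Web3

# Placeholders/assumptions: RPC URL, contract address, and the getFinalizedTaskResult signature.
RPC_URL = "https://your-sepolia-rpc-endpoint"
HDP_CONTRACT_ADDRESS = "0xYourHdpResultContract"
ABI = [
    {
        "name": "getFinalizedTaskResult",
        "type": "function",
        "stateMutability": "view",
        "inputs": [{"name": "taskHash", "type": "bytes32"}],
        "outputs": [{"name": "result", "type": "uint256"}],
    }
]

w3 = Web3(Web3.HTTPProvider(RPC_URL))
contract = w3.eth.contract(address=HDP_CONTRACT_ADDRESS, abi=ABI)

task_hash = "0xYourTaskHash"  # one of the taskHashes from the /submit-batch response
result = contract.functions.getFinalizedTaskResult(task_hash).call()
print(result)
```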
More Examples
Compute the Maximum Balance of a Specific Account Over 100 Blocks
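A hedged sketch of the task object for this example; the account address and the 100-block range are placeholders, and the account.{address}.balance property format follows the sampledProperty description above:

```python
# Maximum balance of one account over 100 blocks (placeholder address and range).
task = {
    "type": "DatalakeCompute",
    "datalake": {
        "type": "BlockSampled",
        "chainId": "ETHEREUM_SEPOLIA",
        "blockRangeStart": 5515000,
        "blockRangeEnd": 5515099,  # 100 blocks inclusive
        "increment": 1,
        "sampledProperty": "account.0xYourTargetAddress.balance",
    },
    "compute": {"aggregateFnId": "max"},
}
```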
Query the Average Value of a Smart Contract's Variable Over 70 Blocks
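A hedged sketch for this example; the contract address, storage slot, and 70-block range are placeholders:

```python
# Average value of a contract storage slot over 70 blocks (placeholder address, slot, and range).
task = {
    "type": "DatalakeCompute",
    "datalake": {
        "type": "BlockSampled",
        "chainId": "ETHEREUM_SEPOLIA",
        "blockRangeStart": 5515000,
        "blockRangeEnd": 5515069,  # 70 blocks inclusive
        "increment": 1,
        "sampledProperty": "storage.0xYourContractAddress.0x0000000000000000000000000000000000000000000000000000000000000001",
    },
    "compute": {"aggregateFnId": "avg"},
}
```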
Query the Average Value for Transaction Max Fee Per Blob Gas
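A hedged sketch for this example; as above, the TransactionsInBlock field names and the tx.max_fee_per_blob_gas property name are assumptions:

```python
# Average max fee per blob gas over a range of transactions in one block (placeholder values).
task = {
    "type": "DatalakeCompute",
    "datalake": {
        "type": "TransactionsInBlock",
        "chainId": "ETHEREUM_SEPOLIA",
        "targetBlock": 5409986,
        "startIndex": 0,
        "endIndex": 100,
        "increment": 1,
        "sampledProperty": "tx.max_fee_per_blob_gas",
    },
    "compute": {"aggregateFnId": "avg"},
}
```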
Access Data Cross-Chain
By specifying an L2 as the destination chain ID, you can have results computed with the Data Processor delivered to that L2. This L2 delivery is facilitated by the Storage Proof API.
To access computed data on Starknet, specify the destination chain as follows:
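The exact Starknet chain identifier is not shown on this page; a hedged sketch with a placeholder value:

```python
# "STARKNET_SEPOLIA" is a placeholder; use the destination chain identifier documented for Starknet.
payload = {
    "destinationChainId": "STARKNET_SEPOLIA",
    "tasks": [
        # ...same task definitions as in the examples above
    ],
}
```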
FAQ
When should I use the Data Processor instead of the original Herodotus API?
Both products have trade-offs depending on your use case. If you intend to access data over large ranges of blocks, we recommend the Data Processor: it is designed to handle large amounts of data at a much lower cost.