Welcome to our latest blog series on blockchain data analytics and management. This series aims to shed light on the complexities involved in managing and utilising blockchain data, from the extraction of raw data directly from the blockchain to its transformation into actionable insights for diverse applications.
This series is broken into three parts.
This series is designed for anyone curious about how blockchain data works and its impact on the digital asset space. Whether you're new to blockchain or looking to deepen your knowledge, we invite you to join us as we explore the crucial role of data in unlocking the potential of blockchain technology.
The first part of this series, Ingesting Blockchain Data – The Foundations of On-Chain Intelligence, explores the basics of data ingestion, the first step in understanding blockchain activities. We'll show you how we connect to blockchain networks, process real-time data, and ensure that every piece of information is accurate and reliable.
In the digital asset sector, data ingestion underpins the insights we derive from on-chain activity. At CCData, our proficiency lies in capturing this real-time data with precision and reliability, ensuring that every transaction and block is accounted for and processed accurately. In this blog post, we explore the mechanics of our data ingestion process, which forms the foundation for the advanced analytics and DeFi integrations that follow.
To collect blockchain data, we must first establish secure and reliable connections with blockchain nodes. This task involves more than simply maintaining a continuous link to the node; it also requires ensuring the robustness and resilience of these connections.
Our multi-source node ingestion system is designed to connect to multiple blockchain data sources. For Ethereum, the primary nodes we operate are Nethermind and Geth, which we augment with external RPC providers such as QuickNode. This multi-node, multi-source strategy is crucial to the robustness and resilience of our ingestion pipeline: if any single source fails or falls behind, ingestion continues uninterrupted from the others.
We establish node connections using a combination of polling and subscription-based methods. Polling allows us to actively query nodes for new blocks at regular intervals, while subscriptions use WebSocket connections to receive new data as soon as it is broadcast by the node.
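To illustrate the two connection styles, here is a minimal sketch using the standard Ethereum JSON-RPC methods (eth_subscribe with newHeads over WebSocket, and eth_blockNumber over HTTP). The endpoint URLs, the polling interval, and the handleNewBlock callback are placeholders rather than our production configuration.

```typescript
import WebSocket from "ws";

// Placeholder endpoints -- in practice these point at Nethermind, Geth or QuickNode.
const WS_URL = "ws://localhost:8546";
const HTTP_URL = "http://localhost:8545";

// Subscription: the node pushes new block headers to us as soon as it sees them.
const ws = new WebSocket(WS_URL);
ws.on("open", () => {
  ws.send(JSON.stringify({ jsonrpc: "2.0", id: 1, method: "eth_subscribe", params: ["newHeads"] }));
});
ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.method === "eth_subscription") {
    handleNewBlock(msg.params.result); // header of the newly propagated block
  }
});

// Polling: actively ask the node for its latest block number at a fixed interval.
async function pollLatestBlock(): Promise<void> {
  const res = await fetch(HTTP_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 2, method: "eth_blockNumber", params: [] }),
  });
  const { result } = await res.json();
  console.log("latest block:", parseInt(result, 16));
}
setInterval(pollLatestBlock, 12_000); // roughly one Ethereum slot

function handleNewBlock(header: { number: string; hash: string; parentHash: string }): void {
  console.log("new head:", parseInt(header.number, 16), header.hash);
}
```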
To handle the real-time nature of blockchain data, we've implemented a streamlined ingestion process that captures blocks as they are propagated on the network.
Our input system is tasked with the initial reception of data. It's built to process high-volume requests efficiently, ensuring minimal latency between block creation and data capture.
We run one input per data source, so for Ethereum we have one input for our Nethermind node, one for our Geth node, and one for the QuickNode RPC endpoint.
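As a rough sketch, the per-source inputs can be thought of as a small list of source descriptors, each driving its own independent connection. The DataSource shape, environment variable names, and startInput helper below are illustrative assumptions, not our actual wiring.

```typescript
// One input per data source, so a failure in one source cannot stall the others.
interface DataSource {
  name: string;
  transport: "ws" | "http";
  url: string;
}

const sources: DataSource[] = [
  { name: "nethermind", transport: "ws", url: process.env.NETHERMIND_WS_URL ?? "" },
  { name: "geth", transport: "ws", url: process.env.GETH_WS_URL ?? "" },
  { name: "quicknode", transport: "http", url: process.env.QUICKNODE_RPC_URL ?? "" },
];

for (const source of sources) {
  startInput(source);
}

function startInput(source: DataSource): void {
  console.log(`starting input for ${source.name} over ${source.transport}`);
  // ...connect, subscribe or poll, and push received blocks onto the queue...
}
```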
Upon receipt, the data undergoes preliminary validation to ensure structural integrity. This includes checks for data completeness and format correctness before it is passed on to the queuing system.
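A minimal sketch of what such structural checks might look like is shown below; the field list and rules are assumptions for illustration, not our production schema.

```typescript
// Structural checks applied to a raw block before it is queued.
interface RawBlock {
  number: string;        // hex-encoded block number
  hash: string;
  parentHash: string;
  timestamp: string;     // hex-encoded Unix timestamp
  transactions: unknown[];
}

function validateBlock(block: Partial<RawBlock>): block is RawBlock {
  const hex = /^0x[0-9a-fA-F]+$/;
  const hash = /^0x[0-9a-fA-F]{64}$/;
  return (
    typeof block.number === "string" && hex.test(block.number) &&          // completeness + format
    typeof block.hash === "string" && hash.test(block.hash) &&
    typeof block.parentHash === "string" && hash.test(block.parentHash) &&
    typeof block.timestamp === "string" && hex.test(block.timestamp) &&
    Array.isArray(block.transactions)                                       // transaction list present
  );
}
```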
We use Redis for inter-process communication and for state storage due to its exceptional performance characteristics as an in-memory data store, which are ideal for handling the velocity and volume of incoming blockchain data.
We employ Redis lists, utilising LPUSH and BRPOPLPUSH commands for managing our data queues. This allows us to maintain a FIFO (First-In-First-Out) structure, which is essential for preserving the chronological order of the blockchain data.
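The pattern looks roughly like the following sketch, using the ioredis client; the queue names and payload format are placeholders for illustration.

```typescript
import Redis from "ioredis";

// Separate connections: BRPOPLPUSH blocks the connection it runs on,
// so the consumer gets its own client.
const producer = new Redis();
const consumer = new Redis();

const PENDING = "blocks:pending";        // placeholder queue names
const PROCESSING = "blocks:processing";

// Producer side: LPUSH new blocks onto the head of the pending list.
async function enqueueBlock(block: object): Promise<void> {
  await producer.lpush(PENDING, JSON.stringify(block));
}

// Consumer side: BRPOPLPUSH pops from the tail (preserving FIFO order) and atomically
// copies the item into a processing list, so nothing is lost if a worker crashes mid-task.
async function consumeBlocks(): Promise<void> {
  for (;;) {
    const raw = await consumer.brpoplpush(PENDING, PROCESSING, 0); // 0 = block until data arrives
    if (raw === null) continue;
    const block = JSON.parse(raw);
    // ...process the block, then acknowledge by removing it from the processing list...
    await producer.lrem(PROCESSING, 1, raw);
  }
}
```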
The integrity of data within the queues is paramount. To ensure this, we implement a combination of Redis transactions and hash sets. Transactions are used to execute a sequence of commands atomically, while hash sets allow us to efficiently manage block metadata and track the last processed block number.
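The sketch below shows the idea with ioredis: per-block metadata and the last-processed pointer are written inside a single MULTI/EXEC transaction. The key names and metadata fields here are illustrative assumptions.

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Illustrative key names: per-block metadata lives in its own hash, and a shared
// state hash tracks the last processed block number.
const blockMetaKey = (n: number) => `block:${n}:meta`;
const STATE_KEY = "chain:state";

// Both writes go through one MULTI/EXEC transaction so they are applied atomically:
// readers never see a block's metadata without the last-processed pointer moving with it.
async function recordProcessedBlock(block: { number: number; hash: string; parentHash: string }): Promise<void> {
  await redis
    .multi()
    .hset(
      blockMetaKey(block.number),
      "hash", block.hash,
      "parentHash", block.parentHash,
      "processedAt", Date.now().toString(),
    )
    .hset(STATE_KEY, "lastProcessedBlock", block.number.toString())
    .exec();
}

async function lastProcessedBlock(): Promise<number | null> {
  const value = await redis.hget(STATE_KEY, "lastProcessedBlock");
  return value === null ? null : Number(value);
}
```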
Given the performance-critical nature of the system, we continuously monitor and tune the Redis instance. This includes optimising memory usage and managing data persistence to balance speed with reliability.
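As a simple illustration of this kind of monitoring, the snippet below periodically reads the memory section of Redis INFO; the interval and what is done with the figures (alerting, eviction tuning, persistence changes) are placeholders.

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Illustrative memory check: read the "memory" section of INFO and log current usage.
async function checkRedisMemory(): Promise<void> {
  const info = await redis.info("memory");
  const used = info.match(/used_memory_human:(\S+)/);
  console.log("redis memory in use:", used ? used[1] : "unknown");
}

setInterval(checkRedisMemory, 60_000); // check once a minute
```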
Blockchain reorganisations, or reorgs, are events where the blockchain diverges into two potential paths due to temporary discrepancies in block additions by different miners. Handling reorgs is crucial as they can lead to inconsistencies in data if not managed properly.
Our processBlockReorgRecursively function stands at the heart of our reorg handling strategy. This recursive method ensures that any changes in the blockchain due to reorgs are tracked and managed efficiently. In the case of a reorg, the affected block and all subsequent blocks up to the latest correct block are reprocessed to ensure data integrity.
To manage reorgs effectively, we detect when newly received blocks diverge from the chain we have already processed, identify the last block common to both branches, and reprocess the affected block and every block after it up to the latest correct block.
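The sketch below illustrates the recursive pattern: if an incoming block's parent hash does not match the hash we stored at the previous height, the chain has diverged, so the parent is refetched and reprocessed first. Apart from the processBlockReorgRecursively name, the helpers, key names, and endpoint here are illustrative assumptions rather than our production implementation.

```typescript
import Redis from "ioredis";

const redis = new Redis();
const HTTP_URL = "http://localhost:8545"; // placeholder node endpoint

interface Block {
  number: number;
  hash: string;
  parentHash: string;
}

// Stored block hashes, keyed by height (mirrors the hash-set metadata described earlier).
const getStoredHash = (n: number) => redis.hget("block:hashes", n.toString());
const storeHash = (n: number, hash: string) => redis.hset("block:hashes", n.toString(), hash);

// Fetch the canonical block at a given height from the node.
async function getBlockByNumber(n: number): Promise<Block> {
  const res = await fetch(HTTP_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0", id: 1, method: "eth_getBlockByNumber",
      params: ["0x" + n.toString(16), false],
    }),
  });
  const { result } = await res.json();
  return { number: n, hash: result.hash, parentHash: result.parentHash };
}

async function reprocessBlock(block: Block): Promise<void> {
  // ...re-run extraction for this block, overwriting data written for the orphaned branch...
  console.log("reprocessing block", block.number);
}

async function processBlockReorgRecursively(block: Block): Promise<void> {
  const storedParentHash = await getStoredHash(block.number - 1);

  // A parent-hash mismatch means the chain we stored has been reorganised:
  // walk back and reprocess the canonical parent first, then this block.
  if (storedParentHash !== null && storedParentHash !== block.parentHash) {
    const canonicalParent = await getBlockByNumber(block.number - 1);
    await processBlockReorgRecursively(canonicalParent);
  }

  await reprocessBlock(block);
  await storeHash(block.number, block.hash);
}
```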
This approach to handling reorgs is a crucial part of our blockchain data ingestion process, ensuring that the data we serve downstream remains accurate and reliable.
The process of ingesting blockchain data is an intricate yet understated element of the data lifecycle. It’s the foundation upon which all further data processing, analysis, and integration are built. By ensuring that this first step is executed flawlessly, we lay the groundwork for the advanced on-chain analytics and DeFi integrations that empower our clients to make informed decisions.
In this blog post, we've taken a deep dive into the technical intricacies of blockchain data ingestion, emphasising the importance of handling reorgs with precision. As we continue to evolve our processes and technologies, our commitment to data accuracy and integrity remains unwavering, providing our clients with the most reliable and actionable on-chain data available.
If you’re interested in learning more about CCData’s market-leading data solutions and indices, please contact us directly.