This post explores how transitioning to a hybrid storage model with NFS* improved our block data retrieval times and how Azure’s L-Series machines enabled us to efficiently process massive blockchain data, achieving a 48-hour turnaround on comprehensive statistics for all Ethereum addresses while keeping costs manageable.
Our NFS-based solution was a major leap forward in optimizing block data retrieval, but Ethereum’s vast address data introduced new challenges. Initially, Redis was our go-to solution for handling address metadata. Redis is ideal for this purpose because it allows us to quickly increment counters and set specific properties on each address. However, as our operations scaled, the cost of running large Redis instances became unsustainable. Managing 300 million addresses required approximately 750GB of RAM, an expense well beyond our project budget.
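To make that access pattern concrete, here is a minimal sketch of the two operations described above: incrementing a counter and setting a property on an address. It uses a plain in-memory dictionary as a stand-in for Redis so that it runs without a server; in Redis these operations would correspond to the HINCRBY and HSET commands on a per-address hash. The class name, field names, and example address are hypothetical and purely illustrative.

```python
from collections import defaultdict


class AddressMetadataStore:
    """In-memory stand-in for per-address Redis hashes (illustrative only)."""

    def __init__(self):
        # address -> {field: value}; mirrors one Redis hash per address
        self._hashes = defaultdict(dict)

    def incr(self, address, field, amount=1):
        """Increment a counter field (like HINCRBY) and return the new value."""
        record = self._hashes[address.lower()]
        record[field] = record.get(field, 0) + amount
        return record[field]

    def set_property(self, address, field, value):
        """Set a property on an address (like HSET)."""
        self._hashes[address.lower()][field] = value

    def get(self, address):
        """Return a copy of everything stored for an address."""
        return dict(self._hashes.get(address.lower(), {}))


store = AddressMetadataStore()
addr = "0x" + "ab" * 20  # hypothetical 20-byte address
store.incr(addr, "tx_count")
store.incr(addr, "tx_count")
store.set_property(addr, "is_contract", True)
```

Every processed transaction, log, or trace touches a handful of these small records, which is why an in-memory store is so fast for this workload and why its RAM footprint grows with the number of addresses.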
We initially tried to optimize costs by archiving data in batches, moving less frequently accessed data from Redis to blob storage. However, this approach significantly impacted performance; our block processing speed plummeted from 4,000 blocks per minute to just 400 — a tenfold decrease in speed and a corresponding tenfold increase in the time required to process all blocks. It became clear that while Redis was unmatched in speed, it was not practical at this scale due to the prohibitive cost of RAM. This is where Azure’s L-Series machines, equipped with NVMe storage, provided a compelling alternative. Although NVMe storage is slower than Redis, it offers a much more cost-effective solution for batch processing large datasets, especially during the backfill phase.
Ethereum’s ecosystem, with its millions of interactions and addresses, demands an infrastructure capable of high-speed processing without incurring prohibitive costs. The smallest Azure L8 machine, with 1.8TB of NVMe storage at $400 per month, provided sufficient capacity to handle our data processing needs. During the initial backfill, which involved processing and storing metadata for all 300 million Ethereum addresses, we used approximately 450GB of the available storage. This setup allowed us to process the entire Ethereum blockchain — including transactions, logs, and traces — across all addresses in under 48 hours.
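As a rough back-of-the-envelope check on these figures, the snippet below works out the average footprint per address in RAM and on NVMe, and how much of the drive the backfill used. It assumes decimal units (1GB = 10^9 bytes) and treats the quoted numbers as approximate; the variable names are ours.

```python
ADDRESSES = 300_000_000
REDIS_RAM_BYTES = 750 * 10**9       # ~750GB of RAM quoted for Redis
NVME_USED_BYTES = 450 * 10**9       # ~450GB used during the backfill
NVME_CAPACITY_BYTES = 1800 * 10**9  # 1.8TB NVMe drive on the L8 machine


def bytes_per_address(total_bytes, addresses=ADDRESSES):
    """Average storage consumed per address."""
    return total_bytes / addresses


redis_per_addr = bytes_per_address(REDIS_RAM_BYTES)   # average bytes in RAM
nvme_per_addr = bytes_per_address(NVME_USED_BYTES)    # average bytes on disk
nvme_utilisation = NVME_USED_BYTES / NVME_CAPACITY_BYTES

print(f"Redis: {redis_per_addr:.0f} B/address")
print(f"NVMe:  {nvme_per_addr:.0f} B/address")
print(f"NVMe drive used: {nvme_utilisation:.0%}")
```

Under these assumptions the averages come to roughly 2.5KB per address in Redis and 1.5KB per address on NVMe, with about a quarter of the drive in use.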
To fully leverage the capabilities of the L-Series machines, we optimized our processing pipeline to manage Ethereum’s extensive data efficiently.
The ability to achieve a 48-hour turnaround on comprehensive stats across all Ethereum addresses has been a game-changer. This rapid processing capability enables us to quickly identify and react to any bugs or missed behaviours in our systems. If an issue arises, we can promptly reprocess the entire dataset, ensuring that our analyses remain accurate and up-to-date. This quick turnaround also facilitates the rapid expansion of our services. As we integrate new blockchains, we can confidently scale our infrastructure to handle the additional data, knowing that we can maintain the same level of performance and accuracy.
Following the backfill and the migration of data to Cloudflare R2, we scaled down the L8 machine to a more cost-effective F-series configuration with 2 CPUs and 4GB of RAM, reducing our ongoing monthly costs to just $70. In this phase, we rely on Redis for state management, keeping the most active addresses in memory up to a limit of 70% of the VM’s RAM. Less active data is periodically offloaded to blob storage. By selectively managing data in Redis and offloading older data to blob storage, we strike a balance between performance and cost, so that our infrastructure can continue to scale with our growing needs while remaining capable of real-time processing. In addition, once we are up to date with the blockchain, we only need to process an average of 5 blocks per minute for Ethereum. The Redis and blob storage combination, which can handle up to 400 blocks per minute, is therefore an ideal and cost-effective solution.
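The hot/cold split described above can be sketched as follows. This is a simplified illustration, not our production code: two in-memory structures stand in for Redis and blob storage, "most active" is approximated as "most recently touched", and the 2,500-byte entry size is an assumed average. All class, method, and parameter names are hypothetical.

```python
from collections import OrderedDict


class TieredAddressState:
    """Keep recently active addresses hot; offload the rest to cold storage.

    `hot` stands in for Redis and `cold` for blob storage; both are
    in-memory here so the sketch runs anywhere.
    """

    def __init__(self, ram_bytes, hot_percent=70, entry_bytes=2500):
        # Budget: how many entries fit in the allowed share of RAM.
        self.max_hot_entries = (ram_bytes * hot_percent // 100) // entry_bytes
        self.hot = OrderedDict()  # ordered from least to most recently active
        self.cold = {}

    def touch(self, address, delta=1):
        """Record activity for an address, pulling it back from cold if needed."""
        if address in self.hot:
            count = self.hot.pop(address)
        else:
            count = self.cold.pop(address, 0)
        self.hot[address] = count + delta  # re-insert as most recently active
        return self.hot[address]

    def offload(self):
        """Periodic job: move the least recently active entries to cold storage."""
        moved = 0
        while len(self.hot) > self.max_hot_entries:
            address, count = self.hot.popitem(last=False)
            self.cold[address] = count
            moved += 1
        return moved


# Tiny budget so the behaviour is visible: 70% of 10,000 bytes fits 2 entries.
state = TieredAddressState(ram_bytes=10_000)
for a in ["0xa", "0xb", "0xc", "0xa"]:
    state.touch(a)
state.offload()
```

After the offload pass, the least recently touched address has moved to the cold tier and is loaded back into the hot tier the next time it is touched. That round trip to slower storage is the price paid for a small RAM footprint, and it is affordable when only a few blocks per minute need processing.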
To maximize performance and cost efficiency, we carefully optimized the technical setup on our L-Series machines:
We formatted the NVMe drive with ext4, using tuned options (-i 1700 and -O large_dir) to provide over 1 billion inodes on the 1.8TB drive. This configuration is more than capable of managing Ethereum’s 300 million addresses and supports efficient file operations at scale. We experimented with the XFS file system but found no significant performance improvements over ext4, so we standardized on ext4 across our infrastructure.

Leveraging Azure’s L-Series machines has dramatically improved our blockchain data processing capabilities. The combination of the NFS server for quick access to full block data, NVMe storage, and a well-structured file system allowed us to process Ethereum’s entire blockchain, including detailed stats based on transactions, logs, and traces across all addresses, in under 48 hours. This rapid turnaround not only allows us to quickly address any issues but also provides the flexibility to expand our capabilities as needed. By migrating processed data to Cloudflare R2 and scaling down the L8 machine, we’ve also managed to maintain a cost-effective infrastructure that supports real-time processing.
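As a footnote to the ext4 setup described earlier, the inode figure can be sanity-checked with a little arithmetic. With mke2fs, the -i option sets the bytes-per-inode ratio, so the inode count is roughly the drive size divided by that value. The sketch below assumes decimal units for the 1.8TB drive, ignores file system overhead, and compares against a 16384 bytes-per-inode ratio, which is the default commonly shipped in mke2fs.conf (treat that default as an assumption for your distribution).

```python
DRIVE_BYTES = 1800 * 10**9      # 1.8TB NVMe drive, decimal units assumed
ADDRESSES = 300_000_000         # Ethereum addresses to track


def estimated_inodes(device_bytes, bytes_per_inode):
    """Approximate inode count: one inode per `bytes_per_inode` of capacity."""
    return device_bytes // bytes_per_inode


tuned_inodes = estimated_inodes(DRIVE_BYTES, 1700)      # with -i 1700
default_inodes = estimated_inodes(DRIVE_BYTES, 16384)   # assumed default ratio

print(f"-i 1700:  ~{tuned_inodes / 1e9:.2f} billion inodes")
print(f"-i 16384: ~{default_inodes / 1e6:.0f} million inodes")
print("tuned ratio covers all addresses:", tuned_inodes >= ADDRESSES)
print("default ratio covers all addresses:", default_inodes >= ADDRESSES)
```

If each address maps to at least one file, the assumed default ratio would run out of inodes well before 300 million addresses, while the tuned ratio leaves ample headroom.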
In our next post, we’ll discuss the Address Metadata Endpoint, diving into its capabilities and exploring the various use cases it unlocks for developers, analysts, and other blockchain enthusiasts. This endpoint, built on the foundation of our optimized infrastructure, is poised to offer unparalleled insights into blockchain addresses and their interactions.
*NFS (Network File System) is a distributed file system protocol that allows a computer to access files over a network as if they were located on its local storage. NFS was originally developed by Sun Microsystems in the 1980s and has since become a widely used standard for sharing files across UNIX and Linux systems.