Exchange Integrations

The first thing a lot of our developers do when they join CCData is to integrate a new exchange. It’s our bread and butter and the first thing we ever developed as a company: we needed data before we had an API or a website. The blue processes in the diagram below are the services we will cover in the next blog posts.

As you can imagine, it’s the core part of our software that everything else is built upon, and we’ve kept improving it over the years: from bespoke integrations, to integrations with common code, to templated integrations, to sharded integrations. We’ve moved mapping logic from the base layer to a middle layer and now to the edge layers. We’ve developed internal tools to manage exchanges, instrument mapping, integrations, and data recovery. It’s been our most active area of development throughout our company’s history.

Simplified exchange integration diagram — our V2 system

Why is it so hard to get right, and why does it need so many iterations, you might ask? Well, it’s because we have very stringent requirements from our customers and from the regulators. You can’t be the primary data provider of the cryptocurrency industry and have low standards.

There are two things that historically we have valued above everything else: Reliability and Broad Data Sets.

Reliability: even if everything else fails, an integration that is consuming data will continue to consume data.

No single point of failure — two mirrored data centers: Azure and OVH
Issues with our internal infrastructure/networking should not impact data consumption — local data is all that is needed to run an integration
After the initial setup, an exchange integration should have all the information it needs to work in isolation — state sync service that keeps local data in sync with the central cluster
Higher adoption of our API should not impact data consumption — decoupled databases (only using replicas for reading) and Data Distributor / Router
Always have access to data if the instance is up / no rate limits — Proxy Swarm
Process up to 50k incoming and 150k outgoing messages per second per exchange (trade, order book, funding rate, etc.) — Exchange Sharding
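
The post doesn't say how the Proxy Swarm chooses a proxy. As a minimal sketch, assume each proxy is subject to its own rate limit, expressed as a minimum interval between requests. A round-robin that skips proxies still cooling down then lets an integration keep polling as long as any proxy is free. `Proxy`, `ProxySwarm`, and the addresses are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Proxy:
    address: str               # hypothetical egress address
    min_interval: float        # assumed per-proxy rate limit: seconds between requests
    next_allowed: float = 0.0  # earliest time this proxy may send again


class ProxySwarm:
    """Round-robin over proxies, skipping any that are still cooling down."""

    def __init__(self, proxies: list[Proxy]) -> None:
        if not proxies:
            raise ValueError("need at least one proxy")
        self._proxies = proxies
        self._cursor = 0  # index of the proxy to try first on the next call

    def acquire(self, now: float) -> Proxy | None:
        """Return a proxy allowed to send at time `now`, or None if all are busy."""
        n = len(self._proxies)
        for offset in range(n):
            index = (self._cursor + offset) % n
            proxy = self._proxies[index]
            if proxy.next_allowed <= now:
                proxy.next_allowed = now + proxy.min_interval
                self._cursor = (index + 1) % n  # resume after the proxy just used
                return proxy
        return None


swarm = ProxySwarm([Proxy("10.0.0.1", 1.0), Proxy("10.0.0.2", 1.0)])
print(swarm.acquire(0.0).address)  # 10.0.0.1
print(swarm.acquire(0.0).address)  # 10.0.0.2
print(swarm.acquire(0.5))          # None: both proxies are cooling down
print(swarm.acquire(1.0).address)  # 10.0.0.1
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the class deterministic and easy to test. Under this assumption, n proxies that each allow one request every t seconds give an aggregate of n/t requests per second. That aggregate capacity is what a swarm buys you.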

Broad data sets — as much data as possible:

Split across multiple instances per exchange — Exchange Discovery Service
As many data points as the exchanges allow us to have, the more granular the better — Proxy Swarm
As much precision as the exchanges allow us to have — Multiple integrations per data type
Orchestrated and balanced by our internal team — Exchange Discovery Dashboard
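
The post doesn't describe how the Exchange Discovery Service decides where each instrument goes, and it notes that the final split is balanced by the internal team. A tool could still propose a starting split if it had a per-instrument estimate of messages per second. The sketch below uses the greedy longest-processing-time (LPT) heuristic: sort instruments busiest-first and give each one to the least-loaded instance. `assign_instruments`, the rates, and the shard names are hypothetical.

```python
from __future__ import annotations

import heapq


def assign_instruments(
    rates: dict[str, float], instance_ids: list[str]
) -> dict[str, list[str]]:
    """Greedy LPT split: busiest instruments first, each onto the least-loaded instance.

    `rates` maps instrument -> estimated messages per second (an assumed load metric).
    """
    if not instance_ids:
        raise ValueError("need at least one instance")
    # Min-heap of (current_load, instance_id); equal loads break ties on the id.
    heap = [(0.0, inst) for inst in instance_ids]
    heapq.heapify(heap)
    assignment: dict[str, list[str]] = {inst: [] for inst in instance_ids}
    # Sort by descending rate, then by name so the result is deterministic.
    for instrument, rate in sorted(rates.items(), key=lambda kv: (-kv[1], kv[0])):
        load, inst = heapq.heappop(heap)
        assignment[inst].append(instrument)
        heapq.heappush(heap, (load + rate, inst))
    return assignment


print(assign_instruments(
    {"BTC-USD": 900, "ETH-USD": 600, "SOL-USD": 300, "DOGE-USD": 250},
    ["shard-0", "shard-1"],
))
# {'shard-0': ['BTC-USD', 'DOGE-USD'], 'shard-1': ['ETH-USD', 'SOL-USD']}
```

LPT is a heuristic, not an optimal partition. It works best as a reasonable default that a person then adjusts, which fits the post's description of balancing done by the team.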

The diagram above is not that far off what we originally had when we started in 2014, but it is now a lot better thought out. In the last 7 years, we’ve come to rely a lot more heavily on Redis (it is one of the best pieces of software ever written). In our original design we saved everything in files, but we’ve since realized that it’s almost impossible to crash Redis and that it is a lot faster and easier to work with than files.

Big changes since older versions:

A lot more monitoring and stats (Icinga + Grafana)
Centralized logging (GrayLog)
Instrument Mapping moved to higher levels (Instrument Mapping Dashboard)
Packaged deployment (Jenkins + fpm -> Systemd package in Aptly -> apt-get install on the instance)
Redis instead of files
Multiple inputs with deduplication instead of one input
Sharded exchange integrations
One publishing and aggregation service instead of multiple
Redis pub-sub for message routing and distribution
Templated integrations and services
Instrument metadata: first seen, last seen, sequence number, first message, last message, external data, contract size, etc.
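
The instrument metadata fields listed above map naturally onto a small record that is updated on every message. The sketch below keeps that record in a plain dict. In the setup described here it would more plausibly live in Redis (for example as one hash per instrument), but that placement is an assumption. So are the field types, in-order delivery, and reading `sequence` as a per-instrument message counter. `InstrumentMetadata` and `record_message` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class InstrumentMetadata:
    first_seen: float              # timestamp of the first message for this instrument
    last_seen: float               # timestamp of the most recent message
    sequence: int                  # assumed: per-instrument count of messages processed
    first_message: dict[str, Any]
    last_message: dict[str, Any]


def record_message(
    store: dict[str, InstrumentMetadata],
    instrument: str,
    message: dict[str, Any],
    ts: float,
) -> InstrumentMetadata:
    """Create or update the metadata entry for `instrument` (assumes in-order delivery)."""
    meta = store.get(instrument)
    if meta is None:
        meta = InstrumentMetadata(ts, ts, 1, message, message)
        store[instrument] = meta
    else:
        meta.last_seen = ts
        meta.sequence += 1
        meta.last_message = message
    return meta


store: dict[str, InstrumentMetadata] = {}
record_message(store, "BTC-USD", {"price": 30000.0}, ts=100.0)
meta = record_message(store, "BTC-USD", {"price": 30010.0}, ts=105.0)
print(meta.first_seen, meta.last_seen, meta.sequence)  # 100.0 105.0 2
```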

Each individual message type gets its own queue. Inbound data arrives from both polling and streaming, is deduplicated using a Redis list, and is then pushed into a second Redis list that behaves like a queue, from which the output service consumes it.
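
The dedup-then-queue flow can be sketched with four Redis list commands. Each message type gets a capped list of recently seen message ids, plus a second list used as a FIFO queue (LPUSH on the input side, RPOP on the output side). To keep the example runnable without a server, `FakeRedisLists` mimics those commands in memory. The key names, the `message_id` rule, and the window size are hypothetical. The same functions should also work against redis-py's `redis.Redis(decode_responses=True)`, which exposes `lpush`, `ltrim`, `lrange`, and `rpop`.

```python
from __future__ import annotations

import json
from collections import defaultdict
from typing import Any

DEDUP_WINDOW = 1000  # hypothetical: how many recent message ids each type remembers


class FakeRedisLists:
    """In-memory stand-in for LPUSH, LTRIM, LRANGE and RPOP (non-negative `start` only)."""

    def __init__(self) -> None:
        self._lists: defaultdict[str, list[str]] = defaultdict(list)

    def _end(self, key: str, stop: int) -> int:
        # Redis `stop` is inclusive and may be negative (-1 means the last element).
        n = len(self._lists[key])
        return max(n + stop + 1, 0) if stop < 0 else stop + 1

    def lpush(self, key: str, *values: str) -> int:
        for value in values:
            self._lists[key].insert(0, value)
        return len(self._lists[key])

    def ltrim(self, key: str, start: int, stop: int) -> bool:
        self._lists[key] = self._lists[key][start:self._end(key, stop)]
        return True

    def lrange(self, key: str, start: int, stop: int) -> list[str]:
        return self._lists[key][start:self._end(key, stop)]

    def rpop(self, key: str) -> str | None:
        items = self._lists[key]
        return items.pop() if items else None


def message_id(msg: dict[str, Any]) -> str:
    # Hypothetical identity: instrument plus the exchange's own trade id.
    return f"{msg['instrument']}:{msg['trade_id']}"


def ingest(r: Any, msg_type: str, msg: dict[str, Any]) -> bool:
    """Queue `msg` unless the same message was seen recently; returns True if queued."""
    seen_key, queue_key = f"seen:{msg_type}", f"queue:{msg_type}"
    mid = message_id(msg)
    if mid in r.lrange(seen_key, 0, -1):
        return False  # already delivered by the other input (polling vs streaming)
    r.lpush(seen_key, mid)
    r.ltrim(seen_key, 0, DEDUP_WINDOW - 1)  # remember only the newest ids
    r.lpush(queue_key, json.dumps(msg))
    return True


def consume(r: Any, msg_type: str) -> dict[str, Any] | None:
    """Output-service side: LPUSH + RPOP makes the list behave as a FIFO queue."""
    raw = r.rpop(f"queue:{msg_type}")
    return None if raw is None else json.loads(raw)


r = FakeRedisLists()
trade = {"instrument": "BTC-USD", "trade_id": 42, "price": 30000.0}
print(ingest(r, "trade", trade))  # True: first copy, e.g. from the stream
print(ingest(r, "trade", trade))  # False: same trade from the poller
print(consume(r, "trade"))        # {'instrument': 'BTC-USD', 'trade_id': 42, 'price': 30000.0}
print(consume(r, "trade"))        # None
```

This sketch has two caveats. First, checking membership with LRANGE is O(window), which is one reason to keep the window small. Second, with several input processes the check-then-push sequence is not atomic, so a production version would need to run it as a single Lua script, which Redis executes atomically.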

None of the integrations keep any state in memory; they rely on reading Redis keys to figure out which instruments they are in charge of, which endpoints they need to poll, and so on. To chain requests, we parse the data we need from the first request’s response and attach it to the metadata of the next request.
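
Here is a minimal sketch of the two ideas in this paragraph, under assumed names. The worker rebuilds its work list from the store on every cycle instead of holding it in memory. A chained request carries, in its metadata, a value parsed from the first response. The key layout (`exampleex:shard:0:instruments`, `exampleex:endpoint:trades`), the URLs, and `market_id` are hypothetical. `FakeStore` stands in for redis-py's `smembers` and `get` so the example runs without a server, and `fetch` stands in for an HTTP client.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Request:
    url: str
    metadata: dict[str, Any] = field(default_factory=dict)


class FakeStore:
    """In-memory stand-in exposing the same `smembers`/`get` calls as redis-py."""

    def __init__(self, sets: dict[str, set[str]], strings: dict[str, str]) -> None:
        self._sets, self._strings = sets, strings

    def smembers(self, key: str) -> set[str]:
        return set(self._sets.get(key, ()))

    def get(self, key: str) -> str | None:
        return self._strings.get(key)


def plan_requests(store: Any, exchange: str, shard: int) -> list[Request]:
    """Build this cycle's work list purely from the store; nothing is cached in memory."""
    # Hypothetical key layout for the shard's instruments and endpoint template.
    instruments = sorted(store.smembers(f"{exchange}:shard:{shard}:instruments"))
    template = store.get(f"{exchange}:endpoint:trades")
    if template is None:
        raise KeyError(f"no trades endpoint configured for {exchange}")
    return [Request(template.format(instrument=i), {"instrument": i}) for i in instruments]


def run_chain(
    fetch: Callable[[str], dict[str, Any]],
    first: Request,
    next_url: str,
    needed: str,
) -> tuple[Request, dict[str, Any]]:
    """Copy field `needed` from the first response into the next request's metadata."""
    first_response = fetch(first.url)
    metadata = {**first.metadata, needed: first_response[needed]}
    second = Request(next_url.format(**metadata), metadata)
    return second, fetch(second.url)


store = FakeStore(
    sets={"exampleex:shard:0:instruments": {"ETHUSDT", "BTCUSDT"}},
    strings={"exampleex:endpoint:trades": "https://api.example.com/trades?symbol={instrument}"},
)
for req in plan_requests(store, "exampleex", 0):
    print(req.url)
# https://api.example.com/trades?symbol=BTCUSDT
# https://api.example.com/trades?symbol=ETHUSDT

fake_responses = {
    "https://api.example.com/markets/BTCUSDT": {"market_id": 7},
    "https://api.example.com/trades?market=7": {"trades": []},
}
second, body = run_chain(
    fake_responses.__getitem__,
    Request("https://api.example.com/markets/BTCUSDT", {"instrument": "BTCUSDT"}),
    "https://api.example.com/trades?market={market_id}",
    "market_id",
)
print(second.metadata)  # {'instrument': 'BTCUSDT', 'market_id': 7}
```

Because nothing is cached, this sketch picks up a change to the instrument set in the store on the next call to `plan_requests`, without restarting the process.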

Disclaimer: Please note that the content of this blog post was created prior to our company's rebranding from CryptoCompare to CCData.