Genius Yield Stake Pools: Infrastructure Best Practices
Since the start of our ISPO on December 15, 2021, we have significantly improved the reliability, security and resilience of our stake pool infrastructure. This article discusses our current deployment architecture and the actions we have taken to incorporate engineering best practices.
- Block producer: node that validates transactions and mints blocks
- Relay: node that propagates blocks to the block producer and the rest of the network
- Cardano Node: core software used to run the block producers and relays
- GCP: Google Cloud Platform; the cloud provider we are using to host our pools
- VM: virtual machine; an instance in GCP dedicated to a relay or block producer
- Container: unit of software that packages code and dependencies into a lightweight environment
How are the pools deployed?
Our block producers and relays run on e2-standard-4 VMs in GCP, each with 4 vCPUs and 16 GB of RAM. The VMs use Ubuntu 20.04 as the host operating system and run the Cardano Node in a Docker container.
Figure 1: A single instance of a Genius Yield stake pool.
Security best practices
Firewall rules ensure that only specified ports are open to specified parties. In Genius Yield’s deployment, the relay can reach the block producer through a single port, and the block producer can connect only to the relay. The public network cannot reach the block producer directly; it can reach only the relay, which is open to the public network.
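The policy described above can be modeled as a small default-deny allow-list. This is an illustrative sketch only: the port number 3001, the role names and the `is_allowed` helper are assumptions made for the example, not Genius Yield’s actual firewall configuration.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    source: str       # party that initiates the connection
    destination: str  # party that receives it
    port: int         # destination port that is opened


# Hypothetical port; the article does not say which port is used.
NODE_PORT = 3001

# Only the connections mentioned in the article are allowed.
ALLOW_RULES = frozenset({
    Rule("public", "relay", NODE_PORT),
    Rule("relay", "block_producer", NODE_PORT),
    Rule("block_producer", "relay", NODE_PORT),
})


def is_allowed(source: str, destination: str, port: int) -> bool:
    """Default deny: traffic passes only if an explicit rule matches."""
    return Rule(source, destination, port) in ALLOW_RULES


if __name__ == "__main__":
    print(is_allowed("public", "relay", NODE_PORT))           # public reaches the relay
    print(is_allowed("public", "block_producer", NODE_PORT))  # but never the block producer
```

Because anything not listed is denied, opening a new path (for example, a monitoring port) would require adding an explicit rule rather than relaxing an existing one.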
The Cardano Nodes on our instances are containerized with Docker. These containers provide process isolation, ease of deployment and stronger security. As ISPO demand grows, deploying a new Cardano Node is fast and easy: provision a VM, build the Docker image, specify configurations, and start the container.
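The build and start steps above might look roughly like the following sketch, which only assembles the Docker command lines and does not execute them. The image name, container name, host paths and port are hypothetical, and the sketch assumes the image’s entrypoint is the `cardano-node` binary; none of these details come from Genius Yield’s actual setup.

```python
import shlex
from typing import List


def build_command(image: str, context_dir: str) -> List[str]:
    """Command that builds the Docker image from a build context directory."""
    return ["docker", "build", "-t", image, context_dir]


def run_command(image: str, name: str, port: int,
                config_dir: str, data_dir: str) -> List[str]:
    """Command that starts the node container in the background."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "--restart", "unless-stopped",
        "-p", f"{port}:{port}",            # expose the node port on the host
        "-v", f"{config_dir}:/config:ro",  # read-only configuration files
        "-v", f"{data_dir}:/data",         # persistent chain database
        image,
        # Arguments passed to cardano-node (assumed to be the entrypoint).
        "run",
        "--config", "/config/config.json",
        "--topology", "/config/topology.json",
        "--database-path", "/data/db",
        "--socket-path", "/data/node.socket",
        "--host-addr", "0.0.0.0",
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    image = "example/cardano-node:latest"
    print(shlex.join(build_command(image, ".")))
    print(shlex.join(run_command(image, "relay-1", 3001,
                                 "/opt/cardano/config", "/opt/cardano/data")))
```

Keeping the configuration on a read-only mount and the database on a separate volume means a container can be replaced without touching either.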
Transaction signing in an air-gapped environment
All of Genius Yield’s stake pool transactions are signed on a dedicated, air-gapped (completely offline) computer. The signing keys used for these transactions are never exposed online, even though our GCP environment is already protected by strict firewall rules.
Figure 2: Interconnectivity of our stake pools
How have we improved since deployment?
At Genius Yield, we always strive to maintain high-quality systems that are secure, reliable, and performant. Since the start of our ISPO, we have implemented a number of improvements to enhance our stake pool infrastructure.
Safety & security
We have made it impossible to overwrite or corrupt any pool-related data in our cold environment by making it immutable. Furthermore, we back up this data to password-encrypted physical drives and make periodic backups of the entire cold environment disk.
For our block producers, we have created a weekly disk snapshot schedule with a 15-day retention period. In case of a node failure, we can restore the state of our block producers from up to two weeks in the past, without having to regenerate keys and certificates in the cold environment.
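The relationship between a weekly schedule and a 15-day retention period can be checked with a little date arithmetic. The sketch below is a simplified model, not the cloud provider’s implementation: the `retained_snapshots` helper, the example dates and the choice to keep a snapshot that is exactly 15 days old are all assumptions.

```python
from datetime import date, timedelta
from typing import List

SNAPSHOT_INTERVAL_DAYS = 7  # weekly schedule (from the article)
RETENTION_DAYS = 15         # retention period (from the article)


def retained_snapshots(today: date, first_snapshot: date) -> List[date]:
    """Return the dates of the snapshots that still exist on `today`."""
    kept = []
    day = first_snapshot
    while day <= today:
        # Assumption: a snapshot is deleted once it is older than the retention period.
        if (today - day).days <= RETENTION_DAYS:
            kept.append(day)
        day += timedelta(days=SNAPSHOT_INTERVAL_DAYS)
    return kept


if __name__ == "__main__":
    # Hypothetical dates: snapshots every 7 days starting 2022-01-04.
    for snapshot in retained_snapshots(date(2022, 3, 15), date(2022, 1, 4)):
        print(snapshot.isoformat())
```

On a snapshot day, this model keeps three snapshots, aged 0, 7 and 14 days, which is consistent with a recovery window of about two weeks.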
Finally, all engineers who need access to our production systems must connect through a dedicated VPN. This significantly tightens our security and prevents people outside of the organization from accessing our cloud environment.
Performance & reliability
To decrease the likelihood of missing slot leader checks, we have doubled the number of CPUs on both our block producers and relays. We have also updated our relay topologies to connect to at least 15 actively running relays in the network. To further improve relay connectivity, we have started using the Topology Updater community tool to ensure that our relays are “kept alive” in the network.
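A relay topology with a minimum peer count could be generated and validated along these lines. The sketch assumes the legacy (non-P2P) topology file format with a top-level `Producers` list; the peer addresses, the port and the `build_topology` helper are hypothetical and do not reflect our actual peer list.

```python
import json
from typing import Dict, List, Tuple

MIN_PEERS = 15  # minimum number of remote relays (from the article)


def build_topology(peers: List[Tuple[str, int]],
                   min_peers: int = MIN_PEERS) -> Dict:
    """Build a legacy-format topology, refusing lists that are too short."""
    unique = list(dict.fromkeys(peers))  # drop duplicates, keep order
    if len(unique) < min_peers:
        raise ValueError(
            f"need at least {min_peers} distinct peers, got {len(unique)}"
        )
    return {
        "Producers": [
            {"addr": addr, "port": port, "valency": 1}
            for addr, port in unique
        ]
    }


if __name__ == "__main__":
    # Hypothetical peers, for illustration only.
    peers = [(f"relay-{i:02d}.example.com", 3001) for i in range(15)]
    print(json.dumps(build_topology(peers), indent=2))
```

Validating the count before writing the file keeps a shortened peer list from silently reducing connectivity.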
Finally, we now take a more deliberate approach to upgrading and maintaining our nodes by using the CNCLI tool. This lets us see when our block producers are scheduled to be slot leaders and plan maintenance around those slots.
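Once the scheduled leader slots are known, choosing a maintenance window comes down to finding the longest gap between them. The sketch below assumes the slot times have already been converted to `datetime` values; it does not parse CNCLI output, and the `longest_idle_window` helper and the sample times are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple


def longest_idle_window(start: datetime, end: datetime,
                        leader_times: List[datetime]) -> Tuple[datetime, datetime]:
    """Return the longest interval in [start, end] with no scheduled block."""
    inside = sorted(t for t in leader_times if start <= t <= end)
    points = [start] + inside + [end]
    gaps = zip(points, points[1:])  # consecutive (from, to) pairs
    return max(gaps, key=lambda gap: gap[1] - gap[0])


if __name__ == "__main__":
    # Hypothetical schedule for a single day.
    day_start = datetime(2022, 3, 1, 0, 0)
    day_end = datetime(2022, 3, 2, 0, 0)
    slots = [
        datetime(2022, 3, 1, 2, 0),
        datetime(2022, 3, 1, 5, 0),
        datetime(2022, 3, 1, 20, 0),
    ]
    window = longest_idle_window(day_start, day_end, slots)
    print(window[0].isoformat(), "to", window[1].isoformat())
```

In practice, a safety margin on both sides of the window would be sensible so that a restart finishes well before the next scheduled slot.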
At Genius Yield, we prioritize the deployment of first-class software infrastructure to ensure that the various components of our platform operate securely and efficiently. We will continue to incorporate industry best practices and strive for greatness and quality across all our systems.