Hidden architectures stem from the mistaken idea that serverless computing means no one has to design or manage an architecture. People often assume that architecture means just one thing: a network architecture, or servers and how many of them sit in a particular Availability Zone in AWS. The expectation that adopting serverless computing frees you from thinking about architecture is dangerous and can cause problems in the long run.
Service Uptime: A Question of Reliability
Service outages among third parties and cloud providers are not unheard of. For example, when Docker Hub suffered a partial service disruption in August 2019, people could suddenly no longer push images to, or pull images from, their container registries. If your CI/CD process depends on building a Docker image and pushing it to Docker Hub, from where your applications pull their images, this kind of disruption can affect the operation and stability of your deployments.
Similar incidents have been seen with GitHub, GitLab, and other third-party services, despite their reputations for reliable uptime. Perfect uptime is simply impossible. The same considerations apply to the applications and services accessed through AWS: if you look at the Service Level Agreements (SLAs) for the various products AWS offers, you will find that they promise excellent uptime, but not 100%. Even an SLA that promises 100% uptime can never truly live up to it, because failure is inevitable.
Planning for Fault Tolerance
Cognizant of these limitations, AWS continually pushes its customers to design with failure in mind. A network architecture whose servers span multiple Availability Zones is one good step toward preparing for failure. Container registries deserve the same treatment. If you use Amazon ECR, for instance, considering the possibility that a registry becomes unavailable in a particular region can lead you to keep a duplicate copy of your images in a second region that you can fall back on during a disruption. Even when hosting code on GitHub, it is advisable to maintain redundant access, perhaps with GitHub Enterprise, which allows you to keep repository mirrors on-premises. An alternative is to mirror your repositories elsewhere and use CI/CD pipelines to handle backup and restoration.
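As a rough sketch of such a backup process, the duplicate can be produced by pulling the image from the primary registry, re-tagging it for a registry in another region, and pushing it there. The function name, account ID, and registry hostnames below are placeholders, not a prescribed AWS workflow; the command runner is injectable so the Docker calls can be faked:

```python
import subprocess

def mirror_image(image, primary_registry, backup_registry, run=subprocess.check_call):
    """Duplicate an image into a backup registry.

    Pulls the image from the primary registry, re-tags it for the
    backup registry, and pushes the copy. `run` executes one command
    line; passing a stub makes the sequence testable without Docker.
    """
    src = f"{primary_registry}/{image}"
    dst = f"{backup_registry}/{image}"
    run(["docker", "pull", src])        # fetch from the primary region
    run(["docker", "tag", src, dst])    # re-tag for the backup registry
    run(["docker", "push", dst])        # store the duplicate copy
    return dst
```

With real (placeholder-named) ECR registries this would look like `mirror_image("my-app:1.0.0", "123456789012.dkr.ecr.us-east-1.amazonaws.com", "123456789012.dkr.ecr.us-west-2.amazonaws.com")`, run on a schedule or as a post-push CI step.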
Planning for failure can be seen in the same light as Game Days: periods set aside for robust disaster recovery testing. The specifics of the fallback strategy matter less than having a fallback plan at all and testing it, so that it actually works when such a situation arises. Time and time again, engineers half-implement fallback strategies for alternate repository sources and never test them; only when things go down do the unaccounted-for nuances surface. Testing, then, is a marker of operational maturity. On day one of an endeavour, a company's main goal is to get something out the door. As the endeavour matures, however, accounting for inevitable failures and disruptive scenarios becomes increasingly important.
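One lightweight way to keep a fallback honest between Game Days is to exercise it in an automated test rather than only during a real outage. The sketch below is illustrative, with hypothetical function and source names: it simulates the primary source being down and asserts that the fallback path actually runs.

```python
def restore_from(sources, fetch):
    """Return data from the first source that responds.

    `fetch` raises ConnectionError on failure, mimicking an
    unreachable provider.
    """
    for source in sources:
        try:
            return fetch(source)
        except ConnectionError:
            continue  # this source is down; try the next one
    raise RuntimeError("no source available")

def test_fallback_survives_primary_outage():
    def fetch(source):
        if source == "primary":
            raise ConnectionError("simulated outage")  # the game-day condition
        return f"backup copy from {source}"

    # The drill: the primary is down, and the mirror must carry the load.
    assert restore_from(["primary", "mirror"], fetch) == "backup copy from mirror"

test_fallback_survives_primary_outage()
```

Running a check like this in CI means the "primary is down" scenario is rehearsed on every build, not discovered during an incident.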
It eventually boils down to deciding what an acceptable risk is and accepting that things will fail no matter how durable they appear. Hope is not a strategy; when things do fail, the main question is whether you have anything in place that helps you recover swiftly and with as little damage as possible.
Are you curious about how serverless technologies can help you modernize your application deployments? Check out our application modernization service. If you need help on your serverless development journey, get in touch with our team.