When I was in school, I learned how to do something called binomial expansion. A certain type of equation can be expanded by following a set of steps. When it came time for a test, I was able to identify an equation that could be expanded and could dutifully apply the algorithm myself. But then there was a hard question. There was an equation which I thought I could expand, but the question wasn’t explicitly asking for that. I went through the motions and expanded the equation anyway. The next day I got my results back and I had failed the question. I had learned the process by rote but had failed to understand why the process was needed.
In the tech industry, we can be guilty of the same crime sometimes. Certain practices get ingrained into the profession to the point where we forget exactly why we adopted them in the first place. Unlike a maths exam, however, there is often no clear correct answer, and you aren’t going to get a pass/fail from your teacher the next day. Instead, “best practices” can be implemented up front with no clear understanding of the potential problems they are there to solve.
Infrastructure as Code (IaC) is a practice I really feel should be implemented everywhere it is relevant. It is good common practice, the popularity of which continues to grow. More and more it seems to be the default approach to provisioning cloud infrastructure. So here I am not going to detail how to implement infrastructure as code, nor am I going to tell you why you should implement it. Instead, I am going to describe how infrastructure as code should feel for those who already have it, and hopefully provide a path back to Nirvana for anybody who isn’t realising the benefits it can bring.
Infrastructure as code should feel safe
First and foremost, it should always feel safe to modify your infrastructure when using infrastructure as code. Modifying the file which defines your core database infrastructure shouldn’t give you a feeling of existential dread. If you do feel this way, consider the following:
Do you have a development environment?
You may be worried about breaking production infrastructure. If this is the case, then you need a development environment to test out changes. Luckily, infrastructure as code makes it much easier to replicate your infrastructure for multiple environments.
Is your development environment really a development environment?
You may be worried about breaking the development environment; maybe you’ll block ongoing testing or application developers coding features. If this is true, then you don’t have a development environment: you are maintaining someone else’s. See if it is feasible to create another environment for the development of infrastructure that you can break with impunity.
Do you know what the change is going to do?
The code shouldn’t feel like a black box, and changes should be reviewable against the real state. Terraform’s plan and CloudFormation’s change set features allow engineers to review changes against the running infrastructure and see what actions will be taken. Learning how to read these plans is a powerful tool. Engineers should also be able to run these kinds of plans on all relevant environments (even if they can’t act on them) to ensure the change is valid all the way out to production. Bonus points for automatically creating plans on pull requests.
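As a rough illustration of reading a plan programmatically, the Python sketch below summarises the JSON form of a Terraform plan, which can be produced with `terraform plan -out=tfplan` followed by `terraform show -json tfplan`. It relies on the `resource_changes` list and each entry’s `change.actions` field from that output. The function name `summarise_plan` and the sample data are made up for this example, and a real review would look at far more than action counts.

```python
from collections import Counter


def summarise_plan(plan: dict) -> dict:
    """Count planned actions and list the resources that would be destroyed."""
    counts = Counter()
    destroyed = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["no-op"]:
            continue  # nothing will happen to this resource
        counts.update(actions)
        # A replacement appears as both "delete" and "create" on one resource.
        if "delete" in actions:
            destroyed.append(rc["address"])
    return {"counts": dict(counts), "destroyed": destroyed}


# Hypothetical, hand-written sample in the shape of `terraform show -json`.
sample_plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
        {"address": "aws_db_instance.core", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
    ]
}

summary = summarise_plan(sample_plan)
```

A pull request check could post a summary like this as a comment and flag anything listed under `destroyed` for a closer look.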
Infrastructure as code should feel stable
Your IaC codebase should have low churn. When changes are made, they should usually be additive. Infrastructure should be the rock foundations for your application and not a shifting sandpit. If this doesn’t ring true, then you might want to consider:
Is your infrastructure tightly coupled with your application?
If you find yourself frequently updating your infrastructure code alongside your application code, this may imply there is too much coupling between the two. This can often happen when using “code as infrastructure” frameworks such as the Serverless Framework or AWS SAM. In these cases you should think of the framework code as closer to your application than it is to your base infrastructure, and organise it as such.
Is your infrastructure as code the only game in town?
Churn can happen if changes are being made reactively to keep code up to date with the state of your resources. If manual changes are being retroactively applied to your IaC, then that process should be inverted: make the change in code first and let the tooling apply it. Otherwise everything will just be getting done twice. It may also be the case that changes are being made to keep up with an automated system. In this case it can be useful to ignore such changes if your tooling permits it.
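One way to notice manual changes before they are copied back into code is a scheduled check. The sketch below is a minimal Python example that assumes Terraform: `terraform plan -detailed-exitcode` exits with 0 when there is nothing to change, 1 on error, and 2 when the plan contains changes. An exit code of 2 only says that code and real resources differ, which could be drift or simply an unapplied change. The function names and the `workdir` argument are hypothetical.

```python
import subprocess


def interpret_exit_code(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a label."""
    if code == 0:
        return "in-sync"   # plan succeeded, nothing to change
    if code == 2:
        return "differs"   # plan succeeded, changes are present
    return "error"         # 1 (or anything unexpected) is treated as failure


def check_project(workdir: str) -> str:
    """Run a plan in `workdir` (a hypothetical project directory) and classify it."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_exit_code(result.returncode)
```

Running `check_project` on a schedule and alerting on `"differs"` is one option. Whether that is worth the noise depends on how often automated systems legitimately change your resources.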
Infrastructure as code should feel understandable
Nobody should be backing away from making a change to your infrastructure because they can’t reason about the code. If you can’t understand the code then you can’t understand the infrastructure, and if you don’t understand the infrastructure then you can’t fix it when it breaks. Why might this be?
Is your logic too complex?
There shouldn’t be long-winded expressions defining whether a resource is deployed or not. Even with the limited expressiveness of Terraform’s HCL or CloudFormation, you may find yourself in a situation where complex, impenetrable logic is used to calculate values. This can be even worse with non-declarative tools like CDK or Pulumi where the halting problem becomes an issue. Try to remove such constructs whenever possible, or at least heavily document them.
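When conditional logic cannot be removed, giving it a name and a comment goes a long way. The following is a minimal Python sketch in the style you might use with a general-purpose-language tool. It deliberately imports no CDK or Pulumi APIs, and `EnvConfig`, its fields, and `needs_read_replica` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical per-environment settings."""
    name: str
    is_production: bool
    region_count: int


def needs_read_replica(cfg: EnvConfig) -> bool:
    """Deploy a read replica only for production environments that span
    more than one region. Keeping the rule in one named, documented place
    means a reviewer does not have to decode it inline at every resource."""
    return cfg.is_production and cfg.region_count > 1


prod = EnvConfig(name="prod", is_production=True, region_count=2)
dev = EnvConfig(name="dev", is_production=False, region_count=1)
```

The resource definition can then read `if needs_read_replica(cfg):`, which tells the reader what is being decided and leaves the details one click away.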
Are there too many magic abstractions?
Features like Terraform’s modules allow defining smaller, repeatable units. While great for keeping things DRY, they can also hide away a lot of complexity. Moreover, one may feel enticed to import a community module from the internet without fully understanding what it does under the hood. Sometimes the fix here can be as simple as thinking of a great name for your module, but don’t be scared to break up modules into smaller chunks if that makes the architecture clearer.
Infrastructure as code should feel fast
It should never feel faster to change something manually if it is already defined in infrastructure as code. Some food for thought:
Is your unit of deployment too big?
Most infrastructure as code tools will actively query your running resources to calculate changes and detect drift. The larger the project, the longer this process takes. You may even get rate limited by your cloud provider. Splitting your projects up into smaller deployable chunks can reduce the time your tooling takes to refresh its state.
Is there enough automation?
The beauty of infrastructure as code is its ability to be written once and deployed many times. However, if that deployment process is manual (albeit with IaC tooling) then small changes to infrastructure may require the same amount of effort as updating resources directly. Just like an application can benefit from CI/CD, so can your IaC.
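As a sketch of what that automation might run, the Python below builds and runs one `terraform plan` per environment. It assumes a hypothetical layout with one directory per environment under a common root. The function names are illustrative, and a real pipeline would also handle `terraform init`, credentials, and an approval step before anything is applied.

```python
import subprocess
from pathlib import Path


def plan_command(plan_file: str = "tfplan") -> list[str]:
    """Build a non-interactive plan command that saves the plan to a file."""
    return ["terraform", "plan", "-input=false", f"-out={plan_file}"]


def plan_all(root: str, environments: list[str]) -> dict[str, int]:
    """Run a plan in each environment directory and collect the exit codes.

    Assumes `root/<environment>` exists for every name given (hypothetical layout).
    """
    results = {}
    for env in environments:
        workdir = Path(root) / env
        completed = subprocess.run(plan_command(), cwd=workdir)
        results[env] = completed.returncode
    return results
```

A CI job might call `plan_all("envs", ["dev", "staging", "prod"])` on every pull request and fail the build if any exit code is non-zero.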
Infrastructure as code should feel better
Finally, if none of the above points apply to you or your infrastructure, yet you still feel your IaC is letting you down, then it may be time for some serious reflection. Assess whether the tooling you are currently using is fit for your purposes. Evaluate whether problems might be caused by organisational structure or permissions issues. Ultimately, infrastructure as code should feel like it’s helping more than it is hindering.
This blog is written exclusively by The Scale Factory team. We do not accept external contributions.