The promise of cloud-native is frequent, worthwhile improvements without undue effort. Even though public cloud services aren’t quite at the point of being interchangeable utilities, it’s already easy to take for granted what they do offer.
This is important. “Civilization advances by extending the number of important operations which we can perform without thinking about them” (Alfred North Whitehead, back in 1911) and the field of IT is full of good examples.
I’m going to take a look at one niche of IT: keeping secrets.
You’ve got options. The easy option is the no-op choice: store plain text. In the cloud there’s often a layer of encryption beneath the resources you’re using, but you shouldn’t rely on that: if it looks like plain text to the application, it also looks like plain text to any attacker. Only pick this choice for public or low-sensitivity information. For example, it’s rare for a business to enforce encryption for source code.
Confidentiality in the cloud cuts two ways. On the one hand, operating in the cloud means you can rely on internet-hosted managed services rather than building your own thing. On the other hand, you’re relying on internet-hosted managed services, and you need to spend time understanding exactly what it is you’re depending on. Textbooks about cryptography often split into symmetric and asymmetric choices at this point, but that’s probably not the important distinction for your organisation. Instead, categorise the secrets into two kinds: your secrets, and your users’ secrets.
Cloud-native architectures let you shift most, or even all, of the business logic to the client side, on your customer’s computer - often, in their browser. Why not encryption too? If the confidential data belong to your client, they’ll need to have access; you might not.
TLS, the technology that puts the S into HTTPS, used to be a tricky proposition; now, it’s largely not. Managed services let you hand off the genuinely fiddly details of creating, protecting and deploying the private keys and certificates that servers need to talk TLS.
The simple option is to deploy a managed load balancer and set it up to serve HTTPS. For SaaS, you might deploy a listener for each tenant, a single load balancer, or something in between.
Depending on your exact architecture the story might end there. Behind that load balancer, you’d have some additional networking that happens within a defined scope. Any clear-text transmission is limited to a tenant-specific network within your chosen cloud provider. If that “clear-text” part sounds worrying, then the big providers like AWS have options to avoid it.
Content distribution networks, the good ones at least, let you use TLS at the edge, without you needing to upload a private key to the CDN. So, use TLS there too. Overall this is a huge benefit for small and large organisations alike. If you’re small, manual processes for renewing certificates can be cumbersome or get overlooked. Large organisations avoid the compliance overhead of implementing split knowledge for private keys, or of getting sign off from another unit with an expiry deadline looming.
If you’re reading this and you’re thinking to yourself that your workload needs true end-to-end encryption, the cloud can still help. On AWS, there’s an integration between EC2 and AWS Certificate Manager that lets your workload talk TLS, without having any access to the private key that keeps connections confidential.
CI / CD
The category of “your secrets” also includes deploy credentials for your app: SSH private keys for fetching code from Git, and maybe the PGP private key you use for signing a release.
Going cloud native means automating tests, builds and deploys, and so either the deploy tool has a key for code access, or the source code control system has a secret that lets it deploy into the cloud.
Whichever way you picked, protecting these properly is essential. If you run deploys with superuser or “root” credentials for your provider, then encryption isn’t the real fix: you really, really need to change the setup so that automation doesn’t use those superuser credentials at all. For good measure, disable or rotate them once you’ve switched over.
Instead of superuser access, deploy with a least-privilege principal, and then make sure that those credentials are well protected.
Other API keys count as well, especially if they have an impact on the user experience. Good practice here is for servers and similar components to load these in from the environment. The overall principle is to make it somebody else’s problem. Take care though, because “API keys” take multiple forms. The easiest to work with are scope- and time- limited secrets. You need to protect them whilst your code is running, but you don’t even need to worry about cleaning up because the token expires anyway.
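Loading credentials from the environment can be sketched in a few lines. This is a minimal illustration, and the variable name `PAYMENTS_API_KEY` is made up for the example; in practice the platform (your container orchestrator, CI system, or function runtime) injects the value, not your code.

```python
import os


def load_api_key(name: str) -> str:
    """Fetch a secret from the environment; fail fast if it's missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return value


# PAYMENTS_API_KEY is an illustrative name; setting it here only simulates
# what the platform would do for a real deployment.
os.environ["PAYMENTS_API_KEY"] = "example-token"
key = load_api_key("PAYMENTS_API_KEY")
```

Failing fast on a missing secret is deliberate: a component that limps along without its credentials tends to produce confusing errors much later.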
The more risky keys to protect are like the superuser deploy credentials from earlier. It might be tempting to lock these away in a managed service like Azure Key Vault, GCP Secret Manager or the charmingly named AWS Systems Manager Parameter Store. The thing is, it doesn’t really matter how well protected the service is, or what certifications it has, if the next step is for an application to fetch that secret and use it incautiously. Leaks happen.
Another kind of internal-use secret is for shared resources that should be encrypted at rest (probably all of them, then). It could be a virtual machine’s storage, a specific swap volume for a server, or a managed service like Amazon SNS.
With rare exceptions, these keys can have the same lifecycle as the resource. SNS, for example, lets you set up a KMS key and use that to protect the topic. These keys cost $1 a month so, unless you’re creating many of the same thing, you can take the easy option and create a new key along with each resource.
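Creating a key alongside the resource it protects might look like the following sketch, using boto3. It assumes you have AWS credentials with rights to call KMS and SNS; the boto3 import is deferred so the attribute helper stands alone.

```python
def encrypted_topic_attributes(kms_key_id: str) -> dict:
    """SNS topic attributes that turn on encryption at rest with the given key."""
    return {"KmsMasterKeyId": kms_key_id}


def create_encrypted_topic(name: str) -> str:
    """Sketch: create a fresh KMS key along with the SNS topic it protects."""
    import boto3  # deferred: AWS credentials are only needed at call time

    kms = boto3.client("kms")
    sns = boto3.client("sns")
    key = kms.create_key(Description=f"Encryption key for SNS topic {name}")
    attrs = encrypted_topic_attributes(key["KeyMetadata"]["KeyId"])
    return sns.create_topic(Name=name, Attributes=attrs)["TopicArn"]
```

Because the key is created with the topic, it can be deleted with the topic too, keeping the lifecycle simple.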
Often, your customers’ secrets are the ones that matter.
If you’re delivering SaaS in the cloud, all of your customers expect you to take care of security and privacy. In some sectors your customers also expect you to be able to prove that you have.
Privacy legislation like GDPR and CCPA tells providers who control information processing to make sure that people’s personal details stay private. From a design perspective, it makes little difference whether the user experience is a website, a desktop PC application, mobile, or an API. The information security goal stays the same: protect the stuff that matters.
Two key aspects make these secrets different. First, you’re protecting more than a system or an API - you’re directly looking after something the customer wants kept confidential. Second, you have to make the data accessible so you can achieve some business outcome.
In the cloud, and especially using managed services in the cloud, protecting secrets comes down to three things: encryption, system partitioning, and access controls. Whatever technical measures you implement, access rights are your primary means to keep secrets secret. Good cloud designs take physical access off the table: it should genuinely be hard to get anywhere near the infrastructure, even for your provider’s own staff. If you do compliance paperwork, that’s a whole lot of “not applicable” you get to tick.
The focus on access controls means that’s where you should expect to spend most of the effort. If you’re encrypting data in Amazon S3, but a slip means that anyone can read the object, that’s a problem. If you’re using a managed encryption service like AWS KMS or Azure Key Vault, but the right to decrypt isn’t locked down enough, that’s a problem too.
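Locking down the right to decrypt usually means writing a key policy. Here is a minimal sketch of a KMS key policy as a Python dict: one administrative principal, one principal that may decrypt, and nothing else. The ARNs are placeholders you supply; a real policy usually also needs statements for encrypt and key-grant operations.

```python
def kms_decrypt_policy(admin_arn: str, decrypt_role_arn: str) -> dict:
    """A deliberately narrow KMS key policy: admin rights for one principal,
    decrypt rights for one other, and no access for anyone else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "KeyAdministration",
                "Effect": "Allow",
                "Principal": {"AWS": admin_arn},
                "Action": "kms:*",
                "Resource": "*",
            },
            {
                "Sid": "AllowDecryptOnly",
                "Effect": "Allow",
                "Principal": {"AWS": decrypt_role_arn},
                "Action": "kms:Decrypt",
                "Resource": "*",
            },
        ],
    }
```

The point is that the encryption itself is the easy part; deciding exactly which principals appear in this document is where the real work lives.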
Reads and writes
Often, you want to control access to read secret information, separately from access that allows writing it. The simplest example is a mailbox service: only the recipient should be able to read messages, but more people - maybe the whole world - can be allowed to submit them. Another common story in the cloud is a deployment tool. When you create a managed RDS database in the AWS cloud, you need to choose an initial superuser password. Does that sound like a risk? It should.
A common pattern here is to wait until the new resource is deployed, then change from that initial password to something different, and write that into a store. From this point onwards the deployment tooling doesn’t need to know the password, and in fact you can separate out the change-password-after-deploy into its own component such as a Lambda function.
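A sketch of that change-password-after-deploy component might look like this. The function names and arguments are illustrative; the boto3 calls (`modify_db_instance`, `put_secret_value`) are real APIs, but running this requires AWS credentials and an existing database and secret.

```python
import secrets
import string


def generate_password(length: int = 32) -> str:
    """Build a random password using the cryptographically secure secrets module."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def rotate_initial_password(db_identifier: str, secret_name: str) -> None:
    """Hypothetical Lambda handler body: replace the deploy-time password
    and record the new value in Secrets Manager."""
    import boto3  # deferred so generate_password needs no AWS SDK

    new_password = generate_password()
    boto3.client("rds").modify_db_instance(
        DBInstanceIdentifier=db_identifier,
        MasterUserPassword=new_password,
        ApplyImmediately=True,
    )
    boto3.client("secretsmanager").put_secret_value(
        SecretId=secret_name, SecretString=new_password
    )
```

After this runs, the deployment tooling no longer knows the working password; only principals with access to the secret do.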
When you have resources tied directly to a tenant, such as giving each enterprise customer their own cluster, you can add extra controls if you need them. That change-password-after-deploy Lambda can become a resource you manage per customer (via code, of course). Partitioning, such as network access controls, lets you restrict who and what can decrypt important secrets.
You’ll need to decide the risks for each path (read and write) for each case and make sure you’re managing the ones that matter.
If you can, avoid dealing directly with encryption - use a managed service such as AWS Secrets Manager and it’s taken care of for you. That service also lets you define a resource policy on the secret itself, making it easy to separate out grants for sensitive operations. To me, AWS resource policies are one of the platform’s hidden strengths: the syntax is the same as IAM (the access control component) but the implementation is linked to the resource, not the principal requesting access.
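One way to use that resource policy is a deny-by-default statement: refuse `GetSecretValue` to every principal except one named role. This is a sketch (the role ARN is a placeholder), and you would attach the result with the Secrets Manager `put_resource_policy` call.

```python
import json


def secret_read_policy(reader_role_arn: str) -> str:
    """Resource policy denying GetSecretValue to everyone except one role.
    Attach via secretsmanager put_resource_policy(SecretId=..., ResourcePolicy=...)."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyReadExceptOneRole",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:PrincipalArn": reader_role_arn}
            },
        }],
    })
```

An explicit deny like this wins over any allow granted through IAM, which is exactly the property that stops a privileged user quietly widening their own access.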
Controls at the resource level mean you can lock a component down so that a privileged user can’t grant themselves more access to bypass the restrictions. You can lock yourself out doing this, though there are usually guards to make sure you really mean it. I miss those resource controls in Kubernetes, where there are many mechanisms available to read back Secret objects, and too few mechanisms to make that difficult.
Keys to the kingdom
What about a customer’s API keys? Maybe your business is also offering a cloud service with an API integration available. How do your tenants and their end users authenticate to that API, and how do you protect those secrets?
If your app has its own concept of a superuser or account owner, those API keys should be a priority to protect. As mentioned earlier, if you can avoid handling these directly, do that. A simple option is to treat keys like passwords and only store a salted hash. You can also rely on a third-party managed service to issue JSON Web Tokens; you can then implement your own verifier code or also rely on the same third party. The trade-off there is resilience against outage versus simplicity of implementation.
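Storing only a salted hash can be done with the standard library alone. A minimal sketch using `hashlib.scrypt` (a memory-hard function suited to this job) and a constant-time comparison:

```python
import hashlib
import hmac
import os


def hash_api_key(api_key: str, salt: bytes = None):
    """Return (salt, digest) for storage; the key itself is never stored."""
    salt = salt or os.urandom(16)
    digest = hashlib.scrypt(api_key.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest


def verify_api_key(candidate: str, salt: bytes, expected_digest: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    digest = hashlib.scrypt(candidate.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(digest, expected_digest)
```

If your key store leaks, an attacker gets hashes rather than usable credentials, which buys you time to rotate.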
To limit the impact of a leak, avoid API keys that are either long lived or that provide full access. Where you can’t avoid both, mitigate by requiring that privileged API keys expire sooner.
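One way to encode that rule is to tie the key’s lifetime to its scope. The scope names and lifetimes below are illustrative, not a standard:

```python
from datetime import datetime, timedelta, timezone

# Illustrative policy: the broader the scope, the shorter the lifetime.
LIFETIMES = {
    "read-only": timedelta(days=90),
    "read-write": timedelta(days=30),
    "admin": timedelta(hours=12),
}


def expiry_for(scope: str, issued_at: datetime) -> datetime:
    """When a key of the given scope stops being valid."""
    return issued_at + LIFETIMES[scope]


def is_expired(scope: str, issued_at: datetime, now: datetime = None) -> bool:
    """Check a key against the scope-based lifetime policy."""
    now = now or datetime.now(timezone.utc)
    return now >= expiry_for(scope, issued_at)
```

Checking expiry server-side on every request means a leaked admin key is only useful for hours, not months.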
Some platforms provide built-in, “inherent” credentials. In Kubernetes, every namespace comes with a default ServiceAccount object that represents the namespace as an authentication principal. In AWS, compute resources can be linked to an IAM role that the service can assume, passing a short-lived session token to the workload.
Whenever you can, use these automatically provided credentials, so you don’t need to manage your own. You can even configure integration between platforms; for example, exchanging a GCP Service Account token for an AWS STS token, so that code on GCP can invoke AWS APIs as a role in the target. You can achieve this with no cost at idle, and without storing any long-lived secret.
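That GCP-to-AWS exchange can be sketched with the real STS API `AssumeRoleWithWebIdentity`, which notably requires no AWS credentials to call. The role ARN, session name and token source here are placeholders you would supply:

```python
def assume_role_request(role_arn: str, session_name: str,
                        web_identity_token: str) -> dict:
    """Request parameters for STS AssumeRoleWithWebIdentity."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": web_identity_token,
        "DurationSeconds": 3600,  # one-hour session; adjust to taste
    }


def aws_credentials_from_gcp(role_arn: str, gcp_id_token: str) -> dict:
    """Exchange a GCP-issued OIDC identity token for temporary AWS credentials."""
    import boto3  # deferred: this STS call is unsigned, so no AWS secret is stored

    sts = boto3.client("sts")
    response = sts.assume_role_with_web_identity(
        **assume_role_request(role_arn, "gcp-workload", gcp_id_token)
    )
    return response["Credentials"]
```

The returned credentials expire on their own, so there is nothing long-lived to rotate or to leak.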
Next month I’ll follow this up with details on technology you can use—focusing on the AWS cloud, because that’s our specialism at The Scale Factory—and outline how those choices can help reassure the customers of your SaaS business that you’re taking security seriously.
This blog is written exclusively by The Scale Factory team. We do not accept external contributions.