Please note that this post, first published over a year ago, may now be out of date.
Today in 2012, AWS announced DynamoDB.
DynamoDB was one of AWS’ first steps into cloud components with more than basic behaviour. Before then the split was, I’d argue, polarised: either services so simple their abbreviations began with “S” (SQS, SNS, S3, and so on) or, at the other extreme, the Turing completeness and complexity of things like EC2 compute.
DynamoDB marked out a new direction into managed services in the cloud, soon becoming AWS’ fastest-growing service. Although it’s part of AWS, you could imagine the same tech stack working out as a standalone, back-end SaaS offering.
In the rest of this article I’m going to run through how DynamoDB got to be the service it is today.
A NoSQL database for utility cloud
DynamoDB, to me, is less of a database and more of a key-value store with extras. That’s not to diminish it. You get durability promises that popular rivals Redis and memcached would struggle to offer. You can query documents by their key, but you can also get secondary indexing without having to build that yourself.
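As a small illustration of that built-in secondary indexing, here’s a sketch in Terraform. The table name, the `Slug` and `Author` attributes, and the index name are all made up for the example, and on-demand billing is an assumption chosen to keep it short.

```hcl
# Hypothetical table: "articles", "Slug" and "Author" are illustrative names.
resource "aws_dynamodb_table" "articles" {
  name         = "articles"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "Slug"

  attribute {
    name = "Slug"
    type = "S"
  }

  # Any attribute used as an index key has to be declared as well.
  attribute {
    name = "Author"
    type = "S"
  }

  # DynamoDB keeps this index up to date; you query it by Author
  # instead of maintaining a second lookup table yourself.
  global_secondary_index {
    name            = "by-author"
    hash_key        = "Author"
    projection_type = "ALL"
  }
}
```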
DynamoDB was far from the first non-SQL database service; it wasn’t even the first NoSQL-type database in the cloud. Even so, AWS’ backing has made it widely known and used. The existing RDS offering was already a really useful way to get some undifferentiated heavy lifting to happen. If you think about it though, a DBMS is usually Turing-complete like compute. You can run SQL or something like it; you might have triggers and stored procedures. Remember too that if you break that, it’s your responsibility.
Here’s a weird detail though. More and more databases with the properties of NoSQL offerings let you talk SQL, or something like it. Couchbase uses N1QL; RavenDB has RQL; MongoDB has a JDBC SQL driver in beta. And for DynamoDB, you get PartiQL. Of course, if you want a database that supports actual SQL syntax, there are plenty in the market.
Performance
There’s not much to say about performance – and that’s largely a good thing.
For a whole bunch of workloads, using DynamoDB means not really having to think about latency - it’s good enough and someone else takes care of it. There’s even autoscaling so not only do you not need to commit to infrastructure, you may well not need to make a capacity plan. I’m not saying you never need to think about the details, but again there’ll be lots of workloads or microservices where actually doing the numbers can get parked until well after a minimum viable service is running with real load.
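To make the autoscaling point concrete, here’s a minimal Terraform sketch of a provisioned-capacity table whose read capacity is handed over to Application Auto Scaling. The table name, the capacity bounds, and the 70% target are assumptions picked for illustration, not recommendations; write capacity would need a matching pair of resources.

```hcl
resource "aws_dynamodb_table" "autoscaled" {
  name           = "demo-autoscaled" # hypothetical name
  billing_mode   = "PROVISIONED"
  read_capacity  = 5
  write_capacity = 5
  hash_key       = "Example"

  attribute {
    name = "Example"
    type = "S"
  }

  # After creation, let autoscaling own read capacity so Terraform
  # doesn't try to reset it on every apply.
  lifecycle {
    ignore_changes = [read_capacity]
  }
}

resource "aws_appautoscaling_target" "read" {
  service_namespace  = "dynamodb"
  resource_id        = "table/${aws_dynamodb_table.autoscaled.name}"
  scalable_dimension = "dynamodb:table:ReadCapacityUnits"
  min_capacity       = 5   # illustrative floor
  max_capacity       = 100 # illustrative ceiling
}

resource "aws_appautoscaling_policy" "read" {
  name               = "demo-read-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.read.service_namespace
  resource_id        = aws_appautoscaling_target.read.resource_id
  scalable_dimension = aws_appautoscaling_target.read.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 70 # aim for roughly 70% read utilisation

    predefined_metric_specification {
      predefined_metric_type = "DynamoDBReadCapacityUtilization"
    }
  }
}
```

If you’d rather not declare capacity at all, on-demand billing (`billing_mode = "PAY_PER_REQUEST"`) is the other route.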
Reliability
With DynamoDB, Amazon Web Services made a big deal about using solid-state storage for the service, at a time when traditional hard disks based on magnetic platters were not just the norm, they were the more reliable option. Early SSDs had a tendency to fail in odd ways, sometimes ways that storage controllers weren’t expecting. I remember having to deal with the risk that not only did one of your storage devices run out of healthy blank storage blocks, but so did the other devices you bought in the same batch.
From day 1, DynamoDB was designed with an architecture for tolerating failure. The obvious comparison is with Amazon RDS, which added cross-zone replication for fault tolerance several years after its initial launch.
How does DynamoDB handle zone failure? As far as I’m concerned, the answer is “really well”. There might be a video somewhere about how it all works, and there are some hints in the design that it might involve a lot of Java code, but the great thing is: you don’t have to care. It’s a managed service and you can confidently turn your Somebody Else’s Problem field up quite high on the resilience side.
Perhaps precisely because it’s so managed, backups weren’t part of the service at launch and they actually arrived quite late. If you wanted to protect yourself against … yourself, then for a while you needed to roll your own thing.
Conveniently, you actually can back up a DynamoDB table now and restore it if you need to. It took until 2017 for AWS to add backup and restore, which tells me two things. First, people were either getting on with their own bolted-on mechanisms to back up tables, or were happy to leave their data in the hands of their cloud provider. Second, there clearly was demand for a managed ability to restore tables to a point in time, and we saw this in our own consulting work too.
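For the point-in-time side of that, switching on continuous backups is a small change in Terraform. This is a sketch with a hypothetical table name, and on-demand billing is assumed to keep it short.

```hcl
resource "aws_dynamodb_table" "with_pitr" {
  name         = "demo-with-pitr" # hypothetical name
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "Example"

  attribute {
    name = "Example"
    type = "S"
  }

  # Continuous backups, so the table can be restored to a chosen
  # moment within the recovery window.
  point_in_time_recovery {
    enabled = true
  }
}
```

A restore creates a new table rather than rewinding the existing one, so your runbook still needs a step for pointing the workload at the restored copy.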
After an entire earlier incarnation (Global Tables v1), DynamoDB also has a nice story for setting up cross-region replication. An example will be the easiest way to show this.
Here’s a single-region DynamoDB table, in Terraform:
resource "aws_dynamodb_table" "blog_article" {
  name     = "demo"
  hash_key = "Example"

  # on-demand billing, so there's no read/write capacity to declare
  billing_mode = "PAY_PER_REQUEST"

  attribute {
    type = "S"
    name = "Example"
  }
}
and here’s how it looks when you add replication:
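resource "aws_dynamodb_table" "blog_article" {
  name         = "demo"
  hash_key     = "Example"
  billing_mode = "PAY_PER_REQUEST"

  # global tables replicate using DynamoDB Streams
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  attribute {
    type = "S"
    name = "Example"
  }

  # in real code, you'd use a "dynamic" block
  replica {
    region_name = "eu-west-2"
  }
}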
It took a while to get here but you can now expand an existing table to be multi-region. It’s managed for you so you can fire up the workload in that other region if you ever need to, or you can run with multiple active regions. As with a lot of NoSQL-type services, the system trades away something (durable, atomic writes) for availability. If two clients in different regions write to DynamoDB at the same time you might lose an update, even if you use transactions. Even with some caveats, I think it’s impressive how little effort it takes to get a resilient, global persistence layer for your workload.
Security
A detail that made DynamoDB a real break from other database services was its integration with AWS API mechanisms. The important bit was that you access DynamoDB as an AWS principal, and you need permissions granted (via IAM) to make that possible.
That design choice has allowed a bunch of extra features to arrive later. For example, you can treat DynamoDB as a microservice and even expose its API to end users - matching each user’s access entitlement to their identity, and all managed by your cloud provider. Back in 2012 you had to write your own security token vending machine to make that work; in 2022, there’s a whole ecosystem of options that mean you don’t have to. You can use your app’s login system and connect that to IAM with OIDC, or SAML, all without running any extra servers.
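One way this can look, sketched in Terraform: an IAM policy that only lets a caller touch items whose partition key matches their own identity. This assumes users sign in through Amazon Cognito identity pools, which is just one of the options; the table name and the `UserId` key are likewise hypothetical.

```hcl
resource "aws_dynamodb_table" "user_data" {
  name         = "user-data" # hypothetical name
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "UserId"

  attribute {
    name = "UserId"
    type = "S"
  }
}

data "aws_iam_policy_document" "own_items_only" {
  statement {
    effect = "Allow"
    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:Query",
    ]
    resources = [aws_dynamodb_table.user_data.arn]

    # Only allow requests where the partition key equals the caller's
    # federated identity. "&{...}" is how this data source writes an
    # IAM policy variable without Terraform trying to interpolate it.
    condition {
      test     = "ForAllValues:StringEquals"
      variable = "dynamodb:LeadingKeys"
      values   = ["&{cognito-identity.amazonaws.com:sub}"]
    }
  }
}

resource "aws_iam_policy" "own_items_only" {
  name   = "dynamodb-own-items-only"
  policy = data.aws_iam_policy_document.own_items_only.json
}
```

The policy would then be attached to whichever role your signed-in users assume.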
Across the same period, plenty of broadly rival technologies - including Azure Cosmos DB, Elasticsearch, and MongoDB - have had their share of security slips. DynamoDB isn’t the only service to have done pretty well, but it’s still a nice track record to have.
DynamoDB’s design doesn’t have the same traps for the unwary that you can have with, say, S3 (you might well have seen the too-frequent stories of firms that have stored data into Amazon S3 with the wrong access controls, and then learned about this the hard way).
Cost
DynamoDB’s prices haven’t gone up over that decade. That’s the good news. There have even been price cuts. You have to design your application and your data with a view to managing costs. That’s the bad news.
If there’s one detail people considering DynamoDB need to know, it’s that a poor choice of partition key can mean you end up paying for capacity you don’t (and can’t) use. In a conventional SQL RDBMS, a full table scan is a poor design choice and maybe a hint to get someone to look at your indexes. In DynamoDB, you pay for that design in your actual AWS bill.
Used right, DynamoDB is a cost-effective way to persist data for your workload. My favourite example is using DynamoDB to lock the state storage for Terraform - it’s probably counted as free tier use, but if not then it’ll be a rounding error on your bill. In 2012, this kind of convenience and negligible cost at idle was fairly new to the market, and had a big impact on cloud solution architecture.
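For anyone who hasn’t set up that Terraform locking, here’s a rough sketch. The bucket name, state key, and table name are placeholders; the one detail that isn’t up to you is the partition key, which the S3 backend expects to be a string attribute called `LockID`.

```hcl
# Created once, typically from a separate bootstrap configuration.
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks" # hypothetical name
  billing_mode = "PAY_PER_REQUEST" # pay only when a lock is taken or released
  hash_key     = "LockID"          # key name required by the S3 backend

  attribute {
    name = "LockID"
    type = "S"
  }
}

# In the configuration whose state you want to lock:
terraform {
  backend "s3" {
    bucket         = "example-terraform-state" # hypothetical bucket
    key            = "demo/terraform.tfstate"  # hypothetical state path
    region         = "eu-west-2"
    dynamodb_table = "terraform-locks"
  }
}
```

The locking options for the S3 backend have changed across Terraform releases, so check the backend documentation for the version you run.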
Development and operations
DynamoDB is serverless, and has been since before “serverless” was part of common IT jargon.
The operations part is there, but it’s a small detail. You might need a runbook for restores (and, if you do, of course I recommend that you test that out). Maybe you have to take capacity into account and have some monitoring to assure you that there’s room for more. You’ll also want access controls that work, though we don’t think those are difficult.
That leaves coding, which stays as a thing you have to think about and actually do.
I’ll admit, sometimes it is a bit of a pain having a cloud-only service. You can develop using a local container version that does a good job of providing the DynamoDB API, and that’s certainly useful to have, but you’re going to want to try changes out in the AWS cloud as well before you unleash them to all your end users. You definitely didn’t have that local development option on day 1 so chalk another item up to progress.
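If you manage tables with Terraform, pointing the AWS provider at a local emulator is mostly a matter of overriding the endpoint. This sketch assumes the emulator is listening on `http://localhost:8000`; the credentials are dummies and the region is arbitrary.

```hcl
provider "aws" {
  region     = "eu-west-2"
  access_key = "local" # dummy value; no real credentials are involved
  secret_key = "local"

  # Skip the checks that would otherwise call real AWS endpoints.
  skip_credentials_validation = true
  skip_requesting_account_id  = true
  skip_metadata_api_check     = true

  endpoints {
    dynamodb = "http://localhost:8000" # assumed local emulator address
  }
}
```

Keeping this in a separate configuration or workspace helps avoid applying it against a real account by accident.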
That decade since launch has seen AWS add event streams that you can subscribe to. If you were missing the idea of stored procedures when DynamoDB first arrived, maybe you were one of the people who really welcomed the ability to react to writes asynchronously. It’s not something people often use. It is reassuring to have the option there if you come to need it.
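As a sketch of what subscribing can look like, here’s a table with a stream enabled and an AWS Lambda function wired up to receive its change records. Using Lambda as the consumer is an assumption for the example, and the function itself is taken as already existing and passed in as a variable.

```hcl
variable "handler_function_arn" {
  type        = string
  description = "ARN of an existing Lambda function (hypothetical)"
}

resource "aws_dynamodb_table" "with_stream" {
  name         = "demo-with-stream" # hypothetical name
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "Example"

  # Emit a change record for every write, carrying the item both
  # before and after the change.
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  attribute {
    name = "Example"
    type = "S"
  }
}

# Deliver stream records to the function in batches, asynchronously
# with respect to the original write.
resource "aws_lambda_event_source_mapping" "on_write" {
  event_source_arn  = aws_dynamodb_table.with_stream.stream_arn
  function_name     = var.handler_function_arn
  starting_position = "LATEST"
}
```

The function’s execution role also needs permission to read from the stream, which is left out here.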
A NoSQL database for today
If you’re doing development using a remote, cloud-hosted database option, I think it’s straightforward to set up DynamoDB using tools like Terraform or CloudFormation. We get lots of questions from customers about the AWS cloud, and we hardly ever get asked for advice on using DynamoDB. I think that’s a real strength, to be honest: the service does one thing, it does it well, and it’s been doing so for a decade.
I haven’t talked much about the internals. If you want to know how it started out, you can read Dynamo: Amazon’s Highly Available Key-value Store. What I really like, as a cloud consultant, is that you really don’t need to.
Happy birthday, DynamoDB! 🎂
This blog is written exclusively by The Scale Factory team. We do not accept external contributions.