You may already know that people can be divided into two groups: those who back up their files, and those who will start doing so in the future. But, as is usually the case in real life, there is more to it than meets the eye. There are extra labels that make this simple binary classification harder: you can back up locally, or in the cloud.
OK, so you might ask whether you really need the cloud here. Let's imagine that right now you're backing up onto a little USB stick: it's convenient and portable. In fact, a USB stick is so easy to carry that you could pop it in your pocket - and maybe lose it. It's a similar story if you're backing up to a portable hard drive, or even to a PC in the office: any of those can get stolen or broken. Let's hope that lost USB drive contained only your personal data, not corporate records. The latter would mean you may have lost sensitive personal data, commercially confidential information and trade secrets - and that means you're risking a substantial fine. So let me ask a question: why not make this responsibility somebody else's problem?
Five other reasons to migrate data
The cloud gives you that Somebody Else's Problem field. With managed backups, you don't need to take care of storing them yourself. But backups and security aren't the only reasons to look into the cloud.
1. Opportunity cost
Maintaining and managing your own infrastructure is a significant cost: it requires a specialised team of administrators, as well as security specialists. Without them, there's a risk that your developers will be busy figuring out why a recent system update caused unexpected application behaviour instead of writing code. You may also run into situations where your OS or a dependency is no longer supported, or a legacy system doesn't integrate easily with newer components - especially painful for SaaS businesses.
2. Archival
Nobody wants to store and maintain old clutter, but sometimes simply getting rid of it is not an option; for example, you're a regulated entity and need to hold records for compliance. Does that mean you have to manage and pay for the underlying resources in the same way you do for frequently accessed data? Cloud providers usually solve that problem by offering a range of storage tiers for infrequently accessed data (e.g. Amazon S3 Standard-Infrequent Access, Amazon S3 Glacier). Storage itself costs very little; the trade-off is the cost and time of retrieval.
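As a minimal sketch of how that looks in practice, here's how you might archive an object straight into a colder tier with boto3 (the bucket and key names here are made up for illustration):

```python
import boto3

s3 = boto3.client("s3")

# Upload a compliance record straight into an archive tier.
# GLACIER trades very cheap storage for slower, costed retrieval.
s3.put_object(
    Bucket="example-compliance-archive",   # hypothetical bucket
    Key="records/2021/audit-log.csv",
    Body=open("audit-log.csv", "rb"),
    StorageClass="GLACIER",
)

# Retrieval is a separate, asynchronous step: request a temporary
# restore and wait (hours, for the cheaper tiers) before reading.
s3.restore_object(
    Bucket="example-compliance-archive",
    Key="records/2021/audit-log.csv",
    RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Bulk"}},
)
```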
3. Lifecycle management
You can also rely on the cloud to help you categorise your data, by analysing how frequently different parts of it are accessed and then deciding which tier suits each use case. Alternatively, Amazon S3 Intelligent-Tiering can decide for you. What's more, with Amazon S3 Lifecycle you can define rules describing which data should be moved to a different tier or archived, when, and under which conditions. You no longer need your colleagues to do that manually or to run scheduled jobs.
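As an illustration, here's roughly what such a rule looks like when set with boto3; the bucket name and prefix are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# One rule: logs move to Standard-IA after 30 days, to Glacier
# after 90, and are deleted after a year. No cron jobs required.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-app-data",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-then-expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```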
4. Additional features
Because in the cloud you don't need a team of system administration and security specialists to implement non-standard features, you can simply pick them from the menu. Enhanced security with encryption? With Amazon S3 you can have encryption by default, or you can use AWS KMS, backed by hardware security modules; if you need it, client-side encryption is available too. Or maybe you need protection against data deletion (for example, for regulatory requirements)? Not a problem any more with MFA Delete or Amazon S3 Versioning enabled. Do you need audit logging and fine-grained permissions? The cloud makes that a lot simpler with Amazon CloudWatch, AWS CloudTrail and AWS IAM.
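As a sketch, two of those menu items - default encryption and versioning - are each a single API call with boto3. The bucket name is a placeholder, and note that enabling MFA Delete on top of versioning additionally requires the root account's MFA device:

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-app-data"  # hypothetical bucket

# Encrypt every new object by default with an AWS KMS key.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
        ]
    },
)

# Keep every version of every object, so a deletion or overwrite
# can always be rolled back.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)
```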
5. Easier access
The cloud lets your team share resources easily and access files from multiple computers, reducing latency at the same time (for example, with Amazon EFS). You no longer need to wait forever for your IT administration team to give a newly onboarded person access to the local network storage (which may itself be slow to connect to, and unpredictable when uploading bigger files), or worry about overwriting somebody else's work there.
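To give one concrete, if simplified, example of easier sharing: with Amazon S3 you can hand a colleague (or an external party) time-limited access to a single file via a presigned URL, with no network shares or new accounts involved. The bucket and key here are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Generate a link that lets anyone holding it download this one
# object for the next hour; after that it simply stops working.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-team-files", "Key": "reports/q3.pdf"},
    ExpiresIn=3600,  # seconds
)
print(url)
```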
Facilitate your migration with AWS products
There is a variety of AWS products at your disposal to help you with data migration. This section is meant to help you understand which one to choose, and when. First of all, let's begin with what you're aiming to migrate.
1. Migrating databases
Migrating a database can be a complex task, especially when you're running a SaaS business and your customers expect minimal service downtime. You have three main options here: manually dump and restore the database (which - spoiler alert - will involve some downtime, and probably unhappy customers if it's down for longer than you hoped), use a managed service, or build a solution of your own (which costs time and money). Taking care of the process yourself also means dealing with potential schema conversion and data validation afterwards. Alternatively, you can use a managed tool with those features built in, such as AWS Database Migration Service (AWS DMS). This AWS managed service lets you specify source and destination databases (the source can be either on-premises or in the cloud) and the instances used for data replication, and configure replication tasks. Using a proxy replication instance takes the load off your original database, which is certainly a good thing. Furthermore, you can run a pre-migration assessment to find potential issues, change the schema using the AWS Schema Conversion Tool, and validate the data after it has been migrated. Also, for less technical people, you can perform the whole process via the AWS console, minimising use of the terminal. Another advantage of AWS DMS over a manual approach is the option of continuous replication with CDC (change data capture), because sometimes your migration doesn't have to be a one-off task.
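The console covers the whole workflow, but for a flavour of what's underneath, here's a hedged sketch of creating a full-load-plus-CDC replication task with boto3. It assumes you've already created the source endpoint, target endpoint and replication instance, and all the ARNs are placeholders:

```python
import json
import boto3

dms = boto3.client("dms")

# Replicate every table in the "public" schema; the table-mapping
# rules are a JSON document, kept minimal here.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-public-schema",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="example-migration-task",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",  # placeholder
    # Full initial copy, then keep applying changes (CDC) so you
    # can cut over with minimal downtime.
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
```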
2. Migrating unstructured data
If you have other types of data, your first choice will probably be to transfer it over the internet to Amazon S3. But what if the data is too big, the internet connection too slow, or the use case non-standard? Then it's worth taking a look at AWS DataSync, AWS Storage Gateway and the AWS Snow Family.
- AWS DataSync can help with moving large amounts of data over a network connection (whether from on-premises, another cloud, or AWS itself), without staging it on any intermediate hardware. All you need to do is install a DataSync agent (unless you're migrating from AWS) and define tasks; the product will optimise the upload for you (e.g. processing small files in batches). You can choose Amazon S3 (including its Glacier storage classes), Amazon EFS or Amazon FSx as a destination, and your tasks can run on a regular schedule.
- AWS Storage Gateway serves as a bridge between on-premises infrastructure and Amazon S3 (no other destinations are available). You may want to use it if you need disaster recovery, caching, or backup and restore. It's worth underlining that, being a bridge, it doesn't fully migrate your data. Depending on your needs, you have File, Volume, Tape and FSx File Gateway at your disposal.
- The AWS Snow Family gives you a choice of portable physical devices on demand, meant to collect data offline (or process it remotely, "at the edge"). While it might feel like an old-school solution, it's there for when your data is too big to migrate over a network connection. You order a device from AWS (depending on your needs and the size of the data, it can be a Snowcone, a Snowball Edge or a Snowmobile), copy your data onto it and ship it back to AWS, who will upload it to S3 for you. A rule of thumb: if the transfer would take more than a week, use the Snow Family. For example, migrating 1 PB of data using the full capacity of a 100 Mbps connection takes around two and a half years - or longer if you have to share that link (a quick back-of-the-envelope check follows below).
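That estimate is simple arithmetic, and you can reproduce it (or plug in your own numbers) with a few lines of Python:

```python
# Back-of-the-envelope transfer time: how long to push 1 PB
# through a fully utilised 100 Mbps link?
data_bits = 1e15 * 8          # 1 PB expressed in bits
link_bits_per_second = 100e6  # 100 Mbps

seconds = data_bits / link_bits_per_second
days = seconds / (60 * 60 * 24)
print(f"{days:.0f} days (~{days / 365:.1f} years)")
# -> 926 days (~2.5 years), before any link sharing or overhead
```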
3. Migrating servers
When you want to re-host your infrastructure following the "lift and shift" approach (you can read more on the strategies of cloud migration here) - without cloud optimisations, just as it is - you may want to migrate your servers using AWS VM Import/Export or AWS Server Migration Service. The latter is a fully managed service that converts your physical (or other-cloud-based) machines to run natively on AWS with minimal business disruption and downtime. It continuously replicates any application or database on your primary machine using the AWS Replication Agent.
Tip: if you're not happy with your existing internet connection, but don't want to use AWS Snow devices, you might be interested in AWS Direct Connect: a dedicated private connection from your on-premises environment to your AWS VPC. Establishing such a connection can be a costly and time-consuming process, though. It's worth considering when you need to tick a compliance box, or require privacy and a stable connection - and when you'll keep using the connection afterwards (i.e. your migration is not a one-off task).
It's worth noting that after moving to the cloud you will still need cloud engineers on your team. However, instead of maintaining and supporting infrastructure (or figuring out why something stopped working), or handling security, most of your team can focus on your comparative advantage and devote more time to what actually contributes to the core business.
Our team knows how to manage the risks around moving data to the cloud. We also know how risky it can be if you don’t have a cloud copy of your critical data. Book a free chat to find out how we can help.