Scaling and operationalising AI/ML

Mike Mead
30 November, 2023

Please note that this post, first published over a year ago, may now be out of date.

Artificial Intelligence (AI) and Machine Learning (ML) have moved well beyond being buzzwords for the tech elite. Investors are showing heightened interest in AI/ML plans and companies are using AI and ML to gain a competitive edge. However, despite the excitement, this is a field of IT that remains relatively immature from an engineering perspective. Many companies are still not sure how to approach the implementation challenges.

In this blog post, I’ll explore how businesses can scale and operationalise their AI and ML usage effectively.

Photo: an engineer at a desk, working on computer code (by This is Engineering)

Assess your needs and potential

Your AI/ML journey usually starts with a use case. Underlying it is a story you tell, so that AI/ML isn’t simply a slogan but genuinely part of delivering end-user value.

Good models need good-quality data. As the famous saying goes: “garbage in, garbage out”. All the power of AI is useless if you don’t have an extensive, high-quality dataset to train your model on. Your use case might mean training your own model, or a generic one might be good enough. Assess where your competitive advantage lies and whether a custom model is part of it.

There are some traps you might encounter in your commercial context. Overestimating the benefits of investing in AI engineers and infrastructure is a risk every SaaS business wants to avoid. It’s good if you spot that overestimation in time, but it’s also bad if you’re landing sale after sale and every one is to a client business that doesn’t really need your service. No matter how good the engineering is, what you offer has to add value and be seen to add value; otherwise your churn rate is at risk. Essentially, there’s no point in implementing AI just for the sake of using the buzzword, and that goes for your customers too.

To check that you have a valid aim in mind, define a clear AI and ML roadmap that outlines goals, explains the value delivered, and has enough detail to inform technology selection. But don’t stop there. The roadmap needs to extend past the point where things go into production, far enough ahead that you’re looking at how to deliver the same thing better and more effectively.

Product / market fit

Many SaaS businesses are prioritising investment in AI to get ahead of the competition. It might seem obvious, but the next trap I’ll call out is developing a complex solution that doesn’t address customers’ interests effectively. There’s no single best way to avoid this trap, but one tactic that is almost always worth doing is to identify your competitors and assess what they already offer. You may uncover some interesting market gaps. If it’s right for your context, work with an AI/ML expert, or consider an acquisition, to fast-track your product’s AI capabilities and offerings.

You also need to review your existing product roadmap with your customers’ pain points in mind to see if an AI-driven solution can improve the customer experience. Take a customer-centric approach so that you focus on the customers’ needs rather than on the technologies. To help with this, adopt a “fail fast” philosophy and build a small proof of concept.

The bottom line is to ask yourself if integrating AI and ML in your services is going to make them better without costly complexity. If it doesn’t, don’t worry about day 2 operations; fix that first.

Engineering efficiency

You got this far. You’ve built, or you’re planning, a machine learning / AI element for your services. Whether your competitive advantage comes from better models, lower latency, or from anywhere else, you want to keep the lead you have. And you probably want your costs to grow more slowly than your customer count.

For this, I’ve got three points to make:

1. Automate the AI

Absolutely consider using automation, including ML technologies, to reduce the amount of manual engineering effort in activities such as data labelling, model feedback, or hyperparameter tuning. You’re probably doing this already.

2. Deliver sooner

Speed up software development. AI-powered code generation tools can help developers generate code snippets, templates, and boilerplate more quickly and accurately. You need to watch out for automation adding code errors no human would make, and you should make sure using ML doesn’t put your intellectual property at risk.

This is a rapidly evolving space, and developer tools like GitHub Copilot and Amazon CodeWhisperer are getting a lot of traction among developers. However, bear in mind that they are tools, not a panacea for your engineering woes. They can speed up writing boilerplate code and debugging, but, like any generative AI tool, their output still needs to be reviewed by a professional developer, as they can introduce errors or even leak secrets.

3. Reduce toil

Use AI for operations. Whatever you’re operating - which could even be a legacy-style workload on virtual machines - all the operational effort you put into managing that is a cost.

Here’s one example: instead of automatic scaling that responds to demand (and needs spare capacity to cover the lag), your business might make serious savings by using ML to predict the right replica count. It’s usually not relevant for smaller firms, but as your overall cloud costs climb, the return on investing in operations grows too.
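To make the replica-count idea concrete, here is a minimal Python sketch. It assumes you have a history of requests per second sampled at a fixed interval and a measured throughput per replica. A deliberately naive linear-trend forecast stands in for the ML model, and the names `forecast_next` and `replicas_needed` are hypothetical. Managed options also exist, such as predictive scaling in Amazon EC2 Auto Scaling.

```python
import math


def forecast_next(history: list[float], window: int = 3) -> float:
    """Forecast the next sample as the last value plus the mean recent change.

    A naive stand-in for a trained forecasting model (an assumption, not a
    recommendation). Assumes `history` is non-empty and evenly spaced in time.
    """
    if len(history) < 2:
        return history[-1]
    recent = history[-(window + 1):]
    deltas = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return recent[-1] + sum(deltas) / len(deltas)


def replicas_needed(forecast_rps: float, rps_per_replica: float,
                    headroom: float = 0.1, min_replicas: int = 1) -> int:
    """Replicas required to serve the forecast load plus a safety margin."""
    target = forecast_rps * (1 + headroom) / rps_per_replica
    return max(min_replicas, math.ceil(target))


# Example: steadily rising traffic, assuming each replica handles 50 requests/s.
history = [100.0, 120.0, 140.0, 160.0]
next_rps = forecast_next(history)        # 180.0
print(replicas_needed(next_rps, 50.0))   # 4
```

You would feed the result to whatever scaler you already run, so capacity is in place before the demand arrives rather than after it. Swapping the naive forecast for a trained model leaves the replica arithmetic unchanged.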

Find what fits for you

Some teams are all in on Kubernetes. Maybe you use Slurm to manage training, or plain SageMaker for launch-and-forget inference. On this, I don’t have an opinion, and if I did, it wouldn’t be right for everyone anyway. If you’ve got skills in Slurm, and it works well for the team that uses it, build on them: maybe with the brand-new SageMaker HyperPod, or with something more mature like AWS ParallelCluster. If the tech you have already works, why fix it?

Switching to something newer and shinier carries costs. ML is such a hot topic that your cloud provider probably has a service that covers what you need to do and scales with your firm’s growth. Even so, your starting point should be to stick with the technology the team already knows, and make sure you’re managing it effectively. That means DevOps good practices, the right amount of documentation, and designed-in security. Consider building platform services if you find that multiple teams keep solving the same problem.

What we can do for you, as an award-winning SaaS consultancy, is validate your approach. If you’re facing a project complex enough that doing it wrong could move the needle on your balance sheet, we can run a design workshop to review and derisk it. Or we can come in earlier, uncovering your needs, linking those to outcomes, and shaping the architecture of your solution.

Mature your AI/ML approach

Once you’ve determined a need and found a use case for AI delivering business value to the company, you build it. After you deploy a successful proof of concept, that’s an ideal time to think about how you’ll mature the operations side. You can even do that by leveraging AI/ML tools to automate operational tasks, as well as using inference as part of your offering.

OK, so you’re post-MVP and you want to improve. There’s a term for that approach: MLOps. Around 15 years ago there was a mini revolution in defining operational practices as code and managing it all in source control. Over the last 5 years, there’s been big growth in the equivalent concepts for model design, training, and inference. MLOps is about streamlining and automating the ML development process, from model training to deployment, making use of what the industry already knows about CI/CD automation.
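One concrete MLOps practice borrowed from CI/CD is an automated promotion gate. This is a pipeline step that only lets a newly trained model replace the production one if it clears agreed thresholds. The sketch below assumes both models have already been evaluated on the same held-out dataset. The metric names, thresholds, and the `EvalResult` schema are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Offline evaluation metrics for one model version (hypothetical schema)."""
    model_version: str
    accuracy: float
    p95_latency_ms: float


def should_promote(candidate: EvalResult, production: EvalResult,
                   min_gain: float = 0.005, max_latency_ms: float = 200.0) -> bool:
    """Promote only if the candidate meets the latency budget and beats production."""
    if candidate.p95_latency_ms > max_latency_ms:
        return False
    return candidate.accuracy >= production.accuracy + min_gain


def gate_exit_code(candidate: EvalResult, production: EvalResult) -> int:
    """Exit code for a CI step: 0 lets the deploy stage run, 1 fails the job."""
    return 0 if should_promote(candidate, production) else 1


# In a real pipeline these values would come from the evaluation step's output.
production = EvalResult("v1", accuracy=0.90, p95_latency_ms=110.0)
candidate = EvalResult("v2", accuracy=0.91, p95_latency_ms=120.0)
print(gate_exit_code(candidate, production))
```

In a CI job, you would pass that value to `sys.exit`, so a model that doesn’t clear the bar never reaches the deploy stage, and the decision is recorded in the pipeline history like any other build result.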

When properly implemented, MLOps doesn’t just speed up deployment; it also brings big benefits for resilience, information security, and disaster recovery. However, MLOps does require a team with engineering skills across machine learning, software engineering, IT infrastructure, and data engineering. Don’t expect to find all of that in one person; we wouldn’t.

What this means: scaling your business might mean thinking about the people aspect even more than the technology.

Compliance and inference

There’s a specific trap around using data to drive decisions. You’ve got a model and it works great. A customer uses it and they’re happy with the outcome. But, one day, that software you’re running supports a decision by your customer that affects a person, and that person uses the GDPR right to have the decision explained.

It’s not just about being able to provide that explanation: it’s about providing the mechanisms, and maybe the APIs, to support that explainability at scale.
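To support explanations at scale, each decision your service returns can carry its own explanation, which an API can then store and serve on request. The sketch below assumes a linear scoring model, where per-feature contributions relative to a reference “baseline” input are exact. Non-linear models need a dedicated attribution method such as SHAP, or a managed option like Amazon SageMaker Clarify. The function name, features, and baseline values are hypothetical.

```python
def explain_linear_decision(weights: dict[str, float], bias: float,
                            features: dict[str, float],
                            baseline: dict[str, float]) -> dict:
    """Per-decision explanation for a linear scoring model.

    Each feature's contribution is weight * (value - baseline value), so the
    contributions sum exactly to (score - baseline score).
    """
    score = bias + sum(weights[name] * features[name] for name in weights)
    contributions = {
        name: weights[name] * (features[name] - baseline[name]) for name in weights
    }
    # Largest effects first, so the response leads with what mattered most.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return {"score": score, "contributions": ranked}


explanation = explain_linear_decision(
    weights={"income": 0.5, "debt": -1.0},
    bias=0.0,
    features={"income": 4.0, "debt": 3.0},
    baseline={"income": 2.0, "debt": 0.0},
)
print(explanation)
# {'score': -1.0, 'contributions': [('debt', -3.0), ('income', 1.0)]}
```

Returning the explanation alongside the score, rather than reconstructing it later, means you can answer a request about a past decision even after the model has been retrained.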

Address legal, security, and data concerns

Despite the promise, there are also concerns and barriers to AI and ML adoption, often related to compliance, security, data quality and transparency, and data privacy. For example, you need to stay up to date with regulations on data privacy and AI ethics, as we will probably see new government policies in this area in the near future. You can mitigate this by adopting a robust compliance framework and updating it regularly.

In terms of security, implement controls that protect sensitive data and AI models from malicious attacks. This often boils down to securing your own ML infrastructure, or using managed ML services such as Amazon SageMaker.

As far as data is concerned, establish data quality standards and strive for model transparency and explainability to build trust with stakeholders and customers. Don’t forget that your model is only as good as your dataset. Also, be mindful of privacy concerns, and ensure that any data you use is anonymised and secure. Managed services can help here too: have a look at Amazon Macie to learn how to automatically detect sensitive data.
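Sometimes a model needs to link records belonging to the same person without knowing who that person is. One common technique for this is pseudonymisation with a keyed hash. This is a minimal sketch that assumes the key is held in a secrets store, outside the dataset. Under GDPR, pseudonymised data still counts as personal data, so this reduces risk rather than achieving full anonymisation. The field names and key are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(value: str, key: bytes) -> str:
    """Replace a value with a keyed SHA-256 digest (HMAC).

    The same input and key always give the same token, so joins still work.
    Without the key, nobody can recompute tokens to match them against
    guessed values such as a list of email addresses.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, sensitive_fields: set[str], key: bytes) -> dict:
    """Return a copy of `record` with the named fields pseudonymised."""
    return {
        field: pseudonymise(str(value), key) if field in sensitive_fields else value
        for field, value in record.items()
    }


# Hypothetical key for illustration; in practice load it from a secrets manager.
KEY = b"example-key-do-not-use"
row = {"email": "jane@example.com", "plan": "pro", "seats": 12}
print(pseudonymise_record(row, {"email"}, KEY))
```

Using a keyed HMAC rather than a plain hash matters here. An unkeyed SHA-256 of an email address can be reversed by anyone who hashes a list of candidate addresses and compares the results.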

In a nutshell, don’t ignore these concerns, as they may be the main pushback from your stakeholders against adopting AI/ML technologies. Address their concerns and mitigate any shortcomings so that you can convince your stakeholders to give you the go-ahead (and the budget that comes with it) to integrate AI/ML technologies into your products.

Conclusion

With investors seeking concrete AI/ML plans, firms across the SaaS sector making immediate AI investments, and AI-driven tools that enhance engineering efficiency, the time to grow your adoption of AI and ML is pretty much “right now” for any SaaS business. A strategy that demonstrates the value of AI and ML, and covers regulatory compliance and data privacy, is essential. That’s obvious enough. What often gets overlooked is that having a well-scoped approach to managing that AI/ML solution at scale is even more important.

We’ve been an AWS SaaS Services Competency Partner since 2020. If you’re building SaaS on AWS, why not book a free health check to find out what the SaaS SI Partner of the Year can do for you?

Tags:

AI/ML Infrastructure as Code Machine Learning


This blog is written exclusively by The Scale Factory team. We do not accept external contributions.