re:Invent is back! Every year, re:Invent brings a bombardment of new services, features, and announcements, and unsurprisingly this year came filled with all kinds of news in the world of AI, ML, and data storage.
We’ve filtered through these to find the ones that are most interesting and worth a deeper look. You may be tired of hearing about AI, but there are exciting developments happening in this space.
Amazon SageMaker
If you weren’t aware, SageMaker is Amazon’s cloud machine-learning platform. The first thing you’ll notice is that in the AWS console, the current SageMaker has been rebranded as Amazon SageMaker AI. Why? Good question: it’s to support their announcement of the next generation of Amazon SageMaker, which aims to bring together even more of AWS’ AI/ML offerings.
SageMaker Unified Studio
Amazon have announced SageMaker Unified Studio - a development environment that brings together multiple existing AWS visual tools (Athena, Glue, Redshift, etc.). SageMaker Unified Studio will let you query data from multiple sources, visually build ETL pipelines, and then develop models. The Amazon Bedrock IDE will also be integrated with SageMaker Unified Studio for building generative AI applications. Whilst normal pricing applies for any services you use within it, there is no separate cost for SageMaker Unified Studio itself.
Why is this interesting? Well, it seems AWS are really lowering the barrier to entry for building complex AI/ML applications. With this, data scientists can focus on developing models in a single IDE instead of jumping between different pages in the AWS console. I’m especially interested in how it claims to bring multiple data sources together, as this can always be a bit of a pain point.
SageMaker Lakehouse
Whether you have a data lake or a data pond (or even some other body of water), managing and querying your data can be difficult. SageMaker Lakehouse aims to simplify this by managing all of your data (stored in S3 or Amazon Redshift) in one place, ready for use in developing your AI/ML applications. In addition, with the new AWS Glue updates, replicating data into S3 should be even easier. For example, AWS have published a blog showing a zero-ETL integration from DynamoDB to the lakehouse via S3.
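As a rough illustration, once your tables are surfaced through the lakehouse you can query them with standard SQL. Here’s a minimal boto3 sketch using Athena; the database, table, and results bucket names are all placeholders:

```python
import boto3

athena = boto3.client("athena")

# Run a standard SQL query against a lakehouse-managed table.
# "sales_lakehouse", "daily_orders", and the results bucket are
# hypothetical names - substitute your own catalog objects.
response = athena.start_query_execution(
    QueryString="SELECT order_id, total FROM daily_orders LIMIT 10",
    QueryExecutionContext={"Database": "sales_lakehouse"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```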
Scale to Zero in SageMaker Inference Endpoints
Your inference endpoints can now scale to zero instances. Previously, if your endpoint had long periods of inactivity, you either had to pay for idle instances, or use serverless endpoints (which come with their own limitations). With this announcement, you can scale GPU-backed endpoints down to zero either on a schedule or in response to traffic demands, significantly reducing costs.
Just make sure this supports your use case, as scaling back up from zero always requires some waiting time.
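Under the hood, this is configured through Application Auto Scaling by allowing the endpoint variant a minimum instance count of zero. A minimal sketch (the endpoint and variant names are placeholders, and you’d still need a scaling policy to drive the scale-down):

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the endpoint variant as a scalable target with a minimum
# of zero instances. "my-endpoint" and "AllTraffic" are placeholder
# names for your own endpoint and production variant.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/my-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,  # allow scale to zero during idle periods
    MaxCapacity=2,
)
```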
Container Caching and Fast Model Loader
AWS have also announced new features in SageMaker Inference to scale and load large models much more quickly via container caching and model weight streaming. With cached container images, SageMaker no longer needs to download the image when scaling up, and the Fast Model Loader skips several steps when loading a model onto a compute instance, instead streaming model weights directly onto the GPUs. The only slight downside is that this doesn’t support custom inference images, but regardless it’s another nice improvement for deploying large models, and it ties in nicely with the scale-to-zero improvement to give you more elastic cloud deployments.
Amazon Bedrock
Amazon Nova Foundation Models
Amazon have rolled out Amazon Nova, a family of multilingual, multimodal foundation models ready to be fine-tuned for your use case. We haven’t yet tried these ourselves, but compared to competitors the pricing is pretty incredible: “at least 75 percent less expensive than the best-performing models in their respective intelligence classes in Amazon Bedrock”. This is certainly something to keep a close eye on.
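We haven’t benchmarked Nova, but invoking it looks much like any other Bedrock model. A minimal sketch using the Converse API via boto3 (the region and prompt are illustrative; the model ID shown is Nova Lite):

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Send a single-turn prompt to Amazon Nova Lite via the Converse API.
response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "Summarise re:Invent in one sentence."}]},
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```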
Intelligent Prompt Routing and Prompt Caching
Developing and running complex models can be expensive. Fortunately, AWS’ preview announcement of Intelligent Prompt Routing and prompt caching could make a significant dent in those costs.
With prompt routing, you can route prompts between models in either Anthropic’s Claude family or Meta’s Llama family, allowing you to optimise for quality and cost. AWS suggest Intelligent Prompt Routing can reduce costs by up to 30%.
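From the API side, our understanding is that you invoke a prompt router exactly as you would a model, passing the router’s ARN as the model ID. A hedged sketch (the ARN below is illustrative; check your router’s actual ARN in the Bedrock console):

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Pass a prompt router ARN where you would normally pass a model ID.
# The ARN and account ID below are illustrative placeholders.
router_arn = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "default-prompt-router/anthropic.claude:1"
)

response = bedrock.converse(
    modelId=router_arn,
    messages=[{"role": "user", "content": [{"text": "What is 2 + 2?"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```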
In addition to this, you will be able to cache frequently used context in prompts across multiple model invocations. In this case, AWS suggest prompt caching in Amazon Bedrock can reduce costs by up to 90% and latency by up to 85% for supported models.
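Prompt caching works by marking a cache checkpoint in your request, so everything before it can be reused across invocations. A sketch assuming the Converse API’s cachePoint block (the model ID and content are illustrative, and only certain models support caching during the preview):

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# A large, reusable context block followed by a cache checkpoint.
# Everything before the cachePoint can be served from cache on
# subsequent invocations.
system = [
    {"text": "You are a support assistant. <long product manual here>"},
    {"cachePoint": {"type": "default"}},
]

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",  # illustrative model
    system=system,
    messages=[{"role": "user", "content": [{"text": "How do I reset the device?"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```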
Automated Reasoning Checks
Amazon Bedrock Guardrails now supports Automated Reasoning checks in preview, which help to detect hallucinations in your LLM responses where accuracy is absolutely critical. This, and the existing guardrails available in Bedrock, are vital for responsible and safe LLM use, so it’s great to see AWS become the first major cloud provider to offer this.
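These checks run as part of a guardrail that you attach to your model calls. As a minimal sketch, here’s how a guardrail is applied through the Converse API (the guardrail ID and version are placeholders; the Automated Reasoning policy itself would be configured on the guardrail beforehand):

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Attach an existing guardrail (placeholder ID and version) so its
# configured policies are applied to this request and response.
response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",
    messages=[{"role": "user", "content": [{"text": "What is our refund policy?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-example-id",  # placeholder
        "guardrailVersion": "1",
    },
)
```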
In addition, shortly after re:Invent Amazon announced significant price reductions for content filtering and denied topics in Amazon Bedrock Guardrails.
Amazon Q
Despite a somewhat shaky start to its life, Amazon Q (Amazon’s generative AI assistant) has continued to make significant improvements. A few key things of note:
- Amazon Q Developer can automatically generate unit tests. Nobody likes writing unit tests, but they’re very important!
- Amazon Q Developer can automatically generate documentation from your source code, taking care of another important task that most developers try to avoid.
- Amazon Q Developer can automate the first round of code reviews. We would still highly recommend keeping human reviewers in your process.
- Amazon Q Business can now combine information from multiple sources to answer questions via its Amazon QuickSight integration.
Amazon S3
Whilst S3 is not specifically an AI or ML tool, it’s highly likely your data will need to live there. The new Amazon S3 Tables offering gives you an interesting way to store tabular data in Apache Iceberg format, which can then be queried from and integrated with multiple other services. If you were previously storing data in Parquet format, and especially if you were overwriting a few extra rows each day, Iceberg gives you a much lower object read cost.
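For a feel of the API, here’s a hedged boto3 sketch creating a table bucket and an Iceberg table with the new s3tables client (the bucket, namespace, and table names are placeholders):

```python
import boto3

s3tables = boto3.client("s3tables")

# Create a table bucket to hold Iceberg tables ("analytics" is a placeholder).
bucket = s3tables.create_table_bucket(name="analytics")
bucket_arn = bucket["arn"]

# Namespaces group related tables within the bucket.
s3tables.create_namespace(tableBucketARN=bucket_arn, namespace=["sales"])

# Create an Iceberg-format table, ready to query from other services.
s3tables.create_table(
    tableBucketARN=bucket_arn,
    namespace="sales",
    name="daily_orders",
    format="ICEBERG",
)
```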
With these innovations across SageMaker, Bedrock, Amazon Q, and S3, AWS continues to improve their offering in the world of AI and machine learning, making it more accessible and cost-effective for businesses.
Ready to start building your GenAI supported applications on AWS? Book a free chat to find out how we can help.
This blog is written exclusively by The Scale Factory team. We do not accept external contributions.