How to build a scalable recommendation engine on Azure

Scaling recommendation systems from model to production on Azure
10 April 2026 by Dark Light - Data & BI consultancy

Why a New Platform Was Needed

The organization operated across multiple digital brands, each relying on personalized content to drive engagement. While recommendation models were already in use, the setup had grown fragmented:

  • Separate configurations per brand
  • Limited scalability
  • Manual deployment processes
  • Insufficient monitoring and feedback integration
  • Increasing pressure to deliver real-time relevance

The business challenge was clear: personalization was becoming critical to performance, but the underlying infrastructure wasn’t built for long-term scale.

Rather than optimizing individual models, the decision was made to rethink the foundation.

The Core Idea: Platform First

The initiative focused on one principle: build a reusable, scalable platform that separates experimentation from production stability.

This meant designing an architecture that could:

  • Combine batch training with real-time serving
  • Standardize deployment and versioning
  • Scale across brands without duplicating infrastructure
  • Ensure reliability in live environments
  • Create feedback loops to continuously improve model output

By shifting from “a model project” to “a platform strategy,” the organization ensured that recommendation capabilities could evolve without rebuilding core infrastructure every time.
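
As an illustration of what "scaling across brands without duplicating infrastructure" can look like in code, the sketch below keeps one shared set of defaults and lets each brand state only what differs. It is a minimal sketch, not the project's actual configuration: the brand names, field names and values (`news_site`, `top_k`, `retrain_schedule`, and so on) are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BrandConfig:
    """Settings one brand needs from the shared platform (all fields hypothetical)."""
    brand: str
    model_name: str = "item_cooccurrence"
    top_k: int = 10
    retrain_schedule: str = "daily"


# One set of defaults for every brand on the platform.
DEFAULTS = BrandConfig(brand="default")

# Each brand lists only the values that differ from the defaults.
BRAND_OVERRIDES = {
    "news_site": {"top_k": 5},
    "video_site": {"top_k": 20, "retrain_schedule": "hourly"},
}


def config_for(brand: str) -> BrandConfig:
    """Return the effective configuration: defaults plus brand-specific overrides."""
    overrides = BRAND_OVERRIDES.get(brand, {})
    return replace(DEFAULTS, brand=brand, **overrides)
```

Adding a brand then means adding one entry to `BRAND_OVERRIDES` rather than copying a pipeline, which is the property the platform-first principle is after.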

The Architecture in Practice

The solution combined two complementary layers.

A real-time layer processed user interactions and served recommendations through APIs with low latency. Containerized services running on Azure Kubernetes Service (AKS) provided resilience and scalability, while monitoring tools gave visibility into performance and stability.
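
To make the real-time layer more concrete, here is a minimal sketch of a serving component that answers from an in-memory model snapshot, falls back to a default list when the model has nothing to offer, and can swap in a new snapshot without stopping. It uses only the Python standard library; in a stack like the one described it would presumably sit behind an API route. The class name, method names and response fields are assumptions made for illustration.

```python
import threading
from typing import Dict, List


class RecommendationService:
    """Serves recommendations from an in-memory snapshot (hypothetical design)."""

    def __init__(self, fallback_items: List[str]):
        self._fallback = list(fallback_items)
        self._snapshot: Dict[str, List[str]] = {}
        self._version = "none"
        self._lock = threading.Lock()

    def load_snapshot(self, version: str, item_to_similar: Dict[str, List[str]]) -> None:
        """Replace the served model in one step, so requests never see a half-loaded model."""
        new_snapshot = {item: list(similar) for item, similar in item_to_similar.items()}
        with self._lock:
            self._snapshot = new_snapshot
            self._version = version

    def recommend(self, last_viewed_item: str, k: int = 3) -> dict:
        """Return up to k items, falling back to a default list for unknown items."""
        with self._lock:
            snapshot, version = self._snapshot, self._version
        items = snapshot.get(last_viewed_item)
        if not items:
            return {"model_version": version, "source": "fallback", "items": self._fallback[:k]}
        return {"model_version": version, "source": "model", "items": items[:k]}
```

Returning the model version with every response is a deliberate choice in this sketch: it lets later monitoring attribute clicks to the model that produced the recommendation.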

Alongside this, a batch layer handled offline model training and retraining. Data was processed and stored in Azure, with MLflow managing experiments and model versioning. This allowed the data science team to iterate and improve models without disrupting live systems.
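
The batch side can be sketched in the same spirit. The article does not say which algorithms were used, so the example below substitutes a simple item co-occurrence model as a stand-in and derives a version identifier from the artifact's content. It does not call MLflow; the function names and the hashing scheme are assumptions that only illustrate the idea of a reproducible, versioned training output.

```python
import hashlib
import json
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List, Tuple


def train_cooccurrence(interactions: Iterable[Tuple[str, str]], top_n: int = 5) -> Dict[str, List[str]]:
    """Build an item -> similar-items table from (user_id, item_id) pairs."""
    items_by_user = defaultdict(set)
    for user_id, item_id in interactions:
        items_by_user[user_id].add(item_id)

    # Count how often two items were consumed by the same user.
    pair_counts = defaultdict(Counter)
    for items in items_by_user.values():
        for a, b in permutations(sorted(items), 2):
            pair_counts[a][b] += 1

    model = {}
    for item, counts in pair_counts.items():
        # Highest count first; ties broken by item id so the output is deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        model[item] = [other for other, _ in ranked[:top_n]]
    return model


def artifact_version(model: Dict[str, List[str]]) -> str:
    """Derive a short, content-based version id for a trained artifact."""
    payload = json.dumps(model, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]
```

Because the version is derived from the content, retraining on unchanged data yields the same identifier, which makes it easy to tell whether a nightly job actually produced a new model.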

The clear separation between online serving and offline training proved essential. Real-time systems were built for reliability; training pipelines were built for flexibility.

Challenges Along the Way

As with most platform transformations, the complexity wasn’t in the algorithms; it was in integration and coordination.

Key challenges included:

  • Aligning data engineering and data science workflows
  • Standardizing processes across multiple brands
  • Balancing speed of experimentation with production stability
  • Ensuring observability and measurable impact

Establishing clear ownership between platform engineering and data science teams was crucial. Once responsibilities were defined and workflows standardized, iteration speed increased significantly.
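
"Observability and measurable impact" becomes tangible once feedback events carry the version of the model that produced each recommendation. The sketch below computes click-through rate per model version from a list of events. The event shape (`type`, `model_version`) is a hypothetical schema, not one taken from the project, and click-through rate is just one possible impact metric.

```python
from collections import defaultdict
from typing import Dict, Iterable


def ctr_by_model_version(events: Iterable[dict]) -> Dict[str, float]:
    """Compute clicks / impressions per model version from feedback events."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        if event["type"] == "impression":
            impressions[event["model_version"]] += 1
        elif event["type"] == "click":
            clicks[event["model_version"]] += 1
    # Versions without impressions are skipped to avoid dividing by zero.
    return {version: clicks[version] / count for version, count in impressions.items() if count > 0}
```

A table like this, refreshed regularly, gives both teams a shared number to look at when deciding whether a new model version is an improvement.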

Technologies Behind the Platform

The platform was built on Azure, using a modern, cloud-native stack including:

  • Python-based services
  • MLflow for experiment tracking and model versioning
  • FastAPI for model serving
  • Docker for containerization and Kubernetes for orchestration
  • Distributed data processing for batch workloads
  • Monitoring and dashboarding tools for observability

However, the real differentiator wasn’t the tooling; it was the architectural coherence and lifecycle management around machine learning.

Results Achieved

By the end of the project, the organization had:

  • A unified recommendation backbone across brands
  • Standardized deployment and version control
  • Improved stability of real-time recommendations
  • Faster iteration cycles for data science teams
  • A foundation ready for further AI expansion

Most importantly, recommendation systems evolved from isolated experiments into core digital infrastructure.

The strengthened platform now enables the organization to expand its AI capabilities confidently, including plans to further grow the data science function.

What This Means for the Market

Many companies today face a similar inflection point. They have promising models but lack scalable infrastructure. As AI moves from experimentation to operational necessity, the competitive advantage shifts toward organizations that invest in robust ML platforms.

The lesson from this project is simple:

Sustainable AI impact requires engineering maturity as much as modeling expertise.

Personalization is no longer just about building smarter algorithms. It’s about building the systems that make those algorithms reliable, measurable, and scalable.

And that shift, from model to platform, is where real long-term value is created.


FAQ


A recommendation model is an algorithm that predicts what a user finds relevant, while a recommendation platform includes the entire infrastructure around training, version management, real-time delivery, monitoring and feedback loops. This difference is crucial because a model that works well in test environments can still fail in production if the infrastructure is not scalable and reliable.

The core is a clear separation between a real-time layer and a batch layer. The real-time layer, built for speed and stability, serves recommendations through APIs, and data scientists' experiments cannot disrupt that environment. The batch layer handles offline training, retraining and experiments, separate from production, so that only validated models go live via a standardized process.
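
The rule that only validated models go live can be expressed as a small promotion gate. The following is a minimal sketch under stated assumptions: the `Candidate` fields, the choice of a single offline metric and the `min_gain` threshold are all hypothetical, and a real process would likely include more checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """A trained model version awaiting promotion (fields are hypothetical)."""
    version: str
    offline_metric: float      # e.g. a ranking metric on held-out data
    passed_smoke_test: bool    # e.g. the artifact loads and answers a test request


def should_promote(candidate: Candidate, production: Optional[Candidate], min_gain: float = 0.0) -> bool:
    """Decide whether a candidate may replace the current production model."""
    if not candidate.passed_smoke_test:
        return False
    if production is None:
        # Nothing is live yet, so any working candidate may go out.
        return True
    return candidate.offline_metric >= production.offline_metric + min_gain
```

Keeping this decision in one function means every brand and every model passes through the same check, which is what makes the process standardized rather than ad hoc.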

The right time usually comes earlier than organizations think, ideally once multiple models are in production or multiple teams depend on the same data pipeline. Waiting until deployments are too slow, models are hard to maintain or production is unstable only makes the transformation more expensive.