Operationalize Machine Learning: The Model Factory Framework

Mark McQuade

AI & Machine Learning, Cloud Native Development, DevOps
March 3, 2021


Data is unequivocally the greatest asset of our times. Companies across the world’s largest industries use data to derive insights that guide business decisions and help them reach their objectives.

Traditionally, business analytics has been reactive, guiding business decisions in response to past performance. With machine learning, however, companies can derive insights from their data and harness it for predictive analytics, serving a broad scope of use cases from financial forecasting to inventory management.

While machine learning is an incredibly powerful tool, implementing machine learning models for real-world applications can be highly challenging. In fact, according to IDC, over a fourth of AI and machine learning initiatives fail. The culprits are multi-faceted, from lack of experience on the developer’s end to poor data quality and challenging operationalization.

Data tends to lose value as time passes, so ML models must be retrained repeatedly on new datasets to remain effective on the latest data, which makes the process very time-consuming. Furthermore, there has been no standardized set of best practices that integrates CI/CD, DevOps, DataOps, and software engineering practices to improve agility, quality, and success in ML model deployment.

Rackspace Technology’s Model Factory Framework is designed to address these challenges, providing a coherent mechanism for multiple data teams across an organization, and the operations teams on the other end, to collaborate, develop models, automate packaging, and deploy to multiple environments.

Machine Learning Lifecycle

The machine learning lifecycle is complex, with multiple building, training, testing and validation steps across data analysis, model development, deployment, and monitoring. All these stages bring their own set of challenges. To address these, the Model Factory Framework integrates Amazon SageMaker, an AI and machine learning services stack that includes:

AI Services that provide pre-trained models for ready-made vision, speech, language processing, forecasting, and recommendation engine capabilities.
ML Services that provide pre-configured environments within which you can build, train and deploy deep learning capabilities into your applications.

The Amazon SageMaker stack also supports all the leading machine learning frameworks, interfaces, and infrastructure options, for maximum flexibility.

The Model Factory Framework

The abundance of machine learning tooling, processes, and frameworks is precisely why model deployment tends to be so challenging. Data and ML teams have their own preferences among these, while operations teams have others, which can cause deployment delays, incompatibilities, and other problems. A standardized framework that is agnostic of platform and tooling, such as the Model Factory Framework, bridges the gap between the efforts of these teams.

The Model Factory Framework provides a cloud-based machine learning lifecycle management solution. An architectural pattern rather than a product, it is an open, modular solution that is agnostic of platform, tooling, and framework, and it integrates with AWS services and industry-standard automation tools (Jenkins, Airflow, AWS CodePipeline) for data processing.

Key Benefits of the Model Factory Framework

The Model Factory Framework can help you cut the entire machine learning lifecycle from more than 25 steps down to fewer than 10. It further accelerates the process by automating handoffs between the different teams involved and by simplifying troubleshooting, since it supplies a single source of truth for ML management.

For data scientists, the Model Factory Framework provides a standardized model development environment, the ability to track experiments, training runs and resulting data, automated model retraining, and up to 60% savings on compute costs through scripted access to spot instance training and hyperparameter optimization (HPO) training jobs in QA.

For operations teams, the framework automates model deployment across development, QA, and production environments, and provides a model registry for model version history tracking as well as tools for diagnostics, performance monitoring, and mitigating model drift.
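To make the idea of monitoring for model drift more concrete, here is a minimal Python sketch of one common drift signal, the Population Stability Index (PSI), which compares the distribution of a feature in recent data against a training-time baseline. It is an illustration only and not part of the Model Factory Framework: the function names, the ten equal-width bins, and the 0.2 retraining threshold (a widely used rule of thumb) are all assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float], recent: Sequence[float], bins: int = 10
) -> float:
    """Return the PSI between a baseline sample and a recent sample."""
    if not baseline or not recent:
        raise ValueError("both samples must be non-empty")

    low, high = min(baseline), max(baseline)
    # Fall back to a width of 1.0 when every baseline value is identical.
    width = (high - low) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for value in values:
            # Values outside the baseline range are clamped into the edge bins.
            index = min(max(int((value - low) / width), 0), bins - 1)
            counts[index] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(count / len(values), 1e-6) for count in counts]

    expected = proportions(baseline)
    observed = proportions(recent)
    return sum((o - e) * math.log(o / e) for e, o in zip(expected, observed))


def needs_retraining(
    baseline: Sequence[float], recent: Sequence[float], threshold: float = 0.2
) -> bool:
    """Flag a feature for retraining when PSI exceeds the (assumed) threshold."""
    return population_stability_index(baseline, recent) > threshold
```

In practice a check like this would run on a schedule against fresh production data, and a positive result would trigger the automated retraining step rather than a manual review.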

For the organization, the framework provides a model lineage for governance and regulatory compliance, improves time to insights, and accelerates ROI, while reducing effort to get ML models into production.
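As a rough illustration of what version history and lineage tracking involve, the following Python sketch models a tiny in-memory registry that records which dataset each model version was trained on and which environments it has passed through. It is a simplified, hypothetical example rather than the framework's actual implementation: the class names, the field names, and the fixed dev, qa, prod promotion order are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed promotion order; a real pipeline may use different stages.
STAGES = ["dev", "qa", "prod"]


@dataclass
class ModelVersion:
    name: str
    version: int
    training_data: str  # label of the dataset used for training
    metrics: Dict[str, float]
    stage_history: List[str] = field(default_factory=lambda: ["dev"])

    @property
    def stage(self) -> str:
        return self.stage_history[-1]


class ModelRegistry:
    """Hypothetical in-memory registry keyed by model name."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}

    def register(
        self, name: str, training_data: str, metrics: Dict[str, float]
    ) -> ModelVersion:
        versions = self._versions.setdefault(name, [])
        # Version numbers start at 1 and increase by one per registration.
        model = ModelVersion(name, len(versions) + 1, training_data, dict(metrics))
        versions.append(model)
        return model

    def get(self, name: str, version: int) -> ModelVersion:
        return self._versions[name][version - 1]

    def promote(self, name: str, version: int) -> str:
        """Move a version to the next stage and record it in its history."""
        model = self.get(name, version)
        position = STAGES.index(model.stage)
        if position == len(STAGES) - 1:
            raise ValueError(f"{name} v{version} is already in {model.stage}")
        model.stage_history.append(STAGES[position + 1])
        return model.stage

    def history(self, name: str) -> List[int]:
        return [model.version for model in self._versions.get(name, [])]
```

Because every version keeps its training dataset label and its full stage history, an auditor can answer questions such as which data produced the model currently in production, which is the kind of record that governance and compliance reviews typically ask for.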

If you would like to learn about Rackspace Technology’s Model Factory Framework in more detail and explore how it improves processes from model development to deployment, monitoring, and governance, download our whitepaper.

Are you working on artificial intelligence or machine learning initiatives on AWS? Get in touch with our AI/ML experts to learn how you can leverage our experience and knowledge to accelerate your deployments, improve success and maximize ROI.