A Guide to Predicting Future Outcomes with Amazon Forecast

Mark McQuade

AI & Machine Learning, Data Analytics
April 29, 2020

Amazon Forecast is a fully managed machine learning service from AWS, designed to help users produce highly accurate forecasts from time-series data. Amazon has used machine learning to solve hard forecasting problems since 2000, improving accuracy 15X over the last two decades. Built on the same technology used at Amazon.com, Amazon Forecast supports a variety of business use cases, from financial and resource planning to predicting future performance and product demand, across a wide spectrum of industries from retail to healthcare.

The machine learning models powering Amazon Forecast automatically determine how the relationships between time-series data and independent variables, such as product features and store locations, affect forecasting outcomes. This improves the accuracy of predictions and of the resulting business insights. As part of the AWS machine learning suite of services, Amazon Forecast also benefits from the security, reliability, and the compute, storage, and analytics capabilities of the AWS cloud platform.

Benefits of using Forecasting in business

Forecasting has a range of important business use cases, from planning product demand, resources, and inventories to projecting financial outcomes. Forecasting allows businesses to use past data and external factors to understand what their near future looks like: the costs needed to stay productive, the earnings that sales will generate, and the areas where investment needs to be added or withdrawn to meet the company’s goals.

Accurate forecasting is therefore vital. For example, under-forecasting product demand can lead to lost opportunities, while over-forecasting can leave a company with wasted resources and sunk costs. Similarly, under-forecasting financials can leave a company with no option but to undercut prices, while over-forecasting can lead to depleted cash reserves.

How is Forecasting accomplished?

Forecasting consists of three main steps.

Looking Backward
The first step is to look at historical data, preferably with each record identified by a timestamp, an item, and a value. These records provide the baseline data.

Identifying Trends
Statistical and deep learning approaches help you examine the historical data to find trends.

Projecting Forward
Identified trends help project expected future values.
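
To make the three steps concrete, here is a minimal sketch in plain Python. It assumes daily data for a single item and uses a straight-line least-squares fit as the "trend", which is far simpler than anything Amazon Forecast does. The function names and the row layout (timestamp, item, value) are illustrative assumptions, not part of the service.

```python
from datetime import date, timedelta


def fit_linear_trend(values):
    """Identifying trends: least-squares slope and intercept for equally spaced points."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    return slope, mean_y - slope * mean_x


def project_forward(history, horizon):
    """Projecting forward: extend the fitted line by `horizon` daily steps.

    `history` is a list of (timestamp, item_id, value) rows for one item,
    assumed to be one row per day with no gaps.
    """
    rows = sorted(history)  # looking backward: order the baseline data by time
    values = [value for _, _, value in rows]
    slope, intercept = fit_linear_trend(values)
    last_day, item_id, _ = rows[-1]
    n = len(values)
    return [
        (last_day + timedelta(days=k + 1), item_id, intercept + slope * (n + k))
        for k in range(horizon)
    ]


# Hypothetical history: demand grows by 2 units per day.
history = [(date(2020, 1, d), "sku-1", 10.0 + 2 * (d - 1)) for d in range(1, 6)]
projection = project_forward(history, horizon=2)
```

A real forecast also has to handle seasonality, missing values, and uncertainty, which is where a managed service earns its keep.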

What Amazon Forecast brings to the table

Amazon Forecast is a fully automated and fully managed machine learning service that delivers highly accurate forecasting with up to 50% improvement over traditional methods. The service is simple to use and requires no deep learning experience. From a security perspective, your data and your models are fully secure and encrypted in line with AWS’ security standards.

The technology behind Amazon Forecast begins with three types of data from your Amazon S3 repositories – Historical data, Related data, and Item data – that are fed into the service. Amazon Forecast then adds in relevant built-in datasets to enrich the data further and automatically trains the best ML model for you, selected through AutoML. Once the model is trained, it generates accurate forecasts through the console or private API.

Behind the scenes

Amazon Forecast performs multiple processes in the background that the user does not have to manage. These include loading and inspecting data, training models with multiple algorithms, selecting hyperparameters for optimization, selecting the most accurate model, and hosting it. All of these processes are needed to turn your raw data into forecasts.

How does it hold up against Legacy Systems?

Amazon Forecast significantly shortens setup time, giving you a working model in 6 to 8 weeks, compared with the 2 to 8 months that legacy systems generally require. The service is also highly cost-effective, with a pay-as-you-go pricing model and significantly lower professional services and maintenance costs over the medium term.

Workflow

The workflow for generating forecasts consists of the following steps.

Creating related datasets and a dataset group
Retrieving training data
Training predictors (trained model) using an algorithm or AutoML
Evaluating the predictor with metrics
Creating a forecast
Retrieving forecast for users
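
As a rough orientation, the steps above can be lined up against the operations exposed by the AWS SDK for Python (boto3). The pairing below is our own reading of the workflow rather than an official mapping. The operation names are real methods on the boto3 "forecast" client, except query_forecast, which lives on the separate "forecastquery" client.

```python
# Each workflow step paired with the boto3 operation(s) that usually carry it out.
# The step wording comes from the list above; the pairing is an assumption.
WORKFLOW = [
    ("Creating related datasets and a dataset group",
     ["create_dataset", "create_dataset_group"]),
    ("Retrieving training data", ["create_dataset_import_job"]),
    ("Training predictors", ["create_predictor"]),
    ("Evaluating the predictor with metrics", ["get_accuracy_metrics"]),
    ("Creating a forecast", ["create_forecast"]),
    ("Retrieving forecast for users",
     ["query_forecast", "create_forecast_export_job"]),
]


def operations_in_order(workflow):
    """Flatten the workflow into the sequence of API operations to call."""
    return [op for _, operations in workflow for op in operations]


for number, (step, operations) in enumerate(WORKFLOW, start=1):
    print(f"{number}. {step}: {', '.join(operations)}")
```

Most of these calls start asynchronous jobs, so in practice each step is followed by polling the matching describe operation until the resource becomes active.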

Datasets and Dataset Groups

Datasets contain the data used to train predictors. You create one or more datasets, each with a schema that matches the training data you will import into it. A dataset schema defines the logical view and organizational structure of the dataset. Dataset groups are collections of up to three complementary datasets, one of each dataset type (target time series, related time series, and item metadata), that together describe a set of parameters changing over time.

Dataset Domains

Each dataset you create must be associated with a dataset type and a dataset domain that defines a forecasting use case. You can use the prebuilt domains provided by Amazon Forecast or create a custom one for your use case.

Dataset Types

Each domain can have up to three dataset types, based on the type of data you want to include in the training.

The target time-series dataset is the only required dataset. It defines the target field you want to generate forecasts for. The data could be historical demand, sales numbers, or other such primary data. Up to 10 dimensions can be added to this dataset.

The related time-series dataset is an optional dataset that consists of time-series data that is not included in the target dataset and can help improve accuracy. It can include up to 10 dimensions, matching those chosen for the target dataset, as well as up to 13 related time-series features. These datasets can only be used with specific algorithms such as DeepAR+.

The item metadata dataset is an optional dataset for metadata that applies to the time-series data. When building a retail forecast, for example, it can describe things such as the color of a product or the city where it was sold.
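
To show how a dataset, its type, its domain, and its schema fit together, here is a sketch that builds the request for a target time-series dataset in the prebuilt RETAIL domain. The helper only assembles a dictionary and does not call AWS. The function name and the dataset name are hypothetical, while the keys follow the CreateDataset API as we understand it.

```python
def build_target_dataset_request(dataset_name, frequency="D"):
    """Assemble keyword arguments for the CreateDataset operation.

    Uses the RETAIL domain, whose target time series is built around
    item_id, timestamp, and demand. `frequency` "D" means daily data.
    """
    schema = {
        "Attributes": [
            {"AttributeName": "timestamp", "AttributeType": "timestamp"},
            {"AttributeName": "item_id", "AttributeType": "string"},
            {"AttributeName": "demand", "AttributeType": "float"},
        ]
    }
    return {
        "DatasetName": dataset_name,
        "Domain": "RETAIL",
        "DatasetType": "TARGET_TIME_SERIES",
        "DataFrequency": frequency,
        "Schema": schema,
    }


# "retail_demand" is a made-up name for illustration.
request = build_target_dataset_request("retail_demand")

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("forecast").create_dataset(**request)
```

The attribute order in the schema should match the column order of the CSV file you later import from Amazon S3.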

Predictors

Forecasting models trained by Amazon Forecast, used to generate forecasts based on time-series data, are called predictors. During training, accuracy metrics are generated to evaluate each predictor when selecting a model to generate forecasts. To create a predictor, the following elements are required:

Dataset group that provides data for training.
Featurization configuration that specifies the forecast frequency and provides information to transform the data to be compatible with the training algorithm.
Forecast horizon that specifies the number of time-steps to predict.
Evaluation parameters that specify how to split the dataset into training and testing data.
Algorithm that trains the model and specifies default values for hyperparameter optimization, or AutoML, which automatically picks a suitable algorithm based on your dataset.
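
The elements above map fairly directly onto the parameters of the CreatePredictor operation. The sketch below assembles such a request without calling AWS. The helper name, predictor name, and ARN are hypothetical placeholders, and the default of one backtest window is our own choice.

```python
def build_predictor_request(name, dataset_group_arn, horizon,
                            frequency="D", algorithm_arn=None):
    """Assemble keyword arguments for the CreatePredictor operation."""
    request = {
        "PredictorName": name,
        # Forecast horizon: number of time-steps to predict.
        "ForecastHorizon": horizon,
        # Dataset group that provides the training data.
        "InputDataConfig": {"DatasetGroupArn": dataset_group_arn},
        # Featurization configuration, including the forecast frequency.
        "FeaturizationConfig": {"ForecastFrequency": frequency},
        # Evaluation parameters: how the data is split for backtesting.
        "EvaluationParameters": {
            "NumberOfBacktestWindows": 1,
            "BackTestWindowOffset": horizon,
        },
    }
    if algorithm_arn is None:
        request["PerformAutoML"] = True  # let AutoML pick the algorithm
    else:
        request["AlgorithmArn"] = algorithm_arn
    return request


# Placeholder ARN with a made-up account ID, for illustration only.
GROUP_ARN = "arn:aws:forecast:us-east-1:123456789012:dataset-group/demo"
automl_request = build_predictor_request("demo_predictor", GROUP_ARN, horizon=14)

# With credentials configured:
#   import boto3
#   boto3.client("forecast").create_predictor(**automl_request)
```

Passing an algorithm ARN instead of relying on AutoML trades convenience for control over which model is trained.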

Evaluating Accuracy

Predictor metrics, which are generated when training predictors, help evaluate the accuracy of an algorithm for various forecasting scenarios. Amazon Forecast uses backtesting, or testing a model on historical data, to produce these metrics.

Evaluation parameters, specified in the predefined algorithms, split the dataset into training data and testing data, which the algorithm processes in the training and testing stages. A set of metrics helps you evaluate forecasts effectively, including:

Error/loss functions, which calculate the error between true and predicted results.
Weighted quantile loss, which calculates how far the forecast at a certain quantile is from the actual results.
Root mean square error, which is the square root of the average squared difference between the actual target value and the predicted mean value.

To learn more about these metrics, take a look at the Amazon Forecast documentation.
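
For intuition, both metrics can be written in a few lines of plain Python. This is a sketch of the standard formulas as we understand them from the Amazon Forecast documentation, not the service's own code, and the sample numbers are made up.

```python
import math


def rmse(actuals, predicted_means):
    """Root mean square error between actual values and predicted means."""
    squared = [(a - p) ** 2 for a, p in zip(actuals, predicted_means)]
    return math.sqrt(sum(squared) / len(squared))


def weighted_quantile_loss(actuals, quantile_forecasts, tau):
    """Weighted quantile loss at quantile `tau` (for example 0.9 for P90).

    Under-forecasts are weighted by tau and over-forecasts by (1 - tau),
    then the total is scaled by the sum of absolute actual values.
    """
    loss = sum(
        tau * max(a - q, 0.0) + (1 - tau) * max(q - a, 0.0)
        for a, q in zip(actuals, quantile_forecasts)
    )
    return 2 * loss / sum(abs(a) for a in actuals)


# Made-up numbers for illustration.
actuals = [10, 20, 30]
forecasts = [12, 18, 33]
error = rmse(actuals, forecasts)
loss_p50 = weighted_quantile_loss(actuals, forecasts, tau=0.5)
loss_p90 = weighted_quantile_loss(actuals, forecasts, tau=0.9)
```

Note how the P90 loss penalizes the one under-forecast (18 against an actual of 20) much more heavily than the two over-forecasts.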

The uncertainty associated with forecasts, in comparison to the target result, is expressed in prediction quantiles. Amazon Forecast provides predictions and calculates errors at three distinct quantiles: 10%, 50%, and 90%. A P90 quantile, for example, predicts that the true value will be less than the predicted value 90% of the time, while a P50 quantile predicts that the true value will be less than the predicted value 50% of the time.
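
One way to sanity-check a quantile forecast against this definition is to measure how often the actual values really did fall below it. The helper below is a hypothetical illustration with made-up numbers.

```python
def empirical_coverage(actuals, quantile_forecasts):
    """Fraction of periods in which the actual value was below the forecast."""
    below = sum(1 for a, q in zip(actuals, quantile_forecasts) if a < q)
    return below / len(actuals)


# Made-up actual demand for ten periods, with flat P90 and P50 forecasts.
actual_demand = [8, 12, 9, 15, 11, 10, 7, 13, 9, 10]
p90_forecast = [14] * 10
p50_forecast = [10] * 10

p90_coverage = empirical_coverage(actual_demand, p90_forecast)
p50_coverage = empirical_coverage(actual_demand, p50_forecast)
```

A well-calibrated P90 forecast should give a coverage close to 0.9 over a long enough history. A value far from that suggests the quantile is too optimistic or too conservative.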

Algorithms

A broad set of algorithms powers Amazon Forecast, including:

Auto-regressive integrated moving average (ARIMA), which is a classical approach to modeling autocorrelations that works well with a small number of time series.

Error trend seasonality (ETS), which uses exponential smoothing and works with a small number of time series to find trends, seasonality, and residuals.

DeepAR+, which is an algorithm used widely within Amazon for mission-critical decisions. It performs well with many related time series and on cold-start problems.

Learn about the other algorithms used in Amazon Forecast by watching our webinar or reading Amazon’s documentation.
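
To give a feel for the exponential smoothing idea behind ETS, here is the simplest member of that family: simple exponential smoothing, which tracks only a level and ignores trend and seasonality. It is a teaching sketch, not the ETS implementation inside Amazon Forecast, and the smoothing factor is an arbitrary choice.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after each observation.

    Each new level is a weighted average of the latest observation
    (weight alpha) and the previous level (weight 1 - alpha).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [values[0]]  # start the level at the first observation
    for value in values[1:]:
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels


# Made-up series; the last level doubles as the one-step-ahead forecast.
levels = simple_exponential_smoothing([10, 20, 30], alpha=0.5)
next_forecast = levels[-1]
```

A higher alpha makes the forecast react faster to recent values, while a lower alpha smooths out noise at the cost of lagging behind real changes.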

Forecasts

After you have created a predictor, you create a forecast by calling the CreateForecast operation. During this process, Amazon Forecast trains a model on the entire dataset before hosting the model and running inference. A forecast is created for every item (item_id) in the dataset group that was used to train the predictor. Once this process is complete, you can query the forecast or export it to your Amazon S3 bucket for future use.

Forecasts produced using Amazon Forecast can be expressed through visualizations. You can view the forecast in the console, retrieve it through a private API, or export it in .csv format.
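
Here is a sketch of what creating and reading a forecast might look like from Python. The first helper assembles a CreateForecast request. The second reshapes a response of the kind the QueryForecast operation returns into one row per timestamp. The helper names, the ARN, and the sample response values are all made up for illustration, and nothing here calls AWS.

```python
def build_forecast_request(name, predictor_arn, quantiles=("0.1", "0.5", "0.9")):
    """Assemble keyword arguments for the CreateForecast operation."""
    return {
        "ForecastName": name,
        "PredictorArn": predictor_arn,
        "ForecastTypes": list(quantiles),  # quantiles to generate
    }


def predictions_by_timestamp(query_response):
    """Pivot a QueryForecast-style response into {timestamp: {quantile: value}}."""
    table = {}
    predictions = query_response["Forecast"]["Predictions"]
    for quantile, points in predictions.items():
        for point in points:
            table.setdefault(point["Timestamp"], {})[quantile] = point["Value"]
    return table


# Hand-written sample in the shape of a QueryForecast response (values invented).
sample_response = {
    "Forecast": {
        "Predictions": {
            "p10": [{"Timestamp": "2020-05-01T00:00:00", "Value": 8.0}],
            "p50": [{"Timestamp": "2020-05-01T00:00:00", "Value": 12.0}],
            "p90": [{"Timestamp": "2020-05-01T00:00:00", "Value": 17.5}],
        }
    }
}
rows = predictions_by_timestamp(sample_response)

# With credentials configured, a real response could be fetched like this:
#   import boto3
#   boto3.client("forecastquery").query_forecast(
#       ForecastArn="<your forecast ARN>", Filters={"item_id": "sku-1"})
```

Pivoting by timestamp makes it easy to plot the P10 to P90 band around the P50 line for a single item.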

Benefits & Pricing

Amazon Forecast effectively handles tricky forecasting scenarios such as missing values, product discontinuation, new product introduction, highly spiky data and irregular seasonality, maintaining a high degree of accuracy, due to its use of deep neural networks. You can easily look up forecasts on the console and express them through visualizations for any time series at different granularities. Metrics for accuracy are also available right in the console.

Amazon Forecast follows a pay-as-you-go pricing model, costing $0.60 per 1,000 generated forecasts, $0.088 per GB of data storage, and $0.24 per hour of training.
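
As a quick worked example, the rates quoted above can be turned into a small cost estimator. The usage figures in the example are hypothetical, and actual bills depend on current AWS pricing and how usage is metered.

```python
# Rates as quoted in this article (US dollars).
PRICE_PER_1000_FORECASTS = 0.60
PRICE_PER_GB_STORAGE = 0.088
PRICE_PER_TRAINING_HOUR = 0.24


def estimate_cost(generated_forecasts, storage_gb, training_hours):
    """Estimate the charge for a given amount of usage."""
    return (
        generated_forecasts / 1000 * PRICE_PER_1000_FORECASTS
        + storage_gb * PRICE_PER_GB_STORAGE
        + training_hours * PRICE_PER_TRAINING_HOUR
    )


# Hypothetical usage: 10,000 forecasts, 5 GB of data, 3 hours of training.
# That works out to 6.00 + 0.44 + 0.72 dollars.
total = estimate_cost(10_000, storage_gb=5, training_hours=3)
print(f"Estimated cost: ${total:.2f}")
```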

To see an example of Amazon Forecast in production and a detailed demo on how you can structure and deploy a forecasting project with Amazon Forecast, check out our webinar. If you’re interested in leveraging Amazon Forecast, or any other AWS artificial intelligence and machine learning service, get in touch with our team today!