Accelerating Survey Analysis and Maximizing Scalability with Machine Learning on AWS

How Machine Learning on AWS significantly improved the speed, accuracy, and resource efficiency of survey analysis for an advisory services and leadership development company.

Machine learning (ML) has brought about a paradigm shift in how large volumes of data are processed based on patterns and inferences. Training machine learning models on large quantities of data has laid the foundation for a whole new set of tools and capabilities, ranging from pattern recognition and accurate forecasting to automatic document processing and language analysis.

One of the most significant features of machine learning is that it can dramatically accelerate processing while surpassing the accuracy and quality achievable by manually performed work.

Let’s take a closer look at a machine learning implementation project where the customer was able to cut the processing time for thousands of surveys from days to mere minutes, all while producing results that are richer and more accurate than before.

As a leading advisory services and leadership development firm, our customer helps organizations identify opportunities to improve their performance. Their process involves utilizing a networked approach to enable clients to operate optimally in today’s complex environments.

Augmenting Survey Processing with Machine Learning

While helping businesses with organizational development and process improvement, the customer utilizes surveys to get input from various stakeholders. These surveys include multiple-choice questions as well as open ended questions that are designed to identify underlying issues.

For some of their larger client accounts, as many as 20,000 responses can be received on a single survey. The existing process for reviewing survey results was manual, time consuming, and labor intensive. The customer was therefore looking to develop an automated natural language processing solution that would be significantly faster and able to identify the key topics cropping up in the answers to open ended questions.

Having previously worked with Onica to leverage AWS for an infrastructure move and data flow modernization project, the customer was well aware of the advanced capabilities of the AWS cloud. They also had a very clear idea of the solution they were looking for, but chose to bring in an external partner instead of developing it in-house in order to meet their timelines. They decided to work with Onica as a result of our proven expertise and extensive experience developing Machine Learning (ML) solutions for customers across multiple industries.

Building a Scalable Data Visualization & Language Processing Solution on AWS

The customer had considered several off-the-shelf language analysis tools for their workflow, but did not find one that met their specific needs. Drawing on significant experience and expertise in machine learning and natural language processing, Onica was able to suggest tools that would provide the visualizations best suited to their use case.

Onica’s ML team worked closely with the customer’s team to first understand what the highest value output from the ML data processing would be, ensuring that the proposed solution produced the results they were looking for. With this information, Onica was then able to select the right tools and ML models to deliver high value data derived from processing responses to open ended questions.

The solution designed by Onica’s ML team uses Amazon Athena, AWS Glue, Amazon S3, Amazon SageMaker, and Amazon Translate. As some of the company’s clients respond to the surveys in different languages, Amazon Translate is used to first translate all responses into English before processing. The solution also uses Latent Dirichlet Allocation (LDA), a topic modeling algorithm generally used for large documents. Onica’s ML team optimized the LDA implementation for the customer’s specific use case, improving its processing of shorter text passages.
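The translation step described above can be sketched as follows. This is a minimal illustration, not the customer's actual pipeline: the function name `translate_responses` and its parameters are hypothetical, and the only AWS call assumed is the `translate_text` operation of the boto3 Amazon Translate client, with `SourceLanguageCode="auto"` so that the service detects the input language.

```python
def translate_responses(responses, translate_client, target_language="en"):
    """Return a list of survey responses translated into target_language.

    translate_client is expected to behave like boto3.client("translate").
    It is passed in (rather than created here) so the function can be
    exercised without AWS credentials. The helper name is illustrative.
    """
    translated = []
    for text in responses:
        # Skip blank answers: there is nothing to translate.
        if not text.strip():
            translated.append(text)
            continue
        result = translate_client.translate_text(
            Text=text,
            SourceLanguageCode="auto",  # let the service detect the language
            TargetLanguageCode=target_language,
        )
        translated.append(result["TranslatedText"])
    return translated


# Usage with real AWS credentials (not executed here):
#   import boto3
#   english = translate_responses(raw_responses, boto3.client("translate"))
```

Normalizing every response to one language before topic modeling means a single vocabulary can be shared across all respondents, which is presumably why the translation happens first.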

One of the biggest challenges that the team encountered while developing the solution was the varying sizes of the text responses. Responses to the same question might elicit a lengthy answer from one person, and a shorter one from another. The team had to make calculated decisions on how to select words or n-grams (multiple words or phrases) and the number of topics, based on the amount of text variations and lengths.
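To make the two choices named above concrete (which n-grams to use as terms, and how many topics to ask for), here is a small, dependency-free sketch of LDA fitted with collapsed Gibbs sampling. It is an illustration only and not Onica's optimized implementation: the helper names (`text_to_terms`, `lda_gibbs`), the hyperparameters `alpha` and `beta`, and the sample responses are all assumptions made for this example.

```python
import random
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-word phrases from a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def text_to_terms(text, max_n=2):
    """Turn one response into terms: single words plus n-grams up to max_n."""
    tokens = re.findall(r"[a-z']+", text.lower())
    terms = []
    for n in range(1, max_n + 1):
        terms.extend(ngrams(tokens, n))
    return terms


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of term lists. Returns (doc_topic, topic_term):
    doc_topic[d][k] counts terms of document d assigned to topic k,
    topic_term[k] is a Counter of term counts for topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({term for doc in docs for term in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_term = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every term occurrence.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for term in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_term[k][term] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, term in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_term[k][term] -= 1
                topic_total[k] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_term[j][term] + beta)
                    / (topic_total[j] + beta * vocab_size)
                    for j in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_term[k][term] += 1
                topic_total[k] += 1
    return doc_topic, topic_term


# Hypothetical responses of very different lengths to the same question.
sample_responses = [
    "Poor communication",
    "Contractors are rarely told about schedule changes until it is too late",
    "The merger process was confusing and the new process is not documented",
]
sample_docs = [text_to_terms(r, max_n=2) for r in sample_responses]
doc_topic, topic_term = lda_gibbs(sample_docs, num_topics=2, iterations=50)
```

In this sketch, `max_n` and `num_topics` are the knobs corresponding to the decisions described above. A two-word response yields only three terms here, which shows why short answers give a topic model little to work with and why these settings need tuning per dataset.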

Another challenge of working with a variety of large and small client accounts is that the sources and amount of data may vary significantly. Onica’s ML team took an iterative approach, gathering feedback from the customer to ensure that the different modeling approaches being tested and deployed would achieve the best visualization result.

The effectiveness of visualizations also varies depending on the person viewing them. A detailed, intricate visualization may be ideal for a trained specialist, but too complex for someone not familiar with big data. Onica’s team optimized the visualizations for the firm’s users and analysts based on their feedback.

Maximizing Speed, Accuracy and Comprehension with Machine Learning

One of the key benefits of automated language analysis is faster identification of latent topics in the survey responses across large datasets. By automating the analysis using ML-based natural language processing, results are now available in minutes, as opposed to the days it previously might take teams of analysts to work through the responses. This also frees up the analysts’ time to focus on other important tasks. The analysts are now able to work with the visualization results, rather than spending days creating the visualization.

Another advantage of an ML solution is that it can detect subtle, hidden, and abstract topics that might be missed by human analysts, and it is also more comprehensive and consistent in processing the data. As a result of the way the topic modeling is done, the solution provides finer-grained detail than human analysts might provide. For example, rather than simply identifying an issue as a communication issue, the solution can indicate that it is a communication issue between an organization and its contractors, or between specific departments. This enables our customer to provide their clients with more accurate and effective solutions to the issues that have been identified.

Once deployed, Onica’s natural language ML solution enables a quick visualization of survey results and rapidly identifies the underlying themes coming through in the responses. It provides a classification of business challenge types, quickly identifying the specific challenges that the customer’s clients face, such as indications of corporate communications issues, process issues, merger issues, and so on. The solution adds texture to the analysis and ensures a standard process focused solely on the extraction of relevant data, including subtle topics, enabling the customer to provide their clients with the detailed guidance they require to manage any issues identified.

The solution is also designed to be scalable, with the ability to process data of varying sizes from both small and large clients. In fact, it could process all of the data from all of their client accounts at once, and is capable of processing data from some of the largest organizations in the world. In addition, the solution requires minimal management, leveraging the high availability, redundancy, and scalability enabled on AWS. The only management required is for the visualizations, or output, for each individual client account.

Onica’s automated ML natural language processing solution on AWS will enable this leading advisory and leadership development company to replace a time intensive manual process that consumed considerable resources with an automated process that is significantly faster, more consistent, and replicable. Not only does the solution increase the rate at which survey results can be processed to extract relevant data, it also increases the accuracy of the data, enabling the customer to provide more holistic and effective solutions to their clients.
