AWS Performance Efficiency Tools & Best Practices

AWS Well-Architected Framework

Harnessing the full power of the AWS cloud involves far more than building a solid technical infrastructure. Amazon developed the Well-Architected Framework (WAF) to enable companies to build the most secure, high-performing, resilient, and efficient infrastructure possible for their businesses.

Business operations play an increasing role in how companies can truly transform business through cloud computing. Operational Excellence is one of the “pillars,” or areas of focus, in the AWS WAF. The AWS WAF Operational Excellence Pillar covers best practices around developing robust, repeatable processes for all aspects of managing your cloud infrastructure.

Operational Excellence in the AWS cloud starts with preparation

Just as a pilot runs through a pre-flight checklist before takeoff, AWS recommends using operational checklists to ensure that your workloads are ready for production operation and to prevent untested workloads from being migrated to production.

Create and use these checklists for Operational Excellence in AWS:

  • Operational checklist: create an operational checklist that you use to evaluate whether you are ready to operate the workload.
  • Planning checklist: this may seem redundant, but it is important to have a plan that syncs with company events, milestones, and roadmaps. That lets you stay in front of events that might cause sudden increases in traffic or in requests for specific resources, where network performance could affect the company’s revenue or reputation.
  • Security checklist: security is among the most misunderstood aspects of the cloud. Develop and use a detailed security checklist to ensure that you are ready to operate the workload securely and to respond to any security event or attack.
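
As a rough illustration of how the three checklists above could gate a release, here is a minimal Python sketch. The class names, function names, and checklist items are all hypothetical and chosen for illustration; they are not part of any AWS tool. The only rule it encodes is that a workload is ready when no checklist has open items.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChecklistItem:
    description: str
    done: bool = False


@dataclass
class Checklist:
    name: str
    items: List[ChecklistItem] = field(default_factory=list)

    def open_items(self) -> List[str]:
        """Return the descriptions of items that are not yet complete."""
        return [item.description for item in self.items if not item.done]


def readiness_report(checklists: List[Checklist]) -> Dict[str, List[str]]:
    """Map each checklist name to its open items; an empty dict means ready."""
    report = {}
    for checklist in checklists:
        open_items = checklist.open_items()
        if open_items:
            report[checklist.name] = open_items
    return report


def ready_for_production(checklists: List[Checklist]) -> bool:
    """A workload is ready only when every item on every checklist is done."""
    return not readiness_report(checklists)


# Hypothetical example items, not an official AWS checklist.
operational = Checklist("operational", [ChecklistItem("Runbooks reviewed", done=True)])
planning = Checklist("planning", [ChecklistItem("Launch dates mapped to capacity", done=True)])
security = Checklist("security", [ChecklistItem("Incident response plan tested", done=False)])

print(ready_for_production([operational, planning, security]))
print(readiness_report([operational, planning, security]))
```

A check like this could run as a step in a deployment pipeline, so that an incomplete checklist blocks promotion to production instead of relying on someone remembering to look.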

AWS Configuration Management Best Practices

Document the ways that you monitor, measure, and manage your architecture, your environments, and the configuration parameters for the resources within them, so that components can be easily identified for tracking and troubleshooting. Changes to configurations should also be trackable and automated. In a Configuration Management Database (CMDB), record detailed resource tracking information based on tags and metadata, along with thorough, accessible documentation of your entire architecture and infrastructure configuration.
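
Tag-based tracking only works if the tags are actually present, so one simple automated check is to audit resources against a required-tag policy. The sketch below assumes a hypothetical policy (`Owner`, `Environment`, `CostCenter`) and a hypothetical in-memory inventory. In practice the inventory would come from your CMDB or from the AWS APIs, which this example does not call.

```python
from typing import Dict, List, Sequence

# Hypothetical tagging policy: every resource must carry these tag keys.
REQUIRED_TAGS = ("Owner", "Environment", "CostCenter")


def missing_tags(tags: Dict[str, str],
                 required: Sequence[str] = REQUIRED_TAGS) -> List[str]:
    """Return required tag keys that are absent or have an empty value."""
    return [key for key in required if not tags.get(key, "").strip()]


def audit_resources(resources: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each non-compliant resource ID to the tag keys it is missing."""
    findings = {}
    for resource_id, tags in resources.items():
        gaps = missing_tags(tags)
        if gaps:
            findings[resource_id] = gaps
    return findings


# Hypothetical inventory keyed by resource ID.
inventory = {
    "i-0abc": {"Owner": "web-team", "Environment": "prod", "CostCenter": "1234"},
    "vol-0def": {"Owner": "web-team"},
}

print(audit_resources(inventory))
```

Running an audit like this on a schedule, and recording the findings alongside the configuration history, keeps the tagging program from drifting as new resources are created.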

Automate Cloud Deployment for Operational Excellence

Automation can take human error out of the operational excellence equation. Best practices for automation include regular quality assurance testing, and defined mechanisms that can continually track, audit, roll back, and review changes as warranted.

Best practices for AWS deployment automation include:

  • Developing a deployment pipeline (e.g., source code repository, build systems, deployment and testing automation) with standard automated procedures for continuous integration and continuous deployment.
  • An automated release management process.
  • A process to revert changes if they produce operational issues.
  • Risk management strategies (blue/green, canary, A/B testing) to continually assess risks.
  • Monitor system performance using CloudWatch.
  • Set alarms and notifications based on key performance thresholds that indicate problems or opportunities for improvement.
  • Automate actions based on performance, such as using Auto Scaling to automatically add capacity based on current conditions.
  • Track and save logs, including application logs, AWS service-specific logs, VPC flow logs, and CloudTrail logs, so that you can troubleshoot and review performance.
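
To make the monitoring and alarm items above more concrete, here is a small Python sketch that builds the settings for a CloudWatch alarm on the average CPU of an Auto Scaling group. The helper name, the default threshold of 70 percent, and the example ARN are assumptions made for illustration. The dictionary keys follow the parameters of the CloudWatch `PutMetricAlarm` operation, and the sketch deliberately stops short of calling AWS.

```python
from typing import Any, Dict, List


def cpu_alarm_params(
    group_name: str,
    action_arns: List[str],
    threshold_percent: float = 70.0,   # assumed default, tune per workload
    period_seconds: int = 300,
    evaluation_periods: int = 2,
) -> Dict[str, Any]:
    """Build keyword arguments for a high-CPU alarm on an Auto Scaling group."""
    if not 0 < threshold_percent <= 100:
        raise ValueError("threshold_percent must be in (0, 100]")
    return {
        "AlarmName": f"{group_name}-high-cpu",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        # ARNs of actions to run when the alarm fires, e.g. an SNS topic
        # for notifications or a scaling policy that adds capacity.
        "AlarmActions": action_arns,
    }


# Hypothetical group name and SNS topic ARN.
params = cpu_alarm_params(
    "web-asg",
    ["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
print(params["AlarmName"], params["Threshold"])

# With boto3 installed and credentials configured, the alarm could then be
# created with: boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping alarm definitions in code like this means they can live in the same repository and pipeline as the workload, so threshold changes are reviewed, tracked, and reversible like any other change.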

Responding Efficiently in AWS

Responding to network problems is as important as preventing them in the first place. It is important to be prepared to automate responses as much as possible, including alerts and notifications as well as actions and recovery. It is also important to have escalation procedures in place to get each issue to the right resources as quickly as possible.

Best practices for responding to unplanned events include:

  • Create an event response playbook that all will follow, that defines the circumstances for when this playbook should be activated, and that includes escalation guidelines and procedures.
  • Automate responses as much as possible, such as using Auto Scaling to instantly add capacity when the system passes critical load thresholds.
  • Develop a Root Cause Analysis (RCA) process to ensure that you can resolve, document, and fix issues so they do not happen in the future. Make sure you’re not just fixing symptoms of a deeper problem.
  • Develop an escalation process that puts the necessary stakeholders and systems in place for receiving alerts when escalations occur.
    Automate escalation as much as possible based on demand or time thresholds, sending the issue to the right resources.
  • Create an automated escalation queue that routes issues between the appropriate functional teams based on priority, impact, and intake mechanisms.
  • Use a demand- or time-based approach to escalate higher in the organization as impact, scale, or time to resolution/recovery of incident increases.
  • Define when external escalation to AWS or an AWS partner would be engaged.
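
The demand- or time-based escalation described above can be sketched as a small lookup: an incident moves to a higher tier once it has been open long enough or is severe enough. The tier names, time limits, and severity scale below are hypothetical placeholders; your playbook would define the real ones, including when to bring in AWS or an AWS partner.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EscalationTier:
    name: str
    after_minutes: int  # escalate here once the incident has been open this long
    min_severity: int   # or immediately at this severity (1 = low, 4 = critical)


# Hypothetical tiers, ordered from lowest to highest.
TIERS = (
    EscalationTier("on-call engineer", after_minutes=0, min_severity=1),
    EscalationTier("team lead", after_minutes=30, min_severity=3),
    EscalationTier("engineering director", after_minutes=120, min_severity=4),
)


def escalation_target(minutes_open: int, severity: int,
                      tiers: Sequence[EscalationTier] = TIERS) -> str:
    """Return the highest tier whose time OR severity threshold has been reached."""
    target = tiers[0]
    for tier in tiers:
        if minutes_open >= tier.after_minutes or severity >= tier.min_severity:
            target = tier
    return target.name


print(escalation_target(minutes_open=10, severity=1))
print(escalation_target(minutes_open=45, severity=2))
print(escalation_target(minutes_open=5, severity=4))
```

Because the rules are data rather than tribal knowledge, they can be reviewed alongside the response playbook and wired into whatever alerting system sends the notifications.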

What is Operational Excellence in AWS?

The AWS Operational Excellence pillar focuses on running and monitoring systems to deliver business value, and on continually improving processes and procedures. It helps organizations spread the benefits of cloud adoption beyond the IT department and ensures that the cloud infrastructure can efficiently manage changes, respond to events, and automate standards-based tasks and processes to successfully manage daily operations.
