Pillar #3 of the AWS Well-Architected Framework: Reliability

December 22, 2022
Narendar Nallamala

We continue our 5 Pillars of AWS Well-Architected Framework series today with Reliability. If you missed the previous installments, follow the links below to catch up:

  • Pillar 1: Operational Excellence
  • Pillar 2: Security

In software development, a system is only as good as it is reliable, and the dependability of a system depends heavily on the reliability of the environment in which it runs. From a cloud computing perspective, reliability isn’t simply the ability to run and remain running; we covered that when discussing the first pillar, Operational Excellence.

Reliability in the 5 pillars of the AWS Well-Architected Framework deals with facing disasters and recovering from them. The Reliability Pillar also promotes better mitigation of disruptions, including misconfigurations and network issues.

The reliability of your cloud environment greatly affects the business value that environment delivers. The environment, the way it is configured using Amazon resources, and the systems running in it need to be dependable under the most difficult situations.

What Makes an Environment Reliable?

Reliability can be viewed from different perspectives; that is one of the key takeaways of this pillar. As an organization, team, or department, you are the one setting the KPIs and measuring the reliability of the system against specific business requirements and the objectives that need to be achieved. That said, there are a number of factors to focus on when following the guidelines of this pillar.

The most important factor is system or service availability, or simply availability. This is usually expressed as an uptime percentage, as in the AWS Service Level Agreement, and is calculated here on an annual basis. A 99% availability means there must not be more than roughly three days and 15 hours of disruption every year. A 99.99% availability, on the other hand, brings that number all the way down to about 52 minutes.

Availability targets vary across application categories. For real-time applications such as point of sale and e-commerce sites, 99.95% availability is the recommended level. For ATMs and more advanced use cases, hitting the 99.99% mark is the goal. In select cases, cloud engineers must strive for 99.999% availability to achieve the desired standard.
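The downtime budget implied by an availability target follows directly from the percentage. The short Python sketch below shows the arithmetic; the function names are illustrative, and it assumes a 365-day year with availability measured over the whole year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored for simplicity


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in hours, allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def format_downtime(hours: float) -> str:
    """Render a duration given in hours as days, hours, and minutes (rounded to the minute)."""
    total_minutes = round(hours * 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hrs, minutes = divmod(remainder, 60)
    return f"{days}d {hrs}h {minutes}m"


# Print the yearly downtime budget for a few common targets.
for target in (99.0, 99.9, 99.95, 99.99, 99.999):
    print(f"{target}% -> {format_downtime(annual_downtime_hours(target))}")
```

Each additional "nine" shrinks the yearly budget by a factor of ten, which is why moving from 99.9% to 99.99% usually calls for architectural changes rather than incremental tuning.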

Basic Design Principles

The main objective is an environment designed to sustainably support the systems it hosts; one that can deal with disruptions and failures in an effective way. This means there are certain parts of the cloud environment and the policies around them that must follow the basic design principles of this pillar, which are:

  • Test recovery procedures: As discussed at the beginning of this article, the risks faced by the cloud environment and its systems, the points of failure for systems and ecosystems, and the most probable attacks are known and can be simulated. These insights make it possible to test recovery procedures. Real points of failure are exercised, and the way the environment reacts to the emergency shows just how reliable the system is.
  • Automatic recovery from failure: Once again, automation, one of the strong suits of Amazon Web Services, plays an important role in keeping an AWS environment reliable. Using logs and metrics from CloudWatch, you can design a system in which the failures themselves trigger recovery.
  • Scale horizontally to increase aggregate system availability: In other words, the cloud environment should spread the load across multiple smaller, redundant resources so that the failure of any single one has limited impact. Of course, multiple redundancies require good management and maintenance to remain active throughout the environment’s lifecycle.
  • Stop guessing capacity: Resource usage is monitored not just for cost-efficiency, but so the environment can stay at the right size at all times. Having enough resources to deal with spikes in traffic or requests, combined with clear policies and automation, means the AWS environment always has the resources needed by the systems running in it. Scaling up (and scaling back down when needed) is equally easy.
  • Use automation to handle changes: Once again, the five pillars treat human error as a prominent cause of issues and suggest using code and automation to simplify processes like upgrading, adding new EC2 computing power, and bringing more cloud storage space into the environment.
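To see why scaling horizontally raises aggregate availability, consider a rough model in which a service stays up as long as at least one of its redundant copies is up. The Python sketch below works through that arithmetic. It assumes the copies fail independently, which is an idealization (shared dependencies or a common bug can take several copies down at once), and the function name is illustrative.

```python
def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one surviving copy is enough.

    `single` is the availability of one component as a fraction (0.99 means 99%).
    Assumes failures are independent, which real deployments only approximate.
    """
    if not 0 <= single <= 1:
        raise ValueError("single must be a fraction between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The service is down only when every copy is down at the same time.
    return 1 - (1 - single) ** copies


# Compare one, two, and three redundant copies of a 99%-available component.
for n in (1, 2, 3):
    print(f"{n} copies -> {parallel_availability(0.99, n):.6f}")
```

Under this model, two copies of a 99%-available component already reach roughly 99.99% in aggregate. In practice the gain is smaller, because the load balancer, health checks, and failover logic in front of the copies have their own failure modes.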

Elements of High Availability

Both AWS Trusted Advisor and the AWS Management Console give you access to information about system and resource usage, usage patterns, and more. It takes a short period of time to gather these details, but they are valuable if you are serious about designing a highly available app.

High availability is not just jargon. Creating and maintaining a highly available environment requires capable Amazon EC2 instances, along with supporting services such as Elastic Load Balancing and good monitoring tools, including Amazon CloudWatch. It is also necessary to introduce multiple redundancies across the environment.

That takes effort, but it is doable. More importantly, a highly available environment is worth pursuing for the long list of benefits it offers in return and the boost in user experience delivered in the process. With the design principles and guidelines found in this article, you can establish the pillar of reliability for your cloud environment.

To sign up for a Well-Architected Review with Ibexlabs, contact us here. As APN Partners, the team at Ibexlabs can assist in making business recommendations surrounding the implications of AWS work-based designs and infrastructure. Following the review, Ibexlabs will advise an organizational roadmap to scale your business in accordance with your short to long-term goals based on the AWS Well-Architected Pillars.

AWS will also provide up to $5,000 worth of AWS credits for remediation for all customers who sign up with an AWS APN Partner for the AWS Well-Architected Program.

Ibexlabs is an experienced DevOps & Managed Services provider and an AWS consulting partner. Our AWS Certified DevOps consultancy team evaluates your infrastructure and makes recommendations based on your individual business or personal requirements. Contact us today and set up a free consultation to discuss a custom-built solution tailored just for you.

Narendar Nallamala

Narendar is the Lead Solution Architect, and Co-Founder of Ibexlabs India.
