DevOps and managed services: infrastructure-as-code
Infinity Works proposed building an iOS application and a website to allow businesses to register for credit and manage the sending and receiving of invoices with their customers. To reduce the amount of time spent on infrastructure maintenance, Infinity Works proposed using DevOps techniques such as infrastructure-as-code to configure the solution, and constructing the solution from AWS managed services.
Infinity Works proposed a microservices architecture using Kubernetes, initially run on EC2 and later migrated to the managed Amazon EKS Kubernetes service when it became available in the London region. Kubernetes was selected for its ability to decouple deployments from infrastructure configuration, while the later migration to EKS was carried out to reduce the time spent on infrastructure management.
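As an illustration of the infrastructure-as-code approach, the sketch below shows how an EKS control plane and managed node group might be declared in Terraform; the cluster name, role variables, and subnet variables are placeholders rather than the production configuration.

```hcl
# Minimal sketch of the managed EKS cluster declared as code.
# Role ARNs and subnet IDs are supplied as placeholder variables.
variable "eks_cluster_role_arn" { type = string }
variable "eks_node_role_arn" { type = string }
variable "private_subnet_ids" { type = list(string) }

resource "aws_eks_cluster" "platform" {
  name     = "platform"                 # hypothetical cluster name
  role_arn = var.eks_cluster_role_arn

  vpc_config {
    subnet_ids = var.private_subnet_ids # cluster spans private subnets only
  }
}

# Managed worker nodes, replacing the self-managed EC2 instances
# used before the migration to EKS.
resource "aws_eks_node_group" "workers" {
  cluster_name    = aws_eks_cluster.platform.name
  node_group_name = "workers"
  node_role_arn   = var.eks_node_role_arn
  subnet_ids      = var.private_subnet_ids

  scaling_config {
    desired_size = 3
    min_size     = 3
    max_size     = 6
  }
}
```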
For authentication, Infinity Works initially proposed the use of Auth0. However, delays in obtaining legal approval for the service meant that alternatives were required. Ultimately, Cognito was used to authenticate users on the mobile application and website, because AWS services were already approved and Cognito fulfilled all of the necessary requirements.
Infinity Works proposed the use of API Gateway and PrivateLink through to an internal Network Load Balancer (NLB) interfaced into the Kubernetes cluster. A Kubernetes ingress controller then routes traffic to the various backend services.
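A minimal Terraform sketch of this ingress path is shown below, assuming the REST API and internal NLB already exist; the variable names are placeholders and authorisation is omitted for brevity.

```hcl
# Sketch: expose the internal NLB to API Gateway over a VPC link
# (PrivateLink), keeping the Kubernetes cluster off the public internet.
variable "internal_nlb_arn" { type = string }
variable "internal_nlb_dns_name" { type = string }
variable "rest_api_id" { type = string }
variable "root_resource_id" { type = string }

resource "aws_api_gateway_vpc_link" "cluster" {
  name        = "k8s-ingress"          # hypothetical name
  target_arns = [var.internal_nlb_arn] # the internal NLB in front of the cluster
}

# Proxy matching requests through the VPC link to the NLB, where the
# Kubernetes ingress controller performs routing to backend services.
resource "aws_api_gateway_method" "any" {
  rest_api_id   = var.rest_api_id
  resource_id   = var.root_resource_id
  http_method   = "ANY"
  authorization = "NONE" # authorisation handled elsewhere in practice
}

resource "aws_api_gateway_integration" "cluster" {
  rest_api_id             = var.rest_api_id
  resource_id             = var.root_resource_id
  http_method             = aws_api_gateway_method.any.http_method
  type                    = "HTTP_PROXY"
  integration_http_method = "ANY"
  connection_type         = "VPC_LINK"
  connection_id           = aws_api_gateway_vpc_link.cluster.id
  uri                     = "http://${var.internal_nlb_dns_name}/"
}
```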
API Gateway was selected for its ease of use, security features, and support for automated deployment. Infinity Works proposed migrating to API Gateway from the WSO2 solution that was already in place, because WSO2 did not support code-based configuration, which slowed development and caused production incidents through manual misconfiguration.
All of the microservices that make up the solution produce OpenAPI Specification documents automatically. These documents are then used to automatically configure API Gateway as part of the CI/CD pipeline.
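For example, a generated specification might be fed straight into the gateway definition, so that a pipeline run is the only way the gateway changes; the file path and service name below are placeholders.

```hcl
# Sketch: API Gateway defined from the OpenAPI document generated by a
# microservice's build, so the pipeline keeps the gateway in sync with the code.
resource "aws_api_gateway_rest_api" "service" {
  name = "invoicing-service"                 # hypothetical service name
  body = file("${path.module}/openapi.json") # produced automatically by the build
}

# Redeploy whenever the generated specification changes.
resource "aws_api_gateway_deployment" "service" {
  rest_api_id = aws_api_gateway_rest_api.service.id

  triggers = {
    redeployment = sha1(aws_api_gateway_rest_api.service.body)
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_stage" "live" {
  rest_api_id   = aws_api_gateway_rest_api.service.id
  deployment_id = aws_api_gateway_deployment.service.id
  stage_name    = "live"
}
```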
AWS RDS Postgres is used for databases, with a distinct database cluster per microservice. Postgres was selected due to the team's experience with it, and the availability of relational database skills in the engineering marketplace. RDS was selected because it provides automated, cost-effective database management and allows the team to focus on delivering customer-facing features.
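The per-microservice pattern can be expressed once and stamped out for each service; the sketch below assumes hypothetical service names and sizing, with credentials supplied from a secret store rather than hard-coded.

```hcl
# Sketch: one encrypted, multi-AZ Postgres database per microservice,
# generated from a single definition. Names and sizing are illustrative.
variable "microservices" {
  type    = set(string)
  default = ["invoicing", "payments", "customers"] # hypothetical service names
}

variable "db_password" {
  type      = string
  sensitive = true # sourced from a secret store in practice
}

resource "aws_db_instance" "service" {
  for_each = var.microservices

  identifier          = "${each.key}-db"
  engine              = "postgres"
  instance_class      = "db.t3.medium"
  allocated_storage   = 50
  multi_az            = true
  storage_encrypted   = true
  username            = "app"
  password            = var.db_password
  skip_final_snapshot = false
}
```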
AWS ElasticSearch was selected to deliver a centralised logging service, in conjunction with Jaeger for tracing and log analysis. MongoDB Cloud was originally the NoSQL store of choice; however, Infinity Works proposed migrating to DocumentDB when it became available, due to its high availability and improved Terraform support.
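A DocumentDB cluster managed through Terraform might be sketched as below; identifiers, sizing, and credential handling are placeholders.

```hcl
# Sketch: the DocumentDB cluster that replaces MongoDB Cloud, declared as code.
variable "docdb_password" {
  type      = string
  sensitive = true # sourced from a secret store in practice
}

resource "aws_docdb_cluster" "documents" {
  cluster_identifier  = "documents" # hypothetical identifier
  master_username     = "app"
  master_password     = var.docdb_password
  storage_encrypted   = true
  skip_final_snapshot = false
}

resource "aws_docdb_cluster_instance" "documents" {
  count              = 2 # instances spread across AZs for availability
  identifier         = "documents-${count.index}"
  cluster_identifier = aws_docdb_cluster.documents.id
  instance_class     = "db.r5.large"
}
```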
All backend services make heavy use of automated testing to ensure they operate as expected, and the Pact contract testing framework was selected to validate that microservices meet their contract expectations before any changes reach live environments.
A data pipeline provides data analysis capabilities by using AWS MSK to deliver data to S3, where data lake analysis is carried out using AWS Athena and AWS Glue. Athena and Glue were selected for their cost-effectiveness and simplicity, and they give analytics teams within the organisation access to reporting data without direct access to production datastores or systems.
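A sketch of the analytics side of the pipeline is shown below: a Glue crawler catalogues the data landed in S3 by MSK, and an Athena workgroup gives analysts a place to query it. The bucket name, crawler role, and schedule are placeholders.

```hcl
# Sketch: catalogue the S3 data delivered from MSK so analysts can query it
# with Athena, without direct access to production datastores.
variable "glue_role_arn" { type = string } # IAM role assumed by the crawler
variable "lake_bucket" { type = string }   # S3 bucket that MSK delivers into

resource "aws_glue_catalog_database" "lake" {
  name = "invoice_lake" # hypothetical database name
}

resource "aws_glue_crawler" "lake" {
  name          = "invoice-lake-crawler"
  role          = var.glue_role_arn
  database_name = aws_glue_catalog_database.lake.name
  schedule      = "cron(0 2 * * ? *)" # re-crawl nightly for new partitions

  s3_target {
    path = "s3://${var.lake_bucket}/events/"
  }
}

resource "aws_athena_workgroup" "analytics" {
  name = "analytics"

  configuration {
    result_configuration {
      output_location = "s3://${var.lake_bucket}/athena-results/"
    }
  }
}
```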
Microservices and modern software engineering transform service delivery
Infinity Works transformed how Asto delivers services by introducing microservices architecture and modern software engineering delivery practices. Infinity Works allowed Asto to move away from traditional processes such as ITSM and ITIL that are more suited to service management than custom engineering delivery, and towards automated CI/CD pipelines.
These techniques increase the pace of delivery while reducing the risk of that increased pace by introducing automated functional and security testing. In particular, CircleCI is used to build and deploy changes to environments, while AWS Lambda is used for automation of manual administrative tasks.
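As one hypothetical example of this automation, an administrative task can be packaged as a Lambda function and run on a schedule instead of by hand; the function name, package path, role, and schedule below are placeholders.

```hcl
# Sketch: a previously manual administrative task packaged as a Lambda
# function and triggered on a schedule.
variable "lambda_role_arn" { type = string } # execution role for the function

resource "aws_lambda_function" "housekeeping" {
  function_name = "nightly-housekeeping"            # hypothetical task
  filename      = "${path.module}/housekeeping.zip" # built by the CI/CD pipeline
  handler       = "main.handler"
  runtime       = "python3.9"
  role          = var.lambda_role_arn
}

resource "aws_cloudwatch_event_rule" "nightly" {
  name                = "nightly-housekeeping"
  schedule_expression = "cron(0 1 * * ? *)" # 01:00 UTC every day
}

resource "aws_cloudwatch_event_target" "housekeeping" {
  rule = aws_cloudwatch_event_rule.nightly.name
  arn  = aws_lambda_function.housekeeping.arn
}

resource "aws_lambda_permission" "allow_events" {
  statement_id  = "AllowExecutionFromEvents"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.housekeeping.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.nightly.arn
}
```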
Infinity Works introduced a culture of measurement, surfacing data that demonstrates the health of the delivery process and focussing on the key metrics of:
- Mean Time Between Failures (MTBF)
  - The average amount of time between issues occurring within a system. This is a measure of a team's ability to deliver working, stable services.
- Mean Time to Recovery (MTTR)
  - The average amount of time between a failure occurring and the resolution of that issue. This is a measure of a system's resilience and a team's ability to recover from unexpected failures.
- Percentage of rollbacks
  - The proportion of deployments that reach production but are then rolled back (the change reverted) due to defects not discovered during testing. This is a measure of how well a team is able to deploy working software to a production environment.
Infinity Works helped Asto to leverage best practice on AWS by focussing on the key pillars of the AWS Well-Architected Framework:
Security
Infinity Works ensured that engineers hold only the minimal set of permissions appropriate to their work, with tightly limited access to production environments and break-glass access available so that engineers can support production incidents.
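One way to express the break-glass pattern as code is a separate role that engineers may assume only with MFA, so elevated access is explicit and auditable; the account ID variable and role name below are placeholders.

```hcl
# Sketch: a break-glass role assumable only with MFA, keeping day-to-day
# production permissions minimal while still allowing incident response.
variable "account_id" { type = string } # AWS account containing the engineers' users

resource "aws_iam_role" "break_glass" {
  name = "production-break-glass" # hypothetical role name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::${var.account_id}:root" }
      Action    = "sts:AssumeRole"
      Condition = {
        Bool = { "aws:MultiFactorAuthPresent" = "true" }
      }
    }]
  })
}

# Elevated permissions are attached to the role, not to individual engineers.
resource "aws_iam_role_policy_attachment" "break_glass_admin" {
  role       = aws_iam_role.break_glass.name
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}
```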
AWS security features send network logs (VPC Flow Logs) and audit logs of all actions taken within the AWS accounts for automated security analysis, while GuardDuty, AWS Config, and Trusted Advisor provide continuous monitoring of the platform's security posture, including alerting administrators to issues.
The encryption features of each AWS service used were enabled to provide encryption at rest and in transit.
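These monitoring and encryption building blocks are themselves managed as code; the sketch below enables GuardDuty, sends VPC Flow Logs to CloudWatch, and creates a customer-managed KMS key, with the VPC ID, log group name, and flow-log role as placeholders.

```hcl
# Sketch: continuous-monitoring and encryption primitives declared as code.
variable "vpc_id" { type = string }
variable "flow_log_role_arn" { type = string } # role allowing delivery to CloudWatch Logs

resource "aws_guardduty_detector" "main" {
  enable = true
}

resource "aws_cloudwatch_log_group" "flow_logs" {
  name              = "/vpc/flow-logs" # hypothetical log group name
  retention_in_days = 90
}

resource "aws_flow_log" "vpc" {
  vpc_id               = var.vpc_id
  traffic_type         = "ALL"
  log_destination_type = "cloud-watch-logs"
  log_destination      = aws_cloudwatch_log_group.flow_logs.arn
  iam_role_arn         = var.flow_log_role_arn
}

# Customer-managed key used for encryption at rest, with rotation enabled.
resource "aws_kms_key" "data" {
  description         = "Data-at-rest encryption key"
  enable_key_rotation = true
}
```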
Reliability
The solution used highly available AWS services, including API Gateway and EKS, and AWS RDS for automatic configuration of fault-tolerant database clusters. To ensure that the platform was capable of handling failure, automated fault injection was carried out to simulate random failures.
Scalability and performance efficiency
To ensure that the system was capable of scaling, performance testing was carried out. This testing demonstrated that CPU utilization was not a primary concern, but that the memory footprint of the web framework and programming language meant that memory consumption needed to be carefully managed.
Data from the performance testing was used to drive the selection of appropriate compute, enabling right-sizing and appropriate selection of instance types.
Cost-effectiveness
While cost was not a major consideration in this project, due to the relatively low cost of running the platform, auto scaling was used to shut down unused instances (particularly in non-production environments) to lower overall costs.
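A sketch of that pattern for a non-production Auto Scaling group is shown below; the group name, sizes, and timings are placeholders.

```hcl
# Sketch: scale a non-production Auto Scaling group to zero outside
# working hours and restore it each weekday morning.
variable "non_prod_asg_name" { type = string } # hypothetical ASG name

resource "aws_autoscaling_schedule" "scale_down_evening" {
  scheduled_action_name  = "non-prod-scale-down"
  autoscaling_group_name = var.non_prod_asg_name
  recurrence             = "0 20 * * MON-FRI" # 20:00 UTC on weekdays
  min_size               = 0
  max_size               = 0
  desired_capacity       = 0
}

resource "aws_autoscaling_schedule" "scale_up_morning" {
  scheduled_action_name  = "non-prod-scale-up"
  autoscaling_group_name = var.non_prod_asg_name
  recurrence             = "0 7 * * MON-FRI" # 07:00 UTC on weekdays
  min_size               = 3
  max_size               = 6
  desired_capacity       = 3
}
```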
A future optimization may be to purchase reserved instances; however, because the service's usage is still growing, it is currently difficult to predict the appropriate volume and size of reservations.
Operational excellence
Infinity Works believes that the best way to operate a service is for the team that builds it to also run it in production. This encourages the team to consider the operational characteristics and monitoring of a solution early in the project lifecycle, and ensures that feedback from production running is folded into product development.
Frequent small changes with automated rollbacks allow the solution to be deployed rapidly, while the use of Infrastructure as Code ensures full traceability of changes and allows for comparison between the desired and actual states of infrastructure.
Continuous improvement
The use of blameless incident analysis techniques based on the Google SRE process enables the team to continuously improve the service to customers.
DevOps summary
Infinity Works helped Asto to deliver features at a high cadence, meet their goals of rapid product innovation, and ultimately enable their customers to benefit from the faster pace of delivery.