MAG-O had some concerns about using Serverless, particularly for serving the storefront in a highly competitive e-commerce market. Through extensive use of caching, the right choice of frameworks and languages, and careful optimisation of the Serverless AWS services we used, we delivered a large performance increase over their existing platform while removing the cost of keeping EC2 instances permanently deployed.
For infrastructure and application deployment, the initial plan was to use the Serverless Application Model (SAM) CLI to deploy the required Lambda Functions, and this was an acceptable solution at the start of the project. However, as the project progressed and the CI/CD pipeline was developed, we decided to use CloudFormation, CodeBuild, CodeDeploy and Octopus Deploy instead, to align this solution's deployment more closely with the process MAG-O uses for the rest of their projects. The final deployment solution followed best practices for building and deploying, while letting Octopus manage some configuration settings and the release management.
Solution Characteristics
For an e-commerce platform, the storefront was the main component to be performance tested, as page load times and the ability to scale to serve thousands of customers were of utmost importance to MAG-O. We used the Gatling load testing tool to make randomised page requests, designing the tests carefully to produce a realistic mix of hits and misses on the caching layer so that the load fell on our services rather than being absorbed by CloudFront.
Throughout load testing, we showed that load on the system actually improved storefront performance: with more warm Lambda execution environments available, cold starts had less impact, bringing page load times below 300ms for a non-cached, server-side rendered page.
As the Lambda Functions we developed had specific use cases, each one was given limited permissions to access other AWS resources. Each Lambda Function started with no permissions; then, as the need arose, only the minimum privileges required to perform specific operations on specific resources were added. This least-privilege approach kept the services secure by minimising external attack vectors, kept systems compliant and reduced misconfiguration incidents.
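A per-function policy built this way ends up as an IAM policy document that names exact actions and exact resource ARNs. The helper below is a hypothetical Python sketch, not the project's tooling: it builds such a document and refuses wildcard actions or a `*` resource, mirroring the "start with nothing, grant explicitly" rule. The DynamoDB table, region and account ID in the usage example are placeholders; the source does not say which services the functions accessed.

```python
import json


def least_privilege_policy(actions: list[str], resource_arns: list[str]) -> dict:
    """Build an IAM policy document that allows only the given actions
    on the given resources, rejecting wildcard grants."""
    if not actions or not resource_arns:
        # Functions start with no permissions: every grant must be explicit.
        raise ValueError("at least one action and one resource are required")
    for action in actions:
        # Reject "*" and service-wide grants such as "dynamodb:*".
        if action == "*" or action.endswith(":*"):
            raise ValueError(f"wildcard action rejected: {action}")
    if "*" in resource_arns:
        raise ValueError("wildcard resource rejected")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(set(actions)),
                "Resource": sorted(set(resource_arns)),
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical function that only needs to read items from one product table.
    policy = least_privilege_policy(
        ["dynamodb:GetItem"],
        ["arn:aws:dynamodb:eu-west-2:123456789012:table/Products"],
    )
    print(json.dumps(policy, indent=2))
```

Keeping one narrowly scoped policy per function means a new requirement shows up as an explicit, reviewable diff to that function's policy rather than as a silently broadened shared role.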
Due to the use of caching for the storefront and the architecture of the OMS, concurrency limiting wasn't required for either solution. Step Functions also allowed the system to replay failed executions if required, ensuring customers' orders were fulfilled.
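The source does not describe how replays were triggered. One plausible approach is sketched below in Python, assuming a boto3 Step Functions client: list the state machine's `FAILED` executions, read each one's original input with `describe_execution`, and start a new execution with that input. This re-runs the whole workflow from the beginning, so it is only safe if the steps are idempotent; it also does not record which failures were already replayed. Newer versions of Step Functions additionally offer a redrive feature that resumes a failed execution from the failed step, which may be a better fit.

```python
import uuid


def replay_failed_executions(sfn, state_machine_arn: str) -> list[str]:
    """Start a fresh execution, with the original input, for each FAILED execution.

    `sfn` is a boto3 Step Functions client (or any object with the same methods).
    Returns the ARNs of the newly started executions.
    """
    started = []
    request = {"stateMachineArn": state_machine_arn, "statusFilter": "FAILED"}
    while True:
        page = sfn.list_executions(**request)
        for execution in page["executions"]:
            # Fetch the original input so the replay processes the same order.
            details = sfn.describe_execution(executionArn=execution["executionArn"])
            response = sfn.start_execution(
                stateMachineArn=state_machine_arn,
                name=f"replay-{uuid.uuid4()}",  # execution names must be unique
                input=details["input"],
            )
            started.append(response["executionArn"])
        token = page.get("nextToken")
        if not token:
            return started
        request["nextToken"] = token  # follow pagination until all pages are read
```

In use, this would be called as `replay_failed_executions(boto3.client("stepfunctions"), state_machine_arn)`, where `state_machine_arn` is the (hypothetical) order-processing state machine's ARN.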
Several monitoring tools were implemented. Logs were monitored in CloudWatch, and CloudWatch Alarms were created to monitor usage and error states. Step Functions failures were also monitored in CloudWatch and displayed on an internal dashboard, allowing key stakeholders to react quickly to any possible issues. Alongside the AWS tooling, Raygun was used to track .NET exceptions and TrackJS to monitor front-end JavaScript errors.
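The source does not say how the alarms were defined (they may well have lived in the CloudFormation templates). As a hypothetical Python sketch, an error-state alarm can be created with the boto3 CloudWatch client's `put_metric_alarm`, which creates the alarm or updates one with the same name. The alarm names, the one-failure threshold, the five-minute period and the SNS topic are illustrative assumptions.

```python
def create_failure_alarm(
    cloudwatch,
    alarm_name: str,
    namespace: str,
    metric_name: str,
    dimension: tuple[str, str],
    topic_arn: str,
    threshold: float = 1.0,
    period_seconds: int = 300,
) -> dict:
    """Create (or update) a CloudWatch alarm that fires when a failure metric
    sums to at least `threshold` within one `period_seconds` window."""
    params = {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": dimension[0], "Value": dimension[1]}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # periods with no data do not trigger the alarm
        "AlarmActions": [topic_arn],  # e.g. an SNS topic that notifies the team
    }
    cloudwatch.put_metric_alarm(**params)
    return params
```

With `cloudwatch = boto3.client("cloudwatch")`, a Lambda error alarm would use namespace `AWS/Lambda`, metric `Errors` and dimension `("FunctionName", name)`, while a Step Functions alarm would use `AWS/States`, `ExecutionsFailed` and `("StateMachineArn", arn)`.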
This service was developed to replace a series of existing systems that ran within a VPC, isolating internal APIs from the public internet. The Serverless tooling we chose minimised the need to place new infrastructure inside the VPC; however, integration with other systems required one Lambda Function, used for authentication, to run in the same VPC as an internal authentication service.