AWS re:Invent 2020 – week 1 review

December 10th, 2020 Written by Adrian Hesketh

AWS re:Invent 2020 kicked off last week, but not in the casino hotels of Las Vegas, as was the case last year. Instead it is being delivered online and spread out across a few weeks rather than in a jam-packed seven days.

Last year, I attended re:Invent in Las Vegas with several Infinity Works colleagues, and I must admit I drank a lot more piña colada then than I have been doing this year… in Leeds.

AWS uses re:Invent to launch its big new services, but there’s also usually a ramp-up of releases ahead of the big event. Sometimes, these pre-re:Invent releases are more interesting to AWS users because they improve the current services.

Of course, AWS doesn’t just save everything up for December; there’s been a constant stream of new services released over the last year. AWS is proud of its track record, averaging nearly 2,000 releases and improvements per year, so I started my week by thinking back and pulling out some of my highlights from the last year.

I keep an eye on AWS releases (I get an automated notification via Slack), but how do you think back over nearly 2,000 press releases? Well, I wrote a program to do it! You can see it here: https://github.com/a-h/reinventrecap

I used this tool to help me search for blog posts in technology that I was interested in to ensure I’d see the most relevant things from the last year. So, here are my top picks…

AWS re:Invent 2019 and the famous piña colada

Last year’s top picks

AWS Lambda supports batch windows of up to 5 minutes for functions with Amazon SQS as an event source – this is helpful if you’ve got a system populating a queue, and you want to process the messages off the back of that queue in batches for efficiency. If your Lambda functions are only processing single SQS messages, and you can afford to wait, this can improve the efficiency.
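As a rough illustration of what batch processing looks like from inside the function, here is a minimal Python sketch of a handler that receives several SQS messages in one invocation. The message shape (JSON bodies with an `amount` field) is a made-up assumption for the example; the batch window itself is configured on the event source mapping (the `MaximumBatchingWindowInSeconds` setting), not in the handler code.

```python
import json


def handler(event, context):
    """Process one batch of SQS messages delivered in a single invocation."""
    records = event.get("Records", [])
    # Each record carries the raw message string in "body".
    # Assumption for this sketch: every body is JSON with an "amount" field.
    orders = [json.loads(record["body"]) for record in records]
    # Work on the whole batch at once, e.g. one bulk write instead of N writes.
    total = sum(order["amount"] for order in orders)
    return {"processed": len(orders), "total": total}
```

The longer the window, the more messages arrive per invocation, so the per-invocation overhead is shared across more work, at the price of extra latency for each message.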

AWS Step Functions increases payload size to 256KB – this is such a basic change, but huge. AWS Step Functions only used to handle 32KB of state, which can be very limiting because it even includes the names of the JSON keys. AWS Step Functions and Lambda are often used very closely together, and AWS Lambda supports 256KB of payload, so it makes sense for these to be the same. Ghandhar Tannu from AWS asked us for feedback on this a few months before it was launched, and I asked for exactly this – so I can confidently say that AWS does listen if you provide feedback.
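To see why key names matter, here is a small Python sketch that measures a payload the way the limit counts it, as serialised JSON. The 256KB figure is the one quoted above; the helper names and the sample keys are hypothetical.

```python
import json

# Limit quoted in the post: 256KB of state.
STEP_FUNCTIONS_LIMIT_BYTES = 256 * 1024


def payload_size_bytes(payload):
    """Size of the payload as compact UTF-8 JSON, key names included."""
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def fits(payload, limit=STEP_FUNCTIONS_LIMIT_BYTES):
    """True if the serialised payload is within the limit."""
    return payload_size_bytes(payload) <= limit


# Same data, different key names: the long keys cost more bytes per item.
short_keys = [{"id": i} for i in range(1000)]
long_keys = [{"customerOrderIdentifier": i} for i in range(1000)]
print(payload_size_bytes(short_keys), payload_size_bytes(long_keys))
```

A check like this in a unit test is a cheap way to find out that a state machine input is creeping towards the limit before it fails in production.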

Amazon EventBridge adds Server-Side Encryption (SSE) and increases default quotas – during a security review I was doing, it caught me by surprise that EventBridge wasn’t encrypted at rest by default. I couldn’t believe that AWS introduced a new service that wasn’t encrypted by default, so it was great to cross a risk off my list by seeing this added to the service.

This isn’t strictly within the last year, but I think it’s so important that it’s worth pointing out CloudWatch Embedded Metric Format. When I first saw this change, I wasn’t really interested in it because I was midway through a project and we’d already nailed our metrics strategy. But this year when I was starting a new project I decided to use it and it saved us a lot of time. We could use the client SDK, drop a line or two into the code that writes to the logs, and we got a CloudWatch Metric that we could report on, without having to scrape the logs or harm our latency by using an API push.
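For a feel of what ends up in the logs, here is a minimal Python sketch that builds an Embedded Metric Format log line by hand rather than through the client SDK mentioned above. The namespace, dimension and metric names are made up for the example.

```python
import json
import time


def emf_log_line(namespace, service, metric_name, value, unit="Milliseconds"):
    """Build one JSON log line in CloudWatch Embedded Metric Format."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["Service"]],
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }],
        },
        "Service": service,   # dimension value, referenced by name above
        metric_name: value,   # metric value, referenced by name above
    })


# In Lambda, stdout goes to CloudWatch Logs, so printing the line is enough.
print(emf_log_line("ExampleApp", "checkout", "Latency", 42))
```

The metric is extracted from the log line asynchronously, which is why the request path doesn’t pay for an API call.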

Snowcone was introduced in the summer. It’s the newest, smallest member of the AWS Snow family – a set of capabilities to migrate storage into the cloud – providing 4GB of RAM, 2 CPUs and 8TB of storage at the edge. I can see this being useful either for migrating on-premises data, or for simply providing cost-effective temporary compute to edge locations, such as warehouses or retail buildings, without having to send an engineer out.

When I first saw PartiQL for DynamoDB being announced, I was both intrigued and horrified. I was intrigued that there might be some new features coming to DynamoDB that changed its model completely and made a new query language something that would be useful, but also worried because one of the nice things about DynamoDB is that it is predictable due to its simplicity. Most SQL implementations are Turing complete, so you can write a SQL query that is impractical to execute, and many developers seem to like to have a go at proving this.

I couldn’t understand at all why AWS would think it was a good idea to add PartiQL, but then I was reading some documentation and examples for DynamoDB, and I realised that they were all programming language specific. Maybe the key advantage is to their documentation writers: they can write one post and it’s language independent? Or maybe it’s part of a bigger “one language for data” strategy. Other folks are telling me that the PartiQL API transparently executes a Scan operation on DynamoDB if required by the query, which sounds like a complete recipe for disaster if it’s true (I haven’t confirmed that yet). I’m not sure how I feel about this one, but I’m not queueing up to use it.

However, exporting DynamoDB data to S3 is something I can get behind. Previously, this required streaming data from DynamoDB Streams into a Kinesis Data Firehose and all that jazz. Now, it’s a lot simpler. I use this to get data out of the transactional APIs that power customer journeys and into analytics platforms like Amazon Athena and Snowflake, for queries that aggregate across related data.

I’m not a big relational database user these days (I can’t think of where I’d use one as a transactional store in a new system), but I’ve got customers that use them for various off-the-shelf apps, and I can see why they’re a good choice for some development teams. AWS’s own Graviton series of ARM processors offers a better cost / performance ratio than Intel chips running in AWS, so I took notice when AWS announced RDS can now run on Graviton2 processors.

In the same vein, I can’t think of where I’d use Microsoft SQL Server in a new system, but during cloud migrations, I regularly encounter the use of SQL Server Reporting Services. This used to be a problem for SQL Server migrations, because SQL Server Reporting Services wasn’t supported on Amazon RDS, meaning that more administration was required. So in May, when RDS announced support for it, I was pretty sure some of my customers would find that handy.

Just before re:Invent, AWS pushed out a long-awaited managed version of Apache Airflow – a tool that’s used to configure and execute data pipeline workflows. Airflow solves the problem of wasted time between import / export jobs, the old-fashioned “start the export at midnight”, “start the import at 1am” type thing, and all of the problems that causes when jobs don’t finish on time. Infinity Works has lots of customers taking advantage of Airflow, so some of our teams jumped straight on it for a look around.

Finally, DynamoDB just keeps adding good stuff without me having to do anything to take advantage. In this case, faster restores. If you’re restoring a DynamoDB table from a backup, things have gone very wrong on a lot of levels, but waiting around for a restore is a pain you can do without, so I was pleased to see that.

re:Invent 2019 in Las Vegas

Week 1

I’ve talked about the last year, but what were the key takeaways from week 1 of re:Invent 2020? For me, the week started with the keynote from Andy Jassy, the CEO of AWS. It launched a slew of new services and a number of updates to existing services. Here are the new releases that interested me the most…

Infinity Works is an AWS Lambda Service Delivery Partner, so we were given advance access to the new Lambda capabilities of launching Lambda Functions from Docker containers, and the bump in maximum RAM and CPU ahead of time under NDA so that we could feed back.

The key benefits of the Docker container support are probably going to be most keenly felt by Python users, where packaging Python dependencies into a 50MB zip file can be a problem. The increase in RAM from 3GB to 10GB is huge; I can see most use cases here being in data processing and loading large machine learning models. The maximum of 6 vCPUs makes Lambda a viable target for multi-threaded churning through that data on-demand, rather than having to start up a Docker container.

We also saw a billing change in AWS Lambda. The billed duration of a Lambda invocation used to be rounded up to the nearest 100ms, but is now rounded up to the nearest millisecond instead. The price of Lambda invocations hasn’t been an issue in any company I’ve seen – API Gateway’s pricing of $1.50 per 1M requests tends to be a bigger consideration – but any price reduction is a good thing.
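To put numbers on the rounding change, here is a small Python sketch of the arithmetic. The per-GB-second price is an illustrative assumption passed in as a parameter, not a quoted rate, and the function names are my own.

```python
def billed_ms(duration_ms, granularity_ms):
    """Round a whole-millisecond duration up to the next billing increment."""
    return -(-duration_ms // granularity_ms) * granularity_ms


def cost_usd(duration_ms, memory_mb, granularity_ms, price_per_gb_second):
    """Duration charge for one invocation (excludes the per-request charge)."""
    seconds = billed_ms(duration_ms, granularity_ms) / 1000
    return (memory_mb / 1024) * seconds * price_per_gb_second


# Assumed price, for illustration only; check the current price list.
PRICE = 0.0000166667

# A 101ms invocation at 512MB: billed as 200ms before, 101ms now.
old = cost_usd(101, 512, granularity_ms=100, price_per_gb_second=PRICE)
new = cost_usd(101, 512, granularity_ms=1, price_per_gb_second=PRICE)
print(f"old={old:.10f} new={new:.10f}")
```

The saving is biggest for short functions that land just over a 100ms boundary, where close to half of the old billed duration was rounding.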

The new AWS Babelfish caused a bit of interest in the community. It’s a tool that automatically rewrites T-SQL code targeting Microsoft SQL Server and makes it Postgres compatible. This allows applications written for Microsoft SQL Server to be ported instantly to Aurora Postgres without being rewritten. My initial thought, based on developing on SQL Server for over 10 years, was that there’s no way it could possibly work, and even if it looked like it worked, there’s no way I’d trust it. Then I thought about the JavaScript Web development workflow – engineers write one type of JavaScript or TypeScript code, test it on x86 processors, then run it through a series of Babel plugins to spit out browser-y JavaScript. That code then runs in a JavaScript virtual machine on top of all sorts of ARM and x86 processors. Then I thought about how many database migrations AWS has done, and the experience AWS has gained by delivering the AWS Database Migration Service. Maybe it could work, after all.

In the compute space, ECS Anywhere and EKS Anywhere are very interesting. They allow Docker containers to be executed in your own on-premises data centre, at the edges of your network, or even in other cloud providers, while being managed by AWS ECS and EKS. This sounds like a really useful feature for organisations that want to migrate to the cloud, want to unify their management strategy, or have integrations with hardware. I spent a year or two working at an organisation shipping orders using warehouse robotics. Being able to manage the systems that we integrated with, and to deploy new versions remotely from the cloud using the same DevOps tooling as the rest of our systems, would have been very useful. I can also think of some companies that might like to encrypt data that’s going to be stored within AWS but retain complete control over security keys, and this sounds like it could help there.

Aurora Serverless is a relational database service that scales up and down automatically. It’s a great solution for spiky workloads and non-production systems, and can dramatically lower the costs of using relational databases. The downside of the scalability was that the service could be slow to scale up, and a lot slower to scale down, but all of that appears to have been resolved in version two. I’ve not looked into it much, but I made a note to even more strongly consider Aurora Serverless if I need a relational database.

In the keynote, AWS Proton sounded like something I should be interested in – a “fully-managed application deployment service for container and serverless applications”. I thought I was going to get a slick replacement to the clunky CodeBuild suite, which I’m really not a fan of, and maybe a rethink of the AWS Serverless Application Repository. Unfortunately, taking a look at the sample apps, it’s definitely not something I’m interested in. I saw a CodeBuild template, embedded in a CloudFormation template, embedded in a Jinja template. The “serverless app” was some Lambda functions and a DynamoDB table, but importantly… wasn’t a Serverless Framework or SAM app. AWS Proton looks like a recipe for wasted days to me, so I’ll be skipping that.

Andy Jassy said he wasn’t going to talk about much machine learning stuff, then proceeded to list a number of interesting launches in the space. AWS Glue Elastic Views sounds useful to me, a way of maintaining an aggregated view across data sources, while SageMaker Data Wrangler and SageMaker Feature Store sound like they might help deal with some of the workflow hassle around getting data into the right shape for machine learning training. I haven’t looked at it yet, but if you can collate your organisation’s data together with AWS Glue Elastic Views, then extract useful attributes from that data and convert them into encoded “features” with SageMaker Data Wrangler (e.g. by normalising values between 0 and 1, or by using encodings such as one-hot or frequency encoding), then store those features in SageMaker Feature Store, where they can be exported into datasets used to automatically retrain models on changes, well, that sounds like a potentially great workflow.
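For anyone unfamiliar with the encodings mentioned, here is a minimal plain-Python sketch of the three transformations. It doesn’t use SageMaker Data Wrangler or any AWS API; the function names are my own and the inputs are made up.

```python
from collections import Counter


def min_max(values):
    """Scale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """One column per category (sorted), with a 1 in the matching column."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def frequency_encode(values):
    """Replace each category with the fraction of rows that share it."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]


print(min_max([10, 20, 30]))
print(one_hot(["a", "b", "a"]))
print(frequency_encode(["a", "b", "a", "a"]))
```

One-hot encoding grows a column per category, so frequency encoding is often the more practical choice when a field has thousands of distinct values.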

I watched a few talks during week 1, but the key highlights were the launches for me.

My pick of the week for this week is “I didn’t know SAM could do that!”. I’m a heavy user of Serverless Framework, but I keep an eye on SAM to see how it’s coming along. I’m genuinely interested in what Ajay Nair (the product manager of AWS Lambda) will have to say in his talk “AWS re:Invent | Building revolutionary serverless applications”. Who knows, maybe there will be even more launches…

Right, that’s enough for this week; it’s time to find out how you make a piña colada at home.