DeepStream’s infrastructure

At DeepStream, our cloud infrastructure (things like servers, databases, and so on) sits within Amazon Web Services (AWS). AWS is easily the global leader in cloud infrastructure, and its services power a significant chunk of the internet.

So far, we’ve taken a hybrid approach to our cloud infrastructure: some of our system sits on servers that run 24/7 (charged per hour), while the rest is “serverless” (charged on an as-used basis).

The term “serverless” is something of a misnomer, because servers are still used in a serverless system. The difference is that our engineering team doesn’t manage the servers directly. AWS hides these nitty-gritty details, which allows us to focus on writing code (fun) instead of managing servers (not fun).

A defining characteristic of serverless infrastructure is its resilience under load. Using Lambda (AWS’s core serverless service), we can configure a small unit of code (called a “function”) to scale up and down in direct response to demand, and we only get charged for what gets used.
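For illustration, here’s a minimal sketch of what one of these functions can look like in TypeScript (the event shape and names are made up for the example):

interface RequestSentEvent {
  requestId: string;
}

// AWS invokes this handler on demand and scales the number of concurrent
// instances with load; we're only charged for the time it actually runs.
export async function handler(event: RequestSentEvent): Promise<void> {
  console.log(`Processing request ${event.requestId}`);
}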

Downsides

It’s not all perfect. A problem with serverless infrastructure is that there’s a lot of “AWS stuff” (aka “Resources”) to manage. Initially, that might sound like we’re just moving the complexity of managing servers somewhere else without much benefit, but I’ll clarify that later.

The serverless part of our system is for “ephemeral jobs.” These are pieces of code (using Lambda functions) that do “ephemeral” things like send emails or trigger in-app notifications. Most of these jobs are event-driven (using SNS) and buffered with a queue (using SQS). If a job fails to complete, we move the message to a failure queue (using SQS again), which is monitored by an alarm (using CloudWatch) that notifies the engineering team (using SNS).

Here’s an example of what a single job might look like:

SNS Subscription → SQS Queue → Lambda Function ( –failure→ SQS Failure Queue → CloudWatch Alarm)

We currently have 63 jobs per environment (and growing) and 4 environments. Do the math: that’s 252 jobs, each made up of around five AWS resources, which is well over a thousand resources to keep track of. So much that any manual management is not an option.


Introducing the aws-cdk

In the middle of 2019, AWS released a very exciting open source project: AWS Cloud Development Kit (also known as the aws-cdk). The official documentation defines the project as:

a software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation.

CloudFormation is a service which allows the definition of AWS resources (eg: servers, functions, etc) in a template file. As the file is updated, CloudFormation will automatically spin up, update, or tear down the appropriate resources. This sounds great (and it is) but unfortunately, CloudFormation templates are super-not-fun to work with. They’re verbose, difficult to read, and ultimately error-prone. These issues are exacerbated by a relatively slow feedback loop in CloudFormation: one mistake can cost you a lot of time.

The aws-cdk abstracts away the process of creating CloudFormation templates and it is available in many common languages including ones we use at DeepStream: TypeScript and JavaScript. The aws-cdk allows us to define groupings of resources (called “constructs”) in familiar languages, and since these groupings are defined in code, it means we can parameterize them like any other piece of code.

In other words, we can parameterize our infrastructure code with our application code.

The power of this might not be totally apparent, so allow me to elaborate with…

Some code

Note: For the technical people, you’ll probably realize that I’m skipping a lot of details and doing a lot of “hand-waving” here… maybe I’ll do a deeper dive another time.

To be able to dispatch an event (eg: when a request is sent, we want to dispatch a “request sent” event) we first define the possible events with a data structure:

enum EventType {
  REQUEST_SENT = 'requestSent',
}

And in our API we would use the above structure to dispatch the event with the corresponding id of the sent request:

events.dispatch(EventType.REQUEST_SENT, { requestId });
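We won’t dig into events.dispatch itself here, but a hypothetical implementation might simply publish the payload to the event’s SNS topic via the AWS SDK (looking up the topic ARN from an environment variable is an assumption made for this sketch):

import { SNS } from 'aws-sdk';

const sns = new SNS();

// Hypothetical sketch of events.dispatch: publish the payload to the SNS
// topic for this event type. Resolving the topic ARN from an environment
// variable is an assumption made for this example.
async function dispatch(eventType: EventType, payload: object): Promise<void> {
  await sns.publish({
    TopicArn: process.env[`TOPIC_ARN_${eventType}`],
    Message: JSON.stringify(payload),
  }).promise();
}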

To email the recipients of the request, we define a job that takes the request’s id, fetches the corresponding request from our database, generates emails for the recipients, and sends them off using an email client:

job({
  name: 'email-requestReceived',

  subscriptions: [
    EventType.REQUEST_SENT,
  ],

  async start({ requestId }, { emailClient }) {
    const request = await db.requests.findById(requestId);
    const emails = request.recipients.map(generateRequestReceivedEmail);
    return emailClient.sendEmails(emails);
  },
})

How nice is that?

Using the aws-cdk

So. We’ve written some code, but none of the required AWS resources exist yet to glue this all together. This is where we use the aws-cdk to create two constructs (sketched after the list below):

  • An Events construct which takes the EventType structure and generates:
    • An SNS Topic per event
  • A Job construct which takes in a job definition and generates:
    • SNS Subscription(s) to the SNS Topics for the event(s) in the subscriptions field
    • An SQS Queue
    • A Lambda Function that runs the start() function
    • An SQS Queue (for failures)
    • A CloudWatch Alarm (for monitoring failures)
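To make this concrete, here’s a rough sketch of what the Job construct might look like using CDK v1 APIs. The property names and wiring details are assumptions for illustration, not our exact implementation:

import { Construct } from '@aws-cdk/core';
import * as sns from '@aws-cdk/aws-sns';
import * as sqs from '@aws-cdk/aws-sqs';
import * as lambda from '@aws-cdk/aws-lambda';
import * as subs from '@aws-cdk/aws-sns-subscriptions';
import * as cloudwatch from '@aws-cdk/aws-cloudwatch';
import { SqsEventSource } from '@aws-cdk/aws-lambda-event-sources';

interface JobProps {
  topics: sns.ITopic[]; // the SNS Topics for the job's subscriptions
  codePath: string;     // path to the bundled Lambda code
}

// Hypothetical Job construct: wires up the queue, function, failure
// queue, and alarm for a single job definition.
class Job extends Construct {
  constructor(scope: Construct, id: string, props: JobProps) {
    super(scope, id);

    // Failure queue, used as the main queue's dead-letter queue.
    const failureQueue = new sqs.Queue(this, 'FailureQueue');

    // Main queue, buffering events for the Lambda function. After three
    // failed attempts, a message moves to the failure queue.
    const queue = new sqs.Queue(this, 'Queue', {
      deadLetterQueue: { queue: failureQueue, maxReceiveCount: 3 },
    });

    // Subscribe the queue to each of the job's event topics.
    props.topics.forEach((topic) => {
      topic.addSubscription(new subs.SqsSubscription(queue));
    });

    // The Lambda function that runs the job's start() function.
    const fn = new lambda.Function(this, 'Function', {
      runtime: lambda.Runtime.NODEJS_12_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset(props.codePath),
    });
    fn.addEventSource(new SqsEventSource(queue));

    // Alarm as soon as anything lands on the failure queue.
    new cloudwatch.Alarm(this, 'FailureAlarm', {
      metric: failureQueue.metricApproximateNumberOfMessagesVisible(),
      threshold: 1,
      evaluationPeriods: 1,
    });
  }
}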

On deployment of our application, we use these constructs to generate the description of all the necessary resources, and then AWS handles the setup of all of it. The infrastructure becomes a deployment artifact.
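As an illustration, the deployment entry point might look something like this (jobDefinitions, the Events construct, and its topicFor method are assumptions made for the sketch):

import { App, Stack } from '@aws-cdk/core';

// Hypothetical entry point: one stack per environment, with every job
// generated from the same application-level definitions.
const app = new App();
const stack = new Stack(app, `jobs-${process.env.STAGE}`);

// The Events construct creates one SNS Topic per EventType (assumed API).
const events = new Events(stack, 'Events', { eventTypes: Object.values(EventType) });

// One Job construct per application-level job definition (the shape of
// jobDefinitions is assumed for this example).
jobDefinitions.forEach((def) => {
  new Job(stack, def.name, {
    topics: def.subscriptions.map((eventType) => events.topicFor(eventType)),
    codePath: `./dist/${def.name}`,
  });
});

app.synth();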

Need a new event? Extend the EventType structure and redeploy.

Need a new set of emails to go out? Write a new job definition and redeploy.

It’s completely automatic: once the aws-cdk constructs are written, no further thought needs to go into infrastructure. We get a totally automated deployment of a cost-effective, highly resilient, and easily monitored system.

Find out more about DeepStream and how it can revolutionise your tender process here.