Building a Serverless Discord Bot on AWS

Discord is a pretty exciting platform. One of its biggest features is its support for bots, which lets developers create some incredible tools for server moderators. In 2020, however, Discord introduced two major new features that don’t have a ton of example use cases around them yet: Slash Commands and Interactions Endpoints. These two things are incredibly powerful, and in today’s post I want to take a deep dive into how you can use Interactions Endpoints to build an entirely serverless Discord bot.

If you don’t want to read through this and instead want to jump straight into the code, you can see the full repository for this here. Otherwise, read on for a full explanation of how the code works!

Why Serverless?

I’m not going to regurgitate what a serverless architecture is yet again (Google that if you are unfamiliar with it), but I do want to go over its benefits, especially in the context of building a Discord bot. Serverless architectures generally provide two major benefits over traditional server architectures: high scalability and low costs. The low costs generally come from the fact that you only pay for what you use, and the high scalability generally comes from the fact that you can more easily distribute your execution across multiple nodes.

In the case of a Discord bot, where the bot only needs to respond to specific commands sent to it, a serverless architecture is actually a perfect fit. In fact, thanks to Amazon Web Services’ (AWS) generous free tier for Lambda, you can likely run your Discord bot for nearly free. I say “nearly” because you will still need to pay for API Gateway to allow for the calls into the Lambda, but this is still relatively cheap compared to hosting a continuously running server for your bot.

And again, you’re only charged for what you use. So if you get no calls to your bot for a month, then you’ll pay nothing at all!

Okay, what does the setup look like?

The setup for a serverless Discord bot is actually rather straightforward in AWS. Here’s a quick architecture diagram of what the bot will look like:

That looks simple enough, right?

As you can see, we have our Discord server, which hits our API Gateway and forwards the request to our specific endpoint. That endpoint then goes to our Lambda, which then… hits another Lambda? Wait, what’s with this “AWS Secrets Manager”? Don’t worry, all of this will make sense in due time.

Let’s start with the big question: Why do we have two Lambdas in this setup? This is primarily due to a big challenge that Discord’s Interactions Endpoint gives us: we must respond to any request within 3 seconds! That might not sound like a big deal, but Lambda cold start times make that window a major challenge on the first invocation. There’s no way to increase this limit, and sadly, Discord seems to have no plans to change it.

You might be asking now, “why can’t we just make our code even faster?” The problem is that the cold start setup time and API Gateway latency alone will eat up nearly 1 second of that precious window. You can see a comparison of various languages and their startup times here too, in case you think a different language might help. Unfortunately, to do anything more intense than a simple “Hello World” response, we’re going to need to invoke another Lambda, after which we can use Discord’s Web APIs to actually send our response to a user.

Okay, so what about this weird “AWS Secrets Manager”? Couldn’t we get rid of that to speed things up too? Why do we even need it? Sure, you could remove it and save about 500ms, but your request validation will likely need even bigger speedups (TweetNacl, which I use, adds about a second of latency when validating requests). Removing it would also expose your Discord API keys in both your Lambda and your CloudFormation stacks, which I would not recommend. AWS Secrets Manager is a secure place to put all of your API secrets, and lets us store the keys separately from the rest of our code. It also makes it easy to rotate the keys should they ever become compromised.

Alright, just show me the code!

The code for this is all pretty straightforward actually, and I’ve gone ahead and broken it into a nice CDK Construct for you to use here if you don’t want to read through all of this. If you’re curious about how the code works however, read on!

Let’s start with our CDK Construct setup:
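The full construct is in the linked repository; here’s a minimal sketch of the shape it takes. This assumes aws-cdk-lib v2, and the construct name, asset paths, and environment variable names are all hypothetical placeholders, not the repository’s exact code.

```typescript
import { Construct } from "constructs";
import { Duration } from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as apigateway from "aws-cdk-lib/aws-apigateway";
import * as secretsmanager from "aws-cdk-lib/aws-secretsmanager";

export class DiscordBotConstruct extends Construct {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // An empty Secret: populate the Discord keys by hand after deployment,
    // so they never appear in the generated CloudFormation template.
    const discordSecrets = new secretsmanager.Secret(this, "DiscordSecrets");

    // The "slow" Lambda that does the real work and replies via Discord's Web API.
    const commandFn = new lambda.Function(this, "CommandFunction", {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: "command.handler",
      code: lambda.Code.fromAsset("functions"),
      timeout: Duration.seconds(60),
    });

    // The "fast" Lambda behind API Gateway: validates the request and
    // fires off the command Lambda asynchronously.
    const interactionsFn = new lambda.Function(this, "InteractionsFunction", {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: "interactions.handler",
      code: lambda.Code.fromAsset("functions"),
      environment: {
        COMMAND_LAMBDA_ARN: commandFn.functionArn,
        DISCORD_SECRET_ARN: discordSecrets.secretArn,
      },
    });

    discordSecrets.grantRead(interactionsFn);
    discordSecrets.grantRead(commandFn);
    commandFn.grantInvoke(interactionsFn);

    // The endpoint Discord's Interactions Endpoint setting will point at.
    const api = new apigateway.RestApi(this, "DiscordBotApi");
    api.root
      .addResource("discord")
      .addMethod("POST", new apigateway.LambdaIntegration(interactionsFn));
  }
}
```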

First, we create our Secret to store all of our Discord API keys in, since it will need to be referenced in our Lambdas. We don’t populate it here, however, so it’s important to remember that you will need to populate it before you can use your bot. This is to prevent your keys from leaking into your generated CloudFormation stacks. Next, we create our Lambda function, passing in the ARN of the command function it will invoke as an environment variable. Finally, we create our API Gateway and point it at our Lambda, which makes up the bulk of our CDK code.

Discord expects our API Gateway to respond with one of two responses: either a 200 for success, or a 401 for errors (specifically, failed request validation). Thus, we can simply use a selection pattern on our Lambda’s response to determine which status code to return, passing through the rest of the response on success. We’ll also use a request template to nicely map our inputs for our Lambda. Very straightforward!
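The wiring described above might be sketched as plain objects (the property names follow aws-cdk-lib’s aws-apigateway integration options; the `[UNAUTHORIZED]` error marker is my assumption for what the Lambda throws on a failed validation):

```typescript
// Integration responses: pass successful Lambda results straight through as a 200,
// and map any error whose message contains "[UNAUTHORIZED]" to a 401.
const integrationResponses = [
  {
    statusCode: "200",
  },
  {
    statusCode: "401",
    // A regex matched against the Lambda's errorMessage field.
    selectionPattern: ".*\\[UNAUTHORIZED\\].*",
    responseTemplates: {
      "application/json": "invalid request signature",
    },
  },
];

// Request template: forward the raw JSON body plus the two signature headers
// Discord sends, so the Lambda can validate the request.
const requestTemplates = {
  "application/json": [
    "{",
    "  \"timestamp\": \"$input.params('x-signature-timestamp')\",",
    "  \"signature\": \"$input.params('x-signature-ed25519')\",",
    "  \"jsonBody\": $input.json('$')",
    "}",
  ].join("\n"),
};
```

Note that `jsonBody` is deliberately unquoted in the template, since `$input.json('$')` already expands to a JSON object.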

As you can see, we provide a Lambda as one of the inputs to our API Gateway endpoint, so let’s take a look at what the Lambda code looks like:
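The real handler lives in the linked repository; here’s a self-contained sketch of the same flow, with two deliberate deviations that are my own assumptions: it uses Node’s built-in crypto module (which can also verify Ed25519 signatures) in place of TweetNacl so the sketch has no dependencies, and the command-Lambda invocation is injected as a parameter rather than calling the AWS SDK directly.

```typescript
import { createPublicKey, verify } from "node:crypto";

// Discord publishes a raw 32-byte Ed25519 public key as hex; node:crypto wants
// a KeyObject, so we wrap the raw key in the standard SPKI DER prefix.
const ED25519_SPKI_PREFIX = Buffer.from("302a300506032b6570032100", "hex");

export function verifyDiscordRequest(
  publicKeyHex: string,
  signatureHex: string,
  timestamp: string,
  body: string
): boolean {
  const key = createPublicKey({
    key: Buffer.concat([ED25519_SPKI_PREFIX, Buffer.from(publicKeyHex, "hex")]),
    format: "der",
    type: "spki",
  });
  // Discord signs the timestamp concatenated with the raw request body.
  return verify(null, Buffer.from(timestamp + body), key, Buffer.from(signatureHex, "hex"));
}

// Hypothetical event shape matching the API Gateway request template.
interface BotEvent {
  timestamp: string;
  signature: string;
  jsonBody: { type: number; [key: string]: unknown };
}

export async function handler(
  event: BotEvent,
  publicKeyHex: string,
  invokeCommandLambda: (event: BotEvent) => Promise<void>
) {
  const bodyStr = JSON.stringify(event.jsonBody);
  if (!verifyDiscordRequest(publicKeyHex, event.signature, event.timestamp, bodyStr)) {
    // Matches the API Gateway selection pattern, which maps this to a 401.
    throw new Error("[UNAUTHORIZED] invalid request signature");
  }
  if (event.jsonBody.type === 1) {
    return { type: 1 }; // PING -> PONG
  }
  // Kick off the slow Lambda, then tell Discord "response coming later" (type 5).
  await invokeCommandLambda(event);
  return { type: 5 };
}
```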

That’s really not a lot of code now, is it? We take in our event, validate it using TweetNacl, invoke our command Lambda (more on that later), and then finally return a response. Wait, weren’t we supposed to avoid responding directly because it’s so slow? This is where Discord gets a bit wonky: we’re returning a response with a type of 5, which basically says “hey, we’re going to update this response later, so just hang tight”. You can read all about the Interaction Response Types in Discord’s documentation.

You can also see that this handles ping requests by returning a quick pong response. In the case where an incoming request is invalid, we simply throw an exception that matches the selection pattern we set up in our API Gateway. If you’re curious about how we handle getting the Discord secrets, you can see that here. Ultimately, this piece of code is pretty short, doesn’t have a lot to it, and will be significantly faster once it is warmed up.
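As a sketch of what that secrets helper might look like: fetch the keys once per container and cache them, so warm invocations skip the Secrets Manager round trip. The field names are assumptions, and the fetcher is injected so that, in real code, it could wrap `GetSecretValue` from `@aws-sdk/client-secrets-manager`.

```typescript
// Hypothetical shape of the stored secret -- field names are assumptions.
export interface DiscordSecrets {
  appId: string;
  publicKey: string;
  authToken: string;
}

// Module-level cache: survives for the lifetime of the warm Lambda container.
let cached: DiscordSecrets | undefined;

export async function getDiscordSecrets(
  fetchSecretString: () => Promise<string>
): Promise<DiscordSecrets> {
  if (!cached) {
    // Only hit Secrets Manager on a cold start.
    cached = JSON.parse(await fetchSecretString()) as DiscordSecrets;
  }
  return cached;
}
```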

So what about a simple command Lambda function? Here’s an example file for you to test with:
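A minimal sketch of such a command Lambda might look like the following. Discord lets the bot edit the deferred (type 5) response by PATCHing the interaction’s webhook; the event shape here (application ID and interaction token forwarded from the first Lambda) is an assumption about this setup.

```typescript
// Hypothetical event shape forwarded by the interactions Lambda.
export interface CommandEvent {
  applicationId: string;
  interactionToken: string;
}

export async function handler(event: CommandEvent): Promise<number> {
  // Edit the "@original" deferred message created by the type-5 response.
  // Webhook URLs carry their own token, so no Authorization header is needed.
  const url =
    `https://discord.com/api/v10/webhooks/${event.applicationId}` +
    `/${event.interactionToken}/messages/@original`;
  const response = await fetch(url, {
    method: "PATCH",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content: "Hello world!" }),
  });
  return response.status;
}
```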

In this case, the main difference is that instead of returning our response, we send it off to Discord’s Web APIs. We can also reuse the majority of the code for getting our Discord API keys, which saves us a bit of code. With that, we can now deploy this and get a simple hello world response on our Discord server! Note that you’ll need to set up Slash Commands before this will work properly. You can get started reading about those here.

Why not just do X instead?

Let’s address a few things that might arise as questions here based on how I’ve chosen to do things:

  • Why not just use provisioned concurrency to keep your Lambda functions warm instead of having two of them?
  • Provisioned concurrency is not free, and will likely cost at least $10 USD/month to maintain. You’d be better off going with a cheap EC2 instance in this case. The solution above (assuming you have fewer than 500,000 requests per month since we’re using 2 Lambdas) will end up falling within AWS’ free tier in most cases.
  • Why use TypeScript instead of Python? Python has a shorter cold start time based on the link you shared!
  • Python has several other issues in my personal opinion, including more difficulty managing projects like this. The advantage of using TypeScript end-to-end is pretty large in my opinion, and is the main reason why I went with it (strong typing being another big one). While there’s no reason this couldn’t be done in Python instead, you likely won’t see enough of a boost to fix the cold start problem either.
  • Why not use library X instead of TweetNacl?
  • TweetNacl was what Discord’s APIs recommended, and just what I’m aware of. There’s likely a lot of optimizations to be made around the validation of requests specifically that would be of great benefit here, I’m just not familiar enough with how this works still to do so. This is something I want to investigate in the future, and might write about if I get the time to do so!
  • How much does this cost to run each month?
  • This isn’t an easy question to answer, as it will depend on your use case, how often people use commands on your server, etc. For my own use case (a server I moderate with ~500+ members and very few calls each month), the costs look to be well under $1 USD/month. That said, your mileage may vary, and this may change over time.

