Protect yourself and your users

Opening the gates

This is it. Your shiny new product is ready to be released to the public. You worked hard on it; surely it will be a smash hit! A horde of users is impatiently waiting to try it out.

With a confident hand, you click the button. The service is live! Your analytics show that more and more users are coming in. Your advertising campaign and marketing were very effective.

As the numbers grow, you start to feel overwhelmed, and so does your cloud infrastructure. There are simply too many of them. Your bug tracker starts spitting out thousands of reports. Your monitoring goes from green to yellow, to red… You weren’t ready for such a huge amount of traffic. You thought you were, but you weren’t.

Solving the issue

No time to waste, you need to fix this. You can’t let your users’ first experience remain so miserable. You decide to just shut down the service temporarily while you work on this. But now, nobody can use it anymore. This doesn’t feel right.

💡 That’s right! You need to open the gate just a little. This way you let in an amount of traffic that is manageable for your infrastructure.

The remainder will at least be informed that there is no room left for them right now. This is better than getting a terribly slow and malfunctioning experience.

Rate limiting

What you just set up is called rate limiting. It is a crucial component of every system exposed to the internet.

The idea is simple: manage traffic by limiting the number of requests allowed within a given period, such as “100 requests per minute”.

Use cases

Rate limiting solves several problems.

- Stability: by limiting the load, the stress on your infrastructure is alleviated.
- Cost control: one should never auto-scale without limits. You would inevitably receive an invoice from your cloud provider that would single-handedly make you go bankrupt. With clear limits on your service usage, you know what to expect.
- User experience and security: only abusive users will ever hit the rate limit if it’s configured properly. This way, honest users won’t have to suffer for a handful of malicious ones.
- Data control: it is not unusual for a service to be visited by bots or malicious actors trying to extract all the data they have access to. Rate limiting is a great way to hinder scrapers.


No solution is perfect. Rate limiting also has a fair share of drawbacks.

- Complexity: behind this seemingly simple idea lies a ton of complexity. It is not trivial to set up right. There are multiple policies you can use, you will have to carefully calculate and tweak the rate, and some applications also need to correctly handle request bursts.
- User experience: it is a double-edged sword. If a user legitimately reaches the limit, they will get frustrated. It is never fun to have to stop and wait when you’re being productive.
- Scaling up: rate limiting needs to be constantly monitored and tweaked as you scale up. Limits may need to be increased when new features are rolled out. Do you prefer scaling your infrastructure, or squeezing in as many users as possible until it degrades?

With that said, I want to stress that there is no perfect way to do rate limiting. You will have to make your own choices depending on your service and your business.

Going down the rabbit-hole

Things are getting interesting, but more and more complicated. Let’s dive into this rabbit hole and hopefully find out what will work best for you.

Proxy vs App rate limiting

Before anything else, you should ask yourself at which level you need to set up rate limiting. There are two options here:

- The proxy level allows you to rate limit users before they even hit your service. This is the most efficient approach in terms of performance and security. Most cloud providers have built-in solutions to handle this for you.
- The application level allows for more fine-grained control over the quotas. You can vary them depending on whether the user is authenticated or has special permissions. It even lets you monetize your API by granting paying customers a higher limit.

Opting for both solutions can be interesting. You would use the proxy level for DDoS protection and to avoid overloading your services. And you would use the application level alongside it where some business logic comes into play.


Rate limit policies

There are many different policies that can be used to calculate the users’ quotas. All of them have their use cases, so it is again up to you to pick the one that works best for you. Without going too much into detail, we will look at the four most common rate limit policies.

Fixed window

This policy is the simplest: the rate limit is applied within a fixed time window. Everyone has a request counter that is reset at a fixed interval. If the counter exceeds the allowed quota, the request is rejected.

Its main drawback is that it cannot handle burst traffic at all. Imagine you have a quota of 100 requests per minute. When the counters reset for everyone, potentially all your users can send 100 requests all at once.

It can also be very rigid. If the window is too long, users may have to wait a long time before being able to send requests again. If the window is too short, the benefits of rate limiting are reduced.

Sliding window

This policy is an improvement over the fixed window. In short, it tracks the requests made by each user over the last window of time rather than resetting all counters at once.

Let’s say we have a window of 1 minute and a quota of 100 requests. If the user has performed fewer than 100 requests during the last 60 seconds, the request is accepted. The window therefore slides continuously relative to the current time.

This policy is still very rigid but ensures smooth traffic.

Token bucket

The token bucket is a completely different approach. The idea is that every user has a bucket filled with a specified number of tokens. When performing a request, they take one token from the bucket. If the bucket is empty, the request is rejected. The bucket is refilled at a predefined rate until it’s full again.

This policy is great for handling short burst traffic and spikes with a long-term smooth rate limiting.

Leaky bucket

The leaky bucket works a bit like a funnel. Every user starts with an empty bucket that has a hole in the bottom. The hole is more or less wide, letting only a fixed number of requests flow per unit of time. As requests come in, the bucket can fill up faster than it drains. Eventually, the bucket is full and overflows: all new requests are rejected.

In the analogy, the width of the opening at the bottom represents the rate. The depth of the bucket represents the burst.

This policy is the most flexible of all four. It can be adjusted easily depending on the traffic and also smooths out the traffic flow.

HTTP standard

At the time of writing, the closest thing we have to a standard for rate limiting over HTTP is the expired IETF draft draft-ietf-httpapi-ratelimit-headers.

In short, this document defines a set of HTTP headers that can be used to inform the clients of their quotas and the policy used.

Unfortunately, it is hard to know where this is going or if this has been dropped completely. This is the best we’ve got, so let’s roll with it.
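Concretely, a response following the draft carries headers along these lines. The values below are illustrative, assuming a quota of 100 requests per minute with 42 requests remaining and a reset 23 seconds away:

```http
HTTP/1.1 200 OK
RateLimit-Limit: 100
RateLimit-Policy: 100;w=60
RateLimit-Remaining: 42
RateLimit-Reset: 23
```

RateLimit-Policy describes the quota (here, 100 requests per window of 60 seconds), and RateLimit-Reset is the number of seconds until the quota replenishes.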


For our example, we will work at the application level. We will use Go, Redis, and the leaky bucket policy. To avoid having to implement the algorithm ourselves, we will use the go-redis/redis_rate library.

Why do we need Redis?

Redis is a key/value store that we will use to store our user counters. In a distributed system that can scale up to any number of instances, you don’t want counters to be held individually by each instance. That would mean rate limiting happens per instance rather than for your service as a whole, making it basically useless.

Rate limit service

Let’s start by implementing an agnostic service. This way, we can use it with any framework or library easily.

Let’s create a new ratelimit package, import our libraries, and set up the basis for our service:

// service/ratelimit/ratelimit.go
package ratelimit

import (
	"context"
	"fmt"
	"math"
	"net/http"
	"strconv"
	"strings"

	"github.com/redis/go-redis/v9"

	rate "github.com/go-redis/redis_rate/v10"
)

type Service struct {
	limiter *rate.Limiter
	limit   rate.Limit
}

func NewService(redisClient redis.UniversalClient, limit rate.Limit) *Service {
	return &Service{
		limiter: rate.NewLimiter(redisClient),
		limit:   limit,
	}
}
Let’s then expose a simple Allow() method.

// Allow checks whether the given client ID should be rate limited.
// If an error is returned, it should not prevent the user from accessing the service
// (fail-open principle).
func (s *Service) Allow(ctx context.Context, clientID string) (*rate.Result, error) {
	return s.limiter.Allow(ctx, fmt.Sprintf("client-id:%s", clientID), s.limit)
}

We are using the fail-open principle. It is well suited for high-availability services, where it would be more detrimental to block all traffic than to potentially let it flow a bit too much.

For more resource-intensive operations, it would be smarter to use a fail-closed approach to ensure stability even if the rate limiting fails.
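The difference between the two approaches boils down to what you return when the limiter itself errors out. This is a hypothetical standalone sketch (the `allow` stub simulates a limiter whose backend is down; all names are my own):

```go
package main

import (
	"errors"
	"fmt"
)

// allow is a stand-in for a rate limiter call that may fail,
// e.g. because the Redis backend is unreachable.
func allow(clientID string) (bool, error) {
	return false, errors.New("rate limiter backend unreachable")
}

// failOpen lets the request through when the limiter errors out:
// availability is preferred over strict limiting.
func failOpen(clientID string) bool {
	allowed, err := allow(clientID)
	if err != nil {
		return true
	}
	return allowed
}

// failClosed rejects the request when the limiter errors out:
// protecting the backend is preferred over availability.
func failClosed(clientID string) bool {
	allowed, err := allow(clientID)
	if err != nil {
		return false
	}
	return allowed
}

func main() {
	fmt.Println(failOpen("alice"))   // true: the request goes through despite the error
	fmt.Println(failClosed("alice")) // false: the request is blocked
}
```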

Now, we can also implement a simple method that would update the response headers according to the IETF draft previously mentioned.

// UpdateHeaders of the HTTP response according to the given result.
// The headers are set following the IETF draft mentioned earlier (not yet a standard).
func (s *Service) UpdateHeaders(headers http.Header, result *rate.Result) {
	headers.Set("RateLimit-Limit", strconv.Itoa(result.Limit.Rate))
	headers.Set("RateLimit-Policy",
		fmt.Sprintf(`%d;w=%.f;burst=%d;policy="leaky bucket"`, result.Limit.Rate, math.Ceil(result.Limit.Period.Seconds()), result.Limit.Burst),
	)
	headers.Set("RateLimit-Remaining", strconv.Itoa(result.Remaining))
	headers.Set("RateLimit-Reset",
		fmt.Sprintf("%.f", math.Ceil(result.ResetAfter.Seconds())),
	)
}
Finally, we need to identify our users. For unauthenticated users, it can be tricky. Usually, you then rely on the client’s IP. It’s not perfect but sufficient most of the time.

// GetDefaultClientID returns the client IP retrieved from the X-Forwarded-For header.
func (s *Service) GetDefaultClientID(headers http.Header) string {
	// X-Forwarded-For: <client-ip>,<load-balancer-ip>
	// or
	// X-Forwarded-For: <supplied-value>,<client-ip>,<load-balancer-ip>
	// We only keep the client-ip.
	parts := strings.Split(headers.Get("X-Forwarded-For"), ",")
	clientIP := parts[0]
	if len(parts) > 2 {
		clientIP = parts[len(parts)-2]
	}
	return strings.TrimSpace(clientIP)
}

⚠️ This header format is the one used by Google Cloud load balancers. This can be different depending on your cloud provider.

We can now create an instance of our service like so:

opts := &redis.Options{
	Addr:       "", // An empty Addr defaults to "localhost:6379"
	Password:   "",
	DB:         0,
	MaxRetries: -1, // Disable retry
}
redisClient := redis.NewClient(opts)
ratelimitService := ratelimit.NewService(redisClient, rate.PerMinute(200))

Of course, ideally we would not hardcode all those settings. Making them configurable with a config file or environment variables would be best. This is however out of the scope of this article.


Now that we are done with the rate limit service, we need to put it to use in a new middleware.

For this example, we are going to use the Goyave framework. This REST API framework provides a ton of useful packages and encourages the use of a strong layered architecture. We’ll take the blog example project as a starting point.

Registering our service

The first step is to add a name to our rate limit service.

// service/ratelimit/ratelimit.go
import "example-project/service"

func (*Service) Name() string {
	return service.Ratelimit
}

// service/service.go
package service

const (
	Ratelimit = "ratelimit"
)
Then let’s register it in our server:

// main.go
func registerServices(server *goyave.Server) {
	server.Logger.Info("Registering services")

	opts := &redis.Options{
		Addr:       "",
		Password:   "",
		DB:         0,
		MaxRetries: -1, // Disable retry
	}
	redisClient := redis.NewClient(opts)
	ratelimitService := ratelimit.NewService(redisClient, rate.PerMinute(200))
	server.RegisterService(ratelimitService)
}

ℹ️ You can find the documentation explaining how services work in Goyave here.

Implementing the middleware

Let’s set up the basis for our middleware. We’ll first create a new interface that will be compatible with our rate limit service, and use it as a dependency of our middleware.

// http/middleware/ratelimit.go
package middleware

import (
	"context"
	"net/http"

	"example-project/service"

	rate "github.com/go-redis/redis_rate/v10"
	"goyave.dev/goyave/v5"
)

type RatelimitService interface {
	Allow(ctx context.Context, clientID string) (*rate.Result, error)
	GetDefaultClientID(headers http.Header) string
	UpdateHeaders(headers http.Header, result *rate.Result)
}

type Ratelimit struct {
	goyave.Component
	RatelimitService RatelimitService
}

func NewRatelimit() *Ratelimit {
	return &Ratelimit{}
}

// Init retrieves the rate limit service from the server's service registry.
func (m *Ratelimit) Init(server *goyave.Server) {
	m.Component.Init(server)
	m.RatelimitService = server.Service(service.Ratelimit).(RatelimitService)
}

Now let’s implement the actual logic of our middleware. We want our authenticated users to have a quota of their own, and our guest users to be identified by their IP.

func (m *Ratelimit) getClientID(request *goyave.Request) string {
	if u, ok := request.User.(*dto.InternalUser); ok && u != nil {
		return strconv.FormatUint(uint64(u.ID), 10)
	}
	return m.RatelimitService.GetDefaultClientID(request.Header())
}

We just have the Handle() method left to implement:

func (m *Ratelimit) Handle(next goyave.Handler) goyave.Handler {
	return func(response *goyave.Response, request *goyave.Request) {
		res, err := m.RatelimitService.Allow(request.Context(), m.getClientID(request))
		if err != nil {
			// Fail-open: let the request through if the rate limiter errored.
			next(response, request)
			return
		}
		m.RatelimitService.UpdateHeaders(response.Header(), res)
		if res.Allowed == 0 {
			// Quota exhausted: reject with 429 Too Many Requests.
			response.Status(http.StatusTooManyRequests)
			return
		}
		next(response, request)
	}
}

Finally, let’s add it as a global middleware, just after the authentication middleware.

// http/route/route.go

func Register(server *goyave.Server, router *goyave.Router) {
	//...
	ratelimiter := middleware.NewRatelimit()
	router.GlobalMiddleware(authMiddleware, ratelimiter)
	//...
}
ℹ️ You can find the documentation explaining how middleware work in Goyave here.

One last thing

Wait! There is one problem with this: the rate limit middleware won’t be executed if authentication fails. Let’s extend auth.JWTAuthenticator to handle this case by making it implement auth.Unauthorizer. This interface allows custom authenticators to define custom behavior when authentication fails. The idea is to execute the rate limit middleware even when the auth middleware blocks the request.

Let’s create a new custom authenticator that will use composition with auth.JWTAuthenticator:

// http/auth/jwt.go
package auth

import (
	"net/http"

	"goyave.dev/goyave/v5"
	"goyave.dev/goyave/v5/auth"
)

type JWTAuthenticator[T any] struct {
	*auth.JWTAuthenticator[T]
	ratelimiter goyave.Middleware
}

func NewJWTAuthenticator[T any](userService auth.UserService[T], ratelimiter goyave.Middleware) *JWTAuthenticator[T] {
	return &JWTAuthenticator[T]{
		JWTAuthenticator: auth.NewJWTAuthenticator(userService),
		ratelimiter:      ratelimiter,
	}
}

// OnUnauthorized executes the rate limit middleware even when authentication failed.
func (a *JWTAuthenticator[T]) OnUnauthorized(response *goyave.Response, request *goyave.Request, err error) {
	a.ratelimiter.Handle(a.handleFailed(err))(response, request)
}

func (a *JWTAuthenticator[T]) handleFailed(err error) goyave.Handler {
	return func(response *goyave.Response, _ *goyave.Request) {
		response.JSON(http.StatusUnauthorized, map[string]string{"error": err.Error()})
	}
}
We now need to update our routes:

// http/route/route.go

import (
	customauth "example-project/http/auth"
)

func Register(server *goyave.Server, router *goyave.Router) {
	//...
	ratelimiter := middleware.NewRatelimit()
	authenticator := customauth.NewJWTAuthenticator(userService, ratelimiter)
	authMiddleware := auth.Middleware(authenticator)
	//...
}

ℹ️ You can find the documentation explaining how authenticators work in Goyave here.

Rate limit in action

We are done! Let’s test this.

Before that, we need to add a redis service in the docker-compose.yml:

redis:
  image: redis:7
  ports:
    - '6379:6379'

Start the application as explained in the README:

docker compose up -d
dbmate -u postgres://dbuser:secret@ -d ./database/migrations --no-dump-schema migrate
go run main.go -seed

Let’s query our server with our trusty friend curl:

curl -v http://localhost:8080/articles


HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json; charset=utf-8
Ratelimit-Limit: 200
Ratelimit-Policy: 200;w=60;burst=200;policy=”leaky bucket”
Ratelimit-Remaining: 198
Ratelimit-Reset: 1
Date: Thu, 27 Jun 2024 12:47:08 GMT
Transfer-Encoding: chunked


We can see our RateLimit headers. Success!


You can finally open the gates (a little) once more and let everyone enjoy your awesome new product without issues or slowdowns.

In the process, you learned what you need to get started with rate limiting. Despite not being perfect, this solution is highly effective! Don’t forget to closely monitor your services from now on, and adjust your limits accordingly.

Check out the Goyave framework! It can help you build better APIs faster in so many ways thanks to its many features such as routing, validation, localization, model mapping, and much, much more.

Let’s talk! Was this article useful to you? Do you have anything to add or correct? Or maybe you have an interesting experience to share with us. I’ll see you in the comments. Thank you for reading!

Why and how you should rate-limit your API was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
