API Rate Limiting Template - Free Design & Implementation Guide
Design an effective rate limiting strategy for your API with this free template. Covers per-tier limits, throttling algorithms, response headers and monitoring setup.
Rate limiting is an essential part of every API design. Without limits, a single client, whether a buggy integration or a malicious attacker, can overload your API and disrupt service for all users. This template provides a structured approach to designing a rate limiting strategy that protects your API without blocking legitimate usage. It guides you through classifying API endpoints by resource intensity, defining limit thresholds per subscription tier, choosing the right throttling algorithm, and designing informative responses that help clients adjust their consumption.

The template includes sections for the most common algorithms (fixed window, sliding window, token bucket and leaky bucket) with a comparison of the pros and cons of each. It also covers the HTTP response headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After) that inform clients about their current consumption and remaining quota, and contains sections for monitoring and alerting so you detect consumption spikes early, and for communicating limit changes to API consumers via changelogs and deprecation notices. Finally, the template addresses integrating rate limiting with your existing API gateway or reverse proxy, and provides guidelines for load testing the rate limiter to verify that limits are correctly enforced without the limiter itself becoming a bottleneck in the request path.
Variations
Token Bucket Rate Limiter
Flexible algorithm that adds a fixed number of tokens per time unit to a bucket. Each request consumes a token. Allows short bursts as long as tokens are available, making it more tolerant of irregular traffic patterns than fixed window.
Best for: APIs with variable traffic where short peaks are acceptable, such as public APIs for mobile apps or integration APIs supporting batch processing.
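As a sketch, a minimal in-process token bucket might look like the following. The class and parameter names are illustrative, not from the template itself:

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The `capacity` controls the maximum burst size, while `rate` controls the sustained throughput, which is exactly why this algorithm tolerates irregular traffic better than a fixed window.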
Sliding Window Rate Limiter
Combines the simplicity of fixed window with a more accurate distribution. Calculates consumption as a weighted average of the current and previous window, solving the boundary problem of fixed windows.
Best for: APIs that need a predictable and fair distribution pattern without the implementation complexity of a token bucket.
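The weighted-average idea can be sketched as follows; the previous window's count is weighted by how much of it still overlaps the sliding window. Names and structure are illustrative:

```python
import time

class SlidingWindowCounter:
    """Sliding window counter: estimates the rolling count as
    previous_count * overlap_fraction + current_count."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.current_start = time.monotonic()
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.current_start
        if elapsed >= self.window:
            # Roll the windows; if more than one full window passed, the
            # previous window contributes nothing.
            self.previous_count = self.current_count if elapsed < 2 * self.window else 0
            self.current_count = 0
            self.current_start += (elapsed // self.window) * self.window
            elapsed = now - self.current_start
        weight = 1.0 - (elapsed / self.window)           # overlap with previous window
        estimated = self.previous_count * weight + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

Because the previous window is weighted in, a client cannot double its effective quota by bursting at a window boundary, which is the flaw this variant fixes in plain fixed windows.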
Tiered Rate Limiting
Different limits per subscription level (Free, Basic, Pro, Enterprise). Each tier has its own quotas for requests per minute, per hour and per day, plus separate limits for resource-intensive endpoints.
Best for: SaaS APIs with a freemium model, where rate limits serve as a differentiator between subscription tiers and as an incentive for upgrades.
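A tier table like the one the template asks you to fill in can be expressed as plain configuration. The tier names, numbers and function below are hypothetical placeholders, not recommended values:

```python
# Hypothetical tier table; the numbers are illustrative, not prescriptive.
TIER_LIMITS = {
    "free":       {"per_minute": 10,    "per_hour": 200,    "per_day": 1_000},
    "basic":      {"per_minute": 60,    "per_hour": 2_000,  "per_day": 20_000},
    "pro":        {"per_minute": 300,   "per_hour": 10_000, "per_day": 150_000},
    "enterprise": {"per_minute": 1_000, "per_hour": 50_000, "per_day": 1_000_000},
}

# Separate, stricter limits for resource-intensive endpoints per tier.
HEAVY_ENDPOINT_LIMITS = {
    "free":       {"per_minute": 2},
    "basic":      {"per_minute": 10},
    "pro":        {"per_minute": 30},
    "enterprise": {"per_minute": 100},
}

def limits_for(tier: str, heavy: bool = False) -> dict:
    """Look up the limits for a tier; heavy endpoints use the stricter table.
    Unknown tiers fall back to the free tier as a safe default."""
    table = HEAVY_ENDPOINT_LIMITS if heavy else TIER_LIMITS
    return table.get(tier, table["free"])
```

Keeping the limits in data rather than code makes it straightforward to add a tier or adjust quotas without touching the limiter logic.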
Per-endpoint Rate Limiting
Endpoints receive individual limits based on their resource intensity. Read-intensive endpoints (GET) get broader limits than write-intensive endpoints (POST, PUT, DELETE) or compute-intensive endpoints.
Best for: APIs with widely varying endpoint costs, where a global limit would leave cheap endpoints unnecessarily restricted or expensive endpoints insufficiently protected.
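One way to sketch this is a lookup table keyed by HTTP method and path, with a default for unlisted endpoints. The paths and numbers are purely illustrative:

```python
# Hypothetical per-endpoint limit map (requests per minute); paths and
# numbers are illustrative examples only.
ENDPOINT_LIMITS = {
    ("GET", "/v1/items"): 600,           # cheap read: generous limit
    ("POST", "/v1/items"): 60,           # write: tighter limit
    ("POST", "/v1/reports/export"): 5,   # compute-intensive: strict limit
}
DEFAULT_LIMIT = 120  # fallback for endpoints not listed above

def endpoint_limit(method: str, path: str) -> int:
    """Resolve the per-minute limit for a request, falling back to a default."""
    return ENDPOINT_LIMITS.get((method.upper(), path), DEFAULT_LIMIT)
```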
Distributed Rate Limiting
Rate limiting across multiple API instances with a shared state store (Redis, Memcached). Ensures limits are consistently enforced regardless of which instance handles the request.
Best for: APIs running behind a load balancer with multiple instances, where per-instance rate limiting would lead to inaccurate limits and potential quota overruns.
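The key property is that the counter increment happens atomically in the shared store, not in each API instance. The sketch below uses an in-memory stand-in for the store so it is self-contained; in production the `incr` call would map to Redis `INCR` (with an `EXPIRE` set on the first hit of each window) so every instance sees the same counter:

```python
import time

class FixedWindowStore:
    """In-memory stand-in for a shared store such as Redis. In production,
    incr() would be a Redis INCR on a key that expires with the window."""

    def __init__(self):
        self._counters = {}

    def incr(self, key: str, window_seconds: int) -> int:
        # Bucket requests into fixed windows by flooring the timestamp.
        bucket = int(time.time() // window_seconds)
        full_key = f"{key}:{bucket}"
        self._counters[full_key] = self._counters.get(full_key, 0) + 1
        return self._counters[full_key]

class DistributedRateLimiter:
    """Rate limiter whose state lives in a shared store, so the limit holds
    regardless of which API instance handles the request."""

    def __init__(self, store, limit: int, window_seconds: int):
        self.store = store
        self.limit = limit
        self.window = window_seconds

    def allow(self, client_id: str) -> bool:
        return self.store.incr(f"ratelimit:{client_id}", self.window) <= self.limit
```

Because the store interface is a single atomic increment, swapping the in-memory stand-in for Redis or Memcached does not change the limiter logic.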
How to use
Step 1: Inventory all API endpoints and classify them by resource intensity: light (simple reads), medium (database queries, moderate logic) and heavy (complex calculations, external API calls, file processing).
Step 2: Define the user segments and their subscription tiers. Determine per segment what consumption pattern is realistic and which limit thresholds are both fair and protective.
Step 3: Choose the rate limiting algorithm that fits your usage pattern: token bucket for APIs with bursts, sliding window for steady traffic, fixed window for simple implementations.
Step 4: Set the concrete limits per segment and per endpoint category. Document the limits in a clear table with requests per minute, per hour and per day.
Step 5: Design the HTTP response for exceeded limits. Return status code 429 (Too Many Requests) with a JSON body communicating the limit, current consumption and reset time, and include the standard rate limit headers.
Step 6: Implement informative response headers on all requests (not just when limits are exceeded): X-RateLimit-Limit, X-RateLimit-Remaining and X-RateLimit-Reset. This enables clients to manage their consumption proactively.
Step 7: Define the behaviour when limits are exceeded: hard blocking (the request is rejected) or soft degradation (the request is delayed or processed with lower priority). Document the choice and the rationale.
Step 8: Set up monitoring and alerting. Define dashboards showing consumption per client, per endpoint and per segment. Configure alerts for clients repeatedly hitting their limits and for unusual spikes that may indicate abuse.
Step 9: Document the rate limits in your API documentation. Describe the limits per endpoint, the response on exceeding them and recommendations for clients to optimise their consumption (caching, batching, exponential backoff).
Step 10: Plan a communication strategy for limit changes. Inform API consumers at least four weeks in advance via changelogs, email and deprecation headers.
Step 11: Load test the rate limiter to verify that limits are correctly enforced and the system remains stable during traffic spikes.
Step 12: Evaluate the limits periodically based on monitoring data and adjust them when usage patterns change or new subscription tiers are introduced.
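The 429 response and headers from steps 5 and 6 can be sketched as a small helper. The function name and JSON field names are illustrative; only the status code and the header names come from the template itself:

```python
import json
import time

def rate_limit_response(limit: int, reset_epoch: int) -> tuple:
    """Build an illustrative 429 response as (status, headers, body).
    Header names follow the X-RateLimit-* convention described above."""
    retry_after = max(0, reset_epoch - int(time.time()))
    headers = {
        "Content-Type": "application/json",
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",         # quota is exhausted
        "X-RateLimit-Reset": str(reset_epoch),  # epoch seconds when the window resets
        "Retry-After": str(retry_after),        # seconds until a retry may succeed
    }
    body = json.dumps({
        "error": "rate_limit_exceeded",
        "message": f"Limit of {limit} requests exceeded. Retry after {retry_after} seconds.",
        "limit": limit,
        "reset": reset_epoch,
    })
    return 429, headers, body
```

A well-behaved client can read Retry-After and back off exponentially instead of hammering the API, which is exactly the guidance step 9 suggests putting in your documentation.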
How MG Software can help
At MG Software we design and implement API rate limiting strategies that protect your system without compromising the user experience. Our API engineers map your current consumption patterns, choose the right algorithm and implement it in your existing infrastructure. We have experience with rate limiting in high-traffic environments and with distributed rate limiting behind load balancers. Additionally, we help design the developer experience around rate limiting: clear documentation, informative response headers and monitoring dashboards. Our implementation also includes setting up automated tests that verify limits are correctly enforced, and configuring alerting so you are immediately informed when clients consistently hit their limits or when unusual patterns indicate potential attacks. This keeps your API reliable and scalable, even during unexpected traffic spikes.
Related articles
What is API Security? A Complete Guide to Protecting Your Endpoints
API security guards against injection, broken authentication, and overload. Learn how input validation, rate limiting, OAuth 2.0, and the OWASP API Security Top 10 protect your endpoints and data from common attacks and breaches.
Functional Design Document Template - Free Download & Guide
Write a professional functional design document covering use cases, wireframes and acceptance criteria. Free FDD template with step-by-step instructions.
Software Requirements Specification (SRS) Template - Free Download
Capture every software requirement following IEEE 830. Free SRS template with functional and non-functional requirements, use cases, and traceability matrix.
API Documentation Template - Write Professional API Docs
Help developers make their first API call in five minutes. Template with endpoints, authentication, error codes, rate limits and getting started guide.