Rate Limiting — Token Bucket & Leaky Bucket


This piece follows Rate Limiting Algo — Sliding Window. It was originally published on Medium.
Where the story finally makes sense.
After Fixed Window fooled us and Sliding Window still let microbursts slip through, we realised something important:
The problem wasn’t counting. The problem was bursts.
Before choosing the next algorithm, we had to agree on what a burst actually is.
A burst is a short period where the request rate is much higher than the long-term average, but the total amount of work is still reasonable.
Examples: a page load that fires a dozen API calls at once, a client retrying after a brief network blip, a user returning to an app and syncing state.
Bursts are human behaviour, not abuse.
Any rate limiter that treats bursts as attacks will hurt user experience.
This is where Token Bucket enters.
Token Bucket is the first algorithm that truly understands how humans use systems.
The model: a bucket holds at most B tokens, and tokens refill at a steady rate of R per second. After the bucket has been refilling for T seconds:
Tokens = min(B, R × T)
Each request consumes 1 token. If no token is available → the request is rejected (429).
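A minimal sketch of this refill-and-consume loop (class and method names are illustrative, not from any particular library):

```python
import time

class TokenBucket:
    """Minimal Token Bucket sketch: capacity B, refill rate R tokens/sec."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity        # B: hard cap on stored tokens
        self.rate = rate                # R: tokens added per second
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill, but never beyond capacity: tokens = min(B, tokens + R * dt)
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request consumes one token
            return True
        return False                    # no token left -> caller returns 429
```

With capacity 3000 and rate 300, a full bucket admits a 3000-request burst at once, then sustains 300 requests per second.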
Let’s put numbers on it.
Example:
- B = 3000 (bucket capacity)
- R = 300 tokens per second
- T = 10 seconds of refill

Tokens = min(3000, 300 × 10)
Tokens = 3000
Now the client sends 3000 requests as a burst.
What happens? All 3000 requests are admitted: the burst drains the bucket in one shot, and the client is not penalised.
Why? Because total admitted work is still bounded.
At any point in time:
Total requests admitted ≤ B + R × T
This is the key insight.
No matter how bursty the traffic is, the work admitted over any window of T seconds stays bounded, and the long-term rate converges to R.
This is exactly what backends need.
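The bound can be sanity-checked with a tick-based simulation (illustrative numbers, one refill step per tick):

```python
# Simulate a token bucket under constant overload and check that the
# total admitted over T ticks never exceeds B + R * T.
B, R = 10, 3          # small illustrative capacity and refill rate
T = 20                # window length in ticks
tokens = B            # bucket starts full
admitted = 0

for _ in range(T):
    tokens = min(B, tokens + R)     # refill, capped at B
    demand = 7                      # heavy traffic every tick
    take = min(demand, int(tokens))
    tokens -= take
    admitted += take

assert admitted <= B + R * T        # the key invariant holds
```

However hard the traffic pushes, the admitted total can never outrun the bucket plus the refill.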
At this point, a natural question comes up:
“If the system is idle for 20 seconds and R = 300 RPS, does that mean we can now handle 6000 requests?”
The answer is no.
The key rule is this:
Tokens = min(B, R × T)
The bucket capacity B is a hard upper bound. No matter how long the system is idle, the token count never exceeds B.
Let’s revisit the example: B = 3000, R = 300 RPS.
Now consider two idle periods:
Idle for 10 seconds:
R × T = 300 × 10 = 3000
Tokens = min(3000, 3000) = 3000

Idle for 20 seconds:
R × T = 300 × 20 = 6000
Tokens = min(3000, 6000) = 3000

Even after 20 seconds of inactivity, the system still allows only 3000 requests as a burst.
The bucket does not grow. The burst size remains bounded.
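The two idle-time computations can be checked in a couple of lines (numbers taken from the example above):

```python
B, R = 3000, 300                   # capacity and refill rate from the example

def tokens_after_idle(t: float) -> float:
    # Idle time refills the bucket, but only up to its capacity
    return min(B, R * t)

assert tokens_after_idle(10) == 3000
assert tokens_after_idle(20) == 3000   # not 6000: B is a hard cap
```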
This upper bound is not a limitation — it’s the safety guarantee.
Because the token count can never exceed B, the largest possible burst can never exceed B.
Which means downstream systems can plan for a known worst case: at most B requests at once, plus a steady stream at rate R.
Think of Token Bucket like a water tank with a fixed volume: water drips in at a steady rate, each request scoops out one unit, and once the tank is full any extra water simply overflows.
This is exactly why Token Bucket works so well in production.
Idle time increases burst allowance only up to the bucket size — never beyond it.
That single constraint is what makes Token Bucket both burst-friendly and safe.
Without it, the algorithm would be unusable.
That’s why Token Bucket is the industry default at API gateways and major cloud platforms.
Token Bucket protects users.
But Token Bucket allows bursts.
And some systems cannot tolerate bursts at all.
For those, we need a different guarantee.
Leaky Bucket enforces one simple rule:
No matter how many requests arrive, only N requests per second are forwarded.
Think of it as a queue with a controlled drain rate.
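A tick-based sketch of that queue with a controlled drain (illustrative; real implementations usually drain on a timer or compute virtual finish times):

```python
from collections import deque

class LeakyBucket:
    """Minimal Leaky Bucket sketch: a bounded queue drained at a fixed rate."""

    def __init__(self, drain_per_tick: int, queue_size: int):
        self.drain_per_tick = drain_per_tick   # N requests forwarded per tick
        self.queue = deque()
        self.queue_size = queue_size           # bound the buffer

    def submit(self, request) -> bool:
        if len(self.queue) >= self.queue_size:
            return False                       # queue full: shed load
        self.queue.append(request)
        return True

    def drain(self) -> list:
        # Called once per tick: forward at most N requests,
        # no matter how many are waiting.
        n = min(self.drain_per_tick, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]
```

A burst of any size leaves the bucket at exactly N per tick; the burst itself never reaches the backend.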
Unlike Token Bucket, Leaky Bucket never lets a burst through: the output rate is constant, and excess requests either wait in the queue or are dropped.
Leaky Bucket is typically placed deep in the stack, in front of fragile downstream dependencies such as databases or payment processors.
Its job is not to protect UX.
Its job is to prevent sudden spikes from reaching sensitive systems.
Assume 3000 requests arrive in a single instant.
With Token Bucket (B = 3000), all 3000 are admitted immediately: the burst passes straight through to whatever sits behind the limiter.
With Leaky Bucket draining at 300 per second, only 300 reach the backend each second; the rest queue up or are shed.
Or more bluntly:
Token Bucket protects users. Leaky Bucket protects systems.
Real systems don’t choose one.
They layer them.
Typical production setup: Token Bucket at the edge (the API gateway), Leaky Bucket directly in front of critical internal systems.
This creates a buffer: bursts are absorbed where users are, and only a smooth, predictable stream reaches the systems that cannot handle spikes.
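A sketch of that layering, combining the two ideas in one tick-based model (all names and numbers illustrative):

```python
from collections import deque

class LayeredLimiter:
    """Token bucket at the edge absorbs bursts; a leaky-bucket queue
    smooths what is forwarded to the backend. Tick-based for clarity."""

    def __init__(self, capacity: int, refill: int, drain: int):
        self.capacity = capacity    # edge token bucket size (B)
        self.tokens = capacity      # starts full
        self.refill = refill        # tokens added per tick (R)
        self.drain = drain          # requests forwarded per tick (N)
        self.queue = deque()

    def handle(self, request) -> bool:
        # Edge layer: admit bursts up to the bucket capacity
        if self.tokens < 1:
            return False            # rejected at the edge (429)
        self.tokens -= 1
        self.queue.append(request)  # admitted: waits for the smooth drain
        return True

    def tick(self) -> list:
        # Refill the edge bucket (capped at B), drain the queue at N per tick
        self.tokens = min(self.capacity, self.tokens + self.refill)
        n = min(self.drain, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]
```

Users get their bursts admitted at the edge; the backend only ever sees the steady drain rate.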
Once we added Token Bucket at the edge and Leaky Bucket near critical systems, the auth service finally behaved like a grown-up system.
And the 4 AM calls stopped.
In the next article, we’ll tie everything together.
Because rate limiting isn’t an algorithm choice — it’s a system design decision.