Rate Limiting — Token Bucket & Leaky Bucket


This piece follows Rate Limiting Algo — Sliding Window. It was originally published on Medium.
Where the story finally makes sense.
After Fixed Window fooled us and Sliding Window still let microbursts slip through, we realised something important:
The problem wasn’t counting. The problem was bursts.
Before choosing the next algorithm, we had to agree on what a burst actually is.
A burst is a short period where the request rate is much higher than the long-term average, but the total amount of work is still reasonable.
Examples: a page load that fires a dozen API calls at once, a client retrying after a brief network blip, a user returning to an app and syncing state.
Bursts are human behaviour, not abuse.
Any rate limiter that treats bursts as attacks will hurt user experience.
This is where Token Bucket enters.
Token Bucket is the first algorithm that truly understands how humans use systems.
The model: a bucket holds at most B tokens, and tokens refill at a steady rate of R per second. After the bucket has been refilling for T seconds:
Tokens = min(B, R × T)
Each request consumes 1 token. If no token is available → the request is rejected (429).
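A minimal sketch of this refill-and-consume loop (class and method names are illustrative, not from any particular library):

```python
import time

class TokenBucket:
    """Minimal Token Bucket sketch: capacity B, refill rate R tokens/sec."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity        # B: hard cap on stored tokens
        self.rate = rate                # R: tokens added per second
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill, but never beyond capacity: tokens = min(B, tokens + R * dt)
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request consumes one token
            return True
        return False                    # no token left -> caller returns 429
```

With capacity 3000 and rate 300, a full bucket admits a 3000-request burst at once, then sustains 300 requests per second.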
Let’s put numbers on it.
Example:
- B = 3000 (bucket capacity)
- R = 300 tokens per second
- T = 10 seconds of refill

Tokens = min(3000, 300 × 10)
Tokens = 3000
Now the client sends 3000 requests as a burst.
What happens? All 3000 requests are admitted: the burst drains the bucket in one shot, and the client is not penalised.
Why? Because total admitted work is still bounded.
At any point in time:
Total requests admitted ≤ B + R × T
This is the key insight.
No matter how bursty the traffic is, the work admitted over any window of T seconds stays bounded, and the long-term rate converges to R.
This is exactly what backends need.
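The bound can be sanity-checked with a tick-based simulation (illustrative numbers, one refill step per tick):

```python
# Simulate a token bucket under constant overload and check that the
# total admitted over T ticks never exceeds B + R * T.
B, R = 10, 3          # small illustrative capacity and refill rate
T = 20                # window length in ticks
tokens = B            # bucket starts full
admitted = 0

for _ in range(T):
    tokens = min(B, tokens + R)     # refill, capped at B
    demand = 7                      # heavy traffic every tick
    take = min(demand, int(tokens))
    tokens -= take
    admitted += take

assert admitted <= B + R * T        # the key invariant holds
```

However hard the traffic pushes, the admitted total can never outrun the bucket plus the refill.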
At this point, a natural question comes up:
“If the system is idle for 20 seconds and R = 300 RPS, does that mean we can now handle 6000 requests?”
The answer is no.
The key rule is this:
Tokens = min(B, R × T)
The bucket capacity B is a hard upper bound. No matter how long the system is idle, the token count never exceeds B.
Let’s revisit the example: B = 3000, R = 300 RPS.
Now consider two idle periods:
Idle for 10 seconds:
R × T = 300 × 10 = 3000
Tokens = min(3000, 3000) = 3000

Idle for 20 seconds:
R × T = 300 × 20 = 6000
Tokens = min(3000, 6000) = 3000

Even after 20 seconds of inactivity, the system still allows only 3000 requests as a burst.
The bucket does not grow. The burst size remains bounded.
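The two idle-time computations can be checked in a couple of lines (numbers taken from the example above):

```python
B, R = 3000, 300                   # capacity and refill rate from the example

def tokens_after_idle(t: float) -> float:
    # Idle time refills the bucket, but only up to its capacity
    return min(B, R * t)

assert tokens_after_idle(10) == 3000
assert tokens_after_idle(20) == 3000   # not 6000: B is a hard cap
```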
This upper bound is not a limitation — it’s the safety guarantee.
Because the token count can never exceed B, the largest possible burst can never exceed B.
Which means downstream systems can plan for a known worst case: at most B requests at once, plus a steady stream at rate R.
Think of Token Bucket like a water tank with a fixed volume: water drips in at a steady rate, each request scoops out one unit, and once the tank is full any extra water simply overflows.
This is exactly why Token Bucket works so well in production.
Idle time increases burst allowance only up to the bucket size — never beyond it.
That single constraint is what makes Token Bucket both burst-friendly and safe.
Without it, the algorithm would be unusable.
That’s why Token Bucket is the industry default at API gateways and major cloud platforms.
Token Bucket protects users.
But Token Bucket allows bursts.
And some systems cannot tolerate bursts at all.
For those, we need a different guarantee.
Leaky Bucket enforces one simple rule:
No matter how many requests arrive, only N requests per second are forwarded.
Think of it as a queue with a controlled drain rate.
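A tick-based sketch of that queue with a controlled drain (illustrative; real implementations usually drain on a timer or compute virtual finish times):

```python
from collections import deque

class LeakyBucket:
    """Minimal Leaky Bucket sketch: a bounded queue drained at a fixed rate."""

    def __init__(self, drain_per_tick: int, queue_size: int):
        self.drain_per_tick = drain_per_tick   # N requests forwarded per tick
        self.queue = deque()
        self.queue_size = queue_size           # bound the buffer

    def submit(self, request) -> bool:
        if len(self.queue) >= self.queue_size:
            return False                       # queue full: shed load
        self.queue.append(request)
        return True

    def drain(self) -> list:
        # Called once per tick: forward at most N requests,
        # no matter how many are waiting.
        n = min(self.drain_per_tick, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]
```

A burst of any size leaves the bucket at exactly N per tick; the burst itself never reaches the backend.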
Unlike Token Bucket, Leaky Bucket never lets a burst through: the output rate is constant, and excess requests either wait in the queue or are dropped.
Leaky Bucket is typically placed deep in the stack, in front of fragile downstream dependencies such as databases or payment processors.
Its job is not to protect UX.
Its job is to prevent sudden spikes from reaching sensitive systems.
Assume 3000 requests arrive in a single instant.
With Token Bucket (B = 3000), all 3000 are admitted immediately: the burst passes straight through to whatever sits behind the limiter.
With Leaky Bucket draining at 300 per second, only 300 reach the backend each second; the rest queue up or are shed.
Or more bluntly:
Token Bucket protects users. Leaky Bucket protects systems.
Real systems don’t choose one.
They layer them.
Typical production setup: Token Bucket at the edge (the API gateway), Leaky Bucket directly in front of critical internal systems.
This creates a buffer: bursts are absorbed where users are, and only a smooth, predictable stream reaches the systems that cannot handle spikes.
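A sketch of that layering, combining the two ideas in one tick-based model (all names and numbers illustrative):

```python
from collections import deque

class LayeredLimiter:
    """Token bucket at the edge absorbs bursts; a leaky-bucket queue
    smooths what is forwarded to the backend. Tick-based for clarity."""

    def __init__(self, capacity: int, refill: int, drain: int):
        self.capacity = capacity    # edge token bucket size (B)
        self.tokens = capacity      # starts full
        self.refill = refill        # tokens added per tick (R)
        self.drain = drain          # requests forwarded per tick (N)
        self.queue = deque()

    def handle(self, request) -> bool:
        # Edge layer: admit bursts up to the bucket capacity
        if self.tokens < 1:
            return False            # rejected at the edge (429)
        self.tokens -= 1
        self.queue.append(request)  # admitted: waits for the smooth drain
        return True

    def tick(self) -> list:
        # Refill the edge bucket (capped at B), drain the queue at N per tick
        self.tokens = min(self.capacity, self.tokens + self.refill)
        n = min(self.drain, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]
```

Users get their bursts admitted at the edge; the backend only ever sees the steady drain rate.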
Once we added Token Bucket at the edge and Leaky Bucket near critical systems, the auth service finally behaved like a grown-up system.
And the 4 AM calls stopped.
In the next article, we’ll tie everything together.
Because rate limiting isn’t an algorithm choice — it’s a system design decision.