Rate Limiting Algo — Fixed Window


This piece continues from Rate Limiting: Introduction. It was originally published on Medium.
Two days after the launch incident, things looked calm on the dashboards. CPU was steady. Latency was flat. No alerts.
So we did what most teams do at this stage — we added some rate limiting. Simple. Quick. Safe. Or so it seemed.
We chose the easiest option available: Fixed Window.
The idea behind Fixed Window rate limiting is straightforward:
Count the number of requests in a fixed time window and reject anything beyond the limit.
For example, when we say 100 RPM (Requests Per Minute), the system divides time into rigid windows: 00:00–01:00, 01:00–02:00, 02:00–03:00, and so on. Within each window, requests are counted independently, and the counter resets to zero at every boundary.
At first glance, this feels reasonable. The math is simple. The implementation is trivial. The dashboards look clean.
Now consider what actually happens in real traffic.
A user sends 100 requests in the final second of one window, then 100 more in the first second of the next.
From the rate limiter's perspective, everything is fine: each window counted exactly 100 requests, right at the limit.
From the system's perspective, one continuous burst just arrived.
We declared a limit of 100 RPM, yet the system just accepted 200 requests almost back-to-back.
The contract was technically respected. The intent was completely violated.
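The boundary burst is easy to reproduce. This sketch (the `make_limiter` helper is a hypothetical stand-in for the counting logic described above) drives a 100 RPM fixed-window counter with 100 requests just before the minute boundary and 100 just after:

```python
def make_limiter(limit: int, window: int = 60):
    """Return an `allow(now)` function enforcing `limit` requests per fixed window."""
    state = {"window": None, "count": 0}

    def allow(now: float) -> bool:
        w = int(now // window)
        if w != state["window"]:
            # Window boundary crossed: counter resets to zero
            state["window"], state["count"] = w, 0
        if state["count"] < limit:
            state["count"] += 1
            return True
        return False

    return allow


allow = make_limiter(limit=100)

# 100 requests in the last second of window 0 (t = 59.00 .. 59.99)
late_burst = sum(allow(59.0 + i / 100) for i in range(100))
# 100 more in the first second of window 1 (t = 60.00 .. 60.99)
early_burst = sum(allow(60.0 + i / 100) for i in range(100))

print(late_burst, early_burst)  # → 100 100: all 200 requests accepted
```

Both bursts pass. The service absorbs 200 requests in roughly two seconds while every window stays exactly at its quota.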
That burst doesn’t just affect the API layer.
It ripples outward: connection pools fill up, queues grow, and downstream services provisioned for the declared rate absorb a spike they were never promised.
What looks like “within limits” at the edge becomes overload inside the system.
This is how latency spikes, queue backlogs, and cascading failures begin: all without any single window ever exceeding its quota.
The core problem with Fixed Window is this:
It enforces limits on paper, not in reality.
Fixed windows ignore burst behaviour entirely.
Fixed Window rate limiting is easy to implement, cheap to run, and simple to reason about. It is also unsafe for any serious production API.
If your system has bursty clients, aggressive retries, or shared downstream resources, then Fixed Window limits are not protection; they are false confidence.
The follow-up, Rate Limiting Algo — Sliding Window, looks at how sliding windows try (and partially fail) to fix this exact problem.
Because once you’ve seen a system fail at the window boundary, you never forget it.