Rate Limiter
The Rate Limiter provides per-user request throttling using the token bucket algorithm. It’s designed to protect API endpoints from abuse while maintaining good user experience through gradual token refill.
Purpose
Rate limiting prevents users from overwhelming the system with too many requests. Unlike simple fixed-window counters, the token bucket algorithm allows for burst traffic while maintaining average rate limits over time.
How It Works
Token Bucket Algorithm
Each user gets a virtual “bucket” of tokens stored in Redis:
- Capacity: Maximum tokens the bucket can hold
- Refill Rate: Tokens added per second
- Cost: Tokens consumed per request
- TTL: Bucket expiration time (for cleanup of inactive users)
When a request arrives, the system checks if enough tokens are available. If yes, tokens are consumed and the request proceeds. If no, the request is rejected with RESOURCE_EXHAUSTED status.
Tokens gradually refill over time based on the refill rate, allowing users to make requests again after waiting.
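The refill-and-consume logic above can be sketched as a minimal in-memory token bucket. This is illustrative only: the names (`TokenBucket`, `try_consume`) are assumptions, and in the real system the state lives in Redis and is updated atomically by a Lua script.

```rust
// Minimal in-memory sketch of the token bucket math described above.
// In production the state is stored in Redis, not in process memory.
struct TokenBucket {
    capacity: f64,       // maximum tokens the bucket can hold
    refill_per_sec: f64, // tokens added per second
    tokens: f64,         // current token count
    last_refill: f64,    // timestamp of last refill, in seconds
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64, now: f64) -> Self {
        Self { capacity, refill_per_sec, tokens: capacity, last_refill: now }
    }

    /// Refill based on elapsed time (capped at capacity), then
    /// consume `cost` tokens if enough are available.
    fn try_consume(&mut self, cost: f64, now: f64) -> bool {
        let elapsed = (now - self.last_refill).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= cost {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Capacity 2, refill 1 token/sec: two immediate requests pass,
    // the third is rejected, and after one simulated second a
    // request succeeds again.
    let mut bucket = TokenBucket::new(2.0, 1.0, 0.0);
    assert!(bucket.try_consume(1.0, 0.0));
    assert!(bucket.try_consume(1.0, 0.0));
    assert!(!bucket.try_consume(1.0, 0.0));
    assert!(bucket.try_consume(1.0, 1.0));
}
```

This mirrors the burst-then-refill behavior: a full bucket absorbs a burst up to `capacity`, after which requests are paced by the refill rate.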
Per-User Enforcement
Rate limits are enforced per-user, identified by their user ID from the authentication middleware. Each API endpoint can have its own rate limit configuration with a unique identifier.
Bucket keys in Redis follow the pattern: rate_limit:API[{api_name}]-{user_id}
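For concreteness, a helper that expands the documented key pattern might look like this (the function name `bucket_key` is illustrative, not part of the actual API):

```rust
// Illustrative helper: builds the documented Redis key pattern
// rate_limit:API[{api_name}]-{user_id}.
fn bucket_key(api_name: &str, user_id: &str) -> String {
    format!("rate_limit:API[{}]-{}", api_name, user_id)
}

fn main() {
    // e.g. "rate_limit:API[CreateOrder]-u123"
    println!("{}", bucket_key("CreateOrder", "u123"));
}
```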
Integration
As Middleware
RateLimitLayer is a Tower middleware that wraps individual gRPC service methods. Apply it to specific endpoints that need rate limiting:
```rust
use shield::rpc::middleware::{RateLimitLayer, RateLimitConfig};
use std::time::Duration;

let rate_limit = RateLimitLayer::new(
    redis_connection,
    RateLimitConfig {
        api_name: "CreateOrder",
        capacity: 10.0,
        refill: 1.0, // 1 token per second
        cost: 1.0,
        ttl: Duration::from_secs(3600),
    },
);

let service = OrderServiceServer::new(order_service)
    .layer(rate_limit);
```
Configuration Parameters
- api_name: Unique identifier for the rate limit bucket (used in Redis key)
- capacity: Maximum burst size (how many requests can be made immediately)
- refill: Token recovery rate in tokens per second
- cost: Tokens consumed per request (usually 1.0, but can be higher for expensive operations)
- ttl: How long to keep inactive buckets in Redis
Tuning Guidelines
High-frequency, lightweight endpoints:
- Capacity: 30-100
- Refill: 10-30 tokens/second
- Cost: 1.0
Moderate-frequency endpoints:
- Capacity: 10-20
- Refill: 1-5 tokens/second
- Cost: 1.0
Expensive operations (e.g., report generation):
- Capacity: 3-5
- Refill: 0.1-0.5 tokens/second
- Cost: 1.0
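A quick sanity check on these tiers: once a user has drained the burst capacity, the steady-state spacing between allowed requests is `cost / refill` seconds. The helper name below is illustrative.

```rust
// Back-of-the-envelope check for the tuning tiers above: after the
// burst is spent, requests are paced at cost / refill seconds apart.
fn seconds_until_affordable(cost: f64, refill_per_sec: f64) -> f64 {
    cost / refill_per_sec
}

fn main() {
    // Expensive tier (refill 0.1 tokens/sec): one request every 10 s.
    println!("{}", seconds_until_affordable(1.0, 0.1));
    // Moderate tier (refill 0.5 tokens/sec): one request every 2 s.
    println!("{}", seconds_until_affordable(1.0, 0.5));
}
```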
Architecture
Redis-Backed State
Rate limit state is stored in Redis hash structures with two fields:
- tokens: Current token count (float)
- last_refill: Last refill timestamp (unix timestamp)
A Lua script executes atomically on Redis to:
- Calculate elapsed time since last refill
- Add refilled tokens (capped at capacity)
- Check whether enough tokens are available
- Consume tokens if allowed
- Update state and TTL
This ensures thread-safe, distributed rate limiting across multiple server instances.
Authentication Dependency
The middleware requires UserId to be present in the request extensions, which is set by the authentication middleware. Therefore:
- Rate limiting middleware must be applied after authentication middleware in the layer stack
- Unauthenticated requests will be rejected before rate limiting is evaluated
- Anonymous/public endpoints cannot use this middleware (use hashcash CAPTCHA instead)
Error Handling
When the rate limit is exceeded, the middleware returns:
- Status: RESOURCE_EXHAUSTED
- Message: "Rate limit exceeded"
Frontend should handle this gracefully by:
- Displaying user-friendly messages
- Implementing exponential backoff for retries
- Showing remaining quota if applicable (requires separate API)
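The exponential backoff suggested above can be sketched as a simple delay schedule. The function name and the base/cap values are assumptions for illustration, not values mandated by the service.

```rust
// Illustrative client-side backoff for RESOURCE_EXHAUSTED responses:
// the delay doubles each attempt and is capped at max_ms.
fn backoff_ms(attempt: u32, base_ms: u64, max_ms: u64) -> u64 {
    // Clamp the shift so large attempt counts cannot overflow.
    base_ms.saturating_mul(1u64 << attempt.min(16)).min(max_ms)
}

fn main() {
    // With a 250 ms base and a 4 s cap: 250, 500, 1000, 2000, 4000.
    let delays: Vec<u64> = (0..5).map(|a| backoff_ms(a, 250, 4000)).collect();
    println!("{:?}", delays);
}
```

Adding random jitter to each delay is a common refinement to avoid synchronized retries from many clients.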
When to Use Rate Limiting
Good use cases:
- Order creation and payment endpoints
- Data export and report generation
- Account modification operations
- Support ticket creation
When NOT to use:
- Read-only queries (unless very expensive)
- Authentication endpoints (use hashcash instead)
- Static content delivery
- Public announcement viewing
Implementation Notes
- Uses the Processor pattern for the core rate limiting logic
- Middleware integrates with Tower/tonic service stack
- All Redis operations are atomic via Lua scripts
- Float precision is used for tokens to support fractional costs and refill rates
- TTL prevents memory leaks from abandoned user sessions