Mastering API Throttling in Java: Build a Smart, Tier-Aware Rate Limiter with Quarkus and Bucket4j
From FREE to PRO tenants. Learn how to dynamically control access to your APIs using custom annotations and the full power of Bucket4j in a Quarkus application.
Rate limiting is one of those things you don’t think about until your API gets hammered. Whether you’re building a public SaaS platform, offering free developer access, or protecting resource-heavy operations in a multi-tenant backend, you need it. Without it, a single misbehaving client or a coordinated attack can bring your system to its knees.
API abuse is real. From noisy neighbors on a shared tier to overzealous bots hammering endpoints, an unguarded API can spell disaster for stability and cost. That’s where rate limiting shines, especially in multi-tenant SaaS systems that depend on fairness and tier-based restrictions.
In this hands-on guide, we’ll build CloudMetrics, a pretend SaaS API offering performance metrics and reports. Along the way, we’ll add intelligent rate-limiting using both the Bucket4j Quarkus extension and a more dynamic version directly using Bucket4j APIs, powered by the blazing-fast Quarkus framework.
Let’s go from basic global throttling to tenant-aware logic and subscription-tier enforcement.
What You’ll Learn
How to set up a Quarkus REST API with Bucket4j
How to apply global rate limits
How to implement per-tenant rate limiting using request headers
How to build a dynamic, tier-based rate limit system (Free vs. Pro)
How to combine multiple limits (burst + sustained)
Prerequisites
Java 17+
Apache Maven 3.8+
Your favorite IDE (IntelliJ IDEA, VS Code…)
curl or Postman for testing
If you like, you can go directly to my GitHub repository and download the working example.
Bootstrap the Quarkus Project
Start by creating a fresh Quarkus project with REST and Bucket4j support.
mvn io.quarkus.platform:quarkus-maven-plugin:create \
-DprojectGroupId=com.cloudmetrics \
-DprojectArtifactId=cloud-metrics \
-DclassName="com.cloudmetrics.api.MetricsResource" \
-Dpath="/metrics" \
-Dextensions="rest-jackson,quarkus-bucket4j"
cd cloud-metrics
Add a Simple CloudMetrics API
First, let's create a few endpoints for our CloudMetrics service. We won't apply any rate limiting just yet. Replace the generated MetricsResource.java with the following code.
package com.cloudmetrics.api;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
@Path("/api/v1")
@Produces(MediaType.TEXT_PLAIN)
public class MetricsResource {
@GET
@Path("/metrics/cpu")
public String getCpuMetrics() {
return "CPU usage: 42%";
}
@GET
@Path("/metrics/memory")
public String getMemoryMetrics() {
return "Memory usage: 58%";
}
@GET
@Path("/reports/generate")
public String generateReport() {
// This is a more "expensive" operation
return "Report successfully generated for tenant.";
}
}
Start the dev mode:
./mvnw quarkus:dev
Now test it:
curl http://localhost:8080/api/v1/metrics/cpu
# Output: CPU usage: 42%
Nice. Let’s rate limit this thing.
Global Rate Limit
Our CPU metrics endpoint is very lightweight. Let's apply a simple, global rate limit to it: no more than 5 requests every 10 seconds for everyone combined.
In MetricsResource.java, annotate the method:
import io.quarkiverse.bucket4j.runtime.RateLimited;
@GET
@Path("/metrics/cpu")
@RateLimited(bucket = "cpu-metrics-limit")
public String getCpuMetrics() {
return "CPU usage: 42%";
}
Now, define the bucket named cpu-metrics-limit in your application.properties file.
quarkus.rate-limiter.buckets.cpu-metrics-limit.limits[0].permitted-uses=5
quarkus.rate-limiter.buckets.cpu-metrics-limit.limits[0].period=10S
permitted-uses=5: The bucket holds 5 tokens.
period=10S: The bucket is refilled with 5 tokens every 10 seconds.
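If the token vocabulary is new to you, the following plain-Java sketch may help. It is not how Bucket4j is implemented: SimpleTokenBucket and its injectable clock are made up for illustration, and the sketch assumes the whole bucket is topped up once per period, as described above.

```java
import java.util.function.LongSupplier;

class TokenBucketDemo {
    public static void main(String[] args) {
        long[] fakeNow = {0L}; // fake clock in nanoseconds, so the demo needs no sleeping
        SimpleTokenBucket bucket =
                new SimpleTokenBucket(5, 10_000_000_000L, () -> fakeNow[0]);
        for (int i = 1; i <= 7; i++) {
            System.out.println("request " + i + " allowed: " + bucket.tryConsume());
        }
        fakeNow[0] = 10_000_000_000L; // pretend 10 seconds have passed
        System.out.println("after 10s allowed: " + bucket.tryConsume());
    }
}

// Hypothetical, simplified model of an interval-refilled token bucket.
class SimpleTokenBucket {
    private final long capacity;
    private final long periodNanos;
    private final LongSupplier clock; // injectable time source, in nanoseconds
    private long tokens;
    private long lastRefill;

    SimpleTokenBucket(long capacity, long periodNanos, LongSupplier clock) {
        this.capacity = capacity;
        this.periodNanos = periodNanos;
        this.clock = clock;
        this.tokens = capacity; // the bucket starts full
        this.lastRefill = clock.getAsLong();
    }

    synchronized boolean tryConsume() {
        long now = clock.getAsLong();
        long elapsedPeriods = (now - lastRefill) / periodNanos;
        if (elapsedPeriods > 0) {
            // The refill amount equals the capacity here, so the bucket is full again.
            tokens = capacity;
            lastRefill += elapsedPeriods * periodNanos;
        }
        if (tokens == 0) {
            return false; // a real endpoint would answer 429 at this point
        }
        tokens--;
        return true;
    }
}
```

Running TokenBucketDemo allows the first five calls, rejects the sixth and seventh, and allows the next call once the fake clock reaches ten seconds.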
Try hammering it:
Quarkus dev mode hot-reloads the changes automatically. Run the following loop in your terminal:
for i in {1..7}; do curl -i http://localhost:8080/api/v1/metrics/cpu; echo; sleep 0.5; done
The first 5 requests succeed, but the 6th fails with a 429 Too Many Requests error:
HTTP/1.1 429 Too Many Requests
Retry-After: 0
content-length: 0
Perfect. Let’s get smarter.
Per-Tenant Rate Limiting
In a SaaS world, you don’t punish one tenant for another’s traffic. Let's isolate tenants using a custom header: X-Tenant-ID.
We'll rate limit the /metrics/memory endpoint. Each tenant should get 3 requests every 20 seconds.
Create an IdentityResolver implementation in TenantIdResolver.java:
package com.cloudmetrics.api;
import io.quarkiverse.bucket4j.runtime.resolver.IdentityResolver;
import io.vertx.ext.web.RoutingContext;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
@ApplicationScoped
public class TenantIdResolver implements IdentityResolver {
@Inject
RoutingContext context;
@Override
public String getIdentityKey() {
// Extract the tenant ID from the custom header
String tenantId = context.request().getHeader("X-Tenant-ID");
// If the header is not present, we can deny the request
// or assign a default "anonymous" bucket. Here we deny.
if (tenantId == null || tenantId.isBlank()) {
// This will cause a 403 Forbidden because no key is resolved.
return null;
}
return tenantId;
}
}
Update the memory metrics endpoint:
@GET
@Path("/metrics/memory")
@RateLimited(bucket = "memory-metrics-limit", identityResolver = TenantIdResolver.class)
public String getMemoryMetrics() {
return "Memory usage: 58%";
}
And configure per-tenant limits:
# Per-tenant limit for the memory metrics endpoint
quarkus.rate-limiter.buckets.memory-metrics-limit.limits[0].permitted-uses=3
quarkus.rate-limiter.buckets.memory-metrics-limit.limits[0].period=20S
Now, we'll make requests for two different tenants: tenant-a and tenant-b.
for i in {1..4}; do curl -i -H "X-Tenant-ID: tenant-a" http://localhost:8080/api/v1/metrics/memory; echo; done
curl -i -H "X-Tenant-ID: tenant-b" http://localhost:8080/api/v1/metrics/memory
You will see the first 3 requests succeed and the 4th fail with 429 Too Many Requests.
Tenant B is unaffected by A. That’s the beauty of identity-key resolution.
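The isolation comes from keeping one bucket per identity key. Here is a minimal plain-Java sketch of that idea, not the extension's actual code: PerTenantLimiter is a hypothetical helper, and it leaves out refill over time on purpose to keep the focus on the per-key lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class PerTenantDemo {
    public static void main(String[] args) {
        PerTenantLimiter limiter = new PerTenantLimiter(3);
        for (int i = 1; i <= 4; i++) {
            System.out.println("tenant-a request " + i + ": " + limiter.tryAcquire("tenant-a"));
        }
        System.out.println("tenant-b request 1: " + limiter.tryAcquire("tenant-b"));
    }
}

// Hypothetical helper: one independent counter per identity key.
class PerTenantLimiter {
    private final int permittedUses;
    private final Map<String, AtomicInteger> remainingByTenant = new ConcurrentHashMap<>();

    PerTenantLimiter(int permittedUses) {
        this.permittedUses = permittedUses;
    }

    boolean tryAcquire(String tenantId) {
        // Each tenant gets its own counter the first time it shows up.
        AtomicInteger remaining = remainingByTenant
                .computeIfAbsent(tenantId, key -> new AtomicInteger(permittedUses));
        // Decrement only while tokens are left, so the counter never drops below zero.
        return remaining.getAndUpdate(v -> v > 0 ? v - 1 : 0) > 0;
    }
}
```

The fourth call for tenant-a is rejected, while the first call for tenant-b still passes because it draws from a separate counter.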
Tier-Based Limits (Free vs. Pro)
This is where our API gets really smart, and a little more complex. The expensive /reports/generate endpoint should have different limits based on the tenant's subscription plan (e.g., "FREE" or "PRO").
In a real application, you'd look up the tenant's plan from a database. For this tutorial, we'll create a simple service with a Map. Create class TenantService.java:
package com.cloudmetrics.api;
import jakarta.enterprise.context.ApplicationScoped;
import java.util.Map;
import java.util.Optional;
@ApplicationScoped
public class TenantService {
private static final Map<String, String> TENANT_PLANS = Map.of(
"tenant-free-user", "FREE",
"tenant-pro-user", "PRO");
public Optional<String> getPlanForTenant(String tenantId) {
return Optional.ofNullable(TENANT_PLANS.get(tenantId));
}
}
This time, we can’t just get away with convenient resolvers though. To achieve dynamic, tier-based rate limiting, we'll build a custom annotation and apply programmatic bucket resolution using the Bucket4j core API.
Create the @DynamicRateLimited annotation in DynamicRateLimited.java to mark endpoints for tier-aware throttling:
package com.cloudmetrics.api;
import jakarta.interceptor.InterceptorBinding;
import java.lang.annotation.*;
@InterceptorBinding
@Target({ ElementType.METHOD, ElementType.TYPE })
@Retention(RetentionPolicy.RUNTIME)
public @interface DynamicRateLimited {
}
We also need the tenant ID for each request. A REST filter reads it from the X-Tenant-ID header and stores it in a request-scoped CDI bean:
Create TenantContext.java:
package com.cloudmetrics.api;
import jakarta.enterprise.context.RequestScoped;
@RequestScoped
public class TenantContext {
private String tenantId;
public String getTenantId() {
return tenantId;
}
public void setTenantId(String tenantId) {
this.tenantId = tenantId;
}
}
Create TenantFilter.java:
package com.cloudmetrics.api;
import jakarta.annotation.Priority;
import jakarta.inject.Inject;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerRequestFilter;
import jakarta.ws.rs.ext.Provider;
import org.jboss.logging.Logger;
@Provider
@Priority(Priorities.AUTHENTICATION)
public class TenantFilter implements ContainerRequestFilter {
@Inject
TenantContext context;
private static final Logger LOG = Logger.getLogger(TenantFilter.class);
@Override
public void filter(ContainerRequestContext req) {
context.setTenantId(req.getHeaderString("X-Tenant-ID"));
LOG.infof("X-Tenant-ID: %s", context.getTenantId());
}
}
Now we need to write a custom interceptor. Create the class DynamicRateLimitInterceptor.java:
package com.cloudmetrics.api;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.jboss.logging.Logger;
import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.local.LocalBucketBuilder;
import jakarta.annotation.Priority;
import jakarta.inject.Inject;
import jakarta.interceptor.AroundInvoke;
import jakarta.interceptor.Interceptor;
import jakarta.interceptor.InvocationContext;
import jakarta.ws.rs.WebApplicationException;
import jakarta.ws.rs.core.Response;
@DynamicRateLimited
@Interceptor
@Priority(Interceptor.Priority.PLATFORM_BEFORE + 200)
public class DynamicRateLimitInterceptor {
private static final Logger LOG = Logger.getLogger(DynamicRateLimitInterceptor.class);
@Inject
TenantService tenantService;
@Inject
TenantContext tenantContext;
private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
private static Bandwidth[] configFor(String plan) {
if ("PRO".equals(plan)) {
return new Bandwidth[] {
Bandwidth.builder()
.capacity(10)
.refillIntervally(10, Duration.ofMinutes(1))
.build(),
Bandwidth.builder()
.capacity(3)
.refillIntervally(3, Duration.ofSeconds(5))
.build()
};
} else {
return new Bandwidth[] {
Bandwidth.builder()
.capacity(2)
.refillIntervally(2, Duration.ofMinutes(1))
.build()
};
}
}
private Bucket resolveBucket(String tenantId) {
return buckets.computeIfAbsent(tenantId, key -> {
String plan = tenantService.getPlanForTenant(key).orElse("FREE");
Bandwidth[] limits = configFor(plan);
LocalBucketBuilder bucket = Bucket.builder();
for (Bandwidth limit : limits) {
bucket.addLimit(limit);
}
return bucket.build();
});
}
@AroundInvoke
public Object around(InvocationContext ctx) throws Exception {
String tenantId = tenantContext.getTenantId();
LOG.infof("Tenant ID: %s", tenantId);
if (tenantId == null || tenantId.isBlank()) {
throw new WebApplicationException("Missing tenant", Response.Status.FORBIDDEN);
}
Bucket bucket = resolveBucket(tenantId);
if (bucket.tryConsume(1)) {
return ctx.proceed();
}
throw new WebApplicationException("Too Many Requests", Response.Status.TOO_MANY_REQUESTS);
}
}
Now we need to apply the new interceptor with the annotation:
package com.cloudmetrics.api;
import io.quarkiverse.bucket4j.runtime.RateLimited;
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
@Path("/api/v1")
@Produces(MediaType.TEXT_PLAIN)
public class MetricsResource {
@Inject
TenantService tenantService;
@Inject
TenantContext tenantContext;
// existing endpoints
@GET
@Path("/reports/generate")
@DynamicRateLimited
public String generateReport() {
return "Report generated for " + tenantContext.getTenantId();
}
}
Test both:
for i in {1..3}; do curl -i -H "X-Tenant-ID: tenant-free-user" http://localhost:8080/api/v1/reports/generate; echo; done
for i in {1..3}; do curl -i -H "X-Tenant-ID: tenant-pro-user" http://localhost:8080/api/v1/reports/generate; echo; done
Free fails on the 3rd. Pro sails through.
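You can check these outcomes against the numbers in configFor(plan) without starting the server. The sketch below is a simplified plain-Java model rather than Bucket4j itself: IntervalLimit and MultiLimit are hypothetical names, time is passed in as plain seconds, and the model assumes a request passes only when every configured limit still has a token.

```java
import java.util.List;

class TierDemo {
    public static void main(String[] args) {
        MultiLimit free = MultiLimit.forPlan("FREE");
        MultiLimit pro = MultiLimit.forPlan("PRO");
        for (int i = 1; i <= 3; i++) {
            System.out.println("FREE #" + i + " allowed: " + free.tryConsume(0));
            System.out.println("PRO  #" + i + " allowed: " + pro.tryConsume(0));
        }
        System.out.println("PRO  #4 at t=0s allowed: " + pro.tryConsume(0));
        System.out.println("PRO  #5 at t=5s allowed: " + pro.tryConsume(5));
    }
}

// Hypothetical model of one interval-refilled limit; time starts at second 0.
class IntervalLimit {
    final long capacity;
    final long periodSeconds;
    long tokens;
    long windowStart;

    IntervalLimit(long capacity, long periodSeconds) {
        this.capacity = capacity;
        this.periodSeconds = periodSeconds;
        this.tokens = capacity; // starts full
    }

    void refill(long nowSeconds) {
        long elapsedPeriods = (nowSeconds - windowStart) / periodSeconds;
        if (elapsedPeriods > 0) {
            tokens = capacity; // refill amount equals capacity in both tiers
            windowStart += elapsedPeriods * periodSeconds;
        }
    }
}

class MultiLimit {
    private final List<IntervalLimit> limits;

    MultiLimit(List<IntervalLimit> limits) {
        this.limits = limits;
    }

    // Same capacities and periods as configFor(plan) in the interceptor.
    static MultiLimit forPlan(String plan) {
        if ("PRO".equals(plan)) {
            return new MultiLimit(List.of(new IntervalLimit(10, 60), new IntervalLimit(3, 5)));
        }
        return new MultiLimit(List.of(new IntervalLimit(2, 60)));
    }

    synchronized boolean tryConsume(long nowSeconds) {
        for (IntervalLimit limit : limits) {
            limit.refill(nowSeconds);
        }
        for (IntervalLimit limit : limits) {
            if (limit.tokens == 0) {
                return false; // any exhausted limit rejects the request
            }
        }
        for (IntervalLimit limit : limits) {
            limit.tokens--; // otherwise consume one token from every limit
        }
        return true;
    }
}
```

In this model the FREE tenant is rejected on its third request, and the PRO tenant passes all three. A fourth immediate PRO request would be stopped by the 3-per-5-seconds burst limit, then allowed again five seconds later while the 10-per-minute limit keeps counting.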
You did it!
You just built a smart, rate-limited API with Quarkus and Bucket4j that can:
Enforce global throttling
Limit by tenant ID
Throttle by subscription tier
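What about multiple instances of the service? The buckets in DynamicRateLimitInterceptor live in a ConcurrentHashMap inside each JVM, so every instance counts on its own and a tenant's effective limit grows with the number of instances. Enforcing one limit across a cluster requires shared bucket state, which this tutorial does not cover.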
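To make that concrete, here is a small plain-Java simulation with no Quarkus or Bucket4j involved. InstanceLocalLimiter and the round-robin loop are hypothetical stand-ins for two service instances that each keep buckets in their own memory, as the interceptor's ConcurrentHashMap does. Refill is left out to keep it short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class MultiInstanceDemo {
    public static void main(String[] args) {
        // Two "instances", each configured with the FREE capacity of 2.
        InstanceLocalLimiter instance1 = new InstanceLocalLimiter(2);
        InstanceLocalLimiter instance2 = new InstanceLocalLimiter(2);
        int allowed = 0;
        for (int i = 0; i < 6; i++) {
            // Naive round-robin load balancing between the two instances.
            InstanceLocalLimiter target = (i % 2 == 0) ? instance1 : instance2;
            if (target.tryAcquire("tenant-free-user")) {
                allowed++;
            }
        }
        System.out.println("allowed across both instances: " + allowed); // prints 4
    }
}

// Hypothetical stand-in for one service instance with in-memory buckets.
class InstanceLocalLimiter {
    private final int capacity;
    private final Map<String, AtomicInteger> remaining = new ConcurrentHashMap<>();

    InstanceLocalLimiter(int capacity) {
        this.capacity = capacity;
    }

    boolean tryAcquire(String tenantId) {
        AtomicInteger left = remaining
                .computeIfAbsent(tenantId, key -> new AtomicInteger(capacity));
        return left.getAndUpdate(v -> v > 0 ? v - 1 : 0) > 0;
    }
}
```

The tenant gets four requests through instead of two, because neither instance knows what the other has already allowed.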