From Black Box to Blueprint: Tracing Every LLM Decision with Quarkus
Build trust, traceability, and visual insight into your AI-powered Java apps using LangChain4j, Ollama, and CDI interceptors.
In a world of agentic AI, knowing what your model did isn't enough. You need to know why. This is the tutorial that shows you how.
Modern LLMs don’t just spit out answers. They call tools, self-correct, and loop through logic like mini-autonomous systems. But what happens under the hood of those multi-step decisions? How can a developer trust or even debug an LLM-powered feature if the model's chain of thought is invisible?
I have been thinking about tracing LLM calls and representing them as diagrams for a while. Back at Devoxx UK, my amazing colleague Bruno implemented a trace with Camel, and in this hands-on guide, I’ll build a transparent, inspectable agentic AI system based on:
Quarkus as the reactive Java runtime
Langchain4j for structured AI service composition
Ollama for local LLM inference
CDI interceptors for non-invasive observability
Mermaid.js to render decision flowcharts
By the end, you’ll have a full-stack system that traces every step of an LLM interaction, from the prompt to tool invocations to guardrail corrections, and renders it as a shareable graph.
Bootstrapping the LLM-Ready Quarkus App
Quarkus provides a command-line interface (CLI) that simplifies project creation and extension management. The project will be created with all necessary dependencies from the start. Execute the following command in a terminal to generate a new Maven-based Quarkus project:
quarkus create app com.example:llm-observability \
--extension='rest-jackson,quarkus-langchain4j-ollama' \
--no-code
cd llm-observability
And if you want a head start and would rather grab the complete project, make sure to check out the GitHub repository.
Configure application.properties
The quarkus-langchain4j-ollama extension enables a declarative configuration approach through the application.properties file located in src/main/resources. This file is the central point for telling the Quarkus application how to connect to and interact with the Ollama service.
Add the following properties to src/main/resources/application.properties:
quarkus.langchain4j.ollama.chat-model.model-id=llama3.1:latest
quarkus.langchain4j.ollama.timeout=60s
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.ollama.log-responses=true
quarkus.langchain4j.ollama.devservices.enabled=false
This setup gives you a declarative AI integration with no manual HTTP code or JSON parsing required.
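Because the Ollama Dev Services are disabled in this configuration, the application expects an Ollama instance already running locally on its default port (an assumption based on the properties above). If you have Ollama installed, you can pull the configured model up front:
ollama pull llama3.1:latest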
Creating the Conversational AI Core
With the project configured, the next step is to build the simplest form of interaction: a basic chatbot. This involves creating a declarative Langchain4j AI Service and a JAX-RS resource to expose it via an HTTP endpoint. This service will serve as the foundational component upon which the more complex features of tools, guardrails, and tracing will be layered.
AI Service Interface
The core of the quarkus-langchain4j integration is a powerful declarative programming model. Instead of writing a traditional service class with imperative logic, developers define a Java interface and annotate it. Quarkus and Langchain4j then work together to generate the implementation at build time.
Create a new Java interface named ChatbotAiService.java in the org.acme package with the following content:
package org.acme;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import io.quarkiverse.langchain4j.guardrails.InputGuardrails;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrails;
@RegisterAiService(tools = CalculatorTools.class)
public interface ChatbotAiService {
@InputGuardrails(BannedWordGuard.class)
@OutputGuardrails(ConcisenessGuard.class)
String chat(@UserMessage String userMessage);
}
Note: I have already included an input and an output guardrail, as well as the tool registration, here. We will implement them later on.
REST Endpoint
To make the AI service accessible from the outside world, a standard JAX-RS REST resource is needed. This resource will expose an HTTP endpoint that clients can call.
Create a new class named ChatResource.java in the org.acme package:
package org.acme;
import org.acme.tracing.LLMCallTracking;
import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
@Path("/chat")
public class ChatResource {
@Inject
ChatbotAiService chatbot;
@POST
@Consumes(MediaType.TEXT_PLAIN)
@Produces(MediaType.TEXT_PLAIN)
@LLMCallTracking
public String chat(String message) {
return chatbot.chat(message);
}
}
Note: You can already see the @LLMCallTracking interceptor binding that we will cover later on in the tutorial.
Adding Tool Calling Support
This section elevates the simple chatbot into a more capable "agent" by granting it the ability to use external tools. Tools are functions that the LLM can invoke to perform tasks it cannot do on its own, such as performing precise calculations or accessing real-time data from external APIs. This is the first step toward building a system whose behavior is not fully determined by a single prompt but emerges from a reasoning process.
A new CDI bean will be created to house the tool methods.
Create a new class named CalculatorTools.java in the org.acme package:
package org.acme;
import org.acme.tracing.LLMCallTracking;
import dev.langchain4j.agent.tool.Tool;
import io.quarkus.logging.Log;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
public class CalculatorTools {
@Tool("Calculates the sum of two numbers, 'a' and 'b'.")
@LLMCallTracking
public double add(double a, double b) {
Log.infof("Tool executed: add(%.2f, %.2f)%n", a, b);
return a + b;
}
@Tool("Calculates the difference between two numbers, 'a' and 'b'.")
@LLMCallTracking
public double subtract(double a, double b) {
Log.infof("Tool executed: subtract(%.2f, %.2f)%n", a, b);
return a - b;
}
}
Key elements of this class are:
@ApplicationScoped: This makes the class a CDI bean, allowing it to be managed by the Quarkus container and discovered by the Langchain4j framework.
@Tool: This Langchain4j annotation marks a method as an available tool for the LLM.
Tool Description: The string provided to the @Tool annotation (e.g., "Calculates the sum of two numbers, 'a' and 'b'.") is critically important. This natural language description is what Langchain4j sends to the LLM as part of the tool's specification. The LLM uses this description to understand what the tool does and decide whether to call it. A clear and precise description significantly increases the likelihood that the LLM will use the tool correctly (see the short contrast right after this list).
@LLMCallTracking: This already introduces the interceptor we will be developing later.
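To make the point about descriptions concrete, here is a purely illustrative contrast that is not part of the project code (the method names are made up for the example):
// Illustrative only: a vague description gives the model almost nothing to reason about.
@Tool("add")
public double addVague(double a, double b) {
    return a + b;
}

// A precise description explains what the tool does and what 'a' and 'b' mean.
@Tool("Calculates the sum of two numbers, 'a' and 'b'.")
public double addPrecise(double a, double b) {
    return a + b;
}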
Enforcing Guardrails
Now that the agent can act by using tools, its behavior must be constrained. This section introduces guardrails, a mechanism to enforce rules on both the LLM's input and its output. This tutorial will implement a particularly powerful feature: the ability for a guardrail to detect a faulty response and automatically "reprompt" the LLM to correct itself, demonstrating a dynamic, self-correction loop.
Input Guardrail
An input guardrail will be created to demonstrate how to validate and reject user prompts before they are sent to the LLM.
Create a new class BannedWordGuard.java in the org.acme package:
package org.acme;
import org.acme.tracing.LLMCallTracking;
import dev.langchain4j.data.message.UserMessage;
import io.quarkiverse.langchain4j.guardrails.InputGuardrail;
import io.quarkiverse.langchain4j.guardrails.InputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
@LLMCallTracking
public class BannedWordGuard implements InputGuardrail {
@Override
public InputGuardrailResult validate(UserMessage userMessage) {
String text = userMessage.singleText();
if (text.toLowerCase().contains("politics")) {
return fatal("This topic is not allowed.");
}
return success();
}
}
This guardrail checks if the user's message contains the word "politics". If it does, it returns a fatal result, which immediately stops the processing chain and prevents the LLM from being called. The message "This topic is not allowed." is propagated back to the caller as an error.
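Once the application is running (see the end of the tutorial), you can verify the guardrail with a prompt that contains the banned word; instead of a chat answer, the endpoint returns an error because the request never reaches the model:
curl -X POST -H "Content-Type: text/plain" \
 -d "Tell me something about politics." \
 http://localhost:8080/chat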
Output Guardrail with Reprompting
This is a key part of the tutorial, demonstrating the dynamic self-correction capability. An output guardrail will be created to check if the LLM's response is too verbose. If it is, instead of simply failing, it will trigger a reprompt.
Create a new class ConcisenessGuard.java in the org.acme package:
package org.acme;
import org.acme.tracing.LLMCallTracking;
import dev.langchain4j.data.message.AiMessage;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrail;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
@LLMCallTracking
public class ConcisenessGuard implements OutputGuardrail {
private static final int MAX_LENGTH = 1500;
@Override
public OutputGuardrailResult validate(AiMessage aiMessage) {
String text = aiMessage.text();
// Allow empty content (e.g., when AI is making tool calls)
if (text == null || text.isBlank()) {
return success();
}
if (text.length() > MAX_LENGTH) {
return reprompt("Response is too long.", "Please be more concise.");
}
return success();
}
}
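Whether this guardrail actually fires depends on how verbose the model is. Once the application is running, a prompt that invites a long answer, like the one below, is a good candidate to trigger the reprompt loop; if it does, the correction should show up later in the trace as a guardrail violation with a reprompt:
curl -X POST -H "Content-Type: text/plain" \
 -d "Explain the history of the Java programming language in great detail." \
 http://localhost:8080/chat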
Full-Stack Observability via CDI Interceptors
This section is the heart of the tutorial, where the core observability mechanism is built. A Quarkus CDI interceptor will be implemented to capture high-level information about the chat interactions and store it for later visualization. The architectural choice of using an interceptor over other alternatives will be justified as the most suitable for this application's needs.
Define the Trace Data Model
A well-designed data model is essential for capturing the rich, structured context of the conversation. The following Java records will serve as the data transfer objects for the tracing system.
Create a new file TraceData.java in the org.acme.tracing package:
package org.acme.tracing;
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
public class TraceData {
public record ConversationTrace(
String conversationId,
LocalDateTime startTime,
List<LLMInteraction> interactions,
List<ToolCall> toolCalls,
List<GuardrailViolation> violations,
Map<String, Object> metadata) {
}
public record LLMInteraction(
String prompt,
String response,
String model,
Integer inputTokenCount,
Integer outputTokenCount,
Duration duration) {
}
public record ToolCall(
String toolName,
String params,
String result,
Duration duration) {
}
public record GuardrailViolation(
String guardrail,
String violation,
String reprompt) {
}
}
Define a Tracing Annotation
To associate the interceptor with the methods it should target, a custom annotation, known as an interceptor binding, is required.
Create a new annotation LLMCallTracking.java in the org.acme.tracing package:
package org.acme.tracing;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import jakarta.interceptor.InterceptorBinding;
@InterceptorBinding
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.TYPE, ElementType.METHOD })
public @interface LLMCallTracking {
}
Implementing the LLMCallTracker and LLMCallInterceptor
The tracing system will consist of two main components: a tracker service to store the traces and the interceptor itself to capture the events.
First, create the LLMCallTracker service. This @ApplicationScoped bean will hold the conversation traces in memory.
Create LLMCallTracker.java in the org.acme.tracing package:
package org.acme.tracing;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import org.acme.tracing.TraceData.ConversationTrace;
import org.acme.tracing.TraceData.GuardrailViolation;
import org.acme.tracing.TraceData.LLMInteraction;
import org.acme.tracing.TraceData.ToolCall;
import jakarta.enterprise.context.ApplicationScoped;
@ApplicationScoped
public class LLMCallTracker {
private final Map<String, ConversationTrace> activeTraces = new ConcurrentHashMap<>();
public void startTrace(String conversationId, String initialPrompt) {
activeTraces.computeIfAbsent(conversationId, id -> new ConversationTrace(
id,
LocalDateTime.now(),
Collections.synchronizedList(new ArrayList<>()),
Collections.synchronizedList(new ArrayList<>()),
Collections.synchronizedList(new ArrayList<>()),
new ConcurrentHashMap<>()));
}
public void recordLLMInteraction(String conversationId, LLMInteraction interaction) {
Optional.ofNullable(activeTraces.get(conversationId))
.ifPresent(trace -> trace.interactions().add(interaction));
}
public void recordToolCall(String conversationId, ToolCall toolCall) {
Optional.ofNullable(activeTraces.get(conversationId))
.ifPresent(trace -> trace.toolCalls().add(toolCall));
}
public void recordGuardrailViolation(String conversationId, GuardrailViolation violation) {
Optional.ofNullable(activeTraces.get(conversationId))
.ifPresent(trace -> trace.violations().add(violation));
}
public Optional<ConversationTrace> getTrace(String conversationId) {
return Optional.ofNullable(activeTraces.get(conversationId));
}
}
Next, create the LLMCallInterceptor in the org.acme.tracing package:
package org.acme.tracing;
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;
import org.acme.tracing.TraceData.GuardrailViolation;
import org.acme.tracing.TraceData.LLMInteraction;
import org.acme.tracing.TraceData.ToolCall;
import com.fasterxml.jackson.databind.ObjectMapper;
import dev.langchain4j.agent.tool.Tool;
import io.quarkiverse.langchain4j.guardrails.GuardrailResult;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrailResult;
import io.quarkus.logging.Log;
import jakarta.annotation.Priority;
import jakarta.inject.Inject;
import jakarta.interceptor.AroundInvoke;
import jakarta.interceptor.Interceptor;
import jakarta.interceptor.InvocationContext;
import jakarta.ws.rs.POST;
@LLMCallTracking
@Interceptor
@Priority(Interceptor.Priority.APPLICATION + 1)
public class LLMCallInterceptor {
@Inject
LLMCallTracker tracker;
@Inject
RequestCorrelation correlation;
@Inject
ObjectMapper mapper; // For serializing tool parameters
@AroundInvoke
public Object track(InvocationContext context) throws Exception {
// Check if this is the entry point (the JAX-RS method)
if (context.getMethod().isAnnotationPresent(POST.class)) {
String conversationId = UUID.randomUUID().toString();
Log.info("CONVERSATION ID: " + conversationId);
correlation.setConversationId(conversationId);
tracker.startTrace(conversationId, (String) context.getParameters()[0]);
}
String conversationId = correlation.getConversationId();
if (conversationId == null) {
// Not part of a tracked conversation, proceed without tracking
return context.proceed();
}
Instant start = Instant.now();
Object result = null;
try {
result = context.proceed();
return result;
} finally {
Instant end = Instant.now();
Duration duration = Duration.between(start, end);
// Differentiate based on the type of method intercepted
if (context.getMethod().isAnnotationPresent(Tool.class)) {
handleToolCall(context, conversationId, result, duration);
} else if (result instanceof GuardrailResult) {
handleGuardrail(context, conversationId, (GuardrailResult) result);
} else if (context.getMethod().isAnnotationPresent(POST.class)) {
handleLLMInteraction(context, conversationId, (String) result, duration);
}
}
}
private void handleLLMInteraction(InvocationContext context, String conversationId, String response,
Duration duration) {
LLMInteraction interaction = new LLMInteraction(
(String) context.getParameters()[0],
response,
"ollama:llama3",
null, null,
duration);
tracker.recordLLMInteraction(conversationId, interaction);
}
private void handleToolCall(InvocationContext context, String conversationId, Object result, Duration duration) {
String paramsJson;
try {
paramsJson = mapper.writeValueAsString(context.getParameters());
} catch (Exception e) {
paramsJson = "Error serializing params: " + e.getMessage();
}
ToolCall toolCall = new ToolCall(
context.getMethod().getName(),
paramsJson,
String.valueOf(result),
duration);
tracker.recordToolCall(conversationId, toolCall);
}
private void handleGuardrail(InvocationContext context, String conversationId, GuardrailResult result) {
if (!result.isSuccess()) {
String reprompt = null;
if (result instanceof OutputGuardrailResult) {
reprompt = "Reprompt triggered"; // Simple fallback
}
GuardrailViolation violation = new GuardrailViolation(
context.getTarget().getClass().getSimpleName(),
"Guardrail violation detected", // Simple fallback message
reprompt);
tracker.recordGuardrailViolation(conversationId, violation);
}
}
}
This gives you AOP-style tracing without polluting business logic.
Correlating Events with Context
With the interceptor firing in multiple places, the next challenge is to distinguish between these different events and correlate them to the same conversation. This requires enhancing the interceptor to be context-aware and establishing a mechanism to pass the conversationId.
A CDI request-scoped bean is an excellent way to hold the conversationId for the duration of a single HTTP request.
Create RequestCorrelation.java in the org.acme.tracing package:
package org.acme.tracing;
import jakarta.enterprise.context.RequestScoped;
@RequestScoped
public class RequestCorrelation {
private String conversationId;
public String getConversationId() {
return conversationId;
}
public void setConversationId(String conversationId) {
this.conversationId = conversationId;
}
}
This multi-target interception strategy, combined with a request-scoped correlation ID, is a powerful and general-purpose pattern. It allows the system to stitch together disparate invocations that occur across loosely coupled CDI beans during a single request, forming a single, coherent narrative of the entire workflow. This technique is not limited to LLM tracing; it can be applied to achieve deep observability in any complex CDI-based application, for instance, to trace a request through multiple internal services, database calls, and external API integrations.
Visualizing with Mermaid.js
With all the trace data being meticulously collected and correlated, this final implementation section focuses on the "last mile": transforming the raw ConversationTrace object into a human-readable Mermaid.js flowchart and exposing it via a REST API. This provides the intuitive visualization that makes the complex internal processes understandable at a glance.
Mermaid Generator
A dedicated service encapsulates the logic for converting a ConversationTrace into a Mermaid graph definition. This separation of concerns makes the system cleaner and more maintainable.
Create a new class MermaidGraphGenerator.java in the org.acme.tracing package. I will spare you the details here; grab the full implementation from my GitHub repository.
@ApplicationScoped
public class MermaidGraphGenerator {
    public String generate(ConversationTrace trace) {
        // Turn the trace into a Mermaid.js flowchart definition string
        // (full implementation in the GitHub repository)
        StringBuilder graph = new StringBuilder("flowchart TD\n");
        // ...append one node per interaction, tool call, and guardrail violation...
        return graph.toString();
    }
}
This generator iterates through the collected events in the ConversationTrace and constructs a valid Mermaid.js graph definition string, following the flowchart syntax. It uses unique identifiers for each node to build the graph structure.
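For illustration, the definition generated for a simple tool-calling conversation could look roughly like the following; the exact node labels and shapes depend on the implementation in the repository:
flowchart TD
    start(["User Prompt"])
    tool0["Tool: subtract(100.0, 18.0) = 82.0"]
    llm0["LLM Response"]
    start --> tool0
    tool0 --> llm0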
Expose via REST
A new JAX-RS resource is needed to expose the tracing data. It will provide two endpoints: one for the raw JSON trace and another for the generated Mermaid diagram.
Create a new class LLMTraceResource.java in the org.acme.tracing package:
package org.acme.tracing;
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
@Path("/llm-traces")
public class LLMTraceResource {
@Inject
LLMCallTracker tracker;
@Inject
MermaidGraphGenerator mermaidGenerator;
@GET
@Path("/{conversationId}/trace")
@Produces(MediaType.APPLICATION_JSON)
public Response getTrace(@PathParam("conversationId") String id) {
return tracker.getTrace(id)
.map(trace -> Response.ok(trace).build())
.orElse(Response.status(Response.Status.NOT_FOUND).build());
}
@GET
@Path("/{conversationId}/mermaid")
@Produces(MediaType.TEXT_PLAIN)
public Response getMermaidDiagram(@PathParam("conversationId") String id) {
return tracker.getTrace(id)
.map(mermaidGenerator::generate)
.map(mermaid -> Response.ok(mermaid).build())
.orElse(Response.status(Response.Status.NOT_FOUND).build());
}
}
I have also included a simple HTML page in /META-INF/resources to make it easier to display the generated Mermaid diagram.
Time to put all of this to work and see the result.
quarkus dev
After a short while, Quarkus is ready and accepting requests.
curl -X POST -H "Content-Type: text/plain" \
-d "What is 100 minus 18? Answer in a concise sentence." \
http://localhost:8080/chat
Grab the conversation ID from the log. Something like:
CONVERSATION ID: dd0b0cec-6889-42c7-9f04-f9d24d31674d
And point your browser to http://localhost:8080 where you can paste the conversation ID and see the resulting flow chart.
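You can also query the trace endpoints directly with the conversation ID from the log; the first call returns the raw JSON trace, the second the Mermaid flowchart definition:
curl http://localhost:8080/llm-traces/dd0b0cec-6889-42c7-9f04-f9d24d31674d/trace
curl http://localhost:8080/llm-traces/dd0b0cec-6889-42c7-9f04-f9d24d31674d/mermaid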
Please be aware that this is a very simplistic approach that leaves a lot to be desired, but I wanted to give it a try and see how far I could push it. Feel free to play around with it, trace other events, or even try to capture an asynchronous call (the LLM invoking more than one tool at once).
Bonus: More Feature Ideas
Add some test coverage ;)
Use a persistent store (e.g., MongoDB) for LLMCallTracker
Use OpenTelemetry for distributed traces
Offload trace recording to async workers
Scrub or mask sensitive prompts before storing
Final Words
If you're building AI agents in Java, this pattern can be your observability blueprint. Not just logs, not just metrics, this is the story of every decision your AI makes, rendered in real time.
Now go ahead and make that black box transparent.