From Unpredictable to Reliable: Mastering JSON Output with Quarkus, Langchain4j, and Ollama
Stop wrestling with malformed AI responses. Learn how to generate clean, validated JSON from local LLMs using Java records, schema enforcement, and self-healing guardrails.
Large Language Models (LLMs) promise magic: ask a question in natural language, and receive structured, machine-usable data in return. For Java developers working with Quarkus and Langchain4j, especially when connecting to local models like Llama 3 through Ollama, that dream often translates into a familiar goal: "Just give me some clean JSON."
But the reality is rarely that simple. What sounds like a straightforward request to "return a JSON object" often results in a mess of malformed strings, hallucinated fields, or structures that defy deserialization. It doesn’t matter whether you’re building a chatbot, a document parser, or a data extractor. If your JSON is broken, your system breaks down.
This guide aims to make that outcome a thing of the past. We’ll walk through the right way to request JSON from your local model, how to strongly guide it with schema enforcement, and how to handle those inevitable times when the model still gets it wrong. Whether you’re in prototyping mode or building for production, this is your go-to reference for structured output that works.
Getting the Basics Right: Two Ways to Request JSON
Before you start writing guardrails or custom logic, it’s essential to understand how Langchain4j and Quarkus interact with the Ollama API to even ask for JSON in the first place. There are two distinct modes available, and choosing the right one can make or break your results.
Option One: Just Ask Nicely (format=json)
If your goal is simply to get JSON as a string, the easiest way is to tell the model to return its output in that format. Quarkus Langchain4j supports a shortcut for this: just set the following configuration property.
quarkus.langchain4j.ollama.chat-model.format=json
Behind the scenes, this adds "format": "json" to the Ollama API request. It’s a polite request that works reasonably well with modern models—especially when your method returns a String and you want that string to be valid JSON.
However, be aware: this is not a strict instruction. If the model is confused, tired, or just feeling creative, it might return something that looks like JSON but doesn’t actually parse. That’s why this approach works best when you don’t need the structure to be exact, or when you plan to parse the response manually.
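When you take this route and plan to parse manually, a cheap structural sanity check can reject obviously truncated or prose-wrapped replies before they reach your real JSON parser. The sketch below is illustrative (the class and method names are hypothetical, not part of Langchain4j) and uses only the JDK:

```java
public class JsonSanityCheck {

    // Rejects replies that are obviously not a single JSON object,
    // e.g. truncated output or prose wrapped around the JSON.
    // This is NOT a validator; a real parser still has the final word.
    static boolean looksLikeJsonObject(String reply) {
        String t = reply.strip();
        if (!t.startsWith("{") || !t.endsWith("}")) return false;
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < t.length(); i++) {
            char c = t.charAt(i);
            if (inString) {
                if (c == '\\') i++;            // skip escaped character
                else if (c == '"') inString = false;
            } else if (c == '"') inString = true;
            else if (c == '{') depth++;
            else if (c == '}') depth--;
        }
        return depth == 0;                      // all braces balanced
    }

    public static void main(String[] args) {
        System.out.println(looksLikeJsonObject("{\"name\":\"Ada\"}")); // true
        System.out.println(looksLikeJsonObject("Sure! {\"name\":"));   // false
    }
}
```

A check like this catches the most common failure mode of the polite-request approach: a reply that starts with conversational filler before the JSON payload.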
Option Two: Let Java Define the Schema
The second, and more powerful, path leverages Java itself to define the desired output structure. Instead of returning a raw string, you define a Java class or record as the return type in your AI service interface.
public record User(String name, int age) {}

@RegisterAiService
public interface UserExtractor {
    User extractUserFrom(String text);
}
Now things get interesting. Quarkus and Langchain4j inspect the return type and automatically generate a complete JSON Schema for the User record. This schema is passed to the LLM as part of the format configuration, acting like a strict contract the model must follow.
In this mode, the model isn’t just asked to return some JSON—it’s explicitly told to produce JSON that conforms to the exact structure and types defined in your Java class. The benefit? Seamless deserialization, safer parsing, and fewer downstream errors.
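For the User record above, the generated schema is roughly of this shape (illustrative only; the exact JSON produced depends on your Langchain4j version):

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "age": { "type": "integer" }
  },
  "required": ["name", "age"]
}
```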
Building Smarter Schemas: Best Practices from the Trenches
Defining a Java class is a good start, but it’s not enough on its own. To truly guide the model, you need to enrich your data structures with semantic hints, constraints, and clean boundaries. These best practices borrow from the Python LLM community, adapted to a Java-first, schema-driven mindset.
Add Meaning with @Description
When an LLM sees a field called name, it might guess correctly. But is it a first name? A last name? A file name? Context matters. Langchain4j includes a helpful annotation, @Description, that lets you embed field-level meaning directly into the schema.
import dev.langchain4j.model.output.structured.Description;

@Description("Contains extracted information about a person and their pet.")
public record PersonInfo(
    @Description("The person's full name.")
    String name,
    @Description("The person's age, as a whole number.")
    int age,
    @Description("The type of pet the person owns, e.g., 'dog' or 'cat'.")
    String petType
) {}
These descriptions are included in the JSON Schema sent to the model, helping it understand your intent with remarkable precision. Often, this one change is enough to dramatically improve the quality of your results.
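Illustratively, each annotated field carries its description into the generated schema, so the model sees something along these lines for the name field (exact layout varies by version):

```json
{ "name": { "type": "string", "description": "The person's full name." } }
```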
Constrain Possibilities with Enums
When a field should accept only a handful of values, don’t rely on the model to guess. Define an enum and let the schema enforce it.
public enum Category {
    TECHNICAL_SUPPORT,
    BILLING_INQUIRY,
    GENERAL_FEEDBACK
}

public record Ticket(
    @Description("A summary of the user's issue.")
    String summary,
    @Description("The category that best fits the ticket.")
    Category category
) {}
This maps to a JSON Schema enum, telling the LLM exactly which values are valid and disallowing all others. The result? Cleaner output and fewer surprises.
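If you ever parse model output by hand rather than letting Langchain4j deserialize it, the same idea applies in plain Java: map the raw string onto the enum defensively instead of trusting it. The helper below is a hypothetical sketch, not part of any library:

```java
import java.util.Locale;

public class CategoryMapper {

    enum Category { TECHNICAL_SUPPORT, BILLING_INQUIRY, GENERAL_FEEDBACK }

    // Maps a raw model string onto the enum, returning null for anything
    // outside the allowed set instead of throwing at the call site.
    static Category fromModelOutput(String raw) {
        try {
            return Category.valueOf(raw.strip().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return null; // value was outside the schema's enum
        }
    }

    public static void main(String[] args) {
        System.out.println(fromModelOutput("billing_inquiry")); // BILLING_INQUIRY
        System.out.println(fromModelOutput("spam"));            // null
    }
}
```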
Avoid Deep Nesting
Complex hierarchies might make sense in your Java codebase, but they’re often too much for LLMs, especially local ones. If you find yourself nesting objects more than two levels deep, it’s time to refactor.
Break your task into smaller, sequential steps. Each step can have a dedicated service method with a flat return type. Think of it like a pipeline: the output of one model call feeds into the next, with each doing one simple thing well.
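As a sketch of that pipeline idea, instead of one deeply nested return type you can define two flat records, have each filled by its own model call, and assemble the final result in plain Java (record names here are illustrative):

```java
public class FlatPipeline {

    // Step 1 output: just the person, nothing nested.
    record Person(String name, int age) {}

    // Step 2 output: pet details, extracted in a separate call.
    record Pet(String type, String petName) {}

    // Final result assembled in code, not by the model.
    record PersonWithPet(Person person, Pet pet) {}

    public static void main(String[] args) {
        // In a real pipeline each record would come from its own AI service call.
        Person p = new Person("Ada", 36);
        Pet pet = new Pet("dog", "Rex");
        PersonWithPet result = new PersonWithPet(p, pet);
        System.out.println(result.person().name() + " owns a " + result.pet().type());
    }
}
```

Each model call now only has to satisfy a flat, two-field schema, which even small local models handle reliably.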
Choose the Right Model
Finally, remember that your tooling can only do so much. The model has the final say. Llama 3, Mistral, and Phi-3 have much stronger JSON adherence than older generations. If you're using an older model and struggling with bad output, switching to a newer one can yield immediate benefits.
Going Beyond Happy Paths: Handling Malformed or Invalid Output
Even with schema guidance and modern models, things will still go wrong. The model might forget a field. It might return valid JSON with incorrect values. In a prototype, this might be tolerable. In production? Never.
Here’s how you can make your application resilient and capable of detecting problems and recovering from them automatically.
Self-Correction with Guardrails
The most advanced strategy is to give the model a chance to correct itself. Langchain4j supports this pattern using an OutputGuardrail that can catch parsing errors and generate a new prompt asking the LLM to fix its mistake.
You start by implementing a custom output guardrail:
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import dev.langchain4j.data.message.AiMessage;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrail;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrailResult;

public class PersonInfoOutputGuardrail implements OutputGuardrail {

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public OutputGuardrailResult validate(AiMessage responseFromLLM) {
        String text = responseFromLLM.text();
        try {
            // If the text deserializes, pass the typed object downstream.
            PersonInfo personInfo = mapper.readValue(text, PersonInfo.class);
            return successWith(text, personInfo);
        } catch (JsonProcessingException e) {
            // Otherwise, send the error back and ask the model to try again.
            String correctionInstruction =
                "The following output is not valid: " + text
                + "\nError: " + e.getMessage()
                + "\nPlease correct it.";
            return reprompt("Invalid JSON", correctionInstruction);
        }
    }
}
Then apply it at the service level:
@RegisterAiService
public interface ExtractorService {

    @OutputGuardrails(PersonInfoOutputGuardrail.class)
    Result<PersonInfo> extractPersonInfo(String document);
}
If the first attempt fails, the guardrail automatically triggers a correction prompt. It’s like a spell-check for JSON—only smarter.
Retrying with Business Validation
Guardrails catch formatting errors, but what if the model returns valid JSON with bad data? For example, an empty name or an age of 700?
In these cases, you need domain-specific validation and the ability to retry. Quarkus provides this via MicroProfile’s fault tolerance extension:
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-smallrye-fault-tolerance</artifactId>
</dependency>
Then build an orchestrator:
@ApplicationScoped
public class AiOrchestrator {

    @Inject
    ExtractorService extractorService;

    // InvalidResponseException is a custom RuntimeException you define yourself.
    @Retry(maxRetries = 2, retryOn = InvalidResponseException.class)
    public PersonInfo extractWithValidation(String document) {
        PersonInfo info = extractorService.extractPersonInfo(document).content();
        if (info == null || info.name() == null || info.name().isBlank()) {
            throw new InvalidResponseException("LLM returned an invalid name.");
        }
        return info;
    }
}
Now your application can detect a bad result and rerun the entire interaction. If the first try fails, you have a second and third shot before giving up.
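Conceptually, @Retry is just a loop around the call: one initial attempt plus up to maxRetries repeats, with the last exception propagating if every attempt fails. A plain-Java sketch (class and method names are illustrative, not MicroProfile API):

```java
import java.util.function.Supplier;

public class RetrySketch {

    // Conceptual equivalent of @Retry(maxRetries = 2): one initial attempt
    // plus up to two retries before the last exception propagates.
    static <T> T withRetries(Supplier<T> call, int maxRetries) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure, try again
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds on the third attempt.
        String result = withRetries(() -> {
            if (calls[0]++ < 2) throw new IllegalStateException("bad LLM output");
            return "ok";
        }, 2);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The real annotation adds more (backoff, aborts, metrics), but this is the control flow your validation exception plugs into.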
Final Thoughts
Structured JSON output from LLMs isn’t a luxury—it’s the foundation of real-world AI systems. And yet, too often we treat it like a roll of the dice.
With Quarkus Langchain4j and Ollama, we have all the tools to bring order to that chaos. By combining schema enforcement, field-level descriptions, enums, output repair, and fault-tolerant retries, we can turn unpredictable responses into dependable pipelines.
Stop hoping for well-formed output and start engineering for it.