From Text to POJO: Extracting Structured Data with Quarkus, Langchain4j, and Local LLMs
Learn how to turn messy text into clean Java objects using Quarkus, Langchain4j, and Ollama. No cloud APIs, no keys, just pure local LLM power.
Large Language Models (LLMs) are the new Swiss army knives of software development. They summarize, translate, complete code, and even answer philosophical questions with the elegance of a literature professor. But when you need them to do something a bit more mundane, like return a customer's name and age from a sentence, they often fall short in one key way: structure.
By default, LLMs generate human-friendly paragraphs. Great for chat, bad for machines. What if, instead, you could ask the model to skip the pleasantries and give you a nice, clean JSON object, something your backend can deserialize straight into a Java class? That’s exactly the kind of magic Langchain4j brings to Quarkus.
In this hands-on tutorial, you’ll build a Quarkus application that runs an LLM locally (no cloud APIs!) using Ollama. The goal? Extract structured data from plain text and map it directly into Java POJOs using a Quarkus @RegisterAIService.
You’ll build two examples:
A simple person extractor (name + age)
A more complex invoice parser (with nested objects)
Let’s go. (Or take a look at the ready-made example in my GitHub repository.)
Why Structured Output from LLMs Matters
Say you're building a data entry automation tool. You want to let your users upload emails, PDFs, or chat transcripts and then extract structured information from them: names, totals, items, addresses. But if the LLM responds with prose like "Sure! John Doe is 35 years old," you’re left trying to regex your way to salvation.
That’s brittle. And ugly.
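To make the pain concrete, here is what the regex route might look like. This is a hypothetical sketch in plain JDK; the pattern and class name are mine, not from any library:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexExtraction {
    // Fragile pattern: it breaks as soon as the model rephrases its answer.
    static final Pattern PERSON = Pattern.compile("(\\w+ \\w+) is (\\d+) years old");

    static String[] extract(String prose) {
        Matcher m = PERSON.matcher(prose);
        if (m.find()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return null; // "John Doe, aged 35" already slips through
    }

    public static void main(String[] args) {
        String[] hit = extract("Sure! John Doe is 35 years old.");
        System.out.println(hit[0] + ", " + hit[1]); // John Doe, 35
        // The same fact, phrased differently, defeats the pattern:
        System.out.println(extract("John Doe, aged 35, lives in Berlin.")); // null
    }
}
```

One phrasing works, the next one returns null. That is the brittleness structured output is meant to eliminate.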
Structured output flips the script. With Langchain4j, you define the target structure in Java, and the library:
Crafts a detailed prompt telling the LLM what to return.
Parses the result (usually JSON) into your POJO.
And with Quarkus + Ollama, you can do all of this locally: No API keys, no vendor lock-in, no usage limits.
Part 1: Project Setup
Create the Project
Fire up your terminal and generate a new Quarkus project with the required extensions:
mvn io.quarkus.platform:quarkus-maven-plugin:create \
    -DprojectGroupId=org.acme \
    -DprojectArtifactId=structured-ollama-tutorial \
    -Dextensions="rest-jackson,langchain4j-ollama"
cd structured-ollama-tutorial
This gives you:

quarkus-rest-jackson for REST endpoints and JSON handling
quarkus-langchain4j-ollama for local LLM interaction via Langchain4j

Open the project in your IDE and check your pom.xml to verify the dependencies.
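If you want to double-check, the relevant entries should look roughly like this (a sketch, not copied from a generated project — versions are managed by the Quarkus BOM and the Quarkiverse platform, so your pom.xml may differ in detail):

```xml
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-rest-jackson</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
</dependency>
```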
Part 2: Configure Ollama and Langchain4j
In src/main/resources/application.properties, add the following:
# Use a compact, instruction-following model
quarkus.langchain4j.ollama.chat-model.model-id=mistral:7b-instruct
# Local LLMs can take longer to respond, especially at first
quarkus.langchain4j.ollama.timeout=60s
Now here’s the magic part: when you run mvn quarkus:dev, Quarkus Dev Services detects the Ollama dependency and auto-starts an Ollama container with the model you configured. The first time, it’ll download the model, so be patient. Also, make sure you have a container runtime installed. I use Podman locally, and you should too.
No setup, no docker-compose, no YAML. Just run and go.
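Dev Services only kicks in during dev mode. If you already run Ollama yourself (or want to in production), you can point the extension at an existing instance instead; the base-url property comes from the Quarkiverse langchain4j-ollama extension:

```properties
# Use an existing Ollama instance instead of the Dev Services container
quarkus.langchain4j.ollama.base-url=http://localhost:11434
```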
Part 3: Extract a Person from Text
Step 1: Define the POJO
Create src/main/java/org/acme/Person.java:
package org.acme;
public class Person {
private String name;
private int age;
public Person() {}
public Person(String name, int age) {
this.name = name; this.age = age;
}
public String getName() { return name; }
public void setName(String name) { this.name = name; }
public int getAge() { return age; }
public void setAge(int age) { this.age = age; }
@Override
public String toString() {
return "Person{name='" + name + "', age=" + age + '}';
}
}
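A side note, not required for the tutorial: on recent Java versions (16+), a record can replace most of that boilerplate, since Jackson can bind to records and langchain4j supports them for structured output. A minimal sketch:

```java
// Hypothetical alternative to the Person class above: a record gives you the
// constructor, accessors, equals/hashCode, and toString for free.
record Person(String name, int age) {}
```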
Step 2: Define the AI Interface
Create src/main/java/org/acme/PersonExtractor.java:
package org.acme;

import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface PersonExtractor {

    @UserMessage("Extract the name and age of the person described in the following text: {{text}}")
    Person extractPerson(String text);
}
Quarkus will:

Insert the input into the prompt
Tell the model to return data matching the Person class
Deserialize the result automatically
Step 3: Create the REST Endpoint
Create src/main/java/org/acme/PersonResource.java:
package org.acme;

import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;

@Path("/person")
public class PersonResource {

    @Inject
    PersonExtractor extractor;

    @POST
    @Path("/extract")
    @Consumes("text/plain")
    @Produces("application/json")
    public Person extract(String text) {
        return extractor.extractPerson(text);
    }
}
Step 4: Run and Test
Start the app:
mvn quarkus:dev
In another terminal:
curl -X POST -H "Content-Type: text/plain" \
--data "My colleague John Doe is 35 years old and lives in Berlin." \
http://localhost:8080/person/extract
Expected output:
{
  "name": "John Doe",
  "age": 35
}
Boom. That’s Quarkus and Langchain4j parsing the model’s JSON and turning it into a Java object! All from a natural language sentence.
Part 4: Extract a Complex Invoice
Let’s get serious. What if you want to extract structured invoice data, including a list of line items?
Step 1: Create the POJOs
InvoiceItem.java
package org.acme;

import java.math.BigDecimal;

public class InvoiceItem {

    private String description;
    private int quantity;
    private BigDecimal unitPrice;

    public InvoiceItem() {}

    public InvoiceItem(String description, int quantity, BigDecimal unitPrice) {
        this.description = description;
        this.quantity = quantity;
        this.unitPrice = unitPrice;
    }

    // Getters & setters...
}
Invoice.java
package org.acme;

import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.List;

public class Invoice {

    private String invoiceNumber;
    private LocalDate invoiceDate;
    private String customerName;
    private BigDecimal totalAmount;
    private List<InvoiceItem> items;

    public Invoice() {}

    // Getters & setters...
}
Step 2: Add a Date Helper
package org.acme;

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class CurrentDate {
    public static final String CURRENT_DATE_STR = LocalDate.now().format(DateTimeFormatter.ISO_DATE);
}
Step 3: Define the AI Interface
package org.acme;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface InvoiceParser {

    @SystemMessage("""
            You are an invoice processing assistant that extracts structured invoice data from text into JSON and computes totals.
            Rules:
            1. Extract invoiceNumber, invoiceDate (calculate from 'yesterday'), customerName, and items.
            2. First, extract the items and their prices.
            3. Then multiply quantity × unitPrice for each.
            4. Finally, sum the results to get the totalAmount.
            5. Ignore any totalAmount in the text — always calculate it yourself.
            6. Use plain numbers (no currency symbols).
            7. If any field is missing, set it to null.
            8. Output ONLY the JSON — no extra explanations.
            """)
    @UserMessage("Extract the invoice as JSON. Today's date is {{current_date}}. {{text}}")
    Invoice parseInvoice(String text, String current_date);
}
Full disclosure: the @SystemMessage I actually use in the source code repository is even bigger. But let’s keep it a little shorter here.
Step 4: Create the REST Endpoint
package org.acme;

import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;

@Path("/invoice")
public class InvoiceResource {

    @Inject
    InvoiceParser parser;

    @POST
    @Path("/parse")
    @Consumes("text/plain")
    @Produces("application/json")
    public Invoice parse(String text) {
        return parser.parseInvoice(text, CurrentDate.CURRENT_DATE_STR);
    }
}
Step 5: Test It
Restart if needed, then, in a separate terminal window:
INVOICE_TEXT="Please process invoice INV-123 for customer ACME Corp, dated yesterday. It includes 2 Widgets at $50.00 each and 1 Gadget for $120.50. The total is $220.50."
curl -X POST -H "Content-Type: text/plain" \
--data "$INVOICE_TEXT" \
http://localhost:8080/invoice/parse
Expected output (simplified):
{
  "invoiceNumber": "INV-123",
  "invoiceDate": "2025-05-01",
  "customerName": "ACME Corp",
  "totalAmount": 220.50,
  "items": [
    {
      "description": "Widgets",
      "quantity": 2,
      "unitPrice": 50.00
    },
    {
      "description": "Gadget",
      "quantity": 1,
      "unitPrice": 120.50
    }
  ]
}
Recap: What Just Happened?
When you POST some text:
The Quarkus REST endpoint receives it.
Langchain4j uses your POJO definition to craft a structured prompt.
The LLM (via Ollama) responds with structured JSON.
Langchain4j parses it into a Java object.
Quarkus serializes it back into JSON for your HTTP response.
No manual JSON parsing. No brittle string manipulation. Just clean Java objects from free-form language.
A word of warning here: the result is most likely not going to be correct if you run this on a small machine with a small model. Small models are simply not good at math. This is a common issue with LLMs, even strong ones like Qwen3:8b: they often struggle with basic arithmetic, especially:
Summing multiple items.
Multiplying quantity × unit price correctly.
Comparing computed totals to an explicitly stated total (even when given!).
This happens because language models are not calculators. They can memorize or imitate arithmetic but don’t reliably reason over numbers unless explicitly prompted to do so.
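The pragmatic fix is to treat the model as an extractor and do the arithmetic yourself, deterministically, in Java. A minimal sketch of such a check (the class and record are my own illustration, not part of the tutorial code), using the sample invoice's numbers:

```java
import java.math.BigDecimal;
import java.util.List;

class TotalCheck {
    // Minimal stand-in for the extracted line items.
    public record Item(int quantity, BigDecimal unitPrice) {}

    // Recompute the total from the items instead of trusting the model's math.
    public static BigDecimal total(List<Item> items) {
        return items.stream()
                .map(i -> i.unitPrice().multiply(BigDecimal.valueOf(i.quantity())))
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        BigDecimal t = total(List.of(
                new Item(2, new BigDecimal("50.00")),   // 2 Widgets at $50.00
                new Item(1, new BigDecimal("120.50"))); // 1 Gadget at $120.50
        System.out.println(t); // 220.50
    }
}
```

If the recomputed total disagrees with the model's totalAmount, you can overwrite it or reject the response.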
Extra Credit: Error Handling and Improvements
Return Response objects for proper status codes.
Add @ServerExceptionMapper classes to translate exceptions.
Improve prompts with more details or few-shot examples.
Try bigger models like llama3:8b for better accuracy.
Validate LLM output in your Java logic if needed.
Use Guardrails to validate the JSON.
Final Thoughts
You just built a powerful, AI-enhanced REST API that runs entirely on your local machine.
You didn’t need OpenAI keys.
You didn’t need to learn prompt engineering from scratch.
You just used Quarkus, Langchain4j, and Ollama.
What to Try Next
Use this technique to extract addresses, products, or metadata from documents.
Add authentication and make this a secure internal API.
Explore Quarkus and Langchain4j’s other features like tool calling or RAG.
This is what modern Java AI development looks like: fast, local, type-safe, and built with the tools you already know.