The Eye of the Model: A Quarkus Tale of Image Descriptions
Give your Java app the power to see. This hands-on tutorial blends AI, LangChain4j, and local vision models into a spellbinding Quarkus REST service that describes images from your filesystem.
In the age of modern Java, a new quest emerges. One not of dragons or rings, but of pixels and prompts. A company of brave developers sets out to awaken the Eye — a vision model that sees what men cannot describe, and speaks truths long hidden in forgotten JPEGs. Will they succeed? Or will their requests time out in the darkness...
Welcome, traveler. Today, you begin your journey as a Quarkus developer-wizard who wishes to grant your application the gift of sight. Guided by LangChain4j, empowered by Ollama's vision-language model, and armed with the magical Kubernetes-native framework known as Quarkus, you will build a RESTful incantation that gazes upon any image you summon and returns a description worthy of Elvish poetry.
Act I — The Tools of the Realm
Before you begin, you must gather your weapons:
Quarkus: The framework forged in the fires of productivity.
LangChain4j: Your magical conduit to large language (and vision) models.
Ollama: The beast-master that tames mighty vision models such as Qwen2.5-VL.
You Shall Not Pass... Unless You Have:
JDK 21+
Maven 3.8+
Podman or native Ollama install
A pulled vision-capable model:
ollama pull qwen2.5vl
A test image, e.g. /tmp/test_images/my_cat.jpg (or C:/tmp/test_images/my_dog.png if you dwell in the realm of Windows)
Act II — Forging the Application in the Terminal of Power
You begin your journey by conjuring a new Quarkus project:
mvn io.quarkus.platform:quarkus-maven-plugin:create \
-DprojectGroupId=org.acme \
-DprojectArtifactId=quarkus-image-describer \
-Dextensions="rest,langchain4j-ollama,quarkus-smallrye-openapi"
cd quarkus-image-describer
This summons the initial bones of your app. And if you just want the results, go and grab the example from my GitHub repository!
Act III — Writing the Scroll of Configuration
Into the sacred parchment known as application.properties, etch these runes:
quarkus.langchain4j.ollama.chat-model.model-id=qwen2.5vl
quarkus.langchain4j.ollama.timeout=180s
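Should your Ollama dwell on another host, or should you wish to temper the Eye's imagination, a few more runes may serve. This is a hedged sketch; verify the exact property names against the quarkus-langchain4j-ollama configuration reference for your version:

```properties
# Point at a remote Ollama instance (the default is http://localhost:11434)
quarkus.langchain4j.ollama.base-url=http://localhost:11434
# Lower temperature = more sober, less elvish improvisation
quarkus.langchain4j.ollama.chat-model.temperature=0.2
# Log raw requests and responses while debugging
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.ollama.log-responses=true
```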
Act IV — Binding the Eye: The AI Service
You must now write the incantation that calls upon the Eye to speak.
Create src/main/java/org/acme/ImageDescriberAiService.java:
package org.acme;

import dev.langchain4j.data.image.Image;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import jakarta.enterprise.context.ApplicationScoped;

@RegisterAiService(chatMemoryProviderSupplier = RegisterAiService.NoChatMemoryProviderSupplier.class)
@ApplicationScoped
public interface ImageDescriberAiService {

    @SystemMessage("You are an expert image analyst. Describe this image like a poetic elf would - use flowing, archaic English with references to starlight, nature, and beauty.")
    @UserMessage("Describe this image.")
    String describeImage(Image image); // Uses LangChain4j's Image class
}
This contract binds the Eye, the qwen2.5vl vision model, to your service, translating pixels into prose.
Act V — The RESTful Spellbook
Create src/main/java/org/acme/ImageDescriberResource.java:
package org.acme;

import java.io.IOException;
import java.nio.file.Files;
import java.util.Base64;

import org.jboss.logging.Logger;
import org.jboss.resteasy.reactive.RestForm;
import org.jboss.resteasy.reactive.multipart.FileUpload;

import dev.langchain4j.data.image.Image;
import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/describe-image")
public class ImageDescriberResource {

    private static final Logger LOG = Logger.getLogger(ImageDescriberResource.class);

    @Inject
    ImageDescriberAiService imageService;

    @POST
    @Path("/describe")
    @Consumes(MediaType.MULTIPART_FORM_DATA)
    @Produces(MediaType.TEXT_PLAIN)
    public String describeImage(@RestForm("image") FileUpload file) {
        if (file == null) {
            return "Error: No file uploaded.";
        }
        try {
            // 1. Read the image bytes from the uploaded file
            byte[] imageBytes = Files.readAllBytes(file.uploadedFile());

            // 2. Validate the MIME type of the uploaded image
            String mimeType = file.contentType();
            if (mimeType == null || (!mimeType.equals("image/png") && !mimeType.equals("image/jpeg"))) {
                // Add more supported types if your model accepts them
                return "Error: Only PNG and JPEG images are supported. Uploaded type: " + mimeType;
            }
            LOG.info(mimeType + " image received for description.");

            String base64String = Base64.getEncoder().encodeToString(imageBytes);
            // Very large log output, so commented out:
            // LOG.info(base64String + " b64");

            // 3. Create a LangChain4j Image object
            Image langchainImage = Image.builder()
                    .base64Data(base64String) // the Base64-encoded image payload
                    .mimeType(mimeType)
                    .build();

            // 4. Call the AI service
            return imageService.describeImage(langchainImage);
        } catch (IOException e) {
            LOG.error("Failed to read the uploaded image", e);
            return "Error processing image: " + e.getMessage();
        } catch (Exception e) {
            // Catch other potential exceptions from the AI service
            LOG.error("AI service call failed", e);
            return "Error getting description from AI: " + e.getMessage();
        }
    }
}
This is your public-facing spell: a POST endpoint at /describe-image/describe that accepts a multipart file upload.
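The MIME allow-list guarding the resource is easy to lift out and test on its own. A minimal plain-Java sketch; the class and helper name here are hypothetical, not part of the project above:

```java
import java.util.Set;

public class MimeGuard {

    // Same allow-list as the resource: only PNG and JPEG may pass
    private static final Set<String> SUPPORTED = Set.of("image/png", "image/jpeg");

    // Hypothetical helper mirroring the resource's validation branch
    static boolean isSupportedImageType(String mimeType) {
        return mimeType != null && SUPPORTED.contains(mimeType);
    }

    public static void main(String[] args) {
        System.out.println(isSupportedImageType("image/png"));  // true
        System.out.println(isSupportedImageType("image/gif"));  // false
        System.out.println(isSupportedImageType(null));         // false
    }
}
```

Keeping the check null-safe matters: a client can omit the Content-Type part entirely, in which case `file.contentType()` returns null.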
Act VI — Summoning the Image
Place a test image in a known path:
mkdir -p /tmp/test_images
cp ~/Downloads/cat.jpg /tmp/test_images/my_cat.jpg
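Beware: curl will send whatever `;type=` hint you give it, and the resource trusts that Content-Type header. If in doubt, let the trusty `file` utility sniff the actual magic bytes first. A sketch for Unix-like realms; `/tmp/impostor.jpg` is a throwaway demo file, not part of the tutorial assets:

```shell
# file(1) inspects content, not the filename: a text file named .jpg is unmasked
printf 'not really an image' > /tmp/impostor.jpg
file -b --mime-type /tmp/impostor.jpg    # prints text/plain, despite the .jpg name
```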
Act VII — Awakening the Vision Model
First, awaken the beast:
ollama serve
Ensure the model is ready:
ollama list
Then run your Quarkus spellbook:
./mvnw quarkus:dev
You are now the keeper of a REST endpoint connected to an all-seeing Eye.
Act VIII — The Seeing Stone: Testing Your App
Now gaze into the Palantír:
curl -X 'POST' \
'http://localhost:8080/describe-image/describe' \
-H 'accept: text/plain' \
-H 'Content-Type: multipart/form-data' \
-F 'image=@Felis_silvestris_silvestris_small_gradual_decrease_of_quality.png;type=image/png'
Expected result:
Behold a feline gaze, luminous as starlight's dance upon the night's deep blue! This cat, with its coat of shimmering golden hues, doth recline amidst the white, as if the snow itself were a mantle of silver dreams. Its eyes, a curious green, gleam with an inner light, as if the forest's whispering leaves have found their reflection in those depths. Whiskers poised in delicate symmetry, they dance upon the air, hinting at the secrets of the woodland's heart. This creature, a masterpiece of nature's art, doth captivate with its quiet elegance and serene beauty.
(Descriptions will vary depending on the model’s vision... and possibly its mood.)
You can also use the Swagger UI (served at /q/swagger-ui in dev mode, courtesy of the smallrye-openapi extension) to try out images.
Appendix: The Hidden Magic
Behind the curtain, this is what happens:
1. You cast an HTTP request.
2. Quarkus reads your file and turns it into a Base64-encoded String.
3. LangChain4j builds an Image object and calls the ImageDescriberAiService.
4. The quarkus-langchain4j-ollama extension sends it to Ollama.
5. Ollama passes the image and prompt to qwen2.5vl.
6. The model returns a description.
7. You get a textual response in your terminal.
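Step 2 above is plain JDK machinery. Here is a tiny standalone sketch of the encoding the resource performs; the three-byte array stands in for real pixel data, and `Base64Step` is just a demo class:

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Step {

    // The exact call the resource makes before building the Image object
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        byte[] fakePixels = {1, 2, 3};        // stand-in for uploaded image bytes
        String payload = encode(fakePixels);
        System.out.println(payload);          // AQID

        // The decode on the model side recovers the original bytes unchanged
        byte[] roundTrip = Base64.getDecoder().decode(payload);
        System.out.println(Arrays.equals(roundTrip, fakePixels));  // true
    }
}
```

Note that Base64 inflates the payload by roughly a third, which is worth remembering before feeding very large images to the Eye.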
Epilogue — Notes from the Scribes
Try other vision-capable models from ollama.com/library.
If you get errors, check your image path and Quarkus logs.
Play with the prompt to get different styles of descriptions.
Where to Next, Brave Developer?
Explore chaining multiple AI calls: e.g., detect → describe → summarize.
Use Qute to build a UI: upload an image and show the result.
Extend this project with LangChain4j memory, guardrails, or even audio analysis.
You Have Seen With the Eye
Your REST API can now look upon any image — and whisper its secrets back to you.
“All we have to decide is what to do with the model that is given to us.”