The Eye of the Model: A Quarkus Tale of Image Descriptions
Give your Java app the power to see. This hands-on tutorial blends AI, LangChain4j, and local vision models into a spellbinding Quarkus REST service that describes images from your filesystem.
In the age of modern Java, a new quest emerges. One not of dragons or rings, but of pixels and prompts. A company of brave developers sets out to awaken the Eye — a vision model that sees what men cannot describe, and speaks truths long hidden in forgotten JPEGs. Will they succeed? Or will their requests time out in the darkness...
Welcome, traveler. Today, you begin your journey as a Quarkus developer-wizard who wishes to grant your application the gift of sight. Guided by LangChain4j, empowered by Ollama's vision-language model, and armed with the magical Kubernetes-native framework known as Quarkus, you will build a RESTful incantation that gazes upon any image you summon and returns a description worthy of Elvish poetry.
Act I — The Tools of the Realm
Before you begin, you must gather your weapons:
Quarkus: The framework forged in the fires of productivity.
LangChain4j: Your magical conduit to large language (and vision) models.
Ollama: The beast-master that tames mighty vision models such as Qwen2.5-VL.
You Shall Not Pass... Unless You Have:
JDK 21+
Maven 3.8+
Podman or native Ollama install
A pulled vision-capable model:
ollama pull qwen2.5vl
A test image, e.g. /tmp/test_images/my_cat.jpg (or C:/tmp/test_images/my_dog.png if you dwell in the realm of Windows)
Act II — Forging the Application in the Terminal of Power
You begin your journey by conjuring a new Quarkus project:
mvn io.quarkus.platform:quarkus-maven-plugin:create \
-DprojectGroupId=org.acme \
-DprojectArtifactId=quarkus-image-describer \
-Dextensions="rest,langchain4j-ollama,quarkus-smallrye-openapi"
cd quarkus-image-describer
This summons the initial bones of your app. And if you just want the results, go and grab the example from my GitHub repository!
Act III — Writing the Scroll of Configuration
Into the sacred parchment known as application.properties, etch these runes:
quarkus.langchain4j.ollama.chat-model.model-id=qwen2.5vl
quarkus.langchain4j.ollama.timeout=180s
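Should your Ollama dwell on another host, or should you wish to temper the Eye's imagination, a few more runes may serve. This is a hedged sketch; verify the exact property names against the quarkus-langchain4j-ollama configuration reference for your version:

```properties
# Point at a remote Ollama instance (the default is http://localhost:11434)
quarkus.langchain4j.ollama.base-url=http://localhost:11434
# Lower temperature = more sober, less elvish improvisation
quarkus.langchain4j.ollama.chat-model.temperature=0.2
# Log raw requests and responses while debugging
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.ollama.log-responses=true
```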
Act IV — Binding the Eye: The AI Service
You must now write the incantation that calls upon the Eye to speak.
Create src/main/java/org/acme/ImageDescriberAiService.java:
package org.acme;

import dev.langchain4j.data.image.Image;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import jakarta.enterprise.context.ApplicationScoped;

@RegisterAiService(chatMemoryProviderSupplier = RegisterAiService.NoChatMemoryProviderSupplier.class)
@ApplicationScoped
public interface ImageDescriberAiService {

    @SystemMessage("You are an expert image analyst. Describe this image like a poetic elf would - use flowing, archaic English with references to starlight, nature, and beauty.")
    @UserMessage("Describe this image.")
    String describeImage(Image image); // Uses LangChain4j's Image class
}
This contract binds the Eye, the qwen2.5vl vision model, to your service, translating pixels into prose.
Act V — The RESTful Spellbook
Create src/main/java/org/acme/ImageDescriberResource.java:
package org.acme;

import java.io.IOException;
import java.nio.file.Files;
import java.util.Base64;

import org.jboss.logging.Logger;
import org.jboss.resteasy.reactive.RestForm;
import org.jboss.resteasy.reactive.multipart.FileUpload;

import dev.langchain4j.data.image.Image;
import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/describe-image")
public class ImageDescriberResource {

    private static final Logger LOG = Logger.getLogger(ImageDescriberResource.class);

    @Inject
    ImageDescriberAiService imageService;

    @POST
    @Path("/describe")
    @Consumes(MediaType.MULTIPART_FORM_DATA)
    @Produces(MediaType.TEXT_PLAIN)
    public String describeImage(@RestForm("image") FileUpload file) {
        if (file == null) {
            return "Error: No file uploaded.";
        }
        try {
            // 1. Read the image bytes from the uploaded file
            byte[] imageBytes = Files.readAllBytes(file.uploadedFile());

            // 2. Validate the MIME type of the uploaded image
            String mimeType = file.contentType();
            if (mimeType == null || (!mimeType.equals("image/png") && !mimeType.equals("image/jpeg"))) {
                // Add more supported types if your model accepts them
                return "Error: Only PNG and JPEG images are supported. Uploaded type: " + mimeType;
            }
            LOG.info(mimeType + " image received for description.");

            String base64String = Base64.getEncoder().encodeToString(imageBytes);
            // Very large log output, so commented out:
            // LOG.info(base64String + " b64");

            // 3. Create a LangChain4j Image object
            Image langchainImage = Image.builder()
                    .base64Data(base64String) // the Base64-encoded image payload
                    .mimeType(mimeType)
                    .build();

            // 4. Call the AI service
            return imageService.describeImage(langchainImage);
        } catch (IOException e) {
            LOG.error("Failed to read the uploaded image", e);
            return "Error processing image: " + e.getMessage();
        } catch (Exception e) {
            // Catch other potential exceptions from the AI service
            LOG.error("AI service call failed", e);
            return "Error getting description from AI: " + e.getMessage();
        }
    }
}
This is your public-facing spell: a POST endpoint at /describe-image/describe that accepts a multipart file upload.
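The MIME allow-list guarding the resource is easy to lift out and test on its own. A minimal plain-Java sketch; the class and helper name here are hypothetical, not part of the project above:

```java
import java.util.Set;

public class MimeGuard {

    // Same allow-list as the resource: only PNG and JPEG may pass
    private static final Set<String> SUPPORTED = Set.of("image/png", "image/jpeg");

    // Hypothetical helper mirroring the resource's validation branch
    static boolean isSupportedImageType(String mimeType) {
        return mimeType != null && SUPPORTED.contains(mimeType);
    }

    public static void main(String[] args) {
        System.out.println(isSupportedImageType("image/png"));  // true
        System.out.println(isSupportedImageType("image/gif"));  // false
        System.out.println(isSupportedImageType(null));         // false
    }
}
```

Keeping the check null-safe matters: a client can omit the Content-Type part entirely, in which case `file.contentType()` returns null.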
Act VI — Summoning the Image
Place a test image in a known path:
mkdir -p /tmp/test_images
cp ~/Downloads/cat.jpg /tmp/test_images/my_cat.jpg
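Beware: curl will send whatever `;type=` hint you give it, and the resource trusts that Content-Type header. If in doubt, let the trusty `file` utility sniff the actual magic bytes first. A sketch for Unix-like realms; `/tmp/impostor.jpg` is a throwaway demo file, not part of the tutorial assets:

```shell
# file(1) inspects content, not the filename: a text file named .jpg is unmasked
printf 'not really an image' > /tmp/impostor.jpg
file -b --mime-type /tmp/impostor.jpg    # prints text/plain, despite the .jpg name
```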
Act VII — Awakening the Vision Model
First, awaken the beast:
ollama serve
Ensure the model is ready:
ollama list
Then run your Quarkus spellbook:
./mvnw quarkus:dev
You are now the keeper of a REST endpoint connected to an all-seeing Eye.
Act VIII — The Seeing Stone: Testing Your App
Now gaze into the Palantír:
curl -X 'POST' \
'http://localhost:8080/describe-image/describe' \
-H 'accept: text/plain' \
-H 'Content-Type: multipart/form-data' \
-F 'image=@Felis_silvestris_silvestris_small_gradual_decrease_of_quality.png;type=image/png'
Expected result:
Behold a feline gaze, luminous as starlight's dance upon the night's deep blue! This cat, with its coat of shimmering golden hues, doth recline amidst the white, as if the snow itself were a mantle of silver dreams. Its eyes, a curious green, gleam with an inner light, as if the forest's whispering leaves have found their reflection in those depths. Whiskers poised in delicate symmetry, they dance upon the air, hinting at the secrets of the woodland's heart. This creature, a masterpiece of nature's art, doth captivate with its quiet elegance and serene beauty.
(Descriptions will vary depending on the model’s vision... and possibly its mood.)
You can also use the Swagger UI (served at /q/swagger-ui in dev mode, courtesy of the smallrye-openapi extension) to try out images.
Appendix: The Hidden Magic
Behind the curtain, this is what happens:
1. You cast an HTTP request.
2. Quarkus reads your file and turns it into a Base64-encoded String.
3. LangChain4j builds an Image object and calls the ImageDescriberAiService.
4. The quarkus-langchain4j-ollama extension sends it to Ollama.
5. Ollama passes the image and prompt to qwen2.5vl.
6. The model returns a description.
7. You get a textual response in your terminal.
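Step 2 above is plain JDK machinery. Here is a tiny standalone sketch of the encoding the resource performs; the three-byte array stands in for real pixel data, and `Base64Step` is just a demo class:

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Step {

    // The exact call the resource makes before building the Image object
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        byte[] fakePixels = {1, 2, 3};        // stand-in for uploaded image bytes
        String payload = encode(fakePixels);
        System.out.println(payload);          // AQID

        // The decode on the model side recovers the original bytes unchanged
        byte[] roundTrip = Base64.getDecoder().decode(payload);
        System.out.println(Arrays.equals(roundTrip, fakePixels));  // true
    }
}
```

Note that Base64 inflates the payload by roughly a third, which is worth remembering before feeding very large images to the Eye.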
Epilogue — Notes from the Scribes
Try other vision-capable models from ollama.com/library.
If you get errors, check your image path and Quarkus logs.
Play with the prompt to get different styles of descriptions.
Where to Next, Brave Developer?
Explore chaining multiple AI calls: e.g., detect → describe → summarize.
Use Qute to build a UI: upload an image and show the result.
Extend this project with LangChain4j memory, guardrails, or even audio analysis.
You Have Seen With the Eye
Your REST API can now look upon any image — and whisper its secrets back to you.
“All we have to decide is what to do with the model that is given to us.”