Hibernate Search Noir: A Full-Text Mystery with Quarkus and Elasticsearch
Step into the shoes of a detective and learn how to build smart, searchable applications in Java using Quarkus, Hibernate Search, and custom analyzers, one indexed clue at a time.
I've always had a soft spot for Jo Nesbø novels. The icy streets of Oslo. The brooding detectives with too many scars and not enough closure. The tangled plots where nothing is what it seems. So when I sat down to explore Hibernate Search with Quarkus and Elasticsearch, I thought: Why not bring a little of that Nordic noir atmosphere into the tutorial?
What follows isn’t just a hands-on guide to full-text search, projections, and custom analyzers. It’s a story. One that follows Søren J. Thorsen, a fictional ex-coder turned detective, as he investigates missing books, buried clues, and the linguistic fingerprints of forgotten records. You’ll learn how to index complex entities, define advanced analyzers, and paginate search results, all through the lens of a gritty mystery where the database never lies and every field tells a story.
This tutorial sticks closely to the official Quarkus Hibernate Search guide but wraps it in something more fun. An atmosphere of tension and noir investigation. If you're a Java developer like me who enjoys a little style with your substance, grab your trench coat and let’s dive into the case.
Case File: The Elasticsearch Enigma
Your mission, should you choose to accept it, is to build a system that can meticulously index and rapidly search through a library of authors and their works. We need to make every piece of data searchable, every connection traceable. We're talking about full-text search, the kind that uncovers hidden truths in mountains of information.
Chapter 1: Gearing Up – The Precinct (Prerequisites)
Before you hit the streets, make sure your toolkit is in order:
JDK 17+ (or Mandrel for native): Your trusty sidearm.
Apache Maven 3.9.6+: For assembling your case files and evidence.
Container Runtime (e.g., Podman): To run your Elasticsearch "interrogation room" as Dev Services or standalone.
IDE: Your dimly lit office where you piece it all together.
If you are impatient and want to dive directly into the full solution, take a look at the GitHub repository.
Chapter 2: Building the Case – Project Setup
We start by laying the groundwork. Open your terminal, the gritty back alley of your OS, and issue the command to create your Quarkus project. This isn't just a project; it's your investigation headquarters.
mvn io.quarkus.platform:quarkus-maven-plugin:3.23.0:create \
-DprojectGroupId=org.acme \
-DprojectArtifactId=hibernate-search-orm-elasticsearch-quickstart \
-Dextensions="hibernate-orm-panache,jdbc-postgresql,hibernate-search-orm-elasticsearch,rest-jackson" \
-DnoCode
cd hibernate-search-orm-elasticsearch-quickstart
This command brings in your specialist team:
hibernate-orm-panache: Your informant for easy data access.
jdbc-postgresql: The connection to your local evidence locker (the database).
hibernate-search-orm-elasticsearch: The star of this operation, your direct line to Elasticsearch's analytical prowess.
rest-jackson: For handling the incoming tips (HTTP requests) and reports (JSON responses).
If you're working an existing case (project), add the key specialist:
./mvnw quarkus:add-extension -Dextensions='hibernate-search-orm-elasticsearch'
Chapter 3: Profiling the Suspects – Entities
Every good detective knows their suspects. In our case, these are the Author and Book entities. We need to tag them for indexing, making sure Elasticsearch can find every detail.
org/acme/hibernate/search/elasticsearch/model/Author.java
package org.acme.hibernate.search.elasticsearch.model;
import java.util.List;
import java.util.ArrayList; // Added for initialization
import jakarta.persistence.CascadeType;
import jakarta.persistence.Entity;
import jakarta.persistence.FetchType;
import jakarta.persistence.OneToMany;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.IndexedEmbedded;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;
import org.hibernate.search.engine.backend.types.Sortable; // For Sortable.YES
import io.quarkus.hibernate.orm.panache.PanacheEntity;
@Entity
@Indexed // This "suspect" is now under surveillance (indexed)
public class Author extends PanacheEntity {
@FullTextField(analyzer = "name") // Analyzed for full-text search using the "name" analyzer
@KeywordField(name = "firstName_sort", sortable = Sortable.YES, normalizer = "sort") // For exact matches and
// sorting
public String firstName;
@FullTextField(analyzer = "name")
@KeywordField(name = "lastName_sort", sortable = Sortable.YES, normalizer = "sort")
public String lastName;
@OneToMany(mappedBy = "author", cascade = CascadeType.ALL, orphanRemoval = true, fetch = FetchType.EAGER)
@IndexedEmbedded // Their "associates" (books) are also part of the profile
public List<Book> books = new ArrayList<>(); // Initialize to avoid NullPointerExceptions
public Author() {
}
// Consider adding toString, equals, and hashCode if not relying solely on
// PanacheEntity's
}
org/acme/hibernate/search/elasticsearch/model/Book.java
package org.acme.hibernate.search.elasticsearch.model;
import jakarta.persistence.Entity;
import jakarta.persistence.ManyToOne;
import org.hibernate.search.engine.backend.types.Sortable;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.IndexedEmbedded;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;
import com.fasterxml.jackson.annotation.JsonIgnore; // To break cycles during serialization
import io.quarkus.hibernate.orm.panache.PanacheEntity;
@Entity
@Indexed // This "item of interest" is also indexed
public class Book extends PanacheEntity {
@FullTextField(analyzer = "english") // Titles searched using the "english" analyzer
@KeywordField(name = "title_sort", sortable = Sortable.YES, normalizer = "sort")
public String title;
@ManyToOne
@IndexedEmbedded(includePaths = "lastName_sort") // Embed the author's lastName_sort field in the book index so searchBooks can filter on it
@JsonIgnore // Avoids a loop when serializing Author <-> Book
public Author author;
public Book() {
}
// Consider adding toString, equals, and hashCode
}
Key Clues (Annotations):
@Indexed: The APB (All-Points Bulletin) on your entity. Hibernate Search now knows to keep an eye on it.
@FullTextField: This field contains vital clues. It's broken down by an analyzer (like "english" or "name") into searchable terms. Think of it as dusting for fingerprints.
@KeywordField: For when you need the exact term, like a license plate (sortable = Sortable.YES means you can use this field for sorting your evidence). The normalizer = "sort" ensures consistent sorting (e.g., case-insensitivity).
@IndexedEmbedded: Sometimes clues are hidden with associates. This tells Hibernate Search to look into related entities (Book within Author) and include their relevant fields in the main suspect's file, as the query sketch below shows.
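To see the payoff of that last clue, here's a minimal sketch (not part of the original guide) of how an embedded field is addressed in a query. It assumes an injected SearchSession, which Chapter 5 sets up properly; the point is that book titles become searchable from the Author index under the path books.title.
// Sketch: find authors whose *books* mention "snowman".
// The path "books.title" only exists because Author.books is @IndexedEmbedded.
List<Author> authors = searchSession.search(Author.class)
        .where(f -> f.match()
                .field("books.title") // field contributed by the embedded Book entity
                .matching("snowman"))
        .fetchHits(10);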
Chapter 4: The Forensics Lab – Analyzers and Normalizers
This is where the real detective work happens. An analyzer is a set of tools you use to process text for searching. It typically consists of:
Character Filters: Cleaning up the raw text (e.g., removing HTML).
Tokenizer: Breaking the text into individual words or "tokens."
Token Filters: Modifying the tokens (e.g., converting to lowercase, removing common words like "the", stemming words to their root).
A normalizer is simpler: it processes text but keeps it as a single token, often used for sorting or faceting on keyword fields (e.g., converting "Jo Nesbø" to "jo nesbø" for case-insensitive sorting).
Quarkus allows you to define custom analyzers for Elasticsearch. For instance, the name analyzer used for Author.firstName and Author.lastName might be defined to handle names specifically, perhaps by splitting on whitespace and lowercasing. The english analyzer for Book.title is a standard one, good for English text, handling stemming (e.g., "running" becomes "run") and stop words.
There are built-in analyzers in Elasticsearch. Take a look at the full documentation.
You'd typically define these in a class implementing ElasticsearchAnalysisConfigurer:
org/acme/hibernate/search/elasticsearch/config/AnalysisConfig.java
package org.acme.hibernate.search.elasticsearch.config;
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurationContext;
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer;
import io.quarkus.hibernate.search.orm.elasticsearch.SearchExtension;
@SearchExtension
public class AnalysisConfig implements ElasticsearchAnalysisConfigurer {
@Override
public void configure(ElasticsearchAnalysisConfigurationContext context) {
context.analyzer("name").custom() // Define the "name" analyzer
.tokenizer("standard")
.tokenFilters("lowercase", "asciifolding");
context.analyzer("english").custom() // Define (or override) the "english" analyzer
.tokenizer("standard")
.tokenFilters("lowercase", "asciifolding", "snowball");
context.normalizer("sort").custom() // Define the "sort" normalizer
.tokenFilters("lowercase", "asciifolding");
}
}
Chapter 5: The Interrogation Room – REST Endpoints & Search
Now, we set up the LibraryResource
. This is where you'll bring in your "suspects" (data) for questioning (CRUD operations) and run your searches.
org/acme/hibernate/search/elasticsearch/LibraryResource.java
package org.acme.hibernate.search.elasticsearch;
import java.util.List;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;
import jakarta.transaction.Transactional;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.DELETE;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.PUT;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import org.acme.hibernate.search.elasticsearch.model.Author;
import org.acme.hibernate.search.elasticsearch.model.Book;
import org.hibernate.search.mapper.orm.mapping.SearchMapping;
import org.hibernate.search.mapper.orm.session.SearchSession;
import org.jboss.resteasy.reactive.RestForm;
import org.jboss.resteasy.reactive.RestQuery; // For query parameters
import org.jboss.logging.Logger;
import io.quarkus.runtime.StartupEvent;
@Path("/library")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON) // Default for body if not form params
public class LibraryResource {
private static final Logger log = Logger.getLogger(LibraryResource.class);
@Inject
SearchSession searchSession;
// --- Author CRUD ---
@PUT
@Path("author")
@Transactional
@Consumes(MediaType.APPLICATION_FORM_URLENCODED) // As per original guide
public void addAuthor(@RestForm String firstName, @RestForm String lastName) {
Author author = new Author();
author.firstName = firstName;
author.lastName = lastName;
author.persist();
}
@GET
@Path("author")
public List<Author> getAllAuthors() {
return Author.listAll();
}
@POST
@Path("author/{id}")
@Transactional
@Consumes(MediaType.APPLICATION_FORM_URLENCODED)
public void updateAuthor(Long id, @RestForm String firstName, @RestForm String lastName) {
Author author = Author.findById(id);
if (author != null) {
author.firstName = firstName;
author.lastName = lastName;
author.persist();
}
}
@DELETE
@Path("author/{id}")
@Transactional
public void deleteAuthor(Long id) {
Author author = Author.findById(id);
if (author != null) {
author.delete(); // This will also remove associated books if CascadeType.ALL is set
}
}
// --- Book CRUD ---
@PUT
@Path("book")
@Transactional
@Consumes(MediaType.APPLICATION_FORM_URLENCODED)
public void addBook(@RestForm String title, @RestForm Long authorId) {
Author author = Author.findById(authorId);
if (author == null) {
// Consider throwing a WebApplicationException (e.g., NotFoundException)
return;
}
Book book = new Book();
book.title = title;
book.author = author;
book.persist(); // Persist the book
author.books.add(book); // Add book to author's list
author.persist(); // Update the author to establish the relationship fully
}
@GET
@Path("book")
public List<Book> getAllBooks() {
return Book.listAll();
}
@DELETE
@Path("book/{id}")
@Transactional
public void deleteBook(Long id) {
Book book = Book.findById(id);
if (book != null) {
if (book.author != null) {
book.author.books.remove(book); // Maintain consistency
// book.author.persist(); // Not strictly needed if managed by Hibernate
}
book.delete();
}
}
// --- Search Operations ---
@GET
@Path("author/search")
@Transactional
public List<Author> searchAuthors(@RestQuery String pattern, @RestQuery Integer size) {
log.infof("Pattern: " + pattern);
log.infof("Size: " + size);
if (size == null)
size = 10; // Default page size
return searchSession.search(Author.class)
.where(f -> (pattern == null || pattern.isBlank()) ? f.matchAll()
: f.simpleQueryString()
.fields("firstName", "lastName", "books.title") // Search across author names and book
// titles
// titles
.matching(pattern))
.fetchHits(size);
}
@GET
@Path("book/search")
@Transactional
public List<Book> searchBooks(@RestQuery String pattern, @RestQuery String authorLastName,
@RestQuery Integer size) {
log.infof("Pattern: " + pattern);
log.infof("Author: " + authorLastName);
if (size == null)
size = 10;
return searchSession.search(Book.class)
.where(f -> {
var bool = f.bool();
bool.must(pattern == null || pattern.isBlank() ? f.matchAll()
: f.simpleQueryString()
.field("title")
.matching(pattern));
if (authorLastName != null && !authorLastName.isBlank()) {
bool.must(f.match().field("author.lastName_sort").matching(authorLastName)); // Match on the keyword field for the author's last name (requires @IndexedEmbedded on Book.author)
}
return bool;
})
.sort(f -> f.field("title_sort").asc()) // Sort by the title_sort keyword field defined on Book.title
.fetchHits(size);
}
@Inject
SearchMapping searchMapping;
void onStart(@Observes StartupEvent ev) throws InterruptedException {
// only reindex if we imported some content
if (Book.count() > 0) {
searchMapping.scope(Object.class)
.massIndexer()
.startAndWait();
}
}
}
Note: Sorting in searchBooks relies on the title_sort keyword field declared on Book.title with @KeywordField(name = "title_sort", sortable = Sortable.YES, normalizer = "sort"), mirroring what we did for the author name fields. Likewise, filtering on author.lastName_sort only works because Book.author is annotated with @IndexedEmbedded.
The searchSession is your gateway to the Elasticsearch cluster via Hibernate Search. You use it to build queries targeting specific fields and patterns. Because we are going to import a bunch of entries from a file in a second, we are using this “workaround” to start the indexing for the imported data manually in the onStart method.
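If you want proper pagination rather than just capping the number of hits, Hibernate Search can return an offset-based page together with the total hit count via fetch(offset, limit). Below is a minimal sketch of how searchAuthors could be adapted; the paged path, the page parameter, and the extra import are my own additions for illustration, not part of the original guide.
// Hypothetical paginated variant of searchAuthors (sketch).
// Additionally requires: import org.hibernate.search.engine.search.query.SearchResult;
@GET
@Path("author/search/paged")
@Transactional
public List<Author> searchAuthorsPaged(@RestQuery String pattern,
        @RestQuery Integer page, @RestQuery Integer size) {
    int pageSize = (size == null) ? 10 : size;
    int pageIndex = (page == null) ? 0 : page;
    SearchResult<Author> result = searchSession.search(Author.class)
            .where(f -> (pattern == null || pattern.isBlank()) ? f.matchAll()
                    : f.simpleQueryString()
                            .fields("firstName", "lastName", "books.title")
                            .matching(pattern))
            .fetch(pageIndex * pageSize, pageSize); // offset, limit
    log.infof("Total matching authors: %d", result.total().hitCount());
    return result.hits();
}
fetch(offset, limit) hands back a SearchResult, so the page of hits and the total count come from a single round trip to Elasticsearch.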
Let’s continue adding some initial data to play around with:
resources/import.sql
INSERT INTO author(id, firstname, lastname) VALUES (1, 'John', 'Irving');
INSERT INTO author(id, firstname, lastname) VALUES (2, 'Paul', 'Auster');
INSERT INTO author(id, firstname, lastname) VALUES (3, 'Jo', 'Nesbø');
ALTER SEQUENCE author_seq RESTART WITH 4;
INSERT INTO book(id, title, author_id) VALUES (1, 'The World According to Garp', 1);
INSERT INTO book(id, title, author_id) VALUES (2, 'The Hotel New Hampshire', 1);
INSERT INTO book(id, title, author_id) VALUES (3, 'The Cider House Rules', 1);
INSERT INTO book(id, title, author_id) VALUES (4, 'A Prayer for Owen Meany', 1);
INSERT INTO book(id, title, author_id) VALUES (5, 'Last Night in Twisted River', 1);
INSERT INTO book(id, title, author_id) VALUES (6, 'In One Person', 1);
INSERT INTO book(id, title, author_id) VALUES (7, 'Avenue of Mysteries', 1);
INSERT INTO book(id, title, author_id) VALUES (8, 'The New York Trilogy', 2);
INSERT INTO book(id, title, author_id) VALUES (9, 'Mr. Vertigo', 2);
INSERT INTO book(id, title, author_id) VALUES (10, 'The Brooklyn Follies', 2);
INSERT INTO book(id, title, author_id) VALUES (11, 'Invisible', 2);
INSERT INTO book(id, title, author_id) VALUES (12, 'Sunset Park', 2);
INSERT INTO book(id, title, author_id) VALUES (101, 'The Snowman', 3);
INSERT INTO book(id, title, author_id) VALUES (102, 'The Leopard', 3);
INSERT INTO book(id, title, author_id) VALUES (103, 'The Redeemer', 3);
INSERT INTO book(id, title, author_id) VALUES (104, 'The Bat', 3);
ALTER SEQUENCE book_seq RESTART WITH 105;
Chapter 6: Connecting to the Wire – Elasticsearch Configuration
You need to tell Quarkus more about your database and your Elasticsearch. This goes into your src/main/resources/application.properties file.
# Datasource (PostgreSQL in this case)
quarkus.datasource.db-kind=postgresql
# Hibernate ORM
quarkus.hibernate-orm.database.generation=drop-and-create
quarkus.hibernate-orm.log.sql=true
# Hibernate Search + Elasticsearch
quarkus.hibernate-search-orm.elasticsearch.version=8
quarkus.elasticsearch.devservices.image-name=docker.io/elastic/elasticsearch:8.18.1
quarkus.hibernate-search-orm.schema-management.strategy=drop-and-create-and-drop
If quarkus.hibernate-search-orm.elasticsearch.hosts is not specified, Quarkus Dev Services will magically spin up an Elasticsearch instance for you during development. For production, you'll point this property to your actual Elasticsearch cluster. A further note: I have deliberately used a more up-to-date version of the Elasticsearch image here because I ran into a bug on my M4.
Chapter 7: Running the Sting Operation
Time to bring the system online.
./mvnw quarkus:dev
Quarkus will start, and an Elasticsearch container is spun up as a Dev Service. Once the container is up, Hibernate Search will connect and create the necessary indexes based on your @Indexed entities and analyzer configurations. The @Observes StartupEvent handler in LibraryResource then mass-indexes the initial data loaded from import.sql.
Test your endpoints:
Use curl or a tool like Postman:
Add an author:
curl -X PUT -d "firstName=Harry&lastName=Hole" localhost:8080/library/author
(You'll need to find the ID of this new author if you want to add books to them, or adjust addBook to take author names.)
Search authors by pattern:
curl "localhost:8080/library/author/search?pattern=Harry"
curl "localhost:8080/library/author/search?pattern=Nesbo"
Search books by title and author's last name:
curl "localhost:8080/library/book/search?pattern=Redeemer"
curl "localhost:8080/library/book/search?pattern=Bat&authorLastName=Nesbo"
Case Closed... For Now.
You've set up a sophisticated data investigation system, Detective. Your entities are profiled, your analyzers are calibrated, and your search operations are ready to cut through the digital fog. This city's data won't know what hit it.
Remember, this is just the beginning. The original Quarkus guide has more leads on advanced configurations, projections, and fine-tuning your Elasticsearch setup. But for now, you've got the core of a powerful search solution. Good work.
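One teaser for that next case: projections let you pull individual fields straight from the index instead of loading whole entities from the database. A minimal sketch, assuming the Elasticsearch backend (where field projections generally work out of the box via the document source; other backends may require projectable = Projectable.YES on the field):
// Project only the titles; no Book entities are loaded
List<String> titles = searchSession.search(Book.class)
        .select(f -> f.field("title", String.class))
        .where(f -> f.matchAll())
        .fetchHits(20);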