Dynamic CSV Uploads in Java: Building Smart, Flexible Databases with Quarkus
Master the art of ingesting any CSV file into PostgreSQL dynamically using Quarkus, Hibernate Panache, and Jackson CSV — no rigid schemas required.
You know how it goes. You're focused on your work when suddenly a manager drops a pile of random CSV files onto your desk with the classic instruction: "Just get these into the database somehow." There’s no fixed structure, no guarantees about the columns, and of course, the deadline was yesterday. You need a solution that's flexible, fast, and production-ready.
Fortunately, with the right tools (Quarkus, Hibernate Panache, PostgreSQL, and Jackson CSV) you can build exactly what’s needed without losing your sanity.
In this article, you’ll build a dynamic CSV upload application that:
Accepts any CSV file.
Automatically creates the database schema.
Inserts the data safely.
Tracks metadata about each upload.
Lists the dynamic tables and their content.
Minimal assumptions. Maximal flexibility. You’ll probably need roughly 30 minutes to build this yourself, but you can also just check out the repository on my GitHub for a quickstart.
Let’s get to work.
Project Setup
First, create a new project using the Quarkus CLI:
quarkus create app org.acme:csv-uploader --extension="quarkus-rest,quarkus-hibernate-orm-panache,quarkus-jdbc-postgresql,quarkus-smallrye-openapi,quarkus-jackson"
cd csv-uploader
You'll also need Jackson's CSV module. Add this to your pom.xml manually:
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-csv</artifactId>
</dependency>
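No explicit version should be needed here: the Quarkus BOM already manages the Jackson modules, so jackson-dataformat-csv stays aligned with the rest of your Jackson dependencies.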
Why these choices?
quarkus-rest: A lightweight, fast HTTP layer.
Hibernate Panache: Simplifies entity management.
PostgreSQL: A robust relational database.
SmallRye OpenAPI: Auto-generates API docs.
Jackson: For parsing JSON and CSV formats easily.
Database Configuration
In your src/main/resources/application.properties, configure PostgreSQL:
quarkus.datasource.db-kind=postgresql
quarkus.datasource.username=postgres
quarkus.datasource.password=postgres
quarkus.hibernate-orm.database.generation=update
quarkus.hibernate-orm.log.sql=true
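In dev mode, Quarkus Dev Services can start a throwaway PostgreSQL container automatically, so no JDBC URL is strictly required. If you point at your own database instead, add the URL explicitly; host and database name below are placeholders:
quarkus.datasource.jdbc.url=jdbc:postgresql://localhost:5432/postgres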
This lets Hibernate create and update our permanent metadata table automatically. Dynamic tables will be created manually based on each CSV.
Metadata Entity: Tracking Uploads
We'll track every upload with a small Panache entity:
package org.acme.entity;
import io.quarkus.hibernate.orm.panache.PanacheEntity;
import jakarta.persistence.Entity;
import java.time.LocalDateTime;
@Entity
public class CsvUploadMetadata extends PanacheEntity {
public String tableName;
public LocalDateTime uploadTime;
public int recordCount;
}
Why?
You want to know what got uploaded, when, and how many records.
Auditing matters, even for quick tools.
Parsing CSV Files Dynamically
CSV structures are unpredictable, so instead of mapping to a fixed Java class, we’ll parse them into simple key-value maps.
package org.acme.util;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import jakarta.enterprise.context.ApplicationScoped;
import org.jboss.resteasy.reactive.multipart.FileUpload;
import java.io.IOException;
import java.util.List;
import java.util.Map;
@ApplicationScoped
public class CsvReader {
public List<Map<String, String>> parseCsv(FileUpload fileUpload) throws IOException {
CsvMapper mapper = new CsvMapper();
CsvSchema schema = CsvSchema.emptySchema().withHeader();
MappingIterator<Map<String, String>> it = mapper.readerFor(Map.class)
.with(schema)
.readValues(fileUpload.uploadedFile().toFile());
return it.readAll();
}
}
Each row is now a Map<String, String>, perfect for dynamic processing.
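For the small test.csv used later in this article, the first parsed row would print roughly like this (illustrative toString() output of the map):
{first_name=John, last_name=Doe, email=john.doe@example.com}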
Schema Validation and Type Inference
Before creating a database table, we’ll validate the header and infer column types based on sample data.
Here’s a basic type inference utility:
package org.acme.util;
import java.util.regex.Pattern;
public class TypeInference {
private static final Pattern NUMBER_PATTERN = Pattern.compile("-?\\d+(\\.\\d+)?");
public static String inferType(String value) {
if (value == null || value.isBlank()) {
return "TEXT"; // Default fallback
}
if (NUMBER_PATTERN.matcher(value).matches()) {
return value.contains(".") ? "DOUBLE PRECISION" : "INTEGER";
}
return "TEXT";
}
}
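A few illustrative calls show how the inference behaves (the values are made up for demonstration):
TypeInference.inferType("42");    // INTEGER
TypeInference.inferType("3.14");  // DOUBLE PRECISION
TypeInference.inferType("hello"); // TEXT
TypeInference.inferType("");      // TEXT (fallback for blank values)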
Bottom line:
Bad headers are rejected (the actual check happens in the upload service below).
Types are assigned (INTEGER, DOUBLE PRECISION, or the fallback TEXT).
Creating Dynamic Tables and Inserting Data
Now the heart of the system: dynamic table creation and insertion.
Here’s the CsvUploadService:
package org.acme.service;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.persistence.EntityManager;
import jakarta.transaction.Transactional;
import org.acme.entity.CsvUploadMetadata;
import org.acme.util.CsvReader;
import org.acme.util.TypeInference;
import org.jboss.resteasy.reactive.multipart.FileUpload;
import java.io.IOException;
import java.time.LocalDateTime;
import java.util.*;
import java.util.stream.Collectors;
@ApplicationScoped
public class CsvUploadService {
@Inject
EntityManager entityManager;
@Inject
CsvReader csvReader;
@Transactional
public CsvUploadMetadata uploadCsv(FileUpload fileUpload) throws IOException {
List<Map<String, String>> rows = csvReader.parseCsv(fileUpload);
if (rows.isEmpty()) {
throw new RuntimeException("Empty CSV file");
}
String tableName = "csv_" + UUID.randomUUID().toString().replace("-", "");
Map<String, String> sampleRow = rows.get(0);
Map<String, String> columnTypes = new HashMap<>();
sampleRow.forEach((key, value) -> {
if (key == null || key.trim().isEmpty()) {
throw new RuntimeException("Invalid column name detected in CSV header.");
}
columnTypes.put(key, TypeInference.inferType(value));
});
String createTableSql = "CREATE TABLE " + tableName + " (id SERIAL PRIMARY KEY, " +
columnTypes.entrySet().stream()
.map(entry -> "\"" + entry.getKey() + "\" " + entry.getValue())
.collect(Collectors.joining(", "))
+ ")";
entityManager.createNativeQuery(createTableSql).executeUpdate();
for (Map<String, String> row : rows) {
String columns = row.keySet().stream()
.map(col -> "\"" + col + "\"")
.collect(Collectors.joining(", "));
String values = row.values().stream()
.map(value -> value == null ? "NULL" : "'" + value.replace("'", "''") + "'")
.collect(Collectors.joining(", "));
String insertSql = "INSERT INTO " + tableName + " (" + columns + ") VALUES (" + values + ")";
entityManager.createNativeQuery(insertSql).executeUpdate();
}
CsvUploadMetadata metadata = new CsvUploadMetadata();
metadata.tableName = tableName;
metadata.uploadTime = LocalDateTime.now();
metadata.recordCount = rows.size();
metadata.persist();
return metadata;
}
}
Notes:
Table names are random UUIDs to avoid conflicts.
SQL injection risk is reduced rather than eliminated: column names are quoted and single quotes in values are escaped, but parameterized inserts are the safer choice for anything beyond an internal tool (see the sketch below).
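As a minimal sketch of that safer variant, the insert loop could bind values as positional parameters instead of concatenating escaped literals. It reuses tableName, rows, columnTypes, and entityManager from CsvUploadService above and additionally needs a java.util.stream.IntStream import; values are converted to matching Java types because PostgreSQL will not coerce a varchar parameter into a numeric column:
for (Map<String, String> row : rows) {
    List<String> cols = new ArrayList<>(row.keySet());
    String columnList = cols.stream()
            .map(col -> "\"" + col + "\"")
            .collect(Collectors.joining(", "));
    // ?1, ?2, ... placeholders instead of inlined, escaped string literals
    String placeholders = IntStream.rangeClosed(1, cols.size())
            .mapToObj(i -> "?" + i)
            .collect(Collectors.joining(", "));
    var insert = entityManager.createNativeQuery(
            "INSERT INTO " + tableName + " (" + columnList + ") VALUES (" + placeholders + ")");
    for (int i = 0; i < cols.size(); i++) {
        String col = cols.get(i);
        String raw = row.get(col);
        // Bind with a Java type that matches the inferred column type
        Object value = switch (columnTypes.get(col)) {
            case "INTEGER" -> Integer.valueOf(raw);
            case "DOUBLE PRECISION" -> Double.valueOf(raw);
            default -> raw; // TEXT columns keep the raw string
        };
        insert.setParameter(i + 1, value);
    }
    insert.executeUpdate();
}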
REST API: Upload Endpoint
Simple and clean, here’s your resource:
package org.acme.resource;
import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import org.acme.entity.CsvUploadMetadata;
import org.acme.service.CsvUploadService;
import org.jboss.resteasy.reactive.MultipartForm;
import org.jboss.resteasy.reactive.multipart.FileUpload;
import org.eclipse.microprofile.openapi.annotations.Operation;
import org.eclipse.microprofile.openapi.annotations.responses.APIResponse;
@Path("/csv")
public class CsvUploadResource {
@Inject
CsvUploadService csvUploadService;
@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Produces(MediaType.APPLICATION_JSON)
@Operation(summary = "Upload a CSV and create a dynamic table", description = "Uploads a CSV file, stores it in a dynamically created table, and returns metadata.")
@APIResponse(responseCode = "200", description = "Upload successful")
public CsvUploadMetadata uploadCsv(@MultipartForm FileUpload fileUpload) throws Exception {
return csvUploadService.uploadCsv(fileUpload);
}
}
Thanks to SmallRye OpenAPI, this endpoint will be automatically visible at /q/swagger-ui/, and you can use it to upload a test.csv like the following:
first_name,last_name,email
John,Doe,john.doe@example.com
Jane,Smith,jane.smith@example.com
I have added it to the src/main/resources folder of the repository for convenience.
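From a terminal, the same upload works with curl. The part name is assumed to match the fileUpload parameter of the resource method; adjust it if your multipart binding expects a different name:
curl -F "fileUpload=@src/main/resources/test.csv;type=text/csv" http://localhost:8080/csv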
The CSV Table Viewer Endpoint
Once you can upload random CSVs into dynamic tables, the next obvious step is being able to see what’s inside.
That’s exactly what the CsvShowResource class is built for: a clean, secure, and JSON-friendly way to view the uploaded data.
Let’s walk through how it works.
Resource Structure: /csv-show
The CsvShowResource class exposes all viewer functionality under a dedicated path:
@Path("/csv-show")
@Produces(MediaType.APPLICATION_JSON)
public class CsvShowResource {
@Inject
EntityManager entityManager;
}
Key points:
Every endpoint sits under /csv-show.
Every response is clean JSON.
It uses JPA’s EntityManager directly for dynamic SQL, because we’re dealing with tables not tied to static entities.
No unnecessary abstractions here. We keep it simple and explicit.
Listing Uploaded Tables
You probably want to see which tables exist first. Here’s the endpoint:
@GET
@Path("/tables")
public List<Map<String, Object>> listUploadedTables() {
return CsvUploadMetadata.<CsvUploadMetadata>listAll().stream()
.map(metadata -> Map.<String, Object>of(
"uid", metadata.tableName,
"uploadTime", metadata.uploadTime,
"recordCount", metadata.recordCount
))
.toList();
}
What it does:
Queries all metadata entries using Panache’s listAll().
Maps each metadata record into a small JSON object.
Keeps the output clean: table name, upload time, record count.
Example output:
[
{
"uid": "csv_78fb63ed80ea4931b297bd9ed7ce49d7",
"uploadTime": "2024-03-21T10:30:00",
"recordCount": 42
}
]
No extra noise: just the essentials you need to know what’s available.
Viewing Table Content
When you want to look inside a dynamic table, this endpoint handles it:
@GET
@Path("/table/{tableName}")
@Transactional
@SuppressWarnings("unchecked")
public List<Map<String, Object>> showTableContent(@PathParam("tableName") String tableName) {
if (!isTableAllowed(tableName)) {
throw new IllegalArgumentException("Table not found");
}
String sql = "SELECT * FROM " + tableName;
List<Object[]> rows = (List<Object[]>) entityManager
.createNativeQuery(sql, Object[].class)
.getResultList();
List<String> columnNames = entityManager.createNativeQuery(
"SELECT column_name FROM information_schema.columns WHERE table_name = '" + tableName + "' ORDER BY ordinal_position"
).getResultList();
return mapResults(rows, columnNames);
}
What’s happening here:
Security first: it checks whether the requested table is registered in CsvUploadMetadata.
Data retrieval: native SQL fetches all rows from the table.
Schema discovery: column names are queried dynamically from PostgreSQL’s information_schema.
Result mapping: rows are paired with their corresponding column names into JSON objects.
The @Transactional annotation ensures everything runs inside a managed persistence context.
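Assuming the test.csv from earlier was uploaded, a call to /csv-show/table/{tableName} for the resulting table would return roughly this (the id column comes from the generated primary key):
[
  {
    "id": 1,
    "first_name": "John",
    "last_name": "Doe",
    "email": "john.doe@example.com"
  },
  {
    "id": 2,
    "first_name": "Jane",
    "last_name": "Smith",
    "email": "jane.smith@example.com"
  }
]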
Security: Only Allow Valid Tables
Security is simple but critical: No random table names allowed.
private boolean isTableAllowed(String tableName) {
return CsvUploadMetadata.find("tableName", tableName).firstResult() != null;
}
This check:
Verifies the table name exists in our metadata.
Prevents SQL injection or accidental access to unrelated database tables.
You don’t want users poking around your production schema, right?
Mapping the Results
Finally, once the data and columns are available, we stitch them together:
private List<Map<String, Object>> mapResults(List<Object[]> rows, List<String> columnNames) {
return rows.stream().map(row -> {
Map<String, Object> map = new java.util.HashMap<>();
for (int i = 0; i < row.length; i++) {
map.put(columnNames.get(i), row[i]);
}
return map;
}).toList();
}
How it works:
Every database row (Object[]) is zipped with the column names.
Builds a proper JSON object per row.
Streams everything efficiently.
This keeps your response nicely structured, no matter how wild the original CSV was.
Other Approaches: Beyond Dynamic Tables
Creating dynamic tables isn’t the only way to handle unpredictable data.
One alternative is using a document store like MongoDB. Document stores naturally accept varying schemas, making them a great fit for unknown CSV structures. You lose some SQL querying power, but gain a lot of flexibility.
Another option is using JSON blob fields inside PostgreSQL. Instead of creating tables, you could store each CSV row as a JSON document inside a JSONB field. Querying individual fields becomes trickier unless you carefully index them, but it’s simpler for ingestion.
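As a rough sketch of that alternative (the csv_documents table with its source and data columns is an assumption made up for illustration, not part of the project above), ingestion inside a method that already declares throws IOException could look like this:
// One fixed table instead of a table per upload:
//   CREATE TABLE csv_documents (id SERIAL PRIMARY KEY, source TEXT, data JSONB)
// Requires com.fasterxml.jackson.databind.ObjectMapper
ObjectMapper json = new ObjectMapper();
for (Map<String, String> row : rows) {
    entityManager.createNativeQuery(
            "INSERT INTO csv_documents (source, data) VALUES (?1, CAST(?2 AS jsonb))")
        .setParameter(1, fileUpload.fileName())
        .setParameter(2, json.writeValueAsString(row))
        .executeUpdate();
}
// Individual fields can then be reached with PostgreSQL's JSON operators, e.g.
//   SELECT data->>'email' FROM csv_documents WHERE source = 'test.csv'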
Finally, dedicated data-flow tools like Apache NiFi, or change-data-capture pipelines built on Debezium, can automate and monitor complex data flows. But that's heavy machinery for a simple upload tool.
The dynamic-table model we chose keeps everything SQL-friendly, lightweight, and manageable, which fits most internal app needs.
Real-World Considerations
A few important points before you unleash this:
Schema Evolution: Decide what happens when new CSVs have extra/missing fields.
Big Files: For large CSVs, switch from row-by-row inserts to batch operations or PostgreSQL’s COPY feature (a batching sketch follows this list).
Security: Use a restricted DB user to limit permissions.
Operational Management: Clean up old tables regularly if you expect hundreds of uploads.
Query Performance: Consider adding indexes if the tables are actively queried.
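For the batching point above, here is a minimal sketch under the same assumptions as CsvUploadService (tableName, rows); the hypothetical toJavaValue helper stands in for the type conversion shown in the parameterized sketch earlier, and the code needs java.sql.PreparedStatement and org.hibernate.Session imports:
List<String> cols = new ArrayList<>(rows.get(0).keySet());
String sql = "INSERT INTO " + tableName + " ("
        + cols.stream().map(c -> "\"" + c + "\"").collect(Collectors.joining(", "))
        + ") VALUES ("
        + cols.stream().map(c -> "?").collect(Collectors.joining(", ")) + ")";
// Session.doWork exposes the underlying JDBC connection for batching
entityManager.unwrap(Session.class).doWork(connection -> {
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        int pending = 0;
        for (Map<String, String> row : rows) {
            for (int i = 0; i < cols.size(); i++) {
                // toJavaValue: hypothetical helper converting the raw string to
                // Integer, Double, or String based on the inferred column type
                ps.setObject(i + 1, toJavaValue(cols.get(i), row.get(cols.get(i))));
            }
            ps.addBatch();
            if (++pending % 500 == 0) {
                ps.executeBatch(); // send every 500 rows in one round trip
            }
        }
        ps.executeBatch(); // flush the remainder
    }
});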
Conclusion
You just built a flexible, powerful dynamic CSV ingestion system with:
Smart type inference.
Safe schema validation.
Automated database handling.
Quarkus performance under the hood.
With only a few classes and clear patterns, you now have the foundation for admin tools, data sandboxes, or internal ingestion APIs.
And thanks to Quarkus, it’s fast, efficient, and fun to build. Make sure to try it out yourself and learn more about Quarkus.