In today’s fast-paced digital landscape, software performance is critical, particularly in systems like OpenLMIS, where efficient data handling directly impacts user experience (UX) and operational workflows. Optimizing the performance of import features within OpenLMIS – such as products (orderable), programs, and facilities – enhances system responsiveness, providing users with quicker, more reliable access to crucial information.
Following these principles, I led the effort to improve the OpenLMIS import feature. In this post, I’ll share how I achieved a remarkable 90% performance boost, turning a once time-consuming task into a high-speed, efficient asset for users and administrators alike.
Hibernate batches
My first step was optimizing batching configuration. Batching allows you to send multiple insert operations in a single request to the database, which significantly reduces the overhead associated with multiple database round trips. This improvement can be achieved through the configuration of Hibernate’s batch settings.
In Hibernate, the option spring.jpa.properties.hibernate.jdbc.batch_size works together with spring.jpa.properties.hibernate.order_inserts and enables batching for inserts. For updates, the corresponding properties are spring.jpa.properties.hibernate.jdbc.batch_versioned_data and spring.jpa.properties.hibernate.order_updates [1].
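In a Spring Boot application these settings can live in application.properties. A minimal sketch – the batch size of 50 is only an illustrative value, not a recommendation:

spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true
spring.jpa.properties.hibernate.order_updates=true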
Tip
Take note that the latest Hibernate 6.6 documentation doesn’t mention these properties.
Sorting is crucial when dealing with entities that have @ManyToOne or @OneToOne relationships with other entities and use certain cascade options [2]. Without sorting, Hibernate would execute inserts for each entity individually, followed by inserts for the related entity or entities, which would effectively disable batching for that entity. An ID field with GenerationType.IDENTITY would also disable batching – but that’s not something that can be fixed by configuration.
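To illustrate the ID point, here is a minimal sketch of an entity whose sequence-based generator keeps batching possible – the entity and sequence names are invented for the example:

import javax.persistence.*; // jakarta.persistence.* on newer Hibernate versions

@Entity
public class ImportedProduct {

    // A sequence generator lets Hibernate pre-allocate IDs and batch the inserts;
    // GenerationType.IDENTITY would force a round trip per row to obtain the generated key.
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "imported_product_seq")
    @SequenceGenerator(name = "imported_product_seq",
            sequenceName = "imported_product_seq", allocationSize = 50)
    private Long id;

    private String code;

    // getters and setters omitted for brevity
}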
In this case, with the application started locally and the database running on the same server, these batch settings alone didn’t yield much improvement – the measured difference was negligible.
Merging inserts with PostgreSQL
Upon inspecting the actual SQL statements executed by Hibernate, I noticed that they weren’t quite what I expected. Hibernate was executing individual insert statements, which, at first glance, seemed odd. However, it’s important to understand that what Hibernate refers to as “batching” is simply sending multiple insert statements to the database driver at once and waiting for all of them to complete, rather than executing them one by one. This is different from the true batch processing I was aiming for: a more optimized process in which multiple records are handled in a single operation.
The PostgreSQL JDBC driver [3] exposes the boolean property reWriteBatchedInserts, which can make database inserts more efficient. When this setting is enabled, instead of sending multiple insert statements to the database separately, the driver combines them into a single insert statement. This reduces the number of operations the database has to handle, improving performance.
To enable this feature, I added reWriteBatchedInserts=true to the connection URL when setting up the connection to the PostgreSQL database:
postgresql://user@host:5432/dbname?reWriteBatchedInserts=true
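To make the effect concrete, here is a plain-JDBC sketch of what reaches the driver – the product table, its columns, and the credentials are made up for the example, and note that the JDBC form of the URL carries the jdbc: prefix:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class RewriteBatchedInsertsDemo {

    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://host:5432/dbname?reWriteBatchedInserts=true";

        try (Connection connection = DriverManager.getConnection(url, "user", "password");
             PreparedStatement statement = connection.prepareStatement(
                     "INSERT INTO product (code, name) VALUES (?, ?)")) {

            String[][] rows = {{"P001", "Paracetamol"}, {"P002", "Ibuprofen"}, {"P003", "Amoxicillin"}};
            for (String[] row : rows) {
                statement.setString(1, row[0]);
                statement.setString(2, row[1]);
                statement.addBatch(); // queued locally, nothing is sent yet
            }

            // Without the flag, the driver sends the queued statements together but still as
            // separate INSERTs; with reWriteBatchedInserts=true it merges them into a single
            // INSERT ... VALUES (...), (...), (...) statement before sending.
            statement.executeBatch();
        }
    }
}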
With these changes applied, the local test showed a 50% improvement. Unfortunately, this rewrite works for inserts only.
Reducing transaction size
Since the configuration optimization didn’t provide enough of a boost, a code refactor could not be avoided. The import often works with files of 10,000 rows or more, and good practice for transactions is to keep them small – around 1,000-5,000 entities. For that reason, I chose to introduce simple transaction-level batching as the first code refactor.
Tip
This comes with some issues regarding rollback. Since each batch runs in its own transaction, an error doesn’t roll back the whole file “out-of-the-box”. In this case, since the import works in Create&Update mode, I decided to allow partial imports.
A simple split of the data into chunks of 1,000 rows – safe in my case, since the rows of a single import file are unrelated to each other – with each chunk processed in its own transaction didn’t provide any serious boost: a modest 10% improvement.
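A minimal sketch of that chunking, assuming a Spring setup – the class names and the ImportRow type are invented, not the actual OpenLMIS code. Keeping the transactional method on a separate bean ensures the REQUIRES_NEW boundary is honoured, which is also what enables the partial imports mentioned in the tip above:

import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ChunkedImportService {

    private static final int CHUNK_SIZE = 1000;

    private final ChunkImporter chunkImporter;

    public ChunkedImportService(ChunkImporter chunkImporter) {
        this.chunkImporter = chunkImporter;
    }

    public void importFile(List<ImportRow> rows) {
        for (int start = 0; start < rows.size(); start += CHUNK_SIZE) {
            List<ImportRow> chunk = rows.subList(start, Math.min(start + CHUNK_SIZE, rows.size()));
            // Each chunk commits on its own; a failure rolls back only that chunk.
            chunkImporter.importChunk(chunk);
        }
    }
}

@Service
class ChunkImporter {

    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void importChunk(List<ImportRow> chunk) {
        // Map the rows to entities and persist them here.
    }
}

class ImportRow {
    // Stand-in for one parsed line of the import file.
}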
Note
You might notice that I give a single percentage when talking about each improvement. That’s because, during these tests, I limited myself to a single data size that had to be imported in a reasonable time, and I tested only a selected part of the data. This is not formal performance testing. For formal performance testing, see: Connect for Life/Vxnaid Performance Testing.
More refactoring
Sometimes, configuration changes or quick refactors are not enough, and you simply have to dive into the code and change the algorithms – especially when low-quality design decisions become apparent.
The general idea of the former import algorithm was to import each line individually. If a related entity was needed, a call to the Entity Manager or a repository’s findBySomething method was made. In other words, for each line, the code would look up its dependencies individually, use them, and discard them. That made it easy to integrate transactions, but it is a suboptimal approach in general.
I refactored the code to first inspect the file’s rows for any related entities that would be needed – many of them repeat – and to load them with single queries. In this way, I build – or pre-heat – an in-memory cache that is used during the actual import.
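A simplified sketch of the pre-heating idea, assuming a Spring Data style repository – ImportRow, Program, ProgramRepository, and findAllByCodeIn are invented names standing in for the real OpenLMIS types and queries:

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PreHeatedImport {

    private final ProgramRepository programRepository;

    public PreHeatedImport(ProgramRepository programRepository) {
        this.programRepository = programRepository;
    }

    public void importRows(List<ImportRow> rows) {
        // Pre-heat: collect every program code referenced in the file
        // and load the matching entities with a single query.
        Set<String> codes = rows.stream()
                .map(ImportRow::getProgramCode)
                .collect(Collectors.toSet());

        Map<String, Program> cache = programRepository.findAllByCodeIn(codes).stream()
                .collect(Collectors.toMap(Program::getCode, Function.identity()));

        for (ImportRow row : rows) {
            Program program = cache.get(row.getProgramCode()); // no per-row database lookup
            // ... map the row to an orderable and persist it using the cached program ...
        }
    }
}

// Stand-ins for the real types, kept minimal so the sketch is self-contained.
interface ProgramRepository {
    List<Program> findAllByCodeIn(Set<String> codes);
}

class Program {
    private String code;
    public String getCode() { return code; }
}

class ImportRow {
    private String programCode;
    public String getProgramCode() { return programCode; }
}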
Tip
For the first entity, the cache pre-heating produced another 10% improvement. For the second entity (the one with the most related entities), the cache resulted in a whopping 91% improvement.
Summary
In this post, we’ve explored the journey of optimizing the import feature within OpenLMIS to achieve over a 90% improvement in performance. By leveraging Hibernate batch configurations, PostgreSQL-specific optimizations, transaction-level batching, and targeted code refactoring, I was able to streamline data handling and drastically reduce import time.
These combined enhancements transformed the import feature from a cumbersome task into a swift, robust process that benefits both users and administrators. A task that used to take over 15 minutes now completes in under a minute.