MySQL batch insertion with Spring Data JPA

MySQL batch insertion with Spring Data JPA

Batch insertion is essential when it comes to large data sets batch operations. It significantly reduces the database round trips and boosts the operation performance. Hence, it facilitates scaling the application with good performance. In this post, we will discuss how to do MySQL batch insertion with Spring Data JPA.

Introduction

Spring Data JPA uses Hibernate underneath. Hibernate supports batch insertion with some tweaks, namely the identity generator strategy. To have a working batch insertion in Hibernate, we must select the right @GeneratedValue strategy. At the moment, Hibernate supports three strategies as follows:

  • TABLE: Generates ids using Hibernate sequence table, which is very slow by default
  • SEQUENCE: Ids generated by relying upon database sequence
  • IDENTITY: Generates ids by the database identity column

Hibernate does not support IDENTITY strategy

Unfortunately, Hibernate disables batch insertion when using the IDENTITY generator. Read further here. That puts MySQL users at a disadvantage since MySQL does not support the SEQUENCE generator. Therefore, to achieve batch insertion in MySQL, we are left with only TABLE by default is painfully slow. However, if you must do batch insertion, you can apply TABLE generation with some optimizations offered by Vlad Mihalcea.

Now let’s presume that we are already using the TABLE generator and everything is ready for batch insertion utilization. What remains is to add the following config to our application.properties or application.yml.

spring.datasource.url=jdbc:mysql://localhost:3306/studiesdb?rewriteBatchedStatements=true
spring.jpa.properties.hibernate.jdbc.batch_size=50

The first line sets a flag in the database connection to support batch insertion. The second line sets the batch size, 50 in our example.

Now we can use the save method and pass an iterable to perform database batch insertion as follows,

List<Customer> save(customers);

What if TABLE generator is not an option

If you cannot use the table generator, you only are left with three options. In the following section, we discuss each with its pros and cons.

#1 Change of the database

One naive solution is to change your RDBMS to another engine that supports the sequence generator natively. However, a change like that imposes much overhead with little gain as it introduces a drastic infrastructure change and the application redesign. Needless to state a high risk of breaking the application.

#2 Use UUID

Another approach is to switch from ID to UUID. This way, we can set the UUID programmatically before persisting a record. Hence, we can still use batch insertion with MySQL.

The downside of this approach is that we need to assign the UUID manually for each insert. That results in an additional select query. Hibernates has to query the database for every insert statement to ensure the uniqueness of the UUID. If the record doesn’t exist, Hibernate inserts it. Otherwise, Hibernate updates the existing record.

The upside of this approach is its reliability. It is more reliable and less error-prone compared to the following solution since it does not have concurrency issues.

# 3 Batch insertion using JDBC

If we can ensure that the application will not face concurrency issues, we can get a JDBC connection from the Hibernate Session. Then do batch insertions using the plain old JDBC prepared statement.

To get a JDBC connection from Hibernate, we can do something like below.

Session session = entityManager.unwrap(Session.class);
session.doWork(connection -> {
    // JDBC batch insertion code here
});

Then we can write a regular JDBC batch insertion inside of the doWork block. If you don’t know how to write a JDBC prepared statement for batch insertion, check our how to increase insert performance of Amazon AWS Redshift using JDBC article.

Conclusion

In this article, we discussed how to do MySQL batch insertion with Spring Data JPA. As stated, Spring Data JPA uses Hibernate underneath. Unfortunately, Hibernate only supports batch insertion with table strategy in MySQL. The table strategy suffers from performance issues. To tackle that, we proposed three approaches; changing the database, using UUID instead of ID, and batch insertion with JDBC.

Inline/featured images credits