Java parallel streams are fantastic for processing gigantic data sets. They spread the work across multiple cores with very little effort, and my experience with them so far has been very positive. However, parallel streams have one major caveat: they run on the application's common fork-join pool, and the stream API offers no direct way to control the degree of parallelism. In this article, we cover how to control the number of threads a Java parallel stream uses by running it in a custom fork-join pool.
Control parallel stream threads with custom fork-join pool
The idea is to create a custom fork-join pool with the desired number of threads and execute the parallel stream inside it. This lets developers control how many threads the parallel stream uses.
Additionally, it separates the parallel stream thread pool from the application pool which is considered a good practice.
Let’s look at the code:
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class Application {

    private void processLargeDataSet() {
        List<String> largeDataset = getLargeDataset();
        // Custom pool with five worker threads, separate from the common pool
        ForkJoinPool customThreadPool = new ForkJoinPool(5);
        try {
            // join() blocks until the stream has finished; without it the pool
            // could be shut down before the work is done
            customThreadPool.submit(() -> largeDataset.parallelStream().forEach(System.out::println)).join();
        } finally {
            customThreadPool.shutdown();
        }
    }

    private List<String> getLargeDataset() {
        // Integer.MAX_VALUE entries are unlikely to fit in memory;
        // use a smaller bound to actually run this example
        List<String> largeDataset = new ArrayList<>();
        IntStream.range(0, Integer.MAX_VALUE).forEach(i -> largeDataset.add(UUID.randomUUID().toString()));
        return largeDataset;
    }
}
As you can see, we first create a very large fake data set containing over 2 billion records. Then we create a custom fork-join pool with a size of five. Finally, we execute the parallelStream statement inside the submit block. This controls the number of threads the parallel stream uses, so the stream processes at most five records at the same time. Note that this works because of how the current stream implementation picks its pool, not because the stream API documents it.
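One way to check that the stream really runs on the custom pool is to look at the thread that handles each element. The sketch below is only an illustration: the class name PoolMembershipCheck and the helper countElementsOutsidePool are made up for this example, and it assumes the behavior of current JDK implementations, where a parallel stream started from a fork-join worker thread keeps its work in that worker's pool.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class, not part of the original example.
class PoolMembershipCheck {

    // Counts the elements that were processed by a thread which is NOT
    // a worker of the given pool.
    static int countElementsOutsidePool(ForkJoinPool pool, List<Integer> data) {
        AtomicInteger outside = new AtomicInteger();
        pool.submit(() -> data.parallelStream().forEach(item -> {
            Thread current = Thread.currentThread();
            boolean inPool = current instanceof ForkJoinWorkerThread
                    && ((ForkJoinWorkerThread) current).getPool() == pool;
            if (!inPool) {
                outside.incrementAndGet();
            }
        })).join(); // wait for the stream to finish before reading the counter
        return outside.get();
    }

    public static void main(String[] args) {
        ForkJoinPool customThreadPool = new ForkJoinPool(5);
        try {
            List<Integer> data = IntStream.range(0, 10_000)
                    .boxed()
                    .collect(Collectors.toList());
            System.out.println("Elements processed outside the custom pool: "
                    + countElementsOutsidePool(customThreadPool, data));
        } finally {
            customThreadPool.shutdown();
        }
    }
}
```

If the count is zero, every element was handled by a worker of the custom pool and none by the common pool or the calling thread.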
Keep in mind that the example above starts and stops the pool inside the method. That is for demonstration only and not a good approach for real code, because creating and destroying a thread pool is an expensive process. Instead, create the pool when the application starts and shut it down just before the application stops.
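A long-lived pool could be wrapped in a small holder that is created once at startup and closed at shutdown. The following is a minimal sketch under that assumption: the StreamPool class, its lengths method, and the ten-second shutdown timeout are all hypothetical choices for illustration, not something the original example prescribes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical holder for a pool that lives as long as the application.
final class StreamPool implements AutoCloseable {

    private final ForkJoinPool pool;

    StreamPool(int parallelism) {
        this.pool = new ForkJoinPool(parallelism);
    }

    // Runs a parallel stream on the shared pool and waits for its result.
    List<Integer> lengths(List<String> data) {
        return pool.submit(() -> data.parallelStream()
                .map(String::length)
                .collect(Collectors.toList())).join();
    }

    // Call once, when the application stops.
    @Override
    public void close() {
        pool.shutdown();
        try {
            // Timeout value is an arbitrary assumption for this sketch
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // try-with-resources stands in for the application lifecycle here
        try (StreamPool streamPool = new StreamPool(5)) {
            System.out.println(streamPool.lengths(Arrays.asList("a", "bb", "ccc")));
        }
    }
}
```

In a real application the holder would typically be created by whatever manages startup and closed from the matching shutdown callback, so every parallel stream reuses the same five threads.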
Limit common fork-join pool size with configuration
There’s an alternative approach that doesn’t require creating a fork-join pool: set the following JVM argument,
-Djava.util.concurrent.ForkJoinPool.common.parallelism=5
Using the above option is highly discouraged because it limits the common pool for your entire application, not just one stream. Everything that relies on the common pool, including every other parallel stream, is capped at the same parallelism, which can harm application performance and leave the hardware underutilized.
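To see which value is actually in effect, the common pool's parallelism can be read at runtime. This is a small sketch with a made-up class name, CommonPoolInfo; ForkJoinPool.getCommonPoolParallelism() and Runtime.availableProcessors() are standard JDK calls.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper class for inspecting the common pool.
class CommonPoolInfo {

    // Builds a short description of the core count and the common pool setting.
    static String describe() {
        return "cores=" + Runtime.getRuntime().availableProcessors()
                + ", commonPoolParallelism=" + ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with and once without the JVM argument shows whether the setting was picked up.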
Conclusion
In this article, we discussed how to control the number of threads in a Java parallel stream by creating a custom fork-join pool and executing the parallel stream code inside it. Additionally, we demonstrated how to control the concurrency level of the common fork-join pool created by the JVM.