Java 8 Aggregate Operations: A Comprehensive Guide

We use collections not just to store objects but also to retrieve, remove and update them. Aggregate operations let us perform those actions using lambda expressions. We'll use lambda expressions throughout this article; if you want a refresher, I suggest reading the article on lambda expressions in Java 8.

Let's reuse the Employee class defined in the lambda expressions post.
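If you don't have that post at hand, here is a minimal sketch of what such a class might look like. It is a hypothetical stand-in, not the original: the fields are inferred only from the getters used in this article (getName, getGender, getAge), and the constructor, the sampleEmployees helper and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Employee class from the lambda expressions post.
class Employee {
    enum Sex { MALE, FEMALE }

    private final String name;
    private final Sex gender;
    private final int age;

    Employee(String name, Sex gender, int age) {
        this.name = name;
        this.gender = gender;
        this.age = age;
    }

    String getName() { return name; }
    Sex getGender() { return gender; }
    int getAge() { return age; }

    // Made-up sample data, only so the snippets in this article have something to run on.
    static List<Employee> sampleEmployees() {
        return Arrays.asList(
                new Employee("Arun", Sex.MALE, 31),
                new Employee("Maya", Sex.FEMALE, 28),
                new Employee("Ravi", Sex.MALE, 45));
    }

    public static void main(String[] args) {
        for (Employee e : sampleEmployees()) {
            System.out.println(e.getName() + " " + e.getGender() + " " + e.getAge());
        }
    }
}
```

With this sketch, the employees list used below could be obtained from Employee.sampleEmployees().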

First, we create some employee data and then retrieve all the male members from it.

Employee[] empData = employees.toArray(new Employee[employees.size()]);

// Let's print the male members in the employee data.
for (Employee e : empData) {
    if (e.getGender() == Employee.Sex.MALE) {
        System.out.println(e.getName());
    }
}

That's fine code, but we can do better. It can be refactored to use a pipeline, or stream, of aggregate operations. Before we do anything else, we need to convert our data to a Stream. A Stream is a sequence of elements supporting sequential and parallel aggregate operations. Stream operations are divided into intermediate and terminal operations, which are combined to form stream pipelines. Intermediate operations such as filter() are lazy and return a new stream, while terminal operations such as forEach() produce a result or a side effect. For example:

Arrays.stream(empData)
    .filter(e -> e.getGender() == Employee.Sex.MALE)
    .forEach(e -> System.out.println(e.getName()));
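To see what "lazy" means in practice, here is a small sketch. The class name, the names in the stream and the inspected list are all made up for illustration. The filter predicate records every element it looks at, which shows that nothing is evaluated until a terminal operation runs. A side effect inside a predicate is generally discouraged; it is used here only to make the evaluation visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyStreamDemo {

    // Records each name the filter predicate inspects, so we can observe when it runs.
    static final List<String> inspected = new ArrayList<>();

    // Builds a pipeline with an intermediate operation only; no terminal operation yet.
    static Stream<String> buildPipeline() {
        inspected.clear();
        return Stream.of("Alice", "Bob", "Anna")
                .filter(name -> {
                    inspected.add(name); // demo-only side effect
                    return name.startsWith("A");
                });
    }

    public static void main(String[] args) {
        Stream<String> pipeline = buildPipeline();
        // The filter has not run yet, so nothing has been inspected.
        System.out.println("Inspected before terminal operation: " + inspected.size());

        // forEach is terminal: it pulls the elements through the filter.
        pipeline.forEach(System.out::println);
        System.out.println("Inspected after terminal operation: " + inspected.size());
    }
}
```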

Let's look into another example with a few more operations.

double average = Arrays.stream(empData)
        .filter(e -> e.getGender() == Employee.Sex.MALE)
        .mapToInt(Employee::getAge) // IntStream containing the ages of all male members
        .average()                  // terminal reduction: OptionalDouble with the mean, empty if the stream is empty
        .getAsDouble();             // returns the value, or throws NoSuchElementException if it is empty

Reduction operations are terminal operations that return a single value by combining the contents of a stream, for example average, sum, max, min and count. In the code above we used average as the reduction operation.
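The other built-in reductions work the same way. The sketch below runs them on a made-up array of ages; the class and method names are hypothetical. It also uses orElse as a gentler alternative to getAsDouble, so that an empty stream yields a fallback value instead of an exception.

```java
import java.util.stream.IntStream;

class BuiltInReductions {

    static int sum(int[] ages) {
        return IntStream.of(ages).sum();
    }

    static long count(int[] ages) {
        return IntStream.of(ages).count();
    }

    // max() and min() return an OptionalInt, because an empty stream has no maximum or minimum.
    static int maxOr(int[] ages, int fallback) {
        return IntStream.of(ages).max().orElse(fallback);
    }

    static int minOr(int[] ages, int fallback) {
        return IntStream.of(ages).min().orElse(fallback);
    }

    // average() returns an OptionalDouble; orElse supplies a value when the stream is empty.
    static double averageOrZero(int[] ages) {
        return IntStream.of(ages).average().orElse(0.0);
    }

    public static void main(String[] args) {
        int[] ages = {25, 30, 41}; // made-up sample data
        System.out.println(sum(ages));           // 96
        System.out.println(count(ages));         // 3
        System.out.println(maxOr(ages, -1));     // 41
        System.out.println(minOr(ages, -1));     // 25
        System.out.println(averageOrZero(ages)); // 32.0
    }
}
```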

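We can use Stream.reduce as a general-purpose reduction operation. It takes an identity (the starting value, which is also the result for an empty stream) and an accumulator (a function that combines a partial result with the next element):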

T reduce(T identity, BinaryOperator<T> accumulator);
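As a quick illustration of that signature, the sketch below reduces a list of integers in two ways. The class name, method names and numbers are made up. Note how the identity is chosen so that combining it with any element leaves the element unchanged.

```java
import java.util.Arrays;
import java.util.List;

class ReduceDemo {

    // Identity 0: adding 0 to any age leaves it unchanged, and an empty list yields 0.
    static int totalAge(List<Integer> ages) {
        return ages.stream().reduce(0, Integer::sum);
    }

    // Identity 1: multiplying by 1 leaves a value unchanged.
    static int product(List<Integer> numbers) {
        return numbers.stream().reduce(1, (a, b) -> a * b);
    }

    public static void main(String[] args) {
        System.out.println(totalAge(Arrays.asList(25, 30, 41))); // 96
        System.out.println(product(Arrays.asList(2, 3, 4)));     // 24
    }
}
```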

Both reduce and its accumulator produce a new value every time they run. If the result we want is a collection or some other complex object, this hurts performance, because every accumulation step has to build a new collection containing the extra element. In that case we should use the Stream.collect method, which mutates an existing result container instead.
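The difference can be sketched as follows. Both methods build a list of names, and the class and method names are made up for illustration. The reduce version copies the partial list on every step, while the collect version keeps adding to the same ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CollectVsReduce {

    // Ordinary reduction: each step allocates a fresh list and copies the previous one into it.
    static List<String> withReduce(List<String> names) {
        List<String> empty = Collections.emptyList();
        return names.stream().reduce(
                empty,
                (partial, name) -> {
                    List<String> next = new ArrayList<>(partial);
                    next.add(name);
                    return next;
                },
                (left, right) -> {
                    List<String> merged = new ArrayList<>(left);
                    merged.addAll(right);
                    return merged;
                });
    }

    // Mutable reduction: a supplier creates the container, the accumulator adds
    // an element to it, and the combiner merges two containers (used in parallel runs).
    static List<String> withCollect(List<String> names) {
        ArrayList<String> result =
                names.stream().collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Arun", "Maya", "Ravi"); // made-up sample data
        System.out.println(withReduce(names));
        System.out.println(withCollect(names));
    }
}
```

Both methods return the same elements in the same order; the difference is only in how much copying happens along the way.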

Reduction operations operate on the stream as a whole rather than on individual elements. A properly constructed reduce operation is inherently parallelizable, because the implementation can operate on subsets of the data in parallel and then combine the intermediate results to get the final answer. This only holds when the functions used to process the elements are associative and stateless. For example:

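// The identity must be a double literal (0.0): with a List<Double>, reduce(0, ...) does not compile.
double dubData = numData.stream().reduce(0.0, Double::sum);

// The same reduction can run in parallel.
double parallelData = numData.parallelStream().reduce(0.0, Double::sum);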
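To make the associativity requirement concrete, here is a small sketch with made-up class and method names. It uses integers rather than doubles so that the sequential and parallel sums can be compared exactly; floating-point addition is not strictly associative, so a parallel sum of doubles may differ slightly in the last digits. The helper isAssociativeFor only checks one triple of values, so it is a spot check and not a proof.

```java
import java.util.function.IntBinaryOperator;
import java.util.stream.IntStream;

class ParallelReduceDemo {

    // Sums 1..n sequentially.
    static int sequentialSum(int n) {
        return IntStream.rangeClosed(1, n).reduce(0, Integer::sum);
    }

    // Sums 1..n in parallel; safe because integer addition is associative and stateless.
    static int parallelSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, Integer::sum);
    }

    // Spot check: does (a op b) op c equal a op (b op c) for these three values?
    static boolean isAssociativeFor(IntBinaryOperator op, int a, int b, int c) {
        return op.applyAsInt(op.applyAsInt(a, b), c) == op.applyAsInt(a, op.applyAsInt(b, c));
    }

    public static void main(String[] args) {
        System.out.println(sequentialSum(1000)); // 500500
        System.out.println(parallelSum(1000));   // 500500

        System.out.println(isAssociativeFor(Integer::sum, 1, 2, 3));     // true
        // Subtraction: (1 - 2) - 3 = -4, but 1 - (2 - 3) = 2, so it is not safe to parallelize.
        System.out.println(isAssociativeFor((a, b) -> a - b, 1, 2, 3));  // false
    }
}
```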

Another method that is used a lot is collect, which performs a mutable reduction:

List<String> namesOfMaleMembers = Arrays.stream(empData)
        .filter(e -> e.getGender() == Employee.Sex.MALE)
        .map(Employee::getName)
        .collect(Collectors.toList());

As the name implies, a mutable reduction accumulates input elements into a mutable result container, such as a Collection or a StringBuilder, as it processes the elements in the stream. To see why that matters, let's take a stream of strings and concatenate them into a single long string, first with an ordinary reduction:

String concatenate = strings.reduce("", String::concat);

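That is some beautiful code! Written without the reduce operation, the same logic would take several lines of loop code. Keep in mind, though, that this is an ordinary reduction rather than a mutable one: every concat call creates a new String, so for long streams a mutable reduction into a StringBuilder is the better fit. If you have a big application with lots of data manipulation, aggregate operations will save the day! Elegant, readable code is the best code.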
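For comparison, here is a minimal sketch of the same concatenation written three ways; the class and method names are made up. The first method repeats the reduce call from above, the second uses collect with a StringBuilder as the mutable container, and the third uses the ready-made Collectors.joining() collector. All three give the same string; they differ in how much intermediate copying they do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoinStrings {

    // Ordinary reduction: each concat call builds a brand-new String.
    static String withReduce(List<String> parts) {
        return parts.stream().reduce("", String::concat);
    }

    // Mutable reduction: elements are appended to a StringBuilder.
    static String withStringBuilder(List<String> parts) {
        return parts.stream()
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    // Mutable reduction using a predefined collector.
    static String withJoining(List<String> parts) {
        return parts.stream().collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("Java", " ", "8"); // made-up sample data
        System.out.println(withReduce(parts));        // Java 8
        System.out.println(withStringBuilder(parts)); // Java 8
        System.out.println(withJoining(parts));       // Java 8
    }
}
```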

References: Oracle docs on java.util.stream and the Oracle tutorial on aggregate operations