Avoid the flatMap-join-groupBy pattern

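When two datasets are already grouped by key and you want to join them while keeping them grouped, use cogroup instead of flattening the groups, joining the individual records, and grouping the result again. cogroup returns, for each key, the group of values from each dataset, so it avoids the overhead of unpacking and repacking the groups.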
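To make the difference concrete, here is a minimal plain-Python sketch that models what the two pipelines compute. It does not call Spark. The dictionaries `orders` and `visits` and the helper names are hypothetical and stand in for two RDDs that are already grouped by key. The first function imitates the RDD operations flatMapValues, join, and groupByKey, and the second imitates cogroup.

```python
from collections import defaultdict

# Hypothetical sample data: each dataset is already grouped by key.
orders = {"alice": [10, 20], "bob": [5]}
visits = {"alice": ["mon", "tue", "wed"], "carol": ["fri"]}


def flatmap_join_groupby(left, right):
    """Model of flatMapValues + join + groupByKey on grouped inputs."""
    # Unpack: one (key, value) record per element, like flatMapValues.
    left_flat = [(k, v) for k, vs in left.items() for v in vs]
    right_flat = [(k, w) for k, ws in right.items() for w in ws]
    # Inner join: every left value is paired with every right value
    # that shares its key, so a key yields len(vs) * len(ws) records.
    joined = [
        (k, (v, w))
        for k, v in left_flat
        for k2, w in right_flat
        if k == k2
    ]
    # Repack: collect the joined pairs back under their key, like groupByKey.
    grouped = defaultdict(list)
    for k, pair in joined:
        grouped[k].append(pair)
    return dict(grouped), len(joined)


def cogroup(left, right):
    """Model of cogroup: existing groups are paired up without unpacking."""
    keys = left.keys() | right.keys()
    return {k: (left.get(k, []), right.get(k, [])) for k in keys}


pairs_by_key, intermediate_records = flatmap_join_groupby(orders, visits)
groups_by_key = cogroup(orders, visits)

print(intermediate_records)    # 6 joined records for "alice" (2 * 3)
print(groups_by_key["alice"])  # ([10, 20], ['mon', 'tue', 'wed'])
```

The two results do not have the same shape. The first pipeline yields every pairing of values per key and, because the join is an inner join, drops keys that appear in only one dataset. cogroup keeps each dataset's group intact and retains keys that appear on either side, with an empty group for the missing side. If downstream code needs the individual pairs, they can still be produced from the two groups of a key on demand.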

Last updated 2 years ago