Avoid the flatMap-join-groupBy pattern
When two datasets are already grouped by key and you want to join them and keep them grouped, you can just use cogroup. That avoids all the overhead associated with unpacking and repacking the groups.
PreviousAvoid reduceByKey when the input and output value types are differentNextUse TreeReduce/TreeAggregate instead of Reduce/Aggregate
Last updated
Was this helpful?