Use coalesce instead of repartition to decrease the number of partitions

Use coalesce instead of repartition when you decrease the number of partitions of an RDD. coalesce is useful because it avoids a full shuffle: it merges existing partitions, which minimizes the amount of data that is moved between them.
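The difference can be illustrated with a small toy model in plain Python. This is only a sketch and not Spark code: `toy_coalesce`, `toy_repartition`, and `fan_out` are hypothetical helpers written for this illustration, and they do not reproduce Spark's actual placement logic (for example, data locality is ignored). In Spark itself the corresponding calls are `rdd.coalesce(numPartitions)` and `rdd.repartition(numPartitions)`.

```python
from typing import List

Partition = List[int]


def toy_coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: merge whole neighbouring partitions into n groups.

    Each source partition is sent to exactly one target, so no partition
    is split up (this mimics a narrow dependency).
    """
    if n >= len(partitions):
        # Without a shuffle the partition count cannot grow in this model.
        return [list(p) for p in partitions]
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i * n // len(partitions)].extend(part)
    return merged


def toy_repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model of a full shuffle: every record is redistributed round-robin."""
    out: List[Partition] = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for rec in part:
            out[i % n].append(rec)
            i += 1
    return out


def fan_out(before: List[Partition], after: List[Partition]) -> List[int]:
    """For each source partition, count how many target partitions receive its records."""
    target_of = {rec: t for t, part in enumerate(after) for rec in part}
    return [len({target_of[rec] for rec in part}) for part in before]


# 8 source partitions holding 4 unique records each (made-up data).
before = [list(range(i * 4, i * 4 + 4)) for i in range(8)]

coalesced = toy_coalesce(before, 2)
shuffled = toy_repartition(before, 2)

print(fan_out(before, coalesced))  # [1, 1, 1, 1, 1, 1, 1, 1]
print(fan_out(before, shuffled))   # [2, 2, 2, 2, 2, 2, 2, 2]
```

In the coalesce model every source partition feeds a single target, while in the repartition model every source partition is split across all targets, which is the extra data movement a full shuffle implies.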


Last updated 2 years ago
