Because no partitioner is passed to
reduceByKey, the default partitioner will be used, so rdd1 and rdd2 will both be hash-partitioned. These two
reduceByKeys will result in two shuffles. If the RDDs have the same number of partitions, the join will require no additional shuffling. Because the RDDs are partitioned identically, the set of keys in any single partition of rdd1 can only show up in a single partition of rdd2. Therefore, the contents of any single output partition of rdd3 will depend only on the contents of a single partition in rdd1 and a single partition in rdd2, and a third shuffle is not required.
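
The co-partitioning argument can be illustrated with a small pure-Python simulation. This is only a sketch of the idea, not Spark code: `hash_partition`, `local_join`, `NUM_PARTITIONS`, and the sample data are hypothetical names and values chosen for illustration, and the per-key sum stands in for whatever function would be passed to reduceByKey.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption: both datasets use the same partition count


def hash_partition(pairs, num_partitions):
    """Sum values per key and place each key in partition hash(key) % n.

    This mimics a reduceByKey followed by hash partitioning; it is a
    simulation, not the Spark API.
    """
    partitions = [defaultdict(int) for _ in range(num_partitions)]
    for key, value in pairs:
        # Small non-negative ints hash to themselves, so this is deterministic.
        partitions[hash(key) % num_partitions][key] += value
    return partitions


def local_join(parts1, parts2):
    """Join partition i of one side with partition i of the other side only."""
    return [
        {k: (p1[k], p2[k]) for k in p1 if k in p2}
        for p1, p2 in zip(parts1, parts2)
    ]


pairs1 = [(1, 10), (2, 20), (5, 50), (1, 1)]
pairs2 = [(1, 100), (5, 500), (6, 600)]

rdd1 = hash_partition(pairs1, NUM_PARTITIONS)
rdd2 = hash_partition(pairs2, NUM_PARTITIONS)

# No data moves between partitions here: each output partition reads
# exactly one partition from each input.
rdd3 = local_join(rdd1, rdd2)

joined = {k: v for part in rdd3 for k, v in part.items()}
print(joined)
```

Because both sides use the same function and the same partition count, a given key always lands at the same partition index on each side, so the partition-by-partition join finds every match. If the two sides used different partition counts, the same key could land at different indices and the pairing by index would miss matches, which is the situation in which a real join would need to shuffle.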