Avoid reduceByKey when the input and output value types are different
// Less efficient: allocates a new Set for every record before reducing.
rdd.map(kv => (kv._1, Set[String]() + kv._2))
  .reduceByKey(_ ++ _)

// Preferred: aggregateByKey folds each value into a per-key mutable set.
// (Set is a trait, so it is built with its companion's apply, not `new`.)
val zero = collection.mutable.Set[String]()
rdd.aggregateByKey(zero)(
  (set, v) => set += v,
  (set1, set2) => set1 ++= set2)
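
To see why aggregateByKey suits the case where the value type (String) differs from the result type (a set of strings), it may help to model what the two functions do. The sketch below is a simplified, plain-Scala stand-in, not Spark's implementation: `aggregateByKeyLocal`, `newZero`, and the nested `Seq` used to represent partitions are hypothetical names introduced only for illustration. It assumes the first function folds values into one accumulator per key within a partition, and the second merges accumulators across partitions.

```scala
import scala.collection.mutable

object AggregateByKeySketch {
  // Hypothetical local model of aggregateByKey; each inner Seq stands in for one partition.
  // newZero is a factory so that every key starts from its own fresh accumulator.
  def aggregateByKeyLocal[K, V, U](partitions: Seq[Seq[(K, V)]], newZero: () => U)(
      seqOp: (U, V) => U,
      combOp: (U, U) => U): Map[K, U] = {
    // Within a partition: fold each value into a single accumulator per key.
    val perPartition: Seq[Map[K, U]] = partitions.map { part =>
      val acc = mutable.Map.empty[K, U]
      part.foreach { case (k, v) =>
        acc(k) = seqOp(acc.getOrElse(k, newZero()), v)
      }
      acc.toMap
    }
    // Across partitions: merge the accumulators that share a key.
    perPartition.foldLeft(Map.empty[K, U]) { (merged, m) =>
      m.foldLeft(merged) { case (out, (k, u)) =>
        out.updated(k, out.get(k).map(combOp(_, u)).getOrElse(u))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data: two "partitions" of (key, string) pairs.
    val partitions = Seq(
      Seq("a" -> "x", "a" -> "y", "b" -> "z"),
      Seq("a" -> "x", "b" -> "w"))

    val result = aggregateByKeyLocal[String, String, mutable.Set[String]](
      partitions, () => mutable.Set.empty[String])(
      (set, v) => set += v,            // add one String to the per-key set
      (set1, set2) => set1 ++= set2)   // merge two per-key sets

    println(result)
  }
}
```

In this model the number of sets allocated is bounded by the number of distinct keys per partition, whereas the map-then-reduceByKey version builds one set per input record before any merging happens.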