References
[1] Controlling Parallelism in Spark
[3] Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Stra…
[4] How-to: Tune Your Apache Spark Jobs (Part 1)
[5] How-to: Tune Your Apache Spark Jobs (Part 2)
[6] Top 5 Mistakes to Avoid When Writing Apache Spark Applications
[7] Spark best practices
[8] Best practice for retrieving big data from RDD to local machine
[9] Optimizing Spark Machine Learning for Small Data
[10] Tuning and Debugging in Apache Spark
[11] Advantage of Broadcast Variables
[12] When to use Broadcast variable?
[13] Implement treeReduce and treeAggregate
[14] Shuffling and repartitioning of RDD’s in Apache Spark
[15] Resource Allocation Configuration for Spark on YARN
[16] Apache Spark: Config Cheatsheet
[17] Tuning Java Garbage Collection for Apache Spark Applications
[18] How to set Apache Spark Executor memory
[19] How to interpret RDD.treeAggregate
[20] Apache Spark 1.1: MLlib Performance Improvements
[21] Spark group multiple rdd items by key
[22] Spark Corner Cases
[23] Writing efficient Spark jobs
[24] Efficient Data Storage for Analytics with Parquet 2.0
[25] Understanding Query Plans and Spark UIs
[26] Spark best practices