# References

* \[1] Controlling Parallelism in Spark
  * <http://www.bigsynapse.com/spark-input-output>
* \[2] Avoid GroupByKey
  * <https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html>
* \[3] Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Stra…
  * <http://www.slideshare.net/databricks/strata-sj-everyday-im-shuffling-tips-for-writing-better-spark-programs>
* \[4] How-to: Tune Your Apache Spark Jobs (Part 1)
  * <http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/>
* \[5] How-to: Tune Your Apache Spark Jobs (Part 2)
  * <http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/>
* \[6] Top 5 Mistakes to Avoid When Writing Apache Spark Applications
  * <https://intellipaat.com/blog/top-5-mistakes-writing-apache-spark-applications/>
* \[7] Spark best practices
  * <https://robertovitillo.com/2015/06/30/spark-best-practices/>
* \[8] Best practice for retrieving big data from RDD to local machine
  * <http://stackoverflow.com/questions/21698443/spark-best-practice-for-retrieving-big-data-from-rdd-to-local-machine>
* \[9] Optimizing Spark Machine Learning for Small Data
  * <http://eugenezhulenev.com/blog/2015/09/16/spark-ml-for-big-and-small-data/>
* \[10] Tuning and Debugging in Apache Spark
  * <http://www.slideshare.net/pwendell/tuning-and-debugging-in-apache-spark>
* \[11] Advantage of Broadcast Variables
  * <http://stackoverflow.com/questions/26884871/advantage-of-broadcast-variables>
* \[12] When to use Broadcast variable?
  * <https://blog.knoldus.com/2016/04/30/broadcast-variables-in-spark-how-and-when-to-use-them/>
* \[13] Implement treeReduce and treeAggregate
  * <https://issues.apache.org/jira/browse/SPARK-2174>
* \[14] Shufflling and repartitioning of RDD’s in apache spark
  * <https://blog.knoldus.com/2015/06/19/shufflling-and-repartitioning-of-rdds-in-apache-spark/>
* \[15] Resource Allocation Configuration for Spark on YARN
  * <https://www.mapr.com/blog/resource-allocation-configuration-spark-yarn>
* \[16] Apache Spark: Config Cheatsheet
  * <http://c2fo.io/c2fo/spark/aws/emr/2016/07/06/apache-spark-config-cheatsheet/>
* \[17] Tuning Java Garbage Collection for Apache Spark Applications
  * <https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html>
* \[18] How to set Apache Spark Executor memory
  * <http://stackoverflow.com/questions/26562033/how-to-set-apache-spark-executor-memory>
* \[19] How to interpret RDD.treeAggregate
  * <http://stackoverflow.com/questions/29860635/how-to-interpret-rdd-treeaggregate>
* \[20] Apache Spark 1.1: MLlib Performance Improvements
  * <https://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html>
* \[21] Spark group multiple rdd items by key
  * <http://stackoverflow.com/questions/36447057/spark-group-multiple-rdd-items-by-key>
* \[22] Spark Corner Cases
  * <http://codingjunkie.net/spark-corner-cases/>
* \[23] Writing efficient Spark jobs
  * <http://fdahms.com/2015/10/04/writing-efficient-spark-jobs/>
* \[24] Efficient Data Storage for Analytics with Parquet 2.0
  * <https://www.slideshare.net/InfoQ/efficient-data-storage-for-analytics-with-parquet-20>
* \[25] Understanding Query Plans and Spark UIs
  * <https://www.slideshare.net/databricks/understanding-query-plans-and-spark-uis>
* \[26] Spark best practices
  * <https://robertovitillo.com/2015/06/30/spark-best-practices/>
