How to estimate the size of a Dataset
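Number of megabytes: M = (N * V * W) / 1024^2

where
N = number of records
V = number of variables
W = average width in bytes of a variable

Example: estimating the width of a record with 20 variables

Variables                                   Bytes
--------------------------------------------------------
1 string identifier of length 20              20
10 small integers (1 byte each)               10
4 standard integers (2 bytes each)             8
5 floating-point numbers (4 bytes each)       20
--------------------------------------------------------
20 variables total                            58

Here V = 20 and W = 58 / 20 = 2.9 bytes, so V * W = 58 bytes per record and M = (N * 58) / 1024^2.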
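The formula and the example table can be checked with a short Python sketch. The helper names `average_width` and `estimate_megabytes` are hypothetical, the per-type byte widths are taken from the table above, and N = 1,000,000 records is an illustrative value chosen here, not one from the original text.

```python
# Minimal sketch of the estimate M = (N * V * W) / 1024^2.
# Helper names are hypothetical; widths follow the example table.

def average_width(columns):
    """Return the average width W in bytes, given (count, bytes_each) pairs."""
    total_vars = sum(count for count, _ in columns)
    total_bytes = sum(count * width for count, width in columns)
    return total_bytes / total_vars


def estimate_megabytes(n_records, n_variables, avg_width):
    """Return M = (N * V * W) / 1024^2, the estimated size in megabytes."""
    return n_records * n_variables * avg_width / 1024 ** 2


example_columns = [
    (1, 20),  # 1 string identifier of length 20
    (10, 1),  # 10 small integers, 1 byte each
    (4, 2),   # 4 standard integers, 2 bytes each
    (5, 4),   # 5 floating-point numbers, 4 bytes each
]

V = sum(count for count, _ in example_columns)  # 20 variables
W = average_width(example_columns)              # 58 / 20 = 2.9 bytes
N = 1_000_000                                   # assumed record count, for illustration only
M = estimate_megabytes(N, V, W)
print(f"V={V}, W={W:.2f} bytes, M={M:.2f} MB")  # V=20, W=2.90 bytes, M=55.31 MB
```

Computing W from (count, width) pairs keeps the table and the formula in one place: adding a column type only means appending a pair, and V and W are derived rather than typed in by hand.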