Don't use count() when you don't need to return the exact number of rows
When you don't need to return the exact number of rows use:
DataFrame inputJson = sqlContext.read().json(...);
if (inputJson.takeAsList(1).size() == 0) {...}
or
if (inputJson.queryExecution.toRdd.isEmpty()) {...}
instead of:
if (inputJson.count() == 0) {...}
In RDD you can use isEmpty() because if you see the code: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala
def isEmpty(): Boolean = withScope {
partitions.length == 0 || take(1).length == 0
}
Last updated
Was this helpful?