Apache Spark Scala: does groupByKey maintain the order of values in the input RDD?


I may be asking a basic question, so apologies for that, but I didn't find the answer on the internet. Does groupByKey maintain the order of values? That is, a value that occurs first in the input RDD should come first in the output RDD. I tried it and the order seemed to be maintained, but I wanted to confirm with an expert. I need the following:

Input RDD[(Int, Int)]:

1 20
2 10
1 8
1 25

Output RDD:

1 20 8 25
2 10
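For reference, the output described above is what a stable group-by gives on an ordinary in-memory Scala collection: Scala's groupBy walks the Seq in order, so each key's values keep their input order. This is a minimal sketch that does not use Spark at all; the object and method names are hypothetical.

```scala
object LocalStableGroup {
  // Groups (key, value) pairs by key on an in-memory Seq. groupBy traverses
  // the Seq front to back, so the values for each key stay in input order.
  // (The iteration order of the keys in the resulting Map is not defined.)
  def stableGroup[K, V](pairs: Seq[(K, V)]): Map[K, Seq[V]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  def main(args: Array[String]): Unit = {
    val input = Seq((1, 20), (2, 10), (1, 8), (1, 25))
    // Sort by key only to make the printed output deterministic.
    stableGroup(input).toSeq.sortBy(_._1).foreach { case (k, vs) =>
      println(s"$k ${vs.mkString(" ")}")
    }
  }
}
```

This guarantee comes from the local collections library; it says nothing about how RDD.groupByKey behaves.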

No.

From the groupByKey() API documentation: "Group the values for each key in the RDD into a single sequence. Hash-partitions the resulting RDD with the existing partitioner/parallelism level. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting RDD is evaluated."

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.pairrddfunctions@groupbykey():org.apache.spark.rdd.rdd[(k,iterable[v])]
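If the input order matters, one common workaround is to tag each element with its position before the shuffle and sort each group by that position afterwards. The sketch below does this with RDD.zipWithIndex; the helper name groupByKeyPreservingOrder is hypothetical, and it assumes the input RDD itself has a deterministic order (for example one built with parallelize or read from a file, not one produced by a shuffle).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object OrderedGroupByKey {

  // Hypothetical helper: groups values by key and returns each key's values
  // in the order they appear in the input RDD. zipWithIndex numbers elements
  // by partition order, then by position within each partition.
  def groupByKeyPreservingOrder[K: ClassTag, V](rdd: RDD[(K, V)]): RDD[(K, Seq[V])] =
    rdd.zipWithIndex()                                // ((k, v), index)
      .map { case ((k, v), idx) => (k, (idx, v)) }    // carry the index with the value
      .groupByKey()                                   // order inside each group is NOT guaranteed here
      .mapValues(_.toSeq.sortBy(_._1).map(_._2))      // restore input order, then drop the index

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("ordered-groupByKey").setMaster("local[*]"))
    try {
      val input = sc.parallelize(Seq((1, 20), (2, 10), (1, 8), (1, 25)))
      groupByKeyPreservingOrder(input)
        .collect()
        .sortBy(_._1)                                 // sort keys only for readable output
        .foreach { case (k, vs) => println(s"$k ${vs.mkString(" ")}") }
    } finally {
      sc.stop()
    }
  }
}
```

With the sample input this prints `1 20 8 25` and `2 10`. Note that zipWithIndex launches an extra Spark job when the RDD has more than one partition, and that each group is sorted in memory, which groupByKey already requires holding anyway.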

