I may be asking a basic question, and I apologize for that, but I didn't find the answer on the internet. Does groupByKey maintain the order of values? A value that occurs first in the input RDD should come first in the output RDD. I tried it, and it appears to maintain the order, but I wanted to confirm with an expert. I need the following:
Input RDD [Int, Int]:
1 20
2 10
1 8
1 25

Output RDD:
1 20 8 25
2 10
No. The API documentation for groupByKey states:

"Group the values for each key in the RDD into a single sequence. Hash-partitions the resulting RDD with the existing partitioner/parallelism level. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting RDD is evaluated."
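If input order matters, one common workaround is to tag each record with its position before grouping and then sort each group by that tag. The sketch below illustrates the idea in plain Python, with no Spark required. The function name `group_preserving_order` is hypothetical, and the `random.shuffle` call is only an assumption standing in for the arbitrary arrival order after a shuffle. In Spark, the corresponding steps would roughly be `zipWithIndex`, `groupByKey`, and a per-group sort in `mapValues`.

```python
import random
from collections import defaultdict


def group_preserving_order(pairs):
    """Group (key, value) pairs, keeping values in input order within each key.

    Hypothetical helper for illustration; not part of any Spark API.
    """
    # Step 1: tag every record with its input position
    # (analogous to RDD.zipWithIndex in Spark).
    indexed = [(key, (idx, value)) for idx, (key, value) in enumerate(pairs)]

    # Step 2: simulate a shuffle that delivers records in arbitrary order.
    # This is an assumption standing in for Spark's unordered grouping.
    random.shuffle(indexed)

    # Step 3: group by key (analogous to groupByKey); at this point the
    # order of the tagged values within each group is arbitrary.
    groups = defaultdict(list)
    for key, tagged in indexed:
        groups[key].append(tagged)

    # Step 4: restore input order by sorting on the index, then drop it.
    return {
        key: [value for _, value in sorted(tagged)]
        for key, tagged in groups.items()
    }


pairs = [(1, 20), (2, 10), (1, 8), (1, 25)]
result = group_preserving_order(pairs)
for key in sorted(result):
    print(key, *result[key])
```

Sorting inside each group costs extra time and requires every group to fit in memory, which is already the case for groupByKey itself.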