I may be asking a basic question, and I apologize for that, but I didn't find the answer on the internet. Does groupByKey maintain the order of values? A value that occurs first in the input RDD should come first in the output RDD. I tried it and it seemed to be maintaining the order, but I wanted to confirm with an expert. I need the following:
Input RDD[(Int, Int)]:
1 20
2 10
1 8
1 25

Output RDD:
1 20 8 25
2 10
No. The documentation for groupByKey says:

Group the values for each key in the RDD into a single sequence. Hash-partitions the resulting RDD with the existing partitioner/parallelism level. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting RDD is evaluated.
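If the input order matters, one common workaround is to tag each record with its input position before grouping and then sort every group by that position afterwards. The sketch below models this in plain Python, not Spark, so it can run anywhere. The function name group_preserving_order and the seeded shuffle that stands in for the Spark shuffle are assumptions made for illustration. On a real RDD the three steps would roughly correspond to zipWithIndex, groupByKey, and mapValues with a sort.

```python
import random
from collections import defaultdict

def group_preserving_order(pairs, seed=0):
    """Group values by key while keeping their original input order.

    Hypothetical helper: a plain-Python model of the workaround, not a Spark API.
    """
    # Step 1: tag every record with its input position
    # (the role zipWithIndex would play on an RDD).
    indexed = [(key, (pos, value)) for pos, (key, value) in enumerate(pairs)]

    # Simulate a shuffle: records may reach the grouping step in any order.
    random.Random(seed).shuffle(indexed)

    # Step 2: group by key. The order inside each group is arbitrary here.
    groups = defaultdict(list)
    for key, tagged in indexed:
        groups[key].append(tagged)

    # Step 3: sort each group by original position, then drop the position.
    return {key: [value for _, value in sorted(tagged)]
            for key, tagged in groups.items()}

pairs = [(1, 20), (2, 10), (1, 8), (1, 25)]
result = group_preserving_order(pairs)
print(result)
```

For the input in the question, key 1 maps to 20, 8, 25 and key 2 maps to 10, whatever order the records arrived in at the grouping step. The cost is one extra sort per group.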