I may be asking a basic question, and I apologize for that, but I didn't find an answer on the internet. I have a paired RDD and want to use aggregateByKey, concatenating the values for each key. A value that occurs first in the input RDD should come first in the aggregated RDD.
Input RDD [Int, Int]:

    2 20
    1 10
    2 8
    2 25

Output RDD (aggregated RDD):

    2 20 8 25
    1 10
I tried aggregateByKey and groupByKey. Both give me output, but the order of the values is not maintained. Please suggest how to do this.
Since groupByKey and aggregateByKey indeed cannot preserve order, you'll have to artificially add a "hint" to each record so that you can order by that hint yourself after grouping:
    import org.apache.spark.rdd.RDD

    val input = sc.parallelize(Seq((2, 20), (1, 10), (2, 8), (2, 25)))

    val withIndex: RDD[(Int, (Long, Int))] = input
      .zipWithIndex() // adds an index to each record, used to order the result
      .map { case ((k, v), i) => (k, (i, v)) } // restructure into (key, (index, value))

    val result: RDD[(Int, List[Int])] = withIndex
      .groupByKey()
      .map { case (k, it) => (k, it.toList.sortBy(_._1).map(_._2)) } // order values and remove index
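The same tag, group, and sort idea can be tried without a cluster. What follows is only a minimal sketch using plain Scala collections, not Spark code: the object name `OrderPreservingGroup` and the helper `groupInInputOrder` are made up for illustration, and the seeded shuffle is an assumption that stands in for records arriving in arbitrary order after a Spark shuffle.

```scala
import scala.util.Random

object OrderPreservingGroup {
  // Hypothetical helper: tag each record with its input position,
  // group by key, then sort every group by that position.
  def groupInInputOrder[K, V](records: Seq[(K, V)]): Map[K, List[V]] = {
    // (key, value) -> (key, (index, value)), mirroring zipWithIndex + map above
    val indexed = records.zipWithIndex.map { case ((k, v), i) => (k, (i, v)) }

    // Assumption: scramble the records to mimic the loss of ordering
    // that grouping across partitions can cause in Spark.
    val scrambled = new Random(42).shuffle(indexed)

    scrambled
      .groupBy(_._1)
      .map { case (k, group) =>
        // keep (index, value), sort by index, then drop the index
        (k, group.map(_._2).sortBy(_._1).map(_._2).toList)
      }
  }

  def main(args: Array[String]): Unit = {
    val input = Seq((2, 20), (1, 10), (2, 8), (2, 25))
    println(groupInInputOrder(input))
  }
}
```

Even though the records are scrambled before grouping, the values for key 2 come back as 20, 8, 25, because the index restores the original order. The order of the keys themselves in the resulting Map is not guaranteed.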