join
Returns an RDD containing all pairs of elements with matching keys in the two RDDs.
https://spark.apache.org/docs/1.6.2/api/python/pyspark.html#pyspark.rdd.join
Example:
truedupsrdd = (rdd1.join(rdd2))
How can I perform a "disjoin", that is, get the pairs whose keys do not match?
I tried:
notmatchingrdd = (rdd1.join(!rdd2))
Use subtractByKey:
Return each (key, value) pair in self that has no pair with matching key in other.
rdd1.subtractByKey(rdd2)
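To make the difference between the two operations concrete, here is a minimal plain-Python sketch of what each one computes on lists of (key, value) pairs. It is only a model of the semantics, not Spark code: the helpers join_pairs and subtract_by_key and the sample data are hypothetical names introduced for illustration, and a real RDD does not guarantee the output order shown here.

```python
from collections import defaultdict


def join_pairs(left, right):
    """Model of rdd1.join(rdd2): one (key, (v, w)) per pair of matching keys."""
    by_key = defaultdict(list)
    for k, w in right:
        by_key[k].append(w)
    return [(k, (v, w)) for k, v in left for w in by_key.get(k, [])]


def subtract_by_key(left, right):
    """Model of rdd1.subtractByKey(rdd2): pairs of left whose key is absent from right."""
    right_keys = {k for k, _ in right}
    return [(k, v) for k, v in left if k not in right_keys]


# Hypothetical sample data standing in for rdd1 and rdd2.
pairs1 = [("a", 1), ("b", 4), ("b", 5), ("c", 2)]
pairs2 = [("a", 3), ("c", None)]

matching = join_pairs(pairs1, pairs2)
not_matching = subtract_by_key(pairs1, pairs2)

print(matching)      # [('a', (1, 3)), ('c', (2, None))]
print(not_matching)  # [('b', 4), ('b', 5)]
```

Note that only the keys of the second collection matter for the subtraction; its values are ignored.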