Spark Streaming 2.0.0 - freezes after several days under load


We are running Spark 2.0.0 on AWS EMR 5.0.0, consuming a 125-shard Kinesis stream. We feed it 19k events/s from 2 message producers, each message 1 KB in size, and consume with a cluster of 20 machines. The code uses flatMap(), groupByKey(), persist(StorageLevel.MEMORY_AND_DISK_SER_2()) and repartition(19), and stores to S3 using foreachRDD(). We enable backpressure and Kryo:

    sparkConf.set("spark.streaming.backpressure.enabled", "true");
    sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
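
To make the job shape concrete, here is a minimal, self-contained Java sketch of the pipeline described above. The app/stream names, endpoint, bucket path, receiver count, batch interval, and the parse()/keyOf() helpers are placeholders for illustration only; the transformations and storage level are the ones from the actual job.

    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
    import org.apache.spark.SparkConf;
    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kinesis.KinesisUtils;
    import scala.Tuple2;

    public class KinesisToS3Job {
        public static void main(String[] args) throws InterruptedException {
            SparkConf sparkConf = new SparkConf().setAppName("kinesis-to-s3");
            sparkConf.set("spark.streaming.backpressure.enabled", "true");
            sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");

            JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(10));

            // One Kinesis receiver per input DStream; the receivers split the 125 shards among them.
            int numReceivers = 20;
            List<JavaDStream<byte[]>> streams = new ArrayList<>();
            for (int i = 0; i < numReceivers; i++) {
                streams.add(KinesisUtils.createStream(
                        jssc, "kinesis-to-s3", "my-stream",
                        "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
                        InitialPositionInStream.LATEST, Durations.seconds(10),
                        StorageLevel.MEMORY_AND_DISK_SER_2()));
            }
            JavaDStream<byte[]> union =
                    jssc.union(streams.get(0), streams.subList(1, streams.size()));

            // flatMap -> groupByKey -> persist -> repartition -> foreachRDD, as described above.
            JavaPairDStream<String, Iterable<String>> grouped = union
                    .flatMap(bytes -> parse(bytes).iterator())              // placeholder parser
                    .mapToPair(event -> new Tuple2<>(keyOf(event), event))  // placeholder key extractor
                    .groupByKey();

            grouped.persist(StorageLevel.MEMORY_AND_DISK_SER_2());

            grouped.repartition(19).foreachRDD(rdd ->
                    rdd.saveAsObjectFile("s3://my-bucket/output/" + System.currentTimeMillis()));

            jssc.start();
            jssc.awaitTermination();
        }

        // Placeholder helpers -- the real job has its own parsing / keying logic.
        private static List<String> parse(byte[] record) {
            return Arrays.asList(new String(record, StandardCharsets.UTF_8).split("\n"));
        }

        private static String keyOf(String event) {
            return event.isEmpty() ? "" : event.substring(0, 1);
        }
    }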

While running, Ganglia shows a consistent increase in used memory that never gets reclaimed by GC. At the point where there is no more free memory to allocate, Spark stops processing micro-batches and the incoming queue keeps growing. That is the freeze point - Spark Streaming is not able to recover. In our case, Spark froze after 3.5 days of running under pressure.

The problem: we need the streaming job to run for at least a week (preferably more) without restarting.

Spark configuration:

    spark.executor.extraJavaOptions     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
    spark.driver.extraJavaOptions       -Dspark.driver.log.level=INFO -XX:+UseConcMarkSweepGC -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
    spark.master                        yarn-cluster
    spark.executor.instances            19
    spark.executor.cores                7
    spark.executor.memory               7500m
    spark.driver.memory                 7500m
    spark.default.parallelism           133
    spark.yarn.executor.memoryOverhead  2950
    spark.yarn.driver.memoryOverhead    2950
    spark.eventLog.enabled              false
    spark.eventLog.dir                  hdfs:///spark-logs/
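
For reference, under these settings each YARN container request works out roughly as follows (modulo rounding up to the YARN minimum allocation increment):

    executor container = spark.executor.memory + spark.yarn.executor.memoryOverhead
                       = 7500 MB + 2950 MB = 10450 MB
    driver container   = spark.driver.memory + spark.yarn.driver.memoryOverhead
                       = 7500 MB + 2950 MB = 10450 MB

With 19 executors plus the driver (yarn-cluster mode), that is 20 containers of roughly 10.5 GB each across the 20-machine cluster.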

Thanks in advance.


Comments