data engineering

Spark append mode for partitioned text file fails with SaveMode.Append – IOException File already Exists

Code:
dataDF.write.partitionBy("year", "month", "date").mode(SaveMode.Append).text("s3://data/test2/events/")

Error:
16/07/06 02:15:05 ERROR datasources.DynamicPartitionWriterContainer: Aborting task.
File already exists:s3://path/1839dd1ed38a.gz
at
at org.apache.hadoop.fs.FileSystem.create(
at org.apache.hadoop.fs.FileSystem.create(
at org.apache.hadoop.fs.FileSystem.create(
at…

How to write gzip-compressed JSON from a Spark DataFrame

A compression format can be specified in Spark as:

conf = SparkConf()
conf.set("spark.hadoop.mapred.output.compress", "true")
conf.set("spark.hadoop.mapred.output.compression.codec", "true")
conf.set("spark.hadoop.mapred.output.compression.codec", "")
conf.set("spark.hadoop.mapred.output.compression.type", "BLOCK")

The same can be…
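The codec value in the excerpt above is empty, so the settings cannot be used as shown. Below is a minimal sketch of how they might be filled in, assuming the intended codec is Hadoop's standard gzip class, org.apache.hadoop.io.compress.GzipCodec (an assumption based on the post title, not something the excerpt states). The helper names apply_gzip_settings and write_gzip_json are hypothetical. The second helper uses the DataFrameWriter "compression" option, which is available on Spark 2.0 and later and may not match what the original post goes on to describe.

```python
# Hadoop output-compression settings, passed through Spark via the
# "spark.hadoop." prefix. The codec class is an ASSUMPTION: the excerpt
# leaves it empty, and GzipCodec is the standard Hadoop gzip codec.
GZIP_OUTPUT_SETTINGS = {
    "spark.hadoop.mapred.output.compress": "true",
    "spark.hadoop.mapred.output.compression.codec":
        "org.apache.hadoop.io.compress.GzipCodec",
    "spark.hadoop.mapred.output.compression.type": "BLOCK",
}


def apply_gzip_settings(conf, settings=GZIP_OUTPUT_SETTINGS):
    """Hypothetical helper: copy each setting onto a SparkConf-like object.

    `conf` only needs a set(key, value) method, as pyspark.SparkConf has.
    """
    for key, value in settings.items():
        conf.set(key, value)
    return conf


def write_gzip_json(df, path):
    """Hypothetical helper: write a DataFrame as gzip-compressed JSON.

    Uses the per-write "compression" option (Spark 2.0+), so it does not
    depend on the Hadoop settings above.
    """
    df.write.option("compression", "gzip").json(path)
```

With PySpark installed, the first helper would be called as apply_gzip_settings(SparkConf()) before the context is created, and the second as write_gzip_json(dataDF, path) with an output path of your choosing.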