Minimal Spark hello world

1. Build Sbt

Create a build.sbt file. It manages all the dependencies and settings that would otherwise live in your Maven pom.xml:

import sbtassembly.Plugin._

import AssemblyKeys._

name := "FeedSystem"

version := "1.0"

scalaVersion := "2.10.5"

organization := "com.snapdeal"

resolvers += "Typesafe Repo" at "http://repo.typesafe.com/typesafe/releases/"

libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.3.1" % "provided",
  "com.amazonaws" % "aws-java-sdk" % "1.9.27",
  "org.scalatest" %% "scalatest" % "2.2.5" % "test")

scalacOptions += "-deprecation"

scalacOptions += "-feature"

// This statement includes the assembly plugin capabilities
assemblySettings

// Configure the jar name used with the assembly plugin
jarName in assembly := "testspark-assembly.jar"

// A special option to exclude Scala itself from our assembly jar, since Spark
// already bundles Scala.
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

2. Assembly Sbt

Create a project/assembly.sbt file if you want to bundle your application and its dependencies into a single fat jar; skip this step otherwise. Add the line below to it:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
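With the plugin in place, the fat jar can be built from the project root. A minimal sketch, assuming the build.sbt above; the output path follows sbt's convention for the configured scalaVersion and jarName:

```shell
# Compile, run tests, and bundle everything (minus Scala itself) into one jar
sbt clean assembly

# The assembled jar lands under target/ for the configured Scala version
ls target/scala-2.10/testspark-assembly.jar
```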

3. Sample Spark Code

package com.fun.test

import org.apache.spark.{SparkConf, SparkContext}

object TestSystem {
  def main(args: Array[String]): Unit = {
    val file = "file:///usr/local/spark/README.md"

    // Wire up the Spark context; the master URL is supplied by spark-submit
    val conf = new SparkConf().setAppName("TestSystem")
    val sc = new SparkContext(conf)

    // Load the file as an RDD of lines and count them
    val accessLog = sc.textFile(file)
    println("Number of entries: " + accessLog.count())

    sc.stop()
  }
}
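To run the example against a Spark installation, submit the assembled jar with spark-submit. A sketch, assuming a local master and the jar path produced by the assembly step; since spark-core is marked "provided" in build.sbt, spark-submit supplies it at runtime:

```shell
# Run on all local cores; replace the master URL to target a real cluster
spark-submit \
  --class com.fun.test.TestSystem \
  --master "local[*]" \
  target/scala-2.10/testspark-assembly.jar
```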

Yash Sharma is a Big Data and Machine Learning engineer and a newbie open-source contributor; he plays guitar and enjoys teaching as a part-time hobby.
Talk to Yash about distributed systems and data platform designs.
