I recently moved to a new project team at work that works on Spark, so I started learning Spark.
Versions used: Scala 2.13.8 and Spark 3.3.0.
1. First, install Scala
After Scala is installed successfully (it can be verified with scala -version):
2. Create a new Maven project
The pom file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.xu</groupId>
    <artifactId>spark-study</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.13.8</scala.version>
        <spark.version>3.3.0</spark.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.13</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.13</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"/>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Create a WordCount.scala file:
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally on a single thread; no cluster is needed for this first program
    val conf = new SparkConf().setAppName("wordCount").setMaster("local[1]")
    val sc = new SparkContext(conf)

    val fileRdd = sc.textFile("F:\\bigdata\\test.txt")        // one RDD element per line
    val wordsRdd = fileRdd.flatMap(line => line.split(" "))   // split each line into words
    val wordMapRdd = wordsRdd.map(word => (word, 1))          // pair each word with a count of 1
    val countRdd = wordMapRdd.reduceByKey(_ + _)              // sum the counts per word

    countRdd.collect().foreach(println)
    sc.stop()
  }
}
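The RDD operations above deliberately mirror Scala's own collection API, which makes them easy to reason about. As a sketch, the same flatMap / map / reduce-by-key pipeline can be reproduced on a plain Scala collection, with `groupMapReduce` playing the role of `reduceByKey` (the `wordCount` helper and its sample input are my own, for illustration):

```scala
object CollectionWordCount {
  // Same shape as the RDD pipeline, but on an in-memory Seq instead of an RDD
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))             // like fileRdd.flatMap
      .map(word => (word, 1))            // like wordsRdd.map
      .groupMapReduce(_._1)(_._2)(_ + _) // plays the role of reduceByKey(_ + _)

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("hello world", "hello spark")))
}
```

Running it prints `Map(hello -> 2, world -> 1, spark -> 1)`; the difference with Spark is only that the RDD version distributes the same computation across partitions.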
The test.txt file contains:
The output of the run:
And with that, the first Spark program runs successfully.
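Since the pom also pulls in spark-sql, the same word count can be sketched with the Dataset API as a next step (this is my own variant, assuming the same F:\bigdata\test.txt input file; the single column of a text Dataset is named "value"):

```scala
import org.apache.spark.sql.SparkSession

object WordCountSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("wordCountSql")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._ // supplies the Encoder needed by flatMap below

    val lines = spark.read.textFile("F:\\bigdata\\test.txt") // Dataset[String], one row per line
    val words = lines.flatMap(_.split(" "))                  // Dataset[String], one row per word
    words.groupBy("value").count().show()                    // word counts as a two-column table

    spark.stop()
  }
}
```

The result is the same as the RDD version, just rendered by show() as a table instead of printed tuples.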