Launching Spark Application on Yarn Cluster

New Scala Project

Create a Scala project in IDEA, following the wizard step by step.



The remaining settings can be left at their defaults; click Next until the project is created.

Add Properties and Dependency in pom.xml

Add the properties and the dependency required by the Spark application to pom.xml.

Properties

<properties>
    <scala.version>2.11.8</scala.version>
    <spark.version>2.2.0</spark.version>
</properties>

Dependency

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-yarn_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
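
If the pom.xml generated by IDEA does not already contain a Scala compiler plugin, mvn package will skip the Scala sources. A minimal build section, as a sketch, using the scala-maven-plugin (the version here is an assumption; keep whatever your archetype pinned):

<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <!-- bind scalac to the compile and test-compile phases -->
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>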

Scala Code

Create a new Scala object named YarnApp and place it in the com.demo package.

import org.apache.spark.{SparkConf, SparkContext}

object YarnApp {
  def main(args: Array[String]): Unit = {
    // The master is not set here; spark-submit supplies it (--master yarn)
    val conf: SparkConf = new SparkConf().setAppName("SparkOnYarnDemo")
    val sc: SparkContext = new SparkContext(conf)

    // Word count: read the input from HDFS, split each line on tabs,
    // count every word, and write the result back to HDFS in 5 partitions
    val lines = sc.textFile("hdfs:///testFile")
    val words = lines.flatMap(_.split("\t"))
    val pairs = words.map(x => (x, 1))
    val word_count = pairs.reduceByKey(_ + _, 5)
    word_count.saveAsTextFile("hdfs:///testFileResult")

    sc.stop()
  }
}
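
The job reads a tab-separated file from hdfs:///testFile, so that file must exist before the application is submitted. A minimal sketch for preparing one (the sample words are placeholders, not taken from the original walkthrough):

[hadoop@testmain ~]$ printf 'hello\tworld\thello\nspark\tyarn\n' > testFile
[hadoop@testmain ~]$ hdfs dfs -put testFile /testFile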

Run Maven Build

In IDEA's Maven panel, double-click the package goal.

The corresponding build information appears at the bottom of IDEA; when the build finishes, a log like the following is shown:

## ...
[INFO] Building jar: /home/hadoop/YarnDemo/target/YarnDemo-1.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13.811 s
[INFO] Finished at: 2018-04-17T23:08:02+08:00
[INFO] Final Memory: 31M/576M
[INFO] ------------------------------------------------------------------------
Process finished with exit code 0
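
Before submitting, you can optionally confirm that the compiled class actually made it into the jar (a quick sanity check, not part of the original steps):

[hadoop@testmain ~]$ jar tf /home/hadoop/YarnDemo/target/YarnDemo-1.0-SNAPSHOT.jar | grep YarnApp
## Should list com/demo/YarnApp.class and com/demo/YarnApp$.class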

Start YARN

[hadoop@testmain ~]$ cd $HADOOP_HOME
[hadoop@testmain hadoop]$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/software/hadoop-2.8.1/logs/yarn-hadoop-resourcemanager-testmain.out
localhost: starting nodemanager, logging to /opt/software/hadoop-2.8.1/logs/yarn-hadoop-nodemanager-testmain.out
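
You can verify that the daemons are actually up with jps, which ships with the JDK:

[hadoop@testmain hadoop]$ jps
## ResourceManager and NodeManager should appear in the list,
## alongside the HDFS daemons (NameNode, DataNode)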

Submit Spark Application

[hadoop@testmain hadoop]$ cd $SPARK_HOME
## --class com.demo.YarnApp must match the package and object name in the Scala project
[hadoop@testmain spark]$ ./bin/spark-submit --master yarn --deploy-mode cluster --class com.demo.YarnApp --jars /opt/software/hive/lib/mysql-connector-java-5.1.44-bin.jar /home/hadoop/YarnDemo/target/YarnDemo-1.0-SNAPSHOT.jar
18/04/17 23:58:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/17 23:58:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/04/17 23:58:39 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/04/17 23:58:39 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
18/04/17 23:58:39 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
18/04/17 23:58:39 INFO yarn.Client: Setting up container launch context for our AM
18/04/17 23:58:39 INFO yarn.Client: Setting up the launch environment for our AM container
18/04/17 23:58:39 INFO yarn.Client: Preparing resources for our AM container
18/04/17 23:58:40 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/04/17 23:58:44 INFO yarn.Client: Uploading resource file:/tmp/spark-943af3c8-efdf-4891-9286-04a6f0c6bbbf/__spark_libs__6708108576073677879.zip -> hdfs://192.168.128.91:9000/user/hadoop/.sparkStaging/application_1523980627954_0001/__spark_libs__6708108576073677879.zip
18/04/17 23:58:45 INFO yarn.Client: Uploading resource file:/home/hadoop/yarnDemo-1.0-SNAPSHOT.jar -> hdfs://192.168.128.91:9000/user/hadoop/.sparkStaging/application_1523980627954_0001/yarnDemo-1.0-SNAPSHOT.jar
18/04/17 23:58:45 INFO yarn.Client: Uploading resource file:/opt/software/hive/lib/mysql-connector-java-5.1.44-bin.jar -> hdfs://192.168.128.91:9000/user/hadoop/.sparkStaging/application_1523980627954_0001/mysql-connector-java-5.1.44-bin.jar
18/04/17 23:58:45 INFO yarn.Client: Uploading resource file:/tmp/spark-943af3c8-efdf-4891-9286-04a6f0c6bbbf/__spark_conf__5651503554508990660.zip -> hdfs://192.168.128.91:9000/user/hadoop/.sparkStaging/application_1523980627954_0001/__spark_conf__.zip
18/04/17 23:58:45 INFO spark.SecurityManager: Changing view acls to: hadoop
18/04/17 23:58:45 INFO spark.SecurityManager: Changing modify acls to: hadoop
18/04/17 23:58:45 INFO spark.SecurityManager: Changing view acls groups to:
18/04/17 23:58:45 INFO spark.SecurityManager: Changing modify acls groups to:
18/04/17 23:58:45 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
18/04/17 23:58:45 INFO yarn.Client: Submitting application application_1523980627954_0001 to ResourceManager
18/04/17 23:58:45 INFO impl.YarnClientImpl: Submitted application application_1523980627954_0001
18/04/17 23:58:46 INFO yarn.Client: Application report for application_1523980627954_0001 (state: ACCEPTED)
18/04/17 23:58:46 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1523980725728
final status: UNDEFINED
tracking URL: http://testmain:8088/proxy/application_1523980627954_0001/
user: hadoop
18/04/17 23:58:47 INFO yarn.Client: Application report for application_1523980627954_0001 (state: ACCEPTED)
## ...
18/04/17 23:58:57 INFO yarn.Client: Application report for application_1523980627954_0001 (state: RUNNING)
18/04/17 23:58:57 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.128.91
ApplicationMaster RPC port: 0
queue: default
start time: 1523980725728
final status: UNDEFINED
tracking URL: http://testmain:8088/proxy/application_1523980627954_0001/
user: hadoop
18/04/17 23:58:58 INFO yarn.Client: Application report for application_1523980627954_0001 (state: RUNNING)
## ...
18/04/17 23:59:15 INFO yarn.Client: Application report for application_1523980627954_0001 (state: FINISHED)
18/04/17 23:59:15 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.128.91
ApplicationMaster RPC port: 0
queue: default
start time: 1523980725728
final status: SUCCEEDED
tracking URL: http://testmain:8088/proxy/application_1523980627954_0001/
user: hadoop
18/04/17 23:59:15 INFO util.ShutdownHookManager: Shutdown hook called
18/04/17 23:59:15 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-943af3c8-efdf-4891-9286-04a6f0c6bbbf
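
If the application fails, or you want to inspect the executor output, the aggregated container logs can be fetched with yarn logs (this assumes yarn.log-aggregation-enable is set to true in yarn-site.xml):

[hadoop@testmain spark]$ yarn logs -applicationId application_1523980627954_0001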

Check Application Output

[hadoop@testmain ~]$ hdfs dfs -ls /testFileResult
Found 6 items
-rw-r--r-- 1 hadoop supergroup 0 2018-04-17 23:59 /testFileResult/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 22 2018-04-17 23:59 /testFileResult/part-00000
-rw-r--r-- 1 hadoop supergroup 9 2018-04-17 23:59 /testFileResult/part-00001
-rw-r--r-- 1 hadoop supergroup 28 2018-04-17 23:59 /testFileResult/part-00002
-rw-r--r-- 1 hadoop supergroup 0 2018-04-17 23:59 /testFileResult/part-00003
-rw-r--r-- 1 hadoop supergroup 0 2018-04-17 23:59 /testFileResult/part-00004
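
There are five part files because reduceByKey(_ + _, 5) requested 5 output partitions; the zero-byte ones simply received no keys. The actual counts can be printed with hdfs dfs -cat (the output depends entirely on the contents of /testFile, so no sample is shown here):

[hadoop@testmain ~]$ hdfs dfs -cat /testFileResult/part-*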