Spark on YARN Deployment Modes

YARN Client mode

The flow in client mode is as follows (a minimal submission sketch follows the list):

  • ① The YARN client submits an application request to the ResourceManager
  • ② On receiving the request, the ResourceManager picks a NodeManager in the cluster, allocates a Container on it, and starts the ApplicationMaster process inside that Container
    • The Driver process runs on the YARN client and initializes the SparkContext
  • ③ Once the SparkContext is initialized, it communicates with the ApplicationMaster, which requests Containers from the ResourceManager on its behalf; the ResourceManager then picks NodeManagers in the cluster and allocates the Containers
  • ④ The ApplicationMaster notifies the NodeManagers to start Spark executors inside the Containers
  • ⑤ Each Spark executor registers with the Driver and afterwards reports its status back to the Driver
  • ⑥ The SparkContext assigns Tasks to the Spark executors
  • ⑦ When all Tasks have finished, the YARN client asks the ResourceManager to deregister the ApplicationMaster
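For reference, a packaged application can be launched in this mode with spark-submit as sketched below; the jar and main class reuse the ones submitted in cluster mode later in this post, and client is also the default when --deploy-mode is omitted:

[hadoop@testmain ~]$ spark-submit --master yarn --deploy-mode client --class com.demo.YarnApp /home/hadoop/yarnDemo-1.0-SNAPSHOT.jar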

YARN Cluster mode

The flow is the same as in client mode except for the following steps (a log-retrieval sketch follows the list):

  • ② The ApplicationMaster process runs the Driver and initializes the SparkContext, so the SparkContext runs on the same cluster node as the ApplicationMaster
  • ⑦ Each Spark executor registers with the ApplicationMaster and afterwards reports its status back to the ApplicationMaster
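Because the Driver runs inside the ApplicationMaster's Container in this mode, its console output does not show up on the submitting machine. Assuming YARN log aggregation is enabled, the driver log can be pulled back afterwards with the yarn CLI (using the application id from the cluster-mode run later in this post):

[hadoop@testmain ~]$ yarn logs -applicationId application_1523980627954_0001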

YARN Cluster vs Client

The main differences are summarized below; a spark-defaults.conf sketch for fixing a default deploy mode follows the list:

  • Spark driver:
    • YARN Client: runs on the local machine that submits the application
    • YARN Cluster: runs on the same cluster node as the ApplicationMaster
    • In either mode the Driver has to communicate with the NodeManagers, so keeping the Driver and the NodeManagers in the same cluster network effectively reduces network traffic
  • ApplicationMaster:
    • YARN Client: only requests resources; the Spark driver monitors the Tasks, so the client must stay alive for the whole lifetime of the application
    • YARN Cluster: requests resources and also monitors the Tasks, so the client can exit after submission
  • Interactive Spark:
    • Interactive Spark applications such as spark-shell and pyspark cannot run in cluster mode
  • Client network:
    • Keep the client in the same cluster network as the ResourceManager and the NodeManagers whenever possible to reduce network traffic
  • Client loading:
    • YARN Client: consumes resources on the submitting machine, so make sure that machine has enough capacity
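The deploy mode does not have to be given on every spark-submit invocation; it can be set as a site default in conf/spark-defaults.conf. A minimal sketch, using the standard Spark configuration keys:

spark.master             yarn
spark.submit.deployMode  cluster

Command-line flags such as --deploy-mode still take precedence over these defaults.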

Word Count on YARN Deployment Modes

YARN Client mode

[hadoop@testmain hadoop]$ cd $SPARK_HOME
[hadoop@testmain spark]$ ./bin/spark-shell --master yarn --jars /opt/software/hive/lib/mysql-connector-java-5.1.44-bin.jar
val lines = sc.textFile("hdfs:///testFile")       // read the input file from HDFS
val words = lines.flatMap(_.split("\t"))          // split each line into words on tabs
val pairs = words.map(x => (x, 1))                // map each word to a (word, 1) pair
val wordcount = pairs.reduceByKey(_ + _, 5)       // sum the counts per word into 5 partitions
wordcount.collect                                 // bring the result back to the driver
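A quick check on the partitioning: the second argument of reduceByKey fixes the number of result partitions, which is also why the cluster-mode job further down writes five part-* files:

wordcount.getNumPartitions    // Int = 5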

Jps Information

[hadoop@testmain ~]$ jps -m
37568 SparkSubmit --master yarn --class org.apache.spark.repl.Main --name Spark shell --jars /opt/software/hive/lib/mysql-connector-java-5.1.44-bin.jar spark-shell
37763 CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.128.91:50650 --executor-id 1 --hostname testmain --cores 1 --app-id application_1523988458629_0001 --user-class-path file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1523988458629_0001/container_1523988458629_0001_01_000002/__app__.jar --user-class-path file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1523988458629_0001/container_1523988458629_0001_01_000002/mysql-connector-java-5.1.44-bin.jar
37795 CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.128.91:50650 --executor-id 2 --hostname testmain --cores 1 --app-id application_1523988458629_0001 --user-class-path file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1523988458629_0001/container_1523988458629_0001_01_000003/__app__.jar --user-class-path file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1523988458629_0001/container_1523988458629_0001_01_000003/mysql-connector-java-5.1.44-bin.jar
37716 ExecutorLauncher --arg 192.168.128.91:50650 --properties-file /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1523988458629_0001/container_1523988458629_0001_01_000001/__spark_conf__/__spark_conf__.properties
  • CoarseGrainedExecutorBackend
    • the two processes correspond to the two executor Containers
    • executor-id identifies each executor
    • app-id identifies the Application the executor belongs to
  • ExecutorLauncher
    • the ApplicationMaster process in client mode; it only requests resources, while the Driver itself stays inside the SparkSubmit process on the client

YARN Web Information

History Information

YARN Cluster mode

  • For how to build and submit a Spark application to the YARN cluster, please refer to
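The class submitted below, com.demo.YarnApp, is not listed in this post. A minimal sketch of what it might look like, assuming it performs the same word count as the spark-shell demo and writes its result to hdfs:///testFileResult (consistent with the output check further down):

package com.demo

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical word-count application; master and deploy mode come from spark-submit
object YarnApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("YarnApp"))

    sc.textFile("hdfs:///testFile")                 // same input file as the spark-shell demo
      .flatMap(_.split("\t"))                       // split each line into words on tabs
      .map((_, 1))                                  // map each word to a (word, 1) pair
      .reduceByKey(_ + _, 5)                        // 5 partitions -> 5 part-* output files
      .saveAsTextFile("hdfs:///testFileResult")     // output checked below with hdfs dfs -ls

    sc.stop()
  }
}

The actual cluster-mode submission and the YARN client's progress report follow: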
[hadoop@testmain ~]$ spark-submit --master yarn --deploy-mode cluster --class com.demo.YarnApp --jars /opt/software/hive/lib/mysql-connector-java-5.1.44-bin.jar /home/hadoop/yarnDemo-1.0-SNAPSHOT.jar
18/04/17 23:58:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/17 23:58:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/04/17 23:58:39 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/04/17 23:58:39 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
18/04/17 23:58:39 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
18/04/17 23:58:39 INFO yarn.Client: Setting up container launch context for our AM
18/04/17 23:58:39 INFO yarn.Client: Setting up the launch environment for our AM container
18/04/17 23:58:39 INFO yarn.Client: Preparing resources for our AM container
18/04/17 23:58:40 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/04/17 23:58:44 INFO yarn.Client: Uploading resource file:/tmp/spark-943af3c8-efdf-4891-9286-04a6f0c6bbbf/__spark_libs__6708108576073677879.zip -> hdfs://192.168.128.91:9000/user/hadoop/.sparkStaging/application_1523980627954_0001/__spark_libs__6708108576073677879.zip
18/04/17 23:58:45 INFO yarn.Client: Uploading resource file:/home/hadoop/yarnDemo-1.0-SNAPSHOT.jar -> hdfs://192.168.128.91:9000/user/hadoop/.sparkStaging/application_1523980627954_0001/yarnDemo-1.0-SNAPSHOT.jar
18/04/17 23:58:45 INFO yarn.Client: Uploading resource file:/opt/software/hive/lib/mysql-connector-java-5.1.44-bin.jar -> hdfs://192.168.128.91:9000/user/hadoop/.sparkStaging/application_1523980627954_0001/mysql-connector-java-5.1.44-bin.jar
18/04/17 23:58:45 INFO yarn.Client: Uploading resource file:/tmp/spark-943af3c8-efdf-4891-9286-04a6f0c6bbbf/__spark_conf__5651503554508990660.zip -> hdfs://192.168.128.91:9000/user/hadoop/.sparkStaging/application_1523980627954_0001/__spark_conf__.zip
18/04/17 23:58:45 INFO spark.SecurityManager: Changing view acls to: hadoop
18/04/17 23:58:45 INFO spark.SecurityManager: Changing modify acls to: hadoop
18/04/17 23:58:45 INFO spark.SecurityManager: Changing view acls groups to:
18/04/17 23:58:45 INFO spark.SecurityManager: Changing modify acls groups to:
18/04/17 23:58:45 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
18/04/17 23:58:45 INFO yarn.Client: Submitting application application_1523980627954_0001 to ResourceManager
18/04/17 23:58:45 INFO impl.YarnClientImpl: Submitted application application_1523980627954_0001
18/04/17 23:58:46 INFO yarn.Client: Application report for application_1523980627954_0001 (state: ACCEPTED)
18/04/17 23:58:46 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1523980725728
final status: UNDEFINED
tracking URL: http://testmain:8088/proxy/application_1523980627954_0001/
user: hadoop
18/04/17 23:58:47 INFO yarn.Client: Application report for application_1523980627954_0001 (state: ACCEPTED)
## ...
18/04/17 23:58:57 INFO yarn.Client: Application report for application_1523980627954_0001 (state: RUNNING)
18/04/17 23:58:57 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.128.91
ApplicationMaster RPC port: 0
queue: default
start time: 1523980725728
final status: UNDEFINED
tracking URL: http://testmain:8088/proxy/application_1523980627954_0001/
user: hadoop
18/04/17 23:58:58 INFO yarn.Client: Application report for application_1523980627954_0001 (state: RUNNING)
## ...
18/04/17 23:59:15 INFO yarn.Client: Application report for application_1523980627954_0001 (state: FINISHED)
18/04/17 23:59:15 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.128.91
ApplicationMaster RPC port: 0
queue: default
start time: 1523980725728
final status: SUCCEEDED
tracking URL: http://testmain:8088/proxy/application_1523980627954_0001/
user: hadoop
18/04/17 23:59:15 INFO util.ShutdownHookManager: Shutdown hook called
18/04/17 23:59:15 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-943af3c8-efdf-4891-9286-04a6f0c6bbbf

Check Application output

[hadoop@testmain ~]$ hdfs dfs -ls /testFileResult
Found 6 items
-rw-r--r-- 1 hadoop supergroup 0 2018-04-17 23:59 /testFileResult/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 22 2018-04-17 23:59 /testFileResult/part-00000
-rw-r--r-- 1 hadoop supergroup 9 2018-04-17 23:59 /testFileResult/part-00001
-rw-r--r-- 1 hadoop supergroup 28 2018-04-17 23:59 /testFileResult/part-00002
-rw-r--r-- 1 hadoop supergroup 0 2018-04-17 23:59 /testFileResult/part-00003
-rw-r--r-- 1 hadoop supergroup 0 2018-04-17 23:59 /testFileResult/part-00004
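The two zero-byte part files correspond to partitions that received no keys from the hash partitioner. To inspect the actual word counts, the part files can be printed directly (output omitted here, since it depends on the contents of /testFile):

[hadoop@testmain ~]$ hdfs dfs -cat /testFileResult/part-*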

Jps Information

[hadoop@testmain ~]$ jps -m
30969 CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.128.91:37456 --executor-id 1 --hostname testmain --cores 1 --app-id application_1523980627954_0001 --user-class-path file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1523980627954_0001/container_1523980627954_0001_01_000002/__app__.jar --user-class-path file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1523980627954_0001/container_1523980627954_0001_01_000002/mysql-connector-java-5.1.44-bin.jar
31017 CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.128.91:37456 --executor-id 2 --hostname testmain --cores 1 --app-id application_1523980627954_0001 --user-class-path file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1523980627954_0001/container_1523980627954_0001_01_000003/__app__.jar --user-class-path file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1523980627954_0001/container_1523980627954_0001_01_000003/mysql-connector-java-5.1.44-bin.jar
30875 ApplicationMaster --class com.demo.YarnApp --jar file:/home/hadoop/yarnDemo-1.0-SNAPSHOT.jar --properties-file /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1523980627954_0001/container_1523980627954_0001_01_000001/__spark_conf__/__spark_conf__.properties
30764 SparkSubmit --master yarn --deploy-mode cluster --class com.demo.YarnApp --jars /opt/software/hive/lib/mysql-connector-java-5.1.44-bin.jar /home/hadoop/yarnDemo-1.0-SNAPSHOT.jar
  • Compared with client mode, the jps output now shows an ApplicationMaster process, which also hosts the Driver, in place of the ExecutorLauncher seen earlier
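The same application can also be checked from YARN's side while it is running; assuming the application id from the submission log above:

[hadoop@testmain ~]$ yarn application -status application_1523980627954_0001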

YARN Web Information

History Information