Compile Spark Source Code
Why Spark
Compilation Steps
Basic Information
- OS: CentOS 6.5 64bit/macOS Sierra
- JDK: 8u144
- Maven: 3.3.9 (bundled with the Spark source code)
- Apache Spark download: on the Spark downloads page, choose the "Source Code" package type, then download and extract the archive.
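The download step above can also be done from the command line. A minimal sketch; the URL follows the Apache archive layout for the 2.2.0 release and the target directory `/opt/sourcecode` matches the paths used later in this post, but both should be adjusted to your environment:

```shell
# Build the download URL for the Spark 2.2.0 source release.
# (Apache archive layout; verify against the mirror list before relying on it.)
VERSION=2.2.0
URL="https://archive.apache.org/dist/spark/spark-${VERSION}/spark-${VERSION}.tgz"
echo "$URL"

# On the build host:
# wget "$URL"
# tar -xzf "spark-${VERSION}.tgz" -C /opt/sourcecode
```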
- Building Spark official documentation
- The official documentation describes in detail how to build Spark; the steps and commands needed to build Spark 2.2.0 are listed below.
Building Spark using Maven requires:
- Maven 3.3.9 or newer
- Java 8+ (support for Java 7 was removed as of Spark 2.2.0)
- Set JAVA_HOME

```
[root@hadoop-01 spark-2.2.0]# cd build/    ## Maven bundled inside the Spark source tree
[root@hadoop-01 build]# mvn -version       ## confirm the Maven and Java versions: Maven 3.3.9 and Java 1.8.0_144
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /opt/software/apache-maven-3.3.9
Java version: 1.8.0_144, vendor: Oracle Corporation
Java home: /usr/java/jdk1.8.0_144/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-431.el6.x86_64", arch: "amd64", family: "unix"
```
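Setting JAVA_HOME can look like the following sketch; the JDK path is taken from the `mvn -version` output above and will differ on other machines:

```shell
# Point JAVA_HOME at the JDK used for the build.
# (Path matches the environment in this post; adjust for your system.)
export JAVA_HOME=/usr/java/jdk1.8.0_144
export PATH="$JAVA_HOME/bin:$PATH"
echo "JAVA_HOME=$JAVA_HOME"

# Then confirm the toolchain on the build host:
# java -version
# build/mvn -version
```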
Setting up Maven’s Memory Usage
- Use the MAVEN_OPTS environment variable to raise the memory limit available to Maven.

```
[root@hadoop-01 spark-2.2.0]# export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
```
build/mvn
- Build with the Maven bundled under the build directory of the Spark source tree
- This command automatically downloads the resources the build needs (Maven, Scala, and Zinc) into the build directory and installs them
- The following command builds Spark with the default settings

```
[root@hadoop-01 spark-2.2.0]# ./build/mvn -DskipTests clean package
```
Building a Runnable Distribution
- Information from pom.xml:
  - properties section:
    - the build defaults to Hadoop 2.6.5
    - the default YARN version matches the Hadoop version
  - profiles section:
    - the supported Hadoop profiles are 2.6 and 2.7; 2.6 is the default
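These defaults can be confirmed directly in the source tree, e.g. with `grep -n '<hadoop.version>' pom.xml`. A minimal illustration on a fragment matching the relevant properties in Spark 2.2.0's pom.xml:

```shell
# Recreate the relevant properties fragment and extract the default Hadoop version.
# (On a real source tree, simply run: grep -n '<hadoop.version>' pom.xml)
cat > /tmp/pom-fragment.xml <<'EOF'
<properties>
  <hadoop.version>2.6.5</hadoop.version>
  <yarn.version>${hadoop.version}</yarn.version>
</properties>
EOF
grep -o '<hadoop.version>[^<]*' /tmp/pom-fragment.xml   # prints: <hadoop.version>2.6.5
```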
- Target platform:
  - the target platform runs Hadoop 2.8.1, so the hadoop.version property is overridden and Spark is built with the following command:
  - ./dev/make-distribution.sh --name spark-2.2.0 --tgz -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Dhadoop.version=2.8.1
- Parameters:
  - --name: the output name; make-distribution.sh substitutes this value for $NAME in spark-$VERSION-bin-$NAME
  - --tgz: produce a tgz archive
  - -Pyarn: enable YARN support; activates the matching profile id
  - -Phadoop-2.7: per the official Spark documentation, builds against Hadoop 2.7 and later use -Phadoop-2.7 (the pom defines no hadoop-2.8 profile, so 2.8.1 is selected via hadoop.version instead)
  - -Phive: enable Hive support
  - -Phive-thriftserver: enable the Hive Thrift server; activates the matching profile id
  - -Dhadoop.version=2.8.1: set the hadoop.version property to 2.8.1, overriding the 2.6.5 default in pom.xml
- make-distribution.sh ultimately runs ./build/mvn, setting MAVEN_OPTS automatically and appending -DskipTests clean package to the build command
- Adding -X to the build command produces verbose output, which helps pinpoint the cause of build failures.

```
[root@hadoop-01 spark-2.2.0]# ./dev/make-distribution.sh --name spark-2.2.0 --tgz -Pyarn -Phadoop-2.8 -Phive -Phive-thriftserver -Dhadoop.version=2.8.1
main:
[INFO] Executed tasks
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 7.014 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 7.707 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 6.783 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 20.351 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 15.513 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 15.655 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 19.657 s]
[INFO] Spark Project Core ................................. SUCCESS [03:15 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 15.572 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 27.722 s]
[INFO] Spark Project Streaming ............................ SUCCESS [01:01 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:50 min]
[INFO] Spark Project SQL .................................. SUCCESS [02:45 min]
[INFO] Spark Project ML Library ........................... SUCCESS [01:51 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 2.650 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 59.211 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 8.718 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 18.745 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 18.137 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 38.423 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 4.710 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 16.745 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 17.172 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 4.797 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [ 15.995 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [ 13.452 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 30.937 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 6.090 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 14.209 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 5.213 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:37 min
[INFO] Finished at: 2017-09-02T00:22:22+08:00
[INFO] Final Memory: 83M/352M
+ TARDIR_NAME=spark-2.2.0-bin-custom-spark
+ TARDIR=/opt/sourcecode/spark-2.2.0/spark-2.2.0-bin-custom-spark
## The TARDIR_NAME and TARDIR variables give the name and location of the output tgz
[root@hadoop-01 spark-2.2.0]# ll | grep spark-2.2.0-bin-custom-spark
-rw-r--r--. 1 root root 194582530 Sep 2 00:22 spark-2.2.0-bin-custom-spark.tgz
```
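The tarball name in the log follows the spark-$VERSION-bin-$NAME pattern from make-distribution.sh. A sketch of that naming plus typical post-build checks; the paths come from the run above, and the spark-shell smoke test is an assumption about the deployment host:

```shell
# Name pattern used by make-distribution.sh: spark-$VERSION-bin-$NAME
# (NAME here matches the custom-spark value seen in the log above.)
VERSION=2.2.0
NAME=custom-spark
TARBALL="spark-${VERSION}-bin-${NAME}.tgz"
echo "$TARBALL"

# On the build host, verify the archive before deploying:
# tar -tzf "$TARBALL" | head                                        # list contents
# tar -xzf "$TARBALL" -C /opt/app                                   # unpack for deployment
# /opt/app/spark-${VERSION}-bin-${NAME}/bin/spark-shell --version   # smoke test
```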