Spark SQL Built-in and UDF Functions
Commonly used functions in Spark SQL fall into two categories: built-in functions and UDFs (user-defined functions). The examples below show how to use each.
Built-in Functions
- Most built-in functions are defined in functions.scala:

```scala
import org.apache.spark.sql.SparkSession

object SqlBuiltinFunctionsDemo {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("DataFrameApiApp")
      .getOrCreate()

    import spark.implicits._

    val logDF = spark.sparkContext.textFile("file/pvuv.log")
      .map(_.split(","))
      .map(x => Log(x(0), x(1).toInt))
      .toDF()
    /*
    file/pvuv.log:
    tv1,1001
    tv2,1001
    tv3,1002
    tv1,1001
    tv1,1003
    tv2,1003
    tv3,1003
    tv1,1003
    */

    import org.apache.spark.sql.functions._

    // sum is a built-in function from functions.scala; it returns the sum of
    // all values in the given column.
    logDF.groupBy($"name").agg(sum("times").as("sum")).show()
    /*
    +----+----+
    |name| sum|
    +----+----+
    | tv2|2004|
    | tv3|2005|
    | tv1|4008|
    +----+----+
    */

    spark.stop()
  }

  case class Log(name: String, times: Int)
}
```
UDF Functions
- Using a UDF involves three steps, shown end to end in the sketch after this list:
  - Define the function
  - Register the function
  - Use the function
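Below is a minimal sketch of the three steps, reusing the pvuv.log data from the built-in example. The object name SqlUdfFunctionsDemo, the temp view name logs, and the UDF name name_length are illustrative assumptions, not from the original.

```scala
import org.apache.spark.sql.SparkSession

object SqlUdfFunctionsDemo {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("UdfFunctionsApp")  // app name is an assumption
      .getOrCreate()

    import spark.implicits._

    // Same input format as the built-in example: each line is "name,times".
    val logDF = spark.sparkContext.textFile("file/pvuv.log")
      .map(_.split(","))
      .map(x => Log(x(0), x(1).toInt))
      .toDF()
    logDF.createOrReplaceTempView("logs")

    // Step 1: define the function -- an ordinary Scala function literal.
    val nameLength = (name: String) => name.length

    // Step 2: register it under a name that SQL queries can call.
    spark.udf.register("name_length", nameLength)

    // Step 3: use it in a query like any built-in function.
    spark.sql("SELECT name, name_length(name) AS len FROM logs").show()

    spark.stop()
  }

  case class Log(name: String, times: Int)
}
```

The same function literal can also be used with the DataFrame API by wrapping it with org.apache.spark.sql.functions.udf instead of registering it by name.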