Hive Built-in Functions

Hive Built-in Functions

Built-in Functions

透過Hive command列出Built-in Functions

  • 顯示出當前session所支持的Built-in Functions
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    hive> show functions;
    OK
    !
    !=
    %
    &
    *
    +
    -
    /
    <
    <=
    <=>
    <>
    =
    ==
    >
    >=
    ^
    abs
    acos
    add_months
    and
    array
    array_contains
    ascii
    asin
    assert_true
    atan
    avg
    -- ...
    Time taken: 0.471 seconds, Fetched: 210 row(s)

查看特定Built-in function說明。

  • function名稱可以是show functions所顯示出來的任一個。
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    hive> desc function to_unix_timestamp;
    OK
    to_unix_timestamp(date[, pattern]) - Returns the UNIX timestamp
    Time taken: 0.084 seconds, Fetched: 1 row(s)
    hive> desc function !;
    OK
    ! a - Logical not
    Time taken: 0.05 seconds, Fetched: 1 row(s)
    hive> desc function substr;
    OK
    substr(str, pos[, len]) - returns the substring of str that starts at pos and is of length len orsubstr(bin, pos[, len]) - returns the slice of byte array that starts at pos and is of length len
    Time taken: 0.026 seconds, Fetched: 1 row(s)

顯示更詳細的Built-in function說明

  • 包含註釋與使用範例。
  • Synonyms關鍵字顯示,相同作用的function。
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    hive> desc function extended substr;
    OK
    substr(str, pos[, len]) - returns the substring of str that starts at pos and is of length len orsubstr(bin, pos[, len]) - returns the slice of byte array that starts at pos and is of length len
    Synonyms: substring
    pos is a 1-based index. If pos<0 the starting position is determined by counting backwards from the end of str.
    Example:
    > SELECT substr('Facebook', 5) FROM src LIMIT 1;
    'book'
    > SELECT substr('Facebook', -5) FROM src LIMIT 1;
    'ebook'
    > SELECT substr('Facebook', 5, 1) FROM src LIMIT 1;
    'b'
    Time taken: 0.028 seconds, Fetched: 10 row(s)

使用Built-in function

  • 使用des function extented命令,可以得知此function如何使用,下述使用substr作為示例。
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    -- 列出紀錄
    hive> select ename from emp limit 1;
    OK
    SMITH
    Time taken: 0.239 seconds, Fetched: 1 row(s)
    -- 使用substr處理紀錄
    hive> select substr(ename, 3) from emp limit 1;
    OK
    ITH
    Time taken: 0.119 seconds, Fetched: 1 row(s)

Date Built-in Functions

取得當前時間

1
2
3
4
5
6
7
8
9
hive> select to_date(from_unixtime(unix_timestamp())) from emp limit 1;
OK
2017-09-25
Time taken: 0.083 seconds, Fetched: 1 row(s)
hive> select from_unixtime(unix_timestamp()) from emp limit 1;
OK
2017-09-25 00:11:09
Time taken: 0.103 seconds, Fetched: 1 row(s)

取的兩個時間的天數差異

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
hive> desc function extended datediff;
OK
datediff(date1, date2) - Returns the number of days between date1 and date2
date1 and date2 are strings in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. The time parts are ignored.If date1 is earlier than date2, the result is negative.
Example:
> select datediff('2009-07-30', '2009-07-31') FROM src LIMIT 1;
1
Time taken: 0.011 seconds, Fetched: 5 row(s)
-- 使用datediff函數,輸入格式為yyyy-MM-dd
hive> select datediff('2010-07-31', '2009-07-31') from emp limit 1;
OK
365
Time taken: 0.045 seconds, Fetched: 1 row(s)
-- 使用datediff函數,輸入格式為yyyy-MM-dd HH:mm:ss
hive> select datediff('2010-07-31 10:20:00', '2009-07-31 11:21:00') FROM emp LIMIT 1;
OK
365
Time taken: 0.047 seconds, Fetched: 1 row(s)

Type Conversion Functions

使用Type Conversion Functions

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
hive> desc emp;
OK
empno int
ename string
job string
mgr int
hiredate string
sal double
comm double
deptno int
Time taken: 0.164 seconds, Fetched: 8 row(s)
-- 將double類型的sal紀錄,轉換成int類型
hive> select sal, cast(sal as int) from emp;
OK
800.0 800
1600.0 1600
1250.0 1250
2975.0 2975
1250.0 1250
2850.0 2850
2450.0 2450
3000.0 3000
5000.0 5000
1500.0 1500
1100.0 1100
950.0 950
3000.0 3000
1300.0 1300
10300.0 10300
Time taken: 0.082 seconds, Fetched: 15 row(s)

Type Conversion Failure

  • 當轉換失敗時,Type Conversion Function將會回傳NUll
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    hive> select ename, cast(ename as int) from emp;
    OK
    SMITH NULL
    ALLEN NULL
    WARD NULL
    JONES NULL
    MARTIN NULL
    BLAKE NULL
    CLARK NULL
    SCOTT NULL
    KING NULL
    TURNER NULL
    ADAMS NULL
    JAMES NULL
    FORD NULL
    MILLER NULL
    HIVE NULL
    Time taken: 0.088 seconds, Fetched: 15 row(s)

Binary Type Conversion

  • 若Binary要轉成Int類型,須先轉換成String類型,才能換成Int類型。

常用built-in functions補充

isnull and isnotnull

  • isnull: 若紀錄為null,則回傳true。
  • isnotnull: 與isnull相反。若紀錄不為null,則回傳true。
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    hive> select comm, isnull(comm), isnotnull(comm) from emp;
    OK
    NULL true false
    300.0 false true
    500.0 false true
    NULL true false
    1400.0 false true
    NULL true false
    NULL true false
    NULL true false
    NULL true false
    0.0 false true
    NULL true false
    NULL true false
    NULL true false
    NULL true false
    NULL true false
    Time taken: 0.079 seconds, Fetched: 15 row(s)

assert_true

  • 判斷指定欄位內的紀錄中,若有任何一個紀錄不符合,則拋出異常。
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    hive> select assert_true(sal > 700) from emp;
    OK
    NULL
    NULL
    NULL
    NULL
    NULL
    NULL
    NULL
    NULL
    NULL
    NULL
    NULL
    NULL
    NULL
    NULL
    NULL
    Time taken: 0.096 seconds, Fetched: 15 row(s)
    hive> select assert_true(sal > 800) from emp;
    OK
    Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: ASSERT_TRUE(): assertion failed.
    Time taken: 0.07 seconds