Hive DML

Hive Data Manipulation Language

Loading files into tables

Syntax

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

  • LOCAL: load from the local filesystem; without it, the path is resolved on HDFS.
  • 'filepath': either a local file path or an HDFS path.
  • OVERWRITE: replace the table's existing data; without it, the new data is appended.

Loading a file from the local filesystem

-- Create an empty table
hive>
create table emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)row format delimited fields terminated by '\t';
OK
Time taken: 0.838 seconds
hive> show tables;
OK
emp
Time taken: 0.199 seconds, Fetched: 2 row(s)
hive> select * from emp;
OK
-- Load the local file into emp, overwriting any existing data
hive> load data local inpath '/home/hadoop/data/emp.txt' overwrite into table emp;
Loading data to table default.emp
Table default.emp stats: [numFiles=1, numRows=0, totalSize=700, rawDataSize=0]
OK
Time taken: 2.909 seconds
hive> select * from emp;
OK
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.0 NULL NULL
Time taken: 0.105 seconds, Fetched: 15 row(s)
-- Load the same local file again, appending to the end of emp
hive> load data local inpath '/home/hadoop/data/emp.txt' into table emp;
Loading data to table default.emp
Table default.emp stats: [numFiles=2, numRows=0, totalSize=1400, rawDataSize=0]
OK
Time taken: 0.673 seconds
hive> select count(*) from emp;
Query ID = hadoop_20170914005050_bccf96a2-df73-425a-83b5-6e7fe5e6772f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2017-09-14 01:03:29,160 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1150048630_0002
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 9800 HDFS Write: 2800 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
30

When appending data, the following line appears in the output:

Table default.emp stats: [numFiles=2, numRows=0, totalSize=1400, rawDataSize=0]

Here numFiles indicates that the table's HDFS directory now contains 2 files, which can be verified as follows:

[hadoop@cdh001 ~]$ hadoop fs -ls /user/hive/warehouse/emp
Found 2 items
-rwxr-xr-x 1 hadoop supergroup 700 2017-09-14 01:20 /user/hive/warehouse/emp/emp.txt
-rwxr-xr-x 1 hadoop supergroup 700 2017-09-14 01:21 /user/hive/warehouse/emp/emp_copy_1.txt

Loading a file from HDFS

-- Truncate emp first
hive> truncate table emp;
OK
Time taken: 0.222 seconds
-- Load the data from HDFS
-- hdfs://cdh001:9000 corresponds to the fs.defaultFS setting in Hadoop's core-site.xml
hive> load data inpath 'hdfs://cdh001:9000/data/emp.txt' overwrite into table emp;
Loading data to table default.emp
Table default.emp stats: [numFiles=1, numRows=0, totalSize=700, rawDataSize=0]
OK
Time taken: 0.42 seconds
hive> select count(*) from emp;
Query ID = hadoop_20170914005050_bccf96a2-df73-425a-83b5-6e7fe5e6772f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2017-09-14 01:16:58,335 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local978141749_0004
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 11214 HDFS Write: 2800 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
15
Time taken: 1.518 seconds, Fetched: 1 row(s)

Inserting data into Hive Tables from queries

Syntax

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

hive>
create table emp2(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)row format delimited fields terminated by '\t';
OK
Time taken: 1.36 seconds
-- Load all rows from emp, overwriting emp2
hive> insert overwrite table emp2 select * from emp;
Query ID = hadoop_20170914014141_a03e8794-0c51-414f-87d7-50236e8e1b36
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-09-14 01:45:44,802 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local2086758437_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://cdh001:9000/user/hive/warehouse/emp2/.hive-staging_hive_2017-09-14_01-45-43_184_5687343240557641860-1/-ext-10000
Loading data to table default.emp2
Table default.emp2 stats: [numFiles=1, numRows=15, totalSize=708, rawDataSize=693]
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 3586 HDFS Write: 2963 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.999 seconds
hive> select * from emp2;
OK
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.0 NULL NULL
Time taken: 0.127 seconds, Fetched: 15 row(s)
-- Load all rows from emp again, appending to emp2
hive> insert into table emp2 select * from emp;
Query ID = hadoop_20170914014141_a03e8794-0c51-414f-87d7-50236e8e1b36
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-09-14 01:50:12,327 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local1847471355_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://cdh001:9000/user/hive/warehouse/emp2/.hive-staging_hive_2017-09-14_01-50-10_029_6901561759110802758-1/-ext-10000
Loading data to table default.emp2
Table default.emp2 stats: [numFiles=2, numRows=30, totalSize=1416, rawDataSize=1386]
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 6540 HDFS Write: 4517 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.686 seconds
hive> select * from emp2;
OK
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.0 NULL NULL
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.0 NULL NULL
Time taken: 0.086 seconds, Fetched: 30 row(s)

Writing data into the filesystem from queries

Syntax

INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
SELECT ... FROM ...

Writing data into a local file

hive> insert overwrite local directory '/home/hadoop/hive_data'
row format delimited fields terminated by '\t' select * from emp;
Query ID = hadoop_20170914014141_a03e8794-0c51-414f-87d7-50236e8e1b36
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-09-15 00:17:19,523 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local1952623901_0005
Copying data to local directory /home/hadoop/hive_data
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 8725 HDFS Write: 4517 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.298 seconds
## Check the exported data in the local directory and inspect the file contents
[hadoop@cdh001 hive_data]$ ls
000000_0
[hadoop@cdh001 hive_data]$ vim 000000_0
7369 SMITH CLERK 7902 1980-12-17 800.0 \N 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 \N 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 \N 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 \N 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 \N 20
7839 KING PRESIDENT \N 1981-11-17 5000.0 \N 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 \N 20
7900 JAMES CLERK 7698 1981-12-3 950.0 \N 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 \N 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 \N 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.0 \N \N
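In the exported file, Hive's default text serde writes NULL values as `\N`. A minimal Python sketch (illustrative only; the sample row is copied from the export above) of reading such a tab-delimited export and mapping `\N` back to a proper null:

```python
# Parse one line of a Hive text export: tab-delimited fields, NULL encoded as \N.
def parse_hive_row(line, null_marker="\\N"):
    """Split an exported line and map Hive's NULL marker to None."""
    return [None if f == null_marker else f for f in line.rstrip("\n").split("\t")]

# The last row of the export: comm and deptno are NULL (\N) in the file.
row = parse_hive_row("8888\tHIVE\tPROGRAM\t7839\t1988-1-23\t10300.0\t\\N\t\\N\n")
print(row)  # the two \N fields come back as None
```

A downstream consumer that forgets this convention will read the literal string "\N" instead of a null, which is a common source of bad joins and counts.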

Writing data into HDFS

hive> insert overwrite directory 'hdfs://cdh001:9000/hive_data' row format delimited fields terminated by '\t' select * from emp;
Query ID = hadoop_20170914014141_a03e8794-0c51-414f-87d7-50236e8e1b36
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-09-15 00:27:00,633 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local1161905320_0006
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://cdh001:9000/hive_data/.hive-staging_hive_2017-09-15_00-26-59_185_2874371608777769743-1/-ext-10000
Moving data to: hdfs://cdh001:9000/hive_data
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 9425 HDFS Write: 5225 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.524 seconds
## Check the exported data in the HDFS directory and inspect the file contents
[hadoop@cdh001 hive_data]$ hadoop fs -ls /hive_data
Found 1 items
-rwxr-xr-x 1 hadoop supergroup 708 2017-09-15 00:27 /hive_data/000000_0
[hadoop@cdh001 hive_data]$ hadoop fs -cat /hive_data/000000_0
7369 SMITH CLERK 7902 1980-12-17 800.0 \N 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 \N 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 \N 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 \N 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 \N 20
7839 KING PRESIDENT \N 1981-11-17 5000.0 \N 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 \N 20
7900 JAMES CLERK 7698 1981-12-3 950.0 \N 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 \N 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 \N 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.0 \N \N

Inserting values into tables from SQL

  • Rarely used in big-data scenarios.
  • Supported since Hive 0.14.

Update

  • Rarely used in big-data scenarios.
  • Supported since Hive 0.14.

Delete

  • Rarely used in big-data scenarios.
  • Supported since Hive 0.14.

Query

-- Query all rows in emp
hive> select * from emp;
OK
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.0 NULL NULL
Time taken: 0.076 seconds, Fetched: 15 row(s)
-- Query the rows in emp where ename is 'SMITH'
hive> select * from emp where ename='SMITH';
OK
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
Time taken: 0.221 seconds, Fetched: 1 row(s)
-- Query rows in emp with sal between 800 and 1500
-- BETWEEN ... AND is inclusive on both ends
hive> select ename, sal from emp where sal between 800 and 1500;
OK
SMITH 800.0
WARD 1250.0
MARTIN 1250.0
TURNER 1500.0
ADAMS 1100.0
JAMES 950.0
MILLER 1300.0
Time taken: 0.081 seconds, Fetched: 7 row(s)
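The inclusive bounds of BETWEEN can be checked in plain Python (illustration only; salaries copied from the emp sample data above). Note that the boundary values 800.0 and 1500.0 both match:

```python
# sal values from the emp sample data above
sals = {"SMITH": 800.0, "ALLEN": 1600.0, "WARD": 1250.0, "JONES": 2975.0,
        "MARTIN": 1250.0, "BLAKE": 2850.0, "CLARK": 2450.0, "SCOTT": 3000.0,
        "KING": 5000.0, "TURNER": 1500.0, "ADAMS": 1100.0, "JAMES": 950.0,
        "FORD": 3000.0, "MILLER": 1300.0, "HIVE": 10300.0}

# BETWEEN 800 AND 1500 is a closed interval: 800 <= sal <= 1500
matched = {name for name, sal in sals.items() if 800 <= sal <= 1500}
print(sorted(matched))  # the same 7 employees the Hive query returns
```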
-- Query rows in emp where deptno is 10 or 30
hive> select ename, deptno from emp where deptno in (10,30);
OK
ALLEN 30
WARD 30
MARTIN 30
BLAKE 30
CLARK 10
KING 10
TURNER 30
JAMES 30
MILLER 10
Time taken: 0.068 seconds, Fetched: 9 row(s)

Aggregate functions

hive> select count(1), max(sal), min(sal), avg(sal) from emp;
Query ID = hadoop_20170914014141_a03e8794-0c51-414f-87d7-50236e8e1b36
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2017-09-15 00:44:49,597 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1747076311_0014
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 42650 HDFS Write: 10450 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
15 10300.0 800.0 2621.6666666666665
Time taken: 1.344 seconds, Fetched: 1 row(s)
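The aggregate results above can be reproduced directly (Python for illustration only; the sal list is copied from the emp sample data):

```python
# sal column of the 15 emp rows shown earlier
sals = [800.0, 1600.0, 1250.0, 2975.0, 1250.0, 2850.0, 2450.0, 3000.0,
        5000.0, 1500.0, 1100.0, 950.0, 3000.0, 1300.0, 10300.0]

# count(1), max(sal), min(sal), avg(sal)
print(len(sals), max(sals), min(sals), sum(sals) / len(sals))
```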

Group by

-- Average salary per department
hive> select deptno, avg(sal) from emp group by deptno;
Query ID = hadoop_20170914014141_a03e8794-0c51-414f-87d7-50236e8e1b36
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2017-09-16 00:17:45,930 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1889358448_0027
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 62250 HDFS Write: 10450 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
NULL 10300.0
10 2916.6666666666665
20 2175.0
30 1566.6666666666667
Time taken: 1.328 seconds, Fetched: 4 row(s)
-- Highest salary per job within each department
hive> select job, deptno, max(sal) from emp group by job, deptno;
Query ID = hadoop_20170914014141_a03e8794-0c51-414f-87d7-50236e8e1b36
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2017-09-16 00:15:51,279 Stage-1 map = 0%, reduce = 0%
2017-09-16 00:15:52,281 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1502275066_0025
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 59450 HDFS Write: 10450 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
ANALYST 20 3000.0
CLERK 10 1300.0
CLERK 20 1100.0
CLERK 30 950.0
MANAGER 10 2450.0
MANAGER 20 2975.0
MANAGER 30 2850.0
PRESIDENT 10 5000.0
PROGRAM NULL 10300.0
SALESMAN 30 1600.0
Time taken: 3.667 seconds, Fetched: 10 row(s)
-- Average salary per department for employees hired after 1982-01-01
hive> select deptno, avg(sal) from emp where hiredate > to_date('1982-01-01') group by deptno;
Query ID = hadoop_20170914014141_a03e8794-0c51-414f-87d7-50236e8e1b36
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2017-09-16 00:48:00,159 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1055215937_0030
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 91650 HDFS Write: 10450 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
NULL 10300.0
10 1300.0
20 2050.0
Time taken: 1.267 seconds, Fetched: 3 row(s)
  • Every column in the select list must either appear in the GROUP BY clause or be wrapped in an aggregate function; otherwise the query fails:

    hive> select deptno, ename, max(sal) from emp group by deptno;
    FAILED: SemanticException [Error 10025]: Line 1:15 Expression not in GROUP BY key 'ename'
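The per-department averages from the first query can be reproduced in plain Python (illustration only; the (deptno, sal) pairs are copied from the emp sample data, with None standing in for the NULL deptno):

```python
from collections import defaultdict

# (deptno, sal) pairs from the emp sample data; None = NULL deptno
rows = [(20, 800.0), (30, 1600.0), (30, 1250.0), (20, 2975.0), (30, 1250.0),
        (30, 2850.0), (10, 2450.0), (20, 3000.0), (10, 5000.0), (30, 1500.0),
        (20, 1100.0), (30, 950.0), (20, 3000.0), (10, 1300.0), (None, 10300.0)]

# group by deptno, then average each group -- like Hive, NULL forms its own group
groups = defaultdict(list)
for deptno, sal in rows:
    groups[deptno].append(sal)

avgs = {d: sum(v) / len(v) for d, v in groups.items()}
print(avgs)
```

Note that, as in the Hive output, rows with a NULL grouping key are not dropped; they aggregate into a NULL group of their own.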

CASE WHEN expressions

hive> select ename, sal,
> case
> when sal > 1 and sal <= 1000 then 'LOWER'
> when sal > 1000 and sal <= 2000 then 'MIDDLE'
> when sal > 2000 and sal <= 4000 then 'HIGH'
> else 'HIGHEST' end
> from emp;
OK
SMITH 800.0 LOWER
ALLEN 1600.0 MIDDLE
WARD 1250.0 MIDDLE
JONES 2975.0 HIGH
MARTIN 1250.0 MIDDLE
BLAKE 2850.0 HIGH
CLARK 2450.0 HIGH
SCOTT 3000.0 HIGH
KING 5000.0 HIGHEST
TURNER 1500.0 MIDDLE
ADAMS 1100.0 MIDDLE
JAMES 950.0 LOWER
FORD 3000.0 HIGH
MILLER 1300.0 MIDDLE
HIVE 10300.0 HIGHEST
Time taken: 0.105 seconds, Fetched: 15 row(s)
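The same banding logic as the CASE expression, sketched in Python for illustration (band boundaries taken from the query above; `sal_band` is a hypothetical helper name):

```python
def sal_band(sal):
    """Mirror the CASE expression: bin a salary into a band label."""
    if 1 < sal <= 1000:
        return "LOWER"
    elif 1000 < sal <= 2000:
        return "MIDDLE"
    elif 2000 < sal <= 4000:
        return "HIGH"
    return "HIGHEST"  # the CASE's ELSE branch

print(sal_band(800.0), sal_band(1600.0), sal_band(5000.0))
```

As in SQL, the WHEN branches are evaluated in order and the first match wins, so the band intervals are half-open on the low side and closed on the high side.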

Import/Export

Export

Syntax

EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
TO 'export_target_path' [ FOR replication('eventid') ]

hive> export table emp to 'hdfs://cdh001:9000/hive_export_table';
Copying data from file:/tmp/hadoop/2d0fe1f1-954f-4cb7-bc14-1f05d5de3555/hive_2017-09-16_01-23-59_248_7891813049757812255-1/-local-10000/_metadata
Copying file: file:/tmp/hadoop/2d0fe1f1-954f-4cb7-bc14-1f05d5de3555/hive_2017-09-16_01-23-59_248_7891813049757812255-1/-local-10000/_metadata
Copying data from hdfs://cdh001:9000/user/hive/warehouse/emp
Copying file: hdfs://cdh001:9000/user/hive/warehouse/emp/emp.txt
OK
Time taken: 0.125 seconds
[hadoop@cdh001 hive_data]$ hadoop fs -ls /hive_export_table
Found 2 items
-rwxr-xr-x 1 hadoop supergroup 1570 2017-09-16 01:23 /hive_export_table/_metadata
drwxr-xr-x - hadoop supergroup 0 2017-09-16 01:23 /hive_export_table/data

Import

Syntax

IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
FROM 'source_path'
[LOCATION 'import_target_path']

hive> show tables;
OK
emp
emp2
emp3
emptest
hdfsimport
Time taken: 0.024 seconds, Fetched: 5 row(s)
hive> select * from hdfsimport;
OK
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.0 NULL NULL
Time taken: 0.062 seconds, Fetched: 15 row(s)