Hive Data Model

Published: 2020-07-09 21:21:02  Author: 菜鸟的征程
Source: Internet

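Hive has five kinds of tables: internal (managed) tables, external tables, partitioned tables, bucketed tables, and views. If no row format is specified, Hive's text format separates fields with the Ctrl-A character (\001), not with a tab or a comma.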

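* Data exported from MySQL (Oracle) tables is usually comma-separated, so to import MySQL (Oracle) data you need to set the delimiter by appending the following to the create table statement:

row format delimited fields terminated by ',';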

Internal table: comparable to a table in MySQL. Hive stores the data in its own warehouse directory, by default /user/hive/warehouse

Example:

create table emp
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int
);
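To see where an internal table keeps its data, the table's metadata can be inspected. A minimal sketch, assuming the default database and the default warehouse setting (hive.metastore.warehouse.dir):

```sql
-- Table Type should read MANAGED_TABLE, and Location should point
-- into the warehouse directory (by default /user/hive/warehouse/emp).
describe formatted emp;

-- Dropping an internal table deletes both the metadata and the data
-- files, so the statement is left commented out here.
-- drop table emp;
```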

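Loading data into a table: the data can come from the local file system or from HDFS, using either a load statement or an insert statement.

A load from HDFS works like cut and paste (Ctrl+X): the source file is moved into the table's directory. With the local keyword, the file is copied from the local file system instead.

load data inpath '/scott/emp.csv' into table emp; -- load from HDFS

load data local inpath '/root/temp/***' into table emp; -- load from a local file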
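The insert statement is mentioned here without an example. A minimal sketch, assuming Hive 0.14 or later (where insert ... values is available); the row is made-up sample data for illustration, not taken from emp.csv:

```sql
-- Hypothetical sample row, for illustration only.
insert into table emp
values (7369, 'SMITH', 'CLERK', 7902, '1980-12-17', 800, 0, 20);

-- Check that the row arrived.
select empno, ename, job from emp where empno = 7369;
```

Each such statement typically runs as its own job and writes a new small file, so it suits spot tests better than bulk loading.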

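Create a table and specify the delimiter, so that comma-separated files such as emp.csv are split into columns correctly:

create table emp1
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int
)row format delimited fields terminated by ',';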

Create a department table to hold the department data:

create table dept
(deptno int,
dname string,
loc string
)row format delimited fields terminated by ',';

load data inpath '/scott/dept.csv' into table dept;
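Because a load from HDFS moves the file rather than copying it, the effect can be checked from the Hive CLI with its built-in dfs command. A sketch, assuming the default database and the default warehouse directory /user/hive/warehouse (the real path depends on hive.metastore.warehouse.dir):

```sql
-- After the load, dept.csv should no longer be listed under /scott.
dfs -ls /scott;

-- The file should now sit in the table's own directory.
dfs -ls /user/hive/warehouse/dept;

-- Confirm that each line parses into three columns.
select deptno, dname, loc from dept limit 5;
```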

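External table: unlike an internal table, the data does not live in Hive's own warehouse directory. Hive stores only the metadata and points to the location where the data already resides.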

Example:

(*) Test data

[root@bigdata11 ~]# hdfs dfs -cat /students/student01.txt
1,Tom,23
2,Mary,24

[root@bigdata11 ~]# hdfs dfs -cat /students/student02.txt
3,Mike,26

(*) Definition: (1) the table structure, (2) the path it points to

create external table students_ext
(sid int,sname string,age int)
row format delimited fields terminated by ','
location '/students';
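What sets an external table apart shows up when it is queried and when it is dropped. A sketch, assuming the two files shown in the test data are the only contents of /students:

```sql
-- Reads every file under /students; with the test data this should
-- return the three rows for Tom, Mary, and Mike (in no fixed order).
select sid, sname, age from students_ext;

-- Dropping an external table removes only the metadata.
drop table students_ext;

-- The data files should still be listed afterwards.
dfs -ls /students;
```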

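Partitioned table: stores data separately according to a chosen condition, which speeds up queries that filter on that condition. Each partition corresponds to a directory.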

Example:

(*) Create partitions based on the employee's department number

create table emp_part
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int
)partitioned by (deptno int)
row format delimited fields terminated by ',';

Insert data into the partitioned table, naming the target partition explicitly:

insert into table emp_part partition(deptno=10) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=10;
insert into table emp_part partition(deptno=20) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=20;
insert into table emp_part partition(deptno=30) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=30;
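These statements use static partitioning, one insert per department. As a sketch of an alternative, Hive's dynamic partitioning can derive the partition from the last column of the select list. This assumes emp1 has already been populated and that nonstrict mode is acceptable in your environment:

```sql
-- Allow Hive to create partitions from the data itself.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column (deptno) must come last in the select list.
insert into table emp_part partition(deptno)
select empno, ename, job, mgr, hiredate, sal, comm, deptno from emp1;

-- List the partitions; each one is a directory such as deptno=10.
show partitions emp_part;

-- A filter on the partition column lets Hive read only that directory.
select empno, ename, sal from emp_part where deptno = 10;
```

Run the dynamic insert instead of the static inserts, not in addition to them; otherwise the same rows are inserted twice.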

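Bucketed table: in essence also a way of partitioning data, similar to hash partitioning. Rows are assigned to buckets by hashing a column, and each bucket corresponds to a file.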

Example:

Create a bucketed table that buckets employees by their position (job):

create table emp_bucket
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int
)clustered by (job) into 4 buckets
row format delimited fields terminated by ',';

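Before populating a bucketed table, a switch has to be turned on. (On Hive 2.0 and later this setting is no longer needed, since bucketing is always enforced.)

set hive.enforce.bucketing=true;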

Insert the data with a subquery, so that Hive distributes the rows across the buckets:

insert into table emp_bucket select * from emp1;
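To check that the buckets map to files, and to read a single bucket, something like the following can be tried. A sketch, assuming the default database and the default warehouse directory /user/hive/warehouse; the alias s is just an illustrative name:

```sql
-- With 4 buckets, the table directory typically holds one file per bucket.
dfs -ls /user/hive/warehouse/emp_bucket;

-- Sample only the first of the 4 buckets, hashed on job.
select empno, ename, job
from emp_bucket tablesample(bucket 1 out of 4 on job) s;
```

Since all rows with the same job hash to the same bucket, each job value should appear in exactly one of the four samples.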

View: a view is a virtual table that stores no data; it is used to simplify complex queries.

Example:

Query the department name and the employee name:

create view myview
as
select dept.dname,emp1.ename
from emp1,dept
where emp1.deptno=dept.deptno;

select * from myview;
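That a view stores no data of its own can be checked from its metadata. The sketch below also redefines the view with an explicit join, which is equivalent to the comma-style join; it assumes Hive 0.11 or later for alter view ... as, and the aliases e and d are illustrative:

```sql
-- Table Type should read VIRTUAL_VIEW; the stored query text is shown too.
describe formatted myview;

-- Redefine the view with an explicit join on the department number.
alter view myview as
select d.dname, e.ename
from emp1 e
join dept d on e.deptno = d.deptno;

-- Dropping a view leaves emp1 and dept untouched.
drop view myview;
```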
