博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
hive学习(三) hive的分区
阅读量:4352 次
发布时间:2019-06-07

本文共 4058 字,大约阅读时间需要 13 分钟。

1.Hive 分区partition

必须在表定义时指定对应的partition字段
a、单分区建表语句:
create table day_table (id int, content string) partitioned by (dt string);
单分区表,按天分区,在表结构中存在id,content,dt三列。
以dt为文件夹区分
b、 双分区建表语句:
create table day_hour_table (id int, content string) partitioned by (dt string, hour string);
双分区表,按天和小时分区,在表结构中新增加了dt和hour两列。
先以dt为文件夹,再以hour子文件夹区分
 

2.创建2个表psn2 psn3

create table psn2 (id int,name string,hobby array
,address map
)partitioned by (age int)ROW FORMAT DELIMITEDFIELDS TERMINATED BY ','COLLECTION ITEMS TERMINATED BY '-'MAP KEYS TERMINATED BY ':'LINES TERMINATED BY '\n';

 

3.插入数据

load data local inpath '/root/data' into table psn2 partition (age=10);

4.查询psn2

hive> select * from psn2;OK1    小明1    ["lol","book","movie"]    {"beijing":"changping","shanghai":"pudong"}    102    小明2    ["lol","book","movie"]    {"beijing":"changping","shanghai":"pudong"}    103    小明3    ["lol","book","movie"]    {"beijing":"changping","shanghai":"pudong"}    104    小明4    ["lol","book","movie"]    {"beijing":"changping","shanghai":"pudong"}    105    小明5    ["lol","movie"]    {"beijing":"changping","shanghai":"pudong"}    106    小明6    ["lol","book","movie"]    {"beijing":"changping","shanghai":"pudong"}    107    小明7    ["lol","book"]    {"beijing":"changping","shanghai":"pudong"}    108    小明8    ["lol","book"]    {"beijing":"changping","shanghai":"pudong"}    109    小明9    ["lol","book","movie"]    {"beijing":"changping","shanghai":"pudong"}    10Time taken: 7.93 seconds, Fetched: 9 row(s)
理论上分区可以无限分,但是实际需要根据需求来分区。
如:历史数据按天分区
 

5.错误实例psn3

create table psn3 (id int,name string,age int,hobby array
,address map
)partitioned by (age int)ROW FORMAT DELIMITEDFIELDS TERMINATED BY ','COLLECTION ITEMS TERMINATED BY '-'MAP KEYS TERMINATED BY ':'LINES TERMINATED BY '\n';
报错提示:FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
原因:分区字段不能再表的列中
 

6.同时创建两个分区

create table psn3 (id int,name string,hobby array
,address map
)partitioned by (age int,sex string)ROW FORMAT DELIMITEDFIELDS TERMINATED BY ','COLLECTION ITEMS TERMINATED BY '-'MAP KEYS TERMINATED BY ':'LINES TERMINATED BY '\n';

 

 
注意:添加两个字段相应的插入数据时要指定两个字段,一个字段会报错。
hive> create table psn3 (    > id int,    > name string,    > hobby array
, > address map
> ) > partitioned by (age int,sex string) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > COLLECTION ITEMS TERMINATED BY '-' > MAP KEYS TERMINATED BY ':' > LINES TERMINATED BY '\n';OKTime taken: 1.167 seconds

7.向双分区加载数据

hive> load data local inpath '/root/data' into table psn3 partition (age=10);FAILED: SemanticException [Error 10006]: Line 1:63 Partition not found '10'hive> load data local inpath '/root/data' into table psn3 partition (age=10,sex='boy');Loading data to table default.psn3 partition (age=10, sex=boy)OKTime taken: 3.115 secondshive>

8.删除分区

alter table psn2 drop partition (sex='boy');hive> alter table psn3 drop partition (sex='boy');Dropped the partition age=10/sex=boyOKTime taken: 0.195 seconds

9.结论:

添加分区的时候,必须在现有分区的基础之上
删除分区的时候,会将所有存在的分区都删除
 

10.添加分区

添加时必须指定age=10 还是 age=20的分区删除,不然会报错
hive> alter table psn3 add partition(sex='man');FAILED: ValidationFailureSemanticException default.psn3: partition spec {sex=man} doesn't contain all (2) partition columnshive> alter table psn3 add partition(age=10,sex='man');OKTime taken: 0.418 seconds
删除前age=10 和age=20下分别有boy和man两个目录

11.执行删除

hive> alter table psn3 drop partition(sex='man');Dropped the partition age=10/sex=manOKTime taken: 0.389 seconds
load 加载数据的过程其实是在上传文件 partition 是对应hdfs的目录
 

12.通过一个表的查询结果的数据插入到另一个表中

create table psn4 (id int,name string,hobby array
)ROW FORMAT DELIMITEDFIELDS TERMINATED BY ','COLLECTION ITEMS TERMINATED BY '-'LINES TERMINATED BY '\n';
这种插入数据会转换成mapreduce任务
from psn3 insert overwrite table psn4 select id,name,hobby;

13.这种操作的作用:

1.复制表
2.可以作为中间表存在
 
 

转载于:https://www.cnblogs.com/benjamin77/p/10248022.html

你可能感兴趣的文章
laravel连接sql server 2008
查看>>
Ubuntu菜鸟入门(五)—— 一些编程相关工具
查看>>
valgrind检测linux程序内存泄露
查看>>
“==”运算符与equals()
查看>>
单工、半双工和全双工的定义
查看>>
Hdu【线段树】基础题.cpp
查看>>
时钟系统
查看>>
BiTree
查看>>
5个基于HTML5的加载动画推荐
查看>>
水平权限漏洞的修复方案
查看>>
静态链接与动态链接的区别
查看>>
如何使用mysql
查看>>
敏捷开发中软件测试团队的职责和产出是什么?
查看>>
在mvc3中使用ffmpeg对上传视频进行截图和转换格式
查看>>
python的字符串内建函数
查看>>
Spring - DI
查看>>
微软自己的官网介绍 SSL 参数相关
查看>>
Composite UI Application Block (CAB) 概念和术语
查看>>
ajax跨域,携带cookie
查看>>
阶段3 2.Spring_02.程序间耦合_7 分析工厂模式中的问题并改造
查看>>