Hive内置函数的使用(1)
工作中用到了一些,总结一下:
1、 行拆分、合并
1. 分拆explode(ARRAY)
返回值:多行
SELECT explode(myCol) AS myNewCol FROM myTable;
注:1.使用UDTF函数时,select中不可以包含其他表达式;
2.UDTF函数不能嵌套使用;
3.UDTF不支持GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY;
2. 合并去重collect_set(col)
返回值:数组(去重的效果)
Xjt2原始数据:
1 one
2 two
2 two
2 two
3 three
4 four
4 four
1 one one
1 one two
2 twotwo
2 twoone
select id,collect_set(name) from xjt2 group by id;
1 ["one two","oneone","one"]
2 ["twoone","twotwo","two"]
3 ["three"]
4 ["four"]
select collect_set(name) from xjt2;
["three","one","four","two"]
注:当collect_set(col)与其他字段同时在select语句中时,必须使用group by other_fields;
2、 时间函数
1. 获取当前Unix时间戳unix_timestamp()
返回值类型:BIGINT
select unix_timestamp() from xjt1;
1383012276
2. 将日期转时间戳unix_timestamp(string date)
返回值类型:BIGINT,若转换失败,则返回0
select unix_timestamp('2013-01-13 00:00:00') from xjt1;
1358006400
3. 转化指定格式(pattern)日期转时间戳unix_timestamp(string date, string pattern)
返回值类型:BIGINT,若转化失败,则返回0
select unix_timestamp('2013-01-13 00:00:00','yyyyMMdd')from xjt1;
1354291200
4. 将Unix时间戳转日期from_unixtime(BIGINT,’format’)
select from_unix(unix_timestamp(),'yyyyMMdd') from xjt1;
20131029
5. 取日期to_date()、取年year()、取月month()、取天数day()
返回值类型:SRING
select to_date('1990-10-10 00:00:00') from xjt1;
1990-10-10
6. 日期增加函数date_add(string startdate, int days)
返回值类型:STRING
select date_add('2013-10-29',10) from xjt1;
2013-11-08
7. 日期减少函数date_sub(string startdate, intdays)
返回值类型:STRING
select date_sub('2013-10-29',10) from xjt1;
2013-10-19
8. 日期比较函数datediff(string enddate, string startdate)
返回值类型:INT(结束日期减去开始日期,结束日期放在前面)
select datediff('2013-10-29','2013-12-10') from xjt1;
-42
3、 条件判断函数CASE
返回值:T/F
语法:CASE a WHEN b THENc [WHEN d THEN e]* [ELSE f] END
说明:如果 a 等于 b ,那么返回 c ;如果 a 等于 d ,那么返回 e ;否则返回 f
4、 字符串分割函数split(stringstr, string pat)
返回值类型:ARRAY
selectsplit('hello world hello hive',' ') from xjt1;
["hello","world","hello","hive"]