hive行列转换总结
1、单列转换成多行
比如:
pageid paged
page1 a,b,c
要转换成
page1 a
page1 b
page1 c
select pageid,p from test lateral view explode(split(paged,',')) adtable as p;
通过split拆成多个元素的集合,再通过split打散成多行,lateral view作用是解决explode这种UDTF函数拆成的数据不能与其他字段共同服务问题
2、多行转换成单列
即上面的结果又反推回去,想到oracle的sys_connect_path,便写个UDF实现类似功能
import java.util.ArrayList;import org.apache.hadoop.hive.ql.exec.UDF;public class RowToCol extends UDF {private static int MAX_VALUE = 50;private static String prerows[] = new String[MAX_VALUE];//全局变量,保留上一条记录的值private static StringBuilder val=null;public String evaluate (Object ...args){ArrayList<String> thisrowlist=new ArrayList<String>();for(int i=0;i<args.length;i++)thisrowlist.add(args[i].toString()) ;if (val==null){for(int i=0;i<thisrowlist.size();i++)prerows[i] = thisrowlist.get(i);val=new StringBuilder(thisrowlist.get(thisrowlist.size()-1));//最后一个参数值作为返回值return val.toString();}for(int i=0;i<thisrowlist.size()-1;i++){if ( !prerows[i].equals(thisrowlist.get(i)) ) //跳到b的时候prerows[0]="a" thisrowlist[0]="b" 满足这种情况重新置1{for (int j=0;j<thisrowlist.size();j++){prerows[j] = thisrowlist.get(j);}val=new StringBuilder(thisrowlist.get(thisrowlist.size()-1));return val.toString();}}return val.append(",").append(thisrowlist.get(thisrowlist.size()-1)).toString();}public static void main(String[] args){RowToCol rt=new RowToCol();System.out.println(rt.evaluate("page1","a"));System.out.println(rt.evaluate("page2","d"));System.out.println(rt.evaluate("page2","e"));System.out.println(rt.evaluate("page2","x"));System.out.println(rt.evaluate("page2","x"));System.out.println(rt.evaluate("page2","y"));}}
最后生成结果
select pageid,max(row_to_col(pageid,paged)) from (select pageid,paged from test2 distribute by pageid sort by pageid,paged) a group by pageid;
3、多行转多列、多列转多行
与数据库中SQL写法一样,参考博文:http://blog.csdn.net/u011750989/article/details/11797463