Big Data Series 5: Pig, a Platform for Big Data Analysis
wget http://mirror.bit.edu.cn/apache/pig/pig-0.11.1/pig-0.11.1.tar.gz
tar -xzvf pig-0.11.1.tar.gz
sudo vi /etc/profile
Add:
export PIG_HOME=/home/ysc/pig-0.11.1
export PATH=$PATH:$PIG_HOME/bin
source /etc/profile
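After reloading the profile, it can help to confirm that both variables took effect before going further. The following is a minimal sketch and not part of the original setup: the function name check_pig_env is hypothetical, and it only inspects PIG_HOME and PATH without starting Pig.

```shell
# Hypothetical helper: verify that PIG_HOME is set, points at a directory,
# and that $PIG_HOME/bin appears as an entry in PATH.
check_pig_env() {
    if [ -z "${PIG_HOME:-}" ]; then
        echo "PIG_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$PIG_HOME" ]; then
        echo "PIG_HOME is not a directory: $PIG_HOME" >&2
        return 1
    fi
    # Wrap PATH in colons so the first and last entries match as well.
    case ":$PATH:" in
        *":$PIG_HOME/bin:"*) ;;
        *) echo "$PIG_HOME/bin is not on PATH" >&2; return 1 ;;
    esac
    echo "PIG_HOME=$PIG_HOME"
}
```

A possible use is: check_pig_env && pig --help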
cp $PIG_HOME/conf/log4j.properties.template $PIG_HOME/conf/log4j.properties
pig --help
Local Mode:
1. pig -x local
2. java -cp /home/ysc/pig-0.11.1/pig-0.11.1.jar org.apache.pig.Main -x local
MapReduce Mode (default):
1. pig
2. pig -x mapreduce
3. java -cp /home/ysc/pig-0.11.1/pig-0.11.1.jar:/home/ysc/hadoop-1.2.1/conf org.apache.pig.Main
4. java -cp /home/ysc/pig-0.11.1/pig-0.11.1.jar:/home/ysc/hadoop-1.2.1/conf org.apache.pig.Main -x mapreduce
Prepare the data:
hadoop fs -put /etc/passwd passwd
Interactive Mode:
Enter the Pig shell, Grunt (local or MapReduce mode):
pig (or: pig -x local)
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
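The three statements split each line on ':', keep the first field, and print it as a one-field tuple. To preview that result without a running Pig session, the same projection can be approximated in plain shell. This is only an illustrative sketch: the function name first_field_as_tuple is hypothetical, and the sample lines are made up rather than taken from a real /etc/passwd.

```shell
# Hypothetical helper: emulate "foreach A generate $0" on colon-delimited
# input and print each value the way dump prints a one-field tuple.
first_field_as_tuple() {
    awk -F: '{ print "(" $1 ")" }'
}

# Example on inline sample data (not the real /etc/passwd):
printf 'root:x:0:0:root:/root:/bin/bash\nysc:x:1000:1000::/home/ysc:/bin/bash\n' |
    first_field_as_tuple
```

On the sample input this prints (root) and (ysc), one per line.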
Batch Mode:
Write the script:
vi id.pig
Enter:
/* id.pig */
-- load the passwd file
A = load 'passwd' using PigStorage(':');
-- extract the user IDs
B = foreach A generate $0 as id;
-- write the results to a directory named id.out
store B into 'id.out';
Run the script (local or MapReduce mode):
pig id.pig (or: pig -x local id.pig)
View the results:
hadoop fs -cat id.out/part-m-00000
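The name part-m-00000 assumes a map-only job that wrote a single output file; a job with a reduce phase writes part-r-* files instead. On HDFS, hadoop fs -cat id.out/part-* covers both cases, since the command accepts glob patterns. For local mode, where id.out is an ordinary directory, a small helper along the following lines could do the same. It is a sketch, and the name cat_parts is hypothetical.

```shell
# Hypothetical helper for local mode: print every part file in a result
# directory in name order, skipping marker files such as _SUCCESS if present.
cat_parts() {
    dir=$1
    for f in "$dir"/part-*; do
        # The pattern stays literal when nothing matches, so test each entry.
        [ -f "$f" ] && cat "$f"
    done
    return 0
}
```

A possible use is: cat_parts id.out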
Managing data in Pig with HCatalog:
Start the Metastore:
hcat_server.sh start & (or: hive --service metastore &)
sudo vi /etc/profile
Add:
export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-*.jar:\
$HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/libthrift-*.jar:\
$HIVE_HOME/lib/hive-exec-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
$HIVE_HOME/lib/jdo2-api-*-ec.jar:$HIVE_HOME/lib/slf4j-api-*.jar
export PIG_OPTS=-Dhive.metastore.uris=thrift://host001:9083
source /etc/profile
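The wildcard patterns in PIG_CLASSPATH are effectively stored as literal text, because the shell does not expand them inside a variable assignment. If the jars are not picked up in a particular installation, one option is to resolve the patterns to concrete file names first. The sketch below shows one way to do that; the function name build_classpath is hypothetical, and it assumes the jar paths contain no spaces.

```shell
# Hypothetical helper: expand each glob pattern given as an argument and
# join the files that exist with ':'. Patterns that match nothing are dropped.
# Assumes paths without whitespace, since the pattern is expanded unquoted.
build_classpath() {
    result=""
    for pattern in "$@"; do
        for jar in $pattern; do          # unquoted on purpose: glob expansion
            [ -e "$jar" ] || continue
            result="${result:+$result:}$jar"
        done
    done
    printf '%s\n' "$result"
}

# Possible use (the variables are the ones set in /etc/profile):
# export PIG_CLASSPATH=$(build_classpath "$HCAT_HOME/share/hcatalog/hcatalog-*.jar" "$HIVE_HOME/lib/hive-metastore-*.jar")
```

Printing the result before exporting it makes it easy to see which jars were actually found.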
Create the table:
hcat -e "CREATE TABLE students (name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;"
Prepare the data:
vi students.txt
Enter (name and age separated by a tab):
刘德华	51
张学友	52
刘亦菲	41
杨尚川	27
成龙	55
洪金宝	52
林志玲	40
hadoop fs -put students.txt /user/ysc/students.txt
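The table is declared with FIELDS TERMINATED BY '\t', and a LOAD without a USING clause falls back to PigStorage, which splits on tabs by default. A line whose tab was lost (for example when pasting from a web page) would therefore not parse into name and age. The sketch below checks the file before uploading it; the function name check_students_file is hypothetical.

```shell
# Hypothetical helper: succeed only if every line of the file has exactly
# two tab-separated fields and the second field is a non-negative integer.
check_students_file() {
    awk -F'\t' '
        NF != 2 || $2 !~ /^[0-9]+$/ { printf "line %d is malformed\n", NR; bad = 1 }
        END { exit bad }
    ' "$1"
}
```

A possible use is: check_students_file students.txt && hadoop fs -put students.txt /user/ysc/students.txt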
Start Pig:
pig -Dpig.additional.jars=$PIG_CLASSPATH
Store the data:
students = LOAD '/user/ysc/students.txt' AS (name:chararray, age:int);
dump students;
STORE students INTO 'students' USING org.apache.hcatalog.pig.HCatStorer();
Load the data:
A = LOAD 'students' USING org.apache.hcatalog.pig.HCatLoader();
dump A;