
Viewing the Contents of LZO Files in Hadoop

2012-11-26 

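  Recently I have often needed to look at the contents of LZO files. These files are usually large and live on HDFS. I had no good method for this: on the rare occasions I needed to look inside one, I would get it to the local disk, decompress it with lzop, and then page through it with more. For occasional use that may be acceptable, even when the file is fairly large. But now I need to grep these files frequently, and it is no longer much fun.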

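  So I wrote a shell script, lzoc (short for "lzo cat"), dedicated to viewing the contents of LZO files on HDFS. Normally it prints nothing beyond the file contents, so it can be combined with tools such as more, head, and tail.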

  The code is listed below, after the options and a usage example.

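  It has three options:

        -c  delete files of the same name that already exist in the current directory; this is meant for removing stale copies

        -d  delete the intermediate files in the current directory in the final stage, since the file is fetched from HDFS with get

        -i  print some interactive messages; do not use this option if the cat output is going to be used for something else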
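
The three flags can be read with the POSIX getopts builtin. The following is a minimal sketch, not the script itself: the function name parse_lzoc_flags and the variable optionCount are hypothetical, introduced only to show how each flag maps to a Y/N variable and how many leading arguments were options.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the original script):
# parse -c, -d and -i and record each one as Y or N.
# Returns 2 when an unknown option is seen.
parse_lzoc_flags() {
    deleteBeforeExecute=N   # -c: remove stale local copies first
    deleteAfterExecute=N    # -d: remove local copies when finished
    interactiveMsg=N        # -i: print progress messages
    OPTIND=1                # reset so the function can be called repeatedly
    while getopts cdi OPTION; do
        case $OPTION in
            c) deleteBeforeExecute=Y ;;
            d) deleteAfterExecute=Y ;;
            i) interactiveMsg=Y ;;
            \?) return 2 ;;
        esac
    done
    # Number of leading arguments that were consumed as options.
    optionCount=$((OPTIND - 1))
}
```

A caller would typically follow this with `shift "$optionCount"` so that `$1` is the HDFS path.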

Usage example:

        $ ./lzoc /user/hadoop/output/filename.lzo | more
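
For a path like the one in the usage example, the script derives two local file names with shell parameter expansion: the name of the fetched .lzo file and the name of the decompressed file. A small sketch of that step in isolation, where the wrapper function derive_local_names is hypothetical and only the two expansions come from the script:

```shell
#!/bin/sh
# Hypothetical wrapper (an assumption) around the two expansions
# the script uses to name its local files.
derive_local_names() {
    filePath=$1
    lzoFileName=${filePath##*/}    # drop everything up to the last "/"
    fileName=${lzoFileName%.lzo}   # drop a trailing ".lzo"
}

derive_local_names /user/hadoop/output/filename.lzo
echo "$lzoFileName"   # filename.lzo
echo "$fileName"      # filename
```

Both files end up in the current working directory, which is why the -c and -d options operate there.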


#!/bin/sh
# description:
#   cat an LZO file stored on Hadoop (HDFS)

filePath=""             # full HDFS path of the lzo file
lzoFileName=""          # local file name, with .lzo extension, after hadoop fs -get
fileName=""             # local file name without the extension
deleteBeforeExecute=N   # set by -c: old local files should be deleted first
deleteAfterExecute=N    # set by -d: local files should be deleted in the final stage
interactiveMsg=N        # set by -i: print messages besides the text of the file

# parse options
while getopts cdi OPTION
do
    case $OPTION in
        c) deleteBeforeExecute=Y ;;
        d) deleteAfterExecute=Y ;;
        i) interactiveMsg=Y ;;
        \?) echo "illegal option" >&2
            exit 2 ;;
    esac
done
shift $((OPTIND - 1))   # drop the options; $1 is now the file path

# exit codes are positive: negative values are not portable in sh
if [ $# -lt 1 ]; then
    echo "must have at least one parameter, which is the file name." >&2
    exit 1
fi
filePath=$1
lzoFileName=${filePath##*/}
fileName=${lzoFileName%.lzo}

# delete old files if needed (messages go to stderr to keep stdout clean)
if [ "$deleteBeforeExecute" = "Y" ]; then
    if [ -e "$fileName" ]; then
        echo "delete old file" >&2
        rm "$fileName"
    fi
    if [ -e "$lzoFileName" ]; then
        echo "delete old lzo file" >&2
        rm "$lzoFileName"
    fi
fi

# make sure the hadoop command is available
if ! command -v hadoop > /dev/null 2>&1; then
    echo "Command does not exist; hadoop may not be installed or on the PATH." >&2
    exit 3
fi

# make sure the file exists
if ! hadoop fs -test -e "$filePath" > /dev/null 2>&1; then
    echo "No such file or directory: $filePath" >&2
    exit 4
fi

# can not cat a directory
if hadoop fs -test -d "$filePath" > /dev/null 2>&1; then
    echo "Can not cat a directory: $filePath" >&2
    exit 4
fi

# make sure lzop is installed
if ! command -v lzop > /dev/null 2>&1; then
    echo "Tool missing: lzop is not installed." >&2
    exit 5
fi

# fetch the lzo file unless a local copy already exists
if [ -e "$lzoFileName" ]; then
    if [ "$interactiveMsg" = "Y" ]; then
        echo "LZO file already exists."
    fi
else
    if [ "$interactiveMsg" = "Y" ]; then
        echo "LZO file does not exist."
    fi
    # get the file from hadoop
    hadoop fs -get "$filePath" .
fi

# decompress unless the decompressed file already exists
if [ -e "$fileName" ]; then
    if [ "$interactiveMsg" = "Y" ]; then
        echo "File already exists."
    fi
else
    if [ "$interactiveMsg" = "Y" ]; then
        echo "File does not exist."
    fi
    # decompress the lzo file
    lzop -dv "$lzoFileName" > /dev/null 2>&1
fi

# cat the file
cat "$fileName"

# delete files in the final stage if needed
if [ "$deleteAfterExecute" = "Y" ]; then
    if [ -e "$fileName" ]; then
        rm "$fileName"
    fi
    if [ -e "$lzoFileName" ]; then
        rm "$lzoFileName"
    fi
    if [ "$interactiveMsg" = "Y" ]; then
        echo "files have been deleted"
    fi
fi
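
When the local copy is not needed afterwards, the get-then-decompress steps can be skipped by streaming instead. This is a minimal sketch of that alternative, not the approach the script takes. It assumes `hadoop fs -cat` and `lzop -dc` (decompress to standard output) are both available on the machine, and the function name lzo_stream is hypothetical.

```shell
#!/bin/sh
# Hypothetical alternative (an assumption, not the original script):
# stream an HDFS .lzo file through lzop without writing local files.
lzo_stream() {
    # hadoop fs -cat writes the raw compressed bytes to stdout;
    # lzop -dc reads them from stdin and writes the decompressed text.
    hadoop fs -cat "$1" | lzop -dc
}
```

It would be used the same way as the script, for example `lzo_stream /user/hadoop/output/filename.lzo | grep pattern`. The trade-off is that every invocation reads the file from HDFS again, whereas the script can reuse a local copy from an earlier run.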

