Getting Started with Hadoop: Pseudo-Distributed Deployment and Testing

Unpack the downloaded Hadoop release. Edit conf/hadoop-env.sh and, at a minimum, set JAVA_HOME to the root of your Java installation.

Hadoop can be started in one of three supported modes. Here we use pseudo-distributed mode, in which Hadoop runs on a single node and every Hadoop daemon runs as a separate Java process.

Configure conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml as shown in the configuration section below.

First, ask the NameNode to format a new distributed filesystem. This step was completed during installation, but it is useful to know how to generate a clean filesystem when you need one:

bin/hadoop namenode -format

11/11/30 09:53:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu1/192.168.0.101
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/11/30 09:53:56 INFO namenode.FSNamesystem: fsOwner=…
11/11/30 09:53:56 INFO namenode.FSNamesystem: supergroup=…
11/11/30 09:53:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/11/30 09:53:56 INFO common.Storage: Image file of size 94 saved in 0 seconds.
11/11/30 09:53:57 INFO common.Storage: Storage directory … has been successfully formatted.
11/11/30 09:53:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu1/192.168.0.101
************************************************************/

Then start all of the Hadoop daemons:

bin/start-all.sh

starting namenode, logging to …/hadoop-0.20.2/bin/../logs/…-namenode-….out
starting datanode, logging to …/hadoop-0.20.2/bin/../logs/…-datanode-….out
starting secondarynamenode, logging to …/hadoop-0.20.2/bin/../logs/…-secondarynamenode-….out
starting jobtracker, logging to …/hadoop-0.20.2/bin/../logs/…-jobtracker-….out
starting tasktracker, logging to …/hadoop-0.20.2/bin/../logs/…-tasktracker-….out

Check HDFS:

bin/hadoop fs -ls /

If it lists the directories and files, HDFS is working.

Basic Hadoop filesystem operations:

bin/hadoop fs -mkdir test
bin/hadoop fs -ls test
bin/hadoop fs -rmr test

Testing Hadoop:

bin/hadoop fs -mkdir input

Create two text files, file1 and file2, under /opt/hadoop/sourcedata, then run:

bin/hadoop fs -put /opt/hadoop/sourcedata/file* input
bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output

Output:
11/11/30 10:15:38 INFO input.FileInputFormat: Total input paths to process : 2
11/11/30 10:15:52 INFO mapred.JobClient: Running job: …
11/11/30 10:15:53 INFO mapred.JobClient:  map 0% reduce 0%
11/11/30 10:19:07 INFO mapred.JobClient:  map 50% reduce 0%
11/11/30 10:19:14 INFO mapred.JobClient:  map 100% reduce 0%
11/11/30 10:19:46 INFO mapred.JobClient:  map 100% reduce 100%
11/11/30 10:19:54 INFO mapred.JobClient: Job complete: …
11/11/30 10:19:59 INFO mapred.JobClient: Counters: 17
11/11/30 10:19:59 INFO mapred.JobClient:   Job Counters
11/11/30 10:19:59 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/30 10:19:59 INFO mapred.JobClient:     Launched map tasks=2
11/11/30 10:19:59 INFO mapred.JobClient:     Data-local map tasks=2
11/11/30 10:19:59 INFO mapred.JobClient:   FileSystemCounters
11/11/30 10:19:59 INFO mapred.JobClient:     FILE_BYTES_READ=146
11/11/30 10:19:59 INFO mapred.JobClient:     HDFS_BYTES_READ=64
11/11/30 10:19:59 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=362
11/11/30 10:19:59 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=60
11/11/30 10:19:59 INFO mapred.JobClient:   Map-Reduce Framework
11/11/30 10:19:59 INFO mapred.JobClient:     Reduce input groups=9
11/11/30 10:19:59 INFO mapred.JobClient:     Combine output records=13
11/11/30 10:19:59 INFO mapred.JobClient:     Map input records=2
11/11/30 10:19:59 INFO mapred.JobClient:     Reduce shuffle bytes=102
11/11/30 10:19:59 INFO mapred.JobClient:     Reduce output records=9
11/11/30 10:19:59 INFO mapred.JobClient:     Spilled Records=26
11/11/30 10:19:59 INFO mapred.JobClient:     Map output bytes=120
11/11/30 10:19:59 INFO mapred.JobClient:     Combine input records=14
11/11/30 10:19:59 INFO mapred.JobClient:     Map output records=14
11/11/30 10:19:59 INFO mapred.JobClient:     Reduce input records=13

Other commands for viewing the results:
11/11/30 10:28:37 INFO input.FileInputFormat: Total input paths to process : 2
11/11/30 10:28:40 INFO mapred.JobClient: Running job: …
11/11/30 10:28:41 INFO mapred.JobClient:  map 0% reduce 0%
11/11/30 10:34:16 INFO mapred.JobClient:  map 66% reduce 0%
11/11/30 10:37:40 INFO mapred.JobClient:  map 100% reduce 11%
11/11/30 10:37:50 INFO mapred.JobClient:  map 100% reduce 22%
11/11/30 10:37:54 INFO mapred.JobClient:  map 100% reduce 66%
11/11/30 10:38:15 INFO mapred.JobClient:  map 100% reduce 100%
11/11/30 10:38:30 INFO mapred.JobClient: Job complete: …
11/11/30 10:38:32 INFO mapred.JobClient: Counters: 18
11/11/30 10:38:32 INFO mapred.JobClient:   Job Counters
11/11/30 10:38:32 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/30 10:38:32 INFO mapred.JobClient:     Launched map tasks=3
11/11/30 10:38:32 INFO mapred.JobClient:     Data-local map tasks=3
11/11/30 10:38:32 INFO mapred.JobClient:   FileSystemCounters
11/11/30 10:38:32 INFO mapred.JobClient:     FILE_BYTES_READ=40
11/11/30 10:38:32 INFO mapred.JobClient:     HDFS_BYTES_READ=77
11/11/30 10:38:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=188
11/11/30 10:38:32 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=109
11/11/30 10:38:32 INFO mapred.JobClient:   Map-Reduce Framework
11/11/30 10:38:32 INFO mapred.JobClient:     Reduce input groups=1
11/11/30 10:38:32 INFO mapred.JobClient:     Combine output records=2
11/11/30 10:38:32 INFO mapred.JobClient:     Map input records=2
11/11/30 10:38:32 INFO mapred.JobClient:     Reduce shuffle bytes=46
11/11/30 10:38:32 INFO mapred.JobClient:     Reduce output records=1
11/11/30 10:38:32 INFO mapred.JobClient:     Spilled Records=4
11/11/30 10:38:32 INFO mapred.JobClient:     Map output bytes=30
11/11/30 10:38:32 INFO mapred.JobClient:     Map input bytes=64
11/11/30 10:38:32 INFO mapred.JobClient:     Combine input records=2
11/11/30 10:38:32 INFO mapred.JobClient:     Map output records=2
11/11/30 10:38:32 INFO mapred.JobClient:     Reduce input records=2
11/11/30 10:38:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

Output: 2 hadoop

This completes the pseudo-distributed deployment and test. If you run into problems, please leave a comment!
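For intuition about what the wordcount example computes, the same word-to-count aggregation can be reproduced on small local files with standard Unix tools. This is purely illustrative and not part of Hadoop; the file contents below are invented, not the article's actual file1/file2:

```shell
# Two tiny sample inputs standing in for file1 and file2 (contents invented).
printf 'hello hadoop\nhello mapreduce\n' > file1
printf 'hadoop runs the wordcount example\n' > file2

# Map: split into words; shuffle: sort; reduce: count duplicates.
cat file1 file2 | tr -s ' ' '\n' | sort | uniq -c | sort -rn
```

With these inputs, "hadoop" and "hello" each appear twice and every other word once, which is exactly the per-word tally that the MapReduce job writes to output/part-00000.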
Installing and Running Pseudo-Distributed Hadoop (release 0.20.2)

Preparation. Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script. The sections below cover the pseudo-distributed configuration.
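The JAVA_HOME setting mentioned at the top is a one-line edit in conf/hadoop-env.sh. A sketch, assuming a Sun JDK 6 installed at a typical Ubuntu path (the path is an assumed example; point it at your own JDK root):

```shell
# conf/hadoop-env.sh
# The path below is an assumed example; adjust it to your own Java installation.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```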
Operating in pseudo-distributed mode
Configuration
conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<!-- URI of the NameNode -->
<value>hdfs://192.168.0.101:9000</value>
</property>
</configuration>
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.0.101:9001</value>
</property>
</configuration>
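The three files above can also be generated from a small shell script, which is handy when setting up more than one machine. A sketch: the mapred.job.tracker address and dfs.replication value come from this article, while the fs.default.name value (NameNode on the same host, port 9000) is an assumption to adjust for your setup:

```shell
#!/bin/sh
# Write the three pseudo-distributed config files into ./conf.
# NOTE: fs.default.name's host/port (192.168.0.101:9000) is an assumed value;
# mapred.job.tracker and dfs.replication match this article's settings.
mkdir -p conf

cat > conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.0.101:9000</value>
  </property>
</configuration>
EOF

cat > conf/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

cat > conf/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.0.101:9001</value>
  </property>
</configuration>
EOF

ls conf
```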
Run: bin/start-all.sh
It ran successfully. The results of the example job are stored in HDFS as output/part-00000; some ways to inspect them:

bin/hadoop fs -cat output/part-00000 | head -13
bin/hadoop fs -get output/part-00000 .    (copy the file out of HDFS, then view the local copy, e.g. head -5 part-00000)
When you are finished, stop the daemons: bin/stop-all.sh
Run: bin/hadoop fs -cat output/part-00000
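Once a part file has been copied out of HDFS (bin/hadoop fs -get), ordinary tools work on the local copy. For example, ranking words by count; the sample part-00000 contents here are invented, standing in for wordcount's tab-separated "word count" lines:

```shell
# A made-up local copy of part-00000 (wordcount emits "word<TAB>count" lines).
printf 'hadoop\t2\nhello\t3\nworld\t1\n' > part-00000

# Show the most frequent words first (numeric, descending on field 2).
TAB=$(printf '\t')
sort -t "$TAB" -k2,2nr part-00000 | head -5
```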