Steps to Install Hadoop on Linux
1. Preparation: download Hadoop from one of the following:
http://hadoop.apache.org/core/releases.html
http://hadoop.apache.org/common/releases.html
http://www.apache.org/dyn/closer.cgi/hadoop/core/
http://labs.xiaonei.com/apache-mirror/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
http://labs.xiaonei.com/apache-mirror/hadoop/

2. Hardware environment: three machines in total, all running CentOS, with Java jdk1.6.0.
3. Install Java 6 (the command below is for Debian/Ubuntu; on CentOS install an equivalent Sun JDK package instead):
sudo apt-get install sun-java6-jdk
Open /etc/environment and add the following lines.
# Entries are separated by English colons; remember that on Windows the separator is an English semicolon.
CLASSPATH=.:/usr/local/java/lib
JAVA_HOME=/usr/local/java
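A minimal sanity check for the lines above (paths assumed to match the install described here): split CLASSPATH on the colon separator and count the entries, confirming the Linux-style separator is in use.

```shell
# Assumed values from /etc/environment above.
CLASSPATH=.:/usr/local/java/lib
JAVA_HOME=/usr/local/java

# Split CLASSPATH on ':' and count the entries ("." and the lib dir).
old_IFS=$IFS; IFS=:
set -- $CLASSPATH
IFS=$old_IFS
echo "$# entries"   # prints: 2 entries
```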
4. Configure the hosts file

On the namenode:
[root@hadoop ~]# vi /etc/hosts
127.0.0.1       localhost
192.168.13.100  namenode
192.168.13.108  datanode1
192.168.13.110  datanode2

On datanode1:
[root@test ~]# vi /etc/hosts
127.0.0.1       localhost
192.168.13.100  namenode
192.168.13.108  datanode1

On datanode2:
[root@test2 ~]# vi /etc/hosts
127.0.0.1       localhost
192.168.13.100  namenode
192.168.13.110  datanode2

Add the hadoop user and group:
addgroup hadoop
adduser hadoop
usermod -a -G hadoop hadoop
passwd hadoop
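Rather than editing /etc/hosts by hand on each of the three machines, the shared entries can be generated once and appended everywhere. A sketch, using a temp file as a stand-in for /etc/hosts (the IPs are the ones from the cluster above):

```shell
# Stand-in for /etc/hosts; in practice you would append to the real file
# on each node (e.g. via scp or a loop over the hostnames).
hosts_file=$(mktemp)

cat >> "$hosts_file" <<'EOF'
192.168.13.100    namenode
192.168.13.108    datanode1
192.168.13.110    datanode2
EOF

# Verify all three cluster entries landed in the file.
grep -Ec 'namenode|datanode' "$hosts_file"   # prints: 3
```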
Configure SSH:
On the server (namenode):
su hadoop
ssh-keygen -t rsa
cp id_rsa.pub authorized_keys   # run inside ~/.ssh, where ssh-keygen put the key pair
On each client (datanode):
su hadoop
cd /home/hadoop
mkdir .ssh
chmod 700 /home/hadoop
chmod 755 /home/hadoop/.ssh
Back on the server:
chmod 644 /home/hadoop/.ssh/authorized_keys
scp authorized_keys datanode1:/home/hadoop/.ssh/
scp authorized_keys datanode2:/home/hadoop/.ssh/
ssh datanode1
ssh datanode2
If SSH is configured correctly, you will see a prompt like this:
The authenticity of host [dbrg-2] can't be established.
Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.
Are you sure you want to continue connecting (yes/no)?
OpenSSH is telling you that it does not know this host yet; there is nothing to worry about, since this is your first login to that machine. Type "yes", and the host's identification will be added to ~/.ssh/known_hosts; the second time you log in to that host the prompt will no longer appear.
Don't forget to test SSH to the local machine as well:
ssh dbrg-1
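The chmod steps above matter: sshd (with StrictModes, the default) refuses passwordless key login if the home directory or ~/.ssh is writable by anyone but the owner. A sketch of the permission layout used above, checked with stat, with a temp directory standing in for /home/hadoop:

```shell
# Temp dir as a stand-in for /home/hadoop.
home=$(mktemp -d)
mkdir -p "$home/.ssh"

# The modes from the steps above.
chmod 700 "$home"
chmod 755 "$home/.ssh"
touch "$home/.ssh/authorized_keys"
chmod 644 "$home/.ssh/authorized_keys"

# Show octal mode and name for each path.
stat -c '%a %n' "$home" "$home/.ssh" "$home/.ssh/authorized_keys"
```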
mkdir /home/hadoop/HadoopInstall
tar -zxvf hadoop-0.20.1.tar.gz -C /home/hadoop/HadoopInstall/
cd /home/hadoop/HadoopInstall/
ln -s hadoop-0.20.1 hadoop
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:/usr/local/java/lib
export HADOOP_HOME=/home/hadoop/HadoopInstall/hadoop
export HADOOP_CONF_DIR=/home/hadoop/hadoop-conf
export PATH=$HADOOP_HOME/bin:$PATH
cd $HADOOP_HOME/conf/
mkdir /home/hadoop/hadoop-conf
cp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml masters slaves /home/hadoop/hadoop-conf
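The copy step above can be sketched as a loop, so that a missing config file is reported instead of silently skipped. Temp directories stand in for $HADOOP_HOME/conf and /home/hadoop/hadoop-conf here:

```shell
src=$(mktemp -d)   # stand-in for $HADOOP_HOME/conf
dst=$(mktemp -d)   # stand-in for /home/hadoop/hadoop-conf

# Simulate the shipped config files.
touch "$src/hadoop-env.sh" "$src/core-site.xml" "$src/hdfs-site.xml" \
      "$src/mapred-site.xml" "$src/masters" "$src/slaves"

# Copy each expected file, warning if one is absent.
for f in hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml masters slaves; do
  if [ -f "$src/$f" ]; then
    cp "$src/$f" "$dst/"
  else
    echo "missing: $f" >&2
  fi
done

ls "$dst" | wc -l   # prints: 6
```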
vi $HADOOP_CONF_DIR/hadoop-env.sh
# The java implementation to use. Required. -- change this to your own JDK install directory
export JAVA_HOME=/usr/local/java
export HADOOP_CLASSPATH=.:/usr/local/java/lib
# The maximum amount of heap to use, in MB. Default is 1000. -- adjust to your memory size
export HADOOP_HEAPSIZE=200
vi /home/hadoop/.bashrc
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:/usr/local/java/lib
export HADOOP_HOME=/home/hadoop/HadoopInstall/hadoop
export HADOOP_CONF_DIR=/home/hadoop/hadoop-conf
export PATH=$HADOOP_HOME/bin:$PATH
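A quick check that the exports above took effect in a new shell: hadoop's bin directory should now appear on PATH (the values are the ones set above).

```shell
HADOOP_HOME=/home/hadoop/HadoopInstall/hadoop
PATH=$HADOOP_HOME/bin:$PATH

# Wrap PATH in ':' on both sides so the match works for any position.
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "hadoop on PATH" ;;
  *)                      echo "hadoop NOT on PATH" ;;
esac
```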
Configuration on the namenode:
#vi $HADOOP_CONF_DIR/slaves
192.168.13.108
192.168.13.110
#vi $HADOOP_CONF_DIR/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.13.100:9000</value>
  </property>
</configuration>
#vi $HADOOP_CONF_DIR/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified at create time.
    </description>
  </property>
</configuration>
#vi $HADOOP_CONF_DIR/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.13.100:11000</value>
  </property>
</configuration>
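One addition worth considering here (an assumption on my part, not part of the original setup): pin hadoop.tmp.dir in core-site.xml, since its default lives under /tmp and is lost on reboot, taking the HDFS metadata with it. A sketch that writes the extra property to a temp file and verifies it round-trips; the /home/hadoop/tmp path is illustrative:

```shell
# Hypothetical extra core-site.xml property (hadoop.tmp.dir is a real
# Hadoop setting; the value below is an illustrative choice).
frag=$(mktemp)
cat > "$frag" <<'EOF'
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
</property>
EOF

# Confirm the property name made it into the fragment.
grep -c '<name>hadoop.tmp.dir</name>' "$frag"   # prints: 1
```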
On the slaves the configuration files are as follows (hdfs-site.xml does not need to be configured):
[root@test12 conf]# cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>
[root@test12 conf]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:11000</value>
  </property>
</configuration>
Start:
export PATH=$HADOOP_HOME/bin:$PATH
hadoop namenode -format
start-all.sh

Stop:
stop-all.sh
Create a danchentest directory on HDFS and upload a file into it:
$HADOOP_HOME/bin/hadoop fs -mkdir danchentest
$HADOOP_HOME/bin/hadoop fs -put $HADOOP_HOME/README.txt danchentest
cd $HADOOP_HOME
hadoop jar hadoop-0.20.1-examples.jar wordcount /user/hadoop/danchentest/README.txt output1
09/12/21 18:31:44 INFO input.FileInputFormat: Total input paths to process : 1
09/12/21 18:31:45 INFO mapred.JobClient: Running job: job_200912211824_0002
09/12/21 18:31:46 INFO mapred.JobClient:  map 0% reduce 0%
09/12/21 18:31:53 INFO mapred.JobClient:  map 100% reduce 0%
09/12/21 18:32:05 INFO mapred.JobClient:  map 100% reduce 100%
09/12/21 18:32:07 INFO mapred.JobClient: Job complete: job_200912211824_0002
09/12/21 18:32:07 INFO mapred.JobClient: Counters: 17
09/12/21 18:32:07 INFO mapred.JobClient:   Job Counters
09/12/21 18:32:07 INFO mapred.JobClient:     Launched reduce tasks=1
View the output files, which live on HDFS:
[root@test11 hadoop]# hadoop fs -ls output1
Found 2 items
drwxr-xr-x   - root supergroup          0 2009-09-30 16:01 /user/root/output1/_logs
-rw-r--r--   3 root supergroup       1306 2009-09-30 16:01 /user/root/output1/part-r-00000
[root@test11 hadoop]# hadoop fs -cat output1/part-r-00000
(BIS),  1
(ECCN)  1
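The same count the wordcount job produces can be reproduced locally with coreutils, which is handy for sanity-checking the job output on a small sample (the two-line input below is illustrative):

```shell
# Tokenize on spaces, then count occurrences of each word,
# highest count first -- the shell equivalent of wordcount.
printf 'foo bar foo\nbar foo\n' |
  tr -s ' ' '\n' | sort | uniq -c | sort -rn
#   3 foo
#   2 bar
```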
To check the HDFS status, visit the web interface at http://192.168.13.100:50070/dfshealth.jsp; to check map-reduce information, visit http://192.168.13.100:50030/jobtracker.jsp. Below is what you see directly on the command line.
If you see
08/01/25 16:31:40 INFO ipc.Client: Retrying connect to server: foo.bar.com/1.1.1.1:53567. Already tried 1 time(s).
the cause is that the namenode was never formatted; run:
hadoop namenode -format
Original article: http://blog.chinaunix.net/space.php?uid=11121450&do=blog&id=359078