Hadoop Setup
Download hadoop-1.0.3.
Verify that ssh is installed:
>which ssh
>which sshd
>which ssh-keygen
Generate an ssh key pair:
>ssh-keygen -t rsa
View the ssh public key:
>more /home/hadoopadmin/.ssh/id_rsa.pub (hadoopadmin is the ubuntu username)
Copy the public key to the master node and to each slave node:
>scp ~/.ssh/id_rsa.pub hadoopadmin@target:~/master_key (target is the slave node's IP)
Manually add it to authorized_keys on each slave:
~>mkdir .ssh
~>mv master_key .ssh/authorized_keys
This deploys the ssh public key generated on the master to the slave; after that, ssh slave no longer prompts for a password.
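The slave-side steps above can be sketched end to end. The block below simulates them locally in a throwaway directory (slave_home_demo stands in for the slave user's home, and the key line is a placeholder, not a real key). The chmod lines are worth noting: sshd silently ignores authorized_keys unless .ssh is mode 700 and the file is mode 600.

```shell
# Simulate the slave-side setup in a local demo directory.
# slave_home_demo stands in for the slave user's $HOME; the key below
# is a placeholder string, not a real public key.
home=slave_home_demo
mkdir -p "$home"
echo "ssh-rsa AAAAB3...placeholder hadoopadmin@master" > "$home/master_key"

# Same steps as above: create .ssh and install the key as authorized_keys.
mkdir -p "$home/.ssh"
mv "$home/master_key" "$home/.ssh/authorized_keys"
# sshd rejects authorized_keys whose permissions are too open:
chmod 700 "$home/.ssh"
chmod 600 "$home/.ssh/authorized_keys"
```

On real machines, ssh-copy-id hadoopadmin@target performs the scp and the append to authorized_keys in a single step.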
Error: "Agent admitted failure to sign using the key."
Fix: add the private key to the ssh agent manually:
http://blog.csdn.net/jiangsq12345/article/details/6187144
~>ssh-add .ssh/id_rsa
Starting Hadoop
Edit conf/hadoop-env.sh:
export JAVA_HOME=/work/env/jdk1.6.0_26
Set up passwordless ssh to localhost:
~>ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
~>cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
After this, ssh localhost no longer prompts for a password.
Main Hadoop configuration files:
core-site.xml
hdfs-site.xml
mapred-site.xml
(1) Standalone mode
(2) Pseudo-distributed mode
Edit the Hadoop configuration files.
core-site.xml:
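The values for this mode are not recorded above; a typical hadoop-1.0.3 core-site.xml for pseudo-distributed operation looks like the following (hdfs://localhost:9000 is a conventional choice, not a requirement):

```xml
<configuration>
  <property>
    <!-- URI of the namenode; all daemons run on localhost in this mode -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

mapred-site.xml typically sets mapred.job.tracker to localhost:9001, and hdfs-site.xml sets dfs.replication to 1, since there is only a single datanode.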
Check Hadoop status at http://localhost:50070/dfshealth.jsp
(3) Fully distributed mode
Similar to pseudo-distributed mode:
core-site.xml    defines the namenode
mapred-site.xml  defines the jobtracker
hdfs-site.xml    defines the HDFS replication factor
masters          defines the SecondaryNameNode
slaves           defines the datanodes and tasktrackers
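As a sketch, assuming the master is reachable by the hostname master and the slaves as slave1 and slave2 (hypothetical names), the fully distributed settings differ from the pseudo-distributed ones mainly in the addresses:

```xml
<!-- core-site.xml: point fs.default.name at the master -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- mapred-site.xml: point mapred.job.tracker at the master -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
```

The masters file then contains the single line master (the host that should run the SecondaryNameNode), and slaves lists slave1 and slave2, one hostname per line.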
Hadoop's default working directory is under /tmp/$USER, which may be wiped when the machine reboots. You can set the working directories explicitly in conf/hdfs-site.xml (after changing dfs.name.dir, format the namenode once with bin/hadoop namenode -format before starting the cluster):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/work/hadoop/hadoop-1.0.3/workdir/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/work/hadoop/hadoop-1.0.3/workdir/datanode</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/work/hadoop/hadoop-1.0.3/workdir/tmp</value>
  </property>
</configuration>