Environment:
- 192.168.0.101: Ubuntu 16.04, acting as the master node
- 192.168.0.102: Ubuntu 16.04, acting as the slave node
- Two nodes are used as an example here; you can add more slave nodes in the same way.
Reference: the official documentation at http://hadoop.apache.org/docs/stable/
#1 Configure hosts
Edit the hosts file on 192.168.0.101:
```
$ sudo vim /etc/hosts
```
```
192.168.0.101 localhost master
192.168.0.102 slave1
```
Edit the hosts file on 192.168.0.102:
```
192.168.0.102 localhost slave1
192.168.0.101 master
```
#2 Install the JDK
Hadoop needs a Java runtime, so install the JDK on both hosts:
```
$ sudo apt install default-jdk
```
Or install the Oracle JDK instead (see the separate post: Ubuntu 16.04 安装 Oracle JDK9).
The JDK installation path:
```
$ ls /usr/lib/jvm
default-java  java-1.8.0-openjdk-amd64  java-8-openjdk-amd64
```
Add the JDK to the .bashrc file:
```
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:/usr/lib/jvm/java-8-openjdk-amd64/bin
```
Apply the changes:
```
$ source ~/.bashrc
```
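To confirm the Java environment is picked up (an optional check; the exact version string depends on which JDK you installed):

```
$ echo $JAVA_HOME
$ java -version
```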
#3 Install and configure SSH
Install the SSH server on both hosts:
```
$ sudo apt install openssh-server
```
On the master host, run the following command to generate an SSH key pair:
```
$ ssh-keygen -t rsa -P ""
```
Keep the default save path.
Allow SSH logins to master using the key just created:
```
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
```
Test that passwordless login to master now works:
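For example (a minimal check; the first connection may ask you to confirm the host key, but no password should be requested):

```
$ ssh master
$ exit
```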
Send the public key generated on master to the slave1 host (here user stands for your account name on slave1):
```
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@slave1
```
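Afterwards, logging in to slave1 from master should also work without a password (an optional check):

```
$ ssh slave1
$ exit
```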
#4 Install Hadoop
Install Hadoop on both servers. Download Hadoop; the version used here is 2.7.3.
On both the master and slave1 hosts:
```
$ cd
$ wget http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
$ tar -xvf hadoop-2.7.3.tar.gz
$ cd hadoop-2.7.3
```
Add the following environment variables to the .bashrc file:
```
export HADOOP_HOME=$HOME/hadoop-2.7.3
export HADOOP_CONF_DIR=$HOME/hadoop-2.7.3/etc/hadoop
export HADOOP_MAPRED_HOME=$HOME/hadoop-2.7.3
export HADOOP_COMMON_HOME=$HOME/hadoop-2.7.3
export HADOOP_HDFS_HOME=$HOME/hadoop-2.7.3
export YARN_HOME=$HOME/hadoop-2.7.3
export PATH=$PATH:$HOME/hadoop-2.7.3/bin
```
Apply the environment variables:
```
$ source ~/.bashrc
```
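At this point the hadoop command should be on the PATH (an optional check):

```
$ hadoop version
```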
Edit the hadoop-env.sh file and set the Java installation that Hadoop should use.
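For example, using the OpenJDK path shown earlier, set JAVA_HOME in hadoop-2.7.3/etc/hadoop/hadoop-env.sh (adjust the path if you installed a different JDK):

```
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```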
Create the NameNode and DataNode directories:
```
$ cd
$ mkdir -p $HADOOP_HOME/hadoop2_data/hdfs/namenode
$ mkdir -p $HADOOP_HOME/hadoop2_data/hdfs/datanode
```
Hadoop has quite a few configuration files; mine are as follows.
hadoop-2.7.3/etc/hadoop/core-site.xml:
```
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```
hadoop-2.7.3/etc/hadoop/hdfs-site.xml:
```
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/ubuntu/hadoop-2.7.3/hadoop2_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/ubuntu/hadoop-2.7.3/hadoop2_data/hdfs/datanode</value>
  </property>
</configuration>
```
hadoop-2.7.3/etc/hadoop/yarn-site.xml:
```
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```
hadoop-2.7.3/etc/hadoop/mapred-site.xml (create it from the bundled template first):

```
$ cd ~/hadoop-2.7.3/etc/hadoop/
$ cp mapred-site.xml.template mapred-site.xml
$ vim mapred-site.xml
```
```
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
On the master host:
```
$ vim ~/hadoop-2.7.3/etc/hadoop/masters
```
Add:
```
master
```
```
$ vim ~/hadoop-2.7.3/etc/hadoop/slaves
```
Add:
```
master
slave1
```
On the slave host:
```
$ vim ~/hadoop-2.7.3/etc/hadoop/masters
```
Add:
```
master
```
#5 Run Hadoop
Format HDFS on the master host (this only needs to be done once):
```
$ hadoop namenode -format
```
Start the NameNode, DataNode, ResourceManager, and NodeManager services:
```
$ cd ~/hadoop-2.7.3
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
```
Check the running services:
```
$ jps
```
Services running on master:
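With the configuration above (master is also listed in the slaves file, so it runs a DataNode and NodeManager as well), the jps listing should look roughly like this; the process IDs are placeholders and will differ:

```
12001 NameNode
12203 DataNode
12410 SecondaryNameNode
12587 ResourceManager
12790 NodeManager
13012 Jps
```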
Services running on slave1:
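On slave1 only the worker daemons should appear, roughly like this (again, process IDs will differ):

```
8101 DataNode
8244 NodeManager
8402 Jps
```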
Open http://master:50070/dfshealth.html (i.e. http://192.168.0.101:50070/) in a browser to check the NameNode status.
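As a final smoke test, you can put a file into HDFS and list it (run on master; the directory name /test is just an example):

```
$ hdfs dfs -mkdir -p /test
$ hdfs dfs -put ~/hadoop-2.7.3/etc/hadoop/core-site.xml /test/
$ hdfs dfs -ls /test
```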