Hadoop Fully Distributed Cluster Setup


1: Cluster hostname configuration

Add the following entries to the /etc/hosts file:
vim /etc/hosts

192.168.93.201 master201
192.168.93.202 slave202
192.168.93.203 slave203
192.168.93.204 slave204
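
As a quick sanity check, each hostname defined above should now resolve and answer a ping from any of the machines (a minimal check using the names from this hosts file):

ping -c 1 master201    # should resolve to 192.168.93.201
ping -c 1 slave202     # repeat for slave203 and slave204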

2: Create a hadoop user

Installing and configuring Hadoop and related software directly as root is a security risk. It is better to create a dedicated hadoop user and hand all the work to that user, although the setup will still work without it.
(1) Add the hadoop user
sudo adduser hadoop
passwd hadoop

Set the password to hadoop as well, for easy recall.
Creating the hadoop user also creates a hadoop group; add the hadoop user to that group by running:
sudo usermod -a -G hadoop hadoop
Then grant the hadoop user root privileges so that it can use sudo.
Switch to a user that already has root privileges and run:
sudo gedit /etc/sudoers

(2) Grant sudo privileges to the hadoop user
sudo vi /etc/sudoers
Use the first command (gedit, a graphical text editor) in a desktop environment, and the second command (vi) on a terminal-only system. Look up vi usage yourself if you are not familiar with it.
Modify the file as follows:
# User privilege specification
root ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
Save and exit; the hadoop user now has root privileges.

An alternative method:
visudo

3: Network setup (NAT mode; the machines must be able to ping the outside network)

(1) Configure the network interface: vim /etc/sysconfig/network-scripts/ifcfg-eno16777736

TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
NAME=eno16777736
UUID=17a315a8-df51-4de0-a587-d1655acc8187
DEVICE=eno16777736
PEERDNS=yes
PEERROUTES=yes
IPV6_PEERDNS=no
IPV6_PEERROUTES=no
ONBOOT=yes
IPADDR=192.168.93.202
PREFIX=24
GATEWAY=192.168.93.2
DNS=192.168.93.2

(2) Set the DNS resolver
vim /etc/resolv.conf

nameserver 192.168.93.2

(3) Test connectivity
service network restart
ping www.baidu.com
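
To confirm the static address was applied after restarting the network service, you can also inspect the interface directly (a minimal check, assuming the device name used in the config above):

ip addr show eno16777736    # should list inet 192.168.93.202/24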

(4) If the network is not reachable
Check whether VMware's DHCP and NAT services are running properly.
On Windows, these appear as the VMware DHCP Service and the VMware NAT Service.
(5) Disable the firewall
sudo systemctl disable firewalld.service //disable autostart on boot
sudo systemctl stop firewalld.service //stop the firewall

[CentOS 7] firewall commands

sudo systemctl enable firewalld.service //enable autostart on boot
sudo systemctl disable firewalld.service //disable autostart on boot
sudo systemctl start firewalld.service //start the firewall
sudo systemctl stop firewalld.service //stop the firewall
sudo systemctl status firewalld.service //check firewall status


4: Download and install the JDK

The installation packages need to be uploaded or downloaded onto the CentOS machines, either with a download command or with a tool such as WinSCP.
(1) Remove the pre-installed JDK
CentOS 7 ships with an OpenJDK. You may remove it or leave it in place.

rpm -qa | grep java
sudo rpm -e --nodeps java-1.8.0-openjdk
sudo rpm -e --nodeps java-1.8.0-openjdk-headless
sudo rpm -e --nodeps java-1.7.0-openjdk
sudo rpm -e --nodeps java-1.7.0-openjdk-headless

Example session (prompts shortened to $):

$ sudo rpm -e --nodeps java-1.8.0-openjdk-headless
$ rpm -qa | grep java
javapackages-tools-3.4.1-11.el7.noarch
tzdata-java-2015g-1.el7.noarch
java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64
java-1.7.0-openjdk-headless-1.7.0.91-2.6.2.3.el7.x86_64
python-javapackages-3.4.1-11.el7.noarch
$ java -version
java version "1.7.0_91"
OpenJDK Runtime Environment (rhel-2.6.2.3.el7-x86_64 u91-b00)
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)
$ sudo rpm -e --nodeps java-1.7.0-openjdk
$ sudo rpm -e --nodeps java-1.7.0-openjdk-headless
$ java -version
bash: java: command not found...
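
A one-pass alternative sketch, assuming you want to drop every installed OpenJDK package at once (review the rpm -qa output first so you know what will be removed):

rpm -qa | grep openjdk | xargs -r sudo rpm -e --nodeps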

(2) Install the JDK
tar -zxf jdk-8u151-linux-x64.tar.gz -C /usr/local/
sudo chown -R hadoop:hadoop /usr/local/jdk1.8.0_151

There are three files in which environment variables can be configured:
/etc/profile
~/.bashrc
~/.bash_profile

Each has its own use; here we configure one of them.
vim ~/.bashrc

export JAVA_HOME=/usr/local/jdk1.8.0_151
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
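
The new variables only take effect in a new shell unless the file is reloaded; a quick check, assuming ~/.bashrc was the file edited:

source ~/.bashrc
echo $JAVA_HOME    # should print /usr/local/jdk1.8.0_151
java -version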

5: Download and install Hadoop

(1) Which user to install as
If you install under /usr/local, you need root, because that directory is owned by root.
If you install under the user's own home directory, there is no need to switch to root.
tar -xzf hadoop-2.7.3.tar.gz -C /usr/local
Whether Hadoop is installed as root or as a regular user does not matter much;
what matters is that the environment variables are configured correctly for the user that runs it.

(2) Environment variable configuration
For global environment variables, edit the file as root:
vim /etc/profile
For per-user variables only, there is no need to switch to root:
vim ~/.bashrc
The environment variable settings are as follows:

export JAVA_HOME=/usr/local/jdk1.8.0_151
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

export HADOOP_HOME=/usr/local/hadoop-2.7.3
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

(3) Verifying the installation
Configuration is complete when all of the following commands produce the expected output:
echo $JAVA_HOME
java -version
echo $HADOOP_HOME
hadoop version

(4) If you configured per-user environment variables
Change ownership of the installation directories to that user; otherwise the user will not have permission to start the services.
chown -R hadoop /usr/local/hadoop
chown -R hadoop /usr/local/hadoop-2.7.3/
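
To confirm the ownership change took effect, list the directories (a simple check; adjust the paths if your install locations differ):

ls -ld /usr/local/hadoop /usr/local/hadoop-2.7.3    # owner column should show hadoop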

6: Configure Hadoop
The following files are all in the {HADOOP_HOME}/etc/hadoop/ directory.
(1) Configure core-site.xml

vim core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master201:9000</value>
    <description>Default name of HDFS</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master201:9000</value>
    <description>URI of HDFS</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>Local Hadoop temporary directory on each node</description>
  </property>
</configuration>
(2) Configure hdfs-site.xml
vim hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hdfs/name</value>
    <description>Where the NameNode stores the HDFS namespace metadata</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hdfs/data</value>
    <description>Physical location of data blocks on each DataNode</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Replication factor; defaults to 3 and should not exceed the number of DataNodes</description>
  </property>
</configuration>

(3) Configure yarn-site.xml
vim yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master201</value>
    <description>Hostname of the ResourceManager</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Auxiliary service run on each NodeManager; must be set to mapreduce_shuffle for MapReduce jobs to run</description>
  </property>
</configuration>

(4) Configure mapred-site.xml
vim mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Run MapReduce on the YARN framework</description>
  </property>
</configuration>
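
Note: the Hadoop 2.7.3 distribution ships this file only as a template, so if mapred-site.xml does not exist yet, create it from the template first:

cp mapred-site.xml.template mapred-site.xml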

(5) Configure slaves
vim slaves
slave202
slave203
slave204

7: Copy to the other machines

Once the JDK and Hadoop are installed and configured on one CentOS machine, replicate them to the remaining machines.
(1) Clone the virtual machine directly in VMware

(2) Or create new CentOS VMs yourself and copy the files over (the target shown below assumes copying to slave202 as root, since /usr/local is owned by root):
scp -r hadoop-2.7.3 root@slave202:/usr/local
Enter the password when prompted, then repeat for the JDK:
scp -r /usr/local/jdk1.8.0_151/ root@slave202:/usr/local

After the copy completes, remember to change the ownership back to the hadoop user on each target machine.
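
For example, on each target machine (a sketch assuming the hadoop user and the install paths used earlier):

sudo chown -R hadoop:hadoop /usr/local/hadoop-2.7.3 /usr/local/jdk1.8.0_151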

8: Configure passwordless SSH login
Passwordless SSH means the master can log into every slave over SSH without a password; the slaves also need passwordless login between one another.
If you run Hadoop as root, configure this as root; otherwise configure it as the user that runs Hadoop (e.g. hadoop).

(1) Edit the /etc/hostname file
vim /etc/hostname
Set it to the hostname assigned to that machine at the beginning of this article.

(2) Four steps for a user to SSH into its own machine (localhost) without a password:
ssh-keygen -t rsa
cd .ssh
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys

Possible problems:
ssh localhost
If this still asks for a password, the usual cause is that ~/.ssh or authorized_keys is writable by group or others; the files must not be group- or world-writable.
Another common error:
Agent admitted failure to sign using the key
Fix it by adding the private key to the agent with ssh-add (change id_rsa if your key file has a different name):
# ssh-add ~/.ssh/id_rsa
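
If the cause is permissions, the following tightens them to values sshd accepts (a common fix; run it as the user whose login is failing):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys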

(3) Passwordless login between machines
Copy a node's public key to the other node and append it to authorized_keys there. For example, to copy slave202's key across and append it (shown here going to the master, assuming the hadoop user on master201):
scp id_rsa.pub hadoop@master201:/home/hadoop/.ssh/id_rsa.pub.slave202
cat id_rsa.pub.slave202 >> authorized_keys

Or use:
ssh-copy-id servername
This has to be done in both directions.
// This also works even if you never ran ssh localhost first
scp id_rsa.pub hadoop@master201:/home/hadoop/.ssh/id_rsa.pub.slave203
cat id_rsa.pub.slave203 >> authorized_keys

(4) Passwordless login to any machine in two steps (a simpler form of the commands above):
ssh-keygen -t rsa
ssh-copy-id master201
ssh-copy-id slave204
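
Since every node needs to reach every other node, the same command can be repeated for all hosts; a small loop sketch, assuming the hostnames defined in /etc/hosts at the top (each iteration prompts for that host's password once):

for h in master201 slave202 slave203 slave204; do ssh-copy-id $h; done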

9: Start Hadoop
(1) Format the NameNode
hdfs namenode -format

(2) Start HDFS

start-dfs.sh
$ start-dfs.sh
18/06/04 01:44:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master201]
master201: starting namenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-namenode-master201.out
slave204: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-slave204.out
slave203: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-slave203.out
slave202: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-slave202.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-master201.out
18/06/04 01:44:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

(3) Start YARN

start-yarn.sh
$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-master201.out
slave202: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-slave202.out
slave203: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-slave203.out
slave204: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-slave204.out

(4) Start the JobHistory Server

$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.7.3/logs/mapred-hadoop-historyserver-master201.out

10: View Hadoop status information
(1) HDFS web UI
http://192.168.93.201:50070/dfshealth.html#tab-overview

(2) YARN web UI
http://192.168.93.201:8088/cluster

(3) Check processes with jps
(3.1) Processes on the master node

$ jps
19616 SecondaryNameNode
20341 ResourceManager
20600 Jps
19150 NameNode

With the JobHistory Server running:

$ jps
19616 SecondaryNameNode
20705 JobHistoryServer
20341 ResourceManager
20744 Jps
19150 NameNode

(3.2) Processes on a slave node

$ jps
34148 NodeManager
33643 DataNode
34477 Jps

(4) HDFS status report
hdfs dfsadmin -report
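
As a final end-to-end check, you can submit one of the example jobs bundled with Hadoop; a sketch assuming the 2.7.3 install path used above (the pi example estimates pi using 2 map tasks with 10 samples each):

hadoop jar /usr/local/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10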

11: Appendix: the complete environment variable configuration

export JAVA_HOME=/usr/local/jdk1.8.0_151
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

export HADOOP_HOME=/usr/local/hadoop-2.7.3
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
