## Preface

In earlier posts we set up local mode, pseudo-distributed mode, and fully distributed mode. Continuing from the previous post, we now build an HA (high availability) cluster. In that post we created three virtual machines: master1, slave1, and slave2.
## Why use an HA architecture?

- **Single point of failure**: if the cluster's only NameNode goes down, the HDFS service becomes unusable. With HDFS HA, a new active NameNode can be elected from the remaining available NameNodes.
- **Memory limits**: HDFS Federation spreads the storage and management of metadata across multiple namenodes/namespaces, so the namespace layer can be scaled horizontally by adding machines.

HDFS HA architecture (diagram not reproduced here); the main components are:
- **Active NameNode and Standby NameNode**: the two NameNodes back each other up. One is in the Active state (the primary NameNode), the other in the Standby state (the backup); only the active NameNode serves read and write requests.
- **ZKFailoverController (failover controller)**: runs as a separate process and coordinates NameNode failover. It monitors the health of its NameNode and, when the active NameNode fails, uses ZooKeeper to elect and switch to a new active NameNode automatically. NameNode also supports manual failover that does not depend on ZooKeeper.
- **ZooKeeper cluster**: provides the leader-election support used by the failover controllers.
- **Shared storage system**: the most critical piece of NameNode HA. It stores the HDFS metadata that the NameNode produces at runtime, and the active and standby NameNodes synchronize metadata through it. During a failover, the new active NameNode only starts serving requests after confirming that its metadata is fully synchronized.
- **DataNodes**: besides the metadata shared through the shared storage system, the active and standby NameNodes also need to share the mapping between HDFS blocks and DataNodes, so DataNodes report block locations to both NameNodes.

The official Hadoop documentation on deploying HA: HDFS High Availability.
Our HA layout is as follows:
| HostName | IP            | NN | DN | RM | NM | JN | ZK | ZKFC |
|----------|---------------|----|----|----|----|----|----|------|
| master1  | 192.168.8.101 | √  |    | √  |    |    | √  | √    |
| slave1   | 192.168.8.201 | √  | √  |    | √  | √  | √  | √    |
| slave2   | 192.168.8.202 |    | √  | √  | √  | √  | √  |      |
- NN = NameNode
- DN = DataNode
- RM = ResourceManager
- NM = NodeManager
- JN = JournalNode
- ZKFC = DFSZKFailoverController
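The hostnames above are assumed to resolve to the IPs in the table on every node (this was set up in the earlier posts); for reference, the /etc/hosts entries would look like:

```
192.168.8.101 master1
192.168.8.201 slave1
192.168.8.202 slave2
```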
## Preparation

HA requires a ZooKeeper cluster (see the ZooKeeper download page). We download and configure ZooKeeper on master1 first, then sync it to slave1 and slave2.
```bash
cd /opt/bigdata
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.6.1/apache-zookeeper-3.6.1-bin.tar.gz
tar -zxvf apache-zookeeper-3.6.1-bin.tar.gz
mv apache-zookeeper-3.6.1-bin zookeeper-3.6.1
```
We run the ZooKeeper service on each of master1, slave1, and slave2.
```bash
cd /opt/bigdata/zookeeper-3.6.1
```
Copy the zoo_sample.cfg file and rename it to zoo.cfg (a sketch of the copy command follows), then set zoo.cfg to the following:
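The copy command itself is not shown in the original post; a minimal sketch, assuming the default conf/ directory of the extracted tarball:

```bash
# Create zoo.cfg from the shipped sample config
cp conf/zoo_sample.cfg conf/zoo.cfg
```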
```properties
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/opt/bigdata/zookeeper-3.6.1/data
# the port at which the clients will connect
clientPort=2181
server.1=master1:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
```
In the directory specified by dataDir, create a myid file:
```bash
mkdir data
echo 1 > data/myid
```
This writes the node's id in the ensemble; it must match that host's server.N entry in zoo.cfg (1 for master1).
Configure the environment variables by adding the following to /etc/profile (on all three machines):
```bash
export ZK_HOME=/opt/bigdata/zookeeper-3.6.1
PATH=$PATH:$ZK_HOME/bin
```
Run `source /etc/profile` to apply the change.
Send the configured ZooKeeper installation (the ZK_HOME directory) to slave1 and slave2; the xsync script can be used to sync it, as sketched below.
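A sketch of that sync, assuming the custom xsync script from the earlier posts supports the `file` subcommand used later in this post:

```bash
# Copy the whole ZooKeeper install directory to slave1 and slave2
xsync file /opt/bigdata/zookeeper-3.6.1
```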
Note: after syncing, change the myid on slave1 to 2 and on slave2 to 3, so that each node's myid matches its server.N entry in zoo.cfg (see the sketch below).
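For example, run on each slave (paths assume the same install location as master1):

```bash
# On slave1 (must match server.2 in zoo.cfg)
echo 2 > /opt/bigdata/zookeeper-3.6.1/data/myid

# On slave2 (must match server.3 in zoo.cfg)
echo 3 > /opt/bigdata/zookeeper-3.6.1/data/myid
```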
Start the ZooKeeper service: use the xsync script to start zkServer on all three nodes at once.
```
bennie@master1:~$ xsync cmd "$ZK_HOME/bin/zkServer.sh start"
master1 execute cmd /opt/bigdata/zookeeper-3.6.1/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/bigdata/zookeeper-3.6.1/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
slave1 execute cmd /opt/bigdata/zookeeper-3.6.1/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/bigdata/zookeeper-3.6.1/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
slave2 execute cmd /opt/bigdata/zookeeper-3.6.1/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/bigdata/zookeeper-3.6.1/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
```
Check the cluster status:
```
bennie@master1:~$ xsync cmd "$ZK_HOME/bin/zkServer.sh status"
master1 execute cmd /opt/bigdata/zookeeper-3.6.1/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/bigdata/zookeeper-3.6.1/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
slave1 execute cmd /opt/bigdata/zookeeper-3.6.1/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/bigdata/zookeeper-3.6.1/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
slave2 execute cmd /opt/bigdata/zookeeper-3.6.1/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/bigdata/zookeeper-3.6.1/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
```
Common zkServer.sh commands:
```bash
zkServer.sh start
zkServer.sh stop
zkServer.sh restart
zkServer.sh status
```
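As an extra sanity check (not part of the original post), you can connect to the ensemble with the bundled CLI client:

```bash
# Connect to one of the ZooKeeper servers; inside the shell, try: ls /
zkCli.sh -server master1:2181
```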
## Hadoop configuration

We modify Hadoop's env scripts. Add the following to hadoop-env.sh:

```bash
JAVA_HOME=/opt/java/jdk1.8.0_271
HADOOP_OPTS="$HADOOP_OPTS -Duser.timezone=GMT+08"
```
Add the following to yarn-env.sh:

```bash
JAVA_HOME=/opt/java/jdk1.8.0_271
YARN_OPTS="$YARN_OPTS -Duser.timezone=GMT+08"
```
Sync the changes to the other machines:

```bash
xsync file $HADOOP_HOME/etc/hadoop/
```
## hdfs-site.xml

Edit hdfs-site.xml and add the following configuration:
```xml
<configuration>
  <!-- Logical name of the HDFS nameservice -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- The two NameNodes in the nameservice -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>master1,slave1</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.master1</name>
    <value>master1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.slave1</name>
    <value>slave1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.master1</name>
    <value>master1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.slave1</name>
    <value>slave1:50070</value>
  </property>
  <!-- JournalNodes that store the shared edit log -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://slave1:8485;slave2:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Multiple fencing methods are newline-separated inside a single value -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
shell(/bin/true)</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/bennie/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/bigdata/hadoop-2.7.7/journal/node/local/data</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master1:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/bigdata/hadoop-2.7.7/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/bigdata/hadoop-2.7.7/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.web.ugi</name>
    <value>supergroup</value>
  </property>
</configuration>
```
## core-site.xml

Edit core-site.xml and add the following configuration:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/bigdata/hadoop-2.7.7/tmp</value>
  </property>
  <!-- ZooKeeper quorum used for automatic failover -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master1:2181,slave1:2181,slave2:2181</value>
  </property>
</configuration>
```
## yarn-site.xml

Edit yarn-site.xml and add the following configuration:

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarnCluster</value>
    <description>Unique id of the RM cluster</description>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
    <description>Ids of the two ResourceManagers</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master1</value>
    <description>Host running the first RM</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave2</value>
    <description>Host running the second RM</description>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>master1:8088</value>
    <description>Web UI address of the first RM</description>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>slave2:8088</value>
    <description>Web UI address of the second RM</description>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>master1:2181,slave1:2181,slave2:2181</value>
    <description>ZooKeeper hosts and ports</description>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
</configuration>
```
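The post does not repeat the sync step here, but the edited XML files presumably need to reach slave1 and slave2 as well; assuming the same xsync script as above:

```bash
# Push the updated Hadoop configuration directory to the other nodes
xsync file $HADOOP_HOME/etc/hadoop/
```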
## Start HDFS

Clear the data and name directories on every node:

```bash
xsync cmd 'rm -rf $HADOOP_HOME/data'
xsync cmd 'rm -rf $HADOOP_HOME/name'
```
Start the cluster's JournalNodes:

```
bennie@master1:~$ $HADOOP_HOME/sbin/hadoop-daemons.sh start journalnode
slave2: starting journalnode, logging to /opt/bigdata/hadoop-2.7.7/logs/hadoop-bennie-journalnode-slave2.out
slave1: starting journalnode, logging to /opt/bigdata/hadoop-2.7.7/logs/hadoop-bennie-journalnode-slave1.out
```
Format the NameNode on master1 and start it:

```bash
hdfs namenode -format
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
```

If the format output contains "has been successfully formatted.", the format succeeded.
Sync the metadata to the standby NameNode on slave1:

```bash
hdfs namenode -bootstrapStandby
```
Initialize the ZKFC state in ZooKeeper on master1 (see the sketch below). If the output contains "Successfully created /hadoop-ha/mycluster in ZK." (the znode is named after the nameservice), the ZooKeeper initialization succeeded.
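The command for this step is missing from the post as extracted; in Hadoop 2.x the ZKFC state is normally initialized with:

```bash
# Run once on one of the NameNode hosts (master1 here) while ZooKeeper is running
hdfs zkfc -formatZK
```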
Stop the HDFS daemons started so far:

```bash
$HADOOP_HOME/sbin/stop-dfs.sh
```
Start the HDFS service:

```
bennie@master1:~$ $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [master1 slave1]
master1: starting namenode, logging to /opt/bigdata/hadoop-2.7.7/logs/hadoop-bennie-namenode-master1.out
slave1: starting namenode, logging to /opt/bigdata/hadoop-2.7.7/logs/hadoop-bennie-namenode-slave1.out
slave2: starting datanode, logging to /opt/bigdata/hadoop-2.7.7/logs/hadoop-bennie-datanode-slave2.out
slave1: starting datanode, logging to /opt/bigdata/hadoop-2.7.7/logs/hadoop-bennie-datanode-slave1.out
Starting journal nodes [slave1 slave2]
slave2: starting journalnode, logging to /opt/bigdata/hadoop-2.7.7/logs/hadoop-bennie-journalnode-slave2.out
slave1: starting journalnode, logging to /opt/bigdata/hadoop-2.7.7/logs/hadoop-bennie-journalnode-slave1.out
Starting ZK Failover Controllers on NN hosts [master1 slave1]
master1: starting zkfc, logging to /opt/bigdata/hadoop-2.7.7/logs/hadoop-bennie-zkfc-master1.out
slave1: starting zkfc, logging to /opt/bigdata/hadoop-2.7.7/logs/hadoop-bennie-zkfc-slave1.out
```
Use jps to check the Java processes on each machine:

```
bennie@master1:~$ xsync cmd $JAVA_HOME/bin/jps
master1 execute cmd /opt/java/jdk1.8.0_271/bin/jps
7909 DFSZKFailoverController
1530 QuorumPeerMain
7595 NameNode
8015 Jps
slave1 execute cmd /opt/java/jdk1.8.0_271/bin/jps
12469 DFSZKFailoverController
8805 QuorumPeerMain
12550 Jps
12216 DataNode
12329 JournalNode
12125 NameNode
slave2 execute cmd /opt/java/jdk1.8.0_271/bin/jps
8867 JournalNode
8759 DataNode
7227 QuorumPeerMain
8956 Jps
```
Open http://master1:50070 in a browser to check the state of the master1 NameNode, and http://slave1:50070 to check the state of the slave1 NameNode.
Then we kill the NameNode on master1 and look at the slave1 NameNode again: slave1 has now switched to the active state. A command-line sketch of this failover test follows.
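A minimal sketch, assuming the NameNode ids master1 and slave1 configured in hdfs-site.xml (the web UIs show the same states):

```bash
# On master1: find and kill the NameNode process
kill $(jps | awk '$2 == "NameNode" {print $1}')

# Query the HA state of each NameNode; slave1 should now report "active"
hdfs haadmin -getServiceState master1
hdfs haadmin -getServiceState slave1
```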
## Start YARN

```
bennie@master1:~$ $HADOOP_HOME/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/bigdata/hadoop-2.7.7/logs/yarn-bennie-resourcemanager-master1.out
slave2: starting nodemanager, logging to /opt/bigdata/hadoop-2.7.7/logs/yarn-bennie-nodemanager-slave2.out
slave1: starting nodemanager, logging to /opt/bigdata/hadoop-2.7.7/logs/yarn-bennie-nodemanager-slave1.out
```
Go to the slave2 node and start the second ResourceManager there (start-yarn.sh only started the one on master1):

```
bennie@slave2:~$ $HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/bigdata/hadoop-2.7.7/logs/yarn-bennie-resourcemanager-slave2.out
```
Check which ResourceManager is active:

```
bennie@slave2:~$ yarn rmadmin -getServiceState rm1
active
bennie@slave2:~$ yarn rmadmin -getServiceState rm2
standby
```
To switch the active/standby roles manually:

```bash
yarn rmadmin -transitionToActive rm1
yarn rmadmin -transitionToStandby rm1
```
When automatic failover is enabled, a manual switch must be forced:

```bash
yarn rmadmin -transitionToActive rm1 --forcemanual
yarn rmadmin -transitionToStandby rm1 --forcemanual
```
Kill the ResourceManager process on master1 and check the state of rm2 again:

```
bennie@master1:~$ jps
1530 QuorumPeerMain
11291 DFSZKFailoverController
11389 NameNode
12526 Jps
12223 ResourceManager
bennie@master1:~$ kill 12223
bennie@master1:~$ yarn rmadmin -getServiceState rm2
active
```
## References

Hadoop NameNode 高可用 (High Availability) 实现解析