Hadoop Fully Distributed Deployment

1. Planning

192.168.10.135 Master

192.168.10.132 Slave1

192.168.10.133 Slave2

Note: SELinux and firewalld have already been disabled on all three nodes.
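
For reference, a minimal sketch of how this is typically done on CentOS 7 (run as root on every node; assumes systemd and the stock SELinux config file):

# systemctl stop firewalld && systemctl disable firewalld               ## stop and disable the firewall
# setenforce 0                                                          ## disable SELinux for the current boot
# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config   ## persist across reboots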

2. Pre-deployment Preparation

a. Add a hadoop user and set its password (on every node)

# useradd hadoop
# echo "your-password" | passwd --stdin hadoop      ## --stdin reads the new password from standard input; replace your-password with your own

b. Grant sudo privileges

# ls -la /etc/sudoers
# chmod u+w /etc/sudoers       ## add write permission
# vi /etc/sudoers
98 root    ALL=(ALL)       ALL
99 hadoop  ALL=(ALL)       ALL         ## add the hadoop user
# chmod u-w /etc/sudoers        ## revoke write permission
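
A quick way to confirm the change took effect (visudo -c validates the file syntax; sudo -l lists the rights granted to the user):

# visudo -c
# su - hadoop -c 'sudo -l'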

c. Install the required software (Java and SSH are needed on every node)

$ sudo yum -y install java-1.7.0-openjdk java-1.7.0-openjdk-devel rsync openssh-server openssh-clients
$ java -version
java version "1.7.0_91"
OpenJDK Runtime Environment (rhel-2.6.2.3.el7-x86_64 u91-b00)
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)

d. Configure passwordless SSH login

Master:

# su -l hadoop                ## switch to the hadoop user
$ ssh-keygen -t rsa -P ""         ## generate a key pair with an empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys     ## append id_rsa.pub to the authorized keys
$ chmod 600 ~/.ssh/authorized_keys     ## fix the permissions
$ scp ~/.ssh/id_rsa.pub [email protected]:~/
$ scp ~/.ssh/id_rsa.pub [email protected]:~/

Slave:

# su -l hadoop                  ## switch to the hadoop user
$ mkdir ~/.ssh
$ chmod 700 ~/.ssh
$ cat ~/id_rsa.pub >> ~/.ssh/authorized_keys       ## append the key to "authorized_keys"
$ chmod 600 ~/.ssh/authorized_keys     ## fix the permissions
$ rm ~/id_rsa.pub               ## remove the copied public key

Test from the Master:

$ ssh localhost      
$ ssh slave1
$ ssh slave2
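
As a side note, the manual copy-and-append steps above can be collapsed into a single command per slave with ssh-copy-id, which ships with openssh-clients (a sketch; it prompts once for the hadoop password on each slave):

$ ssh-copy-id [email protected]     ## appends the local public key to Slave1's authorized_keys and fixes permissions
$ ssh-copy-id [email protected]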

3. Hadoop Deployment

Perform the following steps while logged in as the hadoop user:

a. Download Hadoop

$ wget http://apache.fayea.com/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

b. Extract and install

$ tar -zxvf hadoop-2.7.1.tar.gz
$ sudo mv hadoop-2.7.1 /usr/local/hadoop
$ sudo chown -R hadoop:hadoop /usr/local/hadoop/

c. Configure environment variables

$ vi /home/hadoop/.bashrc
# Hadoop Start
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Hadoop End
$ source /home/hadoop/.bashrc
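
A quick sanity check that the new variables are active in the current shell:

$ echo $HADOOP_HOME        ## should print /usr/local/hadoop
$ hadoop version           ## should report Hadoop 2.7.1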

d. Add hosts entries

$ sudo vi /etc/hosts
192.168.10.135 Master
192.168.10.132 Slave1
192.168.10.133 Slave2
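
The same three entries are needed on the slaves as well, so that every node can resolve every other node by name. Resolution can be spot-checked with:

$ getent hosts Master Slave1 Slave2     ## should echo the IP/hostname pairs above
$ ping -c 1 Slave1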

e. Configure the Master

① Configure core-site.xml:

$ sudo vi /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://Master:9000</value>
        </property>
        <property>
                <name>io.file.buffer.size</name>
                <value>4096</value>
        </property>
</configuration>

Note:

The fs.defaultFS property specifies the address of the default file system's name node (i.e., the NameNode), in the form hdfs://hostname(or IP):port;

The io.file.buffer.size property sets the buffer size that SequenceFiles use for reads and writes, which can reduce the number of I/O operations.

② Configure hdfs-site.xml:

$ sudo vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/usr/local/hadoop/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/usr/local/hadoop/dfs/data</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>Master:50090</value>
        </property>
</configuration>

Note:

The dfs.namenode.name.dir property is the local file system directory where the NameNode stores its namespace and edit-log metadata; it defaults to /tmp/hadoop-{username}/dfs/name;

The dfs.datanode.data.dir property is the local file system directory where a DataNode stores HDFS blocks, written as file://local-directory; it defaults to /tmp/hadoop-{username}/dfs/data;

The dfs.replication property is the number of replicas kept for each HDFS block; it is set to 2 here to match the two DataNodes, since replicas beyond the DataNode count cannot be placed;

The dfs.namenode.secondary.http-address property sets the SecondaryNameNode host and port (this entry can be omitted if no separate SecondaryNameNode role is needed).
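
Since the directories above live under /usr/local/hadoop rather than the default /tmp location, it does no harm to create them up front (a hedged precaution; the NameNode format step will also create them if permissions allow):

$ mkdir -p /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data   ## hadoop already owns /usr/local/hadoop after the earlier chown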

③ Configure mapred-site.xml:

$ sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
$ sudo vi /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
                <final>true</final>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>Master:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>Master:19888</value>
        </property>
</configuration>

Note:

The mapreduce.framework.name property selects the runtime framework for executing MapReduce jobs; it defaults to local and must be set to yarn here.

④ Configure yarn-site.xml:

$ sudo vi /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.acl.enable</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.admin.acl</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Note:

The yarn.nodemanager.aux-services property specifies the auxiliary shuffle service used by MapReduce applications (here mapreduce_shuffle, implemented by org.apache.hadoop.mapred.ShuffleHandler).

⑤ Set the JAVA_HOME installation directory

$ sudo vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
26 export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk

⑥ Specify the slave nodes belonging to the cluster's master (NameNode, ResourceManager)

$ sudo vi /usr/local/hadoop/etc/hadoop/slaves
Slave1
Slave2

⑦ Copy Hadoop to the slaves

$ scp -r /usr/local/hadoop slave1:/usr/local/
$ scp -r /usr/local/hadoop slave2:/usr/local/
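
Two points are easy to overlook here: the hadoop user must be able to write to /usr/local on each slave (otherwise copy to the home directory first and sudo mv it into place on the slave), and the slaves also need the environment variables from step c. A minimal sketch for the latter:

$ scp ~/.bashrc [email protected]:~/
$ scp ~/.bashrc [email protected]:~/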

4. Running Hadoop

a. Format the distributed file system

$ hdfs namenode -format
15/12/21 12:23:49 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master/192.168.10.135     ## host
STARTUP_MSG:   args = [-format]               ## format action
STARTUP_MSG:   version = 2.7.1                ## Hadoop version
......
STARTUP_MSG:   java = 1.7.0_91               ## Java version
************************************************************/
......
15/12/21 12:24:21 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master/192.168.10.135
************************************************************/

b. Start Hadoop

$ start-all.sh
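
start-all.sh simply chains the DFS and YARN start scripts and is marked deprecated in Hadoop 2.x; the explicit equivalents, plus the JobHistory server that start-all.sh does not launch even though mapred-site.xml configures it, are:

$ start-dfs.sh                                    ## NameNode, SecondaryNameNode and DataNodes
$ start-yarn.sh                                   ## ResourceManager and NodeManagers
$ mr-jobhistory-daemon.sh start historyserver     ## serves Master:10020 and the web UI on Master:19888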

c. Check that the daemons started

① Check the background Java processes on each node (the number is the PID, followed by the process name)

Master:

$ jps
9863 NameNode
10459 Jps
10048 SecondaryNameNode
10202 ResourceManager

Slave:

$ jps
2217 NodeManager
2138 DataNode
2377 Jps

② Check DFS usage

$ hdfs dfsadmin -report
Configured Capacity: 39631978496 (36.91 GB)
Present Capacity: 33985548288 (31.65 GB)
DFS Remaining: 33985531904 (31.65 GB)
DFS Used: 16384 (16 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (2):
Name: 192.168.10.132:50010 (Slave1)
Hostname: Slave1
Decommission Status : Normal
Configured Capacity: 19815989248 (18.46 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2777395200 (2.59 GB)
DFS Remaining: 17038585856 (15.87 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.98%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Dec 22 19:27:27 CST 2015
Name: 192.168.10.133:50010 (Slave2)
Hostname: Slave2
Decommission Status : Normal
Configured Capacity: 19815989248 (18.46 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2869035008 (2.67 GB)
DFS Remaining: 16946946048 (15.78 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.52%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Dec 22 19:27:27 CST 2015

③ Check that everything is running via the web UIs

a. Open http://localhost:8088 in a browser to reach the ResourceManager management page

b. Open http://localhost:50070 in a browser to reach the HDFS page

5. Testing and Verification

a. First create the required directories (each level must be created one step at a time; see the one-command alternative below):

$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/hadoop
$ hdfs dfs -mkdir /user/hadoop/input
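
Alternatively, the -p flag creates the whole path in one call, just like mkdir -p on a local file system:

$ hdfs dfs -mkdir -p /user/hadoop/input    ## creates /user and /user/hadoop as needed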

b. Create a test file

$ vi test.txt

hello hadoop

hello World

Hello Java

CentOS System

c. Put the test file into the input directory

$ hdfs dfs -put test.txt /user/hadoop/input
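
A quick check that the upload landed where expected:

$ hdfs dfs -ls /user/hadoop/input
$ hdfs dfs -cat /user/hadoop/input/test.txt    ## should echo the lines above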

d. Run the WordCount example program

$ cd /usr/local/hadoop/
$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /user/hadoop/input /user/hadoop/output
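
One caveat: MapReduce refuses to start if the output directory already exists, so remove it before re-running the job:

$ hdfs dfs -rm -r /user/hadoop/output     ## only needed when re-running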

e. View the generated word counts

$ cd /usr/local/hadoop/
$ hdfs dfs -ls /user/hadoop/output
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2015-12-22 19:54 /user/hadoop/output/_SUCCESS
-rw-r--r--   2 hadoop supergroup         58 2015-12-22 19:53 /user/hadoop/output/part-r-00000
$ hdfs dfs -cat /user/hadoop/output/part-r-00000
CentOS  1
Hello   1
Java    1
System  1
World   1
hadoop  1
hello   2

Details: http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
