How do I configure Hadoop to start automatically on CentOS?
Asked by: a user
Posted: 2022-04-21 17:41
Answered by: a helpful user
Time: 2022-04-12 13:08
No. | Item | Description
1 | Operating system | CentOS 6.5
2 | Java environment | JDK 1.7
3 | Hadoop version | Hadoop 2.2.0
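Installation steps:
No. | Step
1 | Extract Hadoop and configure the environment variables
2 | Run which hadoop to check that step 1 worked
3 | Configure core-site.xml
4 | Configure hdfs-site.xml
5 | Configure yarn-site.xml (optional; the defaults also work)
6 | Configure mapred-site.xml
7 | Set the Java path in mapred-env.sh
8 | Create the directories HDFS needs before formatting: Hadoop's tmp directory, plus the namenode and datanode directories that hold their respective data
9 | Format HDFS with hadoop namenode -format (in 2.x this command is deprecated in favor of hdfs namenode -format, but it still works)
10 | Start the pseudo-distributed cluster with sbin/start-all.sh (deprecated in 2.x; it just runs start-dfs.sh and start-yarn.sh), then list the Java processes with jps
11 | Edit the local hosts file to map the hostname to its IP address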
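Steps 1 and 2 come down to putting Hadoop's bin and sbin directories on PATH and checking that the shell can find the hadoop launcher. Below is a minimal Python sketch of the same check. It assumes the export lines go into /etc/profile or ~/.bashrc and are reloaded with source. Any install path you pass in (for example /usr/local/hadoop-2.2.0) is also an assumption; the answer does not specify one.

```python
import os
import shutil

def profile_exports(hadoop_home):
    """Step 1: lines to append to /etc/profile or ~/.bashrc (assumed locations)."""
    return [
        f"export HADOOP_HOME={hadoop_home}",
        "export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin",
    ]

def find_hadoop(hadoop_home, base_path=None):
    """Step 2: emulate `which hadoop` once HADOOP_HOME/bin and sbin are on PATH.

    Returns the full path of the hadoop launcher, or None if it cannot be found.
    """
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    dirs = [d for d in base_path.split(os.pathsep) if d]  # drop empty PATH entries
    dirs += [os.path.join(hadoop_home, "bin"), os.path.join(hadoop_home, "sbin")]
    return shutil.which("hadoop", path=os.pathsep.join(dirs))
```

For example, find_hadoop("/usr/local/hadoop-2.2.0") returns something like /usr/local/hadoop-2.2.0/bin/hadoop when the install is in place, and None otherwise.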
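Here are the details of each step.
Before starting, it's best to set up passwordless SSH login on this machine. The start scripts launch each daemon over SSH, so without it they prompt for a password every time.
Below is the configuration for each XML file: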
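Here is a minimal Python sketch of that passwordless-SSH setup. It assumes an RSA key at the default ~/.ssh/id_rsa location and that ssh-keygen is installed. The sketch creates the key pair with an empty passphrase only if one does not already exist. It then appends the public key to ~/.ssh/authorized_keys exactly once and sets that file to mode 600, because sshd normally refuses an authorized_keys file that other users can write to.

```python
import os
import subprocess
from pathlib import Path

def ensure_key(private_key: Path) -> Path:
    """Create an RSA key pair with an empty passphrase if missing; return the .pub path.

    RSA and the key location are assumptions; ssh-keygen must be on PATH.
    """
    public_key = private_key.with_name(private_key.name + ".pub")
    if not private_key.exists():
        private_key.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
        # -N "" gives an empty passphrase, -q suppresses the progress output
        subprocess.run(
            ["ssh-keygen", "-q", "-t", "rsa", "-N", "", "-f", str(private_key)],
            check=True,
        )
    return public_key

def authorize_key(public_key: Path, authorized_keys: Path) -> None:
    """Append the public key to authorized_keys once, then restrict it to 0600."""
    key = public_key.read_text().strip()
    text = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key not in text.splitlines():
        if text and not text.endswith("\n"):
            text += "\n"  # keep the previous last key on its own line
        authorized_keys.write_text(text + key + "\n")
    os.chmod(authorized_keys, 0o600)
```

Usage: authorize_key(ensure_key(Path.home() / ".ssh" / "id_rsa"), Path.home() / ".ssh" / "authorized_keys"). After that, ssh localhost should log in without asking for a password.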
<!-- core-site.xml -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.46.28:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop/tmp</value>
</property>
</configuration>
<!-- hdfs-site.xml -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/root/hadoop/nddir</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/root/hadoop/dddir</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
<!-- yarn-site.xml: no changes needed; the default properties are sufficient -->
<configuration>
</configuration>
<!-- mapred-site.xml -->
<configuration>
<property>
<name>mapreduce.cluster.temp.dir</name>
<value></value>
<description>No description</description>
<final>true</final>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value></value>
<description>No description</description>
<final>true</final>
</property>
</configuration>
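All four XML files share one layout: a configuration root containing property elements, each with name and value children. If you generate them from a dict with Python's standard xml.etree.ElementTree, misspelled tags are avoided. Property-name typos are still possible, so keep them in one place. This is a sketch, not part of the original procedure. The core-site and hdfs-site values come from this answer, written with the non-deprecated 2.x names fs.defaultFS and dfs.permissions.enabled. The mapred-site and yarn-site entries are assumptions that go beyond the answer. Hadoop 2.x runs MapReduce jobs in local mode unless mapreduce.framework.name is yarn. The NodeManager also needs the mapreduce_shuffle auxiliary service for MapReduce jobs on YARN.

```python
import xml.etree.ElementTree as ET

def write_hadoop_config(path, properties):
    """Write a Hadoop *-site.xml: <configuration><property><name/><value/></property>...</configuration>."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print (Python 3.9+)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

CORE_SITE = {
    "fs.defaultFS": "hdfs://192.168.46.28:9000",
    "hadoop.tmp.dir": "/root/hadoop/tmp",
}
HDFS_SITE = {
    "dfs.replication": 1,
    "dfs.namenode.name.dir": "/root/hadoop/nddir",
    "dfs.datanode.data.dir": "/root/hadoop/dddir",
    "dfs.permissions.enabled": "false",
}
# Assumptions, not in the original answer: run MapReduce on YARN rather than in
# local mode, and enable the shuffle service MapReduce needs on the NodeManager.
MAPRED_SITE = {"mapreduce.framework.name": "yarn"}
YARN_SITE = {"yarn.nodemanager.aux-services": "mapreduce_shuffle"}
```

Usage: write_hadoop_config("etc/hadoop/hdfs-site.xml", HDFS_SITE), run from the Hadoop root directory. In 2.x the configuration files live under etc/hadoop.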
Configuration in mapred-env.sh:
export JAVA_HOME=/usr/local/jdk
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
#export HADOOP_JOB_HISTORYSERVER_OPTS=
#export HADOOP_MAPRED_LOG_DIR="" # Where log files are stored. $HADOOP_MAPRED_HOME/logs by default.
#export HADOOP_JHS_LOGGER=INFO,RFA # Hadoop JobSummary logger.
#export HADOOP_MAPRED_PID_DIR= # The pid files are stored. /tmp by default.
#export HADOOP_MAPRED_IDENT_STRING= #A string representing this instance of hadoop. $USER by default
#export HADOOP_MAPRED_NICENESS= #The scheduling priority for daemons. Defaults to 0.
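Step 7 only helps if JAVA_HOME points at a real JDK. A quick check is to read the JAVA_HOME value back out of mapred-env.sh and confirm that $JAVA_HOME/bin/java exists. The helper below is a sketch with a deliberately narrow assumption: it only understands a plain export JAVA_HOME=... line with no variable expansion. It also assumes the usual 2.x layout, where the file sits at etc/hadoop/mapred-env.sh under the Hadoop root.

```python
import os
import re

# Matches an uncommented `export JAVA_HOME=...` line, optionally quoted.
# Assumption: no $VAR expansion or command substitution in the value.
_JAVA_HOME_RE = re.compile(r"""^[ \t]*export[ \t]+JAVA_HOME=["']?([^"'\s#]+)""", re.MULTILINE)

def java_home_from_env_script(text):
    """Return the last uncommented JAVA_HOME value in the script (last one wins, as in sh), or None."""
    matches = _JAVA_HOME_RE.findall(text)
    return matches[-1] if matches else None

def java_binary_ok(java_home):
    """True if $JAVA_HOME/bin/java exists and is executable."""
    java = os.path.join(java_home, "bin", "java")
    return os.path.isfile(java) and os.access(java, os.X_OK)
```

With the file above, java_home_from_env_script returns /usr/local/jdk. java_binary_ok("/usr/local/jdk") then tells you whether that JDK is actually installed there.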
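Next, create the corresponding directories. Their paths must match the values in the configuration files; in this setup they all live under /root/hadoop (tmp, nddir and dddir).
Then, from the Hadoop 2.2.0 root directory, run the format command bin/hadoop namenode -format (bin/hdfs namenode -format is the non-deprecated 2.x form).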
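The directories to create are exactly the ones named by hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir. If they drift from the XML, HDFS ends up using different locations. The sketch below reads those three keys from core-site.xml and hdfs-site.xml and creates the directories. It assumes plain local paths: a file:// prefix is stripped, comma-separated lists are split, and values that still contain ${...} variables are skipped rather than expanded. The function names are my own, not part of Hadoop.

```python
import os
import xml.etree.ElementTree as ET

# Storage-related keys from this answer's core-site.xml and hdfs-site.xml.
DIR_KEYS = ("hadoop.tmp.dir", "dfs.namenode.name.dir", "dfs.datanode.data.dir")

def read_properties(xml_path):
    """Map property name -> value for one Hadoop *-site.xml file."""
    root = ET.parse(xml_path).getroot()
    return {p.findtext("name"): (p.findtext("value") or "") for p in root.iter("property")}

def local_dirs(props):
    """Local directories named by DIR_KEYS (file:// stripped, comma lists split)."""
    dirs = []
    for key in DIR_KEYS:
        for entry in props.get(key, "").split(","):
            entry = entry.strip()
            if entry.startswith("file://"):
                entry = entry[len("file://"):]
            if entry and "${" not in entry:  # skip values needing variable expansion
                dirs.append(entry)
    return dirs

def create_dirs(conf_dir):
    """Create every directory referenced by core-site.xml and hdfs-site.xml in conf_dir."""
    props = {}
    for name in ("core-site.xml", "hdfs-site.xml"):
        props.update(read_properties(os.path.join(conf_dir, name)))
    dirs = local_dirs(props)
    for d in dirs:
        os.makedirs(d, exist_ok=True)
    return dirs
```

Usage: create_dirs("/usr/local/hadoop-2.2.0/etc/hadoop"), where the install path is assumed. Then run the format command once. Formatting erases any existing NameNode metadata, so don't repeat it on a cluster that already holds data.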
Finally, run jps to list the Java processes. If the following processes appear, the pseudo-distributed deployment succeeded:
4887 NodeManager
4796 ResourceManager
4661 SecondaryNameNode
4524 DataNode
4418 NameNode
6122 Jps
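You can automate the jps check: each output line has the form "pid MainClass", so collect the class names and compare them against the five daemons listed above. A minimal sketch, assuming jps (shipped with the JDK) is on PATH:

```python
import subprocess

# The daemons a healthy pseudo-distributed Hadoop 2.x node shows in jps.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode", "ResourceManager", "NodeManager"}

def jps_output():
    """Run the JDK's jps tool (assumed to be on PATH) and return its stdout."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout

def running_daemons(output):
    """Parse '<pid> <MainClass>' lines into a set of class names."""
    names = set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            names.add(parts[1])
    return names

def missing_daemons(output):
    """Sorted list of expected daemons absent from the jps output."""
    return sorted(EXPECTED_DAEMONS - running_daemons(output))
```

An empty result from missing_daemons(jps_output()) means everything started. Otherwise, check the logs directory under the Hadoop root for the daemons it names.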
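Next, open the web UIs, using the same host address as in the configuration files. The NameNode web UI is still on port 50070. The 1.x JobTracker UI on port 50030 no longer exists; use the YARN ResourceManager UI on port 8088 instead (screenshots omitted).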
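Before opening a browser, you can check that the UIs are listening with a plain TCP connect to 50070 and 8088. The sketch below uses only the standard library. The host is an assumption: pass the address from core-site.xml (192.168.46.28 in this answer) or the hostname mapped in /etc/hosts.

```python
import socket

# Web UI ports from this answer: NameNode 50070, YARN ResourceManager 8088.
WEB_UI_PORTS = {"NameNode": 50070, "ResourceManager": 8088}

def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_web_uis(host):
    """Map each UI name to (url, reachable?)."""
    return {
        name: (f"http://{host}:{port}/", port_open(host, port))
        for name, port in WEB_UI_PORTS.items()
    }
```

For example, check_web_uis("192.168.46.28") returns the two URLs along with whether each one accepted a connection.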
OK, the pseudo-distributed deployment is now complete. Next, we'll run the classic MapReduce "Hello World" program to test the cluster.