The goal is a Dockerfile that is written once and can be built and run repeatedly, so that others can quickly stand up a Hadoop learning environment.
docker
jdk-8u261-linux-x64 (https://www.oracle.com/java/technologies/javase/javase8u211-later-archive-downloads.html)
hadoop-3.2.2 (https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz)
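hadoop-3.2.2.tar.gz can be fetched straight from the Tsinghua mirror; the Oracle JDK tarball has to be downloaded manually from the page above, since Oracle requires a logged-in session. A minimal sketch:

wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz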
docker pull centos:8
FROM centos:8

# Install the required packages
RUN yum install -y net-tools which openssh-clients openssh-server iproute.x86_64 wget passwd vim \
    # Set the root password
    && echo "root:root" | chpasswd \
    # Add root to sudoers so it can use sudo
    && echo "root ALL=(ALL) ALL" >> /etc/sudoers \
    # SSH will use public/private key authentication, so switch "UsePAM yes" to "no" in sshd_config
    && sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config \
    # Create /var/run/sshd; sshd refuses to start without it
    && mkdir /var/run/sshd \
    # Generate a passphrase-less key pair
    && ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa \
    # Authorize the public key by appending it to authorized_keys
    && cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys \
    # Skip the interactive "yes" host-key confirmation on first SSH login
    && echo "StrictHostKeyChecking no" > ~/.ssh/config \
    && echo "alias ll='ls -l'" >> ~/.bashrc \
    && source ~/.bashrc

EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
docker build -t centos8-ssh:1.0 .
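A quick smoke test of the base image (the ssh-test container name is arbitrary): sshd should accept a key-based login from within the container without prompting.

docker run -d --name ssh-test centos8-ssh:1.0
docker exec ssh-test ssh localhost hostname   # should print the container's hostname, no password prompt
docker rm -f ssh-test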
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Address of the NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://nn:8020</value>
    </property>
    <!-- Directory where Hadoop stores its data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/hadoop-3.2.2/data</value>
    </property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>nn:9870</value>
    </property>
    <!-- SecondaryNameNode (2NN) web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>dn2:9868</value>
    </property>
</configuration>
yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Have MapReduce use the shuffle service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Address of the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>dn1</value>
    </property>
    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
The workers file:
nn
dn1
dn2
FROM centos8-ssh:1.0

# The software folder must be in the same directory as this Dockerfile
ARG software_home=software
RUN mkdir $software_home
COPY $software_home/* $software_home/
RUN mkdir /usr/local/java/ \
    && tar -xvf $software_home/jdk-8u261-linux-x64.tar.gz -C /usr/local/java/ \
    && mkdir /usr/local/hadoop/ \
    && tar -xvf $software_home/hadoop-3.2.2.tar.gz -C /usr/local/hadoop/ \
    # Export the JDK/Hadoop environment variables; each intermediate source
    # makes the variables just written available to the next echo
    && echo "export JAVA_HOME=/usr/local/java/jdk1.8.0_261" >> ~/.bashrc \
    && source ~/.bashrc \
    && echo "export JRE_HOME=$JAVA_HOME/jre" >> ~/.bashrc \
    && source ~/.bashrc \
    && echo "export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/bin" >> ~/.bashrc \
    && echo "export HADOOP_HOME=/usr/local/hadoop/hadoop-3.2.2" >> ~/.bashrc \
    && source ~/.bashrc \
    && echo "export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop" >> ~/.bashrc \
    && echo "export PATH=$PATH:$HADOOP_HOME/bin" >> ~/.bashrc \
    && source ~/.bashrc \
    && echo "export PATH=$PATH:$HADOOP_HOME/sbin" >> ~/.bashrc \
    && source ~/.bashrc \
    # Replace the stock configs and start/stop scripts with our own copies
    && rm -rf $HADOOP_CONFIG_HOME/core-site.xml \
    && rm -rf $HADOOP_CONFIG_HOME/hdfs-site.xml \
    && rm -rf $HADOOP_CONFIG_HOME/yarn-site.xml \
    && rm -rf $HADOOP_CONFIG_HOME/workers \
    && rm -rf $HADOOP_HOME/sbin/start-dfs.sh \
    && rm -rf $HADOOP_HOME/sbin/start-yarn.sh \
    && rm -rf $HADOOP_HOME/sbin/stop-dfs.sh \
    && rm -rf $HADOOP_HOME/sbin/stop-yarn.sh \
    && cp $software_home/core-site.xml $HADOOP_CONFIG_HOME \
    && cp $software_home/hdfs-site.xml $HADOOP_CONFIG_HOME \
    && cp $software_home/yarn-site.xml $HADOOP_CONFIG_HOME \
    && cp $software_home/workers $HADOOP_CONFIG_HOME \
    && cp $software_home/start-dfs.sh $HADOOP_HOME/sbin \
    && cp $software_home/start-yarn.sh $HADOOP_HOME/sbin \
    && cp $software_home/stop-dfs.sh $HADOOP_HOME/sbin \
    && cp $software_home/stop-yarn.sh $HADOOP_HOME/sbin \
    && chmod 777 $HADOOP_HOME/sbin/*.sh \
    && chmod 777 $HADOOP_HOME/bin/*
Before building the image, create a software folder alongside the Dockerfile and copy the following files into it (a sketch of assembling it follows the list):
core-site.xml
hdfs-site.xml
yarn-site.xml
workers
start-dfs.sh
start-yarn.sh
stop-dfs.sh
stop-yarn.sh
hadoop-3.2.2.tar.gz
jdk-8u261-linux-x64.tar.gz
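A minimal sketch, assuming the config files and scripts were saved in the current directory next to the downloaded tarballs:

mkdir software
cp core-site.xml hdfs-site.xml yarn-site.xml workers software/
cp start-dfs.sh start-yarn.sh stop-dfs.sh stop-yarn.sh software/
mv hadoop-3.2.2.tar.gz jdk-8u261-linux-x64.tar.gz software/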
Build the image:
docker build -t hqg-hadoop:1.0 .
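To sanity-check the image before starting any containers, a login shell (bash -l) picks up the PATH entries written to ~/.bashrc:

docker run --rm hqg-hadoop:1.0 bash -lc 'java -version && hadoop version'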
docker network create --subnet=172.20.0.0/16 hnet
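Optionally confirm the subnet took effect:

docker network inspect hnet | grep Subnet   # expect "Subnet": "172.20.0.0/16"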
docker run -itd -p 9870:9870 --name=nn --hostname=nn --network=hnet --ip=172.20.1.0 --add-host=dn1:172.20.1.1 --add-host=dn2:172.20.1.2 --privileged hqg-hadoop:1.0 /usr/sbin/init
docker run -itd -p 8088:8088 --name=dn1 --hostname=dn1 --network=hnet --ip=172.20.1.1 --add-host=nn:172.20.1.0 --add-host=dn2:172.20.1.2 --privileged hqg-hadoop:1.0 /usr/sbin/init
docker run -itd --name=dn2 --hostname=dn2 --network=hnet --ip=172.20.1.2 --add-host=nn:172.20.1.0 --add-host=dn1:172.20.1.1 --privileged hqg-hadoop:1.0 /usr/sbin/init
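Once all three containers are up (and sshd has started under systemd), each node should reach the others by hostname and log in without a password, since they all share the key pair baked into the base image:

docker exec nn ssh dn1 hostname   # should print dn1, no password prompt
docker exec nn ssh dn2 hostname   # should print dn2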
hdfs namenode -format
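The format command runs inside the nn container; one way to invoke it from the host is through a login shell so the PATH from ~/.bashrc is in effect:

docker exec -it nn bash -lc 'hdfs namenode -format'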
On the NameNode (nn), run sbin/start-dfs.sh.
When running as root, the following must be added at the top of start-dfs.sh and stop-dfs.sh:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
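With the configs above, a successful start-dfs.sh leaves a NameNode on nn, a SecondaryNameNode on dn2, and a DataNode on every worker. jps (shipped with the JDK) is a quick way to confirm:

docker exec nn bash -lc 'jps'    # expect NameNode and DataNode
docker exec dn2 bash -lc 'jps'   # expect SecondaryNameNode and DataNode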
On the ResourceManager node (dn1), run sbin/start-yarn.sh.
When running as root, the following must be added at the top of start-yarn.sh and stop-yarn.sh:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
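After start-yarn.sh, the ResourceManager should be on dn1 and a NodeManager on every worker:

docker exec dn1 bash -lc 'jps'   # expect ResourceManager and NodeManager
docker exec nn bash -lc 'jps'    # expect NodeManager alongside the HDFS daemons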
HDFS NameNode web UI: http://<VM-or-host-IP>:9870/
YARN ResourceManager web UI: http://<VM-or-host-IP>:8088/
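As an end-to-end check, the examples jar bundled with Hadoop 3.2.2 can run a small wordcount (a sketch; note that without mapreduce.framework.name=yarn in mapred-site.xml, the job falls back to the local runner rather than YARN):

docker exec -it nn bash -lc '
  hdfs dfs -mkdir -p /input &&
  hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /input &&
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /input /output &&
  hdfs dfs -cat /output/part-r-00000
'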