I prepared a physical server for this installation. I installed CentOS 5.6 and the Xen hypervisor, and configured five VMs to host all of the Hadoop nodes.
In my case, there were a master namenode, a secondary namenode, and three datanodes. I allocated 1GB of RAM and 100GB of storage to each VM. I didn't worry about performance in this environment, because my priority was to better understand how Hadoop works.
1. Add a Cloudera repository file to the /etc/yum.repos.d/ directory.
If you don't have a Cloudera repo file there, you can create a new one. For example, create a file named "cloudera-cdh3.repo" and save the following lines in it.
[cloudera-cdh3]
name=Cloudera's Distribution for Hadoop, Version 3
mirrorlist=http://archive.cloudera.com/redhat/cdh/3/mirrors
gpgkey = http://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1
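If yum later complains that it cannot verify the package signatures, you can also import Cloudera's GPG key manually first (optional; this is the same URL as the gpgkey entry above):
$ sudo rpm --import http://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera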
Now you can search for and install the Hadoop packages via yum:
$ yum search hadoop-0.20
$ yum install hadoop-0.20
2. Install components
The Hadoop package set is split into several daemon types:
- namenode
- datanode
- secondarynamenode
- jobtracker
- tasktracker
You can install each one with yum as below:
$ yum install hadoop-0.20-<daemon type>
* Before starting the installation, it is a good idea to create a dedicated user to control Hadoop, for security reasons. I created a user named "huser" and gave it the privileges needed for Hadoop-related jobs.
# Add a user
$ useradd huser
# Allow members of the "huser" group to run superuser commands without a password.
$ vi /etc/sudoers
%huser ALL=(ALL) NOPASSWD: ALL
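(A note on the edit above: a typo in /etc/sudoers can lock you out of sudo entirely, so instead of plain vi it is safer to open the file with visudo, which checks the syntax before saving:)
$ visudo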
# Check /etc/hosts (never edit the first line).
127.0.0.1 localhost.localdomain localhost
XXX.XXX.XXX.171 name01.hadoop.com name01
XXX.XXX.XXX.172 name02.hadoop.com name02
XXX.XXX.XXX.173 node01.hadoop.com node01
XXX.XXX.XXX.174 node02.hadoop.com node02
XXX.XXX.XXX.175 node03.hadoop.com node03
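Every node should resolve these hostnames the same way. One simple way to keep /etc/hosts in sync (a sketch, assuming root SSH access to each host) is to push the file from the master:
$ for h in name02 node01 node02 node03; do scp /etc/hosts root@$h:/etc/hosts; done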
# Generate SSH key on master node and slave nodes
$ ssh-keygen -t rsa
generating public/private rsa key pair.
Enter file in which to save the key (~/.ssh/id_rsa):
Creating directory '~/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your public key has been saved in ~/.ssh/id_rsa.pub.
The key fingerprint is:
$ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
# Copy the master node's SSH public key to every slave (here, that includes the secondary namenode and the datanodes).
# For example, for node01 (repeat for node02, node03 and name02):
$ scp ~/.ssh/id_rsa.pub huser@node01:~/master_id_rsa.pub
$ ssh huser@node01 'cat ~/master_id_rsa.pub >> ~/.ssh/authorized_keys'
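After the key is in place, check that the master can log in to each slave without a password; the command below should print the slave's hostname without prompting for anything:
$ ssh huser@node01 hostname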
2.1) On the namenode:
$ sudo yum install hadoop-0.20
$ sudo yum install hadoop-0.20-namenode
$ sudo yum install hadoop-0.20-jobtracker
2.2) On the datanode:
$ sudo yum install hadoop-0.20
$ sudo yum install hadoop-0.20-datanode
$ sudo yum install hadoop-0.20-tasktracker
2.3) On the secondarynamenode:
$ sudo yum install hadoop-0.20
Originally I expected to have to install the secondarynamenode daemon on this node as well, but it worked fine without installing that package.
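If you prefer to run it as a managed service anyway, CDH3 provides a package for it that follows the same naming pattern as the other daemons:
$ sudo yum install hadoop-0.20-secondarynamenode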
After installation, the relevant directories are as follows:
- Hadoop home: /usr/lib/hadoop-0.20
- JDK home: /usr/java/jdk1.6.0_29
$ sudo chown -R huser:huser /usr/lib/hadoop-0.20
$ sudo chown -R huser:huser /etc/hadoop-0.20/conf
$ sudo chown -R huser:huser /var/log/hadoop-0.20
$ sudo chown -R huser:huser /var/run/hadoop-0.20
You also need to create a /hadoop directory for the HDFS and MapReduce data (on every node), and give huser ownership of it:
$ sudo mkdir /hadoop
$ sudo chown huser:huser /hadoop
3. Hadoop configuration
Go to the /usr/lib/hadoop-0.20/conf directory and modify the Hadoop configuration files.
3.1) hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_29
export HADOOP_HOME=/usr/lib/hadoop-0.20
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
export HADOOP_PID_DIR=${HADOOP_HOME}/pids
3.2) core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://name01.hadoop.com:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
</property>
3.3) hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
3.4) mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>name01.hadoop.com:9001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/hadoop/mapred/local</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/hadoop/mapred/system</value>
</property>
3.5) slaves
node01
node02
node03
3.6) masters
name02
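These configuration files have to be identical on every node. A simple way to distribute them (a sketch, assuming the same directory layout and the huser account on every host) is to copy the conf directory from the master:
$ for h in name02 node01 node02 node03; do scp /usr/lib/hadoop-0.20/conf/* huser@$h:/usr/lib/hadoop-0.20/conf/; done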
4. Format namenode
$ hadoop namenode -format
5. Start hadoop on the master namenode
Go to /usr/lib/hadoop-0.20/bin and then execute ./start-all.sh
$ ./start-all.sh
starting namenode, logging to .....
node01: starting datanode, logging to ......
node03: starting datanode, logging to ......
node02: starting datanode, logging to ......
name02: starting secondarynamenode, logging to .....
starting jobtracker, logging to .....
node01: starting tasktracker, logging to ....
node02: starting tasktracker, logging to ....
node03: starting tasktracker, logging to ....
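You can check which daemons are actually running on each node with jps, which ships with the JDK (on the master you should see NameNode and JobTracker, on the datanodes DataNode and TaskTracker):
$ /usr/java/jdk1.6.0_29/bin/jps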
# View MapReduce job on the web browser
http://<ip address>:50030
# View HDFS on the web browser
http://<ip address>:50070
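# You can also confirm from the command line that all three datanodes have registered with the namenode
$ hadoop dfsadmin -report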