Thursday, 10 January 2013

Hadoop (HDFS) version upgrade without losing data

We have successfully upgraded the following versions:

Hadoop 0.20.205.0 ==> Hadoop 1.0.3

HBase 0.90.4 ==> HBase 0.94.1

 

We followed the steps below and hope they help.

 

Before upgrading HDFS, make sure the existing cluster is working fine and the filesystem is healthy.
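
For example, a quick pre-upgrade health check (the detailed fsck and report logs are captured in the steps below):

# The NameNode should not be stuck in safe mode
hadoop dfsadmin -safemode get

# Summary of capacity and live DataNodes
hadoop dfsadmin -report

# Overall filesystem health; the report should end with HEALTHY
hadoop fsck /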

 1.       Stop all client applications running on the MapReduce cluster.

 stop-mapred.sh

2.       Kill any orphaned task processes on the TaskTrackers.
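
For example, on each TaskTracker node (a rough sketch; the process names assume Hadoop 1.x task JVMs):

# List the Hadoop JVMs on the node; leftover task JVMs show up as Child processes
jps

# Or search the full process table for task JVMs
ps aux | grep org.apache.hadoop.mapred.Child

# Kill a leftover task JVM by its process id (replace <pid> accordingly)
kill -9 <pid>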

 3.       Perform a filesystem check:

hadoop fsck / -files -blocks -locations > dfs-v-old-fsck-1.log

 4.       Save a complete listing of the HDFS namespace to a local file.

hadoop dfs -lsr / > dfs-v-old-lsr-1.log

 5.       Create a list of DataNodes participating in the cluster.

hadoop dfsadmin -report > dfs-v-old-report-1.log

6.       Stop and restart the HDFS cluster (to create a checkpoint of the old version).

 stop-dfs.sh

 start-dfs.sh

7.       Before stopping the dfs, take a backup of the directories used to store the HDFS image and data files

         (the directories specified by the dfs.name.dir and dfs.data.dir properties in conf/hdfs-site.xml).
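
For example (the paths below are assumptions; use the values from your own hdfs-site.xml):

# Back up the namenode image directory (dfs.name.dir) and the datanode data directory (dfs.data.dir)
tar -czf dfs-name-backup.tar.gz /data/hadoop/dfs/name
tar -czf dfs-data-backup.tar.gz /data/hadoop/dfs/data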

8.       Stop the HDFS cluster.

 stop-dfs.sh            

 After you have installed the new Hadoop version
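
(For reference, a minimal sketch of the installation itself; the paths are assumptions:)

# Unpack the new release on every node and point HADOOP_HOME at it
tar -xzf hadoop-1.0.3.tar.gz -C /usr/local/
export HADOOP_HOME=/usr/local/hadoop-1.0.3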

1.       Update the following configuration files in the new installation so they match your existing cluster setup (copying them over from the old installation works, as sketched below):

conf/slaves, conf/masters, conf/core-site.xml, conf/hdfs-site.xml, conf/mapred-site.xml
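
One simple way to carry the settings over (the install paths are assumptions):

# Copy the cluster configuration files from the old version to the new one
for f in slaves masters core-site.xml hdfs-site.xml mapred-site.xml; do
  cp /usr/local/hadoop-0.20.205.0/conf/$f /usr/local/hadoop-1.0.3/conf/$f
done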

 2.       Start the actual HDFS upgrade process.

hadoop-daemon.sh start namenode -upgrade

 3.       Check the upgrade process status

hadoop dfsadmin -upgradeProgress status

This should give you:

Upgrade for version -(new version_no) has been completed.

 Upgrade is not finalized.

4.       Save a new listing of the HDFS namespace.

hadoop dfs -lsr / > dfs-v-new-lsr-0.log

Compare it with the old listing (dfs-v-old-lsr-1.log).
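
For example (the same kind of comparison applies to the fsck and report logs in the next two steps):

# Apart from timestamps, the old and new namespace listings should match
diff dfs-v-old-lsr-1.log dfs-v-new-lsr-0.log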

 5.       Perform a filesystem check

hadoop fsck / -files -blocks -locations > dfs-v-new-fsck-1.log

and compare it with the old one (dfs-v-old-fsck-1.log)
6.       Create a list of DataNodes participating in the cluster.

hadoop dfsadmin -report > dfs-v-new-report-1.log

and compare it with the old one (dfs-v-old-report-1.log)

 7.       Start the HDFS cluster

 start-dfs.sh

 8.       Start the MapReduce cluster

 start-mapred.sh
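
You can verify on each node that the daemons came up (on the master you should typically see NameNode, SecondaryNameNode and JobTracker; on the slaves, DataNode and TaskTracker):

# List the running Hadoop JVMs on this node
jps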

 9.       Finalize the upgrade

        hadoop dfsadmin -finalizeUpgrade