The Hadoop pseudo setup is based on this doc:
In a nutshell:
1) Make sure the Hadoop pseudo-distributed config package is installed:
$ sudo yum install hadoop-0.20-conf-pseudo
2) I had some problems where /var directories important for Hadoop did not exist. So make sure these directories exist and are writeable:
lrwxrwxrwx 1 root root 20 Apr 8 2011 /usr/lib/hadoop-0.20/pids -> /var/run/hadoop-0.20
lrwxrwxrwx 1 root root 20 Apr 8 2011 /usr/lib/hadoop-0.20/logs -> /var/log/hadoop-0.20
$ ll /var/lock/subsys/
drwxrwxrwx 2 root root 4096 Dec 22 00:41 subsys
I took the easy way out and just chmod'd them:
$ sudo chmod 777 /var/run/hadoop-0.20 /var/log/hadoop-0.20
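A quick way to confirm the directories are in place is a small shell check. This is only a sketch: check_dirs is a made-up helper name, and it tests writability for whichever user runs it, which may not be the user the Hadoop daemons run as.

```shell
#!/bin/sh
# check_dirs (hypothetical helper): report whether each directory given as an
# argument exists and is writable by the current user.
# Returns non-zero if any directory fails a check.
check_dirs() {
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "MISSING: $dir"
            rc=1
        elif [ ! -w "$dir" ]; then
            echo "NOT WRITABLE: $dir"
            rc=1
        else
            echo "OK: $dir"
        fi
    done
    return $rc
}
```

For the directories above, it could be called like this:
$ check_dirs /var/run/hadoop-0.20 /var/log/hadoop-0.20 /var/lock/subsys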
After that, you should be able to start the services:
linux-z6tw:/var/log # for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
Starting Hadoop datanode daemon (hadoop-datanode): done
starting datanode, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-datanode-linux-z6tw.out
Starting Hadoop jobtracker daemon (hadoop-jobtracker): done
starting jobtracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-jobtracker-linux-z6tw.out
Starting Hadoop namenode daemon (hadoop-namenode): done
starting namenode, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-namenode-linux-z6tw.out
Starting Hadoop secondarynamenode daemon (hadoop-secondarynamenode): done
starting secondarynamenode, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-secondarynamenode-linux-z6tw.out
Starting Hadoop tasktracker daemon (hadoop-tasktracker): done
starting tasktracker, logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-tasktracker-linux-z6tw.out
Always look at the logs to make sure all the daemons are working:
sodo@linux-z6tw:/var/log/hadoop-0.20> ls -ltr
-rw-r--r-- 1 mapred mapred 394764 Dec 27 14:25 hadoop-hadoop-jobtracker-linux-z6tw.log
-rw-r--r-- 1 hdfs hdfs 789914 Dec 27 14:25 hadoop-hadoop-namenode-linux-z6tw.log
-rw-r--r-- 1 hdfs hdfs 536726 Dec 27 14:25 hadoop-hadoop-datanode-linux-z6tw.log
-rw-r--r-- 1 mapred mapred 2524526 Dec 27 14:25 hadoop-hadoop-tasktracker-linux-z6tw.log
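Rather than reading each log by eye, you can count the lines logged at ERROR or FATAL level. The sketch below assumes the usual log4j layout where the level appears as a space-delimited word on each line; scan_logs is a made-up helper name, and the default directory is the one from the listing above.

```shell
#!/bin/sh
# scan_logs (hypothetical helper): for every *.log file in a directory, print
# the number of lines containing an ERROR or FATAL level, then the file name.
# Assumes the level is surrounded by spaces, as in a typical log4j layout.
scan_logs() {
    log_dir=${1:-/var/log/hadoop-0.20}
    for log in "$log_dir"/*.log; do
        [ -f "$log" ] || continue
        # grep -c exits non-zero when the count is 0, so ignore its status
        count=$(grep -c -E ' (ERROR|FATAL) ' "$log" || true)
        echo "$count $log"
    done
}
```

Running scan_logs with no argument checks /var/log/hadoop-0.20. Any file with a non-zero count is worth opening.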
Also, view the name node and job tracker status interfaces as outlined here:
One of them being the jobtracker:
Estimated value of Pi is 3.14118000000000000000
1) Hadoop may hang if you have an incorrect /etc/hosts entry
Since I didn't have a DHCP reservation for my machine's IP, the IP address changed and the name node was sending packets out my gateway. Hardcoding an /etc/hosts entry fixed this.
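To see which address a hosts file maps your hostname to, something like the following can help. It is a rough sketch: hosts_ip_for is a made-up helper name, and it only reads the hosts file, so it says nothing about DNS.

```shell
#!/bin/sh
# hosts_ip_for (hypothetical helper): print the first IP address that a hosts
# file maps to the given host name. The file defaults to /etc/hosts.
hosts_ip_for() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v name="$name" '
        {
            sub(/#.*/, "")                 # drop comments (whole-line or trailing)
            for (i = 2; i <= NF; i++)      # fields after the IP are host names
                if ($i == name) { print $1; exit }
        }
    ' "$hosts_file"
}
```

For example:
$ hosts_ip_for "$(hostname)"
If this prints nothing, or an address that no longer matches the one assigned to your network interface, the /etc/hosts entry is stale.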
2) 11/05/02 23:59:47 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: File /blah/blah could only be replicated to 0 nodes
Stupidly, I built my root filesystem with only 8GB of space. So the data node ran out of space when it tried to run any Hadoop job. I got the above error when that happened.
The Hadoop dfsadmin utility is good for diagnosing issues like the above:
sodo@linux-z6tw:/var/log/hadoop-0.20> hadoop dfsadmin -report
Configured Capacity: 33316270080 (31.03 GB)
Present Capacity: 26056040448 (24.27 GB)
DFS Remaining: 21374869504 (19.91 GB)
DFS Used: 4681170944 (4.36 GB)
DFS Used%: 17.97%
Under replicated blocks: 9
Blocks with corrupt replicas: 0
Missing blocks: 6
Datanodes available: 1 (1 total, 0 dead)
Decommission Status : Normal
Configured Capacity: 33316270080 (31.03 GB)
DFS Used: 4681170944 (4.36 GB)
Non DFS Used: 7260229632 (6.76 GB)
DFS Remaining: 21374869504(19.91 GB)
DFS Used%: 14.05%
DFS Remaining%: 64.16%
Last contact: Tue Dec 27 15:11:27 EST 2011
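The lines to watch in that report are the cluster-wide "DFS Remaining", "Missing blocks" and "Datanodes available". As a sketch, they can be pulled out with awk; summarize_dfs_report is a made-up helper name, and it assumes the report layout shown above.

```shell
#!/bin/sh
# summarize_dfs_report (hypothetical helper): read a dfsadmin report on stdin
# and print the cluster-wide remaining bytes, the missing block count and the
# number of available data nodes. Assumes the layout shown in the report above.
summarize_dfs_report() {
    awk '
        /^DFS Remaining:/ && !seen {
            split($3, f, "[(]")            # tolerate "123(19.91 GB)" with no space
            print "remaining_bytes=" f[1]
            seen = 1                       # only the first (cluster-wide) line
        }
        /^Missing blocks:/      { print "missing_blocks=" $3 }
        /^Datanodes available:/ { print "datanodes=" $3 }
    '
}
```

Usage:
$ hadoop dfsadmin -report | summarize_dfs_report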
The resolution was to:
1) add a new filesystem to my virtual machine
2) move the data node /tmp directory to the larger filesystem
3) also move my MySQL installation to that new filesystem (nice instructions here for that:
3) Name node cannot start due to permissions
I moved my /tmp directory to a new filesystem, and the name node did not have the proper permissions to write to the new temp directory. So I set perms like so:
$ chmod 777 /tmp
4) ERROR 1148 (42000) at line 40: The used command is not allowed with this MySQL version
Security issue with local data loads:
You must use "--local-infile" as a parameter to the mysql command line like so:
sodo@linux-z6tw:~/trendingtopics/lib/sql> mysql -u user trendingtopics_development < loadSampleData.sql --local-infile
Hadoop safemode must be disabled
Map Reduce Tutorial
"could only be replicated to 0 nodes" FAQ
HDFS Basics for Developers
Thanks for the detailed information. How are you debugging when an error occurs? Are you using log4j at the HDFS level?
For a basic understanding of HDFS: HDFS Hadoop basics understanding
I list out a few debugging techniques in this post: