Once I had a VM running, I went through the install docs and installed both Cloudera's Distribution for Hadoop (CDH3) Beta 3 and Whirr. Finally, I was able to fire up a cluster using Whirr. In a follow-up post, I describe the Whirr configuration and Hadoop commands and run a quick MapReduce test.
Cloudera's Distribution for Hadoop (CDH3) Beta 3
Following the CDH3 Quick Start Guide (https://docs.cloudera.com/display/DOC/CDH3+Quick+Start+Guide), I:
- installed the JDK (http://www.oracle.com/technetwork/java/javase/downloads/index.html)
- added the Cloudera repository: zypper addrepo -f http://archive.cloudera.com/sles/11/x86_64/cdh/cloudera-cdh3.repo
- installed the pseudo-distributed configuration: zypper install hadoop-0.20-conf-pseudo
- tested the installation
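The install steps above can be collected into a small script. This is a hedged sketch, not part of the CDH3 guide: the function names run and install_cdh3_pseudo and the DRY_RUN switch are hypothetical. By default the script only prints the zypper commands so you can review them, then run them for real as root with DRY_RUN=0. The JDK is still downloaded by hand from Oracle, so the script only warns if java is not on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper script: repeats the CDH3 install steps from this post.
# DRY_RUN=1 (the default) prints each command instead of running it.

CDH3_REPO_URL="http://archive.cloudera.com/sles/11/x86_64/cdh/cloudera-cdh3.repo"

# Print the command (dry run) or execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

install_cdh3_pseudo() {
  # The JDK is installed manually from Oracle's download page; only warn here.
  if ! command -v java >/dev/null 2>&1; then
    echo "warning: java not found on PATH; install the JDK first" >&2
  fi
  # Add the Cloudera CDH3 repository for SLES 11 (-f enables auto-refresh).
  run zypper addrepo -f "$CDH3_REPO_URL"
  # Install Hadoop 0.20 with the pseudo-distributed configuration.
  run zypper install hadoop-0.20-conf-pseudo
}

# Run only when executed directly, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  install_cdh3_pseudo
fi
```

Saved as, say, install-cdh3.sh (a hypothetical name), bash install-cdh3.sh previews the commands and DRY_RUN=0 bash install-cdh3.sh runs them.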
To install Whirr, I followed the instructions here:
https://wiki.cloudera.com/display/DOC/Whirr+Installation
Whirr Notes
ip-10-212-121-180:~ # cat .bashrc
alias whirr='java -jar /usr/lib/whirr/whirr-cli-0.3.0-CDH3B4.jar'
alias whirr-ec2='whirr --identity=058DSRMMTFQMQRER2 --credential=VHmAq9QzCxzKpxhQxBoA5jOxZksq62jpO5mbD'
ip-10-212-121-180:~ # cat .bash_profile
export WHIRR_HOME=/usr/lib/whirr
export AWS_ACCESS_KEY_ID="058DSRMMT"
export AWS_SECRET_ACCESS_KEY="VHmAq9QzCxzKpxhQxBoA5jOxZks"
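Because hadoop.properties reads the credentials from AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, it can help to check that they are exported before launching a cluster. ~/.bash_profile is only read by login shells, so a non-login shell may not have them. A minimal sketch; require_aws_env is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail early if the AWS credentials that
# hadoop.properties reads via ${env:...} are not exported in this shell.
require_aws_env() {
  local var missing=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "error: $var is not set (source ~/.bash_profile?)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In an interactive shell this combines with the alias above as require_aws_env && whirr launch-cluster --config hadoop.properties.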
Hadoop Properties
ip-10-212-121-180:~ # cat hadoop.properties
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 jt+nn,1 dn+tt
whirr.provider=ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
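The instance template 1 jt+nn,1 dn+tt asks Whirr for one node running the JobTracker and NameNode and one node running a DataNode and TaskTracker, which matches the roles reported in the launch output. The ${env:...} and ${sys:...} placeholders are expanded by Whirr's configuration interpolation when it reads the file, not by the shell. If you script the file's creation, quote the here-document delimiter so bash leaves them alone. A sketch; write_whirr_config is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the Whirr config used in this post.
# The quoted 'EOF' delimiter disables shell expansion inside the here-document,
# so ${env:...} and ${sys:...} reach the file literally for Whirr to resolve.
write_whirr_config() {
  local file=${1:-hadoop.properties}
  cat > "$file" <<'EOF'
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 jt+nn,1 dn+tt
whirr.provider=ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
EOF
}
```

With an unquoted EOF, bash would try to expand ${sys:user.home} itself and fail with a substitution error.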
Passphraseless SSH for Localhost
ip-10-212-121-180:~ # ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
*Note: when you SSH into any of the Hadoop cluster members, log in as ec2-user.
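The interactive ssh-keygen session above can also be done non-interactively. This sketch uses a hypothetical helper name, setup_passphraseless_ssh. It creates an RSA key with an empty passphrase (-N "") if none exists and appends the public key to authorized_keys, which is what lets you SSH to localhost without a password in pseudo-distributed mode. The chmod calls follow the usual OpenSSH permission expectations; adjust them if your setup differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive version of the ssh-keygen steps above.
# Usage: setup_passphraseless_ssh [DIR]   (DIR defaults to ~/.ssh)
setup_passphraseless_ssh() {
  local dir=${1:-$HOME/.ssh}
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  # Create an RSA key pair with an empty passphrase, unless one already exists.
  if [ ! -f "$dir/id_rsa" ]; then
    ssh-keygen -q -t rsa -N "" -f "$dir/id_rsa" || return 1
  fi
  # Authorize the public key for logins to this host (only once), so that
  # "ssh localhost" works without a password when DIR is ~/.ssh.
  if ! grep -qxF "$(cat "$dir/id_rsa.pub")" "$dir/authorized_keys" 2>/dev/null; then
    cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
  fi
  chmod 600 "$dir/authorized_keys"
}
```

Because hadoop.properties points whirr.public-key-file at ~/.ssh/id_rsa.pub, the same key works for the cluster nodes, for example ssh -i ~/.ssh/id_rsa ec2-user@184.72.64.110 using a public address from the launch output.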
Launching a Hadoop cluster with Whirr
ip-10-212-121-180:~ # whirr launch-cluster --config hadoop.properties
Bootstrapping cluster
Configuring template
Starting 1 node(s) with roles [tt, dn]
Configuring template
Starting 1 node(s) with roles [jt, nn]
Nodes started: [[id=us-east-1/i-2a55fb45, providerId=i-2a55fb45, tag=myhadoopcluster, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-2a1fec43, os=[name=null, family=amzn-linux, version=2011.02.1, arch=paravirtual, is64Bit=false, description=amzn-ami-us-east-1/amzn-ami-2011.02.1.i386.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.98.103.208], publicAddresses=[184.72.64.110], hardware=[id=m1.small, providerId=m1.small, name=m1.small, processors=[[cores=1.0, speed=1.0]], ram=1740, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=150.0, device=/dev/sda2, durable=false, isBootDevice=false]], supportsImage=Not(is64Bit())]]]
Nodes started: [[id=us-east-1/i-c455fbab, providerId=i-c455fbab, tag=myhadoopcluster, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1], uri=null, imageId=us-east-1/ami-2a1fec43, os=[name=null, family=amzn-linux, version=2011.02.1, arch=paravirtual, is64Bit=false, description=amzn-ami-us-east-1/amzn-ami-2011.02.1.i386.manifest.xml], userMetadata={}, state=RUNNING, privateAddresses=[10.112.27.95], publicAddresses=[67.202.27.150], hardware=[id=m1.small, providerId=m1.small, name=m1.small, processors=[[cores=1.0, speed=1.0]], ram=1740, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=150.0, device=/dev/sda2, durable=false, isBootDevice=false]], supportsImage=Not(is64Bit())]]]
Authorizing firewall
Running configuration script
Configuration script run completed
Running configuration script
Configuration script run completed
Completed configuration of myhadoopcluster
Web UI available at http://ec2-67-202-27-150.compute-1.amazonaws.com
Wrote Hadoop site file /root/.whirr/myhadoopcluster/hadoop-site.xml
Wrote Hadoop proxy script /root/.whirr/myhadoopcluster/hadoop-proxy.sh
Wrote instances file /root/.whirr/myhadoopcluster/instances
Started cluster of 2 instances
Cluster{instances=[Instance{roles=[tt, dn], publicAddress=/184.72.64.110, privateAddress=/10.98.103.208, id=us-east-1/i-2a55fb45}, Instance{roles=[jt, nn], publicAddress=/67.202.27.150, privateAddress=/10.112.27.95, id=us-east-1/i-c455fbab}], configuration={hadoop.job.ugi=root,root, mapred.job.tracker=ec2-67-202-27-150.compute-1.amazonaws.com:8021, hadoop.socks.server=localhost:6666, fs.s3n.awsAccessKeyId=05TNM8DSRMM, fs.s3.awsSecretAccessKey=VhmAq9QzCxzKpxhQxBoA5jOxZksq62jpO5mbD, fs.s3.awsAccessKeyId=058DSRM, hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.SocksSocketFactory, fs.default.name=hdfs://ec2-67-202-27-150.compute-1.amazonaws.com:8020/, fs.s3n.awsSecretAccessKey=VhmAq9QzCxzKpxhQxBoA5jO5mbD}}
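The Cluster{...} line at the end of the launch output carries the client-side settings, including fs.default.name (the NameNode) and mapred.job.tracker (the JobTracker). Whirr also writes these to /root/.whirr/myhadoopcluster/hadoop-site.xml, per the log above, so that file is the more robust source. For a quick lookup from a saved copy of the console output, a rough sketch like this can work. cluster_conf_value is a hypothetical name, and it assumes the comma-separated key=value format shown above, so a value that itself contains a comma (such as hadoop.job.ugi=root,root) is cut at the first comma.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Whirr's "Cluster{... configuration={k=v, ...}}"
# console line on stdin and print the value for one configuration key.
# Values containing commas are truncated at the first comma.
cluster_conf_value() {
  local key=$1
  # Split on commas and braces, trim leading spaces, then match "key=value".
  tr ',{}' '\n\n\n' \
    | sed 's/^ *//' \
    | awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }'
}
```

For example, grep '^Cluster{' launch.log | cluster_conf_value fs.default.name (launch.log being a hypothetical saved copy of the output) would print hdfs://ec2-67-202-27-150.compute-1.amazonaws.com:8020/ for the run above.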
Destroying a Cluster
ip-10-212-121-180:~ # whirr destroy-cluster --config hadoop.properties
Destroying myhadoopcluster cluster
Cluster myhadoopcluster destroyed
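EC2 instances are billed while they run, so it is worth making sure destroy-cluster runs even when a test fails. Shell aliases from ~/.bashrc are not expanded in non-interactive scripts, so this sketch defines a whirr_cli function that calls the same jar as the whirr alias in the Whirr notes. whirr_cli and with_cluster are hypothetical names, and the jar path assumes the CDH3B4 Whirr package layout shown there.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: launch a Whirr cluster, run a command, then destroy it.

# Function equivalent of the "whirr" alias (aliases are not expanded in scripts).
whirr_cli() {
  java -jar "${WHIRR_HOME:-/usr/lib/whirr}/whirr-cli-0.3.0-CDH3B4.jar" "$@"
}

# Usage: with_cluster CONFIG COMMAND [ARGS...]
with_cluster() {
  local config=$1 status=0
  shift
  whirr_cli launch-cluster --config "$config" || return 1
  # Run the workload; remember its exit status instead of stopping on failure.
  "$@" || status=$?
  # Tear the cluster down whether or not the workload succeeded.
  whirr_cli destroy-cluster --config "$config"
  return "$status"
}
```

A call such as with_cluster hadoop.properties ./run-tests.sh (run-tests.sh is hypothetical) returns the workload's exit status. An interrupted script (Ctrl-C) still skips the destroy step; catching that would need a trap.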
More Whirr, Hadoop and MapReduce testing is covered in the follow-up post.
References
CDH3 Beta install and test cases
https://docs.cloudera.com/display/DOC/CDH3+Quick+Start+Guide
Pseudo-distributed mode and passphraseless SSH
http://archive.cloudera.com/cdh/3/hadoop-0.20.2-CDH3B4/single_node_setup.html
Whirr Quick Start Guide
https://cwiki.apache.org/confluence/display/WHIRR/Quick+Start+Guide
http://incubator.apache.org/whirr/quick-start-guide.html (includes sample MapReduce jobs)
Hadoop Shell Commands
Hi. I hope you have changed those secret access keys, or edited them before posting, as otherwise you should change those keys immediately to stop someone else using your login.
-steve
Steve,
Yes, the names have been changed to protect the innocent.
:)
'sodo