Installing Accumulo on Cloudera CDH3 (CentOS 6.3)

If you are familiar with Apache HBase, then you might have heard of Apache Accumulo. It is also a distributed key-value map with a similar architecture. Whereas the approach to Hadoop security can be customized and is often left to the administrators of the Hadoop cluster, Accumulo was developed with the security of the data as a requirement from the beginning. For clients with security concerns, Accumulo may be beneficial. This blog post documents my experience installing Accumulo.

Cloudera cdh3u5
When installing a new Hadoop component, it's always best to check the distro you are using for compatibility. This can save many hours of tracking down things like classpath or version mismatch issues at runtime. I have been using Cloudera cdh3u5, so I checked its release page: http://archive.cloudera.com/cdh/3/  You will see that Accumulo is not distributed with it.

What version of Accumulo?
If you check the Accumulo site, you will see that 1.5.0 is not marked as stable but 1.4.3 is. Since there are many flavors of Hadoop these days, we have to find out which version of Hadoop Accumulo 1.4.3 supports.

What version of Hadoop?
The only place I could find the supported Hadoop version was the README file that ships with each Accumulo version.

https://svn.apache.org/repos/asf/accumulo/tags/1.4.3/README

For Accumulo 1.4.3, the tested Hadoop version is 0.20.2. It is important to note that there isn't really a supported version of Hadoop, since it is the community providing support.
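
To compare the version in the README with what is actually installed, the first line of the `hadoop version` output can be parsed. This is only a minimal sketch: it assumes the output starts with a line of the form `Hadoop <version>` and that any vendor suffix (such as `-cdh3u5`) follows the first hyphen. The function names are my own.

```shell
# Read `hadoop version` output on stdin and print the version string
# from the first line, e.g. "Hadoop 0.20.2-cdh3u5" -> "0.20.2-cdh3u5".
hadoop_version_from_output() {
  awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

# Strip a vendor suffix such as "-cdh3u5" to get the upstream base version.
base_version() {
  echo "${1%%-*}"
}

# Usage on a machine where the hadoop command is on the PATH:
#   installed=$(hadoop version | hadoop_version_from_output)
#   base_version "$installed"
```

If the base version printed matches the one named in the README, you are on the same Hadoop line that Accumulo was tested against.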



Install Accumulo
To install Accumulo 1.4.3, download this tar file:

http://www.apache.org/dyn/closer.cgi/accumulo/1.4.3/accumulo-1.4.3-dist.tar.gz

Then uncompress it to your desired location:

tar xvf accumulo-1.4.3-dist.tar.gz

Install Zookeeper
Like HBase, Accumulo also depends on Zookeeper. However, the Zookeeper jar file is not shipped with Accumulo like it is for some versions of HBase.  If you don't have HBase on the system with a version of Zookeeper that is compatible with Accumulo, then you have to install Zookeeper separately. We want to make sure we get the version of Zookeeper that is compatible with Accumulo.  If you check the Accumulo README file again, it says Accumulo is compatible with Zookeeper version 3.3 or higher.  Luckily Cloudera cdh3u5 is on Zookeeper 3.3.5.
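
The "3.3 or higher" requirement can be checked with a small version comparison. The sketch below is my own helper, not something that ships with Accumulo or Zookeeper. It assumes purely numeric, dot-separated versions and ignores anything after the first hyphen (such as `-cdh3u5`).

```shell
# Return success if ACTUAL >= MINIMUM, comparing dot-separated numeric fields.
# Usage: version_at_least ACTUAL MINIMUM
version_at_least() {
  local actual="${1%%-*}"     # drop a vendor suffix such as -cdh3u5
  local minimum="${2%%-*}"
  local a m
  while [ -n "$actual" ] || [ -n "$minimum" ]; do
    a="${actual%%.*}"         # leading field of each version
    m="${minimum%%.*}"
    # Advance to the remaining fields, or to empty when none are left.
    case "$actual" in *.*) actual="${actual#*.}" ;; *) actual="" ;; esac
    case "$minimum" in *.*) minimum="${minimum#*.}" ;; *) minimum="" ;; esac
    a="${a:-0}"               # a missing field counts as 0
    m="${m:-0}"
    if [ "$a" -gt "$m" ]; then return 0; fi
    if [ "$a" -lt "$m" ]; then return 1; fi
  done
  return 0                    # all fields equal
}

# Usage:
#   version_at_least 3.3.5-cdh3u5 3.3 && echo "Zookeeper version is fine"
```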

To get the archive, download zookeeper-3.3.5-cdh3u5.tar.gz from the Cloudera cdh3 release page linked above.


Then unpack the archive:

tar xvf zookeeper-3.3.5-cdh3u5.tar.gz

Environment variables
Accumulo ships with a shell script where you can add the environment variables, but I prefer setting them globally for all the Hadoop components. In my /etc/profile file, I set the following variables.

export ZOOKEEPER_HOME=/home/spry/cdh3/zookeeper-3.3.5-cdh3u5
export ACCUMULO_HOME=/home/spry/cdh3/accumulo-1.4.3
export ACCUMULO_LOG_DIR=/home/spry/cdh3/accumulo-1.4.3/log
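
After logging in again (or sourcing /etc/profile), it is worth confirming that the variables are really visible in your shell. The function below is a hypothetical helper of my own, not part of Accumulo. It assumes the two *_HOME directories should already exist and only checks that ACCUMULO_LOG_DIR is set.

```shell
# Report whether the variables from /etc/profile are set, and whether
# the two *_HOME variables point at existing directories.
check_accumulo_env() {
  local status=0 name value
  for name in ZOOKEEPER_HOME ACCUMULO_HOME ACCUMULO_LOG_DIR; do
    eval "value=\${$name:-}"          # look up the variable by name
    if [ -z "$value" ]; then
      echo "$name is not set"
      status=1
      continue
    fi
    case "$name" in
      *_HOME)
        if [ ! -d "$value" ]; then
          echo "$name points to a missing directory: $value"
          status=1
          continue
        fi
        ;;
    esac
    echo "$name=$value"
  done
  return "$status"
}

# Usage:
#   check_accumulo_env || echo "fix /etc/profile before continuing"
```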


Starting Accumulo
Of course, Hadoop needs to be started, using:

 $HADOOP_HOME/bin/start-all.sh 

Also make sure Zookeeper is started before starting Accumulo:

cd $ZOOKEEPER_HOME/bin

sudo ./zkServer.sh start
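
To confirm Zookeeper is actually answering before moving on, you can send it the four-letter command `ruok`, to which a healthy server replies `imok`. This is a rough sketch using bash's /dev/tcp redirection. The function name is my own, and the host and port defaults (localhost, 2181) are assumptions; use the clientPort from your zoo.cfg if it differs.

```shell
# Return success if a Zookeeper server at HOST:PORT answers "imok" to "ruok".
# Requires bash, since /dev/tcp is a bash redirection feature.
zk_is_ok() {
  local host="${1:-localhost}" port="${2:-2181}" reply
  # Open a TCP connection on fd 3, send the command, read until the server closes.
  reply=$( { exec 3<>"/dev/tcp/$host/$port" && printf 'ruok' >&3 && cat <&3; } 2>/dev/null )
  [ "$reply" = "imok" ]
}

# Usage:
#   zk_is_ok localhost 2181 && echo "Zookeeper is up" || echo "Zookeeper is not answering"
```

Running `./zkServer.sh status` from the Zookeeper bin directory is another way to check the server.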

You can then start Accumulo with its start-all.sh script. Note that this has the same name as the Hadoop start-all.sh script, so if your Hadoop bin directory is in your path you have to be careful, or the wrong script may run. Likewise, Accumulo has its own stop-all.sh script to stop it, just as Hadoop does.


cd $ACCUMULO_HOME/bin
./start-all.sh
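
If you are unsure which start-all.sh a bare command would pick up, you can list every match on your PATH in lookup order. The helper below is my own illustration; the first line it prints is the script the shell would run.

```shell
# Print every executable file called NAME found on PATH, in lookup order.
# Usage: list_on_path NAME
list_on_path() {
  local name="$1" dir
  local IFS=:                       # split PATH on colons
  for dir in $PATH; do
    [ -n "$dir" ] || dir=.          # an empty PATH entry means the current directory
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      echo "$dir/$name"
    fi
  done
  return 0
}

# Usage:
#   list_on_path start-all.sh
```

In bash, `type -a start-all.sh` gives similar information. Calling the script with an explicit path, as in the commands above, avoids the ambiguity entirely.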


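The first time you run Accumulo, you first have to initialize it. The init command asks for an instance name and the root password for Accumulo, and it has to be run once before the start-all.sh step above: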

$ACCUMULO_HOME/bin/accumulo init

You can then try the Accumulo shell and see the Accumulo version:

$ACCUMULO_HOME/bin/accumulo shell -u root

Verify the installation by visiting this URL to see the Accumulo monitor console:

http://localhost:50095
