Linux tab completion for Hadoop

This post explains how to set up your Linux box so that tab completion works with the hadoop command.

Note that I wasn't able to get Hadoop command-line completion working perfectly on my CentOS box, although it worked on my Ubuntu 12.04 machine. The cause may simply be that the CentOS box is out of date. I documented the issue near the end of this post.



On Ubuntu, the bash-completion package is usually installed along with Bash; if it is not, install it with:
sudo apt-get install bash-completion


Install EPEL repo

src : CentOS / RHEL / Scientific Linux 6 Enable & Install EPEL Repo

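By default, my CentOS 5 machine does not know about the EPEL yum repository, so the epel-release package has to be installed first (the command below assumes the RPM has already been downloaded to the current directory). If you're running a different RHEL version, see the source link above.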
sudo rpm -ivh epel-release-5-4.noarch.rpm

Install Bash completion

src : RHEL / CentOS: Install and Activate Bash Autocomplete Feature
sudo yum install bash-completion
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base:
 * epel:
 * extras:
 * updates:
epel                                                            | 3.6 kB  00:00
epel/primary_db                                                 | 3.8 MB  00:01
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package bash-completion.noarch 1:1.3-7.el5 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

 Package                 Arch      Version        Repository    Size
 bash-completion         noarch    1:1.3-7.el5    epel         216 k

Transaction Summary
Install       1 Package(s)
Upgrade       0 Package(s)

Total download size: 216 k
Is this ok [y/N]: y
Downloading Packages:
bash-completion-1.3-7.el5.noarch.rpm                            | 216 kB     00:00
warning: rpmts_HdrFromFdno: Header V4 DSA signature: NOKEY, key ID 217521f6
epel/gpgkey                                                     | 1.7 kB     00:00
Importing GPG key 0x217521F6 "Fedora EPEL <>" from /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL
Is this ok [y/N]: y
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : bash-completion                                    1/1

  bash-completion.noarch 1:1.3-7.el5


Once installed, the /etc/bash_completion.d/ directory should exist. Log out of the current terminal and log back in before proceeding.
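
A quick way to confirm that the directory is in place is a small check like the one below. This is only a sketch: check_completion_dir is a hypothetical helper name, and the default path is the one mentioned above.

```shell
# Hypothetical helper: report whether a bash-completion drop-in directory exists.
# Defaults to /etc/bash_completion.d when no argument is given.
check_completion_dir() {
    dir="${1:-/etc/bash_completion.d}"
    if [ -d "$dir" ]; then
        echo "$dir exists"
    else
        echo "$dir is missing; bash-completion may not be installed" >&2
        return 1
    fi
}

# Check the default location; "|| true" keeps a missing directory from aborting a script.
check_completion_dir || true
```

If the directory is missing, re-check the package installation before continuing.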

Setup tab completion for Hadoop (0.20.2)

I am using CDH3u6, which installed Hadoop v0.20.2.
Next, get the packaged source code for this release and extract it:
tar zxf hadoop-0.20.2.tar.gz
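
To confirm that the tarball ships the completion code, its contents can be listed without extracting anything. A small sketch, assuming a helper named list_completion_files (a hypothetical name, not part of Hadoop):

```shell
# Hypothetical helper: list the bash-tab-completion entries inside a source tarball.
# Prints nothing and returns non-zero if no such entries are found.
list_completion_files() {
    tar tzf "$1" | grep 'bash-tab-completion/'
}

# Example (run from the directory holding the tarball):
#   list_completion_files hadoop-0.20.2.tar.gz
```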

Copy the Hadoop bash completion script (hadoop.sh in this release) into the /etc/bash_completion.d/ directory (which should already exist) and fix up its permissions.
sudo cp -p hadoop-0.20.2/src/contrib/bash-tab-completion/hadoop.sh /etc/bash_completion.d/
sudo chmod 0644 /etc/bash_completion.d/hadoop.sh
sudo chown root:root /etc/bash_completion.d/hadoop.sh
Log out of the current terminal and log back in before using completion.
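
After logging back in, one way to confirm that the completion was picked up is to ask bash whether a completion spec is registered for the hadoop command. A minimal sketch: has_completion is a hypothetical helper name, while complete -p is a bash builtin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if bash has a completion spec for the given command.
has_completion() {
    complete -p "$1" >/dev/null 2>&1
}

if has_completion hadoop; then
    echo "hadoop completion is registered"
else
    echo "hadoop completion is not registered in this shell"
fi
```

If nothing is registered, sourcing the copied script manually with the "." builtin in the current shell is a quick way to see whether the script itself loads.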

Buggy on CentOS 5.7

Note that I wasn't able to get hadoop command line completion to work perfectly. The command portion works fine, as you can see below.

Bash completion functionality invoked like: "hadoop dfs[TAB][TAB]":

However, completing the option portion can produce awk errors after hitting [TAB][TAB], as seen with the fs and dfsadmin commands.

Bash completion functionality invoked like: "hadoop fs [TAB][TAB]":

Bash completion functionality invoked like: "hadoop dfsadmin [TAB][TAB]":
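
If the awk errors make the bundled script unusable, one possible workaround is a much smaller completion function that only completes the first argument and never parses hadoop output. The following is a sketch under stated assumptions, not Hadoop's own script: the function name _hadoop_simple is hypothetical, and the subcommand list is a hand-maintained, incomplete sample.

```shell
#!/usr/bin/env bash
# Hypothetical fallback: complete only the first argument of `hadoop`
# from a hand-maintained list (illustrative, not exhaustive).
_hadoop_simple() {
    local cur="${COMP_WORDS[COMP_CWORD]}"
    local subcommands="fs dfs dfsadmin fsck job jar distcp version"

    COMPREPLY=()
    if [ "$COMP_CWORD" -eq 1 ]; then
        # compgen -W keeps only the words that start with the typed prefix
        COMPREPLY=( $(compgen -W "$subcommands" -- "$cur") )
    fi
}

# Register the function for hadoop (this replaces any existing spec).
# -o default falls back to filename completion when COMPREPLY is empty.
complete -o default -F _hadoop_simple hadoop
```

With this loaded, "hadoop df[TAB][TAB]" should offer dfs and dfsadmin, while later arguments fall back to ordinary filename completion. Option completion is deliberately left out, which is the trade-off for avoiding the awk parsing.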
