Hadoop Fundamentals - Custom Counters


The Hadoop framework keeps track of a set of built-in counters that it uses to track job progress. As an example, some of the ones displayed in the summary web page in the JobTracker are: the number of bytes read by the input format, the number of launched reduce tasks, the number of data local map tasks, and the number of bytes written by the output format. There is a mechanism that allows user-defined custom counters to be tracked alongside the built-in ones using the existing counter framework.

Creating Custom Counters

In order to create and use a custom counter, the first step is to create an Enum that will contain the names of all custom counters for a particular job. This can be achieved as follows:

enum CustomCounters {RED, BLUE}

Inside the map or reduce task, the counter can be adjusted using the following:

    context.getCounter(CustomCounters.RED).increment(1); // increase the counter by 1
else if(blue)
    context.getCounter(CustomCounters.BLUE).increment(-1); // decrease the counter by 1

Programmatically Retrieving Custom Counter Values

The custom counter values will be displayed alongside the built-in counter values on the summary web page for a job viewed through the JobTracker. In addition to this, the values can be accessed programmatically with the following:

long redCounterValue = job.getCounters().findCounter(CustomCounters.RED);

Increasing the Counter Limit

The Hadoop framework does impose an upper bound on the number of counters that can be tracked per application. This is in place in order to ensure that no single job will accidentally use all available JobTracker memory during the course of its execution. While this may not be an issue for individual MapReduce jobs, some frameworks also make use of this method to keep track of their custom counters. Cascading is one such example, and the details of its Counter logic is described in a previous Spry blog post. Another example is the graph framework Giraph. In either case, if the number of Counters the framework attempts to track exceeds the limit on the number of counters (120 by default), the job will fail.

Fortunately, the property that limits the number of counters is configurable through mapred-site.xml.

In Hadoop 1.0: "mapreduce.job.counters.max".
In Hadoop 2.0: "mapreduce.job.counters.limit".

Where can I go for more information?

Giraph FAQ
Map/Reduce JIRA ticket to make counter limits configurable
External Blog - Limiting Usage Counters In Hadoop
Or feel free to leave your question or comment below!

No comments:

Post a Comment