Apache Giraph: A Brief Troubleshooting Guide

Apache Giraph is a component in the Hadoop ecosystem that provides iterative graph processing on top of MapReduce or YARN. It is the open-source answer to Google's Pregel, and has been around since early 2012. The tool has a very active developer community that contributes to the code base. However, because the emphasis is on making Giraph as stable and robust as possible, there are some gaps in the usage documentation, particularly around interpreting error messages.

This blog post covers some of the most common error messages you might encounter when running the Giraph examples.

Class Not Found Exceptions

When a Giraph class cannot be found on the classpath, the job fails with an error like this one:

Exception in thread "main" java.lang.ClassNotFoundException: org.giraph.examples.SimpleShortestPathsVertex

To avoid this type of error, ensure that the necessary Giraph jars are on the classpath. One way to do this is to copy the jar files from giraph-core/target and giraph-examples/target into the Hadoop lib folder with the following commands:

cp giraph-core/target/*.jar /usr/lib/hadoop/lib
cp giraph-examples/target/*.jar /usr/lib/hadoop/lib
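
If copying jars into the Hadoop lib folder is not an option, another approach is to list them in HADOOP_CLASSPATH, the environment variable that the hadoop launcher script reads. The following is a minimal sketch, assuming the build output sits in the same two target directories as above. The function name giraph_classpath is a hypothetical helper, not something shipped with Giraph or Hadoop, and this only affects the client JVM, so the containers that run the job may still need the jars made available another way.

```shell
# Hypothetical helper: print a colon-separated list of every .jar found
# directly inside the directories passed as arguments.
giraph_classpath() {
  cp_list=""
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      # Skip the unexpanded pattern when a directory holds no jars.
      [ -e "$jar" ] || continue
      cp_list="${cp_list:+$cp_list:}$jar"
    done
  done
  printf '%s\n' "$cp_list"
}

# Assumed layout: run this from the root of the Giraph source tree.
giraph_jars="$(giraph_classpath giraph-core/target giraph-examples/target)"

# Prepend the Giraph jars, keeping anything already in HADOOP_CLASSPATH.
if [ -n "$giraph_jars" ]; then
  HADOOP_CLASSPATH="$giraph_jars${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
  export HADOOP_CLASSPATH
fi
```

Running echo "$HADOOP_CLASSPATH" afterwards shows which jars were picked up. An empty result usually means the build has not been run yet or the command was started from the wrong directory.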

ZooKeeper Null Pointer Exceptions

Giraph currently requires an externally configured ZooKeeper. If the ZooKeeper information is not passed in as an argument when the example is launched, an exception like the following one will be thrown:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.giraph.yarn.GiraphYarnClient.checkJobLocalZooKeeperSupported(GiraphYarnClient.java:460)

To avoid that issue, specify the ZooKeeper address on the command line in the following way:

-Dgiraph.zkList="localhost:2181"
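
Because a missing or malformed value only shows up as a NullPointerException after launch, it can help to check it in the wrapper script first. Here is a small sketch of such a pre-flight check. The zk_option function and the ZK_LIST variable are hypothetical names used for illustration, and the pattern assumes the value is a comma-separated list of host:port pairs.

```shell
# Hypothetical pre-flight check: print the -D option only when the value
# looks like host:port[,host:port...]; otherwise complain and fail.
zk_option() {
  if printf '%s\n' "$1" |
      grep -Eq '^[A-Za-z0-9._-]+:[0-9]+(,[A-Za-z0-9._-]+:[0-9]+)*$'; then
    printf '%s\n' "-Dgiraph.zkList=$1"
  else
    echo "zkList must look like host:port[,host:port...], got: '$1'" >&2
    return 1
  fi
}

# ZK_LIST is an assumed variable name; it falls back to the address
# used in this post when nothing else is set.
ZK_LIST="${ZK_LIST:-localhost:2181}"
ZK_OPT="$(zk_option "$ZK_LIST")" || echo "refusing to launch without ZooKeeper" >&2
```

The resulting $ZK_OPT can then be placed among the -D options of the launch command. The check only validates the shape of the value and does not confirm that ZooKeeper is actually listening at that address.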

Processing Hangs at "Wait To Finish .."

When setting up the application, Giraph will attempt to launch all of the containers it needs at once. If the system is unable to accommodate the container requests, Giraph will sit at the "Wait To Finish" stage until enough resources are free. In an environment where there will never be enough resources to accommodate the requests, the application will appear to hang at this stage.

To address this situation, increase the memory specified in the following configuration properties, in Ambari or a similar management tool, until the containers are able to start up successfully:

yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
resourcemanager_heapsize
nodemanager_heapsize
yarn.nodemanager.resource.memory-mb
yarn_heapsize
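
To get a rough idea of whether the requests can ever be satisfied, the arithmetic can be sketched in a few lines of shell. This is a simplification under stated assumptions: it assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb, that every node offers the same yarn.nodemanager.resource.memory-mb, and it ignores the container used by the application master. The function name containers_fit and the sample numbers are made up for illustration.

```shell
# Hypothetical helper.
# Usage: containers_fit REQUESTED CONTAINER_MB MIN_ALLOC_MB NODE_MB NODES
# Succeeds when REQUESTED containers can all be running at the same time.
containers_fit() {
  requested=$1; container_mb=$2; min_alloc_mb=$3; node_mb=$4; nodes=$5

  # Round the request up to the next multiple of the minimum allocation.
  rounded_mb=$(( (container_mb + min_alloc_mb - 1) / min_alloc_mb * min_alloc_mb ))

  # Whole containers per node, multiplied by the number of nodes.
  capacity=$(( node_mb / rounded_mb * nodes ))

  echo "container=${rounded_mb}MB capacity=${capacity} requested=${requested}"
  [ "$requested" -le "$capacity" ]
}

# Example with invented numbers: 4 containers of 1500 MB on a single
# node that offers 2048 MB to YARN, with a 1024 MB minimum allocation.
containers_fit 4 1500 1024 2048 1 || echo "not enough memory: the job would wait indefinitely"
```

If the check fails for the real cluster values, either the memory offered to YARN has to go up or the job has to ask for fewer or smaller containers.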

"Unable to delete file /tmp/giraph-conf.xml"

Even if the hadoop.tmp.dir property is set to something other than the default in core-site.xml, Giraph will attempt to write its temporary files into the /tmp directory. If the permissions are not set appropriately, this will result in the following error:

WARN yarn.YarnUtils: Unable to delete file /tmp/giraph-conf.xml

To fix this problem, make sure that the user Giraph runs as has the required permissions on the local file system, for example through its group permissions. In most cases this user will be "yarn", but the username is also printed in the logs in this message:

Setting username in ContainerLaunchContext to: yarn

To ensure that the correct file permissions are set on the /tmp folder, follow the instructions described on this page.
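
As a quick check, the mode of the directory can be inspected from a script. The sketch below assumes the conventional Linux setup in which /tmp is world-writable with the sticky bit set (mode 1777, shown by ls as drwxrwxrwt). The function name tmp_mode_ok is a hypothetical helper, and a site with a stricter policy may legitimately use a different mode.

```shell
# Hypothetical helper: succeed when the directory has mode 1777,
# which ls -ld displays as "drwxrwxrwt".
tmp_mode_ok() {
  # Keep only the 10 mode characters; some systems append "." or "+".
  mode="$(ls -ld "$1" | cut -c1-10)"
  [ "$mode" = "drwxrwxrwt" ]
}

if tmp_mode_ok /tmp; then
  echo "/tmp has the conventional 1777 mode"
else
  echo "/tmp does not have mode 1777; current entry:"
  ls -ld /tmp
fi

# A leftover file owned by a different user can also block deletion,
# so it is worth looking at who owns it:
#   ls -l /tmp/giraph-conf.xml
```

Changing the mode of /tmp requires root privileges, so the sketch only reports the current state and leaves the fix to an administrator.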

Where can I go for more information?


Or feel free to leave your question or comment below!
