How MapR Uses Its Own Storage Layer and What This Means for Oozie

MapR uses its own distributed file system, MapRFS, in place of HDFS, because MapR believes that HDFS is inherently flawed. As a result, when you run a set of cross-distribution benchmark tests with the Oozie workflow tool, the Oozie configuration file must be modified for each distribution. Notably, you'll need to change the HDFS URLs to match each Hadoop distribution's storage layer.

Here are three equivalent Oozie configuration files, one per distribution, each with its Oozie execution command (the command is identical in all three cases):




Hortonworks

(HDP v1.1.1.16; Oozie v3.1.3)

job.properties file:
oozie.wf.application.path=hdfs://localhost:8020/mnt/WordCount
jobTracker=localhost:50300
nameNode=hdfs://localhost:8020

Oozie execution command:
$ oozie job -run -config /opt/share/WordCount/job.properties -oozie http://localhost:11000/oozie


MapR

(MapR M3 v2.1.1.17042.GA-1; Oozie v3.2.0)

job.properties file:
oozie.wf.application.path=maprfs:/mnt/WordCount
jobTracker=maprfs:///
nameNode=maprfs:///

Oozie execution command:
$ oozie job -run -config /opt/share/WordCount/job.properties -oozie http://localhost:11000/oozie



CDH3u5

(Cloudera CDH3u5; Oozie v2.3.2-cdh3u5)

job.properties file:
oozie.wf.application.path=hdfs://localhost:8020/mnt/WordCount
jobTracker=localhost:8021
nameNode=hdfs://localhost:8020

Oozie execution command:
$ oozie job -run -config /opt/share/WordCount/job.properties -oozie http://localhost:11000/oozie
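
The three files differ only in the file system URL scheme and the JobTracker address, so one way to keep a cross-distribution benchmark consistent is to generate job.properties from a small lookup table. The Python sketch below is an illustration, not part of the original setup: the names DISTRIBUTIONS, app_prefix, and render_job_properties are hypothetical, and the values are simply copied from the three files above.

```python
# Hypothetical helper: renders a job.properties file for one distribution.
# All values are copied from the three job.properties files shown above.
DISTRIBUTIONS = {
    "hortonworks": {
        "app_prefix": "hdfs://localhost:8020",
        "jobTracker": "localhost:50300",
        "nameNode": "hdfs://localhost:8020",
    },
    "mapr": {
        # MapRFS paths use the maprfs: scheme with no host or port.
        "app_prefix": "maprfs:",
        "jobTracker": "maprfs:///",
        "nameNode": "maprfs:///",
    },
    "cdh3u5": {
        "app_prefix": "hdfs://localhost:8020",
        "jobTracker": "localhost:8021",
        "nameNode": "hdfs://localhost:8020",
    },
}


def render_job_properties(distribution, app_dir="/mnt/WordCount"):
    """Return the job.properties text for the named distribution."""
    settings = DISTRIBUTIONS[distribution]
    lines = [
        "oozie.wf.application.path=" + settings["app_prefix"] + app_dir,
        "jobTracker=" + settings["jobTracker"],
        "nameNode=" + settings["nameNode"],
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the MapR variant; redirect to a file to use it with `oozie job -config`.
    print(render_job_properties("mapr"), end="")
```

Keeping the per-distribution values in one table makes it easy to see that only the storage-layer URLs and the JobTracker address change between runs, while the workflow path and the execution command stay the same.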
