Cascading and PL/SQL Operations. Selecting Fields Under Oracle's DECODE and CASE functions

Overview

 

Cascading is an abstracted framework that aids to accomplish complicated data workflows inside of Apache Hadoop.  In some cases, Cascading can be looked to as an alternative to other commonly implemented data workflow platforms sometimes seen as resource expensive in particular contexts.  Oracle's procedural language extension of SQL, or PL/SQL, can prove handy in many situations when complex workflows need to be defined, but also surfaces as a good candidate for reimplementation through Hadoop coupled with well-defined cascading operations when data loads are heavy.  In situations where conversion is the right choice, discovering ways in which to re-implement PL/SQL functionalities in cascading language will prove a significant part of the transition.  Here, two very common PL/SQL functions are examined and reestablished as cascading operations, the DECODE, and the CASE functions.

A Typical PL/SQL DECODE Statement

 

A typical PL/SQL DECODE statement will work to set a field value depending on the value of a separate field, and will commonly display in the following fashion:

SELECT...

DECODE (a.fieldA,
'VALUE_A', newValue_A,
'VALUE_B', newValue_B) AS New_Field;




A Possible Cascading Solution

 

One can implement a simple custom cascading function to handle the IF/ELSE behavior of the DECODE.  For example:

 public static class SetFieldsOnValue<Context> extends BaseOperation<Context> implements Function<Context> {

        SetFieldsOnValue(Fields fields) {
            super(1, fields);
        }
 

@Override
public void operate(@SuppressWarnings("rawtypes") FlowProcess flowProcess, FunctionCall<Context> functionCall) {

Object O = functionCall.getArguments().getObj("a.fieldA");

switch(a.fieldA) {

Object New_Field;

case VALUE_A: New_Field = "newValue_A";
 break;
case VALUE_B: New_Field = "newValue_B";
 break;

}

functionCall.getOutputCollector().add(new Tuple(New_Field));

        }
    } 



The FlowDef  may then contain something similar to the following:


Pipe newPipe = new Each(primaryPipe, Fields.ALL, new SetFieldsOnValue<Object>(Fields.ALL));


A Typical PL/SQL CASE Statement 

 

A typical PL/SQL CASE is similar to the DECODE statement may follow a format such as:


(CASE
WHEN a.fieldA = 'NONE'
THEN a.fieldB
ELSE 0
END) AS New_Field,

 

A Possible Cascading Solution

 

Once again, a simple custom function is a possible approach:



public static class SetCase<Context> extends BaseOperation<Context> implements Function<Context> {

        SetFieldsOnValue(Fields fields) {
            super(1, fields);
        }
 

@Override
public void operate(@SuppressWarnings("rawtypes") FlowProcess flowProcess, FunctionCall<Context> functionCall) {

Object O = functionCall.getArguments().getObj("a.fieldA");

if(O.equals("NONE")) {Object New_Field = functionCall.getArguments().getObj("a.fieldB");
functionCall.getOutputCollector().add(new Tuple(New_Field))};

        }
    }

 

The FlowDef  may then contain something similar to the following:


Pipe newPipe = new Each(primaryPipe, Fields.ALL, new SetCase<Object>(Fields.ALL));


 

No comments:

Post a Comment