Cascading: ExpressionFilter or Custom Filters?


Cascading offers more than one way to filter data streams.  A filter may enforce particular conditions on the output of a pipe, or support changes to stream fields that depend on the values of other fields.  The two most common choices are expression filters and custom filters.  This entry takes a filter with a large set of conditions and shows how to implement it on a stream using both filter types.


Suppose a pipe consists of the fields A, B, C, D, and E (named a through e in the code).  The resulting output stream should contain only the tuples where:

String A = "NONE";
Integer B = 10;
String C = "VALID";
Integer D >= 100; 
Long E = 1L;
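
Before turning to Cascading itself, it may help to see the five conditions as one plain Java predicate.  The sketch below uses no Cascading classes; StreamConditions, matches, and shouldRemove are hypothetical names chosen for illustration.  It also makes explicit a point that matters for both filter types: a Cascading filter answers the question "should this tuple be removed?", so the keep-conditions above have to be negated.

```java
// Plain-Java sketch of the five conditions; no Cascading classes are involved.
// All names here are illustrative assumptions, not part of any library.
final class StreamConditions {

    private StreamConditions() {}

    /** Returns true when all five conditions hold, i.e. the tuple belongs in the output. */
    static boolean matches(String a, int b, String c, int d, long e) {
        return "NONE".equals(a)
                && b == 10
                && "VALID".equals(c)
                && d >= 100
                && e == 1L;
    }

    /** Removal decision in the style of a filter: true means "discard this tuple". */
    static boolean shouldRemove(String a, int b, String c, int d, long e) {
        return !matches(a, b, c, d, e);
    }

    public static void main(String[] args) {
        System.out.println(matches("NONE", 10, "VALID", 150, 1L));      // true
        System.out.println(shouldRemove("NONE", 10, "VALID", 99, 1L));  // true, d is below 100
    }
}
```

Writing the constant first, as in "NONE".equals(a), keeps the check safe when a field value is null.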

Cascading Expression Filter


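A Cascading ExpressionFilter involving multiple inputs must have the field names and data types explicitly defined in parallel String and Class arrays.  Because an ExpressionFilter removes every tuple for which the expression evaluates to true, the conditions are wrapped in a negation so that only matching tuples remain.  The filter resembles the following:

// Requires cascading.operation.expression.ExpressionFilter.
// The expression is negated: tuples that satisfy all five conditions are kept.
ExpressionFilter myFilter =
new ExpressionFilter("!(a.equals(\"NONE\") && b.equals(10) " +
                      "&& c.equals(\"VALID\") && (d >= 100) " +
                      "&& e.equals(1L))", new String[]{"a", "b",
                      "c", "d", "e"}, new Class[]{String.class,
                      Integer.class, String.class, Integer.class,
                      Long.class});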
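
The name and type arrays are matched by position, so they are easy to get out of step as the list grows.  One possible way to keep them aligned, shown here as a sketch only, is to declare each field next to its type in an insertion-ordered map and derive both arrays from it.  FilterParameters and forExample are hypothetical names and not part of Cascading.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cascading: keeps each field name next to its type.
final class FilterParameters {

    final String[] names;
    final Class<?>[] types;

    FilterParameters(Map<String, Class<?>> declared) {
        // Both arrays are read from the same map, so index i refers to the same field
        // in each (given an insertion-ordered map such as LinkedHashMap).
        names = declared.keySet().toArray(new String[0]);
        types = declared.values().toArray(new Class<?>[0]);
    }

    /** Declares the five fields used in this post, in order. */
    static FilterParameters forExample() {
        Map<String, Class<?>> declared = new LinkedHashMap<>();
        declared.put("a", String.class);
        declared.put("b", Integer.class);
        declared.put("c", String.class);
        declared.put("d", Integer.class);
        declared.put("e", Long.class);
        return new FilterParameters(declared);
    }

    public static void main(String[] args) {
        FilterParameters p = forExample();
        System.out.println(Arrays.toString(p.names)); // [a, b, c, d, e]
    }
}
```

The names and types arrays could then be passed as the second and third arguments of the ExpressionFilter constructor in place of the inline array literals.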

Cascading Custom Filter


A custom Cascading filter can implement the same conditions as above while also providing a more configurable filter for situations in which the expected field values change.

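   // Requires cascading.flow.FlowProcess, cascading.operation.BaseOperation,
   // cascading.operation.Filter, cascading.operation.FilterCall,
   // cascading.tuple.Fields, and cascading.tuple.TupleEntry.
   public static class MyFilter<Context> extends BaseOperation<Context> implements Filter<Context> {

        MyFilter(Fields fields) {
            super(1, fields);
        }

        @Override
        public boolean isRemove(@SuppressWarnings("rawtypes") FlowProcess flowProcess, FilterCall<Context> filterCall) {

          TupleEntry arguments = filterCall.getArguments();

          // Read each argument by field name, coerced to the expected type.
          String a = arguments.getString("a");
          int b = arguments.getInteger("b");
          String c = arguments.getString("c");
          int d = arguments.getInteger("d");
          long e = arguments.getLong("e");

          // True when the tuple satisfies every condition and should stay in the stream.
          boolean keep = "NONE".equals(a) && b == 10 && "VALID".equals(c)
                 && (d >= 100) && e == 1L;

          // isRemove returns true to discard a tuple, so the result is negated.
          return !keep;
        }
   }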

The filter would be applied to a pipe in the following fashion:

Pipe newPipe = new Each(previousPipe, new Fields("a", "b", "c", "d", "e"), new MyFilter<Object>(Fields.ALL));  



The full extent of the differences in functionality and usage between the Cascading filter types is not covered here, but this post offers an alternative approach when a group of custom conditions must be applied to a single stream.  Implementing an expression filter in this case is of course possible, but it can involve long explicit lists of field names and types.  Placing the same conditions in a custom filter involves a little less listing, is easier to configure, and makes the filter a candidate for reuse if the same conditions surface in other areas of the project.
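
As a rough illustration of the configurability point, the expected values can be handed to the condition logic through a constructor instead of being hard-coded.  The sketch below is plain Java with hypothetical names (ConfigurableConditions and its fields); a custom filter could hold such an object and return the negation of matches from isRemove.  It implements Serializable on the assumption that any state held by a Cascading operation needs to be serializable.

```java
import java.io.Serializable;

// Hypothetical, Cascading-free sketch: the expected values are constructor arguments,
// so the same class can be reused wherever a similar set of conditions appears.
final class ConfigurableConditions implements Serializable {

    private static final long serialVersionUID = 1L;

    private final String expectedA;
    private final int expectedB;
    private final String expectedC;
    private final int minimumD;
    private final long expectedE;

    ConfigurableConditions(String expectedA, int expectedB, String expectedC,
                           int minimumD, long expectedE) {
        this.expectedA = expectedA;
        this.expectedB = expectedB;
        this.expectedC = expectedC;
        this.minimumD = minimumD;
        this.expectedE = expectedE;
    }

    /** Returns true when the given field values satisfy every configured condition. */
    boolean matches(String a, int b, String c, int d, long e) {
        return expectedA.equals(a)
                && b == expectedB
                && expectedC.equals(c)
                && d >= minimumD
                && e == expectedE;
    }

    public static void main(String[] args) {
        ConfigurableConditions strict = new ConfigurableConditions("NONE", 10, "VALID", 100, 1L);
        ConfigurableConditions relaxed = new ConfigurableConditions("NONE", 10, "VALID", 50, 1L);
        System.out.println(strict.matches("NONE", 10, "VALID", 75, 1L));   // false
        System.out.println(relaxed.matches("NONE", 10, "VALID", 75, 1L));  // true
    }
}
```

Changing a threshold then means constructing the object with a different argument, with no edits to the condition logic itself.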
