Making Use of AspectJ to Test Hive UDTFs

Unit Testing Hive UDFs

As discussed in previous posts, a User-Defined Table-Generating Function (UDTF) is a variant on the normal Hive UDF. Instead of reading one or more columns as input and writing a single column as output, the UDTF takes in one or more columns and writes zero or more rows, each of which may contain several columns.

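For UDFs, each input record is passed to an "evaluate" method, and the method's return value becomes the output. The method is an instance method (not static), and its argument and return types are chosen by the UDF author. For a UDF that maps one string to another, it might look like this:

public Text evaluate(Text input);

Because evaluate returns its result, a JUnit test can call it directly and assert on the return value, so traditional testing methods can be applied.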

However, for UDTFs, the input records are passed to the following function:

public void process(Object[] record) throws HiveException;

Notice that the return value is "void". In the case of UDTFs, output values are written through calls to the "forward" method:

protected final void forward(java.lang.Object o) throws HiveException;

Since both the process and forward methods return void, a JUnit test has no return value to assert on, and an alternative approach is required to capture the output.

AspectJ

AspectJ is an extension to the Java language that allows programmers to define "Aspects": structures that can override the behavior of particular methods, or add behavior before or after a particular event. Events (called join points in AspectJ) can be method calls, modifications of variables, initialization of classes, or thrown exceptions.

This technology is applicable to the UDTF case because it will allow us to apply AspectJ "advice" around the forward method - calling the normal Hive method during normal execution and calling a custom method that will fit into the JUnit framework during the testing phase.

Unit Testing a UDTF

Extension of the GenericUDTF Class

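In order to ensure that the AspectJ modification does not affect the implementation of the UDTF, a customized version of the GenericUDTF must be created. This custom class (SpryGenericUDTF in the aspect definition) contains the "testCompatibleForward" method around which the AspectJ advice is applied. UDTFs to be tested should extend this class and emit rows through testCompatibleForward rather than calling forward directly.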
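The post does not show the custom class itself, so the following is only a minimal sketch of what it might look like. To keep it runnable without Hive on the classpath, a small stand-in class (StandInGenericUDTF, hypothetical) takes the place of Hive's GenericUDTF; SpryGenericUDTFSketch and SplitUDTF are likewise illustrative names, not the author's code. The idea shown is that testCompatibleForward is a thin, non-final wrapper around forward, presumably so that advice can be woven into the project's own class rather than into Hive's.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Hive's GenericUDTF (assumption: used only so this sketch compiles alone).
// In a real project, drop this class and extend
// org.apache.hadoop.hive.ql.udf.generic.GenericUDTF instead.
abstract class StandInGenericUDTF {
    // Hypothetical sink that mimics Hive collecting forwarded rows.
    final List<Object> collected = new ArrayList<>();

    // Same shape as GenericUDTF.forward: final, void, one Object argument.
    protected final void forward(Object o) {
        collected.add(o);
    }

    public abstract void process(Object[] record);
}

// Sketch of the custom base class described above.
abstract class SpryGenericUDTFSketch extends StandInGenericUDTF {
    // Thin wrapper around forward. Every output row passes through here,
    // so advice applied to this method sees all of the UDTF's output.
    protected void testCompatibleForward(Object[] row) {
        forward(row);
    }
}

// Hypothetical example UDTF: emits one single-column row per comma-separated token.
class SplitUDTF extends SpryGenericUDTFSketch {
    @Override
    public void process(Object[] record) {
        for (String token : record[0].toString().split(",")) {
            testCompatibleForward(new Object[] { token });
        }
    }
}
```

With no advice applied, calling process on a SplitUDTF simply forwards each row as usual, which is the behavior expected during normal Hive execution.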

Aspect Definition

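Create UDTFTestingAspect.java with the following code. The "interceptAndSave" function will "intercept" every record intended for the forward method and save it to an ArrayList for JUnit to inspect.

import java.util.ArrayList;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect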
public class UDTFTestingAspect {
    private static ArrayList<Object[]> result = new ArrayList<Object[]>(); // the ArrayList that will contain the output
    private static boolean initialized = false; // if this is not set to true, the values will be passed to the "forward" method

    /**
     * The user must call this in order to begin the interceptions
     */
    public static void initialize() {
        initialized = true;
    }

    /**
     * This advice is applied at the testCompatibleForward join point. The @Around annotation is used to show that the original
     * functionality will be entirely replaced by this, except in the case where invocation.proceed is called.
     *
     * The intercept functionality will be to capture the Object[] that was intended for the "forward" method and store it
     * into the "result" array for inspection by the unit test.
     * @param invocation Information about the join point where the advice is being applied
     * @throws Throwable If the join point throws an exception, this function will as well
     */
    @Around("execution(* org.spryinc.hive.utils.SpryGenericUDTF.testCompatibleForward(*) throws *)") // the function where the advice is applied
    public void interceptAndSave(ProceedingJoinPoint invocation) throws Throwable {
        if (initialized) {
            // store each of the output Object[] values into the ArrayList
            for (Object i : invocation.getArgs()) {
                Object[] iObj = (Object[]) i;
                result.add(iObj);
            }
        } else {
            invocation.proceed(); // normal execution if the intercept behavior is not enabled
        }
    }

    /**
     * @return Returns the ArrayList containing all output values intended for the GenericUDTF.forward method
     */
    public static ArrayList<Object[]> getResult() {
        return result;
    }

    /**
     * Clears the ArrayList so that the results are valid between method calls
     */
    public static void clearResult() {
        result.clear();
    }
}

Modifications to the Unit Tests

Unit tests can be written in the usual way: define the expected outputs, create the required inputs, and call the "process" method with the inputs passed in as an argument. In order to take advantage of the AspectJ handling, three actions must be taken:

Make use of the @Before annotation to initialize the testing process

The default behavior is to use the built-in forward method, so that normal usage of the UDTF is unaffected. In order to enable the use of the version of the forward method required for testing, the UDTFTestingAspect.initialize() function must be called. This is accomplished by placing the following lines into the unit test file:


@Before
public void initialize() {
    UDTFTestingAspect.initialize();
}

Retrieve the generated output using a custom function

Since the output from the interceptAndSave method is only available through the Aspect, the following method must be used in order to gain access:

UDTFTestingAspect.getResult()

Make use of the @After annotation to reset the testing process

Since the interceptAndSave method has no built-in mechanism for clearing the results, this must be performed after each test. The following lines will clear the gathered output:

@After
public void clearSavedResults() {
    UDTFTestingAspect.clearResult();
}
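To see the three steps working together, here is a rough, self-contained sketch of the flow in plain Java. It deliberately uses no AspectJ, Hive, or JUnit: CaptureStandIn is a hypothetical stand-in that offers the same static methods as UDTFTestingAspect and is called by hand where the woven advice would normally run, and TokenizeUDTFSketch and UdtfTestFlowDemo are illustrative names as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in with the same static API as UDTFTestingAspect, minus the weaving (assumption).
final class CaptureStandIn {
    private static final List<Object[]> result = new ArrayList<>();
    private static boolean initialized = false;

    static void initialize() { initialized = true; }
    static List<Object[]> getResult() { return result; }
    static void clearResult() { result.clear(); }

    // Plays the role of interceptAndSave: capture when enabled, otherwise proceed.
    static void interceptOrProceed(Object[] row, Runnable proceed) {
        if (initialized) {
            result.add(row);
        } else {
            proceed.run();
        }
    }
}

// Hypothetical UDTF-like class: emits one row per comma-separated token.
class TokenizeUDTFSketch {
    void process(Object[] record) {
        for (String token : record[0].toString().split(",")) {
            testCompatibleForward(new Object[] { token });
        }
    }

    void testCompatibleForward(Object[] row) {
        // With real weaving, the advice would wrap this method automatically.
        CaptureStandIn.interceptOrProceed(row, () -> {
            throw new IllegalStateException("no Hive collector in this sketch");
        });
    }
}

class UdtfTestFlowDemo {
    // Runs the three steps in order and returns the captured rows rendered as strings.
    static List<String> run() {
        CaptureStandIn.initialize();                       // step 1: the @Before action
        try {
            new TokenizeUDTFSketch().process(new Object[] { "a,b" });
            List<String> rows = new ArrayList<>();
            for (Object[] row : CaptureStandIn.getResult()) {  // step 2: read the output
                rows.add(Arrays.toString(row));
            }
            return rows;
        } finally {
            CaptureStandIn.clearResult();                  // step 3: the @After action
        }
    }
}
```

In a real JUnit 4 test, steps 1 and 3 stay in the @Before and @After methods shown above, and each captured Object[] row can be compared against its expected value with assertArrayEquals, which checks array contents rather than array identity.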

Where can I go for more information?

This post assumes that AspectJ is installed and accessible through the IDE. Instructions on installing AspectJ are available here.
For more information on AspectJ
For more information on Hive UDTFs
Or feel free to leave your question or comment below!
