Guide: Simple If/Else Control Structure Using Pig

Although Pig provides a scripting-language abstraction over MapReduce, Pig Latin has no explicit if/else control structure like the ones found in Python or Perl. That does not mean conditional behavior is unavailable: two Pig keywords, FILTER and SPLIT, can be used to achieve the same goal.

The first is the FILTER keyword:

<alias2> = FILTER <alias1> BY <expression>;

Use FILTER when results only need to be kept if the expression is true, which is the equivalent of an "if" with no "else". Each tuple in alias1 is evaluated against the expression, and only the tuples that satisfy it are copied into alias2; all other tuples are discarded.
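As a minimal sketch of the FILTER form above: the file name students.txt, its two-field schema, and the aliases students and passing are all hypothetical, chosen only for illustration.

```pig
-- Hypothetical input: a tab-separated file with a name and a score per line.
students = LOAD 'students.txt' AS (name:chararray, score:int);

-- "if score >= 60": keep only the tuples that satisfy the expression.
-- Tuples that do not satisfy it are simply dropped.
passing = FILTER students BY score >= 60;

DUMP passing;
```

Nothing is done with the tuples that fail the test, so this covers only the "if" half of an if/else.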

The second is the SPLIT keyword:

SPLIT <alias> INTO <alias> IF <expression>, <alias> IF <expression> [, <alias> IF <expression> ...] [, <alias> otherwise];

This keyword works similarly to a series of independent if statements: every expression is evaluated for each tuple in the input alias, so a tuple can appear in more than one output alias after the split is complete. Starting with Pig 0.10.0, the "otherwise" keyword can be used with SPLIT to achieve behavior similar to "else". No "else" behavior is available in earlier Pig versions.
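The SPLIT behavior described above might look like the following sketch. The input file, schema, and the aliases honors, passing, and failing are hypothetical, and the OTHERWISE branch assumes Pig 0.10.0 or later.

```pig
-- Hypothetical input: a tab-separated file with a name and a score per line.
students = LOAD 'students.txt' AS (name:chararray, score:int);

-- Every condition is checked for every tuple, so a tuple with score 95
-- lands in both honors and passing. Tuples that match no condition go
-- to failing through the OTHERWISE branch.
SPLIT students INTO
    honors IF score >= 90,
    passing IF score >= 60,
    failing OTHERWISE;

DUMP honors;
DUMP passing;
DUMP failing;
```

On versions before 0.10.0, one possible workaround is to spell out the complementary condition by hand, for example replacing the last branch with failing IF score < 60.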
