About Bill Bejeck
In this post we are going to put a new twist on passing configuration parameters to a Hadoop Mapper via the Context object. Typically, we set configuration parameters as key/value pairs on the Context object when starting a map-reduce job; then, in the Mapper, we use the key(s) to retrieve the value(s) for our configuration needs. The twist is that we will set a specially formatted string on the Context object and, when retrieving the value in the Mapper, use a Guava MapSplitter to convert the formatted string into a HashMap that will be used for obtaining configuration parameters.

Why go to this trouble? By doing configuration this way we can pass multiple parameters to a Mapper with a single key/value pair set on the Context object. To illustrate one possible usage, we will revisit the last post, where we covered how to perform reduce-side joins. There are two problems with the solution proposed in that post. First, it assumes that the key to join on is always the first value in a delimited string from a file. Second, it assumes that the same delimiter is used for each file. What if we want to join data from files where the key is located in a different position per file, and some of the files use different delimiters? Additionally, we want to use the same delimiter (if any) for all of the data we output, regardless of the delimiter used in any of the input files. While this is admittedly a contrived situation, it will serve well for demonstration purposes. First let's go over what the MapSplitter class is and how we can use it.
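To make the idea concrete, here is a minimal sketch of the parsing step described above: a single specially formatted string (the key names, separators, and sample values here are all hypothetical) is turned into a map of configuration parameters. Guava's actual one-liner for this is `Splitter.on("#").withKeyValueSeparator("=").split(confString)`; since Guava may not be on the classpath when you try this out, the sketch below mimics that behavior with only the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigSplit {

    // Parse a string like "keyIndex=2#delimiter=|" into a Map,
    // mirroring what Guava's MapSplitter would do via
    // Splitter.on("#").withKeyValueSeparator("=").split(confString).
    static Map<String, String> parse(String confString) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : confString.split("#")) {
            // Split each entry into key and value on the first '='.
            String[] kv = entry.split("=", 2);
            result.put(kv[0], kv[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical value the driver might have set on the job
        // configuration under a single key.
        String confString = "keyIndex=2#delimiter=|";
        Map<String, String> params = parse(confString);
        System.out.println(params.get("keyIndex"));   // prints 2
        System.out.println(params.get("delimiter"));  // prints |
    }
}
```

With this approach, a Mapper retrieves one value from the Context, parses it once in its setup method, and then has all of its per-file parameters available in a single map.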
Source : http://www.javacodegeeks.com/2013/09/configuring-hadoop-with-guava-mapsplitters.html