org.apache.hadoop.mapreduce.lib.input
Class NLineInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
          extended by org.apache.hadoop.mapreduce.lib.input.NLineInputFormat

public class NLineInputFormat
extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>

NLineInputFormat splits the input so that every N lines form one split. In many "pleasantly" parallel applications, each process/mapper processes the same input file(s), but the computations are controlled by different parameters (referred to as "parameter sweeps"). One way to achieve this is to specify a set of parameters (one set per line) as input in a control file, which is the input path to the map-reduce application, whereas the input dataset is specified via a config variable in JobConf. NLineInputFormat can be used in such applications: it splits the input file such that, by default, one line is fed as a value to one map task, and the key is the offset, i.e. (k,v) is (LongWritable, Text). The location hints will span the whole mapred cluster.
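
The (k,v) pairs described above can be illustrated with a small plain-Java model that has no Hadoop dependency. It is only a sketch: the class ControlFileRecords, its nested Rec type, and the sample parameter lines are hypothetical names chosen for illustration, and the model assumes UTF-8 text with '\n' line terminators. It computes the byte offset of each control-file line (the key) alongside the line text (the value).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the records a control file yields; not Hadoop code.
class ControlFileRecords {

    // One (key, value) pair: key = byte offset of the line, value = line text.
    static final class Rec {
        final long offset;
        final String line;

        Rec(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Assumes UTF-8 content whose lines are terminated by a single '\n'.
    static List<Rec> records(String content) {
        List<Rec> out = new ArrayList<>();
        if (content.isEmpty()) {
            return out;
        }
        long offset = 0;
        for (String line : content.split("\n")) {
            out.add(new Rec(offset, line));
            // Advance past the line bytes plus the one-byte terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical control file: one parameter set per line.
        String control = "alpha=0.1 beta=1\nalpha=0.2 beta=1\nalpha=0.3 beta=2\n";
        for (Rec r : records(control)) {
            System.out.println("map input: (" + r.offset + ", \"" + r.line + "\")");
        }
    }
}
```

With the default of one line per split, each of these records would be handed to a separate map task. In an actual job, the format would typically be selected with job.setInputFormatClass(NLineInputFormat.class).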


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
 
Field Summary
static String LINES_PER_MAP
           
 
Constructor Summary
NLineInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit genericSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)
           
static int getNumLinesPerSplit(org.apache.hadoop.mapreduce.JobContext job)
          Gets the number of lines per split.
 List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job)
          Logically splits the set of input files for the job, treating every N lines of the input as one split.
static List<org.apache.hadoop.mapreduce.lib.input.FileSplit> getSplitsForFile(org.apache.hadoop.fs.FileStatus status, org.apache.hadoop.conf.Configuration conf, int numLinesPerSplit)
           
static void setNumLinesPerSplit(org.apache.hadoop.mapreduce.Job job, int numLines)
          Sets the number of lines per split.
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LINES_PER_MAP

public static final String LINES_PER_MAP
See Also:
Constant Field Values
Constructor Detail

NLineInputFormat

public NLineInputFormat()
Method Detail

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit genericSplit,
                                                                                                                                org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                                                         throws IOException
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException

getSplits

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job)
                                                       throws IOException
Logically splits the set of input files for the job, treating every N lines of the input as one split.

Overrides:
getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException
See Also:
FileInputFormat.getSplits(JobContext)

getSplitsForFile

public static List<org.apache.hadoop.mapreduce.lib.input.FileSplit> getSplitsForFile(org.apache.hadoop.fs.FileStatus status,
                                                                                     org.apache.hadoop.conf.Configuration conf,
                                                                                     int numLinesPerSplit)
                                                                              throws IOException
Throws:
IOException
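
This page gives no description for getSplitsForFile, so the following is only a sketch of the kind of arithmetic such a method might perform, not the Hadoop implementation. The class NLineSplitSketch and the method splitRanges are hypothetical, and the model assumes the byte length of every line (terminator included) is already known. The real method works from a FileStatus and Configuration and returns FileSplit objects, and its exact offsets may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of grouping N lines into byte ranges; not Hadoop code.
class NLineSplitSketch {

    // lineLengths[i] is the byte length of line i, including its terminator.
    // Returns {start, length} pairs, each covering up to numLinesPerSplit lines.
    static List<long[]> splitRanges(long[] lineLengths, int numLinesPerSplit) {
        if (numLinesPerSplit < 1) {
            throw new IllegalArgumentException("numLinesPerSplit must be >= 1");
        }
        List<long[]> ranges = new ArrayList<>();
        long start = 0;
        long length = 0;
        int linesInSplit = 0;
        for (long len : lineLengths) {
            length += len;
            linesInSplit++;
            if (linesInSplit == numLinesPerSplit) {
                // A full group of N lines closes the current range.
                ranges.add(new long[] {start, length});
                start += length;
                length = 0;
                linesInSplit = 0;
            }
        }
        if (linesInSplit > 0) {
            // Leftover lines (fewer than N) form a final, shorter range.
            ranges.add(new long[] {start, length});
        }
        return ranges;
    }

    public static void main(String[] args) {
        long[] lineLengths = {4, 5, 3, 6, 2};
        for (long[] r : splitRanges(lineLengths, 2)) {
            System.out.println("start=" + r[0] + " length=" + r[1]);
        }
    }
}
```

The last range is allowed to hold fewer than N lines, so a file whose line count is not a multiple of N still has every line assigned to exactly one split.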

setNumLinesPerSplit

public static void setNumLinesPerSplit(org.apache.hadoop.mapreduce.Job job,
                                       int numLines)
Sets the number of lines per split.

Parameters:
job - the job to modify
numLines - the number of lines per split

getNumLinesPerSplit

public static int getNumLinesPerSplit(org.apache.hadoop.mapreduce.JobContext job)
Gets the number of lines per split.

Parameters:
job - the job
Returns:
the number of lines per split
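
The setter and getter pair can be pictured as writing and reading one integer property. The sketch below models this with a java.util.Map standing in for the job configuration, so it runs without Hadoop. The class LinesPerSplitSetting is hypothetical, and the property name assigned to LINES_PER_MAP is an assumption, since this page only links to "Constant Field Values" without showing the value. The default of 1 follows the class description, which states that one line is fed to one map task by default.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the lines-per-split setting; not Hadoop code.
class LinesPerSplitSetting {

    // Assumed property name; the page above does not show the constant's value.
    static final String LINES_PER_MAP = "mapreduce.input.lineinputformat.linespermap";

    // Stores the number of lines per split in the configuration stand-in.
    static void setNumLinesPerSplit(Map<String, String> conf, int numLines) {
        conf.put(LINES_PER_MAP, Integer.toString(numLines));
    }

    // Reads the setting back, falling back to one line per split when unset.
    static int getNumLinesPerSplit(Map<String, String> conf) {
        String value = conf.get(LINES_PER_MAP);
        return value == null ? 1 : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("unset: " + getNumLinesPerSplit(conf));
        setNumLinesPerSplit(conf, 5);
        System.out.println("after set: " + getNumLinesPerSplit(conf));
    }
}
```

With Hadoop on the classpath, the corresponding calls would be NLineInputFormat.setNumLinesPerSplit(job, 5) on a Job and NLineInputFormat.getNumLinesPerSplit(job) on a JobContext, as documented above.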


Copyright © 2013 University of Waikato, Hamilton, NZ. All Rights Reserved.