org.apache.hadoop.mapreduce.lib.input
Class NLineInputFormat
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
org.apache.hadoop.mapreduce.lib.input.NLineInputFormat
public class NLineInputFormat
- extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
NLineInputFormat splits N lines of input into one split.
In many "pleasantly parallel" applications, each process/mapper
processes the same input file(s), but the computations are
controlled by different parameters (referred to as "parameter sweeps").
One way to achieve this is to specify a set of parameters
(one set per line) as input in a control file,
which is the input path to the map-reduce application,
whereas the input dataset is specified
via a config variable in JobConf.
NLineInputFormat can be used in such applications: it splits
the input file so that, by default, one line is fed as
the value to one map task, and the key is the byte offset of that line,
i.e. (k, v) is (LongWritable, Text).
The location hints will span the whole mapred cluster.
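A minimal driver sketch for the parameter-sweep pattern described above (the class name, input/output paths, and job name are hypothetical; the Hadoop jars must be on the classpath, so this is a configuration sketch rather than a standalone program):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ParameterSweepDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "parameter sweep");
        // Use the control file of parameter lines as the job input,
        // with NLineInputFormat deciding how many lines each mapper gets.
        job.setInputFormatClass(NLineInputFormat.class);
        // Feed 10 parameter lines to each map task instead of the default 1.
        NLineInputFormat.setNumLinesPerSplit(job, 10);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // control file
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // results
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Mapper and reducer classes would be set on the job as usual; they are omitted here to keep the sketch focused on the input format.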
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
Method Summary

org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
createRecordReader(org.apache.hadoop.mapreduce.InputSplit genericSplit, org.apache.hadoop.mapreduce.TaskAttemptContext context)

static int
getNumLinesPerSplit(org.apache.hadoop.mapreduce.JobContext job)
Get the number of lines per split.

List<org.apache.hadoop.mapreduce.InputSplit>
getSplits(org.apache.hadoop.mapreduce.JobContext job)
Logically splits the set of input files for the job; N lines of the input form one split.

static List<org.apache.hadoop.mapreduce.lib.input.FileSplit>
getSplitsForFile(org.apache.hadoop.fs.FileStatus status, org.apache.hadoop.conf.Configuration conf, int numLinesPerSplit)

static void
setNumLinesPerSplit(org.apache.hadoop.mapreduce.Job job, int numLines)
Set the number of lines per split.
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LINES_PER_MAP
public static final String LINES_PER_MAP
- See Also:
- Constant Field Values
NLineInputFormat
public NLineInputFormat()
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit genericSplit,
org.apache.hadoop.mapreduce.TaskAttemptContext context)
throws IOException
- Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
getSplits
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job)
throws IOException
- Logically splits the set of input files for the job; N lines of the input form one split.
- Overrides:
getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
- See Also:
FileInputFormat.getSplits(JobContext)
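The grouping rule behind getSplits is simple to state: consecutive lines of each input file are collected, N at a time, into splits, with the last split of a file possibly holding fewer. The following plain-Java sketch illustrates only that grouping (it is not Hadoop's actual implementation, which works on byte offsets and FileSplit objects):

```java
import java.util.ArrayList;
import java.util.List;

public class NLineGrouping {
    /**
     * Groups consecutive lines, numLinesPerSplit at a time.
     * The final group may contain fewer lines than numLinesPerSplit.
     */
    static List<List<String>> groupLines(List<String> lines, int numLinesPerSplit) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += numLinesPerSplit) {
            splits.add(new ArrayList<>(
                lines.subList(i, Math.min(i + numLinesPerSplit, lines.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "b", "c", "d", "e");
        List<List<String>> splits = groupLines(lines, 2);
        System.out.println(splits.size());  // 3 splits: [a, b], [c, d], [e]
        System.out.println(splits.get(2));  // [e]
    }
}
```

With 5 lines and numLinesPerSplit = 2, each of 3 mappers would receive one of these groups as its input.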
getSplitsForFile
public static List<org.apache.hadoop.mapreduce.lib.input.FileSplit> getSplitsForFile(org.apache.hadoop.fs.FileStatus status,
org.apache.hadoop.conf.Configuration conf,
int numLinesPerSplit)
throws IOException
- Throws:
IOException
setNumLinesPerSplit
public static void setNumLinesPerSplit(org.apache.hadoop.mapreduce.Job job,
int numLines)
- Set the number of lines per split
- Parameters:
job - the job to modify
numLines - the number of lines per split
getNumLinesPerSplit
public static int getNumLinesPerSplit(org.apache.hadoop.mapreduce.JobContext job)
- Get the number of lines per split
- Parameters:
job - the job
- Returns:
- the number of lines per split
Copyright © 2013 University of Waikato, Hamilton, NZ. All Rights Reserved.