public class RegexSequenceRecordReader extends FileRecordReader implements SequenceRecordReader
Reads a file as a sequence, one line at a time, and uses Pattern and Matcher to split each line into groups.
Example: Data in format "2016-01-01 23:59:59.001 1 DEBUG First entry message!"

Error handling for invalid lines is configured via RegexSequenceRecordReader.LineErrorHandling. Invalid lines (those that don't match the provided regex) can result in an exception (FailOnInvalid), can be skipped silently (SkipInvalid), or can be skipped with a logged warning (SkipInvalidWithWarning).

| Modifier and Type | Class and Description |
|---|---|
| static class | RegexSequenceRecordReader.LineErrorHandling: Error handling mode, i.e. how invalid lines (those that don't match the provided regex) are handled. FailOnInvalid: throw an IllegalStateException when an invalid line is found. SkipInvalid: skip invalid lines quietly, with no warning. SkipInvalidWithWarning: skip invalid lines, but log a warning. |
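
The group-based splitting described above can be sketched with plain `java.util.regex`. This is only an illustration, not the reader's actual implementation: the class name `RegexLineSplitSketch` is hypothetical, and the regex is an assumed pattern chosen to fit the example line (timestamp, number, level, message).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the library.
final class RegexLineSplitSketch {
    // Assumed regex for the example format: timestamp, number, level, message.
    static final Pattern LINE = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) (\\d+) ([A-Z]+) (.*)");

    /** Returns one field per capture group, or null if the whole line does not match. */
    static List<String> split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null; // an "invalid line" in the sense used above
        }
        List<String> fields = new ArrayList<>();
        for (int g = 1; g <= m.groupCount(); g++) {
            fields.add(m.group(g));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Prints the four captured fields of the example line.
        System.out.println(split("2016-01-01 23:59:59.001 1 DEBUG First entry message!"));
    }
}
```

Under this assumed regex the example line yields four fields: the timestamp, "1", "DEBUG", and the message text.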
| Modifier and Type | Field and Description |
|---|---|
| static Charset | DEFAULT_CHARSET |
| static RegexSequenceRecordReader.LineErrorHandling | DEFAULT_ERROR_HANDLING |
| static org.slf4j.Logger | LOG |
| static String | SKIP_NUM_LINES |
Fields inherited from class FileRecordReader: appendLabel, conf, currentFile, inputSplit, iter, labels

Fields inherited from interface RecordReader: APPEND_LABEL, LABELS, NAME_SPACE

| Constructor and Description |
|---|
| RegexSequenceRecordReader(String regex, int skipNumLines) |
| RegexSequenceRecordReader(String regex, int skipNumLines, Charset encoding, RegexSequenceRecordReader.LineErrorHandling errorHandling) |
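
To make the constructor parameters more concrete, here is a minimal sketch of how `skipNumLines` and the three error handling modes might interact. It is not the library's code: `LineErrorHandlingSketch`, its `Mode` enum, and the `warnings` list (used in place of the SLF4J logger) are hypothetical stand-ins, and treating `skipNumLines` as "ignore the first N lines" is an assumption based on the parameter name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of the library.
final class LineErrorHandlingSketch {
    // Stand-in for RegexSequenceRecordReader.LineErrorHandling (same mode names).
    enum Mode { FailOnInvalid, SkipInvalid, SkipInvalidWithWarning }

    static List<List<String>> parse(List<String> lines, String regex, int skipNumLines,
                                    Mode mode, List<String> warnings) {
        Pattern pattern = Pattern.compile(regex);
        List<List<String>> sequence = new ArrayList<>();
        // Assumption: the first skipNumLines lines (e.g. a header) are ignored entirely.
        for (int i = Math.max(0, skipNumLines); i < lines.size(); i++) {
            String line = lines.get(i);
            Matcher m = pattern.matcher(line);
            if (!m.matches()) {
                if (mode == Mode.FailOnInvalid) {
                    throw new IllegalStateException("Invalid line: \"" + line + "\"");
                }
                if (mode == Mode.SkipInvalidWithWarning) {
                    warnings.add("Skipping invalid line: \"" + line + "\"");
                }
                continue; // SkipInvalid and SkipInvalidWithWarning both drop the line
            }
            List<String> fields = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                fields.add(m.group(g));
            }
            sequence.add(fields);
        }
        return sequence;
    }
}
```

Each valid line becomes one time step of the sequence, with one field per capture group.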
| Modifier and Type | Method and Description |
|---|---|
| void | initialize(Configuration conf, InputSplit split): Called once at initialization. |
| void | reset(): Reset the record reader iterator. |
| Collection<Collection<Writable>> | sequenceRecord(): Returns a sequence record. |
| Collection<Collection<Writable>> | sequenceRecord(URI uri, DataInputStream dataInputStream): Load a sequence record from the given DataInputStream. Unlike RecordReader.next(), the internal state of the RecordReader is not modified. Implementations of this method should not close the DataInputStream. |
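
The contract of the stream-based overload (read from a caller-supplied DataInputStream, leave it open, touch no reader state) can be illustrated with a small stdlib-only sketch. `StreamSequenceSketch` and `readLines` are hypothetical names; this shows the pattern, not the library's implementation.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of the library.
final class StreamSequenceSketch {
    /** Reads all lines from the stream; static, so no instance state is modified. */
    static List<String> readLines(DataInputStream in, Charset charset) throws IOException {
        // Deliberately no try-with-resources: closing the reader would also close
        // the caller's stream, which the contract above says not to do.
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(readLines(in, StandardCharsets.UTF_8));
        in.close(); // the caller, not readLines, owns and closes the stream
    }
}
```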
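Methods inherited from class FileRecordReader: close, doInitialize, getConf, getCurrentLabel, getLabels, hasNext, initialize, next, record, setConf, setLabels

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface RecordReader: getLabels, hasNext, initialize, next, record

Methods inherited from interface Configurable: getConf, setConf

public static final String SKIP_NUM_LINES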
public static final Charset DEFAULT_CHARSET
public static final RegexSequenceRecordReader.LineErrorHandling DEFAULT_ERROR_HANDLING
public static final org.slf4j.Logger LOG
public RegexSequenceRecordReader(String regex, int skipNumLines)
public RegexSequenceRecordReader(String regex, int skipNumLines, Charset encoding, RegexSequenceRecordReader.LineErrorHandling errorHandling)
public void initialize(Configuration conf, InputSplit split) throws IOException, InterruptedException
Description copied from interface: RecordReader
Called once at initialization.
Specified by: initialize in interface RecordReader
Overrides: initialize in class FileRecordReader
Parameters:
conf - a configuration for initialization
split - the split that defines the range of records to read
Throws:
IOException
InterruptedException

public Collection<Collection<Writable>> sequenceRecord()
Description copied from interface: SequenceRecordReader
Returns a sequence record.
Specified by: sequenceRecord in interface SequenceRecordReader

public Collection<Collection<Writable>> sequenceRecord(URI uri, DataInputStream dataInputStream) throws IOException
Description copied from interface: SequenceRecordReader
Load a sequence record from the given DataInputStream. Unlike RecordReader.next(), the internal state of the RecordReader is not modified. Implementations of this method should not close the DataInputStream.
Specified by: sequenceRecord in interface SequenceRecordReader
Throws:
IOException - if an error occurs during reading from the input stream

public void reset()
Description copied from interface: RecordReader
Reset the record reader iterator.
Specified by: reset in interface RecordReader
Overrides: reset in class FileRecordReader

Copyright © 2016. All rights reserved.