Package adams.data.io.input
Class TsvSpreadSheetReader
-
- All Implemented Interfaces:
AdditionalInformationHandler, Destroyable, ErrorProvider, GlobalInfoSupporter, EncodingSupporter, FileFormatHandler, LoggingLevelHandler, LoggingSupporter, LocaleSupporter, OptionHandlingLocaleSupporter, OptionHandler, SizeOfHandler, Stoppable, StoppableWithFeedback, ChunkedSpreadSheetReader, MissingValueSpreadSheetReader, NoHeaderSpreadSheetReader, SpreadSheetReader, DataRowTypeHandler, SpreadSheetTypeHandler, Serializable
public class TsvSpreadSheetReader extends SimpleCsvSpreadSheetReader
Reads TSV (tab-separated values) files.
It is possible to force columns to be text. In that case no intelligent parsing is attempted to determine the type of data a cell has.
For very large files, chunking can be turned on; the reader then returns one spreadsheet object per chunk until all the data has been read.
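The chunking idea described above can be illustrated with a small, stdlib-only sketch. It is not the ADAMS implementation and does not call the ADAMS API (where the inherited setChunkSize, hasMoreChunks and nextChunk methods serve this purpose). The class name TsvChunkSketch and its helper methods are hypothetical, rows are represented as plain String arrays, the header line is treated like any other row, and quoting is ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of chunked TSV reading; NOT the ADAMS implementation. */
class TsvChunkSketch {

  /** Reads up to chunkSize rows; a chunkSize of -1 reads everything that is left. */
  static List<String[]> nextChunk(BufferedReader reader, int chunkSize) throws IOException {
    List<String[]> rows = new ArrayList<>();
    String line;
    // The size check comes first, so no line is consumed once the chunk is full.
    while ((chunkSize == -1 || rows.size() < chunkSize) && (line = reader.readLine()) != null) {
      rows.add(line.split("\t", -1)); // limit -1 keeps trailing empty cells
    }
    return rows;
  }

  /** Splits the whole input into chunks and returns them in reading order. */
  static List<List<String[]>> readInChunks(String tsv, int chunkSize) {
    List<List<String[]>> chunks = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new StringReader(tsv))) {
      List<String[]> chunk;
      // An empty chunk signals that the input is exhausted.
      while (!(chunk = nextChunk(reader, chunkSize)).isEmpty()) {
        chunks.add(chunk);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return chunks;
  }

  public static void main(String[] args) {
    String tsv = "id\tname\n1\ta\n2\tb\n3\tc\n";
    for (List<String[]> chunk : readInChunks(tsv, 2)) {
      System.out.println("chunk with " + chunk.size() + " row(s)");
    }
  }
}
```

A real caller would process each chunk as soon as it is returned rather than collecting them all; the list is only used here to keep the sketch easy to inspect.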
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
    The logging level for outputting errors and debugging output.
    default: WARNING

-missing <adams.core.base.BaseRegExp> (property: missingValue)
    The placeholder for missing values.
    default:

-encoding <adams.core.base.BaseCharset> (property: encoding)
    The encoding to use when reading via a reader; leave empty for the default.
    default: Default

-quote-char <java.lang.String> (property: quoteCharacter)
    The character to use for surrounding text cells.
    default: \"

-separator <java.lang.String> (property: separator)
    The separator to use for the columns; use '\t' for tab.
    default: \\t

-trim <boolean> (property: trim)
    If enabled, the content of the cells gets trimmed before being added.
    default: false

-text-columns <adams.core.Range> (property: textColumns)
    The range of columns to treat as text.
    default:
    example: A range is a comma-separated list of single 1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts the range '...'; the following placeholders can be used as well: first, second, third, last_2, last_1, last

-datetime-columns <adams.core.Range> (property: dateTimeColumns)
    The range of columns to treat as date/time msec.
    default:
    example: A range is a comma-separated list of single 1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts the range '...'; the following placeholders can be used as well: first, second, third, last_2, last_1, last

-datetime-format <adams.data.DateFormatString> (property: dateTimeFormat)
    The format for date/time msecs.
    default: yyyy-MM-dd HH:mm:ss
    more: http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html

-datetime-lenient <boolean> (property: dateTimeLenient)
    Whether date/time msec parsing is lenient or not.
    default: false

-datetime-type <TIME|TIME_MSEC|DATE|DATE_TIME|DATE_TIME_MSEC> (property: dateTimeType)
    How to interpret the date/time data.
    default: DATE_TIME

-time-zone <java.util.TimeZone> (property: timeZone)
    The time zone to use for interpreting dates/times; default is the system-wide defined one.

-locale <java.util.Locale> (property: locale)
    The locale to use for parsing the numbers.
    default: Default

-no-header <boolean> (property: noHeader)
    If enabled, all rows get added as data rows and a dummy header will get inserted.
    default: false

-custom-column-headers <java.lang.String> (property: customColumnHeaders)
    The custom headers to use for the columns instead (comma-separated list); ignored if empty.
    default:

-chunk-size <int> (property: chunkSize)
    The maximum number of rows per chunk; using -1 will read all data into a single spreadsheet object.
    default: -1
    minimum: -1
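The range syntax used by the -text-columns and -datetime-columns options can be made concrete with a stdlib-only sketch that resolves a range string to 0-based column indices. This is not adams.core.Range; the class RangeSketch and its methods are hypothetical. Two points are assumptions rather than documented behaviour: last_1 and last_2 are taken to mean the second-to-last and third-to-last column, and indices outside the sheet are silently dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration of the range syntax; NOT adams.core.Range. */
class RangeSketch {

  /** Resolves a single 1-based index or placeholder to a 0-based index. */
  static int resolveIndex(String token, int numColumns) {
    String t = token.trim().toLowerCase();
    switch (t) {
      case "first":  return 0;
      case "second": return 1;
      case "third":  return 2;
      case "last":   return numColumns - 1;
      case "last_1": return numColumns - 2; // assumption: second-to-last column
      case "last_2": return numColumns - 3; // assumption: third-to-last column
      default:       return Integer.parseInt(t) - 1; // 1-based to 0-based
    }
  }

  /** Returns the sorted 0-based indices selected by the range string. */
  static List<Integer> resolve(String range, int numColumns) {
    String expr = range.trim();
    boolean invert = expr.startsWith("inv(") && expr.endsWith(")");
    if (invert) {
      expr = expr.substring(4, expr.length() - 1);
    }
    TreeSet<Integer> selected = new TreeSet<>();
    for (String part : expr.split(",")) {
      if (part.trim().isEmpty()) {
        continue; // an empty range selects nothing
      }
      String[] bounds = part.split("-", 2); // 'start-end' or a single index
      int from = resolveIndex(bounds[0], numColumns);
      int to = bounds.length == 2 ? resolveIndex(bounds[1], numColumns) : from;
      for (int i = from; i <= to; i++) {
        if (i >= 0 && i < numColumns) {
          selected.add(i); // assumption: out-of-bounds indices are dropped
        }
      }
    }
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < numColumns; i++) {
      if (selected.contains(i) != invert) {
        result.add(i);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(resolve("1,3-5,last", 8));
    System.out.println(resolve("inv(first-second)", 4));
  }
}
```

Because the placeholders use an underscore rather than a hyphen, splitting each part on the first '-' is enough to separate the start and end of a sub-range in this sketch.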
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class adams.data.io.input.AbstractSpreadSheetReader
AbstractSpreadSheetReader.InputType
-
-
Field Summary
-
Fields inherited from class adams.data.io.input.SimpleCsvSpreadSheetReader
m_ChunkSize, m_CustomColumnHeaders, m_DateTimeColumns, m_DateTimeFormat, m_DateTimeLenient, m_DateTimeType, m_Locale, m_NoHeader, m_QuoteCharacter, m_Reader, m_Separator, m_TextColumns, m_TimeZone, m_Trim
-
Fields inherited from class adams.data.io.input.AbstractSpreadSheetReaderWithMissingValueSupport
m_MissingValue
-
Fields inherited from class adams.data.io.input.AbstractSpreadSheetReader
m_DataRowType, m_Encoding, m_LastError, m_SpreadSheetType, m_Stopped, OPTION_INPUT, OPTION_OUTPUT
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
-
Constructor Summary
TsvSpreadSheetReader()
-
Method Summary
SpreadSheetWriter getCorrespondingWriter()
Returns, if available, the corresponding writer.
protected String getDefaultSeparator()
Returns the default separator.
String getFormatDescription()
Returns a string describing the format (used in the file chooser).
String[] getFormatExtensions()
Returns the extension(s) of the format.
String globalInfo()
Returns a string describing the object.
static void main(String[] args)
Runs the reader from the command-line.
-
Methods inherited from class adams.data.io.input.SimpleCsvSpreadSheetReader
chunkSizeTipText, customColumnHeadersTipText, dateTimeColumnsTipText, dateTimeFormatTipText, dateTimeLenientTipText, dateTimeTypeTipText, defineOptions, doRead, getChunkSize, getCustomColumnHeaders, getDateTimeColumns, getDateTimeFormat, getDateTimeType, getDefaultMissingValue, getInputType, getLastError, getLocale, getNoHeader, getQuoteCharacter, getSeparator, getTextColumns, getTimeZone, getTrim, hasLastError, hasMoreChunks, isDateTimeLenient, isStopped, localeTipText, nextChunk, noHeaderTipText, quoteCharacterTipText, separatorTipText, setChunkSize, setCustomColumnHeaders, setDateTimeColumns, setDateTimeFormat, setDateTimeLenient, setDateTimeType, setLastError, setLocale, setNoHeader, setQuoteCharacter, setSeparator, setTextColumns, setTimeZone, setTrim, stopExecution, supportsCompressedInput, textColumnsTipText, timeZoneTipText, trimTipText
-
Methods inherited from class adams.data.io.input.AbstractSpreadSheetReaderWithMissingValueSupport
getMissingValue, missingValueTipText, setMissingValue
-
Methods inherited from class adams.data.io.input.AbstractSpreadSheetReader
canDecompress, check, dataRowTypeTipText, doRead, doRead, encodingTipText, getAdditionalInformation, getDataRowType, getDefaultDataRowType, getDefaultFormatExtension, getDefaultSpreadSheet, getEncoding, getReaders, getSpreadSheetType, initialize, read, read, read, read, runReader, setDataRowType, setEncoding, setSpreadSheetType, spreadSheetTypeTipText
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.core.Destroyable
destroy
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel
-
Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager, toCommandLine
-
Methods inherited from interface adams.data.io.input.SpreadSheetReader
dataRowTypeTipText, getDataRowType, getDefaultFormatExtension, getSpreadSheetType, read, read, read, read, setDataRowType, setSpreadSheetType, spreadSheetTypeTipText
-
-
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface GlobalInfoSupporter
- Overrides:
globalInfo in class SimpleCsvSpreadSheetReader
- Returns:
- a description suitable for displaying in the GUI
-
getDefaultSeparator
protected String getDefaultSeparator()
Returns the default separator.
- Overrides:
getDefaultSeparator in class SimpleCsvSpreadSheetReader
- Returns:
- the default separator
-
getFormatDescription
public String getFormatDescription()
Returns a string describing the format (used in the file chooser).
- Specified by:
getFormatDescription in interface FileFormatHandler
- Specified by:
getFormatDescription in interface SpreadSheetReader
- Overrides:
getFormatDescription in class SimpleCsvSpreadSheetReader
- Returns:
- a description suitable for displaying in the file chooser
-
getFormatExtensions
public String[] getFormatExtensions()
Returns the extension(s) of the format.
- Specified by:
getFormatExtensions in interface FileFormatHandler
- Specified by:
getFormatExtensions in interface SpreadSheetReader
- Overrides:
getFormatExtensions in class SimpleCsvSpreadSheetReader
- Returns:
- the extension(s), without the leading dot
-
getCorrespondingWriter
public SpreadSheetWriter getCorrespondingWriter()
Returns, if available, the corresponding writer.
- Specified by:
getCorrespondingWriter in interface SpreadSheetReader
- Overrides:
getCorrespondingWriter in class SimpleCsvSpreadSheetReader
- Returns:
- the writer, null if none available
-
main
public static void main(String[] args)
Runs the reader from the command-line. Use the option AbstractSpreadSheetReader.OPTION_INPUT to specify the input file. If the option AbstractSpreadSheetReader.OPTION_OUTPUT is specified, the sheet that was read is written as .csv files to that directory.
- Parameters:
args - the command-line options to use
-
-