Package adams.data.io.input
Class FastCsvSpreadSheetReader
-
- All Implemented Interfaces:
AdditionalInformationHandler, Destroyable, ErrorProvider, GlobalInfoSupporter, EncodingSupporter, FileFormatHandler, LoggingLevelHandler, LoggingSupporter, OptionHandler, SizeOfHandler, Stoppable, StoppableWithFeedback, ChunkedSpreadSheetReader, MissingValueSpreadSheetReader, NoHeaderSpreadSheetReader, SpreadSheetReader, WindowedSpreadSheetReader, DataRowTypeHandler, SpreadSheetTypeHandler, Serializable
public class FastCsvSpreadSheetReader extends AbstractSpreadSheetReaderWithMissingValueSupport implements WindowedSpreadSheetReader, NoHeaderSpreadSheetReader, ChunkedSpreadSheetReader
Simplified CSV spreadsheet reader for loading large files.
- Author: FracPete (fracpete at waikato dot ac dot nz)
- See Also: Serialized Form
-
Nested Class Summary
- static class FastCsvSpreadSheetReader.ChunkReader: Reads CSV files chunk by chunk.
-
Nested classes/interfaces inherited from class adams.data.io.input.AbstractSpreadSheetReader
AbstractSpreadSheetReader.InputType
-
Field Summary
- protected int m_ChunkSize: the chunk size to use.
- protected String m_CustomColumnHeaders: the comma-separated list of column header names.
- protected int m_FirstRow: the first row to retrieve (1-based).
- protected boolean m_NoHeader: whether the file has a header or not.
- protected Range m_NumericColumns: the columns to treat as numeric.
- protected int m_NumRows: the number of rows to retrieve (less than 1 = unlimited).
- protected String m_QuoteCharacter: the quote character.
- protected FastCsvSpreadSheetReader.ChunkReader m_Reader: the low-level reader.
- protected String m_Separator: the column separator.
- protected boolean m_Trim: whether to trim the cell content.
-
Fields inherited from class adams.data.io.input.AbstractSpreadSheetReaderWithMissingValueSupport
m_MissingValue
-
Fields inherited from class adams.data.io.input.AbstractSpreadSheetReader
m_DataRowType, m_Encoding, m_LastError, m_SpreadSheetType, m_Stopped, OPTION_INPUT, OPTION_OUTPUT
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
Constructor Summary
- FastCsvSpreadSheetReader()
-
Method Summary
- String chunkSizeTipText(): Returns the tip text for this property.
- String customColumnHeadersTipText(): Returns the tip text for this property.
- void defineOptions(): Adds options to the internal list of options.
- protected SpreadSheet doRead(Reader r): Reads the spreadsheet content from the specified reader.
- String firstRowTipText(): Returns the tip text for this property.
- int getChunkSize(): Returns the current chunk size.
- SpreadSheetWriter getCorrespondingWriter(): Returns, if available, the corresponding writer.
- String getCustomColumnHeaders(): Returns the custom column headers to use.
- int getFirstRow(): Returns the first row to return.
- String getFormatDescription(): Returns a string describing the format (used in the file chooser).
- String[] getFormatExtensions(): Returns the extension(s) of the format.
- protected AbstractSpreadSheetReader.InputType getInputType(): Returns how to read the data: from a file, a stream or a reader.
- boolean getNoHeader(): Returns whether the file contains a header row or not.
- Range getNumericColumns(): Returns the range of columns to treat as numeric.
- int getNumRows(): Returns the number of data rows to return.
- String getQuoteCharacter(): Returns the string used for surrounding text.
- String getSeparator(): Returns the string used as separator for the columns, '\t' for tab.
- boolean getTrim(): Returns whether to trim the cell content.
- String globalInfo(): Returns a string describing the object.
- boolean hasMoreChunks(): Checks whether there is more data to read.
- static void main(String[] args): Runs the reader from the command-line.
- SpreadSheet nextChunk(): Returns the next chunk.
- String noHeaderTipText(): Returns the tip text for this property.
- String numericColumnsTipText(): Returns the tip text for this property.
- String numRowsTipText(): Returns the tip text for this property.
- String quoteCharacterTipText(): Returns the tip text for this property.
- String separatorTipText(): Returns the tip text for this property.
- void setChunkSize(int value): Sets the maximum chunk size.
- void setCustomColumnHeaders(String value): Sets the custom headers to use.
- void setFirstRow(int value): Sets the first row to return.
- void setNoHeader(boolean value): Sets whether the file contains a header row or not.
- void setNumericColumns(Range value): Sets the range of columns to treat as numeric.
- void setNumRows(int value): Sets the number of data rows to return.
- void setQuoteCharacter(String value): Sets the character used for surrounding text.
- void setSeparator(String value): Sets the string to use as separator for the columns, use '\t' for tab.
- void setTrim(boolean value): Sets whether to trim the cell content.
- protected boolean supportsCompressedInput(): Returns whether to automatically handle gzip compressed files (AbstractSpreadSheetReader.InputType.READER, AbstractSpreadSheetReader.InputType.STREAM).
- String trimTipText(): Returns the tip text for this property.
-
Methods inherited from class adams.data.io.input.AbstractSpreadSheetReaderWithMissingValueSupport
getDefaultMissingValue, getMissingValue, missingValueTipText, setMissingValue
-
Methods inherited from class adams.data.io.input.AbstractSpreadSheetReader
canDecompress, check, dataRowTypeTipText, doRead, doRead, encodingTipText, getAdditionalInformation, getDataRowType, getDefaultDataRowType, getDefaultFormatExtension, getDefaultSpreadSheet, getEncoding, getLastError, getReaders, getSpreadSheetType, hasLastError, initialize, isStopped, read, read, read, read, runReader, setDataRowType, setEncoding, setLastError, setSpreadSheetType, spreadSheetTypeTipText, stopExecution
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.core.Destroyable
destroy
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel
-
Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager, toCommandLine
-
Methods inherited from interface adams.data.io.input.SpreadSheetReader
dataRowTypeTipText, getDataRowType, getDefaultFormatExtension, getLastError, getSpreadSheetType, hasLastError, isStopped, read, read, read, read, setDataRowType, setSpreadSheetType, spreadSheetTypeTipText, stopExecution
-
Field Detail
-
m_QuoteCharacter
protected String m_QuoteCharacter
the quote character.
-
m_Separator
protected String m_Separator
the column separator.
-
m_NumericColumns
protected Range m_NumericColumns
the columns to treat as numeric.
-
m_Trim
protected boolean m_Trim
whether to trim the cell content.
-
m_NoHeader
protected boolean m_NoHeader
whether the file has a header or not.
-
m_CustomColumnHeaders
protected String m_CustomColumnHeaders
the comma-separated list of column header names.
-
m_FirstRow
protected int m_FirstRow
the first row to retrieve (1-based).
-
m_NumRows
protected int m_NumRows
the number of rows to retrieve (less than 1 = unlimited).
-
m_ChunkSize
protected int m_ChunkSize
the chunk size to use.
-
m_Reader
protected FastCsvSpreadSheetReader.ChunkReader m_Reader
the low-level reader.
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by: globalInfo in interface GlobalInfoSupporter
- Specified by: globalInfo in class AbstractOptionHandler
- Returns: a description suitable for displaying in the GUI
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by: defineOptions in interface OptionHandler
- Overrides: defineOptions in class AbstractSpreadSheetReaderWithMissingValueSupport
-
setQuoteCharacter
public void setQuoteCharacter(String value)
Sets the character used for surrounding text.
- Parameters: value - the quote character
-
getQuoteCharacter
public String getQuoteCharacter()
Returns the string used for surrounding text.
- Returns: the quote character
-
quoteCharacterTipText
public String quoteCharacterTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setSeparator
public void setSeparator(String value)
Sets the string to use as separator for the columns, use '\t' for tab.
- Parameters: value - the separator
-
getSeparator
public String getSeparator()
Returns the string used as separator for the columns, '\t' for tab.
- Returns: the separator
-
separatorTipText
public String separatorTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
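To illustrate how the separator and the quote character interact, here is a minimal stand-alone sketch in plain Java. The class CsvSplitSketch and its split method are hypothetical and not part of ADAMS; the sketch assumes a cell never spans multiple lines, ignores escaped or doubled quotes, and treats the two-character string \t as a tab, as the separator option describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT the ADAMS implementation: models how a separator
// and a quote string interact when splitting a single CSV line.
class CsvSplitSketch {

  static List<String> split(String line, String separator, String quote) {
    // Mirror the documented convention: the string "\t" denotes a tab.
    String sep = "\\t".equals(separator) ? "\t" : separator;
    if (sep.isEmpty())
      throw new IllegalArgumentException("separator must not be empty");

    List<String> cells = new ArrayList<>();
    StringBuilder cell = new StringBuilder();
    boolean inQuotes = false;
    int i = 0;
    while (i < line.length()) {
      if (!quote.isEmpty() && line.startsWith(quote, i)) {
        inQuotes = !inQuotes;          // toggle quoted state, drop the quote itself
        i += quote.length();
      } else if (!inQuotes && line.startsWith(sep, i)) {
        cells.add(cell.toString());    // a separator outside quotes ends the cell
        cell.setLength(0);
        i += sep.length();
      } else {
        cell.append(line.charAt(i));   // ordinary character, or separator inside quotes
        i++;
      }
    }
    cells.add(cell.toString());        // the last cell has no trailing separator
    return cells;
  }
}
```

For example, split("a,\"b,c\",d", ",", "\"") yields the three cells a, b,c and d, because the comma between b and c sits inside the quotes.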
-
setNumericColumns
public void setNumericColumns(Range value)
Sets the range of columns to treat as numeric.
- Parameters: value - the range
-
getNumericColumns
public Range getNumericColumns()
Returns the range of columns to treat as numeric.
- Returns: the range
-
numericColumnsTipText
public String numericColumnsTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI
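The numeric-columns option takes an ADAMS Range. The sketch below resolves a small subset of range syntax (comma-separated 1-based indices, the placeholders "first" and "last", and "from-to" sub-ranges) into 0-based column indices. RangeSketch is hypothetical; the real Range class supports further placeholders that are not reproduced here.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical, simplified model of a 1-based column range such as "first-3,last".
class RangeSketch {

  // Resolves one token to a 1-based column index.
  private static int index(String token, int numCols) {
    switch (token) {
      case "first": return 1;
      case "last":  return numCols;
      default:      return Integer.parseInt(token);  // throws on unsupported syntax
    }
  }

  // Returns the selected columns as 0-based indices, clipped to [0, numCols).
  static SortedSet<Integer> parse(String range, int numCols) {
    SortedSet<Integer> result = new TreeSet<>();
    if (range.trim().isEmpty())
      return result;
    for (String part : range.split(",")) {
      part = part.trim();
      int dash = part.indexOf('-');
      int from;
      int to;
      if (dash > 0) {                    // sub-range "from-to"
        from = index(part.substring(0, dash).trim(), numCols);
        to = index(part.substring(dash + 1).trim(), numCols);
      } else {                           // single index or placeholder
        from = index(part, numCols);
        to = from;
      }
      for (int c = Math.max(from, 1); c <= Math.min(to, numCols); c++)
        result.add(c - 1);
    }
    return result;
  }
}
```

With five columns, parse("first-3,last", 5) selects the 0-based columns 0, 1, 2 and 4.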
-
setTrim
public void setTrim(boolean value)
Sets whether to trim the cell content.
- Parameters: value - if true the content gets trimmed
-
getTrim
public boolean getTrim()
Returns whether to trim the cell content.
- Returns: true if the cell content gets trimmed
-
trimTipText
public String trimTipText()
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the GUI
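Trimming, the inherited missing-value setting and the numeric-columns option all influence how a raw cell string becomes a value. A hypothetical conversion step is sketched below. The order of the checks (trim, then compare with the missing-value marker, then parse) and the fallback to text when parsing fails are assumptions, and the missing-value marker is passed in explicitly because its default is not stated on this page.

```java
// Hypothetical cell conversion, NOT the ADAMS implementation.
class CellSketch {

  // Returns null for a missing cell, a Double for a parsable numeric cell,
  // and the (optionally trimmed) text otherwise.
  static Object convert(String raw, boolean trim, boolean numeric, String missing) {
    String s = trim ? raw.trim() : raw;
    if (s.equals(missing))
      return null;                        // assumption: missing check happens after trimming
    if (numeric) {
      try {
        return Double.valueOf(s);
      } catch (NumberFormatException e) {
        return s;                         // assumption: unparsable values stay as text
      }
    }
    return s;
  }
}
```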
-
setFirstRow
public void setFirstRow(int value)
Sets the first row to return.
- Specified by: setFirstRow in interface WindowedSpreadSheetReader
- Parameters: value - the first row (1-based), greater than 0
-
getFirstRow
public int getFirstRow()
Returns the first row to return.
- Specified by: getFirstRow in interface WindowedSpreadSheetReader
- Returns: the first row (1-based), greater than 0
-
firstRowTipText
public String firstRowTipText()
Returns the tip text for this property.
- Specified by: firstRowTipText in interface WindowedSpreadSheetReader
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNumRows
public void setNumRows(int value)
Sets the number of data rows to return.
- Specified by: setNumRows in interface WindowedSpreadSheetReader
- Parameters: value - the number of rows, -1 for unlimited
-
getNumRows
public int getNumRows()
Returns the number of data rows to return.
- Specified by: getNumRows in interface WindowedSpreadSheetReader
- Returns: the number of rows, -1 for unlimited
-
numRowsTipText
public String numRowsTipText()
Returns the tip text for this property.
- Specified by: numRowsTipText in interface WindowedSpreadSheetReader
- Returns: tip text for this property suitable for displaying in the GUI or for listing the options.
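The two windowing options together select a contiguous block of data rows. Assuming firstRow is 1-based and any numRows below 1 means unlimited (as the description of m_NumRows states), the selection can be sketched as follows. WindowSketch is a hypothetical helper that works on already-parsed rows; a streaming reader would more likely skip rows while reading instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the firstRow/numRows window over data rows.
class WindowSketch {

  // Keeps data rows firstRow .. firstRow + numRows - 1 (1-based);
  // numRows < 1 keeps everything from firstRow onwards.
  static <T> List<T> window(List<T> dataRows, int firstRow, int numRows) {
    if (firstRow < 1)
      throw new IllegalArgumentException("firstRow must be greater than 0");
    List<T> result = new ArrayList<>();
    for (int i = firstRow - 1; i < dataRows.size(); i++) {
      if (numRows >= 1 && result.size() == numRows)
        break;                         // window is full
      result.add(dataRows.get(i));
    }
    return result;
  }
}
```

For the rows a, b, c, d, a window with firstRow 2 and numRows 2 keeps b and c, while firstRow 3 with numRows -1 keeps c and d.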
-
setNoHeader
public void setNoHeader(boolean value)
Sets whether the file contains a header row or not.
- Specified by: setNoHeader in interface NoHeaderSpreadSheetReader
- Parameters: value - true if no header row available
-
getNoHeader
public boolean getNoHeader()
Returns whether the file contains a header row or not.
- Specified by: getNoHeader in interface NoHeaderSpreadSheetReader
- Returns: true if no header row available
-
noHeaderTipText
public String noHeaderTipText()
Returns the tip text for this property.
- Specified by: noHeaderTipText in interface NoHeaderSpreadSheetReader
- Returns: tip text for this property suitable for displaying in the GUI
-
setCustomColumnHeaders
public void setCustomColumnHeaders(String value)
Sets the custom headers to use.
- Specified by: setCustomColumnHeaders in interface NoHeaderSpreadSheetReader
- Parameters: value - the comma-separated list
-
getCustomColumnHeaders
public String getCustomColumnHeaders()
Returns the custom headers to use.
- Specified by: getCustomColumnHeaders in interface NoHeaderSpreadSheetReader
- Returns: the comma-separated list
-
customColumnHeadersTipText
public String customColumnHeadersTipText()
Returns the tip text for this property.
- Specified by: customColumnHeadersTipText in interface NoHeaderSpreadSheetReader
- Returns: tip text for this property suitable for displaying in the GUI
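When a file has no header row, the column names have to come from somewhere else. The hypothetical HeaderSketch below uses the first line when a header is present and the comma-separated custom list otherwise. Two behaviours are assumptions not confirmed by this page: the custom list is only applied when noHeader is true, and without a custom list the names fall back to 1-based column numbers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical header resolution, NOT the ADAMS implementation.
class HeaderSketch {

  // firstLineCells: the cells of the first line; when noHeader is true they are
  // data, and only their count is used here.
  static List<String> headers(List<String> firstLineCells, boolean noHeader, String customHeaders) {
    List<String> names = new ArrayList<>();
    if (!noHeader) {
      names.addAll(firstLineCells);                 // header row present
    } else if (customHeaders != null && !customHeaders.trim().isEmpty()) {
      for (String h : customHeaders.split(","))
        names.add(h.trim());                        // user-supplied names
    } else {
      for (int i = 1; i <= firstLineCells.size(); i++)
        names.add(String.valueOf(i));               // assumption: numeric fallback names
    }
    return names;
  }
}
```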
-
setChunkSize
public void setChunkSize(int value)
Sets the maximum chunk size.
- Specified by: setChunkSize in interface ChunkedSpreadSheetReader
- Parameters: value - the size of the chunks, < 1 denotes infinity
-
getChunkSize
public int getChunkSize()
Returns the current chunk size.
- Specified by: getChunkSize in interface ChunkedSpreadSheetReader
- Returns: the size of the chunks, < 1 denotes infinity
-
chunkSizeTipText
public String chunkSizeTipText()
Returns the tip text for this property.
- Specified by: chunkSizeTipText in interface ChunkedSpreadSheetReader
- Returns: tip text for this property suitable for displaying in the GUI
-
getFormatDescription
public String getFormatDescription()
Returns a string describing the format (used in the file chooser).
- Specified by: getFormatDescription in interface FileFormatHandler
- Specified by: getFormatDescription in interface SpreadSheetReader
- Specified by: getFormatDescription in class AbstractSpreadSheetReader
- Returns: a description suitable for displaying in the file chooser
-
getFormatExtensions
public String[] getFormatExtensions()
Returns the extension(s) of the format.
- Specified by: getFormatExtensions in interface FileFormatHandler
- Specified by: getFormatExtensions in interface SpreadSheetReader
- Specified by: getFormatExtensions in class AbstractSpreadSheetReader
- Returns: the extension (without the dot!)
-
getCorrespondingWriter
public SpreadSheetWriter getCorrespondingWriter()
Returns, if available, the corresponding writer.
- Specified by: getCorrespondingWriter in interface SpreadSheetReader
- Returns: the writer, null if none available
-
getInputType
protected AbstractSpreadSheetReader.InputType getInputType()
Returns how to read the data: from a file, a stream or a reader.
- Specified by: getInputType in class AbstractSpreadSheetReader
- Returns: how to read the data
-
supportsCompressedInput
protected boolean supportsCompressedInput()
Returns whether to automatically handle gzip compressed files (AbstractSpreadSheetReader.InputType.READER, AbstractSpreadSheetReader.InputType.STREAM).
- Overrides: supportsCompressedInput in class AbstractSpreadSheetReader
- Returns: true if to automatically decompress
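One common way to handle gzip input transparently is to sniff the two-byte gzip magic number (0x1f 0x8b) and, if present, wrap the stream in java.util.zip.GZIPInputStream. This page does not say whether ADAMS detects compression by magic bytes or by the .gz extension, so the hypothetical GzipSniffSketch only illustrates the general technique with standard JDK classes.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical illustration of transparent gzip handling via magic-byte sniffing.
class GzipSniffSketch {

  static InputStream maybeDecompress(InputStream in) throws IOException {
    BufferedInputStream buffered = new BufferedInputStream(in);
    buffered.mark(2);                  // remember the start of the stream
    int b1 = buffered.read();
    int b2 = buffered.read();
    buffered.reset();                  // rewind so the peeked bytes are not lost
    if (b1 == 0x1f && b2 == 0x8b)
      return new GZIPInputStream(buffered);  // gzip magic number found
    return buffered;                   // plain input, passed through unchanged
  }
}
```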
-
doRead
protected SpreadSheet doRead(Reader r)
Reads the spreadsheet content from the specified reader.
- Overrides: doRead in class AbstractSpreadSheetReader
- Parameters: r - the reader to read from
- Returns: the spreadsheet, or null in case of an error
- See Also: AbstractSpreadSheetReader.getInputType()
-
hasMoreChunks
public boolean hasMoreChunks()
Checks whether there is more data to read.
- Specified by: hasMoreChunks in interface ChunkedSpreadSheetReader
- Returns: true if there is more data available
-
nextChunk
public SpreadSheet nextChunk()
Returns the next chunk.
- Specified by: nextChunk in interface ChunkedSpreadSheetReader
- Returns: the next chunk, null if no data available
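The chunked interface is meant to be polled in a loop: call hasMoreChunks(), then nextChunk(), until no data remains. The hypothetical ChunkedLineSource below mimics that protocol over a java.io.Reader so that memory use is bounded by the chunk size. A chunk here is a list of raw lines rather than a SpreadSheet, and a chunk size below 1 puts all remaining lines into one chunk, mirroring the setChunkSize description.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the hasMoreChunks()/nextChunk() protocol.
class ChunkedLineSource {
  private final BufferedReader reader;
  private final int chunkSize;   // < 1 means "everything in one chunk"
  private String pending;        // next unread line, null at end of input

  ChunkedLineSource(Reader r, int chunkSize) {
    this.reader = new BufferedReader(r);
    this.chunkSize = chunkSize;
    this.pending = readLine();   // look ahead one line so hasMoreChunks() is cheap
  }

  private String readLine() {
    try {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  boolean hasMoreChunks() {
    return pending != null;
  }

  // Returns up to chunkSize lines, or null once the input is exhausted.
  List<String> nextChunk() {
    if (pending == null)
      return null;
    List<String> chunk = new ArrayList<>();
    while (pending != null && (chunkSize < 1 || chunk.size() < chunkSize)) {
      chunk.add(pending);
      pending = readLine();
    }
    return chunk;
  }
}
```

With five input lines and a chunk size of 2, the loop while (src.hasMoreChunks()) { List<String> chunk = src.nextChunk(); ... } sees chunks of 2, 2 and 1 lines. How the real reader's first chunk is obtained (for example, whether it comes from the initial read call) is not documented on this page and should be checked against the ChunkedSpreadSheetReader documentation.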
-
main
public static void main(String[] args)
Runs the reader from the command-line. Use the option AbstractSpreadSheetReader.OPTION_INPUT to specify the input file. If the option AbstractSpreadSheetReader.OPTION_OUTPUT is specified, the sheet that was read is output as .csv files in that directory.
- Parameters: args - the command-line options to use