Package adams.data.io.input
Class FastCsvSpreadSheetReader
-
- All Implemented Interfaces:
AdditionalInformationHandler, Destroyable, ErrorProvider, GlobalInfoSupporter, EncodingSupporter, FileFormatHandler, LoggingLevelHandler, LoggingSupporter, OptionHandler, SizeOfHandler, Stoppable, StoppableWithFeedback, ChunkedSpreadSheetReader, MissingValueSpreadSheetReader, NoHeaderSpreadSheetReader, SpreadSheetReader, WindowedSpreadSheetReader, DataRowTypeHandler, SpreadSheetTypeHandler, Serializable
public class FastCsvSpreadSheetReader extends AbstractSpreadSheetReaderWithMissingValueSupport implements WindowedSpreadSheetReader, NoHeaderSpreadSheetReader, ChunkedSpreadSheetReader
Simplified CSV spreadsheet reader for loading large files.
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
static class FastCsvSpreadSheetReader.ChunkReader
Reads CSV files chunk by chunk.
-
Nested classes/interfaces inherited from class adams.data.io.input.AbstractSpreadSheetReader
AbstractSpreadSheetReader.InputType
-
-
Field Summary
Fields
protected int m_ChunkSize
the chunk size to use.
protected String m_CustomColumnHeaders
the comma-separated list of column header names.
protected int m_FirstRow
the first row to retrieve (1-based).
protected boolean m_NoHeader
whether the file has a header or not.
protected Range m_NumericColumns
the columns to treat as numeric.
protected int m_NumRows
the number of rows to retrieve (less than 1 = unlimited).
protected String m_QuoteCharacter
the quote character.
protected FastCsvSpreadSheetReader.ChunkReader m_Reader
the low-level reader.
protected String m_Separator
the column separator.
protected boolean m_Trim
whether to trim the cell content.
-
Fields inherited from class adams.data.io.input.AbstractSpreadSheetReaderWithMissingValueSupport
m_MissingValue
-
Fields inherited from class adams.data.io.input.AbstractSpreadSheetReader
m_DataRowType, m_Encoding, m_LastError, m_SpreadSheetType, m_Stopped, OPTION_INPUT, OPTION_OUTPUT
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
-
Constructor Summary
Constructors
FastCsvSpreadSheetReader()
-
Method Summary
Methods
String chunkSizeTipText()
Returns the tip text for this property.
String customColumnHeadersTipText()
Returns the tip text for this property.
void defineOptions()
Adds options to the internal list of options.
protected SpreadSheet doRead(Reader r)
Reads the spreadsheet content from the specified reader.
String firstRowTipText()
Returns the tip text for this property.
int getChunkSize()
Returns the current chunk size.
SpreadSheetWriter getCorrespondingWriter()
Returns, if available, the corresponding writer.
String getCustomColumnHeaders()
Returns the custom headers to use.
int getFirstRow()
Returns the first row to return.
String getFormatDescription()
Returns a string describing the format (used in the file chooser).
String[] getFormatExtensions()
Returns the extension(s) of the format.
protected AbstractSpreadSheetReader.InputType getInputType()
Returns how to read the data, from a file, stream or reader.
boolean getNoHeader()
Returns whether the file contains a header row or not.
Range getNumericColumns()
Returns the range of columns to treat as numeric.
int getNumRows()
Returns the number of data rows to return.
String getQuoteCharacter()
Returns the string used for surrounding text.
String getSeparator()
Returns the string used as separator for the columns, '\t' for tab.
boolean getTrim()
Returns whether to trim the cell content.
String globalInfo()
Returns a string describing the object.
boolean hasMoreChunks()
Checks whether there is more data to read.
static void main(String[] args)
Runs the reader from the command-line.
SpreadSheet nextChunk()
Returns the next chunk.
String noHeaderTipText()
Returns the tip text for this property.
String numericColumnsTipText()
Returns the tip text for this property.
String numRowsTipText()
Returns the tip text for this property.
String quoteCharacterTipText()
Returns the tip text for this property.
String separatorTipText()
Returns the tip text for this property.
void setChunkSize(int value)
Sets the maximum chunk size.
void setCustomColumnHeaders(String value)
Sets the custom headers to use.
void setFirstRow(int value)
Sets the first row to return.
void setNoHeader(boolean value)
Sets whether the file contains a header row or not.
void setNumericColumns(Range value)
Sets the range of columns to treat as numeric.
void setNumRows(int value)
Sets the number of data rows to return.
void setQuoteCharacter(String value)
Sets the character used for surrounding text.
void setSeparator(String value)
Sets the string to use as separator for the columns, use '\t' for tab.
void setTrim(boolean value)
Sets whether to trim the cell content.
protected boolean supportsCompressedInput()
Returns whether to automatically handle gzip compressed files (AbstractSpreadSheetReader.InputType.READER, AbstractSpreadSheetReader.InputType.STREAM).
String trimTipText()
Returns the tip text for this property.
-
Methods inherited from class adams.data.io.input.AbstractSpreadSheetReaderWithMissingValueSupport
getDefaultMissingValue, getMissingValue, missingValueTipText, setMissingValue
-
Methods inherited from class adams.data.io.input.AbstractSpreadSheetReader
canDecompress, check, dataRowTypeTipText, doRead, doRead, encodingTipText, getAdditionalInformation, getDataRowType, getDefaultDataRowType, getDefaultFormatExtension, getDefaultSpreadSheet, getEncoding, getLastError, getReaders, getSpreadSheetType, hasLastError, initialize, isStopped, read, read, read, read, runReader, setDataRowType, setEncoding, setLastError, setSpreadSheetType, spreadSheetTypeTipText, stopExecution
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.core.Destroyable
destroy
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel
-
Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager, toCommandLine
-
Methods inherited from interface adams.data.io.input.SpreadSheetReader
dataRowTypeTipText, getDataRowType, getDefaultFormatExtension, getLastError, getSpreadSheetType, hasLastError, isStopped, read, read, read, read, setDataRowType, setSpreadSheetType, spreadSheetTypeTipText, stopExecution
-
-
-
-
Field Detail
-
m_QuoteCharacter
protected String m_QuoteCharacter
the quote character.
-
m_Separator
protected String m_Separator
the column separator.
-
m_NumericColumns
protected Range m_NumericColumns
the columns to treat as numeric.
-
m_Trim
protected boolean m_Trim
whether to trim the cell content.
-
m_NoHeader
protected boolean m_NoHeader
whether the file has a header or not.
-
m_CustomColumnHeaders
protected String m_CustomColumnHeaders
the comma-separated list of column header names.
-
m_FirstRow
protected int m_FirstRow
the first row to retrieve (1-based).
-
m_NumRows
protected int m_NumRows
the number of rows to retrieve (less than 1 = unlimited).
-
m_ChunkSize
protected int m_ChunkSize
the chunk size to use.
-
m_Reader
protected FastCsvSpreadSheetReader.ChunkReader m_Reader
the low-level reader.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface GlobalInfoSupporter
- Specified by:
globalInfo in class AbstractOptionHandler
- Returns:
- a description suitable for displaying in the GUI
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions in interface OptionHandler
- Overrides:
defineOptions in class AbstractSpreadSheetReaderWithMissingValueSupport
-
setQuoteCharacter
public void setQuoteCharacter(String value)
Sets the character used for surrounding text.
- Parameters:
value - the quote character
-
getQuoteCharacter
public String getQuoteCharacter()
Returns the string used for surrounding text.
- Returns:
- the quote character
-
quoteCharacterTipText
public String quoteCharacterTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setSeparator
public void setSeparator(String value)
Sets the string to use as separator for the columns, use '\t' for tab.
- Parameters:
value - the separator
-
getSeparator
public String getSeparator()
Returns the string used as separator for the columns, '\t' for tab.
- Returns:
- the separator
-
separatorTipText
public String separatorTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNumericColumns
public void setNumericColumns(Range value)
Sets the range of columns to treat as numeric.
- Parameters:
value - the range
-
getNumericColumns
public Range getNumericColumns()
Returns the range of columns to treat as numeric.
- Returns:
- the range
-
numericColumnsTipText
public String numericColumnsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI
-
setTrim
public void setTrim(boolean value)
Sets whether to trim the cell content.
- Parameters:
value - if true the content gets trimmed
-
getTrim
public boolean getTrim()
Returns whether to trim the cell content.
- Returns:
- true if the content gets trimmed
-
trimTipText
public String trimTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI
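How the quote character, separator and trim options might interact when a single line is split into cells can be illustrated with a minimal sketch. This is not the ADAMS implementation: the class SimpleCsvSplit and its split method are hypothetical, the separator and quote are simplified to single characters (the documented properties are strings), and escaped quotes inside quoted text are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of ADAMS. */
final class SimpleCsvSplit {

  /**
   * Splits one line into cells. Separators inside quoted text do not
   * split the cell; the quote characters themselves are dropped.
   */
  static List<String> split(String line, char separator, char quote, boolean trim) {
    List<String> cells = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (c == quote) {
        // toggle quoted state; the quote itself is not part of the cell
        inQuotes = !inQuotes;
      } else if (c == separator && !inQuotes) {
        // an unquoted separator closes the current cell
        cells.add(trim ? current.toString().trim() : current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    // the last cell has no trailing separator
    cells.add(trim ? current.toString().trim() : current.toString());
    return cells;
  }

  public static void main(String[] args) {
    System.out.println(split("a, \"b,c\" ,d", ',', '"', true));
    System.out.println(split("x\t y", '\t', '"', false));
  }
}
```

With trimming enabled, whitespace around a quoted cell is removed after the quotes have been stripped; with trimming disabled, the cell content is kept exactly as read.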
-
setFirstRow
public void setFirstRow(int value)
Sets the first row to return.
- Specified by:
setFirstRow in interface WindowedSpreadSheetReader
- Parameters:
value - the first row (1-based), greater than 0
-
getFirstRow
public int getFirstRow()
Returns the first row to return.
- Specified by:
getFirstRow in interface WindowedSpreadSheetReader
- Returns:
- the first row (1-based), greater than 0
-
firstRowTipText
public String firstRowTipText()
Returns the tip text for this property.
- Specified by:
firstRowTipText in interface WindowedSpreadSheetReader
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNumRows
public void setNumRows(int value)
Sets the number of data rows to return.
- Specified by:
setNumRows in interface WindowedSpreadSheetReader
- Parameters:
value - the number of rows, -1 for unlimited
-
getNumRows
public int getNumRows()
Returns the number of data rows to return.
- Specified by:
getNumRows in interface WindowedSpreadSheetReader
- Returns:
- the number of rows, -1 for unlimited
-
numRowsTipText
public String numRowsTipText()
Returns the tip text for this property.
- Specified by:
numRowsTipText in interface WindowedSpreadSheetReader
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
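The window defined by the first row (1-based) and the number of rows (less than 1 meaning unlimited) can be sketched on an in-memory list. The class RowWindow and its select method are hypothetical names used for illustration; how the reader itself applies the window while parsing is not described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of ADAMS. */
final class RowWindow {

  /**
   * Returns the rows starting at firstRow (1-based). A numRows value
   * below 1 means "all remaining rows".
   */
  static <T> List<T> select(List<T> rows, int firstRow, int numRows) {
    if (firstRow < 1)
      throw new IllegalArgumentException("firstRow must be greater than 0");
    // convert the 1-based row to a 0-based index, clamped to the list size
    int from = Math.min(firstRow - 1, rows.size());
    int to = (numRows < 1)
      ? rows.size()
      : (int) Math.min((long) from + numRows, rows.size());
    return new ArrayList<>(rows.subList(from, to));
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5");
    System.out.println(select(rows, 2, 3));
    System.out.println(select(rows, 4, -1));
  }
}
```

A first row beyond the end of the data yields an empty result in this sketch rather than an error.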
-
setNoHeader
public void setNoHeader(boolean value)
Sets whether the file contains a header row or not.
- Specified by:
setNoHeader in interface NoHeaderSpreadSheetReader
- Parameters:
value - true if no header row available
-
getNoHeader
public boolean getNoHeader()
Returns whether the file contains a header row or not.
- Specified by:
getNoHeader in interface NoHeaderSpreadSheetReader
- Returns:
- true if no header row available
-
noHeaderTipText
public String noHeaderTipText()
Returns the tip text for this property.
- Specified by:
noHeaderTipText in interface NoHeaderSpreadSheetReader
- Returns:
- tip text for this property suitable for displaying in the GUI
-
setCustomColumnHeaders
public void setCustomColumnHeaders(String value)
Sets the custom headers to use.
- Specified by:
setCustomColumnHeaders in interface NoHeaderSpreadSheetReader
- Parameters:
value - the comma-separated list
-
getCustomColumnHeaders
public String getCustomColumnHeaders()
Returns the custom headers to use.
- Specified by:
getCustomColumnHeaders in interface NoHeaderSpreadSheetReader
- Returns:
- the comma-separated list
-
customColumnHeadersTipText
public String customColumnHeadersTipText()
Returns the tip text for this property.
- Specified by:
customColumnHeadersTipText in interface NoHeaderSpreadSheetReader
- Returns:
- tip text for this property suitable for displaying in the GUI
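One way the no-header flag and the comma-separated custom headers could combine is sketched below. The assumptions are labeled in the code: custom headers are only consulted when no header row is present, and columns without a custom name get a generated placeholder. Both rules, the class HeaderSketch, and the "col-N" naming are hypothetical and may differ from what the reader actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of ADAMS. */
final class HeaderSketch {

  /**
   * Determines column names from the first row of the file.
   * Assumption: with a header row present, the first row is the header;
   * otherwise the custom list is used, padded with placeholder names.
   */
  static List<String> resolveHeaders(List<String> firstRow, boolean noHeader, String customHeaders) {
    if (!noHeader)
      return new ArrayList<>(firstRow);
    String[] custom = (customHeaders == null || customHeaders.trim().isEmpty())
      ? new String[0]
      : customHeaders.split(",");
    List<String> result = new ArrayList<>();
    for (int i = 0; i < firstRow.size(); i++) {
      if (i < custom.length && !custom[i].trim().isEmpty())
        result.add(custom[i].trim());
      else
        result.add("col-" + (i + 1)); // hypothetical placeholder name
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(resolveHeaders(Arrays.asList("1", "Ann", "x"), true, "id, name"));
    System.out.println(resolveHeaders(Arrays.asList("id", "name"), false, ""));
  }
}
```

When no header row is present, the first row of the file is data, so it only determines the number of columns here and would still be added to the sheet as a data row.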
-
setChunkSize
public void setChunkSize(int value)
Sets the maximum chunk size.
- Specified by:
setChunkSize in interface ChunkedSpreadSheetReader
- Parameters:
value - the size of the chunks, < 1 denotes infinity
-
getChunkSize
public int getChunkSize()
Returns the current chunk size.
- Specified by:
getChunkSize in interface ChunkedSpreadSheetReader
- Returns:
- the size of the chunks, < 1 denotes infinity
-
chunkSizeTipText
public String chunkSizeTipText()
Returns the tip text for this property.
- Specified by:
chunkSizeTipText in interface ChunkedSpreadSheetReader
- Returns:
- tip text for this property suitable for displaying in the GUI
-
getFormatDescription
public String getFormatDescription()
Returns a string describing the format (used in the file chooser).
- Specified by:
getFormatDescription in interface FileFormatHandler
- Specified by:
getFormatDescription in interface SpreadSheetReader
- Specified by:
getFormatDescription in class AbstractSpreadSheetReader
- Returns:
- a description suitable for displaying in the file chooser
-
getFormatExtensions
public String[] getFormatExtensions()
Returns the extension(s) of the format.
- Specified by:
getFormatExtensions in interface FileFormatHandler
- Specified by:
getFormatExtensions in interface SpreadSheetReader
- Specified by:
getFormatExtensions in class AbstractSpreadSheetReader
- Returns:
- the extension (without the dot!)
-
getCorrespondingWriter
public SpreadSheetWriter getCorrespondingWriter()
Returns, if available, the corresponding writer.
- Specified by:
getCorrespondingWriter in interface SpreadSheetReader
- Returns:
- the writer, null if none available
-
getInputType
protected AbstractSpreadSheetReader.InputType getInputType()
Returns how to read the data, from a file, stream or reader.
- Specified by:
getInputType in class AbstractSpreadSheetReader
- Returns:
- how to read the data
-
supportsCompressedInput
protected boolean supportsCompressedInput()
Returns whether to automatically handle gzip compressed files (AbstractSpreadSheetReader.InputType.READER, AbstractSpreadSheetReader.InputType.STREAM).
- Overrides:
supportsCompressedInput in class AbstractSpreadSheetReader
- Returns:
- true if gzip compressed files get decompressed automatically
-
doRead
protected SpreadSheet doRead(Reader r)
Reads the spreadsheet content from the specified reader.
- Overrides:
doRead in class AbstractSpreadSheetReader
- Parameters:
r - the reader to read from
- Returns:
- the spreadsheet or null in case of an error
- See Also:
AbstractSpreadSheetReader.getInputType()
-
hasMoreChunks
public boolean hasMoreChunks()
Checks whether there is more data to read.
- Specified by:
hasMoreChunks in interface ChunkedSpreadSheetReader
- Returns:
- true if there is more data available
-
nextChunk
public SpreadSheet nextChunk()
Returns the next chunk.
- Specified by:
nextChunk in interface ChunkedSpreadSheetReader
- Returns:
- the next chunk, null if no data available
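The contract documented for hasMoreChunks and nextChunk (a chunk size below 1 meaning unlimited, null once no data is left) can be sketched with a small stand-in built on the standard library. The class ChunkedLineReader is hypothetical and returns raw lines instead of SpreadSheet objects; it is not the nested ChunkReader class and makes no claim about how that class is implemented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration only; not the ADAMS ChunkReader. */
final class ChunkedLineReader {
  private final BufferedReader reader;
  private final int chunkSize;
  private String pending; // next unread line, null once the input is exhausted

  ChunkedLineReader(Reader r, int chunkSize) {
    this.reader = new BufferedReader(r);
    this.chunkSize = chunkSize;
    this.pending = readLine();
  }

  private String readLine() {
    try {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Checks whether there is more data to read. */
  boolean hasMoreChunks() {
    return pending != null;
  }

  /** Returns up to chunkSize lines (all remaining if chunkSize < 1), null if none are left. */
  List<String> nextChunk() {
    if (pending == null)
      return null;
    List<String> chunk = new ArrayList<>();
    while (pending != null && (chunkSize < 1 || chunk.size() < chunkSize)) {
      chunk.add(pending);
      pending = readLine();
    }
    return chunk;
  }

  public static void main(String[] args) {
    ChunkedLineReader r = new ChunkedLineReader(new StringReader("a\nb\nc\nd\ne\n"), 2);
    while (r.hasMoreChunks())
      System.out.println(r.nextChunk());
  }
}
```

Reading one line ahead lets hasMoreChunks answer without consuming data, so a caller can loop on it and process one bounded chunk at a time instead of holding the whole file in memory.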
-
main
public static void main(String[] args)
Runs the reader from the command-line. Use the option AbstractSpreadSheetReader.OPTION_INPUT to specify the input file. If the option AbstractSpreadSheetReader.OPTION_OUTPUT is specified, then the read sheet gets output as .csv files in that directory.
- Parameters:
args - the command-line options to use
-
-