Package adams.data.io.input
Class SimpleCsvSpreadSheetReader
-
- All Implemented Interfaces:
AdditionalInformationHandler, Destroyable, ErrorProvider, GlobalInfoSupporter, EncodingSupporter, FileFormatHandler, LoggingLevelHandler, LoggingSupporter, LocaleSupporter, OptionHandlingLocaleSupporter, OptionHandler, SizeOfHandler, Stoppable, StoppableWithFeedback, ChunkedSpreadSheetReader, MissingValueSpreadSheetReader, NoHeaderSpreadSheetReader, SpreadSheetReader, DataRowTypeHandler, SpreadSheetTypeHandler, Serializable
- Direct Known Subclasses:
TsvSpreadSheetReader
public class SimpleCsvSpreadSheetReader extends AbstractSpreadSheetReaderWithMissingValueSupport implements ChunkedSpreadSheetReader, OptionHandlingLocaleSupporter, NoHeaderSpreadSheetReader
Reads CSV files.
It is possible to force columns to be text. In that case no intelligent parsing is attempted to determine the type of data a cell has.
For very large files, chunking can be turned on; the reader then returns the data as a sequence of spreadsheet objects until everything has been read.
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel) The logging level for outputting errors and debugging output. default: WARNING
-missing <java.lang.String> (property: missingValue) The placeholder for missing values. default:
-encoding <adams.core.base.BaseCharset> (property: encoding) The type of encoding to use when reading using a reader, leave empty for default. default: Default
-quote-char <java.lang.String> (property: quoteCharacter) The character to use for surrounding text cells. default: \"
-separator <java.lang.String> (property: separator) The separator to use for the columns; use '\t' for tab. default: ,
-trim <boolean> (property: trim) If enabled, the content of the cells gets trimmed before added. default: false
-text-columns <adams.core.Range> (property: textColumns) The range of columns to treat as text. default: example: A range is a comma-separated list of single 1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts the range '...'; the following placeholders can be used as well: first, second, third, last_2, last_1, last
-datetime-columns <adams.core.Range> (property: dateTimeColumns) The range of columns to treat as date/time msec. default: example: A range is a comma-separated list of single 1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts the range '...'; the following placeholders can be used as well: first, second, third, last_2, last_1, last
-datetime-format <adams.data.DateFormatString> (property: dateTimeFormat) The format for date/time msecs. default: yyyy-MM-dd HH:mm:ss more: http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
-datetime-lenient <boolean> (property: dateTimeLenient) Whether date/time msec parsing is lenient or not. default: false
-datetime-type <TIME|TIME_MSEC|DATE|DATE_TIME|DATE_TIME_MSEC> (property: dateTimeType) How to interpret the date/time data. default: DATE_TIME
-time-zone <java.util.TimeZone> (property: timeZone) The time zone to use for interpreting dates/times; default is the system-wide defined one.
-locale <java.util.Locale> (property: locale) The locale to use for parsing the numbers. default: Default
-no-header <boolean> (property: noHeader) If enabled, all rows get added as data rows and a dummy header will get inserted. default: false
-custom-column-headers <java.lang.String> (property: customColumnHeaders) The custom headers to use for the columns instead (comma-separated list); ignored if empty. default:
-chunk-size <int> (property: chunkSize) The maximum number of rows per chunk; using -1 will read all data into a single spreadsheet object. default: -1 minimum: -1
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class adams.data.io.input.AbstractSpreadSheetReader
AbstractSpreadSheetReader.InputType
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected int | m_ChunkSize | the chunk size to use.
protected String | m_CustomColumnHeaders | the comma-separated list of column header names.
protected Range | m_DateTimeColumns | the columns to treat as date/time.
protected DateFormatString | m_DateTimeFormat | the format string for the date/times.
protected boolean | m_DateTimeLenient | whether date/time parsing is lenient.
protected BasicDateTimeType | m_DateTimeType | the type of date/time.
protected Locale | m_Locale | the locale to use.
protected boolean | m_NoHeader | whether the file has a header or not.
protected String | m_QuoteCharacter | the quote character.
protected CsvSpreadSheetReader | m_Reader | the actual reader.
protected String | m_Separator | the column separator.
protected Range | m_TextColumns | the columns to treat as text.
protected TimeZone | m_TimeZone | the timezone to use.
protected boolean | m_Trim | whether to trim the cells.
-
Fields inherited from class adams.data.io.input.AbstractSpreadSheetReaderWithMissingValueSupport
m_MissingValue
-
Fields inherited from class adams.data.io.input.AbstractSpreadSheetReader
m_DataRowType, m_Encoding, m_LastError, m_SpreadSheetType, m_Stopped, OPTION_INPUT, OPTION_OUTPUT
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
-
Constructor Summary
Constructors
Constructor | Description
SimpleCsvSpreadSheetReader() |
-
Method Summary
Modifier and Type | Method | Description
String | chunkSizeTipText() | Returns the tip text for this property.
String | customColumnHeadersTipText() | Returns the tip text for this property.
String | dateTimeColumnsTipText() | Returns the tip text for this property.
String | dateTimeFormatTipText() | Returns the tip text for this property.
String | dateTimeLenientTipText() | Returns the tip text for this property.
String | dateTimeTypeTipText() | Returns the tip text for this property.
void | defineOptions() | Adds options to the internal list of options.
protected SpreadSheet | doRead(Reader r) | Reads the spreadsheet content from the specified reader.
int | getChunkSize() | Returns the current chunk size.
SpreadSheetWriter | getCorrespondingWriter() | Returns, if available, the corresponding writer.
String | getCustomColumnHeaders() | Returns the custom headers to use.
Range | getDateTimeColumns() | Returns the range of columns to treat as date/time msec.
DateFormatString | getDateTimeFormat() | Returns the format for date/time msec columns.
BasicDateTimeType | getDateTimeType() | Returns the type for date/time columns.
protected BaseRegExp | getDefaultMissingValue() | Returns the default missing value to use.
protected String | getDefaultSeparator() | Returns the default separator.
String | getFormatDescription() | Returns a string describing the format (used in the file chooser).
String[] | getFormatExtensions() | Returns the extension(s) of the format.
protected AbstractSpreadSheetReader.InputType | getInputType() | Returns how to read the data, from a file, stream or reader.
String | getLastError() | Returns the error that occurred during the last read.
Locale | getLocale() | Returns the locale in use.
boolean | getNoHeader() | Returns whether the file contains a header row or not.
String | getQuoteCharacter() | Returns the character used for surrounding text.
String | getSeparator() | Returns the string used as separator for the columns, '\t' for tab.
Range | getTextColumns() | Returns the range of columns to treat as text.
TimeZone | getTimeZone() | Returns the time zone in use.
boolean | getTrim() | Returns whether to trim the cell content.
String | globalInfo() | Returns a string describing the object.
boolean | hasLastError() | Returns whether an error was encountered during the last read.
boolean | hasMoreChunks() | Checks whether there is more data to read.
boolean | isDateTimeLenient() | Returns whether the parsing of date/time msecs is lenient or not.
boolean | isStopped() | Returns whether the reading was stopped.
String | localeTipText() | Returns the tip text for this property.
static void | main(String[] args) | Runs the reader from the command-line.
SpreadSheet | nextChunk() | Returns the next chunk.
String | noHeaderTipText() | Returns the tip text for this property.
String | quoteCharacterTipText() | Returns the tip text for this property.
String | separatorTipText() | Returns the tip text for this property.
void | setChunkSize(int value) | Sets the maximum chunk size.
void | setCustomColumnHeaders(String value) | Sets the custom headers to use.
void | setDateTimeColumns(Range value) | Sets the range of columns to treat as date/time msec.
void | setDateTimeFormat(DateFormatString value) | Sets the format for date/time msec columns.
void | setDateTimeLenient(boolean value) | Sets whether parsing of date/time msecs is to be lenient or not.
void | setDateTimeType(BasicDateTimeType value) | Sets the type for date/time columns.
protected void | setLastError(String value) | Sets the value for the last error that occurred during read.
void | setLocale(Locale value) | Sets the locale to use.
void | setNoHeader(boolean value) | Sets whether the file contains a header row or not.
void | setQuoteCharacter(String value) | Sets the character used for surrounding text.
void | setSeparator(String value) | Sets the string to use as separator for the columns, use '\t' for tab.
void | setTextColumns(Range value) | Sets the range of columns to treat as text.
void | setTimeZone(TimeZone value) | Sets the time zone to use.
void | setTrim(boolean value) | Sets whether to trim the cell content.
void | stopExecution() | Stops the reading (might not be immediate, depending on reader).
protected boolean | supportsCompressedInput() | Returns whether to automatically handle gzip compressed files (AbstractSpreadSheetReader.InputType.READER, AbstractSpreadSheetReader.InputType.STREAM).
String | textColumnsTipText() | Returns the tip text for this property.
String | timeZoneTipText() | Returns the tip text for this property.
String | trimTipText() | Returns the tip text for this property.
-
Methods inherited from class adams.data.io.input.AbstractSpreadSheetReaderWithMissingValueSupport
getMissingValue, missingValueTipText, setMissingValue
-
Methods inherited from class adams.data.io.input.AbstractSpreadSheetReader
canDecompress, check, dataRowTypeTipText, doRead, doRead, encodingTipText, getAdditionalInformation, getDataRowType, getDefaultDataRowType, getDefaultFormatExtension, getDefaultSpreadSheet, getEncoding, getReaders, getSpreadSheetType, initialize, read, read, read, read, runReader, setDataRowType, setEncoding, setSpreadSheetType, spreadSheetTypeTipText
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.core.Destroyable
destroy
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel
-
Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager, toCommandLine
-
Methods inherited from interface adams.data.io.input.SpreadSheetReader
dataRowTypeTipText, getDataRowType, getDefaultFormatExtension, getSpreadSheetType, read, read, read, read, setDataRowType, setSpreadSheetType, spreadSheetTypeTipText
-
-
-
-
Field Detail
-
m_QuoteCharacter
protected String m_QuoteCharacter
the quote character.
-
m_Separator
protected String m_Separator
the column separator.
-
m_TextColumns
protected Range m_TextColumns
the columns to treat as text.
-
m_DateTimeColumns
protected Range m_DateTimeColumns
the columns to treat as date/time.
-
m_DateTimeFormat
protected DateFormatString m_DateTimeFormat
the format string for the date/times.
-
m_DateTimeLenient
protected boolean m_DateTimeLenient
whether date/time parsing is lenient.
-
m_DateTimeType
protected BasicDateTimeType m_DateTimeType
the type of date/time.
-
m_TimeZone
protected TimeZone m_TimeZone
the timezone to use.
-
m_Locale
protected Locale m_Locale
the locale to use.
-
m_NoHeader
protected boolean m_NoHeader
whether the file has a header or not.
-
m_CustomColumnHeaders
protected String m_CustomColumnHeaders
the comma-separated list of column header names.
-
m_ChunkSize
protected int m_ChunkSize
the chunk size to use.
-
m_Trim
protected boolean m_Trim
whether to trim the cells.
-
m_Reader
protected CsvSpreadSheetReader m_Reader
the actual reader.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface GlobalInfoSupporter
- Specified by:
globalInfo in class AbstractOptionHandler
- Returns:
- a description suitable for displaying in the gui
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions in interface OptionHandler
- Overrides:
defineOptions in class AbstractSpreadSheetReaderWithMissingValueSupport
-
getDefaultMissingValue
protected BaseRegExp getDefaultMissingValue()
Returns the default missing value to use.
- Overrides:
getDefaultMissingValue in class AbstractSpreadSheetReaderWithMissingValueSupport
- Returns:
- the default
-
getDefaultSeparator
protected String getDefaultSeparator()
Returns the default separator.
- Returns:
- the default
-
setQuoteCharacter
public void setQuoteCharacter(String value)
Sets the character used for surrounding text.
- Parameters:
value - the quote character
-
getQuoteCharacter
public String getQuoteCharacter()
Returns the character used for surrounding text.
- Returns:
- the quote character
-
quoteCharacterTipText
public String quoteCharacterTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setSeparator
public void setSeparator(String value)
Sets the string to use as separator for the columns, use '\t' for tab.
- Parameters:
value - the separator
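To illustrate how a column separator and a quote character interact, the following sketch splits a single line into cells using only the JDK. It is not the ADAMS parser: the class name QuotedSplitSketch, the method split and the doubled-quote escape rule are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ADAMS: splits one line into cells.
class QuotedSplitSketch {

  /** Splits the line on sep, ignoring separators that occur inside quotes. */
  static List<String> split(String line, char sep, char quote) {
    List<String> cells = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (c == quote) {
        // Assumption: a doubled quote inside a quoted cell is a literal quote.
        if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
          current.append(quote);
          i++;
        } else {
          inQuotes = !inQuotes;
        }
      } else if (c == sep && !inQuotes) {
        // Unquoted separator: the current cell is complete.
        cells.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    cells.add(current.toString());
    return cells;
  }

  public static void main(String[] args) {
    // Comma-separated line with a quoted cell that contains the separator.
    System.out.println(split("a,\"b,c\",d", ',', '"'));
    // Tab-separated line, corresponding to a separator of '\t'.
    System.out.println(split("x\ty", '\t', '"'));
  }
}
```

The separator is modelled as a single char here for brevity, whereas the documented property is a String.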
-
getSeparator
public String getSeparator()
Returns the string used as separator for the columns, '\t' for tab.
- Returns:
- the separator
-
separatorTipText
public String separatorTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setTextColumns
public void setTextColumns(Range value)
Sets the range of columns to treat as text.
- Parameters:
value - the range
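The range syntax described for the text and date/time column options (comma-separated 1-based indices and 'start-end' sub-ranges) can be pictured with a small JDK-only sketch. This is a simplified stand-in, not adams.core.Range: the class SimpleRangeSketch is hypothetical, and it only handles plain indices, sub-ranges and the placeholders 'first' and 'last' (no 'inv(...)' and no other placeholders).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified range expansion; not the adams.core.Range class.
class SimpleRangeSketch {

  /** Resolves a single token to a 1-based column index. */
  static int resolve(String token, int numCols) {
    token = token.trim();
    if (token.equals("first")) return 1;
    if (token.equals("last")) return numCols;
    return Integer.parseInt(token);
  }

  /** Expands a range string into 0-based column indices. */
  static List<Integer> expand(String range, int numCols) {
    List<Integer> result = new ArrayList<>();
    if (range.trim().isEmpty()) return result;   // empty range selects nothing
    for (String part : range.split(",")) {
      String[] bounds = part.split("-", 2);       // "3-5" -> ["3", "5"]
      int start = resolve(bounds[0], numCols);
      int end = bounds.length == 2 ? resolve(bounds[1], numCols) : start;
      for (int i = start; i <= end && i <= numCols; i++) {
        if (i >= 1) result.add(i - 1);            // 1-based -> 0-based
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Columns 1, 3 to 5 and the last one of a six-column sheet.
    System.out.println(expand("1,3-5,last", 6));
  }
}
```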
-
getTextColumns
public Range getTextColumns()
Returns the range of columns to treat as text.
- Returns:
- the range
-
textColumnsTipText
public String textColumnsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setDateTimeColumns
public void setDateTimeColumns(Range value)
Sets the range of columns to treat as date/time msec.
- Parameters:
value - the range
-
getDateTimeColumns
public Range getDateTimeColumns()
Returns the range of columns to treat as date/time msec.
- Returns:
- the range
-
dateTimeColumnsTipText
public String dateTimeColumnsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setDateTimeFormat
public void setDateTimeFormat(DateFormatString value)
Sets the format for date/time msec columns.
- Parameters:
value - the format
-
getDateTimeFormat
public DateFormatString getDateTimeFormat()
Returns the format for date/time msec columns.
- Returns:
- the format
-
dateTimeFormatTipText
public String dateTimeFormatTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setDateTimeLenient
public void setDateTimeLenient(boolean value)
Sets whether parsing of date/time msecs is to be lenient or not.
- Parameters:
value - if true lenient parsing is used, otherwise not
- See Also:
DateFormat.setLenient(boolean)
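Since this setting refers to DateFormat.setLenient(boolean), the difference between strict and lenient parsing can be shown with the JDK alone. The sketch below uses the default format documented for this reader (yyyy-MM-dd HH:mm:ss); the class LenientParsingSketch and its parse helper are hypothetical, and how the reader reacts to an unparseable cell is not covered here.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical demo class; only JDK date parsing is exercised.
class LenientParsingSketch {

  /** Returns the parsed date, or null if the text is rejected by the format. */
  static Date parse(String text, String format, boolean lenient) {
    SimpleDateFormat fmt = new SimpleDateFormat(format);
    fmt.setLenient(lenient);
    try {
      return fmt.parse(text);
    } catch (ParseException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    String format = "yyyy-MM-dd HH:mm:ss";
    String invalid = "2024-02-30 10:00:00";   // February 30 does not exist

    // Strict parsing rejects the impossible date.
    System.out.println("strict:  " + parse(invalid, format, false));
    // Lenient parsing rolls the surplus day over into March.
    System.out.println("lenient: " + parse(invalid, format, true));
  }
}
```

With strict parsing the impossible date is rejected, whereas lenient parsing silently shifts it to 1 March 2024, which is why lenient mode can hide data-entry errors.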
-
isDateTimeLenient
public boolean isDateTimeLenient()
Returns whether the parsing of date/time msecs is lenient or not.
- Returns:
- true if parsing is lenient
- See Also:
DateFormat.isLenient()
-
dateTimeLenientTipText
public String dateTimeLenientTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setDateTimeType
public void setDateTimeType(BasicDateTimeType value)
Sets the type for date/time columns.
- Parameters:
value - the type
-
getDateTimeType
public BasicDateTimeType getDateTimeType()
Returns the type for date/time columns.
- Returns:
- the type
-
dateTimeTypeTipText
public String dateTimeTypeTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setTimeZone
public void setTimeZone(TimeZone value)
Sets the time zone to use.
- Parameters:
value - the time zone
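Why the time zone matters when interpreting date/time cells can be seen with plain JDK classes: the same text maps to different instants depending on the zone. The following is a sketch only; the class TimeZoneSketch and the method toEpochMillis are hypothetical names and do not reflect how the reader applies its time zone internally.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical demo class; shows the effect of the zone on parsed instants.
class TimeZoneSketch {

  /** Parses the text in the given zone and returns milliseconds since the epoch. */
  static long toEpochMillis(String text, String zoneId) {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
    try {
      return fmt.parse(text).getTime();
    } catch (ParseException e) {
      throw new IllegalArgumentException("Not a date/time: " + text, e);
    }
  }

  public static void main(String[] args) {
    String cell = "2024-01-01 00:00:00";
    long utc = toEpochMillis(cell, "UTC");
    long plus2 = toEpochMillis(cell, "GMT+02:00");
    // Midnight in GMT+02:00 happens two hours before midnight in UTC.
    System.out.println((utc - plus2) / 3_600_000L + " hours apart");
  }
}
```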
-
getTimeZone
public TimeZone getTimeZone()
Returns the time zone in use.
- Returns:
- the time zone
-
timeZoneTipText
public String timeZoneTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setLocale
public void setLocale(Locale value)
Sets the locale to use.
- Specified by:
setLocale in interface LocaleSupporter
- Parameters:
value - the locale
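The locale is documented as being used for parsing numbers. What that means in practice can be sketched with java.text.NumberFormat; this is an illustration with JDK classes only, the class LocaleNumberSketch is hypothetical, and the reader's actual number parsing may differ in detail.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical demo class; shows locale-dependent number parsing in the JDK.
class LocaleNumberSketch {

  /** Parses the text as a number using the conventions of the given locale. */
  static double parse(String text, Locale locale) {
    try {
      return NumberFormat.getInstance(locale).parse(text).doubleValue();
    } catch (ParseException e) {
      throw new IllegalArgumentException("Not a number: " + text, e);
    }
  }

  public static void main(String[] args) {
    // German convention: '.' groups thousands, ',' is the decimal separator.
    System.out.println(parse("1.234,5", Locale.GERMANY));
    // US convention: ',' groups thousands, '.' is the decimal separator.
    System.out.println(parse("1,234.5", Locale.US));
  }
}
```

Both calls yield the same value, which shows why a file written with one convention needs the matching locale to be read correctly.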
-
getLocale
public Locale getLocale()
Returns the locale in use.
- Specified by:
getLocale in interface LocaleSupporter
- Returns:
- the locale
-
localeTipText
public String localeTipText()
Returns the tip text for this property.
- Specified by:
localeTipText in interface OptionHandlingLocaleSupporter
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setNoHeader
public void setNoHeader(boolean value)
Sets whether the file contains a header row or not.
- Specified by:
setNoHeader in interface NoHeaderSpreadSheetReader
- Parameters:
value - true if no header row available
-
getNoHeader
public boolean getNoHeader()
Returns whether the file contains a header row or not.
- Specified by:
getNoHeader in interface NoHeaderSpreadSheetReader
- Returns:
- true if no header row available
-
noHeaderTipText
public String noHeaderTipText()
Returns the tip text for this property.
- Specified by:
noHeaderTipText in interface NoHeaderSpreadSheetReader
- Returns:
- tip text for this property suitable for displaying in the gui
-
setCustomColumnHeaders
public void setCustomColumnHeaders(String value)
Sets the custom headers to use.
- Specified by:
setCustomColumnHeaders in interface NoHeaderSpreadSheetReader
- Parameters:
value - the comma-separated list
-
getCustomColumnHeaders
public String getCustomColumnHeaders()
Returns the custom headers to use.
- Specified by:
getCustomColumnHeaders in interface NoHeaderSpreadSheetReader
- Returns:
- the comma-separated list
-
customColumnHeadersTipText
public String customColumnHeadersTipText()
Returns the tip text for this property.
- Specified by:
customColumnHeadersTipText in interface NoHeaderSpreadSheetReader
- Returns:
- tip text for this property suitable for displaying in the gui
-
setChunkSize
public void setChunkSize(int value)
Sets the maximum chunk size.
- Specified by:
setChunkSize in interface ChunkedSpreadSheetReader
- Parameters:
value - the size of the chunks, < 1 denotes infinity
-
getChunkSize
public int getChunkSize()
Returns the current chunk size.
- Specified by:
getChunkSize in interface ChunkedSpreadSheetReader
- Returns:
- the size of the chunks, < 1 denotes infinity
-
chunkSizeTipText
public String chunkSizeTipText()
Returns the tip text for this property.
- Specified by:
chunkSizeTipText in interface ChunkedSpreadSheetReader
- Returns:
- tip text for this property suitable for displaying in the gui
-
setTrim
public void setTrim(boolean value)
Sets whether to trim the cell content.
- Parameters:
value - if true the content gets trimmed
-
getTrim
public boolean getTrim()
Returns whether to trim the cell content.
- Returns:
- true if to trim content
-
trimTipText
public String trimTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
getFormatDescription
public String getFormatDescription()
Returns a string describing the format (used in the file chooser).
- Specified by:
getFormatDescription in interface FileFormatHandler
- Specified by:
getFormatDescription in interface SpreadSheetReader
- Specified by:
getFormatDescription in class AbstractSpreadSheetReader
- Returns:
- a description suitable for displaying in the file chooser
-
getFormatExtensions
public String[] getFormatExtensions()
Returns the extension(s) of the format.
- Specified by:
getFormatExtensions in interface FileFormatHandler
- Specified by:
getFormatExtensions in interface SpreadSheetReader
- Specified by:
getFormatExtensions in class AbstractSpreadSheetReader
- Returns:
- the extension (without the dot!)
-
getCorrespondingWriter
public SpreadSheetWriter getCorrespondingWriter()
Returns, if available, the corresponding writer.
- Specified by:
getCorrespondingWriter in interface SpreadSheetReader
- Returns:
- the writer, null if none available
-
getInputType
protected AbstractSpreadSheetReader.InputType getInputType()
Returns how to read the data, from a file, stream or reader.
- Specified by:
getInputType in class AbstractSpreadSheetReader
- Returns:
- how to read the data
-
supportsCompressedInput
protected boolean supportsCompressedInput()
Returns whether to automatically handle gzip compressed files (AbstractSpreadSheetReader.InputType.READER, AbstractSpreadSheetReader.InputType.STREAM).
- Overrides:
supportsCompressedInput in class AbstractSpreadSheetReader
- Returns:
- true if to automatically decompress
-
doRead
protected SpreadSheet doRead(Reader r)
Reads the spreadsheet content from the specified reader.
- Overrides:
doRead in class AbstractSpreadSheetReader
- Parameters:
r - the reader to read from
- Returns:
- the spreadsheet or null in case of an error
- See Also:
AbstractSpreadSheetReader.getInputType()
-
hasMoreChunks
public boolean hasMoreChunks()
Checks whether there is more data to read.
- Specified by:
hasMoreChunks in interface ChunkedSpreadSheetReader
- Returns:
- true if there is more data available
-
nextChunk
public SpreadSheet nextChunk()
Returns the next chunk.
- Specified by:
nextChunk in interface ChunkedSpreadSheetReader
- Returns:
- the next chunk, null if no data available
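The chunking contract described here (a maximum number of rows per chunk, a size below 1 meaning a single chunk, and null once no data is left) can be modelled with a JDK-only sketch that chunks plain lines instead of spreadsheet rows. LineChunkerSketch is a hypothetical class for illustration; it borrows the method names hasMoreChunks and nextChunk but is not the ADAMS implementation and ignores headers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical line-based model of chunked reading; not the ADAMS reader.
class LineChunkerSketch {
  private final BufferedReader reader;
  private final int chunkSize;
  private String pending;   // one-line look-ahead, null once input is exhausted

  LineChunkerSketch(Reader r, int chunkSize) {
    this.reader = new BufferedReader(r);
    this.chunkSize = chunkSize;
    this.pending = readLine();
  }

  private String readLine() {
    try {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Checks whether there is more data to read. */
  boolean hasMoreChunks() {
    return pending != null;
  }

  /** Returns the next chunk of lines, or null if no data is available. */
  List<String> nextChunk() {
    if (pending == null) return null;
    List<String> chunk = new ArrayList<>();
    // A chunk size below 1 means "no limit": everything goes into one chunk.
    while (pending != null && (chunkSize < 1 || chunk.size() < chunkSize)) {
      chunk.add(pending);
      pending = readLine();
    }
    return chunk;
  }

  public static void main(String[] args) {
    LineChunkerSketch chunker =
        new LineChunkerSketch(new StringReader("1\n2\n3\n4\n5\n"), 2);
    while (chunker.hasMoreChunks()) {
      System.out.println(chunker.nextChunk());
    }
  }
}
```

With five input lines and a chunk size of 2, the loop yields chunks of two, two and one line, which mirrors the pattern of polling hasMoreChunks() before each nextChunk() call.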
-
hasLastError
public boolean hasLastError()
Returns whether an error was encountered during the last read.
- Specified by:
hasLastError in interface ErrorProvider
- Specified by:
hasLastError in interface SpreadSheetReader
- Overrides:
hasLastError in class AbstractSpreadSheetReader
- Returns:
- true if an error occurred
-
setLastError
protected void setLastError(String value)
Sets the value for the last error that occurred during read.
- Overrides:
setLastError in class AbstractSpreadSheetReader
- Parameters:
value - the error string, null if none occurred
-
getLastError
public String getLastError()
Returns the error that occurred during the last read.
- Specified by:
getLastError in interface ErrorProvider
- Specified by:
getLastError in interface SpreadSheetReader
- Overrides:
getLastError in class AbstractSpreadSheetReader
- Returns:
- the error string, null if none occurred
-
stopExecution
public void stopExecution()
Stops the reading (might not be immediate, depending on reader).
- Specified by:
stopExecution in interface SpreadSheetReader
- Specified by:
stopExecution in interface Stoppable
- Overrides:
stopExecution in class AbstractSpreadSheetReader
-
isStopped
public boolean isStopped()
Returns whether the reading was stopped.
- Specified by:
isStopped in interface SpreadSheetReader
- Specified by:
isStopped in interface StoppableWithFeedback
- Overrides:
isStopped in class AbstractSpreadSheetReader
- Returns:
- true if stopped
-
main
public static void main(String[] args)
Runs the reader from the command-line. Use the option AbstractSpreadSheetReader.OPTION_INPUT to specify the input file. If the option AbstractSpreadSheetReader.OPTION_OUTPUT is specified then the read sheet gets output as .csv files in that directory.
- Parameters:
args - the command-line options to use
-
-