Package adams.data.io.input
Class AutoWidthTabularSpreadSheetReader
java.lang.Object
  adams.core.logging.LoggingObject
    adams.core.logging.CustomLoggingLevelObject
      adams.core.option.AbstractOptionHandler
        adams.data.io.input.AbstractSpreadSheetReader
          adams.data.io.input.AbstractSpreadSheetReaderWithMissingValueSupport
            adams.data.io.input.AutoWidthTabularSpreadSheetReader
- All Implemented Interfaces:
AdditionalInformationHandler, Destroyable, ErrorProvider, GlobalInfoSupporter, EncodingSupporter, FileFormatHandler, LoggingLevelHandler, LoggingSupporter, LocaleSupporter, OptionHandlingLocaleSupporter, OptionHandler, SizeOfHandler, Stoppable, StoppableWithFeedback, MissingValueSpreadSheetReader, SpreadSheetReader, DataRowTypeHandler, SpreadSheetTypeHandler, Serializable
public class AutoWidthTabularSpreadSheetReader extends AbstractSpreadSheetReaderWithMissingValueSupport implements OptionHandlingLocaleSupporter
Reads simple tabular text files, using column widths as defined by the header row.
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel) The logging level for outputting errors and debugging output. default: WARNING min-user-mode: Expert
-data-row-type <adams.data.spreadsheet.DataRow> (property: dataRowType) The type of row to use for the data. default: adams.data.spreadsheet.DenseDataRow
-spreadsheet-type <adams.data.spreadsheet.SpreadSheet> (property: spreadSheetType) The type of spreadsheet to use for the data. default: adams.data.spreadsheet.DefaultSpreadSheet
-missing <adams.core.base.BaseRegExp> (property: missingValue) The placeholder for missing values. default: more: https://docs.oracle.com/javase/tutorial/essential/regex/ https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
-encoding <adams.core.base.BaseCharset> (property: encoding) The type of encoding to use when reading using a reader, leave empty for default. default: Default
-min-spaces <int> (property: minSpaces) The minimum number of spaces between columns. default: 1 minimum: 1
-trim <boolean> (property: trim) If enabled, the content of each cell is trimmed before it is added. default: true
-text-columns <adams.core.Range> (property: textColumns) The range of columns to treat as text. default: example: A range is a comma-separated list of single 1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts the range '...'; the following placeholders can be used as well: first, second, third, last_2, last_1, last
-datetime-columns <adams.core.Range> (property: dateTimeColumns) The range of columns to treat as date/time msec. default: example: A range is a comma-separated list of single 1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts the range '...'; the following placeholders can be used as well: first, second, third, last_2, last_1, last
-datetime-format <adams.data.DateFormatString> (property: dateTimeFormat) The format for date/time msecs. default: yyyy-MM-dd HH:mm:ss more: https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html
-datetime-lenient <boolean> (property: dateTimeLenient) Whether date/time msec parsing is lenient or not. default: false
-datetime-type <TIME|TIME_MSEC|DATE|DATE_TIME|DATE_TIME_MSEC> (property: dateTimeType) How to interpret the date/time data. default: DATE_TIME
- Author:
- fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class adams.data.io.input.AbstractSpreadSheetReader
AbstractSpreadSheetReader.InputType
-
-
Field Summary
Modifier and Type | Field | Description
protected Range | m_DateTimeColumns | the columns to treat as date/time.
protected DateFormatString | m_DateTimeFormat | the format string for the date/times.
protected boolean | m_DateTimeLenient | whether date/time parsing is lenient.
protected BasicDateTimeType | m_DateTimeType | the type of date/time.
protected Locale | m_Locale | the locale to use.
protected int | m_MinSpaces | the minimum number of spaces between columns.
protected Range | m_TextColumns | the columns to treat as text.
protected TimeZone | m_TimeZone | the timezone to use.
protected boolean | m_Trim | whether to trim the cells.
-
Fields inherited from class adams.data.io.input.AbstractSpreadSheetReaderWithMissingValueSupport
m_MissingValue
-
Fields inherited from class adams.data.io.input.AbstractSpreadSheetReader
m_DataRowType, m_Encoding, m_LastError, m_SpreadSheetType, m_Stopped, OPTION_INPUT, OPTION_OUTPUT
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
-
Constructor Summary
Constructors Constructor Description AutoWidthTabularSpreadSheetReader()
-
Method Summary
Modifier and Type | Method | Description
String | dateTimeColumnsTipText() | Returns the tip text for this property.
String | dateTimeFormatTipText() | Returns the tip text for this property.
String | dateTimeLenientTipText() | Returns the tip text for this property.
String | dateTimeTypeTipText() | Returns the tip text for this property.
void | defineOptions() | Adds options to the internal list of options.
protected int[] | determineColStarts(String header, int minSpaces) | Determines where the columns start based on the header.
protected SpreadSheet | doRead(Reader r) | Reads the spreadsheet content from the specified reader.
SpreadSheetWriter | getCorrespondingWriter() | Returns, if available, the corresponding writer.
Range | getDateTimeColumns() | Returns the range of columns to treat as date/time msec.
DateFormatString | getDateTimeFormat() | Returns the format for date/time msec columns.
BasicDateTimeType | getDateTimeType() | Returns the type for date/time columns.
protected BaseRegExp | getDefaultMissingValue() | Returns the default missing value to use.
String | getFormatDescription() | Returns a string describing the format (used in the file chooser).
String[] | getFormatExtensions() | Returns the extension(s) of the format.
protected AbstractSpreadSheetReader.InputType | getInputType() | Returns how to read the data, from a file, stream or reader.
Locale | getLocale() | Returns the locale in use.
int | getMinSpaces() | Returns the minimum number of spaces between columns.
Range | getTextColumns() | Returns the range of columns to treat as text.
TimeZone | getTimeZone() | Returns the time zone in use.
boolean | getTrim() | Returns whether to trim the cell content.
String | globalInfo() | Returns a string describing the object.
boolean | isDateTimeLenient() | Returns whether the parsing of date/time msecs is lenient or not.
String | localeTipText() | Returns the tip text for this property.
static void | main(String[] args) | Runs the reader from the command-line.
String | minSpacesTipText() | Returns the tip text for this property.
void | setDateTimeColumns(Range value) | Sets the range of columns to treat as date/time msec.
void | setDateTimeFormat(DateFormatString value) | Sets the format for date/time msec columns.
void | setDateTimeLenient(boolean value) | Sets whether parsing of date/time msecs is to be lenient or not.
void | setDateTimeType(BasicDateTimeType value) | Sets the type for date/time columns.
void | setLocale(Locale value) | Sets the locale to use.
void | setMinSpaces(int value) | Sets the minimum number of spaces between columns.
void | setTextColumns(Range value) | Sets the range of columns to treat as text.
void | setTimeZone(TimeZone value) | Sets the time zone to use.
void | setTrim(boolean value) | Sets whether to trim the cell content.
protected boolean | supportsCompressedInput() | Returns whether to automatically handle gzip compressed files (AbstractSpreadSheetReader.InputType.READER, AbstractSpreadSheetReader.InputType.STREAM).
String | textColumnsTipText() | Returns the tip text for this property.
String | timeZoneTipText() | Returns the tip text for this property.
String | trimTipText() | Returns the tip text for this property.
-
Methods inherited from class adams.data.io.input.AbstractSpreadSheetReaderWithMissingValueSupport
getMissingValue, missingValueTipText, setMissingValue
-
Methods inherited from class adams.data.io.input.AbstractSpreadSheetReader
canDecompress, check, dataRowTypeTipText, doRead, doRead, encodingTipText, getAdditionalInformation, getDataRowType, getDefaultDataRowType, getDefaultFormatExtension, getDefaultSpreadSheet, getEncoding, getLastError, getReaders, getSpreadSheetType, hasLastError, initialize, isStopped, read, read, read, read, runReader, setDataRowType, setEncoding, setLastError, setSpreadSheetType, spreadSheetTypeTipText, stopExecution
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.core.Destroyable
destroy
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel
-
Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager, toCommandLine
-
Methods inherited from interface adams.data.io.input.SpreadSheetReader
dataRowTypeTipText, getDataRowType, getDefaultFormatExtension, getLastError, getSpreadSheetType, hasLastError, isStopped, read, read, read, read, setDataRowType, setSpreadSheetType, spreadSheetTypeTipText, stopExecution
-
-
-
-
Field Detail
-
m_MinSpaces
protected int m_MinSpaces
the minimum number of spaces between columns.
-
m_TextColumns
protected Range m_TextColumns
the columns to treat as text.
-
m_DateTimeColumns
protected Range m_DateTimeColumns
the columns to treat as date/time.
-
m_DateTimeFormat
protected DateFormatString m_DateTimeFormat
the format string for the date/times.
-
m_DateTimeLenient
protected boolean m_DateTimeLenient
whether date/time parsing is lenient.
-
m_DateTimeType
protected BasicDateTimeType m_DateTimeType
the type of date/time.
-
m_TimeZone
protected TimeZone m_TimeZone
the timezone to use.
-
m_Locale
protected Locale m_Locale
the locale to use.
-
m_Trim
protected boolean m_Trim
whether to trim the cells.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface GlobalInfoSupporter
- Specified by:
globalInfo in class AbstractOptionHandler
- Returns:
- a description suitable for displaying in the gui
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions in interface OptionHandler
- Overrides:
defineOptions in class AbstractSpreadSheetReaderWithMissingValueSupport
-
getDefaultMissingValue
protected BaseRegExp getDefaultMissingValue()
Returns the default missing value to use.
- Overrides:
getDefaultMissingValue in class AbstractSpreadSheetReaderWithMissingValueSupport
- Returns:
- the default
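To illustrate how a regular-expression placeholder for missing values might be applied to a cell, here is a minimal sketch that uses java.util.regex directly. The class MissingValueSketch, its isMissing helper and the example pattern are hypothetical; the sketch assumes that the whole cell has to match, and it does not show how ADAMS applies BaseRegExp internally.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch; not part of ADAMS. */
final class MissingValueSketch {

  /** Returns true if the complete cell content matches the pattern (assumption: full match). */
  static boolean isMissing(String cell, Pattern missing) {
    return missing.matcher(cell).matches();
  }

  public static void main(String[] args) {
    // Example pattern only: treats "?", "NA" and the empty string as missing.
    Pattern missing = Pattern.compile("^(\\?|NA|)$");
    for (String cell : new String[] {"?", "NA", "", "42", "n/a"}) {
      System.out.println("'" + cell + "' missing: " + isMissing(cell, missing));
    }
  }
}
```

With this pattern, "?", "NA" and empty cells count as missing, while "42" and "n/a" are kept as regular values.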
-
setMinSpaces
public void setMinSpaces(int value)
Sets the minimum number of spaces between columns.
- Parameters:
value - the minimum
-
getMinSpaces
public int getMinSpaces()
Returns the minimum number of spaces between columns.
- Returns:
- the minimum
-
minSpacesTipText
public String minSpacesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setTextColumns
public void setTextColumns(Range value)
Sets the range of columns to treat as text.
- Parameters:
value - the range
-
getTextColumns
public Range getTextColumns()
Returns the range of columns to treat as text.
- Returns:
- the range
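The range syntax described for the -text-columns and -datetime-columns options (comma-separated 1-based indices and 'start-end' sub-ranges) can be sketched with a small stand-alone parser. RangeSketch and its methods are hypothetical and deliberately simplified: only plain numbers, sub-ranges and the placeholders first and last are handled, while inv(...), second, third, last_1 and last_2 are not. It is not the adams.core.Range implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified range parser; not adams.core.Range. */
final class RangeSketch {

  /** Converts a single 1-based token (or "first"/"last") into a 0-based index. */
  static int toIndex(String token, int numCols) {
    String t = token.trim();
    if (t.equals("first")) return 0;
    if (t.equals("last")) return numCols - 1;
    return Integer.parseInt(t) - 1;
  }

  /** Expands a range string into the 0-based column indices that it selects. */
  static List<Integer> expand(String range, int numCols) {
    List<Integer> result = new ArrayList<>();
    if (range.trim().isEmpty()) return result;
    for (String part : range.split(",")) {
      int dash = part.indexOf('-');
      int from;
      int to;
      if (dash > 0) {
        // sub-range 'start-end', both ends inclusive
        from = toIndex(part.substring(0, dash), numCols);
        to = toIndex(part.substring(dash + 1), numCols);
      } else {
        from = to = toIndex(part, numCols);
      }
      for (int i = from; i <= to; i++) {
        // ignore indices outside the sheet and avoid duplicates
        if (i >= 0 && i < numCols && !result.contains(i)) result.add(i);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(expand("1,3-5", 6));   // [0, 2, 3, 4]
    System.out.println(expand("2-last", 4));  // [1, 2, 3]
  }
}
```

The result uses 0-based indices because that is what Java collections expect; the range string itself stays 1-based, as in the option description.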
-
textColumnsTipText
public String textColumnsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setDateTimeColumns
public void setDateTimeColumns(Range value)
Sets the range of columns to treat as date/time msec.
- Parameters:
value - the range
-
getDateTimeColumns
public Range getDateTimeColumns()
Returns the range of columns to treat as date/time msec.
- Returns:
- the range
-
dateTimeColumnsTipText
public String dateTimeColumnsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setDateTimeFormat
public void setDateTimeFormat(DateFormatString value)
Sets the format for date/time msec columns.
- Parameters:
value - the format
-
getDateTimeFormat
public DateFormatString getDateTimeFormat()
Returns the format for date/time msec columns.
- Returns:
- the format
-
dateTimeFormatTipText
public String dateTimeFormatTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setDateTimeLenient
public void setDateTimeLenient(boolean value)
Sets whether parsing of date/time msecs is to be lenient or not.
- Parameters:
value - if true, lenient parsing is used; otherwise not
- See Also:
DateFormat.setLenient(boolean)
-
isDateTimeLenient
public boolean isDateTimeLenient()
Returns whether the parsing of date/time msecs is lenient or not.
- Returns:
- true if parsing is lenient
- See Also:
DateFormat.isLenient()
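The effect of lenient versus strict parsing can be seen with java.text.SimpleDateFormat, which the See Also entries above point to via DateFormat.setLenient(boolean). The following is a small stand-alone sketch, not the reader's own code: LenientParsingSketch and its parse helper are hypothetical, the pattern is the documented default yyyy-MM-dd HH:mm:ss, and the time zone is fixed to UTC purely to keep the example reproducible.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Hypothetical sketch of strict vs. lenient date/time parsing. */
final class LenientParsingSketch {

  /** Parses the text, returning null if it is rejected. */
  static Date parse(String text, boolean lenient) {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: fixed zone for the example
    fmt.setLenient(lenient);
    try {
      return fmt.parse(text);
    } catch (ParseException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    String invalid = "2023-02-30 10:00:00"; // February has no 30th day
    System.out.println("strict:  " + parse(invalid, false));
    System.out.println("lenient: " + parse(invalid, true));
  }
}
```

Strict parsing rejects the impossible date, whereas lenient parsing rolls it over to 2 March 2023. This is why a non-lenient default is the safer choice when malformed cells should surface as errors rather than as silently shifted dates.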
-
dateTimeLenientTipText
public String dateTimeLenientTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setDateTimeType
public void setDateTimeType(BasicDateTimeType value)
Sets the type for date/time columns.
- Parameters:
value - the type
-
getDateTimeType
public BasicDateTimeType getDateTimeType()
Returns the type for date/time columns.
- Returns:
- the type
-
dateTimeTypeTipText
public String dateTimeTypeTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setTimeZone
public void setTimeZone(TimeZone value)
Sets the time zone to use.
- Parameters:
value - the time zone
-
getTimeZone
public TimeZone getTimeZone()
Returns the time zone in use.
- Returns:
- the time zone
-
timeZoneTipText
public String timeZoneTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
setLocale
public void setLocale(Locale value)
Sets the locale to use.
- Specified by:
setLocale in interface LocaleSupporter
- Parameters:
value - the locale
-
getLocale
public Locale getLocale()
Returns the locale in use.
- Specified by:
getLocale in interface LocaleSupporter
- Returns:
- the locale
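A locale typically matters when numeric cells use a locale-specific decimal separator. The sketch below shows that effect with java.text.NumberFormat. It rests on the assumption that the locale influences number parsing in this reader, which the text above does not state explicitly; LocaleNumberSketch and its parse helper are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

/** Hypothetical sketch of locale-dependent number parsing. */
final class LocaleNumberSketch {

  /** Parses the cell with the locale's number format; returns NaN if it is not a number. */
  static double parse(String cell, Locale locale) {
    try {
      return NumberFormat.getInstance(locale).parse(cell).doubleValue();
    } catch (ParseException e) {
      return Double.NaN;
    }
  }

  public static void main(String[] args) {
    System.out.println(parse("1,5", Locale.GERMANY)); // comma is the decimal separator
    System.out.println(parse("1.5", Locale.US));      // dot is the decimal separator
    System.out.println(parse("abc", Locale.US));      // not a number
  }
}
```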
-
localeTipText
public String localeTipText()
Returns the tip text for this property.
- Specified by:
localeTipText in interface OptionHandlingLocaleSupporter
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setTrim
public void setTrim(boolean value)
Sets whether to trim the cell content.
- Parameters:
value - if true the content gets trimmed
-
getTrim
public boolean getTrim()
Returns whether to trim the cell content.
- Returns:
- true if the content is to be trimmed
-
trimTipText
public String trimTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the gui
-
getFormatDescription
public String getFormatDescription()
Returns a string describing the format (used in the file chooser).
- Specified by:
getFormatDescription in interface FileFormatHandler
- Specified by:
getFormatDescription in interface SpreadSheetReader
- Specified by:
getFormatDescription in class AbstractSpreadSheetReader
- Returns:
- a description suitable for displaying in the file chooser
-
getFormatExtensions
public String[] getFormatExtensions()
Returns the extension(s) of the format.
- Specified by:
getFormatExtensions in interface FileFormatHandler
- Specified by:
getFormatExtensions in interface SpreadSheetReader
- Specified by:
getFormatExtensions in class AbstractSpreadSheetReader
- Returns:
- the extension (without the dot!)
-
getCorrespondingWriter
public SpreadSheetWriter getCorrespondingWriter()
Returns, if available, the corresponding writer.
- Specified by:
getCorrespondingWriter in interface SpreadSheetReader
- Returns:
- the writer, null if none available
-
getInputType
protected AbstractSpreadSheetReader.InputType getInputType()
Returns how to read the data, from a file, stream or reader.
- Specified by:
getInputType in class AbstractSpreadSheetReader
- Returns:
- how to read the data
-
supportsCompressedInput
protected boolean supportsCompressedInput()
Returns whether to automatically handle gzip compressed files (AbstractSpreadSheetReader.InputType.READER, AbstractSpreadSheetReader.InputType.STREAM).
- Overrides:
supportsCompressedInput in class AbstractSpreadSheetReader
- Returns:
- true if the input is to be decompressed automatically
-
determineColStarts
protected int[] determineColStarts(String header, int minSpaces)
Determines where the columns start based on the header.
- Parameters:
header - the header row
minSpaces - the minimum number of spaces to use
- Returns:
- the start positions
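One plausible way to derive column start positions from a header row is sketched below. It is an assumption about the general technique, not the actual body of determineColStarts: ColumnStartsSketch and columnStarts are hypothetical names, and the rule used is that a column starts at the first non-space character of the line and at every non-space character preceded by at least minSpaces spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the ADAMS implementation. */
final class ColumnStartsSketch {

  /** Returns the 0-based character offsets at which header cells begin. */
  static int[] columnStarts(String header, int minSpaces) {
    List<Integer> starts = new ArrayList<>();
    int spaces = 0;           // length of the current run of spaces
    boolean seenText = false; // whether any non-space character was seen yet
    for (int i = 0; i < header.length(); i++) {
      if (header.charAt(i) == ' ') {
        spaces++;
      } else {
        if (!seenText || spaces >= minSpaces) {
          starts.add(i);
        }
        seenText = true;
        spaces = 0;
      }
    }
    int[] result = new int[starts.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = starts.get(i);
    }
    return result;
  }

  public static void main(String[] args) {
    String header = "ID   First Name   Age";
    System.out.println(Arrays.toString(columnStarts(header, 1))); // [0, 5, 11, 18]
    System.out.println(Arrays.toString(columnStarts(header, 2))); // [0, 5, 18]
  }
}
```

The two calls show why the minimum matters: with a minimum of 1, the single space inside "First Name" splits it into two columns, while a minimum of 2 keeps it as one header cell.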
-
doRead
protected SpreadSheet doRead(Reader r)
Reads the spreadsheet content from the specified reader.
- Overrides:
doRead in class AbstractSpreadSheetReader
- Parameters:
r - the reader to read from
- Returns:
- the spreadsheet or null in case of an error
- See Also:
AbstractSpreadSheetReader.getInputType()
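Once column start positions are known, each data row can be cut into cells at those offsets and optionally trimmed. The sketch below illustrates that step under stated assumptions: RowSliceSketch and slice are hypothetical, the start positions are passed in as 0-based offsets, and rows shorter than the header simply yield empty cells. The actual doRead logic may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of slicing a fixed-width row into cells. */
final class RowSliceSketch {

  /** Cuts the line at the given 0-based start offsets, trimming each cell if requested. */
  static List<String> slice(String line, int[] starts, boolean trim) {
    List<String> cells = new ArrayList<>();
    for (int i = 0; i < starts.length; i++) {
      // clamp offsets so that short lines produce empty cells instead of exceptions
      int from = Math.min(starts[i], line.length());
      int to = (i + 1 < starts.length)
          ? Math.min(starts[i + 1], line.length())
          : line.length();
      String cell = line.substring(from, to);
      cells.add(trim ? cell.trim() : cell);
    }
    return cells;
  }

  public static void main(String[] args) {
    int[] starts = {0, 5, 18}; // e.g. derived from the header "ID   First Name   Age"
    String row = "1    Ada Lovelace 36";
    System.out.println(slice(row, starts, true));  // [1, Ada Lovelace, 36]
    System.out.println(slice("7", starts, true));
  }
}
```

Without trimming, the padding that aligns the columns stays part of each cell, which is presumably why trimming is enabled by default.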
-
main
public static void main(String[] args)
Runs the reader from the command-line. Use the option AbstractSpreadSheetReader.OPTION_INPUT to specify the input file. If the option AbstractSpreadSheetReader.OPTION_OUTPUT is specified, the sheet that was read is written as .csv files to that directory.
- Parameters:
args - the command-line options to use
-
-