Package adams.data.io.input
Class ExcelStreamingSpreadSheetReader
- java.lang.Object
  - adams.core.logging.LoggingObject
    - adams.core.logging.CustomLoggingLevelObject
      - adams.core.option.AbstractOptionHandler
        - adams.data.io.input.AbstractSpreadSheetReader
          - adams.data.io.input.AbstractMultiSheetSpreadSheetReader<T>
            - adams.data.io.input.AbstractMultiSheetSpreadSheetReaderWithMissingValueSupport<T>
              - adams.data.io.input.AbstractExcelSpreadSheetReader<Range>
                - adams.data.io.input.ExcelStreamingSpreadSheetReader
All Implemented Interfaces:
AdditionalInformationHandler, Destroyable, ErrorProvider, GlobalInfoSupporter, EncodingSupporter, FileFormatHandler, LoggingLevelHandler, LoggingSupporter, OptionHandler, SizeOfHandler, Stoppable, StoppableWithFeedback, MissingValueSpreadSheetReader, MultiSheetSpreadSheetReader<Range>, NoHeaderSpreadSheetReader, SpreadSheetReader, WindowedSpreadSheetReader, DataRowTypeHandler, SpreadSheetTypeHandler, Serializable
public class ExcelStreamingSpreadSheetReader extends AbstractExcelSpreadSheetReader<Range>
Reads large MS Excel XML files (Office Open XML, e.g., .xlsx) by streaming them via SAX instead of loading the entire workbook into memory.
Increasing the debug level to more than 1 outputs detailed information about the cells.
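Streaming via SAX means the XML of a sheet is processed event by event while it is being read, so the parser never holds a tree of the whole document in memory, unlike a DOM-based approach. The following minimal sketch illustrates the idea with the JDK's built-in SAX parser only; it is not the reader's SheetHandler, and the class name SaxRowCounter and the trimmed sheet XML are hypothetical. It assumes the SpreadsheetML layout in which each `<row>` element contains `<c>` (cell) elements.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRowCounter {

  /** Counts elements as the parser reports them; no tree of the document is built. */
  static class CountingHandler extends DefaultHandler {
    int rows;
    int cells;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
      // The default factory is not namespace-aware, so qName holds the raw element name
      if (qName.equals("row")) {
        rows++;
      } else if (qName.equals("c")) {
        cells++;
      }
    }
  }

  /** Streams the sheet XML from 'in' and returns {rows, cells}. */
  public static int[] count(InputStream in) {
    try {
      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
      CountingHandler handler = new CountingHandler();
      parser.parse(in, handler);
      return new int[] {handler.rows, handler.cells};
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse sheet XML", e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical, heavily trimmed sheet XML
    String xml = "<worksheet><sheetData>"
        + "<row r=\"1\"><c r=\"A1\" t=\"s\"><v>0</v></c><c r=\"B1\"><v>1.5</v></c></row>"
        + "<row r=\"2\"><c r=\"A2\" t=\"b\"><v>1</v></c></row>"
        + "</sheetData></worksheet>";
    int[] result = count(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    System.out.println("rows=" + result[0] + ", cells=" + result[1]); // rows=2, cells=3
  }
}
```

The handler only keeps two counters, so its memory use stays constant regardless of how many rows the sheet has.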
-logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
    The logging level for outputting errors and debugging output.
    default: WARNING

-data-row-type <adams.data.spreadsheet.DataRow> (property: dataRowType)
    The type of row to use for the data.
    default: adams.data.spreadsheet.DenseDataRow

-spreadsheet-type <adams.data.spreadsheet.SpreadSheet> (property: spreadSheetType)
    The type of spreadsheet to use for the data.
    default: adams.data.spreadsheet.DefaultSpreadSheet

-sheets <adams.core.Range> (property: sheetRange)
    The range of sheets to load.
    default: first
    example: A range is a comma-separated list of single 1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts the range '...'; the following placeholders can be used as well: first, second, third, last_2, last_1, last

-missing <java.lang.String> (property: missingValue)
    The placeholder for missing values.
    default:

-no-auto-extend-header <boolean> (property: autoExtendHeader)
    If enabled, the header gets automatically extended if rows have more cells than the header.
    default: true

-text-columns <adams.core.Range> (property: textColumns)
    The range of columns to treat as text.
    default:
    example: A range is a comma-separated list of single 1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts the range '...'; the following placeholders can be used as well: first, second, third, last_2, last_1, last

-no-header <boolean> (property: noHeader)
    If enabled, all rows get added as data rows and a dummy header will get inserted.
    default: false

-custom-column-headers <java.lang.String> (property: customColumnHeaders)
    The custom headers to use for the columns instead (comma-separated list); ignored if empty.
    default:

-first-row <int> (property: firstRow)
    The index of the first row to retrieve (1-based).
    default: 1
    minimum: 1

-num-rows <int> (property: numRows)
    The number of data rows to retrieve; use -1 for unlimited.
    default: -1
    minimum: -1

-cell-type-id <adams.core.base.BaseString> [-cell-type-id ...] (property: cellTypeID)
    The IDs (= strings) for the cell types to parse.
    default: b, s

-cell-type-contenttype <MISSING|STRING|BOOLEAN|LONG|DOUBLE|DATE|DATETIME|DATETIMEMSEC|TIME|TIMEMSEC|OBJECT> [-cell-type-contenttype ...] (property: cellTypeContentType)
    The corresponding content types for the cell types to parse.
    default: BOOLEAN, STRING

-cell-string-id <adams.core.base.BaseString> [-cell-string-id ...] (property: cellStringID)
    The IDs (= strings) for the cell strings to parse.
    default:

-cell-string-contenttype <MISSING|STRING|BOOLEAN|LONG|DOUBLE|DATE|DATETIME|DATETIMEMSEC|TIME|TIMEMSEC|OBJECT> [-cell-string-contenttype ...] (property: cellStringContentType)
    The corresponding content types for the cell strings to parse.
    default:
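The range syntax described for -sheets and -text-columns can be sketched as follows. This is a hypothetical, JDK-only illustration of the documented rules (1-based indices, 'start-end' sub-ranges, 'inv(...)', and the placeholders), not the adams.core.Range implementation. It assumes that last_1 and last_2 mean one and two positions before last and that out-of-bounds indices are ignored, and it returns 0-based indices in ascending order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class RangeSketch {

  /** Resolves one token (a 1-based number or a placeholder) against 'max' items. */
  static int resolve(String token, int max) {
    switch (token) {
      case "first":  return 1;
      case "second": return 2;
      case "third":  return 3;
      case "last_2": return max - 2;  // assumption: two before last
      case "last_1": return max - 1;  // assumption: one before last
      case "last":   return max;
      default:       return Integer.parseInt(token);
    }
  }

  /** Returns the 0-based indices selected by 'spec' out of 'max' items, ascending. */
  public static List<Integer> indices(String spec, int max) {
    String s = spec.trim();
    boolean invert = s.startsWith("inv(") && s.endsWith(")");
    if (invert) {
      s = s.substring(4, s.length() - 1).trim();
    }
    TreeSet<Integer> selected = new TreeSet<>();
    if (!s.isEmpty()) {
      for (String raw : s.split(",")) {
        String part = raw.trim();
        int dash = part.indexOf('-');  // placeholders contain '_' rather than '-'
        int start = resolve(dash < 0 ? part : part.substring(0, dash).trim(), max);
        int end = dash < 0 ? start : resolve(part.substring(dash + 1).trim(), max);
        for (int i = start; i <= end; i++) {
          if (i >= 1 && i <= max) {
            selected.add(i - 1);  // assumption: out-of-bounds indices are ignored
          }
        }
      }
    }
    if (!invert) {
      return new ArrayList<>(selected);
    }
    List<Integer> inverted = new ArrayList<>();
    for (int i = 0; i < max; i++) {
      if (!selected.contains(i)) {
        inverted.add(i);
      }
    }
    return inverted;
  }

  public static void main(String[] args) {
    System.out.println(indices("first", 5));       // [0]
    System.out.println(indices("2-3,last", 5));    // [1, 2, 4]
    System.out.println(indices("inv(first)", 5));  // [1, 2, 3, 4]
  }
}
```

Under these assumptions, the default -sheets value first selects only the first sheet, while the empty default of -text-columns selects no columns.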
- Author:
  fracpete (fracpete at waikato dot ac dot nz)
- See Also:
  Serialized Form

Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | ExcelStreamingSpreadSheetReader.ParseStopException | Dummy exception to stop the parsing.
static class | ExcelStreamingSpreadSheetReader.SheetHandler | For reading a sheet from XML.

Nested classes/interfaces inherited from class adams.data.io.input.AbstractSpreadSheetReader
AbstractSpreadSheetReader.InputType
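SAX offers no call for stopping a parse part-way, so a common pattern is to throw a dedicated exception from the handler and catch it around parse(); the description of ParseStopException suggests a similar approach here, although that is an inference. The sketch below applies the pattern to a -first-row/-num-rows style window. The names WindowedRowReader and StopParsing are hypothetical and, as a simplification, every `<row>` element counts toward the window (no header handling).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WindowedRowReader {

  /** Hypothetical stand-in for a "dummy exception to stop the parsing". */
  static class StopParsing extends SAXException {
    StopParsing() {
      super("window filled");
    }
  }

  /** Keeps the 'r' attribute of rows inside the window, then aborts the parse. */
  static class WindowHandler extends DefaultHandler {
    final int firstRow;  // 1-based, like -first-row
    final int numRows;   // -1 = unlimited, like -num-rows
    final List<String> kept = new ArrayList<>();
    int seen;

    WindowHandler(int firstRow, int numRows) {
      this.firstRow = firstRow;
      this.numRows = numRows;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
        throws SAXException {
      if (!qName.equals("row")) {
        return;
      }
      seen++;
      if (seen < firstRow) {
        return;  // before the window: skip the row
      }
      if (numRows != -1 && kept.size() >= numRows) {
        throw new StopParsing();  // window full: abandon the rest of the document
      }
      kept.add(atts.getValue("r"));
    }
  }

  public static List<String> read(String xml, int firstRow, int numRows) {
    WindowHandler handler = new WindowHandler(firstRow, numRows);
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
    } catch (StopParsing e) {
      // expected: the remaining rows are never parsed
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse sheet XML", e);
    }
    return handler.kept;
  }

  public static void main(String[] args) {
    String xml = "<sheetData><row r=\"1\"/><row r=\"2\"/><row r=\"3\"/><row r=\"4\"/></sheetData>";
    System.out.println(read(xml, 2, 2));   // [2, 3]
    System.out.println(read(xml, 1, -1));  // [1, 2, 3, 4]
  }
}
```

The exception is used purely for control flow here: catching it separately from other failures lets a full window count as success rather than as a parse error.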
Field Summary
Fields
Modifier and Type | Field | Description
protected Cell.ContentType[] | m_CellStringContentType | the corresponding types.
protected BaseString[] | m_CellStringID | the extra cell strings to manage.
protected Cell.ContentType[] | m_CellTypeContentType | the corresponding types.
protected BaseString[] | m_CellTypeID | the extra cell types to manage.
protected ExcelStreamingSpreadSheetReader.SheetHandler | m_Handler | the currently used handler for parsing.

Fields inherited from class adams.data.io.input.AbstractExcelSpreadSheetReader
m_AutoExtendHeader, m_CustomColumnHeaders, m_FirstRow, m_NoHeader, m_NumRows, m_TextColumns

Fields inherited from class adams.data.io.input.AbstractMultiSheetSpreadSheetReaderWithMissingValueSupport
m_MissingValue

Fields inherited from class adams.data.io.input.AbstractMultiSheetSpreadSheetReader
m_SheetRange

Fields inherited from class adams.data.io.input.AbstractSpreadSheetReader
m_DataRowType, m_Encoding, m_LastError, m_SpreadSheetType, m_Stopped, OPTION_INPUT, OPTION_OUTPUT

Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager

Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel

Constructor Summary
Constructors
ExcelStreamingSpreadSheetReader()

Method Summary
Modifier and Type | Method | Description
String | cellStringContentTypeTipText() | Returns the tip text for this property.
String | cellStringIDTipText() | Returns the tip text for this property.
String | cellTypeContentTypeTipText() | Returns the tip text for this property.
String | cellTypeIDTipText() | Returns the tip text for this property.
protected void | check() | Hook method to perform some checks before performing the actual read.
void | defineOptions() | Adds options to the internal list of options.
protected List<SpreadSheet> | doReadRange(File file) | Reads the spreadsheet content from the specified file.
Cell.ContentType[] | getCellStringContentType() | Returns the array of cell string content types.
BaseString[] | getCellStringID() | Returns the array of cell string IDs.
Cell.ContentType[] | getCellTypeContentType() | Returns the array of cell type content types.
BaseString[] | getCellTypeID() | Returns the array of cell type IDs.
SpreadSheetWriter | getCorrespondingWriter() | Returns, if available, the corresponding writer.
protected Range | getDefaultSheetRange() | Returns the default sheet range.
String | getFormatDescription() | Returns a string describing the format (used in the file chooser).
String[] | getFormatExtensions() | Returns the extension(s) of the format.
protected AbstractSpreadSheetReader.InputType | getInputType() | Returns how to read the data, from a file, stream or reader.
protected int | getSheetCount(File file) | Determines the number of sheets in the file.
String | globalInfo() | Returns a string describing the object.
static void | main(String[] args) | Runs the reader from the command-line.
void | setCellStringContentType(Cell.ContentType[] value) | Sets the array of cell string content types.
void | setCellStringID(BaseString[] value) | Sets the array of cell string IDs.
void | setCellTypeContentType(Cell.ContentType[] value) | Sets the array of cell type content types.
void | setCellTypeID(BaseString[] value) | Sets the array of cell type IDs.
void | stopExecution() | Stops the reading (might not be immediate, depending on reader).

Methods inherited from class adams.data.io.input.AbstractExcelSpreadSheetReader
autoExtendHeaderTipText, customColumnHeadersTipText, firstRowTipText, getAutoExtendHeader, getCustomColumnHeaders, getFirstRow, getNoHeader, getNumRows, getTextColumns, initialize, noHeaderTipText, numRowsTipText, setAutoExtendHeader, setCustomColumnHeaders, setFirstRow, setNoHeader, setNumRows, setTextColumns, textColumnsTipText

Methods inherited from class adams.data.io.input.AbstractMultiSheetSpreadSheetReaderWithMissingValueSupport
getDefaultMissingValue, getMissingValue, missingValueTipText, setMissingValue

Methods inherited from class adams.data.io.input.AbstractMultiSheetSpreadSheetReader
doRead, doRead, doRead, doReadRange, doReadRange, getSheetRange, readRange, readRange, readRange, readRange, setSheetRange, sheetRangeTipText

Methods inherited from class adams.data.io.input.AbstractSpreadSheetReader
canDecompress, dataRowTypeTipText, encodingTipText, getAdditionalInformation, getDataRowType, getDefaultDataRowType, getDefaultFormatExtension, getDefaultSpreadSheet, getEncoding, getLastError, getReaders, getSpreadSheetType, hasLastError, isStopped, read, read, read, read, runReader, setDataRowType, setEncoding, setLastError, setSpreadSheetType, spreadSheetTypeTipText, supportsCompressedInput

Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, destroy, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString

Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface adams.core.Destroyable
destroy

Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel

Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager, toCommandLine

Methods inherited from interface adams.data.io.input.SpreadSheetReader
dataRowTypeTipText, getDataRowType, getDefaultFormatExtension, getLastError, getSpreadSheetType, hasLastError, isStopped, read, read, read, read, setDataRowType, setSpreadSheetType, spreadSheetTypeTipText

Field Detail

m_Handler
protected ExcelStreamingSpreadSheetReader.SheetHandler m_Handler
the handler currently used for parsing.

m_CellTypeID
protected BaseString[] m_CellTypeID
the extra cell types to manage.

m_CellTypeContentType
protected Cell.ContentType[] m_CellTypeContentType
the content types corresponding to the cell type IDs in m_CellTypeID.

m_CellStringID
protected BaseString[] m_CellStringID
the extra cell strings to manage.

m_CellStringContentType
protected Cell.ContentType[] m_CellStringContentType
the content types corresponding to the cell string IDs in m_CellStringID.
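The ID arrays and content-type arrays are documented as corresponding to each other, which suggests parallel arrays: the i-th ID is paired with the i-th content type, so the defaults map b to BOOLEAN and s to STRING. (In SpreadsheetML, b and s are values of a cell's t attribute, denoting boolean and shared-string cells.) The following hypothetical sketch builds such a lookup; the trimmed ContentType enum stands in for Cell.ContentType, and rejecting arrays of different lengths is an assumption, not documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

public class CellTypeLookup {

  /** Trimmed, hypothetical stand-in for Cell.ContentType. */
  enum ContentType { MISSING, STRING, BOOLEAN, LONG, DOUBLE }

  private final Map<String, ContentType> byId = new HashMap<>();

  /** Pairs ids[i] with types[i]; rejects arrays of different lengths (assumption). */
  public CellTypeLookup(String[] ids, ContentType[] types) {
    if (ids.length != types.length) {
      throw new IllegalArgumentException(
          "Number of IDs and content types differ: " + ids.length + " != " + types.length);
    }
    for (int i = 0; i < ids.length; i++) {
      byId.put(ids[i], types[i]);
    }
  }

  /** Returns the configured content type for a cell type ID, or null if it is not managed. */
  public ContentType lookup(String id) {
    return byId.get(id);
  }

  public static void main(String[] args) {
    // Defaults documented for -cell-type-id / -cell-type-contenttype
    CellTypeLookup lookup = new CellTypeLookup(
        new String[] {"b", "s"},
        new ContentType[] {ContentType.BOOLEAN, ContentType.STRING});
    System.out.println(lookup.lookup("b"));  // BOOLEAN
    System.out.println(lookup.lookup("n"));  // null (not managed)
  }
}
```

A map built once up front keeps the per-cell lookup constant-time, which matters when a streaming parser consults it for every cell.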
Method Detail

globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
  globalInfo in interface GlobalInfoSupporter
- Specified by:
  globalInfo in class AbstractOptionHandler
- Returns:
  a description suitable for displaying in the gui

defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
  defineOptions in interface OptionHandler
- Overrides:
  defineOptions in class AbstractExcelSpreadSheetReader<Range>

getDefaultSheetRange
protected Range getDefaultSheetRange()
Returns the default sheet range.
- Specified by:
  getDefaultSheetRange in class AbstractMultiSheetSpreadSheetReader<Range>
- Returns:
  the default sheet range

getFormatDescription
public String getFormatDescription()
Returns a string describing the format (used in the file chooser).
- Specified by:
  getFormatDescription in interface FileFormatHandler
- Specified by:
  getFormatDescription in interface SpreadSheetReader
- Specified by:
  getFormatDescription in class AbstractSpreadSheetReader
- Returns:
  a description suitable for displaying in the file chooser

getFormatExtensions
public String[] getFormatExtensions()
Returns the extension(s) of the format.
- Specified by:
  getFormatExtensions in interface FileFormatHandler
- Specified by:
  getFormatExtensions in interface SpreadSheetReader
- Specified by:
  getFormatExtensions in class AbstractSpreadSheetReader
- Returns:
  the extension(s), without the leading dot
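Since the extensions are returned without the leading dot, code that matches file names against them has to add the dot itself. A hypothetical helper, not part of the ADAMS API, might look like this; the value xlsx in the usage is an assumed example, not necessarily what this reader returns.

```java
import java.util.Locale;

public class ExtensionMatcher {

  /** True if fileName ends with "." + one of the dot-less extensions, ignoring case. */
  public static boolean matches(String fileName, String[] extensions) {
    String lower = fileName.toLowerCase(Locale.ROOT);
    for (String ext : extensions) {
      if (lower.endsWith("." + ext.toLowerCase(Locale.ROOT))) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String[] exts = {"xlsx"};  // assumed example value
    System.out.println(matches("report.XLSX", exts));  // true
    System.out.println(matches("report.xls", exts));   // false
  }
}
```

Prepending the dot also prevents a bare name such as "xlsx" (with no extension at all) from matching.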

getCorrespondingWriter
public SpreadSheetWriter getCorrespondingWriter()
Returns, if available, the corresponding writer.
- Returns:
  the writer, null if none available

getInputType
protected AbstractSpreadSheetReader.InputType getInputType()
Returns how to read the data: from a file, a stream, or a reader.
- Specified by:
  getInputType in class AbstractSpreadSheetReader
- Returns:
  how to read the data

setCellTypeID
public void setCellTypeID(BaseString[] value)
Sets the array of cell type IDs.
- Parameters:
  value - the IDs

getCellTypeID
public BaseString[] getCellTypeID()
Returns the array of cell type IDs.
- Returns:
  the IDs

cellTypeIDTipText
public String cellTypeIDTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the explorer/experimenter gui

setCellTypeContentType
public void setCellTypeContentType(Cell.ContentType[] value)
Sets the array of cell type content types.
- Parameters:
  value - the types

getCellTypeContentType
public Cell.ContentType[] getCellTypeContentType()
Returns the array of cell type content types.
- Returns:
  the types

cellTypeContentTypeTipText
public String cellTypeContentTypeTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the explorer/experimenter gui

setCellStringID
public void setCellStringID(BaseString[] value)
Sets the array of cell string IDs.
- Parameters:
  value - the IDs

getCellStringID
public BaseString[] getCellStringID()
Returns the array of cell string IDs.
- Returns:
  the IDs

cellStringIDTipText
public String cellStringIDTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the explorer/experimenter gui

setCellStringContentType
public void setCellStringContentType(Cell.ContentType[] value)
Sets the array of cell string content types.
- Parameters:
  value - the types

getCellStringContentType
public Cell.ContentType[] getCellStringContentType()
Returns the array of cell string content types.
- Returns:
  the types

cellStringContentTypeTipText
public String cellStringContentTypeTipText()
Returns the tip text for this property.
- Returns:
  tip text for this property suitable for displaying in the explorer/experimenter gui

check
protected void check()
Hook method to perform some checks before performing the actual read.
- Overrides:
  check in class AbstractSpreadSheetReader

getSheetCount
protected int getSheetCount(File file) throws Exception
Determines the number of sheets in the file.
- Parameters:
  file - the file to inspect
- Returns:
  the number of sheets
- Throws:
  Exception - if reading of the file fails
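One way to count the sheets of an .xlsx file without loading the workbook is to open it as a ZIP archive and count the `<sheet>` elements in xl/workbook.xml. The sketch below does this with the JDK only; it is an assumption-based illustration, the reader may well determine the count differently, and the hypothetical fakeXlsx helper merely builds a minimal in-memory archive for the demonstration (it is not a valid Excel file).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SheetCounter {

  /** Counts <sheet> elements in xl/workbook.xml; returns -1 if that entry is missing. */
  public static int countSheets(InputStream xlsx) {
    try (ZipInputStream zip = new ZipInputStream(xlsx)) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        if (!entry.getName().equals("xl/workbook.xml")) {
          continue;
        }
        int[] count = {0};
        // Stream-parse only this entry; the enclosing <sheets> element is not counted
        SAXParserFactory.newInstance().newSAXParser().parse(zip, new DefaultHandler() {
          @Override
          public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("sheet")) {
              count[0]++;
            }
          }
        });
        return count[0];
      }
      return -1;
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not read workbook", e);
    }
  }

  /** Builds a minimal in-memory ZIP holding only xl/workbook.xml (not a valid Excel file). */
  static byte[] fakeXlsx(String workbookXml) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream out = new ZipOutputStream(bytes)) {
      out.putNextEntry(new ZipEntry("xl/workbook.xml"));
      out.write(workbookXml.getBytes(StandardCharsets.UTF_8));
      out.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    String xml = "<workbook><sheets><sheet name=\"A\"/><sheet name=\"B\"/></sheets></workbook>";
    System.out.println(countSheets(new ByteArrayInputStream(fakeXlsx(xml))));  // 2
  }
}
```

Only the small workbook.xml entry is parsed; the (potentially large) worksheet entries are skipped without being read into memory.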

doReadRange
protected List<SpreadSheet> doReadRange(File file)
Reads the spreadsheet content from the specified file.
- Overrides:
  doReadRange in class AbstractMultiSheetSpreadSheetReader<Range>
- Parameters:
  file - the file to read from
- Returns:
  the spreadsheets, or null in case of an error
- See Also:
  AbstractSpreadSheetReader.getInputType()

stopExecution
public void stopExecution()
Stops the reading (this might not be immediate, depending on the reader).
- Specified by:
  stopExecution in interface SpreadSheetReader
- Specified by:
  stopExecution in interface Stoppable
- Overrides:
  stopExecution in class AbstractSpreadSheetReader

main
public static void main(String[] args)
Runs the reader from the command line. Use the option AbstractSpreadSheetReader.OPTION_INPUT to specify the input file. If the option AbstractSpreadSheetReader.OPTION_OUTPUT is specified, the sheet(s) read are output as .csv files in that directory.
- Parameters:
  args - the command-line options to use
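Writing each sheet read to its own .csv file in an output directory can be pictured with the following sketch, which quotes cells containing commas, quotes, or line breaks in the RFC 4180 style. Representing a sheet as List<List<String>> and naming the files sheet-1.csv, sheet-2.csv, ... are hypothetical choices; the file names and CSV dialect actually used by main are not documented here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CsvDump {

  /** Quotes a cell if it contains a comma, quote, or line break; doubles embedded quotes. */
  static String quote(String cell) {
    if (cell.contains(",") || cell.contains("\"") || cell.contains("\n") || cell.contains("\r")) {
      return "\"" + cell.replace("\"", "\"\"") + "\"";
    }
    return cell;
  }

  /** Writes sheets.get(i) to dir/sheet-(i+1).csv (hypothetical naming) and returns the paths. */
  public static List<Path> write(List<List<List<String>>> sheets, Path dir) {
    List<Path> written = new ArrayList<>();
    try {
      Files.createDirectories(dir);
      for (int i = 0; i < sheets.size(); i++) {
        List<String> lines = new ArrayList<>();
        for (List<String> row : sheets.get(i)) {
          List<String> cells = new ArrayList<>();
          for (String cell : row) {
            cells.add(quote(cell));
          }
          lines.add(String.join(",", cells));
        }
        Path file = dir.resolve("sheet-" + (i + 1) + ".csv");
        Files.write(file, lines, StandardCharsets.UTF_8);
        written.add(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return written;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("sheets");
    List<List<List<String>>> sheets = List.of(
        List.of(List.of("name", "note"), List.of("a", "x, y")));
    for (Path p : write(sheets, dir)) {
      System.out.println(p.getFileName() + ": " + Files.readAllLines(p));
    }
  }
}
```

Quoting only when needed keeps simple cells readable while still round-tripping values that contain the delimiter.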