adams.data.io.input
Class ExcelStreamingSpreadSheetReader

java.lang.Object
  extended by adams.core.ConsoleObject
      extended by adams.core.option.AbstractOptionHandler
          extended by adams.data.io.input.AbstractSpreadSheetReader
              extended by adams.data.io.input.AbstractMultiSheetSpreadSheetReader
                  extended by adams.data.io.input.AbstractMultiSheetSpreadSheetReaderWithMissingValueSupport
                      extended by adams.data.io.input.AbstractExcelSpreadSheetReader
                          extended by adams.data.io.input.ExcelStreamingSpreadSheetReader
All Implemented Interfaces:
Debuggable, Destroyable, OptionHandler, SizeOfHandler, Stoppable, MultiSheetSpreadSheetReader, SpreadSheetReader, Serializable

public class ExcelStreamingSpreadSheetReader
extends AbstractExcelSpreadSheetReader

Reads large MS Excel XML files (using streaming via SAX).
Setting the debug level to a value greater than 1 outputs detailed information on cells.

Valid options are:

-D <int> (property: debugLevel)
    The greater the number, the more additional info the scheme may output to 
    the console (0 = off).
    default: 0
    minimum: 0
 
-data-row-type <DENSE|SPARSE> (property: dataRowType)
    The type of row to use for the data.
    default: DENSE
 
-sheets <adams.core.Range> (property: sheetRange)
    The range of sheets to load. A range is a comma-separated list of single 
    1-based indices or sub-ranges of indices ('start-end'); 'inv(...)' inverts 
    the range '...'. The following placeholders can be used as well: first, 
    second, third, last_2, last_1, last
    default: first
 
-missing <java.lang.String> (property: missingValue)
    The placeholder for missing values.
    default: ?
 
-no-auto-extend-header (property: autoExtendHeader)
    If enabled, the header gets automatically extended if rows have more cells 
    than the header.
 
-text-columns <java.lang.String> (property: textColumns)
    The range of columns to treat as text. A range is a comma-separated list 
    of single 1-based indices or sub-ranges of indices ('start-end'); 
    'inv(...)' inverts the range '...'. The following placeholders can be 
    used as well: first, second, third, last_2, last_1, last
    default: 
 
-cell-type-id <adams.core.base.BaseString> [-cell-type-id ...] (property: cellTypeID)
    The IDs (= strings) for the cell types to parse.
    default: b, s
 
-cell-type-contenttype <MISSING|STRING|BOOLEAN|LONG|DOUBLE|DATE|DATETIME|TIME|OBJECT> [-cell-type-contenttype ...] (property: cellTypeContentType)
    The corresponding content types for the cell types to parse.
    default: BOOLEAN, STRING
 
-cell-string-id <adams.core.base.BaseString> [-cell-string-id ...] (property: cellStringID)
    The IDs (= strings) for the cell strings to parse.
    default: 1, 2, 3, 4, 7, 8
 
-cell-string-contenttype <MISSING|STRING|BOOLEAN|LONG|DOUBLE|DATE|DATETIME|TIME|OBJECT> [-cell-string-contenttype ...] (property: cellStringContentType)
    The corresponding content types for the cell strings to parse.
    default: DATE, TIME, DOUBLE, DATE, DATE, LONG
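
The range syntax described for -sheets and -text-columns can be illustrated with a small parser. The following is a minimal sketch only, not the adams.core.Range implementation: the class RangeSketch and its methods are hypothetical, 'inv(...)' is only handled when it wraps the whole expression, and the placeholders last_2 and last_1 are left out because their exact offsets are not stated here. It turns a range string into 0-based indices for a given number of sheets or columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for a range parser; NOT adams.core.Range. */
final class RangeSketch {

  /** Resolves a placeholder or a 1-based number to a 1-based index. */
  static int index(String token, int max) {
    String t = token.trim();
    switch (t) {
      case "first":  return 1;
      case "second": return 2;
      case "third":  return 3;
      case "last":   return max;
      default:       return Integer.parseInt(t);
    }
  }

  /** Returns the selected 0-based indices out of max items, in ascending order. */
  static List<Integer> parse(String range, int max) {
    String s = range.trim();
    // Simplification: inv(...) is only recognized around the whole expression.
    boolean invert = s.startsWith("inv(") && s.endsWith(")");
    if (invert)
      s = s.substring(4, s.length() - 1);

    TreeSet<Integer> selected = new TreeSet<>();
    if (!s.isEmpty()) {
      for (String part : s.split(",")) {
        int dash = part.indexOf('-');
        int from = index(dash < 0 ? part : part.substring(0, dash), max);
        int to   = index(dash < 0 ? part : part.substring(dash + 1), max);
        for (int i = from; i <= to; i++) {
          if (i >= 1 && i <= max)
            selected.add(i - 1);  // convert 1-based to 0-based
        }
      }
    }

    // Keep selected indices, or the complement when inverted.
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < max; i++) {
      if (selected.contains(i) != invert)
        result.add(i);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(parse("1-3,5", 6));      // [0, 1, 2, 4]
    System.out.println(parse("inv(first)", 3)); // [1, 2]
  }
}
```

In this sketch an empty string selects nothing, which mirrors the empty default of -text-columns.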
 

Version:
$Revision: 7064 $
Author:
fracpete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form

Nested Class Summary
static class ExcelStreamingSpreadSheetReader.ParseStopException
          Dummy exception to stop the parsing.
static class ExcelStreamingSpreadSheetReader.SheetHandler
          For reading a sheet from XML.
 
Nested classes/interfaces inherited from class adams.data.io.input.AbstractSpreadSheetReader
AbstractSpreadSheetReader.InputType
 
Field Summary
protected  Cell.ContentType[] m_CellStringContentType
          the corresponding types.
protected  BaseString[] m_CellStringID
          the extra cell strings to manage.
protected  Cell.ContentType[] m_CellTypeContentType
          the corresponding types.
protected  BaseString[] m_CellTypeID
          the extra cell types to manage.
protected  ExcelStreamingSpreadSheetReader.SheetHandler m_Handler
          the currently used handler for parsing.
 
Fields inherited from class adams.data.io.input.AbstractExcelSpreadSheetReader
m_AutoExtendHeader, m_TextColumns
 
Fields inherited from class adams.data.io.input.AbstractMultiSheetSpreadSheetReaderWithMissingValueSupport
m_MissingValue
 
Fields inherited from class adams.data.io.input.AbstractMultiSheetSpreadSheetReader
m_SheetRange
 
Fields inherited from class adams.data.io.input.AbstractSpreadSheetReader
m_DataRowType, m_Stopped
 
Fields inherited from class adams.core.option.AbstractOptionHandler
m_DebugLevel, m_OptionManager
 
Constructor Summary
ExcelStreamingSpreadSheetReader()
           
 
Method Summary
 String cellStringContentTypeTipText()
          Returns the tip text for this property.
 String cellStringIDTipText()
          Returns the tip text for this property.
 String cellTypeContentTypeTipText()
          Returns the tip text for this property.
 String cellTypeIDTipText()
          Returns the tip text for this property.
protected  void check()
          Hook method to perform some checks before performing the actual read.
 void defineOptions()
          Adds options to the internal list of options.
protected  List<SpreadSheet> doReadRange(File file)
          Reads the spreadsheet content from the specified file.
 Cell.ContentType[] getCellStringContentType()
          Returns the array of cell string content types.
 BaseString[] getCellStringID()
          Returns the array of cell string IDs.
 Cell.ContentType[] getCellTypeContentType()
          Returns the array of cell type content types.
 BaseString[] getCellTypeID()
          Returns the array of cell type IDs.
 String getFormatDescription()
          Returns a string describing the format (used in the file chooser).
 String[] getFormatExtensions()
          Returns the extension(s) of the format.
protected  AbstractSpreadSheetReader.InputType getInputType()
          Returns how to read the data, from a file, stream or reader.
protected  int getSheetCount(File file)
          Determines the number of sheets in the file.
 String globalInfo()
          Returns a string describing the object.
 void setCellStringContentType(Cell.ContentType[] value)
          Sets the array of cell string content types.
 void setCellStringID(BaseString[] value)
          Sets the array of cell string IDs.
 void setCellTypeContentType(Cell.ContentType[] value)
          Sets the array of cell type content types.
 void setCellTypeID(BaseString[] value)
          Sets the array of cell type IDs.
 void stopExecution()
          Stops the reading (might not be immediate, depending on reader).
 
Methods inherited from class adams.data.io.input.AbstractExcelSpreadSheetReader
autoExtendHeaderTipText, getAutoExtendHeader, getTextColumns, initialize, setAutoExtendHeader, setTextColumns, textColumnsTipText
 
Methods inherited from class adams.data.io.input.AbstractMultiSheetSpreadSheetReaderWithMissingValueSupport
getMissingValue, missingValueTipText, setMissingValue
 
Methods inherited from class adams.data.io.input.AbstractMultiSheetSpreadSheetReader
doRead, doRead, doRead, doReadRange, doReadRange, getSheetRange, readRange, readRange, readRange, readRange, setSheetRange, sheetRangeTipText
 
Methods inherited from class adams.data.io.input.AbstractSpreadSheetReader
dataRowTypeTipText, getDataRowType, getDefaultDataRowType, getReaders, isStopped, read, read, read, read, setDataRowType
 
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, debug, debug, debugLevelTipText, destroy, finishInit, getDebugLevel, getOptionManager, isDebugOn, newOptionManager, reset, setDebugLevel, toCommandLine, toString
 
Methods inherited from class adams.core.ConsoleObject
getDebugging, getSystemErr, getSystemOut, sizeOf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface adams.data.io.input.SpreadSheetReader
dataRowTypeTipText, getDataRowType, isStopped, read, read, read, read, setDataRowType
 
Methods inherited from interface adams.core.option.OptionHandler
cleanUpOptions, getOptionManager
 
Methods inherited from interface adams.core.Destroyable
destroy
 

Field Detail

m_Handler

protected ExcelStreamingSpreadSheetReader.SheetHandler m_Handler
the currently used handler for parsing.


m_CellTypeID

protected BaseString[] m_CellTypeID
the extra cell types to manage.


m_CellTypeContentType

protected Cell.ContentType[] m_CellTypeContentType
the corresponding types.


m_CellStringID

protected BaseString[] m_CellStringID
the extra cell strings to manage.


m_CellStringContentType

protected Cell.ContentType[] m_CellStringContentType
the corresponding types.
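
The ID fields and their "corresponding types" fields above read as parallel arrays, where the entry at position i of one array belongs to the entry at position i of the other. The sketch below illustrates that reading with the documented defaults for the cell types (b, s and BOOLEAN, STRING). It is an assumption about how the pairs line up, not the reader's actual code: the class CellTypeMappingSketch, the method pair and the length check are hypothetical, and the enum merely mirrors the content type names listed in the options.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of pairing IDs with content types by position. */
final class CellTypeMappingSketch {

  /** Stand-in for Cell.ContentType, using the names listed in the options. */
  enum ContentType { MISSING, STRING, BOOLEAN, LONG, DOUBLE, DATE, DATETIME, TIME, OBJECT }

  /** Builds an ID-to-type lookup; rejects arrays of different lengths. */
  static Map<String, ContentType> pair(String[] ids, ContentType[] types) {
    if (ids.length != types.length)
      throw new IllegalArgumentException(
        "Number of IDs and content types differ: " + ids.length + " != " + types.length);
    Map<String, ContentType> result = new LinkedHashMap<>();
    for (int i = 0; i < ids.length; i++)
      result.put(ids[i], types[i]);  // position i in both arrays forms one pair
    return result;
  }

  public static void main(String[] args) {
    Map<String, ContentType> cellTypes = pair(
      new String[]{"b", "s"},
      new ContentType[]{ContentType.BOOLEAN, ContentType.STRING});
    System.out.println(cellTypes);  // {b=BOOLEAN, s=STRING}
  }
}
```

Under this reading, an ID array and a type array of different lengths cannot be paired up, which is why the sketch rejects them.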

Constructor Detail

ExcelStreamingSpreadSheetReader

public ExcelStreamingSpreadSheetReader()
Method Detail

globalInfo

public String globalInfo()
Returns a string describing the object.

Specified by:
globalInfo in class AbstractOptionHandler
Returns:
a description suitable for displaying in the gui

defineOptions

public void defineOptions()
Adds options to the internal list of options.

Specified by:
defineOptions in interface OptionHandler
Overrides:
defineOptions in class AbstractExcelSpreadSheetReader

getFormatDescription

public String getFormatDescription()
Returns a string describing the format (used in the file chooser).

Specified by:
getFormatDescription in interface SpreadSheetReader
Specified by:
getFormatDescription in class AbstractSpreadSheetReader
Returns:
a description suitable for displaying in the file chooser

getFormatExtensions

public String[] getFormatExtensions()
Returns the extension(s) of the format.

Specified by:
getFormatExtensions in interface SpreadSheetReader
Specified by:
getFormatExtensions in class AbstractSpreadSheetReader
Returns:
the extension (without the dot!)

getInputType

protected AbstractSpreadSheetReader.InputType getInputType()
Returns how to read the data, from a file, stream or reader.

Specified by:
getInputType in class AbstractSpreadSheetReader
Returns:
how to read the data

setCellTypeID

public void setCellTypeID(BaseString[] value)
Sets the array of cell type IDs.

Parameters:
value - the IDs

getCellTypeID

public BaseString[] getCellTypeID()
Returns the array of cell type IDs.

Returns:
the IDs

cellTypeIDTipText

public String cellTypeIDTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setCellTypeContentType

public void setCellTypeContentType(Cell.ContentType[] value)
Sets the array of cell type content types.

Parameters:
value - the types

getCellTypeContentType

public Cell.ContentType[] getCellTypeContentType()
Returns the array of cell type content types.

Returns:
the types

cellTypeContentTypeTipText

public String cellTypeContentTypeTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setCellStringID

public void setCellStringID(BaseString[] value)
Sets the array of cell string IDs.

Parameters:
value - the IDs

getCellStringID

public BaseString[] getCellStringID()
Returns the array of cell string IDs.

Returns:
the IDs

cellStringIDTipText

public String cellStringIDTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setCellStringContentType

public void setCellStringContentType(Cell.ContentType[] value)
Sets the array of cell string content types.

Parameters:
value - the types

getCellStringContentType

public Cell.ContentType[] getCellStringContentType()
Returns the array of cell string content types.

Returns:
the types

cellStringContentTypeTipText

public String cellStringContentTypeTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

check

protected void check()
Hook method to perform some checks before performing the actual read.

Overrides:
check in class AbstractSpreadSheetReader

getSheetCount

protected int getSheetCount(File file)
                     throws Exception
Determines the number of sheets in the file.

Parameters:
file - the file to inspect
Returns:
the number of sheets
Throws:
ExcelStreamingSpreadSheetReader.ParseStopException - if reading of file fails
Exception

doReadRange

protected List<SpreadSheet> doReadRange(File file)
Reads the spreadsheet content from the specified file.

Overrides:
doReadRange in class AbstractMultiSheetSpreadSheetReader
Parameters:
file - the file to read from
Returns:
the spreadsheets or null in case of an error

stopExecution

public void stopExecution()
Stops the reading (might not be immediate, depending on reader).

Specified by:
stopExecution in interface Stoppable
Specified by:
stopExecution in interface SpreadSheetReader
Overrides:
stopExecution in class AbstractSpreadSheetReader
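
A SAX parse has no built-in way to stop early, so a common technique is to throw an exception from a handler callback and treat it as a normal end of parsing; the nested ParseStopException is described above as a dummy exception for that purpose. The following is a minimal sketch of the general technique using only the JDK's SAX API. It is not the code of this class: StoppableSaxSketch, StopException, RowCounter and the XML snippet in main are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration of aborting a SAX parse from within the handler. */
final class StoppableSaxSketch {

  /** Marker exception, thrown only to abort the parse. */
  static class StopException extends SAXException {
    StopException() { super("parsing stopped"); }
  }

  /** Counts row elements and aborts once the limit has been reached. */
  static class RowCounter extends DefaultHandler {
    final int limit;
    int rows;
    boolean stopped;

    RowCounter(int limit) { this.limit = limit; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
        throws SAXException {
      if (!qName.equals("row"))
        return;
      if (rows >= limit) {
        stopped = true;
        throw new StopException();  // unwinds out of the parser
      }
      rows++;
    }
  }

  /** Returns the number of rows seen, reading at most limit rows. */
  static int countRows(String xml, int limit) {
    RowCounter handler = new RowCounter(limit);
    try {
      SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    }
    catch (SAXException e) {
      // A deliberate stop is not an error; anything else is.
      if (!handler.stopped)
        throw new IllegalStateException(e);
    }
    catch (IOException | ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    return handler.rows;
  }

  public static void main(String[] args) {
    String xml = "<sheetData><row/><row/><row/></sheetData>";  // made-up input
    System.out.println(countRows(xml, 2));
  }
}
```

The stopped flag distinguishes a deliberate stop from a genuine parse error, so malformed input still surfaces as a failure in this sketch.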


Copyright © 2013 University of Waikato, Hamilton, NZ. All Rights Reserved.