Package adams.data.io.input
Class CsvSpreadSheetReader.ChunkReader
- java.lang.Object
-
- adams.data.io.input.CsvSpreadSheetReader.ChunkReader
-
- All Implemented Interfaces:
Serializable
- Enclosing class:
- CsvSpreadSheetReader
public static class CsvSpreadSheetReader.ChunkReader extends Object implements Serializable
Reads CSV files chunk by chunk.
- Version:
- $Revision$
- Author:
- fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
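The chunk-by-chunk pattern can be illustrated with a minimal, self-contained sketch. It is not the ADAMS implementation: `LineChunker` and `readAll` are hypothetical names, and a chunk here is simply a list of raw lines rather than a SpreadSheet. Only the `hasNext()`/`next()` shape mirrors the methods listed on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a chunked reader; not the ADAMS implementation. */
final class LineChunker {
  private final BufferedReader reader;
  private final int chunkSize;
  private String pending; // next unread line, null once the input is exhausted

  LineChunker(BufferedReader reader, int chunkSize) {
    if (chunkSize < 1)
      throw new IllegalArgumentException("chunkSize must be >= 1");
    this.reader = reader;
    this.chunkSize = chunkSize;
    this.pending = readLine();
  }

  /** Reads one line, converting the checked IOException. */
  private String readLine() {
    try {
      return reader.readLine();
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns whether at least one more line is available. */
  boolean hasNext() {
    return pending != null;
  }

  /** Returns up to chunkSize lines; empty once the input is exhausted. */
  List<String> next() {
    List<String> chunk = new ArrayList<>();
    while (pending != null && chunk.size() < chunkSize) {
      chunk.add(pending);
      pending = readLine();
    }
    return chunk;
  }

  /** Convenience helper: collects all chunks of the given text. */
  static List<List<String>> readAll(String text, int chunkSize) {
    LineChunker c = new LineChunker(new BufferedReader(new StringReader(text)), chunkSize);
    List<List<String>> chunks = new ArrayList<>();
    while (c.hasNext())
      chunks.add(c.next());
    return chunks;
  }
}
```

Reading one line ahead keeps `hasNext()` cheap and side-effect free, at the cost of holding a single pending line in memory.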
-
-
Field Summary
Fields
Modifier and Type | Field | Description
static char[] | COLLAPSED_QUOTES | the replacement for doubled up quotes.
static String[] | DOUBLED_UP_QUOTES | the doubled up quotes to replace.
protected Cell.ContentType[] | m_AutoTypes | the automatically determined column types.
protected int | m_ChunkSize | the chunk size.
protected String | m_Comment | the comment string.
protected gnu.trove.set.hash.TIntHashSet | m_DateCols | the date column indices.
protected DateFormat | m_DateFormat | the date format.
protected gnu.trove.set.hash.TIntHashSet | m_DateTimeCols | the date/time column indices.
protected DateFormat | m_DateTimeFormat | the date/time format.
protected gnu.trove.set.hash.TIntHashSet | m_DateTimeMsecCols | the date/time msec column indices.
protected DateFormat | m_DateTimeMsecFormat | the date/time msec format.
protected int | m_FirstRow | the first row to retrieve (1-based).
protected boolean | m_HasDateCols | whether any date columns are defined.
protected boolean | m_HasDateTimeCols | whether any date/time columns are defined.
protected boolean | m_HasDateTimeMsecCols | whether any date/time msec columns are defined.
protected boolean | m_HasTextCols | whether any text columns are defined.
protected boolean | m_HasTimeCols | whether any time columns are defined.
protected boolean | m_HasTimeMsecCols | whether any time/msec columns are defined.
protected SpreadSheet | m_Header | the header.
protected List<String> | m_HeaderCells | the header cells to use.
protected char | m_LastChar | the last character that was read too far.
protected BaseRegExp | m_MissingValue | the missing value.
protected NumberFormat | m_NumberFormat | the number format.
protected int | m_NumRows | the number of rows to retrieve (less than 1 = unlimited).
protected int | m_NumRowsAuto | the number of rows to use for automatically determining the column types.
protected CsvSpreadSheetReader | m_Owner | the owning reader.
protected boolean | m_ParseFormulas | whether to parse formula-like cells.
protected char | m_QuoteChar | the quote char.
protected BufferedReader | m_Reader | the reader in use.
protected int | m_RowCount | the rows read so far.
protected char | m_Separator | the column separator.
protected boolean | m_SkipDifferingRows | whether to drop rows with too few or too many cells.
protected gnu.trove.set.hash.TIntHashSet | m_TextCols | the text column indices.
protected gnu.trove.set.hash.TIntHashSet | m_TimeCols | the time column indices.
protected DateFormat | m_TimeFormat | the time format.
protected gnu.trove.set.hash.TIntHashSet | m_TimeMsecCols | the time/msec column indices.
protected DateFormat | m_TimeMsecFormat | the time/msec format.
protected boolean | m_Trim | whether to trim the cells.
-
Constructor Summary
Constructors
Constructor | Description
ChunkReader(CsvSpreadSheetReader owner) | Initializes the low-level reader.
-
Method Summary
Modifier and Type | Method | Description
protected void | addCell(StringBuilder current, List<String> cells) | Adds the current string to the cells.
protected void | close() | Closes the reader.
boolean | hasNext() | Returns whether there is more data to be read.
SpreadSheet | next() | Reads the next chunk.
SpreadSheet | read(Reader r) | Reads the spreadsheet content from the specified reader.
protected List<String> | readCells(Reader reader) | Reads a row and breaks it up into cells.
protected void | removeTrailingCR(StringBuilder current) | Removes a trailing CR.
protected String | unquote(String s) | Unquotes the given string.
-
-
-
Field Detail
-
DOUBLED_UP_QUOTES
public static final String[] DOUBLED_UP_QUOTES
the doubled up quotes to replace.
-
COLLAPSED_QUOTES
public static final char[] COLLAPSED_QUOTES
the replacement for doubled up quotes.
-
m_Owner
protected CsvSpreadSheetReader m_Owner
the owning reader.
-
m_Reader
protected BufferedReader m_Reader
the reader in use.
-
m_Header
protected SpreadSheet m_Header
the header.
-
m_MissingValue
protected BaseRegExp m_MissingValue
the missing value.
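Since the missing value is stored as a regular expression, a cell is presumably tested against a pattern rather than compared to a fixed string. The sketch below uses `java.util.regex.Pattern` as a stand-in for BaseRegExp; `MissingValueSketch`, `isMissing`, the whole-cell match, and the example pattern are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of regex-based missing-value detection. */
final class MissingValueSketch {
  /**
   * Returns true if the whole cell matches the missing-value pattern.
   * Assumption: the entire cell must match, not just a substring.
   */
  static boolean isMissing(String cell, Pattern missing) {
    return missing.matcher(cell).matches();
  }
}
```

With the example pattern `^(\?|)$`, both a lone question mark and an empty cell would count as missing, while `1.5` would not.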
-
m_HasTextCols
protected boolean m_HasTextCols
whether any text columns are defined.
-
m_TextCols
protected gnu.trove.set.hash.TIntHashSet m_TextCols
the text column indices.
-
m_HasDateCols
protected boolean m_HasDateCols
whether any date columns are defined.
-
m_DateCols
protected gnu.trove.set.hash.TIntHashSet m_DateCols
the date column indices.
-
m_HasDateTimeCols
protected boolean m_HasDateTimeCols
whether any date/time columns are defined.
-
m_DateTimeCols
protected gnu.trove.set.hash.TIntHashSet m_DateTimeCols
the date/time column indices.
-
m_HasDateTimeMsecCols
protected boolean m_HasDateTimeMsecCols
whether any date/time msec columns are defined.
-
m_DateTimeMsecCols
protected gnu.trove.set.hash.TIntHashSet m_DateTimeMsecCols
the date/time msec column indices.
-
m_HasTimeCols
protected boolean m_HasTimeCols
whether any time columns are defined.
-
m_HasTimeMsecCols
protected boolean m_HasTimeMsecCols
whether any time/msec columns are defined.
-
m_TimeCols
protected gnu.trove.set.hash.TIntHashSet m_TimeCols
the time column indices.
-
m_TimeMsecCols
protected gnu.trove.set.hash.TIntHashSet m_TimeMsecCols
the time/msec column indices.
-
m_DateFormat
protected DateFormat m_DateFormat
the date format.
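The pairing of a column index set with a format suggests that only cells in the listed columns are parsed as dates. A minimal sketch of that idea follows, under several assumptions: `DateColumnSketch` and `convert` are hypothetical, a `java.util.Set<Integer>` replaces the Trove TIntHashSet, column indices are taken to be 0-based, and unparseable cells fall back to the raw string.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Set;

/** Hypothetical illustration of per-column date parsing. */
final class DateColumnSketch {
  /**
   * Parses the cell as a date if its column is listed in dateCols,
   * otherwise returns the cell unchanged.
   * Assumption: unparseable cells are kept as plain strings.
   */
  static Object convert(int col, String cell, Set<Integer> dateCols, DateFormat fmt) {
    if (!dateCols.contains(col))
      return cell;
    try {
      return fmt.parse(cell);
    }
    catch (ParseException e) {
      return cell;
    }
  }
}
```

The same pattern would apply to the date/time, date/time msec, time, and time/msec columns, each with its own index set and format.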
-
m_DateTimeFormat
protected DateFormat m_DateTimeFormat
the date/time format.
-
m_DateTimeMsecFormat
protected DateFormat m_DateTimeMsecFormat
the date/time msec format.
-
m_TimeFormat
protected DateFormat m_TimeFormat
the time format.
-
m_TimeMsecFormat
protected DateFormat m_TimeMsecFormat
the time/msec format.
-
m_NumberFormat
protected NumberFormat m_NumberFormat
the number format.
-
m_ChunkSize
protected int m_ChunkSize
the chunk size.
-
m_QuoteChar
protected char m_QuoteChar
the quote char.
-
m_Separator
protected char m_Separator
the column separator.
-
m_Comment
protected String m_Comment
the comment string.
-
m_Trim
protected boolean m_Trim
whether to trim the cells.
-
m_LastChar
protected char m_LastChar
the last character that was read too far.
-
m_RowCount
protected int m_RowCount
the rows read so far.
-
m_FirstRow
protected int m_FirstRow
the first row to retrieve (1-based).
-
m_NumRows
protected int m_NumRows
the number of rows to retrieve (less than 1 = unlimited).
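The two fields m_FirstRow and m_NumRows together describe a row window. One way to read the documented semantics (1-based first row, fewer than 1 rows meaning unlimited) is sketched below; `RowWindowSketch` and `inWindow` are hypothetical names, and how the reader actually applies the window is not stated on this page.

```java
/** Hypothetical illustration of the first-row / number-of-rows window. */
final class RowWindowSketch {
  /**
   * Returns whether the 1-based rowIndex falls inside the window.
   * numRows < 1 is treated as "no upper limit".
   */
  static boolean inWindow(int rowIndex, int firstRow, int numRows) {
    if (rowIndex < firstRow)
      return false;
    if (numRows < 1)
      return true;
    // long arithmetic avoids overflow for very large values
    return rowIndex < (long) firstRow + numRows;
  }
}
```

For example, with a first row of 2 and 3 rows requested, rows 2, 3, and 4 are inside the window and rows 1 and 5 are not.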
-
m_NumRowsAuto
protected int m_NumRowsAuto
the number of rows to use for automatically determining the column types.
-
m_ParseFormulas
protected boolean m_ParseFormulas
whether to parse formula-like cells.
-
m_SkipDifferingRows
protected boolean m_SkipDifferingRows
whether to drop rows with too few or too many cells.
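Dropping rows with too few or too many cells amounts to comparing each row's cell count with the expected column count. A minimal sketch, assuming the expected count comes from the header; `DifferingRowsSketch` and `dropDiffering` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of skipping rows with a differing cell count. */
final class DifferingRowsSketch {
  /** Keeps only the rows that have exactly expectedCells cells. */
  static List<List<String>> dropDiffering(List<List<String>> rows, int expectedCells) {
    List<List<String>> kept = new ArrayList<>();
    for (List<String> row : rows) {
      if (row.size() == expectedCells)
        kept.add(row);
    }
    return kept;
  }
}
```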
-
m_AutoTypes
protected Cell.ContentType[] m_AutoTypes
the automatically determined column types.
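Automatic type determination from a limited number of rows (see m_NumRowsAuto) can be pictured as widening a per-column guess while scanning the first rows. The sketch below is only an illustration of that idea: `AutoTypeSketch`, `GuessedType`, and the three-level LONG/DOUBLE/STRING ordering are assumptions and do not reflect the actual Cell.ContentType values or the reader's logic.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of guessing column types from the first rows. */
final class AutoTypeSketch {
  /** Ordered from narrowest to widest. */
  enum GuessedType { LONG, DOUBLE, STRING }

  /** Returns the narrowest type that can represent the cell. */
  static GuessedType typeOf(String cell) {
    try {
      Long.parseLong(cell);
      return GuessedType.LONG;
    }
    catch (NumberFormatException e) {
      // not an integer, try floating point next
    }
    try {
      Double.parseDouble(cell);
      return GuessedType.DOUBLE;
    }
    catch (NumberFormatException e) {
      return GuessedType.STRING;
    }
  }

  /**
   * Inspects at most numRowsAuto rows and widens each column's type as needed.
   * Columns without any inspected cell stay LONG in this sketch.
   */
  static GuessedType[] guess(List<List<String>> rows, int numCols, int numRowsAuto) {
    GuessedType[] types = new GuessedType[numCols];
    Arrays.fill(types, GuessedType.LONG);
    int limit = Math.min(rows.size(), Math.max(0, numRowsAuto));
    for (int r = 0; r < limit; r++) {
      List<String> row = rows.get(r);
      for (int c = 0; c < numCols && c < row.size(); c++) {
        GuessedType t = typeOf(row.get(c));
        if (t.ordinal() > types[c].ordinal())
          types[c] = t;
      }
    }
    return types;
  }
}
```

Inspecting only a prefix of the data keeps the scan cheap, but a value further down the file can still contradict the guess, which is the usual trade-off of this approach.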
-
-
Constructor Detail
-
ChunkReader
public ChunkReader(CsvSpreadSheetReader owner)
Initializes the low-level reader.
- Parameters:
owner - the owning reader
-
-
Method Detail
-
unquote
protected String unquote(String s)
Unquotes the given string.
- Parameters:
s - the string to unquote, if necessary
- Returns:
- the processed string
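Unquoting in CSV usually means stripping the enclosing quote characters and collapsing doubled-up quotes into single ones, which is also what the DOUBLED_UP_QUOTES and COLLAPSED_QUOTES fields hint at. The following sketch shows that general technique; `UnquoteSketch` is a hypothetical name, the quote character is passed explicitly, and the real method may behave differently in edge cases.

```java
/** Hypothetical illustration of CSV unquoting. */
final class UnquoteSketch {
  /** Strips enclosing quote characters and collapses doubled-up quotes. */
  static String unquote(String s, char quote) {
    if (s == null || s.length() < 2)
      return s;
    if (s.charAt(0) != quote || s.charAt(s.length() - 1) != quote)
      return s; // not quoted, nothing to do
    String q = String.valueOf(quote);
    return s.substring(1, s.length() - 1).replace(q + q, q);
  }
}
```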
-
removeTrailingCR
protected void removeTrailingCR(StringBuilder current)
Removes a trailing CR.
- Parameters:
current - the current buffer
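Removing a trailing CR matters for files with Windows line endings, where splitting on LF leaves a carriage return at the end of the last cell. A minimal sketch of the operation on a StringBuilder, with the hypothetical class name `TrailingCrSketch`:

```java
/** Hypothetical illustration of stripping a trailing carriage return. */
final class TrailingCrSketch {
  /** Deletes the last character of the buffer if it is a CR. */
  static void removeTrailingCR(StringBuilder current) {
    int last = current.length() - 1;
    if (last >= 0 && current.charAt(last) == '\r')
      current.deleteCharAt(last);
  }
}
```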
-
addCell
protected void addCell(StringBuilder current, List<String> cells)
Adds the current string to the cells.
- Parameters:
current - the current string
cells - the cells to add to
-
readCells
protected List<String> readCells(Reader reader) throws IOException
Reads a row and breaks it up into cells.
- Parameters:
reader - the reader to read from
- Returns:
- the cells, null if nothing could be read (EOF)
- Throws:
IOException - if reading fails, e.g., due to IO error
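Breaking a row into cells is typically done character by character, tracking whether the current position is inside quotes so that separators and line breaks within quoted cells are preserved. The sketch below shows one such state machine under stated assumptions: `RowSplitterSketch` and `readAllRows` are hypothetical, separator and quote character are passed in explicitly, and quotes are left in the cells for a later unquoting step. It returns null at EOF, matching the documented contract.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of splitting one CSV row into cells. */
final class RowSplitterSketch {
  /** Reads one row; returns null if nothing could be read (EOF). */
  static List<String> readCells(Reader reader, char separator, char quote) throws IOException {
    List<String> cells = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    boolean readAny = false;
    int c;
    while ((c = reader.read()) != -1) {
      readAny = true;
      char ch = (char) c;
      if (ch == quote) {
        inQuotes = !inQuotes; // doubled-up quotes toggle twice and stay "inside"
        current.append(ch);   // quotes are kept; unquoting is a separate step
      }
      else if (ch == separator && !inQuotes) {
        cells.add(current.toString());
        current.setLength(0);
      }
      else if (ch == '\n' && !inQuotes) {
        break; // end of row
      }
      else {
        current.append(ch);
      }
    }
    if (!readAny)
      return null;
    // drop the CR of a CRLF line ending
    int last = current.length() - 1;
    if (last >= 0 && current.charAt(last) == '\r')
      current.deleteCharAt(last);
    cells.add(current.toString());
    return cells;
  }

  /** Convenience helper: reads all rows of the given text. */
  static List<List<String>> readAllRows(String text, char separator, char quote) {
    List<List<String>> rows = new ArrayList<>();
    try (Reader r = new StringReader(text)) {
      List<String> row;
      while ((row = readCells(r, separator, quote)) != null)
        rows.add(row);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return rows;
  }
}
```

Reading a single character at a time is only efficient on a buffered reader, which is consistent with the m_Reader field being a BufferedReader.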
-
next
public SpreadSheet next()
Reads the next chunk.
- Returns:
- the next chunk
-
close
protected void close()
Closes the reader.
-
hasNext
public boolean hasNext()
Returns whether there is more data to be read.
- Returns:
- true if more data available
-
read
public SpreadSheet read(Reader r)
Reads the spreadsheet content from the specified reader.
- Parameters:
r - the reader to read from
- Returns:
- the spreadsheet or null in case of an error
-
-