Package adams.data.io.input
Class CsvSpreadSheetReader.ChunkReader
- java.lang.Object
- adams.data.io.input.CsvSpreadSheetReader.ChunkReader
- All Implemented Interfaces:
- Serializable
- Enclosing class:
- CsvSpreadSheetReader
public static class CsvSpreadSheetReader.ChunkReader extends Object implements Serializable
Reads CSV files chunk by chunk.
- Version:
- $Revision$
- Author:
- fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields (modifier and type, field, description):
- static char[] COLLAPSED_QUOTES - the replacement for doubled up quotes.
- static String[] DOUBLED_UP_QUOTES - the doubled up quotes to replace.
- protected Cell.ContentType[] m_AutoTypes - the automatically determined column types.
- protected int m_ChunkSize - the chunk size.
- protected String m_Comment - the comment string.
- protected gnu.trove.set.hash.TIntHashSet m_DateCols - the date column indices.
- protected DateFormat m_DateFormat - the date format.
- protected gnu.trove.set.hash.TIntHashSet m_DateTimeCols - the date/time column indices.
- protected DateFormat m_DateTimeFormat - the date/time format.
- protected gnu.trove.set.hash.TIntHashSet m_DateTimeMsecCols - the date/time msec column indices.
- protected DateFormat m_DateTimeMsecFormat - the date/time msec format.
- protected int m_FirstRow - the first row to retrieve (1-based).
- protected boolean m_HasDateCols - whether any date columns are defined.
- protected boolean m_HasDateTimeCols - whether any date/time columns are defined.
- protected boolean m_HasDateTimeMsecCols - whether any date/time msec columns are defined.
- protected boolean m_HasTextCols - whether any text columns are defined.
- protected boolean m_HasTimeCols - whether any time columns are defined.
- protected boolean m_HasTimeMsecCols - whether any time/msec columns are defined.
- protected SpreadSheet m_Header - the header.
- protected List<String> m_HeaderCells - the header cells to use.
- protected char m_LastChar - the last character that was read too far.
- protected BaseRegExp m_MissingValue - the missing value.
- protected NumberFormat m_NumberFormat - the number format.
- protected int m_NumRows - the number of rows to retrieve (less than 1 = unlimited).
- protected int m_NumRowsAuto - the number of rows to use for automatically determining the column types.
- protected CsvSpreadSheetReader m_Owner - the owning reader.
- protected boolean m_ParseFormulas - whether to parse formula-like cells.
- protected char m_QuoteChar - the quote char.
- protected BufferedReader m_Reader - the reader in use.
- protected int m_RowCount - the rows read so far.
- protected char m_Separator - the column separator.
- protected boolean m_SkipDifferingRows - whether to drop rows with too few or too many cells.
- protected gnu.trove.set.hash.TIntHashSet m_TextCols - the text column indices.
- protected gnu.trove.set.hash.TIntHashSet m_TimeCols - the time column indices.
- protected DateFormat m_TimeFormat - the time format.
- protected gnu.trove.set.hash.TIntHashSet m_TimeMsecCols - the time/msec column indices.
- protected DateFormat m_TimeMsecFormat - the time/msec format.
- protected boolean m_Trim - whether to trim the cells.
-
Constructor Summary
Constructors (constructor, description):
- ChunkReader(CsvSpreadSheetReader owner) - Initializes the low-level reader.
-
Method Summary
Methods (modifier and type, method, description):
- protected void addCell(StringBuilder current, List<String> cells) - Adds the current string to the cells.
- protected void close() - Closes the reader.
- boolean hasNext() - Returns whether there is more data to be read.
- SpreadSheet next() - Reads the next chunk.
- SpreadSheet read(Reader r) - Reads the spreadsheet content from the specified reader.
- protected List<String> readCells(Reader reader) - Reads a row and breaks it up into cells.
- protected void removeTrailingCR(StringBuilder current) - Removes a trailing CR.
- protected String unquote(String s) - Unquotes the given string.
-
-
-
Field Detail
-
DOUBLED_UP_QUOTES
public static final String[] DOUBLED_UP_QUOTES
the doubled up quotes to replace.
-
COLLAPSED_QUOTES
public static final char[] COLLAPSED_QUOTES
the replacement for doubled up quotes.
-
m_Owner
protected CsvSpreadSheetReader m_Owner
the owning reader.
-
m_Reader
protected BufferedReader m_Reader
the reader in use.
-
m_Header
protected SpreadSheet m_Header
the header.
-
m_MissingValue
protected BaseRegExp m_MissingValue
the missing value.
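Because the missing value is held as a regular expression rather than a literal string, a cell is presumably treated as missing when its text matches that expression. The following is a minimal sketch of that idea using only java.util.regex; the class name MissingValueSketch and the example pattern (empty cell or a lone question mark) are assumptions for illustration, not values confirmed by this page.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based missing-value detection; not the ADAMS implementation. */
final class MissingValueSketch {

  // Assumed example pattern: an empty cell or a lone question mark counts as missing.
  static final Pattern MISSING = Pattern.compile("^(\\?|)$");

  /** Returns true if the whole cell text matches the missing-value pattern. */
  static boolean isMissing(String cell) {
    return MISSING.matcher(cell).matches();
  }

  public static void main(String[] args) {
    System.out.println(isMissing("?"));
    System.out.println(isMissing(""));
    System.out.println(isMissing("NA"));
  }
}
```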
-
m_HasTextCols
protected boolean m_HasTextCols
whether any text columns are defined.
-
m_TextCols
protected gnu.trove.set.hash.TIntHashSet m_TextCols
the text column indices.
-
m_HasDateCols
protected boolean m_HasDateCols
whether any date columns are defined.
-
m_DateCols
protected gnu.trove.set.hash.TIntHashSet m_DateCols
the date column indices.
-
m_HasDateTimeCols
protected boolean m_HasDateTimeCols
whether any date/time columns are defined.
-
m_DateTimeCols
protected gnu.trove.set.hash.TIntHashSet m_DateTimeCols
the date/time column indices.
-
m_HasDateTimeMsecCols
protected boolean m_HasDateTimeMsecCols
whether any date/time msec columns are defined.
-
m_DateTimeMsecCols
protected gnu.trove.set.hash.TIntHashSet m_DateTimeMsecCols
the date/time msec column indices.
-
m_HasTimeCols
protected boolean m_HasTimeCols
whether any time columns are defined.
-
m_HasTimeMsecCols
protected boolean m_HasTimeMsecCols
whether any time/msec columns are defined.
-
m_TimeCols
protected gnu.trove.set.hash.TIntHashSet m_TimeCols
the time column indices.
-
m_TimeMsecCols
protected gnu.trove.set.hash.TIntHashSet m_TimeMsecCols
the time/msec column indices.
-
m_DateFormat
protected DateFormat m_DateFormat
the date format.
-
m_DateTimeFormat
protected DateFormat m_DateTimeFormat
the date/time format.
-
m_DateTimeMsecFormat
protected DateFormat m_DateTimeMsecFormat
the date/time msec format.
-
m_TimeFormat
protected DateFormat m_TimeFormat
the time format.
-
m_TimeMsecFormat
protected DateFormat m_TimeMsecFormat
the time/msec format.
-
m_NumberFormat
protected NumberFormat m_NumberFormat
the number format.
-
m_ChunkSize
protected int m_ChunkSize
the chunk size.
-
m_QuoteChar
protected char m_QuoteChar
the quote char.
-
m_Separator
protected char m_Separator
the column separator.
-
m_Comment
protected String m_Comment
the comment string.
-
m_Trim
protected boolean m_Trim
whether to trim the cells.
-
m_LastChar
protected char m_LastChar
the last character that was read too far.
-
m_RowCount
protected int m_RowCount
the rows read so far.
-
m_FirstRow
protected int m_FirstRow
the first row to retrieve (1-based).
-
m_NumRows
protected int m_NumRows
the number of rows to retrieve (less than 1 = unlimited).
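Taken together, m_FirstRow (1-based) and m_NumRows (less than 1 = unlimited) describe a window of rows. The sketch below shows one way such a window test could look. It assumes the row index counts data rows only, which this page does not state; the class and method names are hypothetical.

```java
/** Hypothetical sketch of a first-row/num-rows window; not the ADAMS implementation. */
final class RowWindowSketch {

  /**
   * Returns whether the 1-based row index lies inside the window that starts
   * at firstRow and spans numRows rows; numRows below 1 means no upper limit.
   */
  static boolean inWindow(int row, int firstRow, int numRows) {
    if (row < firstRow)
      return false;  // still before the first requested row
    return numRows < 1 || row - firstRow < numRows;
  }

  public static void main(String[] args) {
    // window of three rows starting at row 2, i.e. rows 2, 3 and 4
    for (int row = 1; row <= 5; row++)
      System.out.println(row + ": " + inWindow(row, 2, 3));
  }
}
```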
-
m_NumRowsAuto
protected int m_NumRowsAuto
the number of rows to use for automatically determining the column types.
-
m_ParseFormulas
protected boolean m_ParseFormulas
whether to parse formula-like cells.
-
m_SkipDifferingRows
protected boolean m_SkipDifferingRows
whether to drop rows with too few or too many cells.
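As an illustration of dropping rows with too few or too many cells, the following sketch filters already-parsed rows by comparing each row's cell count with an expected count. Taking the expected count from the header is an assumption, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of skipping rows with a differing cell count; not the ADAMS implementation. */
final class DifferingRowsSketch {

  /** Keeps only the rows that have exactly the expected number of cells. */
  static List<List<String>> dropDiffering(List<List<String>> rows, int expected) {
    List<List<String>> kept = new ArrayList<>();
    for (List<String> row : rows) {
      if (row.size() == expected)
        kept.add(row);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<List<String>> rows = Arrays.asList(
      Arrays.asList("a", "b"),
      Arrays.asList("c"),            // too few cells
      Arrays.asList("d", "e", "f"),  // too many cells
      Arrays.asList("g", "h"));
    System.out.println(dropDiffering(rows, 2));
  }
}
```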
-
m_AutoTypes
protected Cell.ContentType[] m_AutoTypes
the automatically determined column types.
-
-
Constructor Detail
-
ChunkReader
public ChunkReader(CsvSpreadSheetReader owner)
Initializes the low-level reader.
- Parameters:
- owner - the owning reader
-
-
Method Detail
-
unquote
protected String unquote(String s)
Unquotes the given string.
- Parameters:
- s - the string to unquote, if necessary
- Returns:
- the processed string
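For orientation, unquoting a CSV cell usually means stripping one enclosing pair of quote characters and collapsing doubled-up quotes inside the value, which is what the DOUBLED_UP_QUOTES and COLLAPSED_QUOTES constants suggest. The sketch below is a simplified stand-in under that assumption. It takes the quote character as an extra parameter, and the class name is hypothetical.

```java
/** Hypothetical sketch of CSV unquoting; not the ADAMS implementation. */
final class UnquoteSketch {

  /**
   * Strips one pair of enclosing quote characters, if present, and collapses
   * doubled-up quotes inside the value into a single quote character.
   */
  static String unquote(String s, char quote) {
    if (s == null || s.length() < 2)
      return s;
    if (s.charAt(0) != quote || s.charAt(s.length() - 1) != quote)
      return s;  // not quoted, return unchanged
    String inner = s.substring(1, s.length() - 1);
    String q = String.valueOf(quote);
    return inner.replace(q + q, q);
  }

  public static void main(String[] args) {
    System.out.println(unquote("\"say \"\"hi\"\"\"", '"'));
    System.out.println(unquote("plain", '"'));
  }
}
```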
-
removeTrailingCR
protected void removeTrailingCR(StringBuilder current)
Removes a trailing CR.
- Parameters:
- current - the current buffer
-
addCell
protected void addCell(StringBuilder current, List<String> cells)
Adds the current string to the cells.
- Parameters:
- current - the current string
- cells - the cells to add to
-
readCells
protected List<String> readCells(Reader reader) throws IOException
Reads a row and breaks it up into cells.
- Parameters:
- reader - the reader to read from
- Returns:
- the cells, null if nothing could be read (EOF)
- Throws:
- IOException - if reading fails, e.g., due to IO error
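To make the contract above concrete (one row per call, null at EOF), here is a minimal character-by-character sketch. It is not the ADAMS implementation: the separator and quote character are passed in as parameters, quotes are kept in the cell text so that unquoting stays a separate step, and a trailing CR from a CRLF line ending is dropped. All names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of reading one CSV row; not the ADAMS implementation. */
final class ReadCellsSketch {

  /**
   * Reads characters up to the next line break outside of quotes and splits
   * them into cells at the separator. Returns null if EOF is reached before
   * any character was read.
   */
  static List<String> readCells(Reader reader, char separator, char quote) throws IOException {
    List<String> cells = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    boolean readAny = false;
    int c;
    while ((c = reader.read()) != -1) {
      readAny = true;
      char ch = (char) c;
      if (ch == quote) {
        inQuotes = !inQuotes;
        current.append(ch);  // keep quotes; unquoting is a separate step
      }
      else if (ch == separator && !inQuotes) {
        cells.add(current.toString());
        current.setLength(0);
      }
      else if (ch == '\n' && !inQuotes) {
        break;  // end of row
      }
      else {
        current.append(ch);
      }
    }
    if (!readAny)
      return null;
    // drop a trailing CR left over from a CRLF line ending
    int len = current.length();
    if (len > 0 && current.charAt(len - 1) == '\r')
      current.setLength(len - 1);
    cells.add(current.toString());
    return cells;
  }

  /** Usage: reads all rows of the given text, using comma and double quote. */
  static List<List<String>> readAll(String text) {
    List<List<String>> rows = new ArrayList<>();
    try (Reader reader = new StringReader(text)) {
      List<String> cells;
      while ((cells = readCells(reader, ',', '"')) != null)
        rows.add(cells);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return rows;
  }
}
```

Calling `ReadCellsSketch.readAll("a,\"b,c\",d\r\nx,y\n")` yields two rows; the quoted separator stays inside the second cell of the first row.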
-
next
public SpreadSheet next()
Reads the next chunk.
- Returns:
- the next chunk
-
close
protected void close()
Closes the reader.
-
hasNext
public boolean hasNext()
Returns whether there is more data to be read.
- Returns:
- true if more data available
-
read
public SpreadSheet read(Reader r)
Reads the spreadsheet content from the specified reader.
- Parameters:
- r - the reader to read from
- Returns:
- the spreadsheet or null in case of an error
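The read(Reader), hasNext() and next() methods suggest a protocol in which read returns the first chunk and the other two iterate over the rest. The sketch below illustrates that pattern with plain text lines instead of SpreadSheet objects. It is a simplified stand-in, not the ADAMS code: the class ChunkedLinesSketch is hypothetical, it requires a chunk size of at least 1, and it throws on IO errors instead of returning null.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical chunk-by-chunk line reader; not the ADAMS implementation. */
final class ChunkedLinesSketch {
  private final int chunkSize;
  private BufferedReader reader;
  private String pending;  // look-ahead line; null once the input is exhausted

  ChunkedLinesSketch(int chunkSize) {
    if (chunkSize < 1)
      throw new IllegalArgumentException("chunkSize must be at least 1");
    this.chunkSize = chunkSize;
  }

  /** Starts reading from the given reader and returns the first chunk. */
  List<String> read(Reader r) {
    reader = new BufferedReader(r);
    pending = readLine();
    return next();
  }

  /** Returns whether another chunk can be read. */
  boolean hasNext() {
    return pending != null;
  }

  /** Returns up to chunkSize lines; closes the reader once the input is exhausted. */
  List<String> next() {
    List<String> chunk = new ArrayList<>();
    while (pending != null && chunk.size() < chunkSize) {
      chunk.add(pending);
      pending = readLine();
    }
    if (pending == null)
      close();
    return chunk;
  }

  private String readLine() {
    try {
      return reader.readLine();
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private void close() {
    if (reader == null)
      return;
    try {
      reader.close();
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Usage: collects all chunks of the given text. */
  static List<List<String>> readAll(String text, int chunkSize) {
    ChunkedLinesSketch chunker = new ChunkedLinesSketch(chunkSize);
    List<List<String>> chunks = new ArrayList<>();
    chunks.add(chunker.read(new StringReader(text)));
    while (chunker.hasNext())
      chunks.add(chunker.next());
    return chunks;
  }
}
```

The one-line look-ahead in `pending` is what lets hasNext() answer without consuming data, so the final chunk is never followed by an empty one.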
-
-