class |
AbstractInPlaceSpreadSheetTransformer |
Ancestor for spreadsheet transformers that allow the processing to
happen in-place, rather than on a copy of the data.
|
class |
LookUpInit |
Creates a lookup table from a spreadsheet, using one column as key and another one as value.
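The key/value idea can be sketched in plain Java without any ADAMS classes. The class name `LookUpSketch`, the method `buildLookUp`, and the row-as-`String[]` representation are assumptions made for illustration; the overwrite behaviour for duplicate keys is also an assumption, not something the description above specifies.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: rows are plain String arrays, not ADAMS spreadsheet rows.
class LookUpSketch {

    /** Builds a key -> value map from two columns of a row-oriented table. */
    static Map<String, String> buildLookUp(List<String[]> rows, int keyCol, int valueCol) {
        Map<String, String> lookUp = new LinkedHashMap<>();
        for (String[] row : rows) {
            // Assumption of this sketch: a later row overwrites an earlier one with the same key.
            lookUp.put(row[keyCol], row[valueCol]);
        }
        return lookUp;
    }
}
```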
|
class |
LookUpUpdate |
Updates the lookup table (in form of a spreadsheet) that passes through using the specified rules.
The rules can contain variables.
The rules use the following grammar:
expr_list ::= expr_list expr_part | expr_part
expr_part ::= conditional | assignment
conditional ::= if expr then assignments end
| if expr then assignments else assignments end
assignments ::= assignments assignment | assignment
assignment ::=
VARIABLE := expr;
| all ( "regexp" ) := expr;
expr ::= ( expr )
| NUMBER
| STRING
| BOOLEAN
| VARIABLE
| true
| false
| -expr
| expr < expr
| expr <= expr
| expr > expr
| expr >= expr
| expr = expr
| expr != expr
| not expr
| expr and expr
| expr or expr
| expr + expr
| expr - expr
| expr * expr
| expr / expr
| expr % expr
| expr ^ expr
| abs ( expr )
| sqrt ( expr )
| cbrt ( expr )
| log ( expr )
| log10 ( expr )
| exp ( expr )
| sin ( expr )
| sinh ( expr )
| cos ( expr )
| cosh ( expr )
| tan ( expr )
| tanh ( expr )
| atan ( expr )
| atan2 ( exprY , exprX )
| hypot ( exprX , exprY )
| signum ( expr )
| rint ( expr )
| floor ( expr )
| pow[er] ( expr , expr )
| ceil ( expr )
| min ( expr1 , expr2 )
| max ( expr1 , expr2 )
Notes:
- Variables consist either solely of alphanumeric characters, hyphens and underscores (e.g., "ABc_1-2")
or of any characters other than "'" enclosed in single quotes (e.g., "'Hello World'").
- The 'all' method applies the value to all the values in the lookup table
that match the regular expression.
- Variables starting with '_' are considered local and don't get transferred back out.
Input/output:
- accepts:
adams.data.spreadsheet.SpreadSheet
- generates:
adams.data.spreadsheet.SpreadSheet
|
class |
SpreadSheetAggregate |
Aggregates rows (min, max, avg, etc) in a spreadsheet using key columns.
All numeric columns in the specified aggregate range (excluding the key columns) get aggregated.
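As a rough illustration of aggregating by key, the following plain-Java sketch groups rows by one key column and collects min, max and average for one numeric column. `AggregateSketch` and `aggregate` are hypothetical names, and the single key column and single value column are simplifications; the actual transformer works on column ranges.

```java
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: one key column, one numeric column, rows as String arrays.
class AggregateSketch {

    /** Groups rows by the key column and accumulates statistics for the value column. */
    static Map<String, DoubleSummaryStatistics> aggregate(List<String[]> rows, int keyCol, int valueCol) {
        Map<String, DoubleSummaryStatistics> result = new LinkedHashMap<>();
        for (String[] row : rows) {
            // One statistics accumulator per distinct key, created on first use.
            result.computeIfAbsent(row[keyCol], k -> new DoubleSummaryStatistics())
                  .accept(Double.parseDouble(row[valueCol]));
        }
        return result;
    }
}
```

`DoubleSummaryStatistics` then exposes `getMin()`, `getMax()`, `getAverage()`, `getSum()` and `getCount()` for each key.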
|
class |
SpreadSheetAnonymize |
Anonymizes a range of columns in a spreadsheet.
|
class |
SpreadSheetAppend |
Appends the incoming spreadsheet to one in storage.
If there is none in storage yet, the incoming spreadsheet will simply get stored in storage.
The spreadsheets need not have the same structure, but it is assumed that column names are unique within a spreadsheet.
The combined spreadsheet is then forwarded.
|
class |
SpreadSheetAppendComments |
Appends the comments of the spreadsheet.
|
class |
SpreadSheetCollapse |
Uses the specified key columns to identify groups of rows.
|
class |
SpreadSheetColumnFilter |
Filters spreadsheets using the specified column finder.
The output contains all the columns that the specified finder selected.
|
class |
SpreadSheetColumnsByName |
Creates a new spreadsheet with the columns that matched the regular expression.
|
class |
SpreadSheetColumnStatistic |
Generates statistics for a chosen column.
|
class |
SpreadSheetConvertCells |
Finds cells in a spreadsheet and converts them with a conversion scheme.
If the conversion scheme generates an adams.data.spreadsheet.SpreadSheet object itself, this will get merged with the enclosing one: any additional columns get added and the content of the first row gets added to the row the converted cell belongs to.
|
class |
SpreadSheetConvertHeaderCells |
Converts the header cells of a spreadsheet with a conversion scheme.
|
class |
SpreadSheetCopyColumns |
Copies a range of columns to a specific position in the spreadsheets coming through.
|
class |
SpreadSheetCopyRows |
Copies a range of rows to a specific position in the spreadsheets coming through.
|
class |
SpreadSheetFilter |
Applies the specified spreadsheet filter to the data.
|
class |
SpreadSheetInsertColumn |
Inserts a column at a specific position into spreadsheets coming through.
The cells are initialized with a pre-defined value.
|
class |
SpreadSheetInsertRow |
Inserts a row at a specific position into spreadsheets coming through.
The cells are initialized with a pre-defined value.
|
class |
SpreadSheetInsertRowScore |
Inserts a score column at a specific position into spreadsheets coming through.
|
class |
SpreadSheetMatrixStatistic |
Generates statistics for the spreadsheet.
|
class |
SpreadSheetQuery |
Applies a query (SELECT, UPDATE, DELETE) on a spreadsheet.
Variables are supported as well, e.g.: SELECT * WHERE Blah = @{val}, with 'val' being a variable available at execution time.
The following grammar is used for the query:
expr_list ::= expr_list expr_part | expr_part;
expr_part ::= select | update | delete;
select ::= SELECT col_list [limit]
| SELECT col_list WHERE cond_list [limit]
| SELECT col_list ORDER BY order_list [limit]
| SELECT col_list WHERE cond_list ORDER BY order_list [limit]
| SELECT agg_list
| SELECT agg_list GROUP BY col_list
| SELECT agg_list HAVING cond_list
| SELECT agg_list GROUP BY col_list HAVING cond_list
;
update ::= UPDATE SET upd_list
| UPDATE SET upd_list WHERE cond_list
;
delete ::= DELETE WHERE cond_list
;
col_list ::= col_list COMMA col
| col
| SELECT NUMBER [subsample: <1 = percent; >= 1 number of rows]
;
col ::= *
| COLUMN
| COLUMN AS COLUMN
;
upd_list ::= upd_list COMMA upd | upd;
upd ::= COLUMN = value
;
order_list::= order_list COMMA order | order;
order ::= COLUMN
| COLUMN ASC
| COLUMN DESC
;
cond_list ::= cond_list cond
| cond
;
cond ::= COLUMN < value
| COLUMN <= value
| COLUMN = value
| COLUMN <> value
| COLUMN >= value
| COLUMN > value
| COLUMN REGEXP STRING
| COLUMN IS NULL
| CELLTYPE ( COLUMN ) = "numeric|long|double|boolean|string|time|date|datetime|timestamp|object|missing"
| ( cond )
| cond:c1 AND cond:c2
| cond:c1 OR cond:c2
| NOT cond
;
value ::= NUMBER
| STRING
| PARSE ( "number" , STRING )
| PARSE ( "date" , STRING )
| PARSE ( "time" , STRING )
| PARSE ( "timestamp" , STRING )
;
limit ::= LIMIT NUMBER:max
| LIMIT NUMBER:offset , NUMBER:max
;
agg_list ::= agg_list COMMA agg
| agg
;
agg ::= COUNT [(*)] [AS COLUMN]
| MIN ( COLUMN ) [AS COLUMN]
| MAX ( COLUMN ) [AS COLUMN]
| RANGE ( COLUMN ) [AS COLUMN] (= MAX - MIN)
| MEAN ( COLUMN ) [AS COLUMN]
| AVERAGE ( COLUMN ) [AS COLUMN]
| STDEV ( COLUMN ) [AS COLUMN]
| STDEVP ( COLUMN ) [AS COLUMN]
| SUM ( COLUMN ) [AS COLUMN]
| IQR ( COLUMN ) [AS COLUMN]
| INTERQUARTILE ( COLUMN ) [AS COLUMN]
Notes:
- time format: 'HH:mm'
- date format: 'yyyy-MM-dd'
- timestamp format: 'yyyy-MM-dd HH:mm'
- STRING is referring to characters enclosed by double quotes
- COLUMN is either a string with no blanks (consisting of letters, numbers, hyphens or underscores; e.g., 'MyCol-1') or a bracket-enclosed string when containing blanks (e.g., '[Some other col]')
- columns used in the ORDER BY clause must be present in the SELECT part; also, any alias given to them in SELECT must be used instead of original column name
Input/output:
- accepts:
adams.data.spreadsheet.SpreadSheet
- generates:
adams.data.spreadsheet.SpreadSheet
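A few queries written against the grammar above, together with a plain-Java stand-in for the `@{val}` placeholder expansion. The column names (`Price`, `Category`, `Flag`, `Day`) and the class `QueryVariableSketch` are hypothetical, the queries were only checked by hand against the grammar, and the `expand` method is an illustration of the idea rather than how ADAMS resolves variables.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: shows example queries and a simple @{name} substitution.
class QueryVariableSketch {

    /** Example queries following the documented grammar (column names are made up). */
    static final String[] EXAMPLES = {
        "SELECT * WHERE Blah = @{val}",
        "SELECT MyCol-1, [Some other col] WHERE MyCol-1 >= 10 ORDER BY MyCol-1 DESC LIMIT 5",
        "SELECT COUNT AS n, MEAN(Price) AS mean_price GROUP BY Category",
        "UPDATE SET Flag = \"old\" WHERE Day < PARSE(\"date\", \"2020-01-01\")",
        "DELETE WHERE Price IS NULL"
    };

    private static final Pattern VAR = Pattern.compile("@\\{([^}]+)\\}");

    /** Replaces each @{name} with its value; unknown names are left untouched. */
    static String expand(String query, Map<String, String> vars) {
        Matcher m = VAR.matcher(query);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = vars.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in values from being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```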
|
class |
SpreadSheetRandomSystematicSample |
Performs random systematic sampling on the rows of the incoming spreadsheet.
Divides the rows into N blocks with N being the sample size.
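A minimal sketch of the block-based approach, assuming the textbook form of systematic sampling: one random offset is drawn and the row at that offset is taken from every block. Whether the actual transformer draws one offset or one per block is not stated above, so treat this as an assumption; `SystematicSampleSketch` and `sampleIndices` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in: returns row indices rather than spreadsheet rows.
class SystematicSampleSketch {

    /** Picks one row index from each of sampleSize equally sized blocks. */
    static List<Integer> sampleIndices(int numRows, int sampleSize, long seed) {
        if (sampleSize <= 0 || sampleSize > numRows)
            throw new IllegalArgumentException("sampleSize must be in 1.." + numRows);
        // Rows per block; any leftover rows at the end are never selected in this sketch.
        int blockSize = numRows / sampleSize;
        // Random starting offset within the first block, reused for every block.
        int offset = new Random(seed).nextInt(blockSize);
        List<Integer> indices = new ArrayList<>(sampleSize);
        for (int block = 0; block < sampleSize; block++)
            indices.add(block * blockSize + offset);
        return indices;
    }
}
```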
|
class |
SpreadSheetRemoveColumn |
Removes the column(s) at the specific position from spreadsheets coming through.
|
class |
SpreadSheetRemoveRow |
Removes one or more rows at the specific position from spreadsheets coming through.
|
class |
SpreadSheetReorderColumns |
Reorders the columns in a spreadsheet according to a user-supplied order.
|
class |
SpreadSheetReorderRows |
Reorders the rows in a spreadsheet according to a user-supplied order.
|
class |
SpreadSheetReplaceCellValue |
Replaces cell values that match a regular expression with a predefined value.
|
class |
SpreadSheetRowBinning |
Applies a binning algorithm to the values from the specified binning column to filter the rows into specific bins.
A new column is then added containing the corresponding bin index.
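To make the bin-index column concrete, here is a plain-Java sketch using equal-width binning. The choice of equal-width bins is an assumption for illustration (the transformer accepts a configurable binning algorithm), and `RowBinningSketch` and `equalWidthBins` are hypothetical names.

```java
// Hypothetical stand-in: computes the bin index that would go into the new column.
class RowBinningSketch {

    /** Returns, for each value, the 0-based index of its equal-width bin. */
    static int[] equalWidthBins(double[] values, int numBins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / numBins;
        int[] bins = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // If all values are identical, everything lands in bin 0.
            int bin = (width == 0) ? 0 : (int) ((values[i] - min) / width);
            // The maximum value would compute to numBins, so clamp it into the last bin.
            bins[i] = Math.min(bin, numBins - 1);
        }
        return bins;
    }
}
```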
|
class |
SpreadSheetRowFilter |
Filters spreadsheets using the specified row finder.
The output contains all the rows that the specified finder selected.
|
class |
SpreadSheetRowStatistic |
Generates statistics for a chosen row.
|
class |
SpreadSheetSetCell |
Sets the value of the specified cells in a spreadsheet.
|
class |
SpreadSheetSetHeaderCell |
Sets a single header cell value in a spreadsheet.
|
class |
SpreadSheetSort |
Sorts the rows of the spreadsheet according to the selected column indices and sort order (ascending/descending).
|
class |
SpreadSheetSortColumns |
Reorders a user-defined subset of columns by name using the specified comparator.
|
class |
SpreadSheetStatistic |
Generates statistics from a SpreadSheet object.
For cells that are non-numeric or missing, a default value of zero is used.
|
class |
SpreadSheetSubset |
Extracts a subset of rows/columns from a spreadsheet.
|
class |
SpreadSheetSubsetByValue |
Generates subsets from a spreadsheet, grouped by the same string value in the specified column.
For instance, if a spreadsheet has 3 unique values (A, B, C) in column 2, then 3 subsheets will be generated, containing the rows that have the value A, B or C, respectively.
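The A/B/C example can be sketched in plain Java as a grouping of rows by the string value in one column. `SubsetByValueSketch` and `split` are hypothetical names, and rows are represented as `String[]` rather than ADAMS spreadsheet rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: one "subsheet" is simply a list of rows.
class SubsetByValueSketch {

    /** Splits rows into subsets keyed by the string value found in the given column. */
    static Map<String, List<String[]>> split(List<String[]> rows, int col) {
        // LinkedHashMap keeps subsets in order of first appearance of each value.
        Map<String, List<String[]>> subsets = new LinkedHashMap<>();
        for (String[] row : rows)
            subsets.computeIfAbsent(row[col], k -> new ArrayList<>()).add(row);
        return subsets;
    }
}
```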
|
class |
SpreadSheetSubsetFromGroup |
Splits the spreadsheet into subsets using the supplied column and then returns the specified range of rows from each generated subset.
The spreadsheet is expected to be sorted on the grouping column.
|
class |
SpreadSheetTransformCells |
Finds cells in a spreadsheet and transforms them with a callable transformer.
In case of transformers having Object or Unknown in their types of classes that they accept, no proper type can be inferred automatically.
|
class |
SpreadSheetTransformHeaderCells |
Transforms header cells with a callable transformer.
In case of transformers having Object or Unknown in their types of classes that they accept, no proper type can be inferred automatically.
|
class |
SummaryStatistics |
Calculates the selected summary statistics and outputs a spreadsheet.
|