adams.flow.transformer.SpreadSheetQuery
Applies a query (SELECT, UPDATE, DELETE) on a spreadsheet.
Variables are supported as well, e.g., : SELECT * WHERE Blah = @{val} with 'val' being a variable available at execution time.
The following grammar is used for the query:
expr_list ::= expr_list expr_part | expr_part;
expr_part ::= select | update | delete;
select ::= SELECT col_list [limit]
| SELECT col_list WHERE cond_list [limit]
| SELECT col_list ORDER BY order_list [limit]
| SELECT col_list WHERE cond_list ORDER BY order_list [limit]
| SELECT agg_list
| SELECT agg_list GROUP BY col_list
| SELECT agg_list HAVING cond_list
| SELECT agg_list GROUP BY col_list HAVING cond_list
;
update ::= UPDATE SET upd_list
| UPDATE SET upd_list WHERE cond_list
;
delete ::= DELETE WHERE cond_list
;
col_list ::= col_list COMMA col
| col
| SELECT NUMBER [subsample: <1 = percent; >= 1 number of rows]
;
col ::= *
| COLUMN
| COLUMN AS COLUMN
;
upd_list ::= upd_list COMMA upd | upd;
upd ::= COLUMN = value
;
order_list::= order_list COMMA order | order;
order ::= COLUMN
| COLUMN ASC
| COLUMN DESC
;
cond_list ::= cond_list cond
| cond
;
cond ::= COLUMN < value
| COLUMN <= value
| COLUMN = value
| COLUMN <> value
| COLUMN >= value
| COLUMN > value
| COLUMN REGEXP STRING
| COLUMN IS NULL
| CELLTYPE ( COLUMN ) = "numeric|long|double|boolean|string|time|date|datetime|timestamp|object|missing"
| ( cond )
| cond:c1 AND cond:c2
| cond:c1 OR cond:c2
| NOT cond
;
value ::= NUMBER
| STRING
| PARSE ( "number" , STRING )
| PARSE ( "date" , STRING )
| PARSE ( "time" , STRING )
| PARSE ( "timestamp" , STRING )
;
limit ::= LIMIT NUMBER:max
| LIMIT NUMBER:offset , NUMBER:max
;
agg_list ::= agg_list COMMA agg
| agg
;
agg ::= COUNT [(*)|(COLUMN)] [AS COLUMN]
| MIN ( COLUMN ) [AS COLUMN]
| MAX ( COLUMN ) [AS COLUMN]
| RANGE ( COLUMN ) [AS COLUMN] (= MIN - MAX)
| MEAN ( COLUMN ) [AS COLUMN]
| AVERAGE ( COLUMN ) [AS COLUMN]
| STDEV ( COLUMN ) [AS COLUMN]
| STDEVP ( COLUMN ) [AS COLUMN]
| SUM ( COLUMN ) [AS COLUMN]
| IQR ( COLUMN ) [AS COLUMN]
| INTERQUARTILE ( COLUMN ) [AS COLUMN]
Notes:
- time format: 'HH:mm'
- date format: 'yyyy-MM-dd'
- timestamp format: 'yyyy-MM-dd HH:mm'
- STRING is referring to characters enclosed by double quotes
- COLUMN is either a string with no blanks (consisting of letters, numbers, hyphen or underscore; eg 'MyCol-1') or a bracket enclosed string when containing blanks (eg '[Some other col]')
- columns used in the ORDER BY clause must be present in the SELECT part; also, any alias given to them in SELECT must be used instead of original column name
Flow input/output:
- input: adams.data.spreadsheet.SpreadSheet
- output: adams.data.spreadsheet.SpreadSheet
The logging level for outputting errors and debugging output.
command-line | -logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> |
default | WARNING |
min-user-mode | Expert |
The name of the actor.
command-line | -name <java.lang.String> |
default | SpreadSheetQuery |
The annotations to attach to this actor.
command-line | -annotation <adams.core.base.BaseAnnotation> |
default |
|
If set to true, transformation is skipped and the input token is just forwarded as it is.
command-line | -skip <boolean> |
default | false |
If set to true, the flow execution at this level gets stopped in case this actor encounters an error; the error gets propagated; useful for critical actors.
command-line | -stop-flow-on-error <boolean> |
default | false |
min-user-mode | Expert |
If enabled, then no errors are output in the console; Note: the enclosing actor handler must have this enabled as well.
command-line | -silent <boolean> |
default | false |
min-user-mode | Expert |
The query to execute.
command-line | -query <adams.parser.SpreadSheetQueryText> |
default | SELECT * |