Name

adams.flow.transformer.SpreadSheetQuery


Synopsis

Applies a query (SELECT, UPDATE, DELETE) on a spreadsheet.
Variables are supported as well, e.g., : SELECT * WHERE Blah = @{val} with 'val' being a variable available at execution time.
The following grammar is used for the query:

expr_list ::= expr_list expr_part | expr_part;

expr_part ::= select | update | delete;

select ::= SELECT col_list [limit]
              | SELECT col_list WHERE cond_list [limit]
              | SELECT col_list ORDER BY order_list [limit]
              | SELECT col_list WHERE cond_list ORDER BY order_list [limit]
              | SELECT agg_list
              | SELECT agg_list GROUP BY col_list
              | SELECT agg_list HAVING cond_list
              | SELECT agg_list GROUP BY col_list HAVING cond_list
              ;

update ::= UPDATE SET upd_list
              | UPDATE SET upd_list WHERE cond_list
              ;

delete ::= DELETE WHERE cond_list
              ;

col_list ::= col_list COMMA col
              | col
              | SELECT NUMBER [subsample: <1 = percent; >= 1 number of rows]
              ;

col ::= *
              | COLUMN
              | COLUMN AS COLUMN
              ;

upd_list ::= upd_list COMMA upd | upd;

upd ::= COLUMN = value
              ;

order_list::= order_list COMMA order | order;

order ::= COLUMN
              | COLUMN ASC
              | COLUMN DESC
              ;
              
cond_list ::= cond_list cond
              | cond
              ;

cond ::= COLUMN < value
              | COLUMN <= value
              | COLUMN = value
              | COLUMN <> value
              | COLUMN >= value
              | COLUMN > value
              | COLUMN REGEXP STRING
              | COLUMN IS NULL
              | CELLTYPE ( COLUMN ) = "numeric|long|double|boolean|string|time|date|datetime|timestamp|object|missing"
              | ( cond )
              | cond:c1 AND cond:c2
              | cond:c1 OR cond:c2
              | NOT cond
              ;

value ::= NUMBER
              | STRING
              | PARSE ( "number" , STRING )
              | PARSE ( "date" , STRING )
              | PARSE ( "time" , STRING )
              | PARSE ( "timestamp" , STRING )
              ;

limit ::= LIMIT NUMBER:max
              | LIMIT NUMBER:offset , NUMBER:max
              ;
agg_list ::= agg_list COMMA agg
              | agg
              ;

agg ::= COUNT [(*)|(COLUMN)] [AS COLUMN]
              | MIN ( COLUMN ) [AS COLUMN]
              | MAX ( COLUMN ) [AS COLUMN]
              | RANGE ( COLUMN ) [AS COLUMN] (= MIN - MAX)
              | MEAN ( COLUMN ) [AS COLUMN]
              | AVERAGE ( COLUMN ) [AS COLUMN]
              | STDEV ( COLUMN ) [AS COLUMN]
              | STDEVP ( COLUMN ) [AS COLUMN]
              | SUM ( COLUMN ) [AS COLUMN]
              | IQR ( COLUMN ) [AS COLUMN]
              | INTERQUARTILE ( COLUMN ) [AS COLUMN]

Notes:
- time format: 'HH:mm'
- date format: 'yyyy-MM-dd'
- timestamp format: 'yyyy-MM-dd HH:mm'
- STRING is referring to characters enclosed by double quotes
- COLUMN is either a string with no blanks (consisting of letters, numbers, hyphen or underscore; eg 'MyCol-1') or a bracket enclosed string when containing blanks (eg '[Some other col]')
- columns used in the ORDER BY clause must be present in the SELECT part; also, any alias given to them in SELECT must be used instead of original column name


Additional information

Flow input/output:
- input: adams.data.spreadsheet.SpreadSheet
- output: adams.data.spreadsheet.SpreadSheet


Options