Updates 2017/10/13

FracPete

2017-10-13 17:01

It is the end of the semester and I'm busy marking my students' work and working on some commercial projects. However, there was still time to fix and improve things. The modules that received the most attention are centered around Microsoft's deep learning framework CNTK (https://cntk.ai/).

Fixes

The LocalScopeTransformer now outputs the name of the last active actor in the error message instead of the first one's, in case the last active actor is not an actual transformer.
If you encounter a problem on MySQL when trying to create a table with a timestamp column that uses DEFAULT '0000-00-00 00:00:00', then remove the NO_ZERO_DATE directive from the sql_mode option in your my.cnf/my.ini. E.g., use the following: sql_mode=IGNORE_SPACE,ERROR_FOR_DIVISION_BY_ZERO,NO_ZERO_IN_DATE,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
adams-weka: The SpreadSheetSaver Weka converter no longer loses the configured writer (setOptions now calls the super method first, as it calls resetOptions).
adams-cntk-weka: The CNTKPrebuiltModel classifier now handles the flexible structure of CNTK regression models better.
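
The sql_mode tip above boils down to dropping one entry from a comma-separated list. Here is a minimal Python sketch of that string edit. The helper name drop_sql_mode and the starting value in current are made up for illustration, and the code only manipulates text; it does not talk to MySQL.

```python
def drop_sql_mode(sql_mode: str, directive: str = "NO_ZERO_DATE") -> str:
    """Return sql_mode with the given directive removed (exact, case-insensitive match)."""
    modes = [m.strip() for m in sql_mode.split(",") if m.strip()]
    kept = [m for m in modes if m.upper() != directive.upper()]
    return ",".join(kept)


# Assumed example of a setting that still contains NO_ZERO_DATE.
current = ("IGNORE_SPACE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,"
           "NO_ZERO_IN_DATE,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION")
print("sql_mode=" + drop_sql_mode(current))
```

Note that NO_ZERO_IN_DATE is a different directive and stays in place, which is why the sketch compares whole entries rather than substrings.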

Changes

Upgraded lanterna (terminal applications) to 3.0.0.
Bumped up CNTK to 2.2.
The Preview browser now uses simple file monitoring to update the display (using the LastModified monitor).
Spreadsheet writers are now stoppable (at least the ones that write the data incrementally).
Introduced StopRestrictor interface for better stopping of flow or parts of flow, affecting the Stop actor, interactive actors and LocalScopeTransformer/LocalScopeTrigger.
Added has(sym) operator that checks whether a certain symbol is present, affecting BooleanExpression, MathematicalExpression and ReportMathExpression.
The Revert menu item in the Flow editor now detects if a file was modified on disk.
The ProgressBar sink now has an optional title option for displaying a title string within the panel itself, which is useful when displaying multiple progress bars at the same time.
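
To illustrate what a has(sym) check is good for, here is a small Python sketch of an expression evaluator that exposes such a helper. This is not the ADAMS parser: the evaluate function is hypothetical, the symbol name is passed as a string (an assumption made because of Python's syntax), and eval is used purely to keep the example short, so do not feed it untrusted input.

```python
def evaluate(expression: str, symbols: dict) -> object:
    """Evaluate a tiny expression with a has(name) helper available."""
    def has(name: str) -> bool:
        # True if the symbol was supplied by the caller
        return name in symbols

    # No builtins, only the supplied symbols plus the has() helper.
    namespace = {"__builtins__": {}, "has": has}
    namespace.update(symbols)
    return eval(expression, namespace)


expr = 'x * 2 if has("x") else -1'
print(evaluate(expr, {"x": 21}))  # symbol present
print(evaluate(expr, {}))         # symbol missing, fallback value is used
```

The point of the guard is that the expression no longer fails when a symbol is absent; it can fall back to a default instead.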

Additions

OptionProducer transformer for turning objects into option strings using the specified OptionProducer scheme.
The CloseCallableDisplay control actor allows closing of frames of callable actors.
Added the Command source for executing an external command: it is similar to the Exec source, but outputs stdout and/or stderr as continuous stream instead of waiting for the process to finish before outputting them.
adams-meta:
- Added NewFlow for generating a flow using a template.
- FlowFileReader reads a flow from disk and FlowFileWriter writes it to disk.
- FlowDisplay simply displays the incoming flow as tree (like the flow editor).
adams-cntk:
- Added a preference panel for CNTK, at this stage only for specifying the binary.
- The CNTKSetup standalone allows overriding the global settings (i.e., the binary) for CNTK within a flow.
- With the CNTKBrainScriptExec flow it is possible to execute a brainscript and display the output. With the usual string processing facilities in ADAMS, you can also plot the performance of the network per epoch, e.g., plotting the RMSE.
- Added reader for CNTK text files: CNTKSpreadSheetReader
- Added CNTKModelReader and CNTKModelInfo for reading a model and outputting information about a model.
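
The difference between the Command and Exec sources described above (continuous output versus waiting for the process to finish) can be sketched in Python with the standard subprocess module. The function names stream_lines and run_and_wait are made up for this illustration; ADAMS itself is Java and does not use this code.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_lines(command: List[str]) -> Iterator[str]:
    """Yield output lines while the process is still running."""
    with subprocess.Popen(command, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")


def run_and_wait(command: List[str]) -> str:
    """Return all of stdout at once, only after the process has finished."""
    return subprocess.run(command, capture_output=True, text=True).stdout


# A stand-in for a long-running external command.
cmd = [sys.executable, "-c", "print('epoch 1'); print('epoch 2')"]
for line in stream_lines(cmd):
    print("got:", line)
```

With the streaming variant, each line can be processed (e.g., plotted) as soon as it arrives, which matters for long-running training jobs.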

Have a great Friday 13th! ;-)

Move on GitHub

FracPete

2017-09-28 12:04

Apologies, the ADAMS repositories have moved again, but only from one GitHub organization to another. This was done mainly to align the repositories with our commercial activity. The organization account on GitHub is called waikato-datamining.

You can find the relocated repositories now here:

https://github.com/waikato-datamining

Don't worry, if you have already cloned the repository, you can simply update the URL in the .git/config file from:

url = https://github.com/Waikato/<repo>.git

to:

url = https://github.com/waikato-datamining/<repo>.git

OpenHub is back!

FracPete

2017-09-19 17:17

Thanks to the move to GitHub, the metrics on OpenHub are back!

https://www.openhub.net/p/theadamsflow

Over a year ago, OpenHub failed to update their copy of the source code via subversion. For some reason, they couldn't handle the anonymous access that was used by our in-house subversion server.

Anyhow, this is sorted now!

Updates 2017/09/15

FracPete

2017-09-15 16:35

The most notable change is that external actors now can have a file change monitor defined, which can trigger reloads of the external flow next time it gets executed. Useful for flows running in the cloud, which need to reload settings stored in external flows.

Fixes

The ImageViewer now clears selection listeners, left click listeners and image overlays when displaying an image.
Fixed suggestions for actor templates in the Flow editor. EndlessLoop template is now the default one.
The MultiPaintlet for sequences now lets the base paintlets set the correct stroke themselves.
adams-imaging: The MultiImageOverlay now allows selecting the base overlays again.
adams-spectral-2dim-handheld: The ScioLabExportSpectrumReader now handles the updated column names in the CSV export files (e.g., "spectrum_0 + 740" instead of "spectrum_740.0").
adams-webservice: The XMLLoggingInInterceptor now handles multi-part messages when pretty-printing the message itself.

Changes

Added a file change monitor to the external actors (ExternalStandalone/Source/Transformer/Sink), reloading the external flow at the next execution if the monitor triggered. The first execution of the actor is used for initializing the monitor.
Added on-the-fly flag for external actors using flows that get generated at runtime (e.g., by a script).
Added support for BUSINESSDAY in date/time expressions (BaseDate, BaseDateTime, BaseDateTimeMsec) and in ExtractDateTimeField conversion.
Most interactive actors now allow the selection of a callable actor to use as parent component (e.g., for relative location); it is also possible to use the enclosing window (frame/dialog) as parent component rather than the callable actor (e.g., when the actor is inside a GridView/TabView).
adams-imaging:
- The Image processor visualization now applies the same flow to all the tabs. Added loading and saving of flow snippets. Zoom for all original/processed images can be selected from the menu.
adams-latex: The LatexSpreadSheetWriter can output the header now in bold face.
adams-rats:
- The FileLister rat input can output files as array now as well.
- The generator used by LabRat now returns an array of Rat actors.
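
The behavior of the file change monitor described above (first execution only initializes, later executions report a change) can be sketched in a few lines of Python. The class name LastModifiedMonitor and its method are hypothetical and only mimic the idea using the file's modification time; the actual ADAMS monitors are Java classes.

```python
import os
import tempfile
from typing import Optional


class LastModifiedMonitor:
    """Reports whether a file's modification time changed since the last check."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._last: Optional[int] = None

    def has_changed(self) -> bool:
        current = os.stat(self.path).st_mtime_ns
        if self._last is None:
            # first call: just remember the timestamp
            self._last = current
            return False
        changed = current != self._last
        self._last = current
        return changed


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
monitor = LastModifiedMonitor(path)
first = monitor.has_changed()    # initializes the monitor
info = os.stat(path)
# simulate an edit by moving the modification time forward by 5 seconds
os.utime(path, ns=(info.st_atime_ns, info.st_mtime_ns + 5_000_000_000))
second = monitor.has_changed()   # the change is detected
third = monitor.has_changed()    # nothing new since the last check
os.remove(path)
print(first, second, third)
```

A flow would reload the external file whenever the check returns true and otherwise keep using the version it already has in memory.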

Additions

Added simple, single file change monitors, accessible through the FileChanged transformer.
Added transformer for updating the 'last modified' timestamp of files: Touch
Added PromptUser actor template.
adams-spectral-2dim-core:
- added spectrum reader for MPS XRF files.
- simple spreadsheet writer that outputs spectra with the specified spreadsheet writer: SpreadSheetSpectrumWriter
- added several preview handlers for spectra
adams-rats: added generic scripting support for rat generators for LabRat, via Scripted generator.
adams-spreadsheet: added SpreadSheetSelectSubset interactive actor that allows the user to select a subset from a spreadsheet.

Move to GitHub

FracPete

2017-09-02 18:10

Finally came across a migration guide (subversion to git), one that actually worked:

https://git-scm.com/book/en/v2/Git-and-Other-Systems-Migrating-to-Git

Main issue with the migration was the sub-division of the repository into base/addons/incubator/etc. However, I decided to skip really old code and just start at a revision after the sub-division.

You can find the repositories now here:

Contributing should be much easier now, so go on, fork the repos! :-)

Updates 2017/09/01

FracPete

2017-09-01 17:29

Been working on an image processing project again, so more work done in related modules. Added some common statistics to the Weka module.

Fixes

Fixed Variables.extractNames(expr) (used by the CheckVariableUsage actor processor and the green tick in the Flow editor's toolbar): it no longer ends up in an endless loop if invalid variable names are used in a string expression.
The filter field of the BaseFileChooser now only works on files, not directories
The image overlays, left-click-listeners and selection listeners in the ImageViewer sink now get cleared whenever a token arrives (otherwise get multiplied in case of Inspect control actor)
The PromptUser boolean flow condition now expands variables in the message string as well.
Removed the cleanUp call in the doExecute method of the AbstractFilter transformer, to avoid problems with trainable filters.
Improved unique name generation in Flow editor: when copying Actor (2) it will now generate Actor (3) instead of Actor (2) (2).
adams-ml: The ConfusionMatrix transformer now always generates a square matrix, taking all the labels (actual/predicted) into account.
adams-weka: The Weka package manager is working again, after backporting some changes to the modified Weka 3.9.0 version that is currently in use.
adams-imaging: MergeObjectLocations now correctly performs a merge if there aren't any objects in the current report.
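
The improved unique name generation mentioned above (copying Actor (2) yields Actor (3) rather than Actor (2) (2)) can be sketched as follows. This is a guess at the general idea in Python, with a hypothetical function name; it is not the code used by the Flow editor.

```python
import re
from typing import Iterable

# matches names that already end in " (number)"
_SUFFIX = re.compile(r"^(.*) \((\d+)\)$")


def unique_name(name: str, existing: Iterable[str]) -> str:
    """Return a name that does not clash with any of the existing ones."""
    taken = set(existing)
    if name not in taken:
        return name
    match = _SUFFIX.match(name)
    # reuse the base name and continue counting instead of stacking suffixes
    base = match.group(1) if match else name
    counter = int(match.group(2)) + 1 if match else 2
    while f"{base} ({counter})" in taken:
        counter += 1
    return f"{base} ({counter})"


print(unique_name("Actor (2)", ["Actor", "Actor (2)"]))
```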

Changes

Updated jclasslocator to make ADAMS work with Java 9 (for automatic class discovery).
Updated jeneric-cmdline.
adams-weka:
- The Predictions table for classification and the Cluster assignments table for clustering in the Weka Investigator are now searchable.
- The Preprocess tab in the Weka Investigator now automatically selects the last opened/filtered dataset.
- The WekaFilter transformer now stores the input data in the WekaFilterContainer (when outputting containers) as well.
- WekaCrossValidationEvaluator, WekaTrainTestSetEvaluator and WekaTestSetEvaluator now store the test data in the container as well (if possible).
- WekaPredictionsToInstances and WekaPredictionsToSpreadSheet can output the test data alongside the measures now (if possible).
adams-cntk: now uses CNTK 2.1
adams-dl4j: upgraded deeplearning4j to 0.9.1
adams-imaging: The ImageReader transformer now can load the meta-data directly (optional).

Additions

Added support to the adams.flow.FlowRunner class for installing custom JVM shutdown hooks, like executing a remote command.
adams-imaging: added channels splitters for HSV, YUV and YIQ color models, acting as BufferedImageTransformer plugins: SplitChannelsHSV, SplitChannelsYUV, SplitChannelsYIQ.
adams-weka:
- Added actor for nearest neighbor search: WekaNearestNeighborSearch.
- Added SDR (Standard Deviation of Residuals) statistic to evaluation output.
- Added RPD (Ratio of Performance to Deviation) statistic to evaluation output.
- Added RowSum filter for replacing all attributes (except class) with sum of numeric attributes in a row.
adams-spectral-2dim-core:
- reader for Nicolet SPA spectral data files: SPASpectrumReader.
- trainable batch spectrum filter for multiplicative scatter correction (MSC): MultiplicativeScatterCorrection
- The ApplyMultiplicativeScatterCorrection amplitude transform scheme allows the application of MSC using slopes and intercepts stored in the report.
- added SegmentedDownSample Weka filter in conjunction with its SegmentedDownSampleNthPoints Hermione handler.
- added spectrum writer outputting spectra as images, using the amplitude to determine color of pixels in image: IntensityImageSpectrumWriter
adams-cntk: added CNTKModelGenerator source for outputting CNTK model blocks using the specified model generator.
adams-basic-app: stripped down version sporting CSV spreadsheet support and Groovy scripting.
adams-imaging:
- added CountObjectsInRegion transformer for counting objects in report that fall within the defined region, e.g., when processing annotated objects in images.
- added ObjectLocationsFromReport preview to the Preview browser, which displays an image with an object locations overlay, using locations obtained from a report (simple format) with the same name as the image.
- added ImageObjectFilter which utilizes the new object finder class hierarchy for filtering objects in the report attached to an image.
- added conversions for handling rectangles: StringToRectangle, RectangleToString, RectangleCenter
- added feature generator that uses an image transformer as filter before applying the base generator: FilteredBufferedImageFeatureGenerator
- added convenience actors for handling image objects: GetImageObjectIndices, ImageObjectInfo, RemoveImageObject.
- added image overlay for displaying object locations as circle/ellipse: ObjectCentersOverlayFromReport.
- added preview to Preview browser using ObjectCentersOverlayFromReport: ObjectCentersFromReport.
adams-imaging-boofcv:
- added feature generator that uses an image transformer as filter before applying the base generator: FilteredBufferedImageFeatureGenerator
adams-imaging-imagej:
- added feature generator that uses an image transformer as filter before applying the base generator: FilteredBufferedImageFeatureGenerator
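
For readers unfamiliar with the SDR and RPD statistics added to adams-weka above, here is a small Python sketch. It assumes one common definition (SDR as the sample standard deviation of the residuals, RPD as the standard deviation of the actual values divided by SDR); the exact formulas used in the evaluation output may differ, and the function names and data are made up.

```python
import math
import statistics
from typing import Sequence


def sdr(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Sample standard deviation of the residuals (actual - predicted)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return statistics.stdev(residuals)


def rpd(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Ratio of performance to deviation, assumed here as SD(actual) / SDR."""
    return statistics.stdev(actual) / sdr(actual, predicted)


# Made-up example data.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.0]
print(sdr(actual, predicted), rpd(actual, predicted))
```

A higher RPD means the spread of the reference values is large compared to the prediction error, i.e., the model is more useful.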

Updates 2017/07/21

FracPete

2017-07-21 17:15

The new semester started last week, so I was busy with my students. Development has mainly happened around deeplearning4j and prediction support for Microsoft's deeplearning library CNTK.

Fixes

LoadBalancer: fixed losing of outer variables; uses a Flow control actor now internally for better encapsulation.

Changes

Added support for outputting relative paths with the FileSystemSearch source: LocalDirectorySearch, LocalDirectorySearchWithComparator, LocalDirectorySearchWithCustomSort, LocalFileSearch
Panels managed by the DisplayPanelManager now get re-used properly via their unique ID (e.g., when using a variable), not just when mergeable. This allows out-of-order updates of sequence plots now.
The Min and Max transformers can return 1-based indices now.
Added support for ADAMS_LIBRARY_PATH environment variable to adams.core.management.Launcher: its content gets supplied to the JVM via -Djava.library.path (used for native libraries like CNTK, MKL).
adams-dl4j:
- The DL4JTrainModel transformer can now test the model on a test set (split off the training data) and output the best model found so far, with associated statistic(s).
- Added support for stopping criteria to DL4JTrainModel, rather than just having a fixed number of epochs.
adams-weka: The WekaFilter transformer can make use of storage and source actor now for obtaining the actual filter to use, not just serialized file or the filter specification.
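
The ADAMS_LIBRARY_PATH handling described above can be pictured as follows: if the environment variable is set, its content ends up as a -Djava.library.path argument for the JVM. The Python sketch below only assembles such a command line. The function name and the main class example.Main are placeholders, not what adams.core.management.Launcher does internally.

```python
import os
from typing import Dict, List


def build_jvm_command(main_class: str, env: Dict[str, str]) -> List[str]:
    """Assemble a java command line, passing on ADAMS_LIBRARY_PATH if present."""
    command = ["java"]
    library_path = env.get("ADAMS_LIBRARY_PATH", "").strip()
    if library_path:
        # native libraries (e.g., CNTK, MKL) are located via java.library.path
        command.append("-Djava.library.path=" + library_path)
    command.append(main_class)
    return command


# "example.Main" is a hypothetical main class used for illustration only.
print(build_jvm_command("example.Main", dict(os.environ)))
```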

Additions

Added the ArrayNormalize array statistic, which normalizes an array to sum up to 1.0.
adams-cntk:
- added support for applying CNTK models: CNTKModelApplier.
- added spreadsheet writer for CNTK text file format: CNTKSpreadSheetWriter
- added image feature generator: DefaultCNTK
adams-cntk-weka: Added pseudo-classifier that uses a pre-built model: functions.CNTKPrebuiltModel
adams-dl4j: Added transformer for randomizing dataset: DL4JRandomizeDataset
adams-imaging:
- Added ScaleReportObjects transformer for scaling objects defined in reports.
- Added example flow for training an OpenCV Haar cascade from annotated images: adams-imaging-opencv_train_haar.flow
adams-imaging-openimaj: added generic object detector class hierarchy, to be used by adams.flow.transformer.locateobjects.OpenIMAJObjectDetector
adams-spreadsheet: added class hierarchy for processors that work on the selected rows in a spreadsheet table, e.g., copying files using the filename from the specified column. Functionality available through SpreadSheetDisplay sink.
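
What the ArrayNormalize statistic mentioned above computes can be shown in a few lines of Python: every value gets divided by the sum of the array. The function name is made up, and raising an error for an array that sums to 0 is an assumption of this sketch, not documented behavior.

```python
from typing import List, Sequence


def normalize_to_unit_sum(values: Sequence[float]) -> List[float]:
    """Scale the values so that they sum up to 1.0."""
    total = sum(values)
    if total == 0:
        # assumption: there is no sensible result in this case
        raise ValueError("cannot normalize an array that sums to 0")
    return [v / total for v in values]


print(normalize_to_unit_sum([1, 1, 2]))
```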

Updates 2017/06/30

FracPete

2017-06-30 17:37

A lot of effort has gone into deeplearning4j over the last few weeks: upgraded to the latest version, support for random network generation (who doesn't want to avoid hyper parameter fiddling???) and instructions for using Intel's MKL libraries for speeding up model building.

Filters can be serialized from the Weka Investigator now as well and re-used with the filter called SerializedFilter.

Fixes

The ForLoop source now skips the consistency tests if a variable is attached to at least one of the properties: lower/upper/step
Downgraded the MySQL driver to 5.1.42, after receiving java.sql.SQLNonTransientConnectionException: CLIENT_PLUGIN_AUTH is required exceptions when using 6.0.6 of the JDBC driver.
removed double quotes from default executable of JDeps and JMap control actors.
adams-dl4j:
- The DL4JModelToJson and DL4JModelToYaml conversions now distinguish between Model and MultiLayerNetwork objects, to retrieve the correct configurations to convert.
- The DL4JModelWriter sink ensures now that MultiLayerNetwork has been initialized to avoid errors.
adams-event: fixed forcing of variables in Cron standalone actor.
adams-net:
- JavaMailSendEmail - using the javax.activation.DataHandler class with a URL didn't close the stream of attachments, resulting in locked files on Windows.
- re-using existing sessions now: FTPConnection, SMBConnection, SSHConnection

Changes

The Exec source can output stdout and stderr at the same time, ignore process errors and supports a working directory for the process.
Boolean/Mathematical/StringExpression: added "str(...)" method for converting objects/numbers into strings: str(expr) = any object's toString() method; str(expr,numdec) = any number is output with at most numdec decimals after the decimal point (trailing 0s get chopped off); str(expr,decformat) = applies the format to the number using java.text.DecimalFormat
SelectFile and SelectDirectory now support output with forward slashes.
adams-dl4j:
- Upgraded deeplearning4j to 0.8.0
- DL4JTrainModel now has a monitor variable for resetting the model, allowing for training sequentially on multiple datasets.
- Added instructions for using Intel MKL libraries to speed up processing.
- Moved the InMemoryStatsListenerConfigurator to the new adams-dl4j-insight module.
adams-weka:
- The Weka Investigator now allows filters to be serialized in the pre-process panel.
- The PrincipalComponentsJ filter now has the option -simple-attribute-names, which generates attributes like PCA_1...n instead of compiling them from the other attribute names.
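
The str(expr,numdec) variant described above (at most numdec decimals, trailing zeros chopped off) behaves roughly like the following Python sketch. The function name format_number is made up, and rounding details may differ from what the Java implementation does.

```python
def format_number(value: float, numdec: int) -> str:
    """Format with at most numdec decimals, dropping trailing zeros."""
    text = f"{value:.{numdec}f}"
    if "." in text:
        # only strip zeros after the decimal point, never from integers like 100
        text = text.rstrip("0").rstrip(".")
    return text


print(format_number(3.14159, 2))
print(format_number(2.5, 3))
print(format_number(4.0, 2))
```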

Additions

Added simple GUI tool for performing XSLT (XML, XSL and Output panel), available from the main menu under Maintenance.
adams-dl4j:
- Added the CallableActorScoreListenerConfigurator iteration listener, which forwards the iteration count/score pair to a callable actor (e.g., for plotting).
- Added conversion for turning DL4J datasets into spreadsheets: DL4JDataSetToSpreadSheet
- Added conversion for converting spreadsheets into DL4J DataSets: SpreadSheetToDL4JDataSet
- Added fake configurator, as it only retrieves model from storage: FromStorage
- DL4JModelGenerator source generates model(s) using the specified generator scheme.
- Added previews in the Preview browser for DL4J models in JSON and YAML
- Conversions for recreating models from JSON and YAML: DL4JJsonToModel and DL4JYamlToModel
- Conversion for creating actual model from configurator: DL4JConfiguratorToModel
New module: adams-dl4j-insight for providing insight in model building, which is not necessary when deploying models (avoiding bloat).
adams-dl4j-weka: added conversions WekaInstancesToDL4JDataSet and WekaInstanceToDL4JINDArray, using Mark Hall's code from the Weka package for DL4J.
adams-imaging: added the RandomBoundingBox left-click processor.
adams-spreadsheet:
- added simple spreadsheet filtering framework via the SpreadSheetFilter transformer and the filter class hierarchy it uses. Initial filters: Normalize, Standardize.
- The SpreadSheetInsertColumnPosition conversion inserts the column position into a string (e.g., BG), replacing the specified placeholder.
adams-weka:
- The WekaFilter spreadsheet filter allows applying any Weka filter to a spreadsheet.
- weka.filters.SerializedFilter is a meta-filter that applies a serialized, trained filter to the data (no further training required).
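
The column position mentioned for SpreadSheetInsertColumnPosition above (e.g., BG) is the usual spreadsheet letter notation. A Python sketch of turning a 1-based column index into such a string and inserting it into a template follows. The function names and the {COL} placeholder are hypothetical and chosen just for this example.

```python
def column_position(index: int) -> str:
    """Convert a 1-based column index into spreadsheet letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("index must be 1-based")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def insert_column_position(template: str, index: int, placeholder: str = "{COL}") -> str:
    """Replace the placeholder with the column letters."""
    return template.replace(placeholder, column_position(index))


print(insert_column_position("=SUM({COL}2:{COL}10)", 59))
```

This kind of replacement is handy when generating formulas for columns whose position is only known at runtime.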

Updates 2017/06/12

FracPete

2017-06-12 13:51

A lot of work has been done on better integration of the deeplearning4j framework. Support for rsync within the flow was added as well, e.g., for syncing local files with ones on a cloud server.

Fixes

adams-dl4j:
- DL4JCrossValidationEvaluator and DL4JTrainTestSetEvaluator now store the model rather than the configurator in the container that they are forwarding.
- DL4JDatasetIterator now fits preprocessor first if an instance of DataNormalization.
adams-weka: Changing the model file in the Investigator's classify/cluster tab now correctly resets any previously loaded model.

Changes

Now using the processoutput4j library (https://github.com/fracpete/processoutput4j) for capturing the output from processes launched from within Java.
The following sources now have an additional conversion option to directly convert their output to a different type: Variable, VariablesArray, CombineVariables, StorageValue, StorageValuesArray, CombineStorage, StringConstants.
The IncVariable and IncStorageValue transformers can output the incremented value now instead of forwarding the input token.
MessageDigest can operate on arrays now as well, computing a single digest over all of them.
Upgraded lanterna to 3.0.0-rc1 (used for terminal-based user interfaces).
Added option to the CsvSpreadSheetReader to drop rows with too few/many cells: -skip-differing-rows
adams-dl4j:
- added support for mini-batches to DL4JCrossValidationEvaluator, DL4JTrainModel and DL4JTrainTestSetEvaluator.
- added support for listeners when training a model
- added deeplearning4j-ui_2.10 dependency, to monitor training progress using InMemoryStatsListenerConfigurator
- DL4JTrainModel/DL4JTrainTestEvaluation/DL4JCrossValidationEvaluator now store the final epoch number in the container (model/evaluation).
- DL4JTrainModel allows incremental training now, outputting the model every X epochs (output interval).
adams-rats: added accepted/generated types of Rat input/output to additional information output displayed in the help screen.
adams-spreadsheet:
- SpreadSheetSubset now supports R-like matrix subset expressions (e.g., ,3:9) instead of specifying row and column ranges.
- SpreadSheetSplitColumn now uses the header for the generated columns if it can be split into the same number of elements.
adams-weka:
- The SpreadSheetToWekaInstances conversion can enforce STRING attributes now by using -1 as maxLabels.
- The PartitionedMultiFilter2 (and therefore MetaPartitionedMultiFilter) now filters the data only once during the first batch, resulting in speed improvements.
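
Computing a single digest over several arrays, as mentioned for MessageDigest above, usually means feeding all of them into the same digest object one after the other. The Python sketch below shows that idea with hashlib. Whether ADAMS processes the arrays in exactly this way is an assumption, and the function name is made up.

```python
import hashlib
from typing import Iterable


def digest_all(chunks: Iterable[bytes], algorithm: str = "sha256") -> str:
    """Compute one digest over all chunks, in the order given."""
    digest = hashlib.new(algorithm)
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


print(digest_all([b"ab", b"c"]))
```

Updating the digest incrementally gives the same result as hashing the concatenated data, without having to build one big array first.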

Additions

With the adams.logging.Logging console application, you can connect to an ADAMS instance that is, for instance, running as a daemon/service, listening to its logging output. The logging.sh/logging.bat scripts start the listening application (just outputs the logging to stdout).
New boolean conditions for checking boolean flags: StorageFlagSet and VariableFlagSet.
Convenience transformer for setting a boolean flag in storage: SetStorageFlag.
Added new menu item Full expansion to the Flow editor for creating a fully expanded flow (i.e., pulls in all external actors).
Added FileTailer transformer for monitoring text files à la tail -f on Unix systems.
Added most of the functionality of the Remote Control Center GUI to the terminal-based interface (can be started up with adams.terminal.Main).
Added RightPad conversion for padding strings on the right-hand side.
New module adams-rsync for rsync support:
- RSync - offers all (!) rsync options
- Rsync4jRsyncBinary - outputs the rsync binary used by rsync4j library
- Rsync4jSshBinary - outputs the ssh binary used by rsync4j library
- SimpleRSync - commonly used rsync options
adams-dl4j:
- added NormalizerMinMaxScaler and NormalizerStandardize dataset pre-processors for scaling numeric attributes.
- added SimpleRegressionMultiLayerNetwork as an example for performing regression.
adams-twitter:
- upgraded twitter4j to 4.0.6
- Added TwitterUser transformer for retrieving information about a user.
adams-visualstats: added MOA-based CUSUM (cumulative sum) and Page-Hinkley test control charts.
adams-weka: added Kennard-Stone filter.
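
For those who haven't come across Kennard-Stone before: it selects a subset of samples that covers the data space evenly. Below is a minimal Python sketch of the classic algorithm (start with the two most distant points, then keep adding the point that is farthest from the already selected ones). It is a plain textbook version using Euclidean distance and requires Python 3.8+ for math.dist; it says nothing about the options or implementation of the Weka filter itself.

```python
import math
from itertools import combinations
from typing import List, Sequence


def kennard_stone(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of k points selected by the Kennard-Stone algorithm."""
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of points")
    # start with the pair of points that are farthest apart
    first, second = max(
        combinations(range(n), 2),
        key=lambda pair: math.dist(points[pair[0]], points[pair[1]]))
    selected = [first, second]
    remaining = set(range(n)) - set(selected)
    while len(selected) < k:
        # pick the point whose nearest selected neighbor is farthest away
        best = max(
            sorted(remaining),
            key=lambda i: min(math.dist(points[i], points[j]) for j in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


data = [[0.0], [1.0], [2.0], [10.0]]
print(kennard_stone(data, 3))
```

The pairwise search makes this quadratic in the number of points, which is fine for a sketch but not for large datasets.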

Updates 2017/05/12

FracPete

2017-05-12 17:02

Being busy with commercial ADAMS projects still results in an ample number of improvements to the base ADAMS system. The last few weeks were no exception.

Fixes

The table model for spreadsheets now displays NaN, +/-Infinity as strings.
Spreadsheet writers that can use formatting now use 'NaN' and '+/-Infinity' strings for these numbers.
Fixed the forceVariables method for Tee/Trigger/LoadBalancer/WhileLoop and derived actors: internally used Sequence actor gets updated correctly now.
adams-net: FTPSend and SFTPSend now forward the successful filenames as the documentation says.
adams-dl4j: fixed handling of regression problems.

Changes

In order to make actor names unique, they now get " (x)" appended, with x being a number starting from 2.
The SetVariable standalone/transformer can interpret the variable value now as boolean, string or mathematical expression, making it easier to compute new values.
Added ability to use custom dirs/jars for JDeps control actor instead of the application's classpath.
The CallableActorScreenshot control can forward screenshot as BufferedImageContainer now as well, not just storing it in a file.
The actorFile property can now contain programmatically set variables like flow_dir, enabling actors derived from the include-external actor to make use of a variable as well (relative to the main flow). Instead of attaching a variable to the property, you have to use mixed notation: @{flow_dir}/some.flow.
Added equal frequency calculation to the ArrayHistogram statistic.
The RandomNumberGenerator source can output arrays now.
With the ArrayHistogramRanges transformer it is possible to output the interval ranges that the ArrayHistogram statistic generates (easier than iterating through the header names of the generated spreadsheet).
Added support for restorable actors, ones that can write/read their state to/from disk during execution; currently supported by: EnterValue, EnterManyValues, SelectDirectory, SelectFile.
adams-spreadsheet:
- SpreadSheetStatistic now supports column names in locations.
- SpreadSheetExtractArray can output strings now as well, instead of just the native cell object type.
adams-weka:
- WekaInstancesStatistic now supports attribute names in locations.
- The WekaGeneticAlgorithm transformer can be initialized from a WekaGeneticAlgorithmInitializationContainer container now, containing algorithm and training data.
adams-spectral-2dim got renamed to adams-spectral-2dim-core.
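
The mixed notation mentioned above (@{flow_dir}/some.flow) simply embeds a variable reference in a string. How such a string could be expanded is sketched below in Python. The expand function, the example directory and the choice to leave unknown variables untouched are assumptions of this sketch, not the actual ADAMS variable handling.

```python
import re
from typing import Dict

# matches @{name} references
_VARIABLE = re.compile(r"@\{([^}]+)\}")


def expand(text: str, variables: Dict[str, str]) -> str:
    """Replace @{name} references with their values, leaving unknown ones as-is."""
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        return variables.get(name, match.group(0))
    return _VARIABLE.sub(replace, text)


# "/home/user/flows" is a made-up directory for illustration.
print(expand("@{flow_dir}/some.flow", {"flow_dir": "/home/user/flows"}))
```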

Additions

Added HasClass boolean condition that checks whether the specified class is available on the classpath.
Added StringExpression source and transformer for evaluating string processing expressions, like left(upper("Hello World!"), 5).
Added meta-marker paintlet ByNameMarkerPaintlet that matches the name of the sequence against the supplied regular expression to determine whether to paint the markers or not.
adams-pdf: added MetaHeadline PDF proclet to insert headline and then apply a base-proclet.
adams-spreadsheet: the SpreadSheetHistogramRanges transformer is the equivalent of ArrayHistogramRanges but for SpreadSheet objects.
adams-weka:
- The WekaInstancesHistogramRanges transformer is the equivalent of ArrayHistogramRanges but for Instances objects.
- Added support for using test data to the WekaGeneticAlgorithm transformer, but only Hermione takes advantage of it.
- Added convenience transformer WekaGeneticAlgorithmInitializer to generate a WekaGeneticAlgorithmInitializationContainer container for priming a genetic algorithm.
Added some modules to the adams-spectral-base framework:
- adams-spectral-2dim-handheld contains support for some handheld NIR scanners, like the SCiO (https://www.consumerphysics.com/myscio/scio/).
- adams-spectral-2dim-webservice adds webservice capability
- adams-spectral-2dim-rats adds RATS support
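
To show what the StringExpression example left(upper("Hello World!"), 5) from the list above evaluates to, here are Python stand-ins for the two functions. They are only meant to mirror the semantics suggested by the names (upper-casing, then taking the leftmost characters) and are not the ADAMS implementation.

```python
def upper(text: str) -> str:
    """Convert the string to upper case."""
    return text.upper()


def left(text: str, count: int) -> str:
    """Return the leftmost count characters."""
    return text[:count]


result = left(upper("Hello World!"), 5)
print(result)  # HELLO
```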

Have a great weekend!