Updates 2020/03/29

The biggest change was moving the debug view of the Flow editor into the side panel, making it less confusing when debugging multiple flows. The Preview browser can now reuse previews (and therefore keep the setup in the view), making flicking through images much more responsive. Apart from that, there are a bunch of bug fixes and small improvements.

Fixes

  • The Preview browser now better handles errors that occur while generating previews.

  • File choosers with a filter text field now only show items that match the filter and the selected file format.

  • The Debug view in the Flow editor has been integrated into the side panel of the editor window rather than being displayed as a separate frame (this also applies to all the flow execution listeners). This makes it easier to debug several flows in parallel, as the view changes based on which flow has been selected in the editor. It is still possible to detach the view and display it in a separate frame via the right-click popup menu on the tab title.

  • adams-moa: compiled a custom (unofficial) version of MOA to bundle with ADAMS, fixing a compatibility issue between the jclasslocator library versions used by MOA (0.0.12) and ADAMS (0.0.15).

  • adams-weka and adams-weka-lts:

    • The WekaSpreadSheetToPredictions transformer now also handles spreadsheets with string labels in the actual/predicted columns, not just ones containing numeric label indices.

    • The ConfusionMatrix output generators in the Classify tab of the Weka Investigator now maintain the order of the class labels from the dataset.

    • When deserializing a session, the data tables (eg in the Preprocessing tab) now get resized using optimal column widths, instead of using a width of only a few pixels.

    • Setting/updating/removing column filters on the Data tab in the Weka Investigator now gets applied to the correct column.

    • The Weka Multi-Experimenter no longer generates a NullPointerException.

Changes

  • The generic search panel now allows toggling between regular expression search and simple sub-string search via the popup menu of its text field.

  • Containers now make use of the more intelligent plain text rendering available through the adams.data.textrenderer.TextRenderer class hierarchy. Speeds up debugging of large objects, like spreadsheets.

  • Added support for caching previews and reusing them in the Preview browser. However, reuse is currently only supported by image preview handlers.

  • adams-imaging:

    • The ObjectLocationsSpreadSheetReader report reader can add polygon coordinates to the report as well (as long as X and Y coordinates are present in two separate columns).

    • The ByMetaDataNumericValue object filter now allows using x/y/width/height as key as well.

    • The left-click processors in the ImageViewer can now specify what additional keys have to be pressed (shift/alt/ctrl/meta).

    • The following content handlers in the Preview browser now support an optional alternative location for meta-data: ObjectCentersFromReport, ObjectLocationsFromReport, ObjectLocationsFromSpreadSheet.

  • adams-ml: The ConfusionMatrix transformer now offers the classLabels option for enforcing the order of labels in the matrix.

  • adams-weka and adams-weka-lts:

    • The WekaSpreadSheetToPredictions transformer now allows sorting of labels with a specified comparator.

    • The SimpleArffLoader now supports the sparse ARFF format as well.

  • adams-weka:

    • upgraded multisearch-weka-package to 2020.2.17

    • upgraded Weka to 3.9.4-fork-0.0.1

  • adams-weka-lts:

    • upgraded Weka to 3.9.0-fork-0.0.9

  • adams-spectral-2dim-core: The WekaFilter post-processor can now turn off wrapping in a SpectrumFilter (on by default).
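The difference between the two search modes of the generic search panel mentioned above can be illustrated with a minimal sketch. The class and method names are hypothetical and not taken from ADAMS; whether the panel matches case-sensitively or uses partial regular expression matches is an assumption here.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of ADAMS) contrasting the two search modes. */
class SearchModeSketch {

  /** Returns true if the text matches the query in the chosen mode. */
  static boolean matches(String text, String query, boolean regExp) {
    if (regExp) {
      // regular expression mode: the query is compiled as a pattern
      // (assumption: a partial match anywhere in the text is sufficient)
      return Pattern.compile(query).matcher(text).find();
    }
    // simple mode: the query is taken literally as a sub-string
    return text.contains(query);
  }

  public static void main(String[] args) {
    String item = "adams.data.textrenderer.TextRenderer";
    System.out.println(matches(item, "text.*Renderer", true));
    System.out.println(matches(item, "text.*Renderer", false));
    System.out.println(matches(item, ".TextRenderer", false));
  }
}
```

In simple mode, characters like "." or "*" have no special meaning, which is handy when searching for class names.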

Additions

  • adams-imaging:

    • Added the MergeGrid multi-image operation for combining images generated by the SubImages/Grid operation. Combines the annotations as well, but does not remove duplicate or overlapping objects.

    • Added new left-click processors for the ImageViewer: ViewObjects, DeleteObjects, AddMetaData.

    • Added report writer for object locations: ObjectLocationsSpreadSheetWriter

    • The object filter AttachOverlappingMetaData allows the copying of meta-data between overlapping objects, eg reference data. The meta-data is copied from a report in storage.

  • adams-weka and adams-weka-lts: The NominalToNumeric filter allows you to either use the internal representation of the labels as numeric values or extract the numeric part from the label via regular expression replacement.
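To give an idea of the two strategies of the NominalToNumeric filter, here is a minimal sketch in plain Java. It does not use the filter itself: the class, the method names and the find/replace expressions are made up for illustration, and the internal representation is assumed to be the index of the label in the attribute's label list.

```java
import java.util.List;

/** Hypothetical sketch (not the actual filter) of the two conversion strategies. */
class NominalToNumericSketch {

  /** Uses the position of the label in the label list as numeric value. */
  static double byIndex(List<String> labels, String label) {
    int index = labels.indexOf(label);
    if (index < 0)
      throw new IllegalArgumentException("Unknown label: " + label);
    return index;
  }

  /** Extracts the numeric part of the label via regular expression replacement. */
  static double byRegExp(String label, String find, String replace) {
    return Double.parseDouble(label.replaceAll(find, replace));
  }

  public static void main(String[] args) {
    List<String> labels = List.of("low", "mid", "high");
    System.out.println(byIndex(labels, "mid"));
    // keeps only the digits following the underscore
    System.out.println(byRegExp("class_12", ".*_(\\d+)", "$1"));
  }
}
```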

ADAMS and Docker

For a while now, I've been working on getting ADAMS running on Docker, with some of the snapshots now being deployed weekly to the public Docker hub. However, these images may not be suitable (or simply too large) and I therefore launched another little tool today called adamsflow2docker. This tool streamlines the process of deploying a worker flow (e.g., for processing data) within a Docker image.

Links to the adamsflow2docker tool and more detailed instructions on running ADAMS from within Docker are available from here.

Custom ADAMS applications

For a long time, I have been mulling over how to make it easier to build ADAMS applications, most importantly without having to write Maven configuration files and/or requiring compilation. The initial approach (on the ADAMS homepage) was to let the user select a number of modules and then generate a download with a complete Maven environment. However, this required the user to have not only Java installed, but also Maven. Not something your average data scientist would have readily available.

Enter bootstrapp, my just released Java library for bootstrapping Maven applications. With this library, you only have to supply the Maven artifacts that should make up the application. It will also generate simple start-up scripts for launching the application (Linux/Mac/Windows).

Using this generic approach under the hood, I put together a similar library for ADAMS, instant-adams. This makes it easy to put together your own, custom applications now, with only those modules and libraries that you really need.

The following command line generates an application using the ADAMS modules adams-weka, adams-groovy and adams-excel, plus the kfGroovy Weka package:

java -jar instant-adams-0.0.1-spring-boot.jar \
  -M adams-weka,adams-groovy,adams-excel \
  -V 20.2.0-SNAPSHOT \
  -d nz.ac.waikato.cms.weka:kfGroovy:1.0.12 \
  -o ./out \
  -v -Xmx1g

In terms of ADAMS version, you can either use the ones from the daily builds (Y.M.x-SNAPSHOT, Y=2-digit year, M=month, x=patch level, usually 0) or ones from releases (e.g., 20.1.1). Be aware that only the releases are kept indefinitely; daily builds are only available for a limited number of days (or number of builds).

The above command line generates scripts which start up ADAMS with a heap size of 1GB (-v -Xmx1g).
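If you want to compute the version string for the -V option rather than typing it, the naming scheme described above can be sketched as follows. This is just an illustration: the class is hypothetical, and it assumes that the month is not zero-padded (as in 20.2.0-SNAPSHOT and 20.1.1).

```java
import java.time.LocalDate;
import java.util.Locale;

/** Hypothetical helper (not part of instant-adams) for assembling version strings. */
class AdamsVersionSketch {

  /** Release version: 2-digit year, month, patch level. */
  static String release(LocalDate date, int patch) {
    return String.format(Locale.ROOT, "%02d.%d.%d",
        date.getYear() % 100, date.getMonthValue(), patch);
  }

  /** Daily build version: same scheme with the -SNAPSHOT suffix. */
  static String snapshot(LocalDate date, int patch) {
    return release(date, patch) + "-SNAPSHOT";
  }

  public static void main(String[] args) {
    // patch level 0 is the usual case for daily builds
    System.out.println(snapshot(LocalDate.now(), 0));
  }
}
```

Keep in mind that a snapshot computed from the current date may not exist (yet) on the server, so this is only a starting point.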

Please note: since all ADAMS artifacts are managed by one of our in-house servers and are not available from Maven Central, the library downloads a custom Maven configuration file behind the scenes to download these artifacts through our server instead. If necessary, this can be changed by pointing to a custom Maven user settings file using the -u/--maven_user_settings option.

Feedback welcome!

Debug view in side panel

The Flow editor now displays the debug view as a side panel rather than in a separate frame, which makes it less confusing when debugging several flows at the same time. ;-)

../../galleries/screenshots/floweditor-debugging1_inspectionpanel.thumbnail.png

Updates 2020/01/28

Minor update post before the 20.1.1 bugfix release (which includes these changes)...

Fixes

  • The enum editor in the GenericObjectEditor now allows entering multiple values as text.

  • adams-addons-all: The static class discovery now works properly.

  • adams-weka and adams-weka-lts: Re-enabled the residual plot plugins in the Weka Investigator.

  • adams-imaging: The SelectObjects image plugin (used for annotating images) now rejects selected objects that have either width or height equal to zero (which can occur easily when using a stylus).

Changes

  • Object editor: added a button with a drop-down menu to the PropertySheet panel for quick access to variable actions, rather than having to right-click on the label; the button also indicates whether a variable is attached by changing its color.

  • adams-tensorflow: switched from wai.tfrecords to wai.annotations library for generating TFRecords.

  • adams-weka: upgraded Weka to 3.9.4-fork-0.0.1

  • adams-weka and adams-weka-lts: The split action in the Weka Investigator has been generalized and now allows the use of any split generator available.

Additions

  • adams-spectral-2dim-core: The ArrayToSpectrum conversion converts a float array of amplitudes back into a Spectrum object.

  • adams-weka and adams-weka-lts: Added the Discard predictions menu item to the result history of the Classify tab in the Weka Investigator.

Updates 2019/12/20

Last update before the Xmas break. Unfortunately, other projects got in the way and I didn't quite get around to making a release. Oh well, this will happen in the new year. :-)

Fixes

  • adams-imaging: The CallableActorScreenshot control actor now works properly again when just outputting a BufferedImageContainer.

  • adams-rats: Rat can have variables again that get set at runtime.

  • adams-spreadsheet: The SpreadSheetReorderColumns transformer now leaves the cell types intact.

  • adams-spectral-2dim-core: the exportVisibleSpectra method of the SpectrumPanel is now more robust (accessible through the Export visible spectra popup menu item).

  • adams-weka and adams-weka-lts:

    • static class listing now works with the Weka class hierarchies as well.

    • the Copy cell of the InstancesTable (as used in the Data tab of the Weka Investigator) now copies the correct content.

Changes

  • Renderers in the object tree view of the debug panel in the Flow editor now cache their view whenever possible, to keep views more consistent between tokens.

  • adams-spreadsheet: The renderer for spreadsheets in the Flow debugger now only displays a maximum of 100 rows by default, with a button to display the rest if there are more. This prevents very large spreadsheets from bringing the system to its knees.

  • adams-weka and adams-weka-lts:

    • The renderer for Instances in the Flow debugger now only displays a maximum of 100 rows by default, with a button to display the rest if there are more. Speeds up rendering and with that debugging.

    • The random split action available from the data table of the Weka Investigator now remembers the last entered values and also allows the user to select the type of splitter to use.

Additions

  • Added a class hierarchy for text renderers for specific object types (adams.data.textrenderer.TextRenderer). These renderers can be used in the flow with the TextRenderer transformer. The tokens in the flow now use these to render their payloads as well, to avoid very large text outputs in the flow debugger.

  • adams-weka and adams-weka-lts:

    • added the WekaSplitGenerator for applying any available split generator, not limited to splitters for random or cross-validation splits.

    • added the MultiLevelSplitGenerator for splitting datasets on multiple string/nominal attributes in succession.

Updates 2019/12/09

The main thing to be aware of is that workflows are now a lot stricter about variables: actors now check before first execution if all variables attached to them are present. The SetVariable actors and any actor updating storage items check every time when executed whether the variables attached to the variable/storage name are present.

Fixes

  • adams-ml:

    • The SimpleArffSpreadSheetWriter now handles dates correctly. Long values get treated separately from doubles as well, to avoid loss of information (e.g., when loading Tweet IDs).

    • The SimpleArffSpreadSheetReader now tests for long values as well when reading numeric attributes, to avoid loss of information.

  • adams-weka and adams-weka-lts:

    • The SimpleArffLoader now handles quoted attribute names correctly, unquoting them properly.

Changes

  • In order to avoid strange behavior due to typos in variable names, the preExecute method of an actor now checks whether all variables used by it are valid (i.e., present). The check only gets executed when the isExecuted() method returns false (usually the first time the actor is being executed). Since this can affect a number of flows, you can turn on lenient checking by setting the environment variable INVALID_VARIABLES_LENIENT to true.

  • switched to 1.0.20 of debian-maven-plugin

  • switched to 0.1.2 of requests4j

  • The SelectArraySubset transformer now has buttons for selecting all items, no items or inverting the selection.

  • adams-applications: Dynamic class discovery has been turned off for applications. Instead, these applications use class/package hierarchies generated at build time. You can turn on dynamic class discovery again easily by adding an empty ClassLister.class file in the classpath of the application, e.g., in the same directory that contains the bin sub-directory.

  • Added checks to SetVariable actors and relevant classes implementing StorageUpdater (like SetStorageValue) that ensure that a variable attached to the variable/storage name option actually exists, to avoid accidentally storing values under the default name (avoids hard-to-track errors).

  • adams-weka and adams-weka-lts:

    • The Build model of the Classify and Cluster tab in the Weka Investigator now allows the data to be randomized beforehand.

    • The Train/test set, Train/test split, Train/validate/test set and Reevaluate model evaluation tasks in the Classify tab of the Weka Investigator now take advantage of models supporting batch prediction.

    • Added the "-id-test" option to the RemoveTestInstances Weka filter to allow differing indices between current dataset and test set (eg if the test set is just a list of IDs).
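The stricter variable checking described above, including the lenient mode, boils down to something like the following sketch. Only the INVALID_VARIABLES_LENIENT environment variable comes from ADAMS; the class, the method names and the way missing variables get reported are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (not the ADAMS implementation) of strict vs lenient checks. */
class VariableCheckSketch {

  /** Lenient mode is on if the environment variable is set to true. */
  static boolean isLenient(Map<String, String> env) {
    return Boolean.parseBoolean(env.get("INVALID_VARIABLES_LENIENT"));
  }

  /**
   * Returns the variables that are used but not defined.
   * In strict mode, any missing variable results in an exception instead.
   */
  static List<String> check(List<String> used, Set<String> defined, boolean lenient) {
    List<String> missing = new ArrayList<>();
    for (String name : used) {
      if (!defined.contains(name))
        missing.add(name);
    }
    if (!missing.isEmpty() && !lenient)
      throw new IllegalStateException("Invalid variable(s): " + missing);
    return missing;
  }

  public static void main(String[] args) {
    boolean lenient = isLenient(System.getenv());
    // "outfile" contains a typo-free name, "infle" does not exist
    System.out.println(check(List.of("outfile", "infle"), Set.of("outfile", "infile"), lenient));
  }
}
```

In strict mode, the typo surfaces right away as an error rather than as odd behavior further down the flow.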

Additions

  • added conversions for converting primitive arrays (eg float[]) to/from byte arrays (IEEE754): ByteArrayToPrimitiveArray and PrimitiveArrayToByteArray.

  • Added the adams-groovy-rest module for writing REST plugins in Groovy.

  • The actor processor ListActorUsage lists all occurrences of the specified actor class.

  • Added Actor locations to the Find usage submenu in the Flow editor tree popup, listing all occurrences of the currently selected actor class.

  • adams-imaging:

    • added reader for object locations stored in spreadsheets: ObjectLocationsSpreadSheetReader

  • adams-ml:

    • added GroupedTrainTestSplit, GroupedCrossValidation, TrainValidateTestSplit and GroupedTrainValidateTestSplit dataset preparation schemes for the PrepareFileBasedDataset transformer.

  • adams-spreadsheet:

    • Added dummy AllFinder for locating all columns and rows.

  • adams-spectral-2dim-core:

    • The Oscillating outlier detector can be used to detect spectra that look like an oscillating signal.

    • The SpectrumToArray conversion turns either the wave numbers or the amplitudes of the spectrum into a float array.

  • adams-weka and adams-weka-lts:

    • Added the LogClassRegressor meta-classifier, which only logs the class attribute, as opposed to the LogTargetRegressor which also logs any other numeric attribute.

    • Added dummy AllFinder for locating all columns and rows.

    • The WekaEnsembleGenerator allows the creation of ensembles in the flow: e.g., with the VotedModels generator, an array of Weka classifiers can be turned into a Vote meta-classifier, bypassing the training of the Vote classifier itself. The MultipleClassifiersCombinerModels generator allows you to use any classifier derived from MultipleClassifiersCombiner.
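For the curious, converting primitive arrays to and from byte arrays (as the ByteArrayToPrimitiveArray and PrimitiveArrayToByteArray conversions listed above do) can be sketched with java.nio.ByteBuffer. This is not the code of the conversions: the class is hypothetical, it only covers float[], and the byte order is left to the caller since I make no claim here about the default used by ADAMS.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch (not the ADAMS conversions) for float[] and IEEE754 bytes. */
class FloatBytesSketch {

  /** Writes each float as 4 bytes (IEEE754 single precision). */
  static byte[] toBytes(float[] values, ByteOrder order) {
    ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES).order(order);
    for (float value : values)
      buffer.putFloat(value);
    return buffer.array();
  }

  /** Reads groups of 4 bytes back into floats. */
  static float[] toFloats(byte[] bytes, ByteOrder order) {
    if (bytes.length % Float.BYTES != 0)
      throw new IllegalArgumentException("Length must be a multiple of " + Float.BYTES);
    ByteBuffer buffer = ByteBuffer.wrap(bytes).order(order);
    float[] result = new float[bytes.length / Float.BYTES];
    for (int i = 0; i < result.length; i++)
      result[i] = buffer.getFloat();
    return result;
  }

  public static void main(String[] args) {
    byte[] bytes = toBytes(new float[]{1.0f, -2.5f}, ByteOrder.BIG_ENDIAN);
    System.out.println(bytes.length);
    System.out.println(toFloats(bytes, ByteOrder.BIG_ENDIAN)[1]);
  }
}
```

Both sides have to agree on the byte order, otherwise the values come back garbled.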

Updates 2019/11/08

Lots of minor changes and fixes happened over the last few weeks. But the most notable one is the generation of Debian and Redhat installer packages. These make it easier to install ADAMS on servers or in Docker images. Also, with the adams-maven-plugin it is now possible to compile flows into Java code during the build process.

Fixes

  • When the adams-db module is present, databases other than MySQL now use auto-detection again for determining the backend (based on the JDBC URL). This avoids having to explicitly configure a DbBackend.props file.

  • The LocalScopeTransformer and LocalScopeTrigger actors now only clean up local variables and storage data structures if not using shared resources.

  • The forceVariables method in the AbstractActor (ancestor for pretty much all actors) no longer calls cleanUp of the current Variables instance: this caused havoc when dynamically instantiating sub-flows at runtime, linking them into the existing flow to have access to variables and storage, as they could remove the outer flow's variables altogether.

  • adams-net: The Base64ToString conversion now applies the selected decoding type.

  • adams-imaging:

    • The BinaryCrop cropping algorithm now works for objects of any shape, by finding the rectangle encompassing the center object.

    • The NegativeRegions transformer now sets the supplied object type.

    • The MinDimensions negative regions meta-algorithm no longer removes two objects at a time when logging is enabled.

  • adams-json: retrieving an array via $.expr no longer results in the elements being forwarded one-by-one, but as a JSONArray object (JSONArray also implements the List interface, which got interpreted incorrectly; $.expr can return more than one value).

  • adams-addons-all: now has a dependency on adams-tensorflow as well.
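Why the decoding type of the Base64ToString conversion (fixed above) matters can be shown with the decoders that come with Java. This is only a sketch: the enum and the class are hypothetical, and I am assuming here that the decoding types correspond to the basic, URL-safe and MIME variants of java.util.Base64.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical sketch (not the ADAMS conversion) of selecting a Base64 decoder. */
class Base64Sketch {

  /** Assumed decoding types, mirroring the java.util.Base64 variants. */
  enum Type { BASIC, URL, MIME }

  /** Decodes the string with the decoder matching the selected type. */
  static byte[] decode(String encoded, Type type) {
    switch (type) {
      case URL:
        return Base64.getUrlDecoder().decode(encoded);
      case MIME:
        return Base64.getMimeDecoder().decode(encoded);
      default:
        return Base64.getDecoder().decode(encoded);
    }
  }

  public static void main(String[] args) {
    System.out.println(new String(decode("aGVsbG8=", Type.BASIC), StandardCharsets.UTF_8));
    // "-" and "_" only exist in the URL-safe alphabet
    System.out.println(decode("-_8=", Type.URL).length);
  }
}
```

Feeding the URL-safe string "-_8=" to the basic decoder results in an IllegalArgumentException, which is the kind of problem you run into when the selected type gets ignored.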

Changes

  • For flows that had errors when loading, the string " (incomplete)" is now appended to the filename to mark them as such. Previously, the filename got lost and a simple "FlowXYZ" was used.

  • The TextFileReader now accepts InputStream objects as well.

  • upgraded java-utils dependency to 0.0.3

  • upgraded commons-compress dependency to 1.19 to address CVE-2019-12402 (https://nvd.nist.gov/vuln/detail/CVE-2019-12402)

  • The PromptUser template now allows configuring the restoration of settings as well.

  • The adams.core.logging.FileHandler logging output handler now makes use of the ADAMS_LOGFILE_PREFIX environment variable to inject a prefix into the log file (eg, "console.log" becomes "testing-console.log" with "ADAMS_LOGFILE_PREFIX=testing-"). This allows multiple log outputs from multiple ADAMS services on the same server.

  • The Preview browser now offers a search panel when viewing spreadsheets and no longer outputs exceptions in the console in case custom preview handlers cannot be instantiated (eg when switching from one ADAMS application to another).

  • The ImageAnnotator transformer now maintains the last selected label, window position and size between invocations.

  • adams-json:

    • upgraded jsonpath dependency to 2.4.0

    • The StringToJson conversion now has an output type option for casting the JSON object (any, array, object).

  • adams-ml: The ActualVsPredictedPlot now exposes the overlays for the plot, allowing other ones to be added in addition to the diagonal (StraightLineOverlay), like LinearRegressionOverlay.

  • adams-weka: upgraded multisearch-weka-package dependency to 2019.10.4

  • adams-weka and adams-weka-lts: The ActualVsPredictedPlot now exposes the overlays for the plot, allowing other ones to be added in addition to the diagonal (StraightLineOverlay), like LinearRegressionOverlay.

  • adams-meta: The FlowFileReader now accepts InputStream and Reader objects as well (must be closed separately).

  • adams-rsync: The RSync and SimpleRSync sources now forward the process output (stdout/stderr) as it occurs rather than all of it after the rsync process finishes.

  • adams-spreadsheet: The SpreadSheetFileReader now accepts InputStream and Reader objects as well (must be closed separately).

  • All ADAMS applications are now available as Debian and RPM packages as well.
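The effect of the ADAMS_LOGFILE_PREFIX environment variable mentioned above is simple enough to sketch in a few lines. The class and method are hypothetical and not the code of the FileHandler; the sketch only deals with a plain file name, not with a full path.

```java
/** Hypothetical sketch (not the ADAMS FileHandler) of the log file prefix. */
class LogFilePrefixSketch {

  /** Prepends the prefix to the file name, if one is set. */
  static String applyPrefix(String fileName, String prefix) {
    if (prefix == null || prefix.isEmpty())
      return fileName;
    return prefix + fileName;
  }

  public static void main(String[] args) {
    // null if the environment variable is not set
    String prefix = System.getenv("ADAMS_LOGFILE_PREFIX");
    System.out.println(applyPrefix("console.log", prefix));
  }
}
```

With ADAMS_LOGFILE_PREFIX=testing- set, this prints testing-console.log; without it, the file name stays as is.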

Additions

  • Added support for InputStream/Reader instance generation via the InputStreamGenerator/ReaderGenerator sources and the CloseInputStream/CloseReader sink. This allows loading files from the classpath, e.g., files that are packaged within the jars of an application.

  • Added BooleanToString and StringToBoolean conversions.

  • With the ForwardSlashSwitch actor processor, it is now possible to switch all ForwardSlashSupporter objects in one go.

  • adams-rest:

    • added RESTPlugin support for processing text and JSON with a callable transformer template ("pipeline"): CallableJsonPipeline, CallableTextPipeline.

    • added RESTPlugin for just calling a callable transformer with JSON or text: CallableJsonTransformer and CallableTextTransformer.

  • adams-maven-plugin: a Maven plugin that allows compilation of flows into Java code as part of the Maven build process.

  • adams-spectral-2dim-core: added the Export spectra... plugin to the SpreadSheet and Instances tables to allow export of selected rows as spectral files.

  • adams-weka and adams-weka-lts:

    • added the conversions MapToWekaInstance and WekaInstanceToMap to provide an easy way to convert Weka Instance objects to JSON and vice versa.

    • introduced a class hierarchy for building a final model after a cross-validation run of a classifier in the Weka Investigator.

    • added filter for extracting a range of instances from a dataset in the specified order: KeepRange.
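As background for the InputStream/Reader support listed in the additions above: loading a file that is packaged within a jar typically goes through the classpath, roughly as in this sketch. It uses plain Java rather than the InputStreamGenerator source, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch (not ADAMS code) of reading a resource from the classpath. */
class ClasspathResourceSketch {

  /**
   * Reads the resource relative to the given class (or absolute, if the
   * name starts with a slash). Returns null if the resource does not exist.
   */
  static byte[] read(Class<?> anchor, String name) {
    // the stream gets closed automatically, just like CloseInputStream would do in a flow
    try (InputStream in = anchor.getResourceAsStream(name)) {
      if (in == null)
        return null;
      return in.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // class files are resources as well, so this works without any extra files
    byte[] data = read(Object.class, "Object.class");
    System.out.println(data == null ? "not found" : data.length + " bytes");
  }
}
```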