Updates 2020/06/21

Over the last few months, I was working mainly on image processing projects, which explains the large number of imaging-related additions and changes. One other big change happened under the hood: abstracting class loading and deep copies, in order to get Weka package support working again (Weka's isolating classloader doesn't make it easy to integrate Weka into other applications).

Fixes

  • Added the ArrayRSquared array statistic.

  • The ConfirmationDialog transformer no longer blocks the Flow editor window, i.e., it is now possible to stop the flow via the Flow editor's menu.

  • The control actors CloseCallableDisplay and CallableActorScreenshot now allow selecting top-level GridView and TabView instances as well.

  • Validating a flow now sets all the programmatic variables (like 'flow_dir') beforehand (if possible) to ensure correct execution.

  • The GOE editor for BaseClassname now lists all classes, including non-ADAMS and abstract ones, but excluding anonymous ones.

  • adams-imaging:

    • Keeping of all objects did not work properly in the ImageObjectFilter transformer, as indices could change due to filtering, resulting in an inconsistent mix of values in the report.

    • Fixed the DetermineOverlappingObjects transformer for the case where a set of objects is compared with itself.

    • The logging of interactions is now properly filtered by the interaction loggers used by the ImageAnnotator.

    • Fixed the collecting of matches in the AreaRatio and IntersectOverUnionRatio object overlap schemes: depending on the order of the objects, matches could get left out, as the currently highest value was used as threshold rather than the user-supplied threshold.

  • adams-weka and adams-weka-lts: The R^2 statistic is now calculated using the same approach as R.

  • adams-ml: The INTERSECT mode of MultiOutlierDetector now works correctly (used by RemoveOutliers control actor).
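
To illustrate the IntersectOverUnionRatio fix from the list above, here is a minimal sketch of collecting overlaps against a fixed, user-supplied threshold, which makes the result independent of the order of the objects. The class, the Box type and the method names are made up for this example and are not the ADAMS API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not ADAMS code): IoU matching with a fixed threshold. */
final class IouMatchSketch {

  /** Simple axis-aligned bounding box. */
  static final class Box {
    final int x, y, w, h;
    Box(int x, int y, int w, int h) { this.x = x; this.y = y; this.w = w; this.h = h; }
  }

  /** Intersection over union of two boxes; 0 if they do not overlap. */
  static double iou(Box a, Box b) {
    int iw = Math.min(a.x + a.w, b.x + b.w) - Math.max(a.x, b.x);
    int ih = Math.min(a.y + a.h, b.y + b.h) - Math.max(a.y, b.y);
    if (iw <= 0 || ih <= 0)
      return 0.0;
    double inter = (double) iw * ih;
    double union = (double) a.w * a.h + (double) b.w * b.h - inter;
    return inter / union;
  }

  /** Indices of all candidates whose IoU with the reference reaches minIou. */
  static List<Integer> matches(Box ref, List<Box> candidates, double minIou) {
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < candidates.size(); i++) {
      // always compare against the user-supplied threshold,
      // never against the best value seen so far
      if (iou(ref, candidates.get(i)) >= minIou)
        result.add(i);
    }
    return result;
  }

  public static void main(String[] args) {
    Box ref = new Box(0, 0, 10, 10);
    List<Box> others = Arrays.asList(
      new Box(0, 0, 10, 10),    // identical box
      new Box(5, 0, 10, 10),    // overlaps half of the reference
      new Box(20, 20, 5, 5));   // disjoint
    System.out.println(matches(ref, others, 0.3));
  }
}
```

Because every candidate is tested against the same threshold, reordering the candidates changes only the order of the returned indices, not which objects match.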

Changes

  • The Flow control actor now has a read-only flag. When checked, it will prompt the user whether to continue with editing, just viewing or cancelling the operation altogether.

  • The ConfirmationDialog now allows specifying a custom cancel token as well.

  • Upgraded java-cup to 11b-20160615 and switched from exec-maven-plugin to parsergen-maven-plugin for compiling java-cup/jflex parsers.

  • The MapToVariables transformer can suppress non-primitive objects getting turned into variables with the skipNonPrimitive option.

  • The SetVariable transformer now allows forwarding the current variable value instead of the input token with the outputValue option. Useful when constructing file names on the fly via variable expansion.

  • Major class instantiation overhaul to add support for Weka packages again: adams.core.classmanager.ClassManager and the concrete CustomClassManager classes (ADAMS and Weka) are now responsible for instantiating a class from a classname and for generating deep copies of objects.

  • The EnterValue source now can specify how large (cols/rows) the text box is, e.g., for entering multi-line text.

  • adams-imaging:

    • The CompareObjectLocations transformer now has the ability to forward screenshots of comparison to a queue and run in non-interactive mode.

    • The STARTING_WITH option for loading associated meta-data with the ImageFileReader now skips all the files that have an extension that the image reader itself manages.

    • The preview handlers ObjectLocationsFromReport and ObjectLocationsFromSpreadsheet can remove overlapping objects now as well.

    • upgraded Apache's commons-imaging to 1.0-alpha1

    • removed Apache Sanselan

  • adams-spreadsheet:

    • The LookUpAdd transformer now has an optional conversion for the value being stored in the look-up table.

    • The SpreadSheetInfo transformer can output the unique values across the sheet now as well (SHEET_VALUES).

  • adams-visualstats: The Histogram and ScatterDisplay sinks can export the plot data now as CSV files.

  • adams-weka and adams-weka-lts: The WekaAttributeSelection transformer now reduces/transforms the test set from a WekaTrainTestSetContainer as well (if not performing cross-validation) and stores the result in the outgoing container.

  • adams-net: upgraded requests4j library to 0.1.7

  • adams-rsync: upgraded rsync4j library to 3.1.3-1

  • adams-ml-app: added adams-nlp module as dependency
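
The class instantiation overhaul mentioned in the list above boils down to two operations that both depend on using the right classloader. The following is only a sketch of that general idea using plain JDK calls; ClassManagerSketch and its methods are hypothetical and do not reflect the actual ClassManager implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

/** Hypothetical sketch (not the ADAMS ClassManager): classloader-aware instantiation and deep copy. */
final class ClassManagerSketch {

  /** Instantiates the class via its default constructor, using the given classloader. */
  static Object newInstance(String classname, ClassLoader loader) {
    try {
      return Class.forName(classname, true, loader).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Failed to instantiate: " + classname, e);
    }
  }

  /** Deep copy via serialization; classes get resolved through the given classloader. */
  static <T extends Serializable> T deepCopy(T obj, ClassLoader loader) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
        oos.writeObject(obj);
      }
      try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray())) {
        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
          try {
            return Class.forName(desc.getName(), false, loader);
          } catch (ClassNotFoundException e) {
            // fall back on the default resolution (also covers primitive types)
            return super.resolveClass(desc);
          }
        }
      }) {
        @SuppressWarnings("unchecked")
        T copy = (T) ois.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Deep copy failed", e);
    }
  }

  public static void main(String[] args) {
    ClassLoader loader = ClassLoader.getSystemClassLoader();
    Object list = newInstance("java.util.ArrayList", loader);
    System.out.println(list.getClass().getName());
    ArrayList<String> orig = new ArrayList<>(Arrays.asList("a", "b"));
    ArrayList<String> copy = deepCopy(orig, loader);
    System.out.println(copy.equals(orig) && copy != orig);
  }
}
```

With an isolating classloader, a class loaded from a package is typically not visible to the application classloader, which is why both the lookup by classname and the deserialization step take the classloader as a parameter here.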

Additions

  • The DirectorySupplier source allows selecting/outputting of multiple directories, analogous to the FileSupplier source.

  • The FilterMap can be used for filtering Map objects using the specified filter.

  • Added inspection handlers for java.util.Map and java.util.List to the Flow debugger.

  • The EnterManyValues source now allows users to select enumeration values via the EnumValueDefinition class.

  • adams-imaging:

    • Added the following object filters: AttachMetaData (attaches a single meta-data key-value pair to all located objects), MakeSquare (turns the bounding box into a square one), RemovePolygons (removes polygon information from objects, e.g., for enforcing bounding box display).

    • The ClipBoundingBoxes image object filter fixes bounding boxes that stick out of the image boundaries (or removes them completely if width/height are zero).

    • The TransformMetaData image object filter allows you to pass meta-data through callable actors, e.g., for cleaning up.

    • The RenameLabels image object filter allows you to quickly rename labels, e.g., for collapsing multiple categories into a single one.

    • Added CocoAnnotationsReportReader and CocoAnnotationsHandler (Preview browser), which supersede their Detectron counterparts. The report reader is optimized for speed and the Preview browser handler caches the JSON files it has read (and monitors them for changes), allowing for rapid browsing of directories with lots of images and multiple JSON files.

    • The GrayOrIndexedColorizer BufferedImage transformer and the GrayOrIndexedImageHandler allow overriding the unique colors in the image with custom ones using the supplied color provider (eg when viewing PNGs used for image segmentation).

    • The AnnotationsAndPredictions draw operation allows overlaying annotations and predictions using reports from storage, simplifying the combining.

    • added support for reading PNGs as spreadsheet (eg for reading indexed PNG files) using the PNGSpreadSheetReader.

    • The ImageContainerToSpreadSheet conversion turns any image into a spreadsheet of its (A)RGB values.

    • Added the KeepHighestMetaDataValue scheme for removing overlapping objects (eg for post-processing predictions) - used in conjunction with the RemoveOverlappingImageObjects transformer.

  • adams-ml: Added the MeanAbsoluteError outlier detector (used by the RemoveOutliers control actor).
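
As a rough illustration of what the ClipBoundingBoxes filter in the list above is described as doing, the following sketch clips a box to the image boundaries and drops it when nothing with positive width and height remains. ClipBoxSketch and its clip method are hypothetical names for this example, not the filter's actual code.

```java
import java.util.Arrays;

/** Hypothetical sketch (not ADAMS code): clipping a bounding box to the image. */
final class ClipBoxSketch {

  /**
   * Clips the box to the image boundaries.
   *
   * @return {x, y, width, height} of the clipped box, or null if the
   *         remaining width or height is zero (ie the box gets removed)
   */
  static int[] clip(int x, int y, int w, int h, int imgW, int imgH) {
    int left = Math.max(x, 0);
    int top = Math.max(y, 0);
    int right = Math.min(x + w, imgW);
    int bottom = Math.min(y + h, imgH);
    if (right <= left || bottom <= top)
      return null;
    return new int[]{left, top, right - left, bottom - top};
  }

  public static void main(String[] args) {
    // box sticking out on the left of a 100x100 image
    System.out.println(Arrays.toString(clip(-5, 10, 20, 20, 100, 100)));
    // box completely outside the image
    System.out.println(Arrays.toString(clip(120, 0, 10, 10, 100, 100)));
  }
}
```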

Updates 2020/03/29

The biggest change was moving the debug view of the Flow editor into the side panel, making it less confusing when debugging multiple flows. The Preview browser can now reuse previews (thereby keeping the setup in the view), making flicking through images much more responsive. Apart from that, there are a bunch of bug fixes and little improvements.

Fixes

  • The Preview browser now handles errors better that occur while generating previews.

  • File choosers with a filter text field now only show items that match the filter and the selected file format.

  • The Debug view in the Flow editor has been integrated into the side panel of the editor window rather than being displayed as a separate frame (this also applies to all the flow execution listeners). This makes it easier to debug several flows in parallel, as the view changes based on what flow has been selected in the editor. It is still possible to detach the view and display it in a separate frame via the right-click popup menu on the tab title.

  • adams-moa: compiled a custom (unofficial) version of MOA to bundle with ADAMS, in order to fix a compatibility issue between the jclasslocator library versions used by MOA (0.0.12) and ADAMS (0.0.15).

  • adams-weka and adams-weka-lts:

    • The WekaSpreadSheetToPredictions transformer now handles spreadsheets with string labels in the actual/predicted columns, not just when containing numeric label indices.

    • The ConfusionMatrix output generators in the Classify tab of the Weka Investigator now maintain the order of the class labels from the dataset.

    • When deserializing a session, the data tables (e.g., in the Preprocessing tab) get resized using optimal column widths, instead of using a width of only a few pixels.

    • Setting/updating/removing column filters on the Data tab in the Weka Investigator now gets applied to the correct column.

    • The Weka Multi-Experimenter no longer generates a NullPointerException.

Changes

  • The generic search panel now allows toggling between regular expression search and simple sub-string search via the popup menu of its text field.

  • Containers now make use of the more intelligent plain text rendering available through the adams.data.textrenderer.TextRenderer class hierarchy. Speeds up debugging of large objects, like spreadsheets.

  • Added support for caching previews and reusing them in the Preview browser. However, reuse is currently only supported by image preview handlers.

  • adams-imaging:

    • The ObjectLocationsSpreadSheetReader report reader can add polygon coordinates to the report as well (as long as X and Y coordinates are present in two separate columns).

    • The ByMetaDataNumericValue object filter now allows using x/y/width/height as key as well.

    • The left-click processors in the ImageViewer can now specify what additional keys have to be pressed (shift/alt/ctrl/meta).

    • The following content handlers in the Preview browser now support an optional alternative location for meta-data: ObjectCentersFromReport, ObjectLocationsFromReport, ObjectLocationsFromSpreadSheet.

  • adams-ml: The ConfusionMatrix transformer now offers the classLabels option for enforcing the order of labels in the matrix.

  • adams-weka and adams-weka-lts:

    • The WekaSpreadSheetToPredictions transformer now allows sorting of labels with a specified comparator.

    • The SimpleArffLoader now supports the sparse ARFF format as well.

  • adams-weka:

    • upgraded multisearch-weka-package to 2020.2.17

    • upgraded Weka to 3.9.4-fork-0.0.1

  • adams-weka-lts:

    • upgraded Weka to 3.9.0-fork-0.0.9

  • adams-spectral-2dim-core: The WekaFilter post-processor can now turn off wrapping in a SpectrumFilter (on by default).

Additions

  • adams-imaging:

    • Added the MergeGrid multi-image operation for combining images generated by the SubImages/Grid operation. Combines the annotations as well, but does not remove duplicate or overlapping objects.

    • Added new left-click processors for the ImageViewer: ViewObjects, DeleteObjects, AddMetaData.

    • Added report writer for object locations: ObjectLocationsSpreadSheetWriter

    • The object filter AttachOverlappingMetaData allows the copying of meta-data between overlapping objects, eg reference data. The meta-data is copied from a report in storage.

  • adams-weka and adams-weka-lts: The NominalToNumeric filter allows you to either use the internal representation of the labels as numeric values or extract the numeric part from the label via regular expression replacement.
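
The two strategies described for the NominalToNumeric filter above can be sketched as follows. This is just an illustration of the idea with made-up labels and expressions; LabelToNumberSketch is a hypothetical class and not the filter itself, and it assumes that the internal representation of a nominal label is its index in the list of labels.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not the Weka filter): turning nominal labels into numbers. */
final class LabelToNumberSketch {

  /** Uses the position of the label in the list of labels as numeric value. */
  static double fromIndex(List<String> labels, String label) {
    int index = labels.indexOf(label);
    if (index < 0)
      throw new IllegalArgumentException("Unknown label: " + label);
    return index;
  }

  /** Applies a regular expression replacement and parses the remainder as number. */
  static double fromRegExp(String label, String find, String replace) {
    return Double.parseDouble(label.replaceAll(find, replace));
  }

  public static void main(String[] args) {
    List<String> labels = Arrays.asList("grade_1", "grade_3", "grade_7");
    // index of the label within the attribute's labels
    System.out.println(fromIndex(labels, "grade_3"));
    // strip everything up to and including the underscore
    System.out.println(fromRegExp("grade_3", "^.*_", ""));
  }
}
```

The regular expression variant fails with a NumberFormatException if the replacement does not leave a parseable number behind, so the expression has to fit all labels of the attribute.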

ADAMS and Docker

For a while now, I've been working on getting ADAMS running on Docker, with some of the snapshots now being deployed weekly to the public Docker hub. However, these images may not be suitable (or simply too large) and I therefore launched another little tool today called adamsflow2docker. This tool streamlines the process of deploying a worker flow (e.g., for processing data) within a Docker image.

Links to the adamsflow2docker tool and more detailed instructions on running ADAMS from within Docker are available from here.

Custom ADAMS applications

For a long time, I have been mulling over how to make it easier to build ADAMS applications. Most importantly, without having to write Maven configuration files and/or requiring compilation. The initial approach (on the ADAMS homepage) was to let the user select a number of modules and then generate a download with a complete Maven environment. However, this required that the user not only had Java installed, but also Maven. Not something your average data scientist would have readily available.

Enter bootstrapp, my just released Java library for bootstrapping Maven applications. With this library, you only have to supply the Maven artifacts that should make up the application. It will also generate simple start-up scripts for launching the application (Linux/Mac/Windows).

Using this generic approach under the hood, I put together a similar library for ADAMS, instant-adams. This makes it easy to put together your own, custom applications now, with only those modules and libraries that you really need.

The following command-line generates an application using the ADAMS modules adams-weka, adams-groovy and adams-excel, plus the kfGroovy Weka package:

java -jar instant-adams-0.0.1-spring-boot.jar \
  -M adams-weka,adams-groovy,adams-excel \
  -V 20.2.0-SNAPSHOT \
  -d nz.ac.waikato.cms.weka:kfGroovy:1.0.12 \
  -o ./out \
  -v -Xmx1g

In terms of ADAMS version, you can either use the ones from the daily builds (Y.M.x-SNAPSHOT, Y=2-digit year, M=month, x=patch level, usually 0) or ones from releases (e.g., 20.1.1). Be aware that only the releases are kept indefinitely; daily builds are only available for a limited number of days (or number of builds).

The above command-line generates scripts which start up ADAMS with 1GB of heap size (-v -Xmx1g).
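
The snapshot version scheme described above (Y.M.x-SNAPSHOT) can be spelled out in a few lines. The helper below is purely illustrative (SnapshotVersionSketch is a made-up name and not part of instant-adams); it merely formats a date and patch level according to that scheme.

```java
import java.time.LocalDate;
import java.util.Locale;

/** Hypothetical helper (not part of instant-adams): builds a Y.M.x-SNAPSHOT version string. */
final class SnapshotVersionSketch {

  /** Y = 2-digit year, M = month (not zero-padded), x = patch level. */
  static String snapshotVersion(LocalDate date, int patch) {
    return String.format(Locale.ROOT, "%02d.%d.%d-SNAPSHOT",
      date.getYear() % 100, date.getMonthValue(), patch);
  }

  public static void main(String[] args) {
    // version string for the current month with patch level 0
    System.out.println(snapshotVersion(LocalDate.now(), 0));
  }
}
```

For a date in February 2020 and patch level 0, this yields the 20.2.0-SNAPSHOT value used with the -V option in the command line above.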

Please note, since all ADAMS artifacts are managed by one of our in-house servers and are not available from Maven Central, the library downloads a custom Maven configuration file behind the scenes to download these artifacts through our server instead. If necessary, this can be changed by pointing to a custom Maven user settings file using the -u/--maven_user_settings option.

Feedback welcome!

Debug view in side panel

The Flow editor now displays the debug view as a side panel rather than in a separate frame, which makes it less confusing when debugging several flows at the same time. ;-)

[Screenshot: debugging in the Flow editor, inspection panel]

Updates 2020/01/28

Minor update post before the 20.1.1 bugfix release (which includes these changes)...

Fixes

  • The enum editor in the GenericObjectEditor now allows entering multiple values as text.

  • adams-addons-all: The static class discovery now works properly.

  • adams-weka and adams-weka-lts: Re-enabled the residual plot plugins in the Weka Investigator.

  • adams-imaging: The SelectObjects image plugin (used for annotating images) now rejects selected objects that have either width or height equal to zero (which can occur easily when using a stylus).

Changes

  • Object editor: added a button with drop-down menu to the PropertySheet panel for quick access to variable actions, rather than right-clicking on label; also functions as indicator whether variable is attached by changing color.

  • adams-tensorflow: switched from wai.tfrecords to wai.annotations library for generating TFRecords.

  • adams-weka: upgraded Weka to 3.9.4-fork-0.0.1

  • adams-weka and adams-weka-lts: The split action in the Weka Investigator has been generalized and now allows the use of any split generator available.

Additions

  • adams-spectral-2dim-core: The ArrayToSpectrum conversion converts a float array of amplitudes back into a Spectrum object.

  • adams-weka and adams-weka-lts: Added the Discard predictions menu item to the result history of the Classify tab in the Weka Investigator.

Updates 2019/12/20

Last update before the Xmas break. Unfortunately, other projects got in the way and I didn't quite get around to making a release. Oh well, this will happen in the new year. :-)

Fixes

  • adams-imaging: The CallableActorScreenshot control actor now works properly again when just outputting a BufferedImageContainer.

  • adams-rats: Rat can have variables again that get set at runtime.

  • adams-spreadsheet: The SpreadSheetReorderColumns transformer now leaves the cell types intact.

  • adams-spectral-2dim-core: the exportVisibleSpectra method of the SpectrumPanel is now more robust (accessible through the Export visible spectra popup menu item).

  • adams-weka and adams-weka-lts:

    • static class listing now works with the Weka class hierarchies as well.

    • the Copy cell of the InstancesTable (as used in the Data tab of the Weka Investigator) now copies the correct content.

Changes

  • Renderers in the object tree view of the debug panel in the Flow editor now cache their view whenever possible, to keep views more consistent between tokens.

  • adams-spreadsheet: The renderer for spreadsheets in the Flow debugger now only displays a maximum of 100 rows by default, with a button to display the rest if there are more. This prevents very large spreadsheets from bringing the system to its knees.

  • adams-weka and adams-weka-lts:

    • The renderer for Instances in the Flow debugger now only displays a maximum of 100 rows by default, with a button to display the rest if there are more. Speeds up rendering and with that debugging.

    • The random split action available from the data table of the Weka Investigator now remembers the last entered values and also allows the user to select the type of splitter to use.

Additions

  • Added a class hierarchy for text renderers for specific object types (adams.data.textrenderer.TextRenderer). These renderers can be used in the flow with the TextRenderer transformer. The tokens in the flow now use these to render their payloads as well, to avoid very large text outputs in the flow debugger.

  • adams-weka and adams-weka-lts:

    • added the WekaSplitGenerator for applying any available split generator, not limited to splitters for random or cross-validation splits.

    • added the MultiLevelSplitGenerator for splitting datasets on multiple string/nominal attributes subsequently.

Updates 2019/12/09

The main thing to be aware of is that workflows are now a lot stricter about variables: actors now check before first execution if all variables attached to them are present. The SetVariable actors and any actor updating storage items check every time when executed whether the variables attached to the variable/storage name are present.

Fixes

  • adams-ml:

    • The SimpleArffSpreadSheetWriter now handles dates correctly. Long values get treated separately from doubles as well, to avoid loss of information (e.g., when loading Tweet IDs).

    • The SimpleArffSpreadSheetReader now tests for long values as well when reading numeric attributes, to avoid loss of information.

  • adams-weka and adams-weka-lts:

    • The SimpleArffLoader now handles quoted attribute names correctly, unquoting them properly.
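
The loss of information mentioned for long values in the list above comes from the fact that a double only has a 53-bit mantissa, so large IDs cannot always be represented exactly. The sketch below demonstrates the effect and a parsing order that avoids it; LongPrecisionSketch is a hypothetical class for illustration and not the actual reader/writer code.

```java
import java.math.BigDecimal;

/** Hypothetical sketch (not ADAMS code): why long values should not be parsed as doubles. */
final class LongPrecisionSketch {

  /** Returns true if the long value can be represented exactly as a double. */
  static boolean exactAsDouble(long value) {
    // new BigDecimal(double) represents the double's value exactly
    return new BigDecimal((double) value).compareTo(BigDecimal.valueOf(value)) == 0;
  }

  /** Tries to parse the string as long first and only falls back on double. */
  static Number parse(String s) {
    try {
      return Long.valueOf(s);
    } catch (NumberFormatException e) {
      return Double.valueOf(s);
    }
  }

  public static void main(String[] args) {
    long id = 9007199254740993L;  // 2^53 + 1, made-up ID
    System.out.println(exactAsDouble(id));
    System.out.println((long) (double) id);
    System.out.println(parse("9007199254740993"));
  }
}
```

Values up to 2^53 survive the round trip through a double, which is why the problem only shows up with very large numbers such as the Tweet IDs mentioned above.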

Changes

  • In order to avoid strange behavior due to typos in variable names, the preExecute method of an actor now checks whether all variables used by it are valid (i.e., present). The check only gets executed when the isExecuted() method returns false (usually the first time the actor is being executed). Since this can affect a number of flows, you can turn on lenient checking by setting the environment variable INVALID_VARIABLES_LENIENT to true.

  • switched to 1.0.20 of debian-maven-plugin

  • switched to 0.1.2 of requests4j

  • The SelectArraySubset transformer now has buttons for selecting all items, no items or inverting the selection.

  • adams-applications: Dynamic class discovery has been turned off for applications. Instead, these applications use class/package hierarchies generated at build time. You can turn on dynamic class discovery again easily by adding an empty ClassLister.class file in the classpath of the application, e.g., in the same directory that contains the bin sub-directory.

  • Added checks to SetVariable actors and relevant classes implementing StorageUpdater (like SetStorageValue) that ensure that a variable attached to variable/storage name option actually exists, to avoid accidentally storing values under the default name (avoids hard to track errors).

  • adams-weka and adams-weka-lts:

    • The Build model of the Classify and Cluster tab in the Weka Investigator now allows the data to be randomized beforehand.

    • The Train/test set, Train/test split, Train/validate/test set and Reevaluate model evaluation tasks in the Classify tab of the Weka Investigator now take advantage of models supporting batch prediction.

    • Added the "-id-test" option to the RemoveTestInstances Weka filter to allow differing indices between current dataset and test set (eg if the test set is just a list of IDs).
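
The stricter variable handling described in the list above can be pictured along the following lines. This is only a sketch: VariableCheckSketch and its methods are hypothetical, it assumes the @{name} placeholder syntax for variables, and it assumes that a null return value signals success; only the INVALID_VARIABLES_LENIENT environment variable is taken from the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch (not ADAMS code): checking that referenced variables exist. */
final class VariableCheckSketch {

  /** Assumed placeholder syntax: @{name}. */
  private static final Pattern VARIABLE = Pattern.compile("@\\{([^}]+)\\}");

  /** Returns the names of all referenced variables that are not known. */
  static List<String> missing(String optionValue, Set<String> known) {
    List<String> result = new ArrayList<>();
    Matcher matcher = VARIABLE.matcher(optionValue);
    while (matcher.find()) {
      if (!known.contains(matcher.group(1)))
        result.add(matcher.group(1));
    }
    return result;
  }

  /** Returns null if OK, otherwise an error message; lenient mode only warns. */
  static String check(String optionValue, Set<String> known, boolean lenient) {
    List<String> missing = missing(optionValue, known);
    if (missing.isEmpty())
      return null;
    String msg = "Unknown variable(s): " + missing;
    if (lenient) {
      System.err.println("WARNING: " + msg);
      return null;
    }
    return msg;
  }

  public static void main(String[] args) {
    // parseBoolean returns false if the environment variable is not set
    boolean lenient = Boolean.parseBoolean(System.getenv("INVALID_VARIABLES_LENIENT"));
    Set<String> known = new HashSet<>(Arrays.asList("flow_dir", "file_name"));
    // note the typo in the second variable name
    System.out.println(check("@{flow_dir}/out/@{flie_name}.csv", known, lenient));
  }
}
```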

Additions

  • added conversions for converting primitive arrays (eg float[]) to/from byte arrays (IEEE754): ByteArrayToPrimitiveArray and PrimitiveArrayToByteArray.

  • Added the adams-groovy-rest module for writing REST plugins in Groovy.

  • The actor processor ListActorUsage lists all occurrences of the specified actor class.

  • Added Actor locations to the Find usage submenu in the Flow editor tree popup, listing all occurrences of the currently selected actor class.

  • adams-imaging:

    • added reader for object locations stored in spreadsheets: ObjectLocationsSpreadSheetReader

  • adams-ml:

    • added GroupedTrainTestSplit, GroupedCrossValidation, TrainValidateTestSplit and GroupedTrainValidateTestSplit dataset preparation schemes for the PrepareFileBasedDataset transformer.

  • adams-spreadsheet:

    • Added dummy AllFinder for locating all columns and rows.

  • adams-spectral-2dim-core:

    • The Oscillating outlier detector can be used to detect spectra that look like an oscillating signal.

    • The SpectrumToArray conversion turns either the wave numbers or the amplitudes of the spectrum into a float array.

  • adams-weka and adams-weka-lts:

    • Added the LogClassRegressor meta-classifier, which only logs the class attribute, as opposed to the LogTargetRegressor, which also logs any other numeric attribute.

    • Added dummy AllFinder for locating all columns and rows.

    • The WekaEnsembleGenerator allows the creation of ensembles in the flow: e.g., with the VotedModels generator, an array of Weka classifiers can be turned into a Vote meta-classifier, bypassing the training of the Vote classifier itself. The MultipleClassifiersCombinerModels generator allows you to use any classifier derived from MultipleClassifiersCombiner.
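
The conversion between primitive arrays and IEEE754 byte arrays listed under the general additions above can be sketched with the JDK's ByteBuffer. FloatBytesSketch is a hypothetical class for illustration, limited to float arrays, and makes no claim about how ByteArrayToPrimitiveArray and PrimitiveArrayToByteArray are implemented; the byte order parameter is an assumption of this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical sketch (not the ADAMS conversions): float[] to/from IEEE754 bytes. */
final class FloatBytesSketch {

  /** Encodes each float as 4 bytes (IEEE754 single precision) in the given byte order. */
  static byte[] toBytes(float[] values, ByteOrder order) {
    ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES).order(order);
    for (float value : values)
      buffer.putFloat(value);
    return buffer.array();
  }

  /** Decodes groups of 4 bytes back into floats; trailing incomplete groups are ignored. */
  static float[] toFloats(byte[] bytes, ByteOrder order) {
    ByteBuffer buffer = ByteBuffer.wrap(bytes).order(order);
    float[] result = new float[bytes.length / Float.BYTES];
    for (int i = 0; i < result.length; i++)
      result[i] = buffer.getFloat();
    return result;
  }

  public static void main(String[] args) {
    float[] values = {1.0f, -2.5f, 3.25f};
    byte[] bytes = toBytes(values, ByteOrder.LITTLE_ENDIAN);
    System.out.println(bytes.length);
    System.out.println(Arrays.toString(toFloats(bytes, ByteOrder.LITTLE_ENDIAN)));
  }
}
```

Both directions have to agree on the byte order, otherwise the decoded values are garbage.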