Mailing list 2017/03/15

The following came through on the ADAMS user mailing list:

Peter, what is your procedure when building ADAMS flows? How do you create such novel flows fast? Are there any intuitive steps that help when designing flows in ADAMS to build flows smoothly?

I thought you might be interested in that, too.

Here is my reply:

Being the main author of ADAMS, it is easy for me to remember what actors I've developed and what they do. ;-)

But here are some strategies:

1. Break down the problem

Like with any other programming language, you need to break down a problem into smaller steps.

E.g., evaluating a Weka model can be broken down into:

  • load dataset -- FileSupplier/SelectFile + WekaFileReader
  • set correct class attribute -- WekaClassSelector
  • evaluate -- WekaCrossValidationEvaluator + CallableActors at start of flow with the classifier you want to use
  • display results -- WekaEvaluationSummary + Display
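
The same decomposition can be sketched outside of ADAMS. The following plain-Python sketch is not ADAMS or Weka code: the function names, the toy dataset and the majority-class "classifier" are all made up for illustration. It only shows how each bullet above turns into one small step that can be developed and tested on its own.

```python
from collections import Counter

def load_dataset():
    # Stand-in for FileSupplier/SelectFile + WekaFileReader: rows of (feature, label).
    return [(1.0, "a"), (1.2, "a"), (0.9, "a"), (3.1, "b"), (1.1, "a"), (1.3, "a")]

def select_class(rows):
    # Stand-in for WekaClassSelector: treat the last column as the class label.
    return [(row[:-1], row[-1]) for row in rows]

def cross_validate(data, folds=3):
    # Stand-in for WekaCrossValidationEvaluator, using a majority-class "classifier".
    correct = 0
    for k in range(folds):
        test = data[k::folds]
        train = [d for i, d in enumerate(data) if i % folds != k]
        majority = Counter(label for _, label in train).most_common(1)[0][0]
        correct += sum(1 for _, label in test if label == majority)
    return correct / len(data)

def summarize(accuracy):
    # Stand-in for WekaEvaluationSummary + Display.
    return "Accuracy: {:.1%}".format(accuracy)

summary = summarize(cross_validate(select_class(load_dataset())))
print(summary)
```

Each function can be swapped out or tested in isolation, which is the same benefit you get from building the corresponding sub-flows separately.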

2. Start with a small flow and grow it

I never set out to write a massive flow from the get-go. I always work on little bits (maybe in separate flows) and then combine, tweak and adapt them. Rearranging actors or encapsulating them in other control actors is much easier than in other workflow systems, since you don't have to disconnect/reconnect the operators.

3. Make use of variables and internal storage

Variables and storage are extremely powerful tools. Variables can be used for changing actor options on-the-fly or for generating path names and other output. Storage is normally used when you want to re-use the object several times in a flow, e.g., the same dataset or evaluation object.

NB: For non-ADAMS objects (e.g., Weka classifiers), you can still change parameters, but it is a bit more cumbersome. The UpdateProperties control actor updates a property of the actor below it based on a property path through the object hierarchy, using the value from the associated variable. A property path is the concatenation of the "property names", e.g., for the WekaFilter actor with a PLSFilter, you'd use "filter.numComponents" to change the number of PLS components to use. Arrays can be navigated as well.
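
To make the idea of a property path more concrete, here is a minimal Python sketch of walking such a path and setting the final property. It is not how UpdateProperties is implemented: the set_by_path helper, the two toy classes and the "name[index]" notation for arrays are assumptions made purely for illustration.

```python
import re

INDEXED = re.compile(r"(\w+)\[(\d+)\]")

def _resolve(obj, part):
    # Follow one path element, e.g. "filter" or (assumed syntax) "filters[1]".
    match = INDEXED.fullmatch(part)
    if match:
        return getattr(obj, match.group(1))[int(match.group(2))]
    return getattr(obj, part)

def set_by_path(root, path, value):
    """Walk a dotted property path from `root` and set the last property."""
    parts = path.split(".")
    obj = root
    for part in parts[:-1]:
        obj = _resolve(obj, part)
    last = parts[-1]
    if not hasattr(obj, last):
        raise AttributeError("Unknown property: " + last)
    setattr(obj, last, value)

class PLSFilter:           # toy stand-in, not the Weka class
    def __init__(self):
        self.numComponents = 20

class WekaFilter:          # toy stand-in, not the ADAMS actor
    def __init__(self):
        self.filter = PLSFilter()

actor = WekaFilter()
set_by_path(actor, "filter.numComponents", 5)
print(actor.filter.numComponents)
```

Raising an error for unknown property names helps to catch typos in the path early, rather than silently creating a new attribute.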

4. Use the debugger or output debugging information

Set Breakpoint actors or simply step through the flow to see what the value of the current token is and what values variables or storage items have. Like any other debugger, this is the most powerful tool for figuring out what's going on within an application. A workflow is no different there. Outputting debugging or progress information is extremely useful, too. Just "Tee" off the current token and display it in a Display actor.
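
The "Tee" idea, passing the token on unchanged while handing a copy to a side branch, can be sketched with a Python generator. This is only an analogy for the Tee actor; the tee function and the list used as a "display" are hypothetical.

```python
def tee(tokens, sink):
    """Pass every token on unchanged, handing it to `sink` on the way through."""
    for token in tokens:
        sink(token)   # side branch, e.g. a debugging display
        yield token   # main branch continues with the same token

seen = []                                  # stand-in for a Display actor
squares = [t * t for t in tee(range(4), seen.append)]
print(seen, squares)
```

The main computation does not change when the side branch is added or removed, which is what makes this a cheap way of getting debugging output.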

As a final word, the example flows that come with ADAMS are relatively small flows, demonstrating the usage of certain actors. The idea is to copy/paste the relevant bits into your own flows to build great applications. Even I quite often look up the usage of actors in my example flows, especially if it is stuff that I wrote several years ago. ;-)

Updates 2017/03/06

The new year turned out to be extremely busy, with loads of projects running in parallel. Nonetheless, there are still a number of fixes and improvements happening.

Fixes

  • The HashSetInit transformer now stores the hashset in storage when initializing from array.
  • The EnterValue source uses BaseString now for the message/initial-value options, which updates variable occurrences correctly.
  • adams-weka: The Data and Instance tabs now get correctly updated when an undo occurs.
  • adams-webservice: classes implementing the AlternativeUrlSupporter interface now only attempt to instantiate the URL if the provided parameter is neither null nor empty.
  • Added auto-detect-data-type flag to the SetReportValue transformers, previously on by default, now turned off.
  • adams-spreadsheet:
    • The ColumnSubset row score scheme now uses the correct row-index for applying the base row score algorithm; also uses more efficient views for the subset instead of creating copies.
    • Fixed native use of Object[] in LookUpAdd transformer.
  • adams-rats: The Rat and RatControl standalones now listen to flow state changes, i.e., get notified correctly when pausing/resuming the flow.

Changes

  • Added support for gzip-compressed report files (.csv and .report), including display in Preview browser.
  • SelectArraySubset now has an option to allow searching the list.
  • Outsourced dynamic class discovery to jclasslocator library.
  • The storage tab in the control panel of the Breakpoint now has a popup menu for the table listing the storage items. It is now possible to display storage items in multiple dialogs.
  • Filters now get the flow context set if they implement the adams.flow.core.FlowContextHandler interface.
  • The HashSetInit standalone now allows initializing with string values.
  • The HashSet boolean condition now allows specifying the value to check rather than just using the token.
  • Standalones now force a variable update in the preExecute method if any variables are detected.
  • When debugging a flow, the copy of the debugged flow is no longer editable.
  • The Storage can be viewed in real-time in the Flow editor now as well, just like the variables (in the menu: Run -> Storage).
  • Logging messages that appear in the application's Console window (main menu -> Program), are now also logged to the project's home directory, e.g., $HOME/.adams/log/console.log.
  • The SequencePlotter now allows you to change the margins used on the axis as well, not just the ranges.
  • adams-spreadsheet:
    • SpreadSheetExplorer now has an additional plot popup menu item for operating on the containers visible in the current viewport.
    • SpreadSheetMerge transformer now ensures that the specified column with the IDs has only unique IDs, throws an exception otherwise (in strict mode only).
  • adams-spectral-dim:
    • SpectrumExplorer now has an additional plot popup menu item for operating on the containers visible in the current viewport.
  • adams-weka:
    • InstanceExplorer now has an additional plot popup menu item for operating on the containers visible in the current viewport.
    • WekaInstancesMerge transformer now ensures that the specified attribute with the IDs has only unique IDs, throws an exception otherwise (in strict mode only).
    • Filters now get the flow context set if they implement the adams.flow.core.FlowContextHandler interface.
    • Weka Investigator
      • If loading of a serialized model (re-evaluate model in Classify and Cluster tab) fails, the error message displayed on the Start is now more expressive, e.g., stating that the file doesn't actually contain a serialized object.
      • Added Insert as dataset action to the data tab's tables to allow inserting selected rows as a new dataset.
  • adams-imaging:
    • added option to image feature generators for specifying a custom prefix for the feature names, e.g., to avoid duplicate names when using the same feature generator twice but with different parameters.
    • The Histogram feature generators can now group by channel rather than just by bin index.
  • adams-imaging-boofcv: reverted BoofCV back to 0.18 for the time being.
  • adams-imaging-imagej: The Histogram feature generator allows grouping of channels now.
  • adams-ml: ActualVsPredictedPlot now has an option to supply a plot name, allowing multiple data plots in the same plot.
  • adams-rats: The RatControl standalone now allows stopping/restarting of individual Rat actors, not just pausing/resuming.
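
The unique-ID check that SpreadSheetMerge and WekaInstancesMerge now perform in strict mode boils down to counting the values of the ID column. The following Python sketch shows the idea; the function name and the row representation are hypothetical, and what the actors do with duplicates outside of strict mode is not stated above (the sketch simply returns them).

```python
from collections import Counter

def check_unique_ids(rows, id_column, strict=True):
    """Return the duplicated IDs; raise ValueError instead if `strict` is set."""
    counts = Counter(row[id_column] for row in rows)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    if duplicates and strict:
        raise ValueError(
            "Duplicate IDs in column '{}': {}".format(id_column, duplicates))
    return duplicates

rows = [{"id": "s1", "x": 1.0}, {"id": "s2", "x": 2.0}, {"id": "s1", "x": 3.0}]
print(check_unique_ids(rows, "id", strict=False))
```

Failing early on duplicate IDs avoids silently merging the wrong rows later on.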

Additions

  • Added MergeReport transformer for merging a report with one obtained from storage or a callable source actor.
  • adams-imaging: added new Histogram feature generator that supports Gray, RGB, YUV, YIQ, HSV color models.
  • adams-ml: The ConfusionMatrix transformer generates confusion matrix in spreadsheet format from spreadsheet with actual/predicted labels.
  • adams-spreadsheet: added transformer for sorting spreadsheet columns called SpreadSheetSortColumns
  • adams-weka:
    • Added a file loader for ARFF files (SimpleArffLoader), to avoid file locking issues under Windows with the Weka one. Does not support incremental loading or relational attributes.
    • Added better support for experiments, making use of classes developed for the MultiExperimenter GUI tool (including multi-core support):
      • WekaNewExperiment (source)
      • WekaExperimentFileReader (transformer)
      • WekaExperimentExecution (transformer)
      • WekaExperimentFileWriter (sink)
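
For readers unfamiliar with the term, this is what turning actual/predicted labels into a confusion matrix in tabular form amounts to. The Python sketch below is only an illustration of the calculation, not the ConfusionMatrix transformer itself; the layout (actual labels as rows, predicted labels as columns) is an assumption.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Build a confusion matrix as a list of rows, header row first."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    header = ["actual"] + labels
    # One row per actual label, one column per predicted label.
    rows = [[a] + [counts[(a, p)] for p in labels] for a in labels]
    return [header] + rows

actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "no", "yes"]
for row in confusion_matrix(actual, predicted):
    print(row)
```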

Have a good week!

Updates 2017/01/27

The new year started mainly with a lot of work and, whenever I had time, with the upgrade of some libraries, like Weka and deeplearning4j.

Fixes

  • The Utils.doubleToStringFixed method now handles NaN and Inf values correctly.
  • Fixed flushing/closing of compressed serialized models (SerializationHelper class).
  • The CheckVariableUsage flow processor now excludes system-supplied variables like flow_dir and flow_id from the check.
  • adams-dl4j:
    • Added icon for DL4JModelReader transformer.
    • RecordReaderDataSetIteratorConfigurator now allows using 0 as minimum for the 0-based indices.
  • adams-weka: The WekaFileReader transformer now handles filenames without extension as long as there is a custom loader defined.
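
Handling NaN and infinite values in a fixed-decimals formatter usually means checking for them before any rounding takes place. The Python sketch below illustrates that pattern; it is not the code of Utils.doubleToStringFixed, and the strings it returns for the special values (mirroring what Java's Double.toString produces) are an assumption.

```python
import math

def double_to_string_fixed(value, decimals):
    """Format `value` with a fixed number of decimals; NaN/Inf become plain text."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    return "{:.{}f}".format(value, decimals)

for v in (1.5, float("nan"), float("-inf")):
    print(double_to_string_fixed(v, 2))
```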

Changes

  • The Display and HistoryPanel sinks now have options for line-wrap and wrap-style-word.
  • The GridView standalone and the DisplayPanelGrid sink now allow the user to change the grid layout at runtime.
  • adams-ml-app:
    • added example flows/scripts for configuring deeplearning4j networks using Groovy and Jython.
    • added adams-imaging dependency to have basic image processing capability
  • adams-spreadsheet: batch import of spreadsheets now outputs a more detailed error message in case of BatchUpdateException exceptions.
  • adams-weka: Added a popup menu to the dataset table of the Investigator's Preprocess panel and added the Clear action to the menu for removing all datasets at once.
  • Dependency changes
    • Weka 3.9.0 (with patched FilteredClassifier)
    • Apache CXF 3.1.9
    • LIRE 1.0b2
    • deeplearning4j 0.7.2
    • CUDA 8.0 libraries for deeplearning4j
    • ImageJ 1.51h
    • BoofCV 0.26
  • adams-dl4j:
    • DL4JDatasetIterator source now has option to output full dataset instead of batches.
  • adams-spectral-2dim:
    • Added regexp option to CALSpectrumLoader Weka file loader to allow loading of only specific reference value(s).

Additions

  • Added ConditionalSequence control actor, the conditional version of the default Sequence actor.
  • adams-imaging: With the updated version of LIRE, additional feature generators are now available:
    • JointHistogram
    • LocalBinaryPatternsAndOpponent
    • RankAndOpponent
    • SimpleCentrist
    • SpatialPyramidAutoColorCorrelogram
    • SpatialPyramidCEDD
    • SpatialPyramidCentrist
    • SpatialPyramidFCTH
    • SpatialPyramidJCD
    • SpatialPyramidLocalBinaryPatterns
  • adams-dl4j:
    • Added DL4JModelParamsToSpreadSheet conversion for extracting the parameters.
    • Added DL4JModelParamsToSpreadString conversion for extracting the parameters as simple string.
    • Added ImageScaler dataset preprocessor.
    • Added DL4JCrossValidationSplit transformer to generate sequence of train/test set containers.
    • The DL4JCrossValidationEvaluator transformer performs cross-validation on a referenced model using the incoming dataset.
    • The SpreadSheetRecordReaderConfigurator allows reading any spreadsheet that ADAMS can read. However, textual cells get converted to NULLs and date/time ones to their Java epoch equivalent.
    • The DL4JDatasetAppend transformer combines multiple datasets into a single dataset, one after the other
  • adams-rats:
    • Added the Storage and Variable rat inputs, for getting access to the specified storage item/variable.
  • adams-spreadsheet:
    • Added SpreadSheetToNumeric conversion for turning non-numeric cells in a spreadsheet into numeric ones.
    • Added Unique values column action to the SpreadSheetTable column popup menu to display the unique values of the selected column.
  • adams-spectral-2dim:
    • Condition for checking whether spectrum already in database: HasSpectrum.
    • Spectra are now rendered in the Breakpoint and can be exported as well.
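
The cell conversion described for the SpreadSheetRecordReaderConfigurator (textual cells become NULLs, date/time cells become their epoch equivalent) can be pictured with the following Python sketch. It is an illustration only: the to_numeric helper is hypothetical, and it assumes that "Java epoch" means milliseconds since 1970-01-01 UTC, as returned by java.util.Date.getTime().

```python
from datetime import datetime, timezone

def to_numeric(cell):
    """Map a spreadsheet cell to a number, or to None if it has no numeric form."""
    if isinstance(cell, datetime):
        # Assumed meaning of "Java epoch": milliseconds since 1970-01-01T00:00:00Z.
        return int(cell.timestamp() * 1000)
    if isinstance(cell, (int, float)) and not isinstance(cell, bool):
        return float(cell)
    return None  # textual (and any other) cells carry no numeric value

row = [3, "abc", datetime(2017, 1, 27, tzinfo=timezone.utc)]
print([to_numeric(c) for c in row])
```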

Updates 2016/12/16

Alright, this is the final update before I'll be making a release. Definitely going to happen now next week! :-)

Fixes

  • Clearing the history of a GenericObjectEditor panel now also correctly clears the list of commandlines (and no longer generates exceptions).
  • The SimplePlot and SequencePlotter sinks allow you to turn off the tool tips for the plot.
  • The SerializationHelper class now always closes all streams to avoid any race conditions between JVM and OS.
  • Marker data in SequencePlotter is now correctly colored and only displayed if corresponding sequence is visible.
  • Plot plugins SimplePlot and JFreeChart for spreadsheet and Instances tables now respect the sorting of the column when using subsets.
  • adams-ml:
    • The ActualVsPredicted sink now suppresses the plot tool tips as they distract and annoy.
  • adams-spreadsheet:
    • The DefaultTypeMapper now maps Types.DECIMAL to DOUBLE instead of STRING (returned e.g., when using SQL calculations).
  • adams-weka:
    • Weka Investigator:
      • All textual output, like run information or models, is now read-only.
      • Save action now adds the new file to the recent files menu.
      • Fixed the notifications when attributes get checked/unchecked in the Preprocess tab. Also fixed the None button, as it behaved like the All one.
    • Added fastrandomforest dependency.
    • The WekaInstanceDumper did not delete an existing file if buffer size larger than 1 (and keepExisting not checked).
  • adams-spectral-dim: JCampDX readers now have an option to use the filename (without extension) as the ID for the spectrum.

Changes

  • The GridView standalone now has an option for adding headers to the grid, to better distinguish the individual cells in the grid (uses the names of the actors).
  • The SequencePlotter received two more options: -adjust-to-visible-data, -side-panel-width
  • When checking before saving is enabled in the Flow editor, any error message gets output in the notification area now as well in order to be able to jump to problematic actors.
  • The IntToString conversion now supports optional byte format strings, e.g., to generate strings like 3,456.89KB.
  • adams-spreadsheet:
    • SpreadSheetMerge now has a flag to only keep a single instance of the unique ID column in the output: -keep-only-single-unique-id
  • adams-weka:
    • WekaInstancesMerge now has a flag to only keep a single instance of the unique ID attribute in the output: -keep-only-single-unique-id
    • VotedImbalance now supports manual list of thresholds (prob of minority class = # of resampled models) for more targeted resampling based on data.
    • Weka Investigator:
      • It is now optional whether to sort the attribute names in the dropdown list for the class attribute in the data/preprocess table.
      • The run information can be used as tooltip for result history entries now (classify, cluster, attsel tab).
      • The Classify/Cluster/Attribute Selection tabs now allow you to Compare output of several results side-by-side.

Additions

  • The LongToString conversion works like the IntToString one, but on Long objects instead.
  • adams-weka:
    • The MultiPLS filter operates on multiple Y values for the same range of X attributes and places the results side-by-side. Applies the same PLS algorithm to each Xs/Y combination.
  • Added the adams-groovy-webservice module to adams-addons, which uses the groovy-wslite library for making SOAP and REST webservices easily accessible within Groovy (NB: only client-side).

Updates 2016/12/02

Initially, I had a release planned for this week, but with a plethora of minor bug fixes/UI improvements happening and other projects demanding time, I had to abandon this plan. Hopefully, this will eventuate next week instead.

Fixes

  • Interactive actors now call the root actor's stopExecution method instead of their own, in order to stop the flow.
  • The SelectArraySubset interactive transformer now updates the message label with each interaction and double-clicking on an item automatically selects it and accepts the dialog.
  • Spreadsheet tables now interpret Long cell values as Double, in order to get correct sorting of columns with mixed Long/Double values.
  • adams-weka: Fixed the batch-filtering functionality of the Weka Investigator's preprocess tab.

Changes

  • The ImageViewer sink now accepts objects implementing BufferedImageSupporter as well.
  • adams-spectral-core has been renamed to adams-spectral-2dim (as it is for 2-dimensional spectra).
  • adams-weka:
    • Weka Investigator:
      • Start buttons now display a tooltip if disabled, explaining why process cannot be started (hover over the button to display it).
      • PCA tab now offers checkbox for skipping nominal attributes.
      • Closing a tab now prompts the user whether to go ahead with it (to avoid losses of tabs when quickly switching between them and accidentally hitting the close button).
      • The ClassifierErrors output generator now allows anti-aliasing to be configured (default is Auto, i.e., if more than 1000 datapoints, it gets turned off to speed up the plot).
    • The unsupervised instance-based filters DatasetCleaner and DatasetLabeler now have the additional flag -only-first-batch, which applies the filter only to the first batch of the data.

Additions

  • Added RomanToInt and IntToRoman conversions for handling Roman numerals.
  • Added Kendall-Theil robust regression calculation: ArrayKendallTheil, StatUtils.kendallTheil, KendallTheilOverlayPaintlet.
  • Added the BufferedImageSupporterToBufferedImageContainer conversion to allow attaching of metadata.
  • Added meta-data text overlay for images, e.g., used in the ImageViewer sink: MetaDataText.
  • The SetContainerValue control actor allows updating of a single container value, using data from either a callable actor or storage.
  • adams-ml:
    • The PredictionEccentricity transformer allows the calculation of the eccentricity for the predictions generated by a regressor.
    • The ActualVsPredictedPlot sink now implements AntiAliasingSupporter, i.e., you can turn on/off anti-aliasing for pretty/fast plots.
  • adams-spectral-2dim:
    • Added PLS spectrum batch filter, which makes use of the new PLS algorithm class hierarchy to transform the spectral data.
    • Added PCA spectrum batch filter.
  • adams-spreadsheet:
    • The JFreeChartPlot sink allows plots from spreadsheet columns using JFreeChart plot library and the JFreeChartFileWriter outputs image files from generated plots.
    • The Spreadsheet file viewer now has a chart plugin using JFreeChart.
  • adams-weka:
    • Added WekaGenericPLSMatrixAccess transformer which gives access to internal PLS matrices of the new AbstractPLS class hierarchy.
    • PLS-based classifier that uses the new PLS class hierarchy: weka.classifiers.functions.PLSWeighted.
    • The classify/cluster tabs in the Weka Investigator now have Build model evaluations that just generate a model and save it to disk.
    • The Data table of the Weka Investigator now has a plugin for plotting using JFreeChart.
    • The WekaSpreadSheetToPredictions transformer allows the recreation of an Evaluation object simply from predictions of a model (actual/predicted).
    • weka.filters.FilteredFilter is a filter for applying a pre-filter to the data before using the main filter (e.g., for selecting a subset).
    • The Weka Investigator now offers an output generator for displaying/calculating the Prediction Eccentricity of the predictions (Classify tab, numeric classes only).
    • It is now possible to compare the predictions of two models evaluated in the same way on the same dataset (with a numeric class) through the Compare models menu item in the result history of the Classify tab of the Weka Investigator.
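
As background for the Kendall-Theil robust regression mentioned above: the estimator (also known as Theil-Sen) takes the median of the slopes between all pairs of points, which makes it insensitive to a few outliers. The Python sketch below illustrates the calculation; it is not the code behind ArrayKendallTheil or StatUtils.kendallTheil, and computing the intercept as the median of y - slope * x is one common variant that is assumed here.

```python
from itertools import combinations
from statistics import median

def kendall_theil(x, y):
    """Return (slope, intercept) of the Kendall-Theil (Theil-Sen) robust line."""
    # Slopes between all pairs of points, skipping pairs with identical x.
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    if not slopes:
        raise ValueError("Need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]   # the last point is an outlier
print(kendall_theil(x, y))
```

An ordinary least-squares fit on the same data would be pulled strongly towards the outlier, whereas the median-based line still follows the four well-behaved points.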

Updates 2016/11/14

Didn't manage to get this one out on Friday, been way too busy with preparations for ACML 2016, which we're hosting this week here at Waikato.

Oh, in regards to last night's earthquake here in NZ: everyone at Waikato is safe. Our thoughts go out to the people around NZ that suffered from this quake.

Fixes

  • The directory chooser dialog now reloads the bookmarks before showing the dialog (in case they got modified in another dialog).
  • The table model for reports now performs searches in the values of the fields as well, not just the names of the fields.
  • AdamsCommandLineHandler.setOptions method now correctly resets the options to default ones before applying the supplied options.
  • adams-weka:
    • Weka Investigator
      • The Stop buttons in the Preprocess/Classify tab now work as expected (stopping worked, but didn't update the buttons).
      • Deleting attributes or instances in the data tab now updates the preprocess panel accordingly.
      • Moving datasets up or down now correctly updates the comboboxes listing the available datasets (e.g., in the Classify tab).
      • Moving tabs no longer clears their result history.

Changes

  • The SequencePlotter now allows you to save only the visible data to a spreadsheet as well.
  • Window titles are now prefixed with the hostname (if available), to make it easier to locate windows when logged into multiple remote machines.
  • Plots now allow the modification of their paintlets through the right-click menu.
  • Added -minimal-window option to the AbstractApplicationFrame class to avoid main frame being extended to full screensize.
  • adams-imaging: upgraded zxing barcode library to 3.3.0.
  • adams-pdf: The PdfProclets have been extended to allow processing of objects rather than just files; the PDFAppendDocument transformer can now process objects from storage as well.
  • adams-spectral-core: JCampDX2SpectrumReader now handles files with multiple pages.
  • adams-spreadsheet: The ScatterPlot chart of the Spreadsheet file viewer can now plot circles or crosses with specified diameter and overlays.
  • adams-visualstats: The ScatterPlot now allows you to save the data and the visible data to a spreadsheet.
  • adams-weka:
    • Weka Investigator
      • Now allows you to export all currently displayed outputs (attribute selection, classification, clustering): to a directory, to a ZIP file (requires adams-compress), to a PDF file (requires adams-pdf), or via email (requires adams-net).
      • Whole tabs like Classify can be copied now as well, including their current history.
      • The Export action of the data table has been renamed to Save.
      • The data tab now caches tables for faster display (and keeping any custom column sizes).
      • Operations in investigator (loading files, copying tabs, obtaining data from sources) can be stopped now.
    • Weka classes are now using the ADAMS GenericObjectEditor in the GUI.
    • Weka classes now display their full help in the Class help tool (available from the Help menu).

Additions

  • adams-nlp: added ptstemmer 2.0.0 weka package dependency.
  • adams-spectral-core: added Weka file loader for FOSS CAL files called CALSpectrumLoader.
  • adams-spreadsheet: added convenience reader/writer for tab-separated values (TSV) files.
  • adams-weka:
    • Weka Investigator:
      • The output tabs in the Weka Investigator (attribute selection, classify, cluster) now support copying the content to the clipboard as well.
      • A new class hierarchy for updating the relation name of files being loaded is now available (e.g., based on file name or attribute name).
      • Added (optional) model size to investigator's run information (classify/cluster tabs).
    • Added new PLS filter that uses class hierarchy of PLS implementations, with PLS1 and SIMPLS as the currently available implementations.
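
For reference, a tab-separated values file is just a CSV variant with the tab character as the column separator, which is why a "convenience" reader/writer is a natural addition. The Python sketch below shows the round trip using the standard csv module; it has nothing to do with the ADAMS classes, and the helper names are made up.

```python
import csv
import io

def read_tsv(text):
    """Parse TSV text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))

def write_tsv(rows):
    """Serialize rows to TSV text, one line per row."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()

text = "id\tvalue\ns1\t1.5\n"
print(read_tsv(text))
```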

Updates 2016/10/14

The Weka Investigator is still under heavy development, but now it is more about stability, usability and additional functionality. The general framework is pretty much there.

Fixes

  • Fixed memory leak that occurred within Rat actors in conjunction with directed control actors like Switch: the setUp() method call didn't remove its current Directory instance from the PauseStateManager as listener.
  • The plain text handler of the Preview browser now displays a "Creation of preview failed" message if it cannot load the file as text. Also fixed the occasionally occurring use of the wrong content handler (race condition).
  • Added boolean flag to catch circular calculation in spreadsheet formulas, avoiding costly StackOverflows per cell (can take a very long time in large spreadsheets).
  • The XSLT transformer no longer fails if the stylesheet file property is empty (it wrongly tried to load a non-existing file).
  • adams-weka:
    • The Export action in the Weka Investigator now allows overwriting existing files; also uses any existing file as the suggested export file name.
    • The Weka Investigator dropdown lists for the class attribute in the data table are no longer cached to avoid inconsistencies in case the user modifies the attributes of a dataset.
    • Stopping cross-validation in the Weka Investigator now properly waits for the sub-threads to finish.
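
The circular-calculation fix for spreadsheet formulas follows a common pattern: mark a cell as "being evaluated" and fail immediately when the evaluation comes back to a marked cell, instead of recursing until the stack overflows. The Python sketch below illustrates the pattern with a set of in-progress cells rather than a per-cell boolean flag; the data model (a formula is just a list of cells to sum) and all names are made up and unrelated to the ADAMS implementation.

```python
class CircularReferenceError(Exception):
    pass

def evaluate(cells, name, cache, in_progress):
    """Evaluate cell `name`; a formula here is just a list of cells to sum up."""
    if name in cache:
        return cache[name]
    if name in in_progress:
        # We came back to a cell that is still being evaluated: a cycle.
        raise CircularReferenceError("Circular reference involving " + name)
    in_progress.add(name)
    content = cells[name]
    if isinstance(content, list):
        value = sum(evaluate(cells, ref, cache, in_progress) for ref in content)
    else:
        value = content
    in_progress.discard(name)
    cache[name] = value
    return value

cells = {"A1": 1, "A2": 2, "A3": ["A1", "A2"], "B1": ["B2"], "B2": ["B1"]}
print(evaluate(cells, "A3", {}, set()))
```

The check costs one set lookup per cell, whereas waiting for a stack overflow per affected cell can take a very long time in large spreadsheets.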

Changes

  • The Simple plot for spreadsheet/instances tables now asks the user for number of data points to plot if that should exceed 1000.
  • The File commander now pops up remote directory setup if it should require initialization (when changing the chooser).
  • Panels for choosing files and directories now retain a history of selected elements.
  • The EnterManyValues source can output the key-value pairs as a java.util.Map object now as well.
  • SimplePlot and SequencePlotter now allow the export of the first visible plot to CSV.
  • adams-compress: upgraded lzma-java artifact to 1.3.
  • adams-visualstats: added support for mouse click actions to the ScatterDisplay sink.
  • adams-weka:
    • The WekaPrincipalComponents transformer now outputs not only loadings, but also the transformed data. Automatically filters out attributes that PCA does not support (uses PartitionedMultiFilter internally to add them again to the transformed data).
    • WekaPredictionsToInstances and WekaPredictionsToSpreadSheet allow the user to output the absolute error (default) or not.
    • The Weka Investigator now allows you to copy, save and load workspaces (any graphical output has to be regenerated though; simply right-click on history entry and select Regenerate output). Undo can be turned off now as well. Testing classifier models now outputs progress information as feedback for the user.

Additions

  • Transformer for changing prefixes of report fields to a new one: ChangeReportFieldPrefixes.
  • adams-weka:
    • added AttributeStatistics column processor for tables displaying Instances objects, like the Data tab in the Weka Investigator.
    • The Weka Investigator now has principal components and partial least squares visualization tabs, which display loadings and scores. The Investigator also offers shortcuts now for the tabs, to quickly open new tabs. An action for randomizing a dataset has been added to the Data tab.
  • adams-imaging:
    • Added MergeObjectLocations transformer to merge the located objects in two reports.
    • The CompareObjectTypes transformer compares the type of located objects in two reports and outputs a comparison spreadsheet.
  • adams-spreadsheet:
    • The LookUpInit standalone initializes an empty lookup table in storage.
    • With the ReportToSpreadSheet conversion, you can turn a Report object into a spreadsheet.
    • With the LookUpUpdate transformer, you can update a spreadsheet that acts as a lookup table using custom rules (e.g., if-then-else).
  • adams-spectral-core: added support for PCA/PLS analysis using spectra to the Spectrum Explorer.