Updates 2017/07/21

The new semester started last week, so I was busy with my students. Development has mainly focused on deeplearning4j and on prediction support for CNTK, Microsoft's deep learning library.

Fixes

  • LoadBalancer: fixed the loss of outer variables; it now uses a Flow control actor internally for better encapsulation.

Changes

  • Added support for outputting relative paths with the FileSystemSearch source: LocalDirectorySearch, LocalDirectorySearchWithComparator, LocalDirectorySearchWithCustomSort, LocalFileSearch
  • Panels managed by the DisplayPanelManager now get properly re-used via their unique ID (e.g., when using a variable), not just when mergeable. This allows out-of-order updates of sequence plots.
  • The Min and Max transformers can return 1-based indices now.
  • Added support for ADAMS_LIBRARY_PATH environment variable to adams.core.management.Launcher: its content gets supplied to the JVM via -Djava.library.path (used for native libraries like CNTK, MKL).
  • adams-dl4j:
    • Added the ability to the DL4JTrainModel transformer to test the model on a test set (split off from the training data) and to output the best model found so far, with the associated statistic(s).
    • Added support for stopping criteria to DL4JTrainModel, rather than just training for a fixed number of epochs.
  • adams-weka: The WekaFilter transformer can now use storage or a source actor to obtain the actual filter, not just a serialized file or the filter specification.
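
The ADAMS_LIBRARY_PATH handling mentioned in the list above can be pictured roughly as follows. This is a minimal sketch and not the actual adams.core.management.Launcher code: the class and method names (LibraryPathSketch, jvmArgs) are hypothetical, and only the environment variable name and the -Djava.library.path flag are taken from the notes.

```java
import java.util.ArrayList;
import java.util.List;

public class LibraryPathSketch {

  /**
   * Turns the content of ADAMS_LIBRARY_PATH (may be null or empty)
   * into extra JVM arguments. Hypothetical helper for illustration.
   */
  static List<String> jvmArgs(String libraryPath) {
    List<String> args = new ArrayList<>();
    if (libraryPath != null && !libraryPath.trim().isEmpty())
      args.add("-Djava.library.path=" + libraryPath.trim());
    return args;
  }

  public static void main(String[] a) {
    // prints an empty list if the variable is not set
    System.out.println(jvmArgs(System.getenv("ADAMS_LIBRARY_PATH")));
  }
}
```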

Additions

  • Added the ArrayNormalize array statistic, which normalizes an array to sum up to 1.0.
  • adams-cntk:
    • added support for applying CNTK models: CNTKModelApplier.
    • added spreadsheet writer for CNTK text file format: CNTKSpreadSheetWriter
    • added image feature generator: DefaultCNTK
  • adams-cntk-weka: Added pseudo-classifier that uses a pre-built model: functions.CNTKPrebuiltModel
  • adams-dl4j: Added transformer for randomizing dataset: DL4JRandomizeDataset
  • adams-imaging:
    • Added ScaleReportObjects transformer for scaling objects defined in reports.
    • Added example flow for training an OpenCV Haar cascade from annotated images: adams-imaging-opencv_train_haar.flow
  • adams-imaging-openimaj: added generic object detector class hierarchy, to be used by adams.flow.transformer.locateobjects.OpenIMAJObjectDetector
  • adams-spreadsheet: added class hierarchy for processors that work on the selected rows in a spreadsheet table, e.g., copying files using the filename from the specified column. Functionality available through SpreadSheetDisplay sink.
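
To illustrate what "normalizes an array to sum up to 1.0" means for the ArrayNormalize statistic, here is a minimal sketch of the arithmetic. It is not the ADAMS implementation; the class name NormalizeSketch is hypothetical, and rejecting a zero sum is an assumption (the notes do not say how that case is handled).

```java
import java.util.Arrays;

public class NormalizeSketch {

  /** Divides each element by the total, so the result sums to 1.0. */
  static double[] normalize(double[] values) {
    double sum = 0.0;
    for (double v : values)
      sum += v;
    if (sum == 0.0)  // assumption: a zero sum cannot be normalized
      throw new IllegalArgumentException("Values sum up to zero");
    double[] result = new double[values.length];
    for (int i = 0; i < values.length; i++)
      result[i] = values[i] / sum;
    return result;
  }

  public static void main(String[] args) {
    // prints [0.25, 0.25, 0.5]
    System.out.println(Arrays.toString(normalize(new double[]{1, 1, 2})));
  }
}
```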

Updates 2017/06/30

A lot of effort has gone into deeplearning4j over the last few weeks: an upgrade to the latest version, support for random network generation (who doesn't want to avoid hyperparameter fiddling?) and instructions for using Intel's MKL libraries to speed up model building.

Filters can now be serialized from the Weka Investigator as well and re-used via the SerializedFilter filter.

Fixes

  • The ForLoop source now skips the consistency tests if a variable is attached to at least one of the properties: lower/upper/step
  • Downgraded the MySQL driver to 5.1.42, after receiving java.sql.SQLNonTransientConnectionException: CLIENT_PLUGIN_AUTH is required exceptions when using 6.0.6 of the JDBC driver.
  • removed double quotes from default executable of JDeps and JMap control actors.
  • adams-dl4j:
    • The DL4JModelToJson and DL4JModelToYaml conversions now distinguish between Model and MultiLayerNetwork objects, to retrieve the correct configurations to convert.
    • The DL4JModelWriter sink ensures now that MultiLayerNetwork has been initialized to avoid errors.
  • adams-event: fixed forcing of variables in Cron standalone actor.
  • adams-net:
    • JavaMailSendEmail - using the javax.activation.DataHandler class with a URL didn't close the stream of attachments, resulting in locked files on Windows.
    • re-using existing sessions now: FTPConnection, SMBConnection, SSHConnection

Changes

  • The Exec source can now output stdout and stderr at the same time, ignore process errors, and use a working directory for the process.
  • Boolean/Mathematical/StringExpression: added "str(...)" method for converting objects/numbers into strings: str(expr) = any object's toString() method; str(expr,numdec) = any number is output with at most numdec decimals after the decimal point (trailing 0s get chopped off); str(expr,decformat) = applies the format to the number using java.text.DecimalFormat
  • SelectFile and SelectDirectory now support output with forward slashes.
  • adams-dl4j:
    • Upgraded deeplearning4j to 0.8.0
    • DL4JTrainModel now has a monitor variable for resetting the model, allowing for training sequentially on multiple datasets.
    • Added instructions for using Intel MKL libraries to speed up processing.
    • Moved the InMemoryStatsListenerConfigurator to the new adams-dl4j-insight module.
  • adams-weka:
    • The Weka Investigator now allows filters to be serialized in the pre-process panel.
    • The PrincipalComponentsJ filter now has the option -simple-attribute-names, which generates attributes like PCA_1...n instead of compiling them from the other attribute names.
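
The two numeric variants of the new str(...) method described in the list above can be approximated with java.text.DecimalFormat. The following is only a sketch of the described behaviour, not the expression parser's code: the class name StrSketch is hypothetical, and the fixed US locale and DecimalFormat's default rounding mode are assumptions.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class StrSketch {

  /** str(expr,numdec): at most numDec decimals, trailing zeros dropped. */
  static String str(double value, int numDec) {
    StringBuilder pattern = new StringBuilder("0");
    if (numDec > 0) {
      pattern.append('.');
      for (int i = 0; i < numDec; i++)
        pattern.append('#');  // '#' omits trailing zeros
    }
    return str(value, pattern.toString());
  }

  /** str(expr,decformat): applies a DecimalFormat pattern to the number. */
  static String str(double value, String decFormat) {
    DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.US);
    return new DecimalFormat(decFormat, symbols).format(value);
  }

  public static void main(String[] args) {
    System.out.println(str(3.14159, 2));          // prints 3.14
    System.out.println(str(2.5, 3));              // prints 2.5
    System.out.println(str(1234.5, "#,##0.00"));  // prints 1,234.50
  }
}
```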

Additions

  • Added simple GUI tool for performing XSLT (XML, XSL and Output panel), available from the main menu under Maintenance.
  • adams-dl4j:
    • Added the CallableActorScoreListenerConfigurator iteration listener, which forwards the iteration count/score pair to a callable actor (e.g., for plotting).
    • Added conversion for turning DL4J datasets into spreadsheets: DL4JDataSetToSpreadSheet
    • Added conversion for converting spreadsheets into DL4J DataSets: SpreadSheetToDL4JDataSet
    • Added a pseudo-configurator that only retrieves the model from storage: FromStorage
    • DL4JModelGenerator source generates model(s) using the specified generator scheme.
    • Added previews in the Preview browser for DL4J models in JSON and YAML
    • Conversions for recreating models from JSON and YAML: DL4JJsonToModel and DL4JYamlToModel
    • Conversion for creating actual model from configurator: DL4JConfiguratorToModel
  • New module: adams-dl4j-insight for providing insight in model building, which is not necessary when deploying models (avoiding bloat).
  • adams-dl4j-weka: added conversions WekaInstancesToDL4JDataSet and WekaInstanceToDL4JINDArray, using Mark Hall's code from the Weka package for DL4J.
  • adams-imaging: added the RandomBoundingBox left-click processor.
  • adams-spreadsheet:
    • added simple spreadsheet filtering framework via the SpreadSheetFilter transformer and the filter class hierarchy it uses. Initial filters: Normalize, Standardize.
    • The SpreadSheetInsertColumnPosition conversion inserts the column position into a string (e.g., BG), replacing the specified placeholder.
  • adams-weka:
    • The WekaFilter spreadsheet filter allows applying any Weka filter to a spreadsheet.
    • weka.filters.SerializedFilter is a meta-filter that applies a serialized, trained filter to the data (no further training required).
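
A column position such as BG, as inserted by the SpreadSheetInsertColumnPosition conversion, is the usual spreadsheet-style lettering of a column index. Below is a minimal sketch of that mapping, assuming 1-based indices (A = 1); the class name, the method name and the {POS} placeholder are hypothetical and not taken from ADAMS.

```java
public class ColumnPositionSketch {

  /** Converts a 1-based column index into letters: 1 = A, 27 = AA, 59 = BG. */
  static String toLetters(int index) {
    if (index < 1)
      throw new IllegalArgumentException("Index must be >= 1: " + index);
    StringBuilder result = new StringBuilder();
    while (index > 0) {
      index--;  // shift to 0-based for the current digit
      result.insert(0, (char) ('A' + index % 26));
      index /= 26;
    }
    return result.toString();
  }

  public static void main(String[] args) {
    // hypothetical placeholder {POS}; prints "Column BG"
    System.out.println("Column {POS}".replace("{POS}", toLetters(59)));
  }
}
```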

Updates 2017/06/12

A lot of work has been done on better integration of the deeplearning4j framework. Support for rsync within the flow was added as well, e.g., for syncing local files with ones on a cloud server.

Fixes

  • adams-dl4j:
    • DL4JCrossValidationEvaluator and DL4JTrainTestSetEvaluator now store the model rather than the configurator in the container that they forward.
    • DL4JDatasetIterator now fits the preprocessor first if it is an instance of DataNormalization.
  • adams-weka: Changing the model file in the Investigator's classify/cluster tab now correctly resets any previously loaded model.

Changes

  • Now using the processoutput4j library (https://github.com/fracpete/processoutput4j) for capturing the output from processes launched from within Java.
  • The following sources now have an additional conversion option to directly convert their output to a different type: Variable, VariablesArray, CombineVariables, StorageValue, StorageValuesArray, CombineStorage, StringConstants.
  • The IncVariable and IncStorageValue transformers can output the incremented value now instead of forwarding the input token.
  • MessageDigest can operate on arrays now as well, computing a single digest over all of them.
  • Upgraded lanterna to 3.0.0-rc1 (used for terminal-based user interfaces).
  • Added option to the CsvSpreadSheetReader to drop rows with too few/many cells: -skip-differing-rows
  • adams-dl4j:
    • added support for mini-batches to DL4JCrossValidationEvaluator, DL4JTrainModel and DL4JTrainTestSetEvaluator.
    • added support for listeners when training a model
    • added deeplearning4j-ui_2.10 dependency, to monitor training progress using InMemoryStatsListenerConfigurator
    • DL4JTrainModel/DL4JTrainTestEvaluation/DL4JCrossValidationEvaluator now store the final epoch number in the container (model/evaluation).
    • DL4JTrainModel allows incremental training now, outputting the model every X epochs (output interval).
  • adams-rats: added accepted/generated types of Rat input/output to additional information output displayed in the help screen.
  • adams-spreadsheet:
    • SpreadSheetSubset now supports R-like matrix subset expressions ,3:9 instead of specifying row and col ranges.
    • SpreadSheetSplitColumn now uses the header for the generated columns if it can be split into the same number of elements.
  • adams-weka:
    • The SpreadSheetToWekaInstances conversion can enforce STRING attributes now by using -1 as maxLabels.
    • The PartitionedMultiFilter2 (and therefore MetaPartitionedMultiFilter) now filters the data only once during the first batch, resulting in speed improvements.
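
Computing a single digest over several arrays, as the MessageDigest change in the list above describes, amounts to feeding all of them into one digest instance before finalizing it. The sketch below uses the JDK's java.security.MessageDigest; the class name DigestSketch, the byte-array input type and the hex output are assumptions, since the notes do not spell out these details.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSketch {

  /** Computes one hex-encoded digest over all supplied arrays, in order. */
  static String digest(String algorithm, byte[]... chunks) {
    try {
      MessageDigest md = MessageDigest.getInstance(algorithm);
      for (byte[] chunk : chunks)
        md.update(chunk);  // accumulate, do not finalize yet
      StringBuilder hex = new StringBuilder();
      for (byte b : md.digest())
        hex.append(String.format("%02x", b));
      return hex.toString();
    }
    catch (NoSuchAlgorithmException e) {
      throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
    }
  }

  public static void main(String[] args) {
    byte[] first = "ab".getBytes(StandardCharsets.UTF_8);
    byte[] second = "c".getBytes(StandardCharsets.UTF_8);
    // same digest as for the single array "abc"
    System.out.println(digest("SHA-256", first, second));
  }
}
```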

Additions

  • With the adams.logging.Logging console application, you can connect to an ADAMS instance that is, for instance, running as a daemon/service, listening to its logging output. The logging.sh/logging.bat scripts start the listening application (just outputs the logging to stdout).
  • New boolean conditions for checking boolean flags: StorageFlagSet and VariableFlagSet.
  • Convenience transformer for setting a boolean flag in storage: SetStorageFlag.
  • Added new menu item Full expansion to the Flow editor for creating a fully expanded flow (i.e., pulls in all external actors).
  • Added FileTailer transformer for monitoring text files à la tail -f on Unix systems.
  • Added most of the functionality of the Remote Control Center GUI to the terminal-based interface (can be started up with adams.terminal.Main).
  • Added RightPad conversion for padding strings on the right-hand side.
  • New module adams-rsync for rsync support:
    • RSync - offers all (!) rsync options
    • Rsync4jRsyncBinary - outputs the rsync binary used by rsync4j library
    • Rsync4jSshBinary - outputs the ssh binary used by rsync4j library
    • SimpleRSync - commonly used rsync options
  • adams-dl4j:
    • added NormalizerMinMaxScaler and NormalizerStandardize dataset pre-processors for scaling numeric attributes.
    • added SimpleRegressionMultiLayerNetwork as an example for performing regression.
  • adams-twitter:
    • upgraded twitter4j to 4.0.6
    • Added TwitterUser transformer for retrieving information about a user.
  • adams-visualstats: added MOA-based CUSUM (cumulative sum) and Page-Hinkley test control charts.
  • adams-weka: added Kennard-Stone filter.
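
The two scaling schemes that the NormalizerMinMaxScaler and NormalizerStandardize pre-processors are named after can be sketched on a single numeric column as follows. This is plain arithmetic and not the deeplearning4j code; the class name ScalingSketch, the use of the population standard deviation and the handling of constant columns (all zeros) are assumptions.

```java
import java.util.Arrays;

public class ScalingSketch {

  /** Min-max scaling: maps the values linearly onto [0, 1]. */
  static double[] minMax(double[] values) {
    double min = Arrays.stream(values).min().orElse(0.0);
    double max = Arrays.stream(values).max().orElse(0.0);
    double[] result = new double[values.length];
    if (max == min)
      return result;  // assumption: constant column becomes all zeros
    for (int i = 0; i < values.length; i++)
      result[i] = (values[i] - min) / (max - min);
    return result;
  }

  /** Standardization: zero mean and unit (population) standard deviation. */
  static double[] standardize(double[] values) {
    double mean = Arrays.stream(values).average().orElse(0.0);
    double variance = 0.0;
    for (double v : values)
      variance += (v - mean) * (v - mean);
    double stdev = Math.sqrt(variance / values.length);
    double[] result = new double[values.length];
    if (stdev == 0.0)
      return result;  // assumption: constant column becomes all zeros
    for (int i = 0; i < values.length; i++)
      result[i] = (values[i] - mean) / stdev;
    return result;
  }

  public static void main(String[] args) {
    // prints [0.0, 0.5, 1.0]
    System.out.println(Arrays.toString(minMax(new double[]{2, 4, 6})));
    System.out.println(Arrays.toString(standardize(new double[]{2, 4, 6})));
  }
}
```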

Updates 2017/05/12

Being busy with commercial ADAMS projects still results in an ample number of improvements to the base ADAMS system. The last few weeks were no exception.

Fixes

  • The table model for spreadsheets now displays NaN, +/-Infinity as strings.
  • Spreadsheet writers that can use formatting now use 'NaN' and '+/-Infinity' strings for these numbers.
  • Fixed the forceVariables method for Tee/Trigger/LoadBalancer/WhileLoop and derived actors: internally used Sequence actor gets updated correctly now.
  • adams-net: FTPSend and SFTPSend now forward the successful filenames as the documentation says.
  • adams-dl4j: fixed handling of regression problems.

Changes

  • In order to make actor names unique, they now get " (x)" appended, with x being a number starting from 2.
  • The SetVariable standalone/transformer can interpret the variable value now as boolean, string or mathematical expression, making it easier to compute new values.
  • Added ability to use custom dirs/jars for JDeps control actor instead of the application's classpath.
  • The CallableActorScreenshot control actor can now forward the screenshot as a BufferedImageContainer as well, not just store it in a file.
  • The actorFile property can now contain programmatically set variables like flow_dir, enabling actors derived from the external actor include to make use of a variable as well (relative to the main flow). Instead of attaching a variable to the property, you have to use mixed notation: @{flow_dir}/some.flow.
  • Added equal frequency calculation to the ArrayHistogram statistic.
  • The RandomNumberGenerator source can output arrays now.
  • With the ArrayHistogramRanges transformer it is possible to output the interval ranges that the ArrayHistogram statistic generates (easier than iterating through the header names of the generated spreadsheet).
  • Added support for restorable actors, ones that can write/read their state to/from disk during execution; currently supported by: EnterValue, EnterManyValues, SelectDirectory, SelectFile.
  • adams-spreadsheet:
    • SpreadSheetStatistic now supports column names in locations.
    • SpreadSheetExtractArray can output strings now as well, instead of just the native cell object type.
  • adams-weka:
    • WekaInstancesStatistic now supports attribute names in locations.
    • The WekaGeneticAlgorithm transformer can be initialized from a WekaGeneticAlgorithmInitializationContainer container now, containing algorithm and training data.
  • adams-spectral-2dim got renamed to adams-spectral-2dim-core.
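
The renaming scheme for unique actor names described in the list above (a numeric suffix starting from 2) can be sketched as shown below. This is an illustration only: the class and method names are hypothetical, and the exact suffix format " (x)" is read from the note rather than from the ADAMS source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UniqueNameSketch {

  /** Returns the name unchanged if unused, otherwise appends " (x)" with x >= 2. */
  static String uniqueName(String name, Set<String> existing) {
    if (!existing.contains(name))
      return name;
    int x = 2;
    while (existing.contains(name + " (" + x + ")"))
      x++;
    return name + " (" + x + ")";
  }

  public static void main(String[] args) {
    Set<String> names = new HashSet<>(Arrays.asList("Display", "Display (2)"));
    System.out.println(uniqueName("Display", names));  // prints Display (3)
  }
}
```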

Additions

  • Added HasClass boolean condition that checks whether the specified class is available on the classpath.
  • Added StringExpression source and transformer for evaluating string processing expressions, like left(upper("Hello World!"), 5).
  • Added meta-marker paintlet ByNameMarkerPaintlet that matches the name of the sequence against the supplied regular expression to determine whether to paint the markers or not.
  • adams-pdf: added MetaHeadline PDF proclet to insert headline and then apply a base-proclet.
  • adams-spreadsheet: the SpreadSheetHistogramRanges transformer is the equivalent of ArrayHistogramRanges but for SpreadSheet objects.
  • adams-weka:
    • The WekaInstancesHistogramRanges transformer is the equivalent of ArrayHistogramRanges but for Instances objects.
    • Added support for using test data to the WekaGeneticAlgorithm transformer, but only Hermione takes advantage of it.
    • Added convenience transformer WekaGeneticAlgorithmInitializer to generate a WekaGeneticAlgorithmInitializationContainer container for priming a genetic algorithm.
  • Added some modules to the adams-spectral-base framework:
    • adams-spectral-2dim-handheld contains support for some handheld NIR scanners, like the SCiO (https://www.consumerphysics.com/myscio/scio/).
    • adams-spectral-2dim-webservice adds webservice capability
    • adams-spectral-2dim-rats adds RATS support
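
For readers unfamiliar with the StringExpression example from the list above, left(upper("Hello World!"), 5), here is what the equivalent plain Java would look like. This assumes that upper converts to upper case and that left returns the first n characters; both helpers are hypothetical stand-ins, not the ADAMS expression parser.

```java
public class StringExpressionSketch {

  /** Assumed semantics of upper(str): upper-case the whole string. */
  static String upper(String s) {
    return s.toUpperCase();
  }

  /** Assumed semantics of left(str, n): the first n characters (or fewer). */
  static String left(String s, int n) {
    return s.substring(0, Math.min(n, s.length()));
  }

  public static void main(String[] args) {
    System.out.println(left(upper("Hello World!"), 5));  // prints HELLO
  }
}
```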

Have a great weekend!