Friday 13th (2016/05)

A whole bunch of changes, just in time for Friday the 13th. :-)

Fixes

  • The new from clipboard menu item in the Flow editor no longer requires an editable flow.
  • adams-rats: the Rat standalone and derived actors now handle stopping of the flow better; no more NullPointerException when calling stopIfNecessary.
  • adams-spreadsheet: SpreadSheetFileReader's getItemClass() method now checks whether the reader returns a spreadsheet class or not to avoid NullPointerException.
  • The SetVariable transformer no longer uses the content passing through as the value if the variable attached to variableValue has an empty value.
  • Upgraded some libraries and barred other, duplicate ones from being added to the build process to avoid duplicate jars. This was triggered by ADAMS no longer starting at one of our customers due to java.lang.IncompatibleClassChangeError exceptions.

Changes

  • The Breakpoint control actor now sports copy to clipboard and paste as new flow buttons on the Source tab.
  • It is now possible to have platform-specific properties files. The default one with extension .props is always read first and an existing platform-specific one can override its values. Extensions for the platforms are as follows:
    • Linux: .linux
    • Android: .android
    • Mac: .mac
    • Windows: .windows
  • The Flow editor now has Mac-specific shortcuts (feedback welcome, whether these make sense or need adapting). Other standard keystrokes in the GUI get replaced with Mac-specific ones as well. The actor tree popup has improved search and display: empty search displays all actors again, search starts with first character, clicking on filter/strict checkboxes expands the tree fully.
  • The Edit external flow menu item in the flow editor's tree popup menu now opens the flow in a new tab rather than a dialog.
  • Upgraded the jfilechooser-bookmarks dependency, with the filechoosers now sporting a copy and a paste button for the current directory.
  • Switched to [jclipboardhelper library](https://github.com/fracpete/jclipboardhelper) for clipboard-related operations.
  • The CombineVariables source now shortens its quickinfo display to 40 characters to avoid overly long displays.
  • adams-spreadsheet: The SpreadSheetDbWriter sink now supports batch import, which can speed up data import dramatically.
  • Boolean conditions in the flow now use the actor's logger to allow locating of offending conditions.
  • adams-visualstats: refactored the module to be SpreadSheet-based and removed the adams-weka dependency.
  • adams-video: the TrailDisplay sink now allows you to save the trail in the simple trail format (Save text as...).
  • adams-weka: the WekaInstancesToSpreadSheet conversion now creates a lightweight view if the InstancesView spreadsheet type is selected. Of course, the view doesn't support all the features of a spreadsheet object, but it avoids a costly conversion if only read operations are ever performed. Adding and removing rows is possible, but not changing column names or inserting/deleting columns.
  • Internally, the filter class hierarchy got refactored and now uses the new adams.data.filter.Filter interface instead of the abstract superclass adams.data.filter.AbstractFilter. This was necessary in order to introduce the new adams.data.filter.BatchFilter interface. This new interface is for filters that can operate on a number of data containers at a time. The interface does not require a filter to return the same number of data containers after filtering. Currently, only the PassThrough filter implements this new interface (as an educational example).
  • Now using Groovy 2.4.5 in order to align with Weka's kfGroovy package.
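
To illustrate the platform-specific properties files mentioned above (the default .props file is read first, an existing platform-specific one overrides its values), here is a minimal sketch. It is not the actual ADAMS code: the class and method names are made up, the file contents are passed in as strings to keep the example self-contained, the os.name based detection and the fallback to .linux are assumptions, and Android detection is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical sketch: layering a platform-specific properties file over the default one. */
class PlatformPropsSketch {

  /** Maps an os.name value to one of the extensions listed above (assumed detection logic). */
  static String platformExtension(String osName) {
    String os = osName.toLowerCase(Locale.ROOT);
    if (os.contains("mac"))
      return ".mac";
    if (os.contains("win"))
      return ".windows";
    return ".linux";  // assumption: everything else is treated like Linux
  }

  /** Reads the default content first; keys in the platform content override it. */
  static Properties load(String defaults, String platform) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(defaults));
      if (platform != null)
        props.load(new StringReader(platform));  // same keys get overwritten
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  public static void main(String[] args) {
    // made-up keys and values, standing in for Some.props and Some.mac
    Properties props = load("Font=Dialog\nSave=ctrl S\n", "Save=meta S\n");
    System.out.println("Extension: " + platformExtension(System.getProperty("os.name")));
    System.out.println("Font=" + props.getProperty("Font") + ", Save=" + props.getProperty("Save"));
  }
}
```

Keys that only exist in the default file stay untouched, so a platform-specific file only needs to list the values that differ.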

Additions

  • adams-rats: the RatPlague rat simply duplicates itself to enable parallel execution of the same code, but working off different queues (e.g., load balanced queues)
  • adams-core
    • The ForwardingScriptingEngine for the remote command framework takes commands obtained via a base scripting engine and forwards them to a defined forwarding connection. This can be used for performing load-balancing.
    • DefaultMasterScriptingEngine and DefaultSlaveScriptingEngine allow for distributed computation. The slave engines simply register with the master and then wait for jobs to execute. The master performs load-balancing on the registered slaves. This allows sending of sub-flows (incl storage items and variables) to be executed on remote machines.
    • The ManualFeedScriptingEngine is for internal use only, as it requires the remote commands to be added programmatically.
    • The ExecuteRemoteCommand executes the remote command coming through, making use of the ManualFeedScriptingEngine.
    • ContainerToVariables and ContainerToStorage allow easy transfer of container values whose names match a regular expression into variables/internal storage.
  • adams-maps: added a spreadsheet reader for Arc/Info ASCII Grid (or Esri Grid) files.
  • adams-weka: added GaussianProcessesNoWeights classifier, which is just a copy of GP before it could natively handle instance weights (merely uses resampling).
  • adams-ml: simple actual vs predicted plots can be generated with the (aptly named) ActualVsPredictedPlot sink, which uses a spreadsheet as input.
  • adams-imaging: added handlers for displaying image metadata in the preview browser.
  • started incubating the adams-jclouds and adams-openstack modules for cloud integration.
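
The master/slave setup described above (slaves register with the master, the master load-balances jobs across them) could look roughly like the following sketch. This is only an illustration under my own assumptions: the class and method names are invented, and picking the slave with the fewest pending jobs is just one possible balancing strategy, not necessarily the one the DefaultMasterScriptingEngine uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a master that balances jobs across registered slaves. */
class MasterSketch {

  /** A registered slave, tracked only by name and number of pending jobs. */
  static final class Slave {
    final String name;
    int pending;

    Slave(String name) {
      this.name = name;
    }
  }

  private final List<Slave> slaves = new ArrayList<>();

  /** Called when a slave announces itself to the master. */
  void register(String name) {
    slaves.add(new Slave(name));
  }

  /** Assigns the job to the slave with the fewest pending jobs and returns its name. */
  String dispatch(String job) {
    if (slaves.isEmpty())
      throw new IllegalStateException("No slaves registered, cannot run: " + job);
    Slave best = slaves.get(0);
    for (Slave slave : slaves) {
      if (slave.pending < best.pending)
        best = slave;
    }
    best.pending++;
    return best.name;
  }

  /** Called when a slave reports that it finished a job. */
  void finished(String name) {
    for (Slave slave : slaves) {
      if (slave.name.equals(name) && slave.pending > 0)
        slave.pending--;
    }
  }

  public static void main(String[] args) {
    MasterSketch master = new MasterSketch();
    master.register("slave-1");
    master.register("slave-2");
    for (String job : new String[]{"flow-a", "flow-b", "flow-c"})
      System.out.println(job + " -> " + master.dispatch(job));
  }
}
```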

Update 2016/04/26

It's been a while since the last post regarding updates, as I was away from work for a while for once and then rather busy (because of being away). I just never seemed to get around to posting any updates. Anyhow, here you go! :-)

Fixes

  • Turning actor into callable actor didn't refresh the tree properly when there wasn't a CallableActors standalone yet present
  • Main window now shows the currently connected databases again
  • CsvSpreadSheetWriter
    • fixed enncoding typo in command-line flag
    • added no-header flag to output only data cells
    • added check-file-exists flag to reset the writer (ie the stored header), if the file no longer exists (eg got deleted elsewhere in flow)
  • spreadsheet's DoubleCell used wrong format for DateTimeMSec types, DateTimeFormat instead of DateTimeMSecFormat
  • FlowRunner now no longer generates a NullPointerException when using the SpreadSheetVariableRowIterator transformer. Flow Runner and Editor now use the same code base for executing flows.
  • Image viewer's meta-data extraction using Sanselan no longer chokes on placeholder files (eg ${HOME}/some/where/image.jpg)
  • fixed handling of variables within remote command created by NewRemoteCommand source
  • loading flows with just a single actor no longer fails (erroneously got identified as non-compact format)
  • Object editors now display empty strings as [empty] rather than invisible.
  • adams-rats: the Rat standalone now outputs flow errors with the logger of the actor that generated the error, instead of quietly discarding the error messages.

Changes

  • adams-spreadsheet:
    • NewSpreadSheet source now allows initialization of comments
    • FixedTabularSpreadSheetWriter now allows specification of column width per column and the omission of the headers and borders
  • Spreadsheet readers that handle missing values now use a regular expression for the missing value, rather than just a simple string.
  • The flow editor now warns when the user tries to externalize a range of actors that contain actors that reference a callable actor.
  • Renamed control actor TriggerRemoteExecution to RemoteExecutionTrigger.
  • DownloadFile sink in adams-net now shows the output file in the quick info as well.
  • adams-rats: the Rat actor now allows forwarding flow execution errors to an error queue as well (incl the associated incoming data that was processed at the time).
  • TryCatch control actor now allows custom error post-processors to be specified, e.g., reacting to an OutOfMemory exception.
  • Swapping actors in the flow editor now transfers more options (however, there is no guarantee that all of them get transferred)
  • adams-maps: finally received some documentation on PostgreSQL/PostGIS handling.
  • adams-spectral-core: the SpectrumExplorer now sports a recent files sub-menu
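
As a small illustration of why a regular expression for missing values is more flexible than a simple string: a single pattern can cover several spellings at once. The sketch below uses plain java.util.regex; the class name, the pattern and the decision to match the whole cell content are my assumptions, not the readers' actual implementation.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: detecting missing values with a regular expression. */
class MissingValueSketch {

  /** Returns true if the whole cell content matches the missing-value pattern. */
  static boolean isMissing(String cell, Pattern missing) {
    return missing.matcher(cell).matches();
  }

  public static void main(String[] args) {
    // one pattern covers empty cells, "?", "NA" and "N/A" (case-insensitive)
    Pattern missing = Pattern.compile("^(\\?|na|n/a|)$", Pattern.CASE_INSENSITIVE);
    for (String cell : new String[]{"", "?", "N/A", "na", "3.14", "nana"})
      System.out.println("'" + cell + "' missing: " + isMissing(cell, missing));
  }
}
```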

Additions

  • adams-spreadsheet:
    • SpreadSheetAppendComments transformer allows the modification of spreadsheet comments
    • FixedTabularSpreadSheetReader for tabular data that uses fixed column-widths
  • adams-imaging: the BufferedImage transformer ThresholdReplacement allows the replacement of pixels that fall below/above a threshold in the grayscale space with a specified replacement color
  • adams-core:
    • conversions SimpleUnicodeToAscii and SimpleAsciiToUnicode allow conversion of unicode characters into ASCII counterparts like 'xABCD' and vice versa.
    • added FileBasedScriptingEngine that monitors a directory for remote commands to execute, rather than on a port (eg scp-ing files into a server)
    • added SendFile remote command to transfer a binary file
    • using the FTPConnection scripting connection, it is possible to transmit remote commands using FTP
    • the Multicast scripting connection allows you to send the same remote command to multiple hosts
    • multi handlers for request/response handling are now available for the remote command framework
    • NewTempFile source generates a unique, temp file name.
    • added class hierarchy of error post-processors to be used in conjunction with error handlers (see TryCatch).
  • adams-rats: rat output scheme QueueDistribute allows load-balancing among the defined queues
  • adams-weka:
    • command-to-code dependency for turning Weka command-lines into code (WekaCommandToCode conversion). WEKA Command to code tool (user-mode: developer) supersedes WEKA Options conversion
    • added copy of LinearRegression classifier as LinearRegressionJ since the upcoming release of Weka will feature a modified version
    • added new classifier for numeric classes that uses a predefined formula, MathExpressionClassifier, with the attribute values being available in the formula via their names.
  • adams-spectral-core: added new reader for Opus files, called OpusSpectrumReaderExt
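
For readers unfamiliar with fixed column-width data, as handled by the FixedTabularSpreadSheetReader mentioned above, here is a minimal sketch of the basic idea: each line gets cut into cells according to a list of column widths. The class name, the method and the sample data are made up, and the real reader's file format (headers, borders) is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: cutting a line of fixed-width data into cells. */
class FixedWidthSketch {

  /** Cuts the line into cells of the given widths and trims the padding. */
  static List<String> split(String line, int... widths) {
    List<String> cells = new ArrayList<>();
    int pos = 0;
    for (int width : widths) {
      // lines shorter than the sum of the widths yield empty cells
      int start = Math.min(pos, line.length());
      int end = Math.min(pos + width, line.length());
      cells.add(line.substring(start, end).trim());
      pos += width;
    }
    return cells;
  }

  public static void main(String[] args) {
    // three columns with widths 3, 4 and 4
    for (String line : new String[]{"1  abc 2.50", "22 de  10.0"})
      System.out.println(split(line, 3, 4, 4));
  }
}
```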

Updates 03/03

A bit of a bigger post regarding changes, as I've been busy finalizing my MOOC material - the first two MOOCs are available again and the third one is slated for end of April (at the earliest). I've also been quite busy with some of my commercial projects. Hence the accumulation of changes...

Fixes

  • adams-core: it is now possible to turn off parsing of formulas for the CsvSpreadSheetReader, useful when replaying twitter archives.
  • adams-spreadsheet:
    • The conversion SpreadSheetJoinColumns now keeps the cell types intact.
    • SpreadSheetAppend transformer keeps cell types intact, too (SpreadSheetHelper class fixed).
    • added SpreadSheetView class that allows wrapping another spreadsheet with a subset of rows and columns
    • introduced SpreadSheetViewCreator interface, implemented by SpreadSheetColumnFilter, SpreadSheetRowFilter and SpreadSheetSubset (experimental)
  • adams-twitter: reading of CSV archives now avoids parsing of formulas (a lot of tweets contain =)
  • adams-video: added a handler for videos to the Preview Browser, using the VLCj Video Player component
  • fixed saving of font preferences
  • expanded external flows no longer get saved alongside the actual flow with the new flow format
  • option handling: OptionManager now correctly removes (= hides) options
  • adams-weka: the Append datasets wizard now handles string and relational attributes correctly

Changes

  • Flows are now stored by default in a compact format, which also allows the storage of invalid flows. By allowing this, you won't lose any data when saving and reopening a flow, with the check flow turned off. Another side effect is that large flows load faster - but you have to have 1000s of actors to notice that.
  • The annoying tool tip in Weka's GOE class tree is now disabled by default. You can enable it again by changing the ShowGlobalInfoToolTip property in the GUIEditors.props file. This file should be in the root directory of your binary ADAMS installation.
  • Tables displaying spreadsheets now allow changing the number of decimals being displayed via the table cell popup menu and calculating the optimal column width(s) via the table header popup menu.
  • The Text operation for the Draw transformer in the adams-imaging module now supports multi-line text as well.
  • Comparing text tool now allows copy/pasting of files by right-clicking on the filename display above the diff view
  • adams-weka:
    • WekaInstancesInfo transformer can generate info also from weka.core.Instance objects.
    • WekaFilter now allows output of built filter alongside filtered data in a container (allows filter serialization); serialized filters can be loaded from disk now as well (pre-built filters, for instance)
  • The SpreadSheet class has been refactored into the SpreadSheet interface and the DefaultSpreadSheet class. This will allow for more memory efficient wrappers for spreadsheet-like objects (eg database tables, Weka Instances) in the future, instead of having to convert the data in a CPU and memory intensive step.
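
The idea behind interface-based wrappers and views, as opposed to converting the data, can be sketched as follows. The Table interface and the two classes are deliberately tiny stand-ins that I made up for this example; they are not the ADAMS SpreadSheet interface, DefaultSpreadSheet or SpreadSheetView classes.

```java
/** Hypothetical sketch: a view over a subset of rows/columns that copies no cells. */
class SpreadSheetViewSketch {

  /** Minimal read-only table abstraction (stand-in for a spreadsheet interface). */
  interface Table {
    int rows();
    int cols();
    Object get(int row, int col);
  }

  /** Table backed by a 2D array; stands in for the default implementation. */
  static final class ArrayTable implements Table {
    private final Object[][] cells;

    ArrayTable(Object[][] cells) {
      this.cells = cells;
    }

    @Override public int rows() { return cells.length; }
    @Override public int cols() { return cells.length == 0 ? 0 : cells[0].length; }
    @Override public Object get(int row, int col) { return cells[row][col]; }
  }

  /** View that only translates indices and delegates to the wrapped table. */
  static final class SubsetView implements Table {
    private final Table base;
    private final int[] rowIndices;
    private final int[] colIndices;

    SubsetView(Table base, int[] rowIndices, int[] colIndices) {
      this.base = base;
      this.rowIndices = rowIndices.clone();
      this.colIndices = colIndices.clone();
    }

    @Override public int rows() { return rowIndices.length; }
    @Override public int cols() { return colIndices.length; }
    @Override public Object get(int row, int col) {
      return base.get(rowIndices[row], colIndices[col]);
    }
  }

  public static void main(String[] args) {
    Table base = new ArrayTable(new Object[][]{{"a", 1, 1.5}, {"b", 2, 2.5}, {"c", 3, 3.5}});
    Table view = new SubsetView(base, new int[]{2, 0}, new int[]{0, 2});
    for (int r = 0; r < view.rows(); r++)
      System.out.println(view.get(r, 0) + ", " + view.get(r, 1));
  }
}
```

Code that only reads cells through the interface works the same with the array-backed table and the view, which is what makes such wrappers cheap compared to a full conversion.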

Additions

  • adams-core: With the HexReplace transformer you can replace byte sequences (find and replace entered in hex notation).
  • adams-spreadsheet: added simplified version of the CsvSpreadSheetReader called SimpleCsvSpreadSheetReader (uses CsvSpreadSheetReader behind the scenes).
  • generic framework for remote commands (see below for details)
  • adams-heatmap: added support for selection processors in the heatmap viewer, currently implemented is Crop
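
To give an idea of what replacing byte sequences entered in hex notation means in practice, here is a minimal sketch. It is not the HexReplace transformer's code: the class and method names are invented and the hex strings are assumed to be plain digit pairs without any prefix or separators.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: find and replace of byte sequences given in hex notation. */
class HexReplaceSketch {

  /** Converts hex notation such as "0d0a" into the corresponding bytes. */
  static byte[] fromHex(String hex) {
    if (hex.length() % 2 != 0)
      throw new IllegalArgumentException("Odd number of hex digits: " + hex);
    byte[] result = new byte[hex.length() / 2];
    for (int i = 0; i < result.length; i++)
      result[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
    return result;
  }

  /** Replaces every occurrence of the 'find' sequence with the 'replace' sequence. */
  static byte[] replace(byte[] data, String findHex, String replaceHex) {
    byte[] find = fromHex(findHex);
    byte[] repl = fromHex(replaceHex);
    if (find.length == 0)
      return data.clone();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int i = 0;
    while (i < data.length) {
      if (matchesAt(data, i, find)) {
        out.write(repl, 0, repl.length);
        i += find.length;
      }
      else {
        out.write(data[i]);
        i++;
      }
    }
    return out.toByteArray();
  }

  /** Checks whether 'find' occurs in 'data' starting at the given position. */
  private static boolean matchesAt(byte[] data, int pos, byte[] find) {
    if (pos + find.length > data.length)
      return false;
    for (int j = 0; j < find.length; j++) {
      if (data[pos + j] != find[j])
        return false;
    }
    return true;
  }

  public static void main(String[] args) {
    // turn Windows line endings (0d0a) into Unix ones (0a)
    byte[] input = {0x61, 0x0d, 0x0a, 0x62, 0x0d, 0x0a};
    System.out.println(input.length + " bytes -> " + replace(input, "0d0a", "0a").length + " bytes");
  }
}
```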

Remote commands

Below, a short overview of new functionality around the remote command framework. Ultimately, this framework will be used to send sub-flows for execution off into the cloud. It is already possible to send worker flows to remote servers, e.g., for performing computationally expensive modeling. But with the planned cloud integration, you will be able to send such jobs into, for instance, OpenStack clouds.

  • adams-core:
    • NewRemoteCommand instantiates a new command
    • RemoteCommandReader loads a command from disk
    • RemoteCommandWriter saves a command to disk
    • GetRemoteCommandPayload retrieves the payload objects from a command
    • SendRemoteCommand sends a command to a remote ADAMS instance for execution (eg executing worker flow)
  • adams-net: added connection schemes for the remote command framework that use SSH tunnelling, sending commands through the SSH port or via scp (secure copy)

That's it with ADAMS development for the time being, as I'll be enjoying time away from work. :-)

AbstractActor -> Actor - update

Just finished committing and kicked off a build.

All repositories should have a new tag from just before the commit:

2016-02-15_before_abstractactor_to_actor_refact

Let me know if you come across any errors or weird behavior after updating your code base.

AbstractActor -> Actor

On Friday, I embarked on a major refactoring mission, something that I've put off for years now.

A few years into the development of ADAMS, I introduced the Actor interface, attempting to make things simpler for other derived frameworks and to avoid casting etc. under the hood. However, using it throughout required touching a lot of classes, including a lot of production code, which is why I shied away from doing it.

But, things change and I'm in the process of developing some other commercial frameworks from the ground up (but still based on ADAMS). Instead of using AbstractActor in methods and member variables, I've switched to using the Actor interface, getting rid of a lot of unnecessary (and potentially dangerous) casts. There were about 6,500 occurrences of AbstractActor in the various code bases (including unit tests), so it took a while going through all of them and changing the ones where it made sense. At this stage (haven't checked anything in yet), everything compiles and the unit tests pass. I'll still be doing a lot more testing, but I'm aiming for a massive commit on Monday night with all those changes. So, the next time you update your code, it might take a bit longer than usual... ;-)
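
For those wondering what the difference looks like in code, here is a heavily simplified sketch. The Actor and AbstractActor types below are stand-ins that I wrote for this example (two methods only), not the real ADAMS types, and all other names are made up.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: programming against the interface instead of the abstract class. */
class ActorInterfaceSketch {

  /** Simplified stand-in for the Actor interface. */
  interface Actor {
    String getName();

    /** Returns null if successful, otherwise an error message. */
    String execute();
  }

  /** Simplified stand-in for the abstract superclass. */
  static abstract class AbstractActor implements Actor {
    private final String name;

    AbstractActor(String name) {
      this.name = name;
    }

    @Override public String getName() { return name; }
  }

  /** Regular actor derived from the abstract superclass. */
  static final class Succeeding extends AbstractActor {
    Succeeding(String name) {
      super(name);
    }

    @Override public String execute() { return null; }
  }

  /** Actor from a derived framework that implements the interface directly. */
  static final class External implements Actor {
    @Override public String getName() { return "external"; }
    @Override public String execute() { return "not configured"; }
  }

  /** Accepts any Actor; a List of AbstractActor parameter could not take External at all. */
  static String firstError(List<? extends Actor> actors) {
    for (Actor actor : actors) {
      String error = actor.execute();
      if (error != null)
        return actor.getName() + ": " + error;
    }
    return null;
  }

  public static void main(String[] args) {
    List<Actor> actors = Arrays.asList(new Succeeding("ok"), new External());
    System.out.println(firstError(actors));
  }
}
```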

I'll keep you posted!

updates 9/2/16

I finally finished recording my lessons for the upcoming MOOC, which gives me a bit of breathing space for some ADAMS stuff again:

Fixes

  • adams-spreadsheet: spreadsheet file viewer's close operation now actually closes the window
  • implemented more efficient undo/redo in the Flow editor, which is noticeably faster for large flows with 100s or 1000s of actors
  • adams-spreadsheet: SpreadSheetVariableRowIterator and SpreadSheetStorageRowIterator now handle missing values correctly (not just missing cells)
  • adams-excel: the ExcelStreamingSpreadSheetReader now handles missing cells in the XML data stream properly

Changes

  • flow editor tabs Variables and Storage names now allow locating all usages of the selected variable/storage name - like the tree popup menu's Find usages
  • adams-weka: Weka upgraded to revision 12423 to include new GaussianProcesses classifier that can handle instance weights natively

Additions

  • Swap actor menu item got added to the Flow editor's tree popup menu.

Thanks for your feedback!

Cheers, Peter

updates 25/1

Fixes

  • adams-weka: added note to WekaAttributeSelection transformer that cross-validation will not produce any reduced/transformed data
  • adams-weka: upgraded Weka to revision 12379 to fix handling of string attributes in some filters

Changes

  • Flow tab Parameters now displays arrays as ordered list
  • adams-video: MjpegImageSequence and MovieImageSequence now have a parameter to limit the number of images generated
  • containers now have a short help string attached including the class that the item represents; this help gets displayed in an actor's help dialog, section Additional information
  • adams-core:
    • SelectObjects interactive source can be used for objects that don't belong to a class hierarchy now as well, e.g., adams.core.base.BaseString. The only requirement is that the object can be instantiated from a String.
    • ContainerValuePicker now has an ignoreMissing flag (on by default); turning it off allows you to generate logging/error in case of missing container values.
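
The requirement that an object can be instantiated from a String can be pictured with a bit of reflection, as in the sketch below. It relies on a public constructor taking a single String, which is only my assumption for the sake of the example; SelectObjects may well use a different mechanism internally. The class and method names are made up as well.

```java
import java.io.File;
import java.lang.reflect.Constructor;
import java.math.BigDecimal;

/** Hypothetical sketch: creating objects of arbitrary classes from a String. */
class FromStringSketch {

  /** Instantiates the class via its public single-String constructor. */
  static <T> T fromString(Class<T> cls, String value) {
    try {
      Constructor<T> constr = cls.getConstructor(String.class);
      return constr.newInstance(value);
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException(
        "Cannot instantiate " + cls.getName() + " from '" + value + "'", e);
    }
  }

  public static void main(String[] args) {
    // two unrelated JDK classes that both offer a String constructor
    BigDecimal number = fromString(BigDecimal.class, "1.50");
    File file = fromString(File.class, "some/where/image.jpg");
    System.out.println(number + " / " + file.getName());
  }
}
```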

Additions

  • adams-core: the SelectArraySubset transformer allows the user to select the desired array elements interactively
  • adams-imaging: added Average/Median multi-image operations
  • adams-heatmap: added HeatmapToBufferedImageExpression conversion which uses a mathematical expression for converting the heatmap values
  • adams-weka:
    • added main menu item for the new WEKA Workbench
    • added WekaAttributeSummary sink for visualizing attributes in a dataset
  • adams-video
    • added MovieInfo transformer for extracting information from a movie file
    • added MovieImageSampler transformer that outputs the images obtained through a sampling algorithm

fixed Weka performance issue

This week, I have been mainly working on bugfixes in various frameworks, website development (including the ADAMS one!) and on my content for the third MOOC in the Weka MOOC series. Hence it was a bit quieter on the ADAMS side of things.

Fixes

  • I took the unusual step of adding a non-release Weka to ADAMS. It is a post-3.7.13 release Weka (rev 12311), which fixes the severe performance issue that I uncovered the other week when using the setOptions method of Weka classes. Several large flows, with lots of Weka classes defined, in one of our commercial tools suddenly took more than 10 minutes to load.

Changes

  • adams-weka: the multi-search-weka-package dependency got upgraded to 2016.1.15

Additions

  • the new flow editor tabs Variables and Storage names allow you to list all variables/storage names that are used in the current flow. These tabs simply make the ListAllVariables and ListAllStorageNames actor processors available in a convenient way.

That's it for this week!

New Website launched

Up till now, the ADAMS website was very much static and a hand-crafted bootstrap3-based one. However, this has changed now and I switched to a Nikola powered one.

Nikola is a static site generator written in Python. As it uses plain text files as input, like reStructuredText or Markdown, and converts these into HTML using themes, it allows you to concentrate on the content rather than having to worry about manually crafting HTML code.

upgrades and fixes

The new year started quietly with a few library upgrades and only a few bug fixes.

Fixes

  • The Jump to functionality of the Find usages dialog in the flow editor was broken.
  • Find usages now also lists attached variables

Changes

  • adams-weka: Weka was upgraded to 3.7.13 - I also uncovered/reported a performance bottleneck in Weka, relating to a scheme's setOptions method. Due to some new functionality, this call was almost 100x slower than previously. This was only noticeable in classifiers that configured other schemes using this approach.
  • adams-weka: GridSearch was upgraded to 1.0.9 - mainly to work with 3.7.13
  • adams-video: VLCj was upgraded to the latest 3.10.1
  • adams-core: Highlight variables and Remove variable highlights menu items have been removed from the flow editor's View menu. Instead, you can use the Locate variable menu item from the Edit menu (also new is the Locate storage name menu item).

Additions

  • adams-video: new tool Screencast added, which allows you to record screencast data (sound/webcam/screen) to be used for creating tutorials, for instance.
  • adams-video: added the following standalones for automatically recording screencast data while a flow runs - but there is still a performance issue, which I need to investigate at some stage
    • RecordingSetup
    • StartRecording
    • StopRecording
  • adams-meka: added new visualization sinks for Meka
    • MekaGraphVisualizer
    • MekaMacroCurve
    • MekaMicroCurve
    • MekaPrecisionRecall
    • MekaROC