Docker

Docker is a great way of deploying applications for processing data, running webservices, etc., all nicely packaged up in a container. Java applications can be packaged with Docker as well and ADAMS is no exception. If you are looking for an introduction to Docker, then have a look at our:

Docker for Data Scientists

Pre-built images

Once a week, our build server pushes the current snapshots to the public Docker Hub registry, so you do not have to build the images yourself. These images are built on top of an Ubuntu Linux distribution, with ADAMS installed from the Debian packages that our build server also generates.

These pre-built Docker images are available from the following location:

hub.docker.com/u/waikatodatamining/

Currently, the following ADAMS snapshots get packaged in Docker images:

  • adams-base-all

  • adams-addons-all

  • adams-ml-app

  • adams-spectral-app

These images have the following scripts:

  • SNAPSHOT-gui - Starts the ADAMS user interface (requires a local X-Server, see below)

  • SNAPSHOT-exec - Executes a flow using the adams.flow.FlowRunner class

  • SNAPSHOT-daemon - Like SNAPSHOT-exec, it uses the FlowRunner class, but runs it in the background with a scripting engine. This scripting engine is then used for stopping the background flow.

The following docker command will spin up a container of the adams-ml-app image and map the current directory to the /workspace directory inside the container (-v `pwd`:/workspace). By using the local user's user ID and group ID (-u $(id -u):$(id -g)), we won't have any problems with permissions on any files that may get generated by the ADAMS system in the /workspace directory from within the container. For ADAMS to work correctly within the container, we have to set a few environment variables (ADAMS_... and WEKA_HOME), pointing to the right directories and user name.

docker run \
    --rm \
    --pull always \
    -u $(id -u):$(id -g) \
    -e USER=$USER \
    -e ADAMS_USERNAME=$USER \
    -e ADAMS_USERDIR=/workspace \
    -e ADAMS_USERHOME=/workspace \
    -e ADAMS_HOME=/workspace/adams \
    -e "ADAMS_PLACEHOLDERS=FLOWS=/workspace;EXAMPLE_FLOWS=/workspace" \
    -e WEKA_HOME=/workspace/wekafiles \
    -v `pwd`:/workspace \
    -it waikatodatamining/adams-ml-app:latest
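
If the command is needed repeatedly, it can be wrapped in a small shell function. The following is only a sketch: adams_run and the DRY_RUN variable are hypothetical names that are not part of ADAMS or the images, while the docker options are the ones from the command above. Unlike the command above, it quotes $(pwd), so that directories with spaces in their name should work as well.

```shell
#!/bin/sh
# adams_run IMAGE
# Hypothetical helper, not shipped with ADAMS: starts an interactive
# container with the same options as the command above.
# Set DRY_RUN=1 to print the docker command instead of running it.
adams_run() {
    image="$1"
    # ${DRY_RUN:+echo} expands to "echo" only if DRY_RUN is set and non-empty.
    ${DRY_RUN:+echo} docker run \
        --rm \
        --pull always \
        -u "$(id -u):$(id -g)" \
        -e USER="$USER" \
        -e ADAMS_USERNAME="$USER" \
        -e ADAMS_USERDIR=/workspace \
        -e ADAMS_USERHOME=/workspace \
        -e ADAMS_HOME=/workspace/adams \
        -e "ADAMS_PLACEHOLDERS=FLOWS=/workspace;EXAMPLE_FLOWS=/workspace" \
        -e WEKA_HOME=/workspace/wekafiles \
        -v "$(pwd):/workspace" \
        -it "$image"
}
```

For example, adams_run waikatodatamining/adams-ml-app:latest starts the container, and prefixing the call with DRY_RUN=1 only prints the command for inspection.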

Once the container is up and running, we can grab a flow and its relevant data for execution by running the following commands:

mkdir -p /workspace/data
wget -O /workspace/data/anneal.arff https://github.com/waikato-datamining/adams-base/raw/master/adams-weka/src/main/flows/data/anneal.arff
wget -O /workspace/adams-weka-build_classifier.flow https://github.com/waikato-datamining/adams-base/raw/master/adams-weka/src/main/flows/adams-weka-build_classifier.flow
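
Since wget -O creates the target file before the download starts, a failed download can leave an empty file behind. A quick check before running the flow can save some confusion later on. This is a minimal sketch; the function name check_inputs is made up for this example.

```shell
#!/bin/sh
# check_inputs FILE...
# Hypothetical helper: fails if any of the given files is missing or empty.
check_inputs() {
    for f in "$@"; do
        # -s is true only for files that exist and have a size greater than zero
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
}
```

It could be used like this: check_inputs /workspace/data/anneal.arff /workspace/adams-weka-build_classifier.flow && echo "inputs OK"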

And now we can execute the flow as follows:

adams-ml-app-exec \
  -headless true \
  -clean-up true \
  -force-exit true \
  -i /workspace/adams-weka-build_classifier.flow

Since we are working in a headless environment, i.e., one without a user interface, all input/output occurs in the terminal.

NB: Webservices or some user-interface related operations can leave some lingering threads which prevent the flow from exiting properly. Therefore, we are using the -force-exit true option to terminate any potential lingering threads once the flow has finished.

Local X-Server

If not already done in the current session, you need to allow local connections to your X-Server with xhost, so that the Docker container can display the ADAMS user interface on it:

xhost +local:root
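
The xhost entry stays in place until it is removed or the X session ends. If you would rather grant access only while a container is running, the entry can be removed again afterwards with xhost -local:root. The sketch below wraps both steps around an arbitrary command; with_x11_access is a hypothetical name and not part of ADAMS or Docker.

```shell
#!/bin/sh
# with_x11_access COMMAND [ARG...]
# Hypothetical helper: adds the xhost entry, runs the command, removes the
# entry again and returns the exit status of the command.
with_x11_access() {
    xhost +local:root || return 1
    status=0
    "$@" || status=$?
    xhost -local:root
    return "$status"
}
```

The command to wrap would be the docker run command with the X-Server options added, e.g., with_x11_access docker run followed by the options of your choice.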

Add the following two options to your docker command-line to pass through the X-Server:

-e "DISPLAY" \
-v "/tmp/.X11-unix:/tmp/.X11-unix" \

Which gives us the following full command:

docker run \
    --rm \
    --pull always \
    -u $(id -u):$(id -g) \
    -e USER=$USER \
    -e ADAMS_USERNAME=$USER \
    -e ADAMS_USERDIR=/workspace \
    -e ADAMS_USERHOME=/workspace \
    -e ADAMS_HOME=/workspace/adams \
    -e "ADAMS_PLACEHOLDERS=FLOWS=/workspace;EXAMPLE_FLOWS=/workspace" \
    -e WEKA_HOME=/workspace/wekafiles \
    -e "DISPLAY" \
    -v "/tmp/.X11-unix:/tmp/.X11-unix" \
    -v `pwd`:/workspace \
    -it waikatodatamining/adams-ml-app:latest
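
The -e "DISPLAY" option only passes the variable through if it is set on the host, and the volume mapping only helps if the socket directory exists. A small pre-flight check can catch both cases before the container starts. This is a sketch under those assumptions; x11_docker_args is a hypothetical helper, and its optional argument for the socket directory only exists to make the function easy to try out.

```shell
#!/bin/sh
# x11_docker_args [SOCKET_DIR]
# Hypothetical helper: prints the two X-Server options, one word per line,
# or fails with a message if the host side is not ready for them.
x11_docker_args() {
    socket_dir="${1:-/tmp/.X11-unix}"
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set on the host" >&2
        return 1
    fi
    if [ ! -d "$socket_dir" ]; then
        echo "X11 socket directory not found: $socket_dir" >&2
        return 1
    fi
    printf '%s\n' "-e" "DISPLAY" "-v" "$socket_dir:/tmp/.X11-unix"
}
```

Calling x11_docker_args >/dev/null && echo "X-Server options OK" before the docker command is one way of using it.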

Now we can execute the previous flow as follows and view the results graphically:

adams-ml-app-exec \
  -i /workspace/adams-weka-build_classifier.flow

Since we want to view the results, we had to drop the -headless true option as well as the -clean-up true and -force-exit true options, which would otherwise remove all graphical output. As a consequence, we now have to stop the process ourselves by pressing Ctrl+C.
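
The difference between the two invocations can be captured in a small function that picks the options depending on whether a display is available inside the container. This is only a sketch: flow_exec_args is a hypothetical name, and the options are the ones used in the two commands of this section.

```shell
#!/bin/sh
# flow_exec_args FLOW
# Hypothetical helper: prints the options for the SNAPSHOT-exec script,
# one word per line, depending on whether DISPLAY is set.
flow_exec_args() {
    flow="$1"
    if [ -z "${DISPLAY:-}" ]; then
        # No display available: run headless, clean up and force the exit.
        printf '%s\n' -headless true -clean-up true -force-exit true
    fi
    printf '%s\n' -i "$flow"
}
```

Inside the container, it could be used as adams-ml-app-exec $(flow_exec_args /workspace/adams-weka-build_classifier.flow). The unquoted substitution relies on word splitting, so the flow path must not contain whitespace.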

Of course, you can then start up the full ADAMS user interface from the console as well. This is done by using the SNAPSHOT-gui command. In case of the adams-ml-app snapshot, this would be adams-ml-app-gui.

Once you have closed ADAMS in an interactive container (-i), you can exit the container with the exit command (or just use Ctrl+D).

Windows

The above instructions assume that you have Docker installed on your Linux machine. If you are on Windows, you can run Docker and graphical X applications from WSL2 as well, as long as you have a new enough Windows build (Windows 10 Build 19044+) and an up-to-date WSL2 installed.

The MOA blog post also has details on getting the X-Server working on Windows and Mac OSX.

Custom images

The pre-built base images may not suit your needs, as they may be too large or lack functionality that you need. If you want to containerize a single worker flow, you can have a look at the adamsflow2docker library:

github.com/waikato-datamining/adamsflow2docker

This project generates a Dockerfile from a list of ADAMS modules (and version) that the application should consist of, and executes the specified flow inside the image using the adams.flow.FlowRunner class.