Last week, I gave a talk at the 2014 Workshop on Software Engineering for Machine Learning at NIPS held in Montreal, Canada. The slides are embedded below.
As of half an hour ago, the official home for the REEF (and Tang and Wake) source code is at the Apache Software Foundation. You can check out the current code via:
git clone https://git-wip-us.apache.org/repos/asf/incubator-reef.git reef
That repository contains some changes compared to the latest version we had on GitHub:
- All Maven artifacts now live in the org.apache.reef groupId.
- All packages have been renamed into the org.apache.reef namespace.
- Tang and Wake are now in the same repository as REEF and are built as sub-modules in the same Maven project.
While those changes are somewhat painful to our current users, we are excited about the next phase of the REEF project in the Apache Software Foundation.
This is the first in what will hopefully become a series of blog posts on the internals of REEF to help new contributors find their footing within the REEF code base. The following is based on REEF 0.8.
REEF spawns several processes on the application’s behalf. The simplest feasible REEF program has just one, the Driver, but more realistic applications add many Evaluators. Despite their different roles in the application, all REEF processes share the same basic structure, both in their working directory and in their internal structure. Whenever REEF sets up a working directory for a Driver or Evaluator, it creates a common file system layout consisting of two mandatory folders and one optional folder:
- reef/local: This contains files that are local to this process only. For instance, this is where the basic Configuration file for this process goes (see below).
- reef/global: This contains files marked as global to all processes in the same job. Typically, those are enumerated in DriverConfiguration.GLOBAL_FILES. REEF copies such files to all processes, possibly using efficient means provided by the resource manager to do so.
- reef/temp: This is used to store temporary files for the current process. This folder is optional and only created on first use. Depending on the resource manager, it can be desirable to keep temporary files here instead of in the system-wide temp folder.
If you want to learn more about the file system structure in REEF or add to it, start with the class REEFFileNames. It enumerates all the constants needed for programmatic access to these folders. Also, use an injected instance of TempFileCreator to make sure that your creation of temp files adheres to the conventions of the deployment.
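To make the layout above concrete, here is a minimal sketch in plain Java. It is not REEF code: the class WorkingDirLayout and its methods are hypothetical names invented for illustration, and the folder names are simply the ones listed above. It creates the two mandatory folders up front and creates reef/temp only on first use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of REEF: mirrors the folder layout described above. */
final class WorkingDirLayout {
  private final Path root;

  WorkingDirLayout(final Path workingDir) {
    // All three folders live under <workingDir>/reef
    this.root = workingDir.resolve("reef");
  }

  Path localFolder() {
    return root.resolve("local");
  }

  Path globalFolder() {
    return root.resolve("global");
  }

  /** Creates the two mandatory folders; reef/temp is deliberately not created here. */
  void createMandatoryFolders() throws IOException {
    Files.createDirectories(localFolder());
    Files.createDirectories(globalFolder());
  }

  /** Creates reef/temp on first use and returns a new, empty temp file inside it. */
  Path createTempFile(final String prefix, final String suffix) throws IOException {
    final Path temp = root.resolve("temp");
    Files.createDirectories(temp); // no-op if the folder already exists
    return Files.createTempFile(temp, prefix, suffix);
  }

  boolean tempFolderExists() {
    return Files.isDirectory(root.resolve("temp"));
  }
}
```

In actual REEF code you would not hard-code these names: REEFFileNames holds the constants, and an injected TempFileCreator takes care of temp files according to the conventions of the deployment.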
REEF, Tang and Wake version 0.8 are on their way to Maven Central. As always, this release contains general improvements and bug fixes, but also a few nice new features:
The detailed release notes can be found in the GitHub issue trackers: Tang, Wake, REEF.
It is the 21st of August and, as in the last couple of months, this means that there is a new REEF release. Given that we have recently been accepted into the Apache Incubator, this is one of the last releases, if not the last, before we transition development into the Apache infrastructure. And it is a good one:
We found and fixed that last issue when scaling our tests to the 1000 Evaluator mark, which is a new record for REEF applications.
The release is making its way to Maven Central as I am writing this. More detailed changes can be found in the milestones on GitHub (REEF 0.7, Tang 0.7, Wake 0.7).
Today, we released REEF, Tang and Wake 0.6. It is making its way to Maven Central as I am writing this. As always, we make use of GitHub milestones to track changes in a release. You can find the notes here: