The Directory Layout of a REEF Process

This is the first in what will hopefully become a series of blog posts on the internals of REEF to help new contributors find their footing within the REEF code base. The following is based on REEF 0.8.

REEF spawns several processes on the application’s behalf. The simplest feasible REEF program has just one, the Driver, but more realistic applications add many Evaluators. Despite their different roles in the application, all REEF processes share the same basic structure, both in their working directory and internally. Whenever REEF sets up a working directory for a Driver or Evaluator, it creates a common file system layout consisting of two mandatory folders and one optional folder:

  • reef/local: This contains files that are local to this process only. For instance, this is where the basic Configuration file for this process goes (see below).
  • reef/global: This contains files marked as global to all processes in the same job. Typically, those are enumerated in DriverConfiguration.GLOBAL_FILES. REEF copies such files to all processes, possibly using efficient means provided by the resource manager to do so.
  • reef/temp: This is used to store temporary files for the current process. This folder is optional and only created on first use. Depending on the resource manager, it can be desirable to keep temporary files here instead of in the system-wide temp folder.
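
To make the layout concrete, here is a minimal sketch in plain Java that reproduces it under a scratch directory. It is not REEF code: the class ReefLayoutSketch and its methods are hypothetical names invented for this illustration, and the only details taken from the list above are the three folder names and the fact that reef/temp is not created up front.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of REEF) that models the folder layout described above. */
final class ReefLayoutSketch {
  private final Path root;

  ReefLayoutSketch(final Path workingDirectory) {
    // All three folders live under <working directory>/reef.
    this.root = workingDirectory.resolve("reef");
  }

  Path localFolder() {
    return root.resolve("local");
  }

  Path globalFolder() {
    return root.resolve("global");
  }

  Path tempFolder() {
    return root.resolve("temp");
  }

  /** Creates only the two mandatory folders; reef/temp is deliberately left alone. */
  void createMandatoryFolders() {
    try {
      Files.createDirectories(localFolder());
      Files.createDirectories(globalFolder());
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(final String[] args) {
    // A fresh scratch location stands in for the working directory of a Driver or Evaluator.
    final Path workingDirectory =
        Paths.get(System.getProperty("java.io.tmpdir"), "reef-layout-" + System.nanoTime());
    final ReefLayoutSketch layout = new ReefLayoutSketch(workingDirectory);
    layout.createMandatoryFolders();

    System.out.println("local exists:  " + Files.isDirectory(layout.localFolder()));
    System.out.println("global exists: " + Files.isDirectory(layout.globalFolder()));
    System.out.println("temp exists:   " + Files.exists(layout.tempFolder()));
  }
}
```

Running the sketch shows the two mandatory folders in place and no temp folder yet, which matches the "created on first use" behavior described in the last bullet.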

If you want to learn more about the file system structure in REEF or add to it, start with the class REEFFileNames. It enumerates all the constants needed for programmatic access to these folders. When you create temp files, use an injected instance of TempFileCreator so that they follow the conventions of the deployment.
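
The reason for going through an injected object rather than calling the JDK directly can be sketched without any REEF dependency. Everything below is an assumption made for illustration: TempFilesSketch is a stand-in interface, not REEF's TempFileCreator, and plain constructor arguments take the place of the injection that REEF would perform. The point is only that application code asks for a temp file and the deployment-specific implementation decides where it lands, here reef/temp, created on first use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Small demo that wires the hypothetical pieces together by hand. */
final class TempFilesSketchDemo {
  public static void main(final String[] args) {
    final Path workingDirectory =
        Paths.get(System.getProperty("java.io.tmpdir"), "reef-temp-" + System.nanoTime());
    final SpillWriter writer = new SpillWriter(new WorkingDirectoryTempFiles(workingDirectory));
    System.out.println("spilled to: " + writer.spill("hello"));
  }
}

/** Hypothetical stand-in for an injected temp file creator; not REEF's actual interface. */
interface TempFilesSketch {
  Path createTempFile(String prefix, String suffix);
}

/** Places temp files under <working directory>/reef/temp, creating the folder on first use. */
final class WorkingDirectoryTempFiles implements TempFilesSketch {
  private final Path tempFolder;

  WorkingDirectoryTempFiles(final Path workingDirectory) {
    this.tempFolder = workingDirectory.resolve("reef").resolve("temp");
  }

  @Override
  public Path createTempFile(final String prefix, final String suffix) {
    try {
      // createDirectories does nothing if the folder already exists.
      Files.createDirectories(tempFolder);
      return Files.createTempFile(tempFolder, prefix, suffix);
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

/** Application code that depends only on the interface, never on a concrete folder. */
final class SpillWriter {
  private final TempFilesSketch tempFiles;

  SpillWriter(final TempFilesSketch tempFiles) {
    this.tempFiles = tempFiles;
  }

  Path spill(final String contents) {
    final Path file = tempFiles.createTempFile("spill-", ".txt");
    try {
      Files.write(file, contents.getBytes(StandardCharsets.UTF_8));
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
    return file;
  }
}
```

Because SpillWriter never names a folder, a different deployment could hand it an implementation backed by the system-wide temp folder without any change to the application code.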