\name{Rdsm}
\alias{Rdsm}

\title{Software Distributed Shared-Memory for R}

   \description{Rdsm is a package for parallel programming in R, in
   which multiple instances of R run on multicore machines, networked
   machines, or both.  It provides the programmer with a
   \emph{shared-memory} world view even when run on networked machines.
   
   One of the R processes runs as a server, while the other processes
   run as clients.  The application code runs on the clients.  }

\section{Advantages}{

   Whether or not the platform is truly multicore, a major advantage of Rdsm
   is that it gives the programmer a shared-memory view, considered by many
   in the parallel processing community to be one of the clearest forms of
   parallel programming.  
   
   Another advantage of Rdsm is that it is easy to convert sequential R
   code to parallel Rdsm code.  This is because (a) Rdsm code does not
   explicitly move data across R processes and (b) Rdsm objects are
   accessed with the ordinary R syntax for vectors and matrices.

   Finally, shared-memory code is considered more portable than
   \emph{message-passing} code, e.g. Rmpi, as Rmpi code is often
   written in a manner meant to take advantage of a particular network
   structure.

   To be sure, the message-passing approach can achieve somewhat greater
   speed in some applications.  However, many programmers believe that
   the clarity and development ease/speed of the shared-memory paradigm
   more than offsets this aspect.  Rdsm is thus aimed at providing this
   paradigm for parallel R applications.

   See, for example, Chandra (2001) and Hess \emph{et al.} (2003).

}

\section{Application-Specific Variables}{

   Rdsm variables consist of R vectors and matrices.  Though they must be
   created with class \code{dsmv} and \code{dsmm}, respectively, ordinary
   R syntax is used to read or write them.  
   
   For example, suppose your Rdsm program includes \code{m}, a 4x5 matrix
   variable of class \code{dsmm}.  To fill the second column
   with 1, 2, 3 and 4, you would write
   
   \preformatted{
   m[,2] <- 1:4
   }
   
   just as you would in ordinary R.  (And of course it \emph{is} ordinary R
   code; you are still running R.)  
   
   In other words, other than a call to the Rdsm function \code{newdsm()}
   at the beginning of an Rdsm program to create each Rdsm variable, Rdsm
   computational code looks identical to that of ordinary R.  Thus it is
   easy to convert a sequential R program to a parallel Rdsm program.

}

\section{Built-in Variables}{

   Rdsm's built-in variables are stored in a single global variable
   \code{myinfo}, a list consisting of these components:
   
   \itemize{
      \item \code{myid}: the ID number of this client
      
      \item \code{nclnt}: the total number of clients
   
      \item \code{platform}:  either "MPI" or "SOCK", depending on which
      underlying communications platform we are running Rdsm on (see
      below)
   }
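
   A common use of \code{myid} and \code{nclnt} is to divide work among
   the clients.  The following is only a sketch, not code taken from
   Rdsm: the \code{myinfo} list here is a stand-in with made-up values so
   that the snippet runs in a plain R session, the problem size \code{n}
   is hypothetical, and the sketch assumes that client IDs run from 1 to
   \code{nclnt}.

   \preformatted{
   # Stand-in for the list Rdsm supplies on each client (made-up values).
   myinfo <- list(myid = 2, nclnt = 4, platform = "SOCK")

   n <- 10   # hypothetical number of rows to divide among the clients

   # splitIndices() from the base parallel package cuts 1:n into nclnt
   # nearly equal contiguous chunks, returned as a list.
   chunks <- parallel::splitIndices(n, myinfo$nclnt)

   # Assumption: client IDs run from 1 to nclnt.
   myrows <- chunks[[myinfo$myid]]
   }

   Each client would then work only on the rows in \code{myrows}.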

}

\section{Built-in Synchronization Functions}{

   Rdsm includes some built-in synchronization functions similar to those
   of threaded or other shared-memory programming systems:
   
   \itemize{
      \item \code{barr()}: classical barrier function
      \item \code{lock()}: classical lock function
      \item \code{unlock()}: classical unlock function
      \item \code{fa()}: fetch-and-add function
   }
   
   There is also \code{dsmexit()}, called when a client has finished its
   work.
   
   All of these functions have documentation in this directory.

}

\section{Built-in Server Initialization Functions}{

   These too have documentation in this directory.
   
   \itemize{
      \item \code{init()}:  initializes a client's connection to the server
      \item \code{srvrinit()}:  initializes the server
      \item \code{srvrloop()}:  runs the server
   }

}

\section{Internal Structure}{

   Though transparent to the Rdsm programmer, internally Rdsm has the
   following architecture.

   The Rdsm application variables reside on the server.  Each read from
   or write to an Rdsm variable involves a transaction with the server.

   Rdsm runs on one of two communications platforms:  MPI, via the Rmpi
   package, or network sockets.  The advantages and disadvantages of each
   are:
   
   \itemize{
      \item Rdsm code will generally run much faster on the MPI platform,
      as it does not involve the packing and unpacking of messages used in 
      socket mode.  These operations slow things down not only because they
      themselves take time, but even more because packed messages are much
      longer and thus have longer communication time.  Moreover, if Rdsm is
      running on a multicore machine, MPI may be able to take advantage of
      the physical shared memory for very fast communication.
   
      \item The socket version does not require setting up Rmpi.  In
      addition, it is much easier to debug Rdsm code on this platform, as
      one can use the R debugging tools on each client.
   }
   
   Again, all this is transparent to the Rdsm programmer.  However, as with
   any system, a good understanding of the internals can result in your
   writing much faster code.

}

\section{Debugging Your Rdsm Code}{

   As noted above, it is easier to debug your Rdsm code in socket mode,
   even if you intend to run the code in MPI mode.  This is because Rmpi
   does not provide the user with a terminal for the spawned R processes,
   and thus R's debugging functions, e.g. \code{debug()}, cannot be used.
   By contrast, in socket mode you do have a terminal for each R process,
   and thus can debug as usual.
   
   If you do debug directly in Rmpi mode, you'll have to resort to print
   statements, by calling \code{message()}.  As the various R processes may
   intermingle their output, you may wish to use \code{paste()} within your
   call to \code{message()}.  See examples (though not for debugging) in
   the matrix-multiply code cited below.
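
   One way to keep intermingled output readable is to tag every message
   with the client ID.  The sketch below is illustrative only:
   \code{dbg()} is a hypothetical helper, not an Rdsm function, and the
   \code{myinfo} list is a stand-in with made-up values so that the
   snippet runs in a plain R session.

   \preformatted{
   # Stand-in for the list Rdsm supplies on each client (made-up values).
   myinfo <- list(myid = 3, nclnt = 4, platform = "MPI")

   # Hypothetical helper: prefix each debugging message with the client ID.
   dbg <- function(...) {
      message(paste0("[client ", myinfo$myid, "] ", paste(...)))
   }

   dbg("starting row block", 5, "to", 8)
   }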

}

\section{Quick Introduction to Rdsm}{

   The Rdsm code in \code{examples/MatMul.r} of this package serves
   as a quick introduction, using a matrix-multiply example common in
   parallel processing packages.  There are especially detailed comments
   in this example, both in the code itself and in instructions on how
   to actually run it.

}

\author{Norm Matloff}

\references{

Chandra, Rohit (2001), \emph{Parallel Programming in OpenMP}, Kaufmann,
pp.10ff (especially Table 1.1).

Hess, Matthias \emph{et al} (2003), Experiences Using OpenMP Based on Compiler
Directive Software DSM on a PC Cluster, in \emph{OpenMP Shared Memory
Parallel Programming: International Workshop on OpenMP Applications and
Tools}, Michael Voss (ed.), Springer, p.216.

}
