ARC
                                       
                              INTRODUCTION TO MPI
                                       
  GEORGI YANAKIEV (YANAKIEV@ARC.UNM.EDU)
  MARK ENLOW (MENLOW@ARC.UNM.EDU)
  CHUANYI DING (DING@ARC.UNM.EDU)
  JIM WARSA (WARSA@BOLTZMANN.UNM.EDU)
  
    Š Copyright 1995, The Maui Project/University of New Mexico
    
   
     _________________________________________________________________
   
  WHAT IS MPI?
     * Message passing is the communication model used on massively
       parallel machines with distributed-memory architectures (can be
       used on shared memory machines also)
     * The goal of the standard is to establish a widely used language-
       and platform-independent standard for writing message-passing
       programs
     * The interface should establish a practical, portable, efficient,
       and flexible standard not too different from current practice
       
  MPI STANDARDIZATION EFFORT
     * About 60 people from 40 organizations in the United States and
       Europe participated
     * Influenced by:
          + Venus (IBM)
          + NX/2 (INTEL)
          + Express (Parasoft)
          + Vertex (nCUBE)
          + P4 (ANL)
          + PARMACS (ANL)
     * Other Contributions:
          + Zipcode (MSU)
          + Chimp (Edinburgh University)
          + PVM (ORNL, UTK, Emory U.)
          + Chameleon (ANL)
          + PICL (ANL)
            
  MILESTONES OF THE MPI FORUM
     * April 1992: Williamsburg, VA, Workshop on Message Passing
       Standards
     * November 1992: MPI draft proposal (MPI1) from ORNL
     * November 1992: Minneapolis working group meeting and e-mail
       discussion
     * November 1993: Draft MPI standard at Supercomputing '93
     * May 5, 1994: Current Version
     * Meetings and e-mail discussion groups constitute the MPI Forum
       
   
     _________________________________________________________________
   
  FURTHER INFORMATION
     * "MPI: A Message-Passing Interface Standard" document available at
       ARC in /usr/local/ftp/pub/mpi-report.ps (comments@cs.utk.edu)
     * Usenet newsgroup comp.parallel.mpi
     * WWW Sites
          + UTK/ORNL http://www.netlib.org/mpi/index.html 
          + ANL http://www.mcs.anl.gov/mpi 
          + MSU http://www.erc.msstate.edu/mpi 
          + LAM Project http://www.osc.edu/lam.html 
            
   
     _________________________________________________________________
   
  TARGET IMPLEMENTATION
     * MIMD parallel computation models, including SPMD
     * Distributed-memory clusters and multi-processors
     * Shared-memory platform support
     * Support for virtual process topologies
     * No dynamic task spawning (fixed number of available processes)
     * Initial processor allocation and binding to physical processors
       and interprocessor hardware communications are left to vendor
       implementations
     * Explicit shared-memory operations, I/O functions, and task
       management are not specified in the standard
       
  LANGUAGE BINDING
     * Fortran 77 and ANSI C
     * F90 is assumed to follow the Fortran 77 binding and C++ the ANSI C
       binding
     * Standards for message passing between languages are not specified
     * MPI_ is prefixed to all MPI names (functions, subroutines, and
       constants)
       
   
     _________________________________________________________________
   
  POINT-TO-POINT COMMUNICATION
     * Message sending example

   MPI_SEND (mesg, len(mesg), MPI_CHAR, 1, 99,

                 MPI_COMM_WORLD, ierror)
     * Message receiving example

   MPI_RECV (mesg, 20, MPI_CHAR, 0, 99,

                 MPI_COMM_WORLD, status, ierror)
     * Message envelope: source, destination, tag, communicator
     * Blocking and non-blocking modes available
     * Standard, buffered, ready, and synchronous modes available
     * Data type matching at all three stages of message-passing
       communication
     * Data type and representation conversion to support message-passing
       in a heterogeneous environment
     * Communication semantics (order, progress, fairness, and resource
       limitations) are specified by the standard
       
  WEB PAGES FOR MPI ROUTINES NON-BLOCKING POINT-TO-POINT COMMUNICATION
     * Performance improvement by overlapping communication and
       computation
     * Communication initiation (the prefix I is for immediate)

   MPI_ISEND (buf,count,datatype,dest,tag,comm,request,ierror)
   MPI_IRECV (buf,count,datatype,dest,tag,comm,request,ierror)
     * Communication completion

   MPI_WAIT (request, status, ierror)
   MPI_TEST (request, flag, status, ierror)
   MPI_REQUEST_FREE (request, ierror)
     * Multiple completion wait/test also available
     * Incoming messages can be checked without actually receiving them:
       flag returns true if a match occurs and status returns the
       message envelope

   MPI_IPROBE (source, tag, comm, flag, status, ierror)
     * Pending communication requests can be cancelled to gracefully free
       resources

   MPI_CANCEL (request, ierror)
     * Multiple probe request modes also available
       
   
     _________________________________________________________________
   
  COLLECTIVE COMMUNICATION
     * The standard specifies collective communication operations over
       groups of processes
     * Barrier synchronization across all group members

   MPI_BARRIER (comm, ierror)
     * Broadcast from one member of the group to all other members

   MPI_BCAST (buffer, count, datatype, root, comm, ierror)
     * Gather data from all group members to one process

   MPI_GATHER (sendbuf, sendcount, sendtype, recvbuf,

                recvcount, recvtype, root, comm, ierror)
     * Scatter data from one group member to all other members

   MPI_SCATTER (sendbuf, sendcount, sendtype, recvbuf,

                 recvcount, recvtype, root, comm, ierror)
     * Global reduction operations such as max, min, sum, product, and
       min and max operations are also available
       
   
     _________________________________________________________________
   
  MPI ENVIRONMENTAL AND PROCESS MANAGEMENT
     * Startup and shutdown

   MPI_INIT (ierror)
   MPI_FINALIZE (ierror)
   MPI_INITIALIZED (flag, ierror)
   MPI_ABORT (comm, errorcode, ierror)
     * Process management

   MPI_COMM_RANK (comm, rank, ierror)
   MPI_COMM_SIZE (comm, size, ierror)
     * To use all available processes in a single process group without
       further management, use MPI_COMM_WORLD as the communicator
     * Process group communicators, virtual process topologies, and
       communication context functions are provided to manage
       communication "universes" between and among process groups
     * Environmental inquiry functions, program execution timers, and
       error handling routines are also specified in the standard
       
   
     _________________________________________________________________
   
Running MPI Programs at ARC

  MPI IMPLEMENTATIONS AT ARC
    1. MPICH (MPI/Chameleon) - Argonne National Laboratory and Missippi
       State University
    2. LAM (Local Area Multicomputer) - Ohio Supercomputing Center
    3. CHIMP (Common High-level Interface to Message Passing) - Edinburgh
       Parallel Computing Centre
       
   
     _________________________________________________________________
   
  EXAMPLES
  
    Initial Example
    
   The initial examples use mpich, the Argonne MPI implementation.
     * Add your username to your .rhosts file if your login is the same
       on all machines
     * Create working directory

   > mkdir MPI
   > cd MPI
     * Obtain the example files

   > cp /usr/local/mpi/workshop/examples/* .
     * Your working directory should now contain the files:

 Makefile 
 test1.F 
 test2.F 
 test3.F 
     * In order to run each program you have to create, in the working
       directory, a file called filename.pg (e.g., test1.pg). This
       processor group file should contain:

   local 0
   machineA     1       /u/username/MPI/test1
   machineB     1       /u/username/MPI/test1

       Similarly, for the files test2.pg and test3.pg.
     * The previous processor group file is required in order to start
       execution on the local machine and two remote machines. Executing
       the program test1.F on the local machine starts 3 processes: the
       "master" process on the local machine and one on each of machineA
       and machineB.
     * A sample processor group file (e.g., test1.pg), for running on
       ARC, is the following:

   local        0
   taos 1       /u/username/MPI/test1
   zia  1       /u/username/MPI/test1
     * Therefore if you are logged onto acoma, the "master" process will
       run on acoma (local machine) and two more processes will run on
       taos and zia.
     * In order to compile a program (e.g., test1.F), type:

   > make test1
     * In order to run the above program you type:

   > test1
     * By modifying the corresponding processor group file you may run
       the program on 1, 2, or 3 processors.
     * When running the programs, messages on the screen refer to
       'Process 0' (or 1, or 2). The rank of each process corresponds to
       the order in which they appear in the "process group" file. For
       the file test1.pg shown earlier:

   acoma is Process 0
   taos is Process 1
   zia is Process 2

        File: test1.F
                A simple exchange of messages from one process to
                another. One process sends two messages, (2 arrays),
                which are received in the reverse order by the other
                process.
                
        Files: test2.F and test3.F
                Calculation of the definite integral of a function using
                the trapezoid rule. The range of integration is divided
                into n sub-intervals. Each process calculates the result
                of integration over a specific number of sub-intervals.
                Each process sends its result to a root process, where
                the final result is stored. Program test2.F uses
                point-to-point communication mode and program test3.F
                uses collective communication mode.
                
   
     _________________________________________________________________
   
    More Advanced Examples (not currently available, due to hardware problems) 
    
   These examples demonstrate the heterogeneous makefile system that
   quickly switches between the various MPI implementations and
   automatically performs the make for each of the Unix machines
   (RS/6000, SGI, HP, and SUN).

Package Name            Developer               Location of Package

mpich                   Argonne and MSU         /usr/local/mpich
chimp                   Edinburgh               /usr/local/chimp
lam                     Ohio State University   /usr/local/lam52

    Need for a Makefile System
    
   Switching between 3 MPI implementations
          The ability to switch between three different MPI
          implementations allows the developer to check
          portability/compatibility, helps with debugging, and allows the
          selection of the optimum implementation for an individual
          application and environment
          
   Heterogeneous cluster of workstations
          A single make invocation compiles code for all Unix
          architectures, greatly simplifying the use of a heterogeneous
          cluster of workstations.
          
  HETEROGENEOUS MAKEFILE

ALL:  aix_build sgi_build   **** Architecture List ****

ARCH      = rs6000  **** Default Architecture ****
                    (needed to prevent include error)

# Settings for mpich using p4   -
#MPI       = mpich                |
#COMM      = ch_p4                |
# Settings for chimp              |  Switches between
#MPI       = chimp                |  MPI Architectures
#COMM      =                      |
# Settings for lam                |
MPI       = lam                   |
COMM      =                       |
# End Settings                   -

WORK_DIR  = mpi   *** Working Directory ***
                  *** Set to source code directory ***
RM        = rm

     **** All architecture and MPI implementation differences ****
     **** are handled with the following include              ****

include ../$(ARCH).$(MPI).$(COMM).make

programs: $(ARCH)/test1 $(ARCH)/first

     **** Recursive make for each architecture  ****

aix_build:
        rsh acoma "cd $(WORK_DIR); make -e ARCH=rs6000 programs"

sgi_build:
        rsh cochiti "cd $(WORK_DIR); make -e ARCH=sgi programs"

   *** Instructions to create object and executable ***

$(ARCH)/first: $(ARCH)/first.o Makefile
        cd $(ARCH); $(CC) -o first first.o $(LOPT)

$(ARCH)/first.o:        first.c Makefile
        cd $(ARCH); $(CC) -c $(COPT) ../first.c

$(ARCH)/test1: $(ARCH)/test1.o Makefile
        cd $(ARCH) ; $(FC) -o test1 test1.o $(FLOPT)

$(ARCH)/test1.o: test1.F Makefile
        cd $(ARCH) ; $(FC) -c $(FCOPT) ../test1.F

   *** Housekeeping -- please use ***

clean:
        $(RM) -f rs6000/*
        $(RM) -f sgi/*

  MAKEFILE SYSTEM DEMONSTRATION
    1. Create a working directory called mpi under your home directory
    2. Copy all files from /usr/local/mpi to this directory
    3. Correct .cshrc

     a. run /usr/local/bin/reset_environment

          OR

     b. compare with /usr/local/environment/.cshrc and correct manually
    4. Edit Makefile and uncomment the two lines for the desired
       implementation
    5. Type make to create executables
    6. Edit machine configuration files

    a. mpich --> *first.pg

    b. chimp --> *first.chp

    c. lam  --> *first.bhost
                lamboot.conf

    * user dependent paths!
    7. Run executables

     a. mpich

            rs6000/first -p4pg ../first.pg

     b. chimp

            chimp first.chp

     c. lam

            lam52_mpi first.bhost

  ADDITIONAL EXAMPLES
  
   
   /usr/local/mpich/examples
   /usr/local/chimp/examples
   /usr/local/lam52/examples
   
   
   
   
     _________________________________________________________________
   
   
   
   ŠCopyright, The Maui Project/University of New Mexico
   Last revised on March 1, 1995
   by Georgi Yanakiev, yanakiev@arc.unm.edu