Parallel Computing

DAKOTA uses the 'Single Program Multiple Data' (SPMD) parallel programming model. It uses message-passing routines from the MPI standard to communicate data between processors. The SPMD designation denotes that the same DAKOTA executable is loaded on all processors during the parallel invocation.


There are four main levels of parallelism in DAKOTA:


1) Algorithmic coarse-grained parallelism: concurrent execution of independent function evaluations, where a function evaluation is defined as a 'data request' from an algorithm.
2) Algorithmic fine-grained parallelism: computing the basic computational steps of an optimization algorithm (i.e., its internal linear algebra) in parallel.
3) Function evaluation coarse-grained parallelism: concurrent computation of separable parts of a single function evaluation.
4) Function evaluation fine-grained parallelism: parallelization of the solution steps within a single analysis code.
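
As an illustration of level 1 (algorithmic coarse-grained parallelism), here is a minimal Python sketch, not DAKOTA code. The `evaluate` function and the sample points are hypothetical stand-ins for a real simulation, and a thread pool stands in for the MPI processes DAKOTA would use. The point is only that the evaluations are independent, so they can run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(x):
    """Hypothetical stand-in for one function evaluation (one 'data request')."""
    return sum((xi - 1.0) ** 2 for xi in x)


def evaluate_batch(points, max_workers=4):
    """Run independent evaluations concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, points))


points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(evaluate_batch(points))
```

Because no evaluation needs data from another, the only coordination is handing out inputs and collecting results.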

Coarse-grained parallelism requires very little inter-processor communication and is therefore "embarrassingly parallel", i.e., there is very little loss in parallel efficiency due to communication as the number of processors increases.


Fine-grained parallelism, on the other hand, involves much more communication among processors, and care must be taken to avoid inefficient machine utilization, in which the communication demands among processors outstrip the amount of actual computational work to be performed.
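
The contrast between the two grain sizes can be made concrete with a toy cost model. This model is an assumption for illustration only and does not come from DAKOTA: run time on p processors is taken as work/p plus a communication term that grows with p. The function name and the numbers are made up.

```python
def parallel_efficiency(work, comm_per_proc, procs):
    """Efficiency under a toy cost model: T(p) = work/p + comm_per_proc*(p-1)."""
    t_parallel = work / procs + comm_per_proc * (procs - 1)
    speedup = work / t_parallel
    return speedup / procs


for p in (1, 4, 16, 64):
    # Small communication cost mimics coarse grain, large cost mimics fine grain.
    coarse = parallel_efficiency(work=1000.0, comm_per_proc=0.01, procs=p)
    fine = parallel_efficiency(work=1000.0, comm_per_proc=1.0, procs=p)
    print(f"p={p:3d}  coarse={coarse:.2f}  fine={fine:.2f}")
```

With the small communication cost the efficiency stays close to 1 as p grows, while with the large one it drops off quickly.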


DAKOTA supports a total of three tiers of scheduling and four levels of parallelism, which in combination can minimize efficiency losses and achieve near-linear scaling on massively parallel (MP) computers.


1) Concurrent iterators within a strategy (scheduling performed by DAKOTA)
2) Concurrent function evaluations within an iterator (scheduling performed by DAKOTA)
3) Concurrent analyses within each function evaluation (scheduling performed by DAKOTA)
4) Multiprocessor analysis (work distributed by the parallel analysis code)

All the optimization algorithms provided with DAKOTA support parallelism. Simulation invocation through DAKOTA can be synchronous (i.e., blocking) or asynchronous (i.e., nonblocking).

DAKOTA supports three approaches to local simulation invocation: direct function, system call, and fork. In the direct function approach, the simulation code is linked as functions within the DAKOTA source code. The system call and fork interfaces are used for black-box simulations; the two are similar, except that the fork interface prevents file-read race conditions for asynchronous function evaluations.
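
The difference between blocking and nonblocking invocation of a black-box code can be sketched in Python with the standard `subprocess` module. This is only an analogy, not DAKOTA's interface: the "simulation" is a hypothetical child Python process that prints the square of its input, and results come back through pipes rather than files, so the file-read race mentioned above does not arise here.

```python
import subprocess
import sys


def simulation_command(x):
    """Hypothetical black-box 'simulation': a child process that prints x squared."""
    return [sys.executable, "-c", f"print(({x!r}) ** 2)"]


def run_synchronous(xs):
    """Blocking: each simulation must finish before the next one starts."""
    results = []
    for x in xs:
        done = subprocess.run(simulation_command(x), capture_output=True,
                              text=True, check=True)
        results.append(float(done.stdout))
    return results


def run_asynchronous(xs):
    """Nonblocking: launch every simulation first, then collect the results."""
    procs = [subprocess.Popen(simulation_command(x), stdout=subprocess.PIPE,
                              text=True) for x in xs]
    return [float(p.communicate()[0]) for p in procs]


print(run_synchronous([1.0, 2.0, 3.0]))
print(run_asynchronous([1.0, 2.0, 3.0]))
```

Both functions return the same values; the asynchronous version simply lets the child processes overlap in time.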

DAKOTA uses MPI communicators to identify groups of processors. The global MPI_COMM_WORLD communicator provides the total set of processors allocated to the DAKOTA run. MPI_COMM_WORLD can be partitioned into new intra-communicators, each of which defines a set of processors to be used for a multiprocessor server.

DAKOTA's nested parallelism hierarchy can use either of two processor partitioning models:

1) A "dedicated master" partitioning, in which a single processor is dedicated to scheduling operations and the remaining processors are split into server partitions.
2) A "peer partitioning" approach, in which the loss of a processor to scheduling is avoided.

Dedicated master and Peer partition models
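
The two partitioning models can be sketched without MPI as a plain assignment of ranks to servers. This is an illustrative sketch, not DAKOTA's actual partitioning logic: the function name is hypothetical, and the choice of rank 0 as master and the way leftover ranks are spread over the first servers are assumptions. In a real MPI program, the server index computed here is the kind of value that would be passed as the color argument to MPI_Comm_split.

```python
def partition_ranks(world_size, num_servers, dedicated_master):
    """Return (master_rank_or_None, list_of_server_rank_lists)."""
    if dedicated_master:
        # Rank 0 only schedules; the remaining ranks do the work.
        master, workers = 0, list(range(1, world_size))
    else:
        # Peer partitioning: every rank belongs to some server.
        master, workers = None, list(range(world_size))
    if num_servers < 1 or num_servers > len(workers):
        raise ValueError("need between 1 and len(workers) servers")
    base, extra = divmod(len(workers), num_servers)
    servers, start = [], 0
    for s in range(num_servers):
        size = base + (1 if s < extra else 0)  # spread leftover ranks evenly
        servers.append(workers[start:start + size])
        start += size
    return master, servers


print(partition_ranks(9, 4, dedicated_master=True))
print(partition_ranks(8, 4, dedicated_master=False))
```

With 9 processors, the dedicated master model yields four 2-processor servers plus the master; the peer model gets the same four servers out of only 8 processors.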

Scheduling within each level can be one of the following:

a) Self scheduling: a master processor manages a single processing queue and maintains a prescribed number of jobs active on each slave.
b) Static scheduling: the schedule is determined at start-up, so no master processor is needed to direct traffic and a peer partitioning approach is applicable.
c) Distributed scheduling: a peer partitioning is used and each peer maintains a separate queue of pending jobs.
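
The trade-off between self scheduling and static scheduling can be seen in a small simulation. This is a sketch under simplifying assumptions, not DAKOTA's scheduler: job durations are known and made up, static scheduling is taken to be a round-robin assignment, self scheduling keeps exactly one job active per server, and the cost of the master itself is ignored.

```python
import heapq


def static_makespan(durations, num_servers):
    """Static scheduling: job i is assigned to server i % num_servers up front."""
    loads = [0.0] * num_servers
    for i, d in enumerate(durations):
        loads[i % num_servers] += d
    return max(loads)


def self_scheduling_makespan(durations, num_servers):
    """Self scheduling: the next queued job goes to whichever server frees up first."""
    free_at = [0.0] * num_servers
    heapq.heapify(free_at)
    for d in durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + d)
    return max(free_at)


durations = [8.0, 1.0, 1.0, 1.0, 8.0, 1.0, 1.0, 1.0]
print(static_makespan(durations, 2))
print(self_scheduling_makespan(durations, 2))
```

When job durations vary, the dynamic queue balances the load better; when they are all equal, both give the same finish time and static scheduling avoids giving up a processor to a master.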

DAKOTA is designed to allow each parallelism level to be configured with either the dedicated master partitioning/self scheduling combination or the peer partition/static scheduling combination.


In the link below, peer partition/distributed scheduling is applied at level 1, each optimizer partition employs concurrent function evaluations in a dedicated master partition/self scheduling model at level 2, and each function evaluation partition employs concurrent multiprocessor analyses in a peer partition/static scheduling model at level 3.

Recursive partitioning for Nested parallelism

The link below depicts the asynchronous local, message passing, and hybrid approaches for a dedicated-master partition.


External, Internal and Hybrid job management in DAKOTA


Thanks,

Vinay Rao