Parallelising serial software systems presents many challenges. In particular,
the task of decomposing large, data-intensive applications for execution on distributed
architectures is described in the literature as error-prone and time-consuming.
The Message Passing Interface (MPI) specification is the de facto industry standard
for programming such architectures, but it requires low-level knowledge of data
distribution details, as programmers must explicitly invoke inter-process communication
routines. This research reports the findings of empirical studies
conducted in industry to explore and characterise the challenges associated with
performing data decomposition. Findings from these studies culminated in a
list of derived requirements for tool support, encompassing automation of grid
indexing, generation of data structures and communication calls, and provision
of assistance when changing from an already implemented decomposition strategy. Additional
requirements are that the tool be MPI-focused, initially
target structured grids, and have a low impact on the application code. These
requirements were subsequently strengthened to address gaps in the state of the art,
and they motivated the development of a tool named MPIGen.
MPIGen provides an abstraction for MPI, encapsulating the low-level details
involved in decomposing data and exchanging messages between processors.
Users can express the parallel intent of their application through input parameters
and then generate code containing wrapper functions that encompass the MPI
functionality. The wrapper functions can then be invoked within the serial code,
resulting in a semi-automated parallelised solution. The programmer is relieved
of the burden of deciphering memory locations when exchanging data between
processors. The tool was evaluated in two studies involving both students and
High Performance Computing (HPC) practitioners as subjects. The findings
indicate that MPIGen provides an efficient abstraction for performing data
decomposition and that it satisfies the list of empirically derived requirements.
Parallel programming is a difficult skill that software developers need to learn,
yet the low-level nature of specifications such as MPI is a barrier to adopting it.
MPIGen makes this skill set easier to adopt, as it offers effective support
to parallel programmers when undertaking decomposition and communication.
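
To make concrete the kind of grid-indexing bookkeeping that the derived requirements ask a tool to automate, the following is a minimal sketch of a block decomposition. It is not MPIGen's generated code or interface: the function names block_range and halo_neighbours are hypothetical, a one-dimensional, non-periodic structured grid is assumed, and no MPI calls are made.

```python
def block_range(n_cells, n_procs, rank):
    """Return the half-open [start, stop) range of global cell indices owned by `rank`.

    Illustrative only: a plain block decomposition of a 1D structured grid.
    """
    if n_procs <= 0 or not 0 <= rank < n_procs:
        raise ValueError("rank must satisfy 0 <= rank < n_procs")
    base, extra = divmod(n_cells, n_procs)
    # The first `extra` ranks each own one additional cell, so the load
    # differs by at most one cell between any two ranks.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def halo_neighbours(n_procs, rank):
    """Return (left, right) neighbour ranks; None marks a non-periodic boundary."""
    left = rank - 1 if rank > 0 else None
    right = rank + 1 if rank < n_procs - 1 else None
    return left, right


if __name__ == "__main__":
    # Show the ownership and neighbours for 10 cells spread over 4 processes.
    for r in range(4):
        print(r, block_range(10, 4, r), halo_neighbours(4, r))
```

In a hand-written MPI program, arithmetic of this kind has to be kept consistent with the buffers passed to each send and receive call, which is the burden the wrapper-function approach described above aims to remove.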