Mixed MPI-CUDA implementation of full PB equation
José Colmenares, Antonella Galizia, Jésus Ortiz, Walter Rocchia
A finite-difference solver for the full non-linear Poisson-Boltzmann (PB) equation was implemented in C++. The solver is highly optimized and exploits the checkerboard structure that arises from the finite-difference stencil of the Laplace operator. The algorithm was first parallelized with CUDA. A second layer of parallelism based on MPI was then added, so that the solver can run on a single GPU, on several GPUs within one node, on multiple nodes with one GPU each, or on multiple nodes using plain MPI alone. The MPI version scaled better than the GPU version because of the high cost of data transfer. The non-linear version achieved a higher speedup than the linear one because it has a more favorable ratio of computation to data transfer.
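The checkerboard structure can be illustrated with a red-black ordering of the 7-point stencil. On a 3D grid, every node whose index sum i+j+k is even ("red") has only odd ("black") neighbours, and vice versa. All nodes of one colour can therefore be updated independently, which maps naturally onto one GPU thread per node or one MPI subdomain per process. The sketch below is a minimal serial illustration under stated assumptions, not the authors' implementation. It assumes a dimensionless PB equation with constant dielectric and uniform ionic strength, -Δφ + κ² sinh(φ) = f, unit-free grid spacing h, and zero Dirichlet boundaries. It also assumes a nonlinear SOR update (one relaxed local Newton step per node). The names `Grid`, `half_sweep`, `solve`, `kappa2` and `omega` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal grid: n^3 nodes, spacing h, zero Dirichlet boundary
// (boundary nodes are never updated, so they keep their initial value 0).
struct Grid {
    std::size_t n;
    double h;
    std::vector<double> phi;  // dimensionless potential
    std::vector<double> f;    // scaled source term (assumed given)
    Grid(std::size_t n_, double h_)
        : n(n_), h(h_), phi(n_ * n_ * n_, 0.0), f(n_ * n_ * n_, 0.0) {
        assert(n_ >= 3);  // need at least one interior node
    }
    std::size_t idx(std::size_t i, std::size_t j, std::size_t k) const {
        return (i * n + j) * n + k;
    }
};

// One half-sweep over the interior nodes of one colour ((i+j+k) % 2 == colour).
// Each node of this colour reads only neighbours of the other colour, so the
// updates inside this loop are mutually independent (parallelizable).
// Returns the largest absolute residual seen before each node's update.
double half_sweep(Grid& g, int colour, double kappa2, double omega) {
    const std::size_t n = g.n;
    const double inv_h2 = 1.0 / (g.h * g.h);
    double max_res = 0.0;
    for (std::size_t i = 1; i + 1 < n; ++i)
        for (std::size_t j = 1; j + 1 < n; ++j)
            for (std::size_t k = 1; k + 1 < n; ++k) {
                if (static_cast<int>((i + j + k) % 2) != colour) continue;
                const std::size_t c = g.idx(i, j, k);
                const double sum = g.phi[g.idx(i - 1, j, k)] + g.phi[g.idx(i + 1, j, k)]
                                 + g.phi[g.idx(i, j - 1, k)] + g.phi[g.idx(i, j + 1, k)]
                                 + g.phi[g.idx(i, j, k - 1)] + g.phi[g.idx(i, j, k + 1)];
                const double p = g.phi[c];
                // Residual of the discrete equation:
                // (sum - 6p)/h^2 - kappa2*sinh(p) + f = 0
                const double F = (sum - 6.0 * p) * inv_h2 - kappa2 * std::sinh(p) + g.f[c];
                const double dF = -6.0 * inv_h2 - kappa2 * std::cosh(p);
                g.phi[c] = p - omega * F / dF;  // relaxed local Newton step
                max_res = std::fmax(max_res, std::fabs(F));
            }
    return max_res;
}

// Alternate red (0) and black (1) half-sweeps until the residual is below tol.
// Returns the number of full sweeps used, or -1 if max_iter is reached.
int solve(Grid& g, double kappa2, double omega, double tol, int max_iter) {
    for (int it = 1; it <= max_iter; ++it) {
        const double r_red = half_sweep(g, 0, kappa2, omega);    // red first
        const double r_black = half_sweep(g, 1, kappa2, omega);  // then black
        if (std::fmax(r_red, r_black) < tol) return it;
    }
    return -1;
}
```

For example, `Grid g(9, 1.0); g.f[g.idx(4, 4, 4)] = 1.0; solve(g, 1.0, 1.0, 1e-10, 1000);` converges to a positive potential peaked at the centre. In a parallel version of such a scheme, each half-sweep would become one kernel launch or one computation phase, and only the boundary layer of the other colour would need to be exchanged between subdomains before the next half-sweep. The sinh and cosh evaluations add arithmetic per node without adding data movement, which is consistent with the non-linear case having a more favorable computation-to-transfer ratio than the linear one.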