Using Yambo in parallel
This module presents examples of how to run Yambo in a parallel environment.
Prerequisites
Previous modules:
- Initialization for bulk hBN.
You will need:
- The SAVE databases for bulk hBN (Download here)
- The yambo executable
Yambo Parallelism in a nutshell
Yambo implements a hybrid MPI+OpenMP parallel scheme, suited to running on large partitions of HPC machines (as of April 2017, runs over several tens of thousands of cores, reaching an aggregate computational power above 1 PFlop/s, have been achieved).
- MPI is particularly suited to distributing both memory and computation, and has a very efficient implementation in Yambo, which requires very little communication, though it may be prone to load imbalance (see parallel_tutorial).
- OpenMP instead works within a shared-memory paradigm, meaning that multiple threads perform computation in parallel on the same data in memory (no memory replication, at least in principle).
- When running Yambo, use MPI parallelism as much as possible (i.e. as much as memory allows), then resort to OpenMP parallelism to exploit more cores and computing power without increasing memory usage any further.
- The number of MPI tasks and OpenMP threads per task can be controlled in a standard way, e.g. as
$ export OMP_NUM_THREADS=4
$ mpirun -np 12 yambo -F yambo.in -J label
resulting in a run exploiting up to 48 threads (best fitted to 48 physical cores, although hyper-threading, i.e. running more threads than physical cores, can be exploited to keep the computing units busy). On HPC machines this is typically wrapped in a scheduler job script, as sketched below.
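As an illustration, the same hybrid setup can be requested through a batch scheduler. The following is a minimal sketch assuming a SLURM system; the resource values and the choice of launcher (mpirun vs srun) are purely illustrative and must be adapted to the specific machine.
#!/bin/bash
#SBATCH --nodes=1            # illustrative resources: adapt to your machine
#SBATCH --ntasks=12          # number of MPI tasks
#SBATCH --cpus-per-task=4    # OpenMP threads per MPI task
#SBATCH --time=01:00:00

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
mpirun -np ${SLURM_NTASKS} yambo -F yambo.in -J label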
- A fine tuning of the parallel structure of Yambo (both MPI and OpenMP) can be obtained by acting on specific input variables (run-level dependent), which can be activated, during input generation, via the flag
-V par
(high verbosity on parallel variables), as illustrated below.
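For example, generating a GW plasmon-pole input with high verbosity on the parallel variables could look like the following sketch. Both the runlevel options and the exact [PARALLEL]/[OPENMP] variables that get exposed depend on the chosen kernel and on the Yambo version, so the variable names shown here are only indicative and should be checked against the file actually generated on your setup.
$ yambo -p p -g n -V par -F yambo_gw.in

# indicative fragment of the generated input:
X_all_q_ROLEs= ""        # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_CPU= ""          # [PARALLEL] CPUs for each role
SE_ROLEs= ""             # [PARALLEL] CPUs roles (q,qp,b)
SE_CPU= ""               # [PARALLEL] CPUs for each role
X_Threads= 0             # [OPENMP/X] Number of threads for response functions
SE_Threads= 0            # [OPENMP/GW] Number of threads for self-energy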
- Yambo can take advantage of parallel dense linear algebra (e.g. via ScaLAPACK, SLK in the following). Control is provided by specific input variables (see e.g. RPA response in parallel).
- When running in parallel, one report file is written, while multiple log files are dumped (one per MPI task, by default) and stored in a newly created ./LOG folder. When running with thousands of MPI tasks, the number of log files can be reduced by setting, in the input file, something like:
NLogCPUs = 4 # [PARALLEL] Live-timing CPUs (0 for all)
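With this setting, a run launched with -J label as above would produce at most four log files inside ./LOG. The listing below is only illustrative, since the actual file names also encode the active runlevels.
$ ls ./LOG
l-label_CPU_1  l-label_CPU_2  l-label_CPU_3  l-label_CPU_4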
In the following we give direct examples of the parallel setup for some of the most common Yambo kernels.