Parallelization for non-linear response calculations
Revision as of 11:22, 23 March 2022
By default yambo_nl is parallelized over frequencies, which is the most efficient way to distribute the calculation among the processors. Two other parallelizations are available in the code:
K-points parallelization: if your system is large and requires more memory, or if you have only a few frequencies, you can change the parallelization strategy. Using the flag "-V par" adds the parallelization options to your input. You can then turn on the parallelization over k-points, choosing the values so that the product of the cores in k-space and in frequency-space equals the total number of cores. For example, if you have 16 cores you can set:
NL_CPU= "4 4"        # [PARALLEL] CPUs for each role
NL_ROLEs= "w k"      # [PARALLEL] CPUs roles (w,k)
DIP_CPU= "4 2 2"     # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v"   # [PARALLEL] CPUs roles (k,c,v)
In this way the code distributes the wave-functions over 4 cores and reduces the amount of memory needed. If this is not enough you can use the Open-MP parallelization, see below.
Open-MP: another possibility is to compile the code with the --enable-open-mp flag and then use the OpenMP parallelization. For example, set the number of threads to 2 with the command:
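The constraint on the product of cores can be checked before submitting a job. The following is a minimal sketch, not part of yambo: `check_layout` is a hypothetical helper that takes the CPU and ROLEs strings exactly as written in the input file and verifies that the product of the entries matches the number of MPI tasks.

```python
from math import prod

def check_layout(cpu, roles, total_cores):
    """Check a CPU/ROLEs pair against the number of MPI tasks.

    cpu and roles are the quoted strings from the input file,
    e.g. "4 4" and "w k". This helper is hypothetical, not a yambo tool.
    """
    counts = [int(n) for n in cpu.split()]
    names = roles.split()
    # Each role needs exactly one CPU count.
    if len(counts) != len(names):
        raise ValueError(f"{len(counts)} CPU entries but {len(names)} roles")
    # The product of the per-role counts must equal the total number of cores.
    if prod(counts) != total_cores:
        raise ValueError(
            f"product of {counts} is {prod(counts)}, expected {total_cores}"
        )
    return dict(zip(names, counts))

print(check_layout("4 4", "w k", 16))      # NL_CPU / NL_ROLEs
print(check_layout("4 2 2", "k c v", 16))  # DIP_CPU / DIP_ROLEs
```

Both layouts from the 16-core example pass the check; a layout such as "4 2" with roles "w k" would raise an error because its product is 8.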
export OMP_NUM_THREADS="2"
and yambo_nl will automatically use the available threads. In the log file you will find:
.....
<---> P1: MPI Cores-Threads : 16(CPU)-2(threads)
<---> P1: MPI Cores-Threads : NL(environment)-4 4(CPUs)-w k(ROLEs)
.....
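To confirm that a run picked up the intended number of threads, the first of these log lines can be read back with a short script. This is only a sketch: it assumes the log line has exactly the format shown above, which may differ between yambo versions, and `parse_cores_threads` is a hypothetical helper name.

```python
import re

# Log line copied from the excerpt above; the format is assumed, not guaranteed.
LOG_LINE = "<---> P1: MPI Cores-Threads : 16(CPU)-2(threads)"

def parse_cores_threads(line):
    """Return (mpi_tasks, omp_threads) from a 'MPI Cores-Threads' line, or None."""
    match = re.search(
        r"MPI Cores-Threads\s*:\s*(\d+)\(CPU\)-(\d+)\(threads\)", line
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

mpi_tasks, omp_threads = parse_cores_threads(LOG_LINE)
# Total cores in use is the number of MPI tasks times the threads per task.
print(mpi_tasks, omp_threads, mpi_tasks * omp_threads)
```

Lines that report the roles instead of the core count, such as the NL(environment) line, do not match the pattern and return None.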
Combining all these parallelizations you can use a large number of cores. For example, imagine you want to calculate the response for 60 different frequencies; you could set
"4 cores Open-MP" x "10 cores k-points" x "60 cores frequencies" = 2400 cores.
We do not advise using more than 4 OpenMP threads unless you need more memory in the calculation.
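The core count of the combined example can be worked out as follows. This is a minimal sketch of the arithmetic only: the function name is hypothetical, and the NL_CPU string it prints simply follows the "w k" ordering used in the 16-core example rather than a layout taken from a real run.

```python
def total_cores(omp_threads, k_tasks, w_tasks):
    """Return (mpi_tasks, cores) for a given thread / k-point / frequency split."""
    # MPI tasks are shared between k-points and frequencies.
    mpi_tasks = k_tasks * w_tasks
    # Each MPI task runs omp_threads OpenMP threads.
    return mpi_tasks, mpi_tasks * omp_threads

mpi_tasks, cores = total_cores(omp_threads=4, k_tasks=10, w_tasks=60)
print(f'NL_CPU= "60 10"  NL_ROLEs= "w k"  -> {mpi_tasks} MPI tasks')
print(f"OMP_NUM_THREADS=4 -> {cores} cores in total")
```

With these numbers the job runs 600 MPI tasks with 4 threads each, which gives the 2400 cores quoted above.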
Note that the restart of interrupted calculations works only on frequencies.