Parallelization for non-linear response calculations
Latest revision as of 12:41, 23 March 2022
By default, yambo_nl is parallelized over frequencies, which is the most efficient way to distribute the calculation among the processors. Two other parallelization schemes are available in the code:
K-points parallelization
If your system is large and requires more memory, or if you have only a few frequencies, you can change the parallelization strategy. Generating the input with the flag "-V par" adds the parallelization options to your input file. You can then switch on the parallelization over k-points, choosing the core counts so that the product of the cores assigned to k-points and to frequencies equals the total number of cores. For example, with 16 cores you can set:
.....
NL_CPU= "4 4"            # [PARALLEL] CPUs for each role
NL_ROLEs= "w k"          # [PARALLEL] CPUs roles (w,k)
DIP_CPU= "4 2 2"         # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v"       # [PARALLEL] CPUs roles (k,c,v)
.....
In this way the code distributes the wave-functions over 4 cores, reducing the memory required per core.
If this is not enough, you can also use the OpenMP parallelization described below.
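As a sanity check before submitting a job, the constraint described above (the CPU counts of all roles must multiply to the total number of MPI tasks) can be verified with a short script. This is a hypothetical helper, not part of Yambo: it only assumes that the CPU and role strings are space-separated as in the example input. The k-point split it prints is an idealized even distribution over the k role, meant to illustrate why memory per core drops; yambo_nl may distribute k-points differently, and the 50 k-points are an arbitrary number.

```python
from math import prod


def parse_roles(cpu: str, roles: str) -> dict[str, int]:
    """Pair each role with its CPU count, e.g. ("4 4", "w k") -> {"w": 4, "k": 4}."""
    counts = [int(c) for c in cpu.split()]
    names = roles.split()
    if len(counts) != len(names):
        raise ValueError(f"{len(counts)} CPU counts but {len(names)} roles")
    return dict(zip(names, counts))


def check_block(cpu: str, roles: str, n_mpi: int) -> dict[str, int]:
    """Raise if the CPUs of all roles do not multiply to the number of MPI tasks."""
    layout = parse_roles(cpu, roles)
    used = prod(layout.values())
    if used != n_mpi:
        raise ValueError(f"{cpu!r} uses {used} MPI tasks, but {n_mpi} are available")
    return layout


def kpoints_per_group(n_k: int, k_groups: int) -> list[int]:
    """Idealized even split of n_k k-points over k_groups (illustration only)."""
    base, extra = divmod(n_k, k_groups)
    return [base + 1 if i < extra else base for i in range(k_groups)]


if __name__ == "__main__":
    n_mpi = 16
    nl = check_block("4 4", "w k", n_mpi)   # 4 frequency groups x 4 k-point groups
    check_block("4 2 2", "k c v", n_mpi)    # dipoles: 4 x 2 x 2 = 16
    # 50 k-points is an arbitrary number chosen only for this example.
    print(nl, kpoints_per_group(50, nl["k"]))  # {'w': 4, 'k': 4} [13, 13, 12, 12]
```

With 4 k-point groups each MPI task handles roughly a quarter of the k-points, which is where the memory saving comes from.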
OpenMP
Another possibility is to enable OpenMP at configure time with the flags --enable-open-mp --enable-openmp-int-linalg and recompile the code; you can then use the OpenMP parallelization.
For example, set the number of threads to 2 with the command:
export OMP_NUM_THREADS=2
yambo_nl will then automatically use the available threads. In the log file you will find:
.....
<---> P1: MPI Cores-Threads : 16(CPU)-2(threads)
<---> P1: MPI Cores-Threads : NL(environment)-4 4(CPUs)-w k(ROLEs)
.....
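To confirm in a job script that the threads were actually picked up, one can extract the MPI task and thread counts from the log line shown above and compare them with OMP_NUM_THREADS. This is a minimal sketch: the regular expression assumes the line format exactly as printed in the example log, and the function name is hypothetical; other Yambo versions may format the line differently.

```python
import os
import re
from typing import Optional

# Matches e.g. "<---> P1: MPI Cores-Threads : 16(CPU)-2(threads)" as printed in the log above.
CORES_THREADS = re.compile(r"MPI Cores-Threads\s*:\s*(\d+)\(CPU\)-(\d+)\(threads\)")


def cores_and_threads(log_text: str) -> Optional[tuple[int, int]]:
    """Return (MPI tasks, OpenMP threads) from the first matching log line, or None."""
    match = CORES_THREADS.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = (
        "<---> P1: MPI Cores-Threads : 16(CPU)-2(threads)\n"
        "<---> P1: MPI Cores-Threads : NL(environment)-4 4(CPUs)-w k(ROLEs)\n"
    )
    found = cores_and_threads(sample)
    if found is None:
        raise SystemExit("no MPI Cores-Threads line found")
    mpi, threads = found
    print(f"{mpi} MPI tasks x {threads} threads = {mpi * threads} cores")
    # Warn only if OMP_NUM_THREADS is set and disagrees with the log.
    requested = os.environ.get("OMP_NUM_THREADS")
    if requested is not None and int(requested) != threads:
        print(f"warning: OMP_NUM_THREADS={requested}, but the log reports {threads} threads")
```

In practice you would pass the contents of the yambo_nl log file instead of the sample string.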
Note that how the number of threads is set can depend on the queue system and on the configuration of your machine.
By combining all these parallelization levels you can use a large number of cores. Imagine you want to calculate the response for 40 different frequencies; you could set
"4 OpenMP threads" x "10 cores on k-points" x "40 cores on frequencies" = 1600 cores.
For an efficient parallelization, we advise not to use more than 4 OpenMP threads unless your calculation runs into memory problems.
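The arithmetic of the example above can be written as a small planning helper: the MPI tasks are split between frequencies and k-points (the w and k roles of NL_CPU), and each MPI task runs OMP_NUM_THREADS threads, so the total core count is their product. This is a hypothetical sketch, not a Yambo tool: formatting the result as an NL_CPU string in the order "w k" follows the 16-core input example, the thread warning simply encodes the advice given here, and the idle-core warning is our own assumption about what happens when there are more frequency groups than frequencies.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    freq_cores: int    # MPI tasks on the "w" (frequency) role
    k_cores: int       # MPI tasks on the "k" (k-point) role
    omp_threads: int   # OpenMP threads per MPI task (OMP_NUM_THREADS)

    @property
    def mpi_tasks(self) -> int:
        return self.freq_cores * self.k_cores

    @property
    def total_cores(self) -> int:
        return self.mpi_tasks * self.omp_threads

    def nl_cpu(self) -> str:
        # Same order as NL_ROLEs= "w k" in the input example.
        return f"{self.freq_cores} {self.k_cores}"


def check_layout(layout: Layout, n_freqs: int) -> list[str]:
    """Return warnings for layouts that go against the advice on this page."""
    warnings = []
    if layout.freq_cores > n_freqs:
        # Assumption: extra frequency groups would have no frequency to compute.
        warnings.append("more frequency cores than frequencies: some cores may stay idle")
    if layout.omp_threads > 4:
        warnings.append("more than 4 OpenMP threads: only advisable for memory problems")
    return warnings


if __name__ == "__main__":
    plan = Layout(freq_cores=40, k_cores=10, omp_threads=4)
    print(plan.nl_cpu(), plan.mpi_tasks, plan.total_cores)  # 40 10 400 1600
    print(check_layout(plan, n_freqs=40))                   # []
```

For the 1600-core example this corresponds to NL_CPU= "40 10" with NL_ROLEs= "w k", 400 MPI tasks, and OMP_NUM_THREADS=4.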
Finally, remember that restarting an interrupted calculation works only at the level of frequencies.